基数是CBO通过行资源或行资源组合评估出来的操作将产生的行数。在有些情况下,结果集的基数可能被评估错误。最常见的是使用复杂谓词且统计信息不能精确反映出谓词相关联的数据。例如:select ename from emp where sal >= 25000 and job = 'PRESIDENT';
在这个例子中,在sal与job列中的数据存在一种隐藏的关联。只有董事长才会挣的比$25,000多。没有其它的员工满足sal>=25000。优化器没有办法检查这种关联且会将这两个列单独对待。这将会对谓词产生一种不理解的选择率以及不精确的基数评估。
在Oracle 11g及以上版本通过对一组列创建扩展统计信息来解决这种问题。
SQL> create table t1(t_name varchar2(20) not null,t_country varchar2(20) not null,t_town varchar2(20) not null); Table created. SQL> desc t1 Name Null? Type ----------------------------------------- -------- ---------------------------- T_NAME NOT NULL VARCHAR2(20) T_COUNTRY NOT NULL VARCHAR2(20) T_TOWN NOT NULL VARCHAR2(20) SQL> begin 2 for i in 1..50 loop 3 insert into t1 values('JY','CHINA','QIHE'); 4 end loop; 5 for r in 51 .. 400 loop 6 insert into t1 values('A'||r,'USA','NewYork'); 7 end loop; 8 end; 9 / PL/SQL procedure successfully completed. SQL> commit; Commit complete.
收集表t1的统计信息
SQL> exec dbms_stats.gather_table_stats(ownname=>'JY',tabname=>'T1',estimate_percent=>100,method_opt=>'for all columns sizeauto',cascade=>true); PL/SQL procedure successfully completed.
查询t_name='JY'且t_country='CHINA'的记录为50
SQL> select count(*) from t1 where t_name='JY' and t_country='CHINA'; COUNT(*) ---------- 50
执行查询发现优化器评估的基数为7,与50相差很远
SQL> select * from t1 where t_name='JY' and t_country='CHINA'; 50 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3617692013 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 | 119 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T1 | 7 | 119 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------
创建组列
SQL> declare 2 cg_name varchar2(30); 3 begin 4 cg_name := sys.dbms_stats.create_extended_stats(null,'t1', '(t_name,t_country)'); 5 end; 6 / PL/SQL procedure successfully completed.
对创建的组列收集统计信息
SQL> exec dbms_stats.gather_table_stats(null,'t1',estimate_percent=>100,method_opt=>'for columns(t_name,t_country) size skewonly'); PL/SQL procedure successfully completed. SQL> SELECT extension_name, extension FROM user_stat_extensions WHERE table_name='T1'; EXTENSION_NAME EXTENSION ------------------------------ -------------------------------------------------------------------------------- SYS_STUVJ3HB84$ZIMSQJE_3AMWZ_9 ("T_NAME","T_COUNTRY") SQL> SELECT e.extension col_group, t.num_distinct, t.histogram 2 FROM user_stat_extensions e, user_tab_col_statistics t 3 WHERE e.extension_name=t.column_name 4 AND e.table_name = t.table_name 5 AND t.table_name='T1'; COL_GROUP NUM_DISTINCT HISTOGRAM -------------------------------------------------------------------------------- ------------ --------------- ("T_NAME","T_COUNTRY") 351 HEIGHT BALANCED
创建组列收集扩展统计信息后,其评估基数为47与实际的50相差很小,对于估算已经是很准确了
SQL> select * from t1 where t_name='JY' and t_country='CHINA'; 50 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3617692013 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 47 | 846 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T1 | 47 | 846 | 3 (0)| 00:00:01 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("T_COUNTRY"='CHINA' AND "T_NAME"='JY') Statistics ---------------------------------------------------------- 8 recursive calls 0 db block gets 12 consistent gets 0 physical reads 0 redo size 1224 bytes sent via SQL*Net to client 452 bytes received via SQL*Net from client 5 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 50 rows processed
对于列之间存在关联关系的列创建Column Group Statistics可以让优化器进行更精确的基数评估从而选择最优执行计划来提高性能