转载

DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么？

昨天类总在微信公众号，给我留言，

这是2014年写的一篇文章(http://blog.csdn.net/bisal/article/details/18910785#reply)，看了一下，当时的实验和说明是，

SQL> exec dbms_stats.gather_table_stats(ownname=>'SYS', tabname=>'T2');
PL/SQL procedure successfully completed.

查询dba_tables表，看到NUM_ROWS值是11218，说明此处采样比例是100%。

这里必须纠正，我的说法有误，不能因为从dba_tables中看见了NUM_ROWS值和表实际记录数相同，就认为默认采样比例就是100%。崔老师书中说了，11g默认值是DBMS_STATS.AUTO_SAMPLE_SIZE。

从官方文档看，gather_table_stats的estimate_percent参数，取值范围是[0.000001,100]默认值是DBMS_STATS.AUTO_SAMPLE_SIZE，

DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么？

AUTO_SAMPLE_SIZE是一个NUMBER类型的常量，默认值是0，表示采用自动采样算法，

DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么？

问题来了，AUTO_SAMPLE_SIZE下Oracle采用的采样比例究竟是什么？究竟之前我所说的默认比例是100%，是否完全错误？

(1) 9i和10g的描述，How to Gather Optimizer Statistics on 10g (文档 ID 605439.1)

DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么？

指出，

(1) 9i中ESTIMATE_PERCENT自动采样比例默认为100%。

(2) 10g中ESTIMATE_PERCENT自动采样比例默认为DBMS_STATS.AUTO_SAMPLE_SIZE，注意这指出该值非常小。

(3) 11g中ESTIMATE_PERCENT自动采样比例默认为DBMS_STATS.AUTO_SAMPLE_SIZE，注意这说的是一个相对larger的estimate percentage，一直到100%。

Note that on 11g, although using auto size for ESTIMATE_PERCENT tends to default to 100% ,because this is an auto sample, the engine may still decide to use a different sample size for tables and columns. This means that Column statistics could still be gathered with a small sample size and create a histogram that is missing key values. When ESTIMATE_PERCENT is set to a specific numeric value, that value will be used for both the table and columns.

Additionally, even though a 100% sample is collected, the gathering process is really fast since a new hashing algorithm is used to compute the statistics rather than sorting (in 9i and 10g the "slow" part was typically the sorting). In 10g, support experience has shown that the default ESTIMATE_PERCENT sample size was extremely small which often resulted in poor statistics and is therefore not recommended.

这段描述说明，11g中ESTIMATE_PERCENT使用AUTO，会倾向于默认100%采样，由数据库引擎决定表和列，采样不同的采样比例。如果ESTIMATE_PERCENT设置了具体数值，则该值会应用于表和列。尽管100%采样，采集过程也会非常迅速，因为Oracle采用了一种新的HASH算法来计算统计信息，而不会像9i和10g中采用排序的方法，会显得非常slow。特别指出，10g中由于ESTIMATE_PERCENT默认值是一个非常非常小的数，通常会造成poor的统计信息，因此并不建议使用AUTO。

这篇文章则介绍了一些9i中ESTIMATE_PERCENT参数值设置的说明，

Using DBMS_STATS.GATHER_TABLE_STATS With ESTIMATE_PERCENT Parameter Samples All Rows (文档 ID 223065.1)

DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么？

指出ESTIMATE_PERCENT有一个上限，超过则会采样所有行数据。9.0.1版本之前，这个上限是25%，9.0.1版本则是15%，从9.2版本开始，采样比例则由用户指定了。

(2) 11g的描述，How to Gather Optimizer Statistics on 11g (文档 ID 749227.1)

DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么？

On 11g support suggests using the default DBMS_STATS.AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT. This will generate estimate sample size of 100% for the table (if it is possible for this to fit within the maintenance window), even if that means that statistics are gathered on a reduced frequency. If this 100% sample is not feasible, then try using at least an estimate of 30%, however since 11g uses a hashing algorithm to compute the statistic, performance should be acceptable in most cases.

Generally, the accuracy of the statistics overall outweighs the day to day changes in most applications. See below for notes regarding earlier versions and this setting.

11g建议使用DBMS_STATS.AUTO_SAMPLE_SIZE，维护窗口内，尽可能按照100%完成自动采样，若100%的方式不适合，则会at least采用30%的采样比例。并且强调了11g采用了HASH算法，计算统计信息，因此几乎在所有场景下，性能都不是问题。

对于默认值，和上面10g文档描述是一样的，

DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么？

In 11g, using auto size for ESTIMATE_PERCENT defaults to 100% and therefore is as accurate as possible for the table itself. In prior versions a 100% sample may have been impossible due to time collection constraints, however 11g implements a new hashing algorithm to compute the statistics rather than sorting (in 9i and 10g the "slow" part was typically the sorting) which significantly improves collection time and resource usage. Note that the column statistics are automatically decided and so a more variable sample may apply here.

11g中ESTIMATE_PERCENT设置AUTO，默认会采用100%的采样比例，因此对于表来说，统计信息会比较准确。之前的数据库版本中，限于采集时间，100%比例几乎不可能，然而11g使用了一种新的HASH算法，不会像之前9i和10g采用排序的方法，以前这种方法会增加采集的时间以及系统的资源消耗。

这篇文章中(How to Change Default Parameters for Gathering Statistics in Oracle 11g or Later (文档 ID 1493227.1))则介绍了如何修改统计信息收集中的默认值。

DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么？

指出采样比例参数默认值是DBMS_STATS.AUTO_SAMPLE_SIZE，该参数可以设置为：

(1) DBMS_STATS.AUTO_SAMPLE_SIZE

(2) 从0.000001到100之间的有效值。

(3) NULL(会采用计算比例，100%)

如果没有显示使用NULL，则会使用默认值DBMS_STATS.AUTO_SAMPLE_SIZE，只有明确指出NULL，才会使用NULL的计算方法。

(3) 12的描述，

DBMS_STATS.AUTO_SAMPLE_SIZE的值是什么？

说明和之前的11g版本，并无区别，因此建议参考11g。

总结：

1. 9i中ESTIMATE_PERCENT默认100%。

2. 10g中ESTIMATE_PERCENT默认为DBMS_STATS.AUTO_SAMPLE_SIZE，一个非常非常小的数，通常会造成poor的统计信息，因此并不建议使用AUTO。

3. 11g中ESTIMATE_PERCENT默认为DBMS_STATS.AUTO_SAMPLE_SIZE，但由于其采用了一种新的HASH算法，即使倾向于默认100%采样，其性能要比9i和10g中更优，因此一般情况下，建议使用DBMS_STATS.AUTO_SAMPLE_SIZE，由Oracle来自主选择采样比例。

4. 我之前说的，默认采样比例是100%，其实是需要有一些前提条件的，从1-3可以看出，9i确实默认是100%，但10g肯定不是了，11g一般情况下是100%，但不能保证所有情况均为100%。

5. 任何说法，需要有理论和实践来论证，有时需要猜测，但一定是靠谱的猜测，并且可以证明。

6. 要感谢像类总这样的朋友们，对之前一些问题的疑问，我一直认为，我只是一个Oracle的爱好者，道行还很浅，相关的理论和实践知识，还很薄弱，有错误不怕，但我一直在努力中，所以欢迎朋友们指出各种问题，共同努力，共同进步！

如果您觉得此篇文章对您有帮助，欢迎关注微信公众号：bisal的个人杂货铺，您的支持是对我最大的鼓励！共同学习，共同进步:)

正文到此结束