Umar DBA: Recommendations for Gathering Optimizer Statistics on 10g

Recommendations for Gathering Optimizer Statistics on 10g
	Doc ID:	605439.1	Type:	HOWTO
	Modified Date:	28-JUL-2009	Status:	PUBLISHED

In this Document
  Goal
  Solution
  References

Applies to:

Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 10.2.0.4
Oracle Server - Standard Edition - Version: 10.1.0.2 to 10.2.0.4
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 10.1.0.2 to 10.2.0.4

Goal

This Document outline the recommended method to gather a standard set of optimizer statistics for use by the Cost Based Optimizer under Oracle 10g.

Solution

Quick Recreate Recommendation

To achieve a quick delete and recreate of the statistics on an individual table and it's indexes (adding column statistics for any skewed columns) and following the recommendations in this article use:

exec dbms_stats.delete_table_stats(ownname=>'user_name',-
tabname=>'table_name',cascade_indexes=>true);

exec dbms_stats.gather_table_stats(ownname=>'user_name',-
   tabname=>'table_name',-
   estimate_percent => 100,-
   cascade=>true,-
   method_opt=>'for all columns size skewonly');

For explanation of these recommendations, see below. For more usage examples see the end of this article.
Note that, from 10gR2 statistics can be restored using:

Note 452011.1 - Restoring table statistics in 10G onwards

IMPORTANT: PLEASE NOTE:

These recommendations apply to the majority of databases.
The recommendations aim to generate statistics with as much statistical accuracy as possible. To this end 100% sample sizes are suggested since any reduction in sample size is always a concession to accuracy. It is acknowledged that such 100% samples are potentially time consuming and consideration needs to be made to fit the statistics gathering activities within the existing maintenance window.
Gathering new optimizer statistics should maintain or improve existing execution plans, but it is possible that some queries performance may degrade. Note that from 10gR1 previous copies of statistics are maintained by default for the last 30 days and can be restored in the case of problems. See:

Note 452011.1 Restoring table statistics in 10G onwards

Gathering new optimizer statistics may invalidate cursors in the shared pool so it is prudent to restrict all gathering operations execution to periods of low activity in the database, such as the scheduled maintenance windows.
For very large systems, the gathering of statistics can be a very time consuming and resource intensive activity.In this environment sample sizes need to be carefully controlled to ensure that gathering completes within acceptable timescale and resource constraints and within the maintenance window. For guidance on this topic See:

Note 44961.1 Statistics Gathering: Frequency and Strategy Guidelines

In these environments, it is also recommended to utilise change based statistics gathering to avoid re-gathering information unnecessarily.Please see:

Note 237901.1 Gathering Schema or Database Statistics Automatically - Examples
Note 377152.1 Best Practices for automatic statistics collection on Oracle 10g

Gathering Object statistics

The Cost Based Optimizer (CBO) uses statistics to determine the execution plan for a particular query. Potentially, with reduced sample sizes, sampling could produce different statistics due to chance groupings of data that may be the result of differing loading methods etc.

On 10g it is recommended to:

Gather statistics using scheduled statistics gathering scripts. In most cases the default scripts provide an adequate level of sampling taking into account the following recommendations:
Use sample size that is large enough. On 10g support suggests an estimate sample size of 100% (if it is possible for this to fit within the maintenance window), even if that means that statistics are gathered on a reduced frequency. If 100% is not feasible, try using at least an estimate of 30%. Generally, the accuracy of the statistics overall outweighs the day to day changes in most applications. This setting is because the default AUTO_SAMPLE_SIZE uses a very small estimate percentage which can result in poor estimates.
Ensure all objects (tables and indexes) have stats gathered. An easy way to achieve this is to use the CASCADE parameter.
Ensuring that any columns with skewed data distribution have histograms collected, and at sufficient resolution using the METHOD_OPT parameter. Support recommends a conservative and more plan-stable approach of "adding a histogram only if it is known to be needed" rather than collecting column statistics on all columns. This can be achieved manually or by using the SKEWONLY option to automatically add column statistics to columns that contain data with a non-uniform distribution. Using the default column statistics setting of AUTO which means that DBMS_STATS will decide which columns to add histogram to where it believes that they may help to produce a better plan. If statistics are not completely up to date then the presence of Histograms can cause trouble when parsing values are out of range, or between values for "frequency" histograms. In these circumstances the optimizer has to make guesses which may be inaccurate and, on occasion, cause poor plans.

Note that in earlier versions the default setting for the METHOD_OPT parameter was "FOR ALL COLUMNS SIZE 1" which would collect only a high and a low value and effectively meant that there were no detailed column statistics. It is known that in some cases, the effect of a histogram is adverse to the generation of a better plan so users moving between versions may initially wish to set this parameter to its pre-upgrade release value, and later adjust to the post-upgrade release default value. See:

Note 465787.1 Managing CBO Stats during an upgrade to 10g or 11g

As ever, testing different values with the application will yield the best results.

If partitions are in use, gather global statistics if possible due to time constraints. Global stats are very important but gathering is often avoided due to the sizes involved and length of time to required. If 100% samples are not possible then support would recommend going for a minimum of 1%. Gathering with small sample sizes (e.g. 0.001, 0.0001, 0.00001 etc. ) can be very effective but equally, a large proportion of the data will not be examined which could prove decisive to the optimizer's plan choices. Note that the available range for the ESTIMATE_PERCENT parameter is a very flexible [0.000001 -> 100] which can use very small sample sizes suitable for huge partitioned tables. Testing will reveal the most suitable settings for each system.
See:

Note 236935.1 Global statistics - An Explanation

Gather system statistics to reflect the CPU loading of the system and to improve the accuracy of the CBO's estimates by providing the CBO with CPU cost estimates in addition to the normal I/O cost estimates. See:

Note 470316.1 Using Actual System Statistics (Collected CPU and IO information
Note 149560.1 Collect and Display System Statistics (CPU and IO) for CBO usage
Note 153761.1 Scaling the System to Improve CBO optimizer

Note that the defaults for statistics gathering on different versions of Oracle are not necessarily the same, for example:

ESTIMATE_PERCENT: defaults:

9i : 100%
10g : DBMS_STATS.AUTO_SAMPLE_SIZE (using very small estimate percentage)
11g : DBMS_STATS.AUTO_SAMPLE_SIZE (using larger estimate percentage - 100%)

METHOD_OPT: defaults:

9i : "FOR ALL COLUMNS SIZE 1" effectively no detailed column statistics.
10g and 11g : "FOR ALL COLUMNS SIZE AUTO" - This setting means that DBMS_STATS decides which columns to add histogram to where it believes that they may help to produce a better plan.

In 11g, using auto size for ESTIMATE_PERCENT defaults to 100% and therefore is as accurate as possible. Additionally, even though a 100% sample is collected, the gathering process is really fast since a new hashing algorithm is used to compute the statistics rather than sorting (in 9i and 10g the "slow" part was typically the sorting). In 10g the default ESTIMATE_PERCENT sample size was extremely small which often resulted in poor statistics and is therefore not recommended.

SAMPLE STATISTIC GATHERING COMMANDS

Gathering statistics an individual table

exec dbms_stats.gather_table_stats(  -
       ownname => '  Schema_name ', -
       tabname => '  Table_name  ', -
       estimate_percent => 100,  -
       cascade => TRUE,  -
       method_opt => 'FOR ALL COLUMNS SIZE 1' );

N.B. replace ' Schema_name ' and ' Table_name ' with the name of the schema
and table to gather statistics for respectively.

Gathering statistics for all objects in a schema

exec dbms_stats.gather_schema_stats( -
      ownname => '  Schema_name ', -
       cascade => TRUE, -
       method_opt => 'FOR ALL COLUMNS SIZE 1' );

N.B. replace ' Schema_name ' with the name of the desired schema.

Gathering statistics for all objects in the database:

exec dbms_stats.gather_database_stats( -
cascade => TRUE, -
method_opt => 'FOR ALL COLUMNS SIZE 1' );

NOTE:

For cases where column data is known to be skewed and column statistics are known to be beneficial, Replace:

method_opt => 'FOR ALL COLUMNS SIZE 1'

with

method_opt => 'FOR ALL COLUMNS SIZE AUTO'

or with

method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY'

to automatically add column statistics to columns that contain data with a non-uniform distribution.

@ dbms_stats dbms_stats dbms_stats dbms_stats dbms_stats

References

Note 153761.1 - System Statistics: Scaling the System to Improve CBO optimizer
Note 236935.1 - Global statistics - An Explanation
Note 237901.1 - Gathering Schema or Database Statistics Automatically in 8i and 9i - Examples
Note 377152.1 - Best Practices for automatic statistics collection on Oracle 10g
Note 388474.1 - Recommended method for Gathering Optimizer Statistics on 9i
Note 44961.1 - Statistics Gathering: Frequency and Strategy Guidelines
Note 452011.1 - Restoring table statistics in 10G onwards
Note 465787.1 - Managing CBO Stats during an upgrade to 10g or 11g
Note 470316.1 - Using Actual System Statistics (Collected CPU and IO information)

Umar DBA

Wednesday, November 18, 2009

Recommendations for Gathering Optimizer Statistics on 10g

No comments:

Post a Comment

Followers

Blog Archive