New diagnostics engine in Oracle 10g helps detect and diagnose performance problems
In my last column, I focused on one of Oracle Database 10g's quick wins for DBAs—using the new SQL Tuning Advisor to quickly tune poorly performing SQL statements—and touched only briefly on the new built-in diagnostic engine, Automatic Database Diagnostic Monitor (ADDM), which helps you easily identify problematic SQL statements (see "Advice and Consent" in the March/April 2004 issue for more information about using the SQL Tuning Advisor and the DBMS_SQLTUNE package). Identifying high-load SQL statements is just one of ADDM's many capabilities. A self-diagnostic engine built into the Oracle Database 10g kernel, ADDM automatically detects and diagnoses common performance problems, including:
- Hardware issues related to excessive I/O
- CPU bottlenecks
- Connection management issues
- Excessive parsing
- Concurrency issues, such as contention for locks
- PGA, buffer-cache, and log-buffer-sizing issues
- Issues specific to Oracle Real Application Clusters (RAC) deployments, such as global cache hot blocks and objects and interconnect latency issues
Let's continue exploring the new performance tuning capabilities of Oracle Database 10g by taking a look at ADDM.
Automatic, Effective Problem Diagnosis
ADDM automatically analyzes the performance of the database at regular intervals (once an hour by default) and identifies any performance issues. ADDM's diagnostic engine is modeled as a classification system that quickly and effectively sifts through the statistical data in Automatic Workload Repository (AWR)—a newly introduced built-in repository in Oracle Database 10g containing performance statistics and workload information—and evaluates problem areas, based on their impact on overall database performance. ADDM's processing logic encompasses best practices and the accumulated expertise of tuning professionals throughout the industry. Ultimately ADDM presents a set of "findings" that not only identifies the cause of performance problems but can also inform administrators that certain database subcomponents, such as I/O, are functioning properly and do not require any further investigation.
Automation is just one of several unique benefits of ADDM. Another important benefit is ADDM's ability to identify the root cause of problems. Just as physicians achieve better results for their patients by treating causes rather than symptoms, DBAs can achieve better database performance by finding the root cause of performance issues before making changes to the system. However, finding a root cause can be difficult and time-consuming, given that symptoms can sometimes mask the real problem.
ADDM can distinguish between the cause and symptoms of a problem. It's able to do so, in part, by using the extensive new data (events, statistics) generated by the database at runtime. Oracle database kernel code has always been instrumented to provide raw performance data, but in this release, the instrumentation is more comprehensive.
For example, wait events are more granular, with numerous locks and latches now separated into distinct wait events. Furthermore, wait events are now grouped into classes—Application, Administration, Commit, Concurrency, Network, User I/O, System I/O, and Configuration, for example—to facilitate processing through ADDM.
In addition to the new wait-event model, Oracle Database 10g also includes an array of new statistics that provide performance data on everything from operating system performance—hard disk I/O statistics, CPU utilization, and network statistics, for example—to cumulative statistics at the system and session levels. One of the most important new statistics is database time, or DB time—the amount of time the database is actually working on database calls.
It's About Time
ADDM uses DB time to measure throughput; it's the total time spent by foreground processes—such as a read I/O—waiting for a database resource, running on the CPU, and waiting for a free CPU. The overarching goal of ADDM is to reduce the value of DB time across the system, thereby improving overall throughput.
Operational information about active sessions—those using the CPU—as well as sessions waiting for the CPU, is sampled every second and held in a scrolling buffer in the server memory—Active Session History (ASH), an important new feature of Oracle Database 10g. ASH (V$ACTIVE_SESSION_HISTORY) maintains historical data; in previous Oracle releases, DBAs had information about currently active sessions only— not historical data—so figuring out the cause of performance problems after the fact wasn't easy.
Data from ASH, along with other important database statistics, persists on the disk in the form of "snapshots." Snapshots are taken every hour and stored for seven days, by default, but DBAs can change settings for both snapshot frequency and storage duration. AWR provides the raw information for ADDM analysis, which begins automatically as soon as a new AWR snapshot is taken.
Getting Started with ADDM
Because ADDM runs automatically after each new AWR snapshot is taken, no manual steps are required to generate its findings. But you can run ADDM on demand by creating a new snapshot manually, by using either Oracle Enterprise Manager (OEM) or the command-line interface. The following shows creation of a snapshot from the command line:
SQL> exec dbms_workload_repository.create_
snapshot();
PL/SQL procedure successfully completed.
After the new snapshot is created, its information (snap_id, begin_interval_ time, and the like) is populated to the DBA_HIST_SNAPSHOT dictionary view. A few seconds after you take the snapshot, ADDM brings new findings to the surface, based on analysis of that snapshot. The findings are available on the new OEM Web-based console, on the Database home page.
When you access the database instance in the new OEM console, you see at a glance how the database is operating in general, on a dashboard-style home page. You can quickly verify that the server has adequate CPU and memory resources and get a general overview of the system from this one page. The Active Sessions pie chart shows the current distribution across the server in three key areas: Using CPU, Waiting I/O, and Waiting Other. You can drill down on any of these items to see a line graph of activity over the last 24 hours (and change this to values such as 31 days, 7 days, or just a few minutes).
The results of the latest ADDM run are displayed in the Performance Analysis section of the home page which provides a summary view of the top ADDM findings: the impact of the diagnosed problem as a percentage of overall database performance, a description of the finding, and a recommendations summary. The text of each finding is a hyperlink to more-detailed information on that particular finding. ADDM recommendations can include running one of the new advisors—SQL Tuning Advisor, SQL Access Advisor, Space Management Advisor, and so on—when applicable. For example, a solution for a particular finding can involve using SQL Tuning Advisor to tune a particular SQL statement or adding more resources to the machine. Clicking on the recommendation link displays the detailed finding view, which includes the full text of any recommendations.
You can also generate an ADDM report that summarizes performance data and provides a list of all findings and recommendations. You can access ADDM reports through the Web-based OEM console or from a SQL*Plus command line by using the new DBMS_ADVISOR built-in package. For example, here's how to use the command line to create an ADDM report quickly (based on the most recent snapshot):
set long 1000000
set pagesize 50000
column get_clob format a80
select dbms_advisor.get_task_report(
task_name, 'TEXT', 'ALL')
as ADDM_report
from dba_advisor_tasks
where task_id=(
select max(t.task_id)
from dba_advisor_tasks t, dba_advisor_log l
where t.task_id = l.task_id
and t.advisor_name='ADDM'
and l.status= 'COMPLETED');
The ALL parameter generates additional information about the meaning of some of the elements in the report.
Findings report the impact of the identified problem as a percentage of DB time, which correlates with the expected benefit, based on the assumption that the problem described by the finding will be solved if the recommended action is taken. For example, here's a finding that identifies a configuration issue and recommends adjusting the sga_target value in the parameter file:
FINDING 3: 5.2% impact (147 seconds)
---------------------------------------
The buffer cache was undersized causing significant additional read I/O.
RECOMMENDATION 1: DB Configuration, 5.2% benefit (147 seconds)
ACTION: Increase SGA target size by increasing the value of parameter "sga_target" by 24 M.
SYMPTOMS THAT LED TO THE FINDING:
Wait class "User I/O" was consuming significant database time. (5.3% impact [150 seconds])
...
As you see, DBAs have the flexibility to decide how to resolve ADDM findings. In this case, the DBA might decide to use the new automatic SGA sizing feature, rather than modifying the parameter.
Conclusion
Oracle Database 10g provides DBAs with comprehensive, quantifiable diagnostic information that can be used to proactively monitor and prevent problems as well as respond to them when they arise. There's an enormous amount of new statistical information available to help keep the database running optimally. More important, the automatic performance-diagnostic capabilities of AWR and ADDM will not only help you resolve critical performance problems quickly but will also provide you with hard-and-fast, quantifiable data for making the business case for new hardware, additional staff, application development initiatives, or better training for employees. And Oracle plans to continually update ADDM's problem classification tree and diagnostic rules to keep pace with new features and technologies introduced in the database and underlying operating platform.
REFERENCES
A Closer Look at ADDM By Kimberly Floss , Oracle Corporation
No comments:
Post a Comment