Wednesday, November 11, 2009

True Session Wait Activity in Oracle 10g

Catching a session waiting on a resource used to be hit or miss. Let's take a look at how Oracle has helped give us a better mousetrap for seeing the waits experienced by sessions.

It used to be that a DBA would sit in front of the terminal screen and continually hit the ENTER key to watch activity in the V$SESSION_WAIT view to try and get a glimpse of the bottleneck and wait activity for connected sessions. This in itself was a very tedious task and honestly I would often get paranoid that I would miss some valuable information if I did not hit the ENTER key fast enough and thus I would never see the important wait event. Yes there was the V$SESSION_EVENT view that gave us a total for all waits but we never knew when the wait might have occurred. DBAs very quickly got smarter and started to sample the data in the V$SESSION_WAIT table and write it out to another table or dump it to an external list file. This was fine but the overhead of actually creating and then running the task that would populate these statistics could get high, and management of the task could get tedious.

To the Rescue

With Oracle 10g this has all been nicely taken over for us. Oracle will now sample data through internal kernel code for statistics similar to what was seen in the V$SESSION_WAIT view and store it in a limited amount of buffer space, 2 Meg per CPU. We can view these statistics by querying the new V$ACTIVE_SESSION_HISTORY view. The V$ACTIVE_SESSION_HISTORY view contains one row of sampled activity for each session that was active, either on the CPU or actually waiting for a resource. As this buffer gets full, because of the activity on the system, Oracle will move the captured statistics to disk as part of the Automatic Workload Repository (AWR) snapshot mechanism and it will now be available through the DBA_HIST_ACTIVE_SESS_HISTORY view. When querying for current information, depending on the activity of your system, you may need to go to both views to construct a valid picture of what has happened.

The important thing to remember here is that this is historical activity of the waits that have occurred for SQL executed and we can now go back in time and look at what was the true cause of performance bottlenecks. Be forewarned that since the statistics are stored in the rotating buffer and are only moved to the DBA_HIST_ACTIVE_SESS_HISTORY through a snapshot, you may loose statistics. In addition, when the snapshot mechanism (AWR) does kick in and does its work to move database statistics between V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY, it only takes a sampling of the data in V$ACTIVE_SESS_HISTORY for the snapshot. Furthermore, when AWR does kick in, it only takes a sampling of the data in V$ACTIVE_SESS_HISTORY for the snapshot. What this means is that you need to either catch what you want in the V$ACTIVE_SESS_HISTORY view before a snapshot or have your own scrapping utility again. For this reason, this article will only be concerned with querying off of the V$ACTIVE_SESS_HISTORY view to show a few ways you might want to query for information that had not been available before.

What resource is currently in high demand?

While we have always been able to look at the system or a currently running session as a whole and see the resources that are in most use through the V$SYSTEM_EVENT and V$SESSTAT views, we really have not ever had a good way to determine for a current time frame what the resources being used were. This query will give you for the last 30 minutes those resources that are in high demand on your system.


1 select active_session_history.event,

2 sum(active_session_history.wait_time +

3 active_session_history.time_waited) ttl_wait_time

4 from v$active_session_history active_session_history

5 where active_session_history.sample_time between sysdate - 60/2880 and sysdate

6* group by active_session_history.event

7 order by 2


Wait Event TTL_WAIT_TIME

------------------------------ -------------

SQL*Net message to client 4

null event 5

LGWR wait for redo copy 161

ksfd: async disk IO 476

direct path read 30025

latch: cache buffers chains 33258

direct path write 93564

log file sequential read 178662

db file sequential read 458653

log file sync 612660

control file single write 1127626

read by other session 2024242

db file parallel write 2278618

control file parallel write 3091888

enq: CF - contention 4108238

rdbms ipc reply 4283877

db file scattered read 4357653

class slave wait 5123780

control file sequential read 6971659

rdbms ipc message 11899157

jobq slave wait 14732974

log file parallel write 15310721

log buffer space 21405250

SQL*Net message from client 26272575

log file switch completion 66115558

enq: HW - contention 100841479

buffer busy waits 114070065

What user is waiting the most?

Again, the power in this SQL query is that we are now able to determine what user is consuming the most resource at a point in time, independent of the total resources that the user has used. You can now, with this query, answer the question of who is waiting the most for resources at a point in time. If a user calls you up on the phone and says they are experiencing delays, you can use this query to verify that they are actually waiting in the database for a result set for a given time period. This SQL is written for a 30-minute interval from current system time so you may need to change.


1 select sesion.sid,

2 sesion.username,

3 sum(active_session_history.wait_time +

4 active_session_history.time_waited) ttl_wait_time

5 from v$active_session_history active_session_history,

6 v$session sesion

7 where active_session_history.sample_time between sysdate - 60/2880 and sysdate

8 and active_session_history.session_id = sesion.sid

9 group by sesion.sid, sesion.username

10* order by 3


SID User TTL_WAIT_TIME

----- ---------- -------------

135 SCOTT 91167481

149 SCOTT 107409491

153 SCOTT 110796799

What SQL is currently using the most resources?

This query will get you started in the proper direction of zeroing in on what SQL statement is consuming the most resource wait times on your system. No longer do you have to go to the V$SQLAREA and try to pick out the top 10 or 20 SQL hogs on your system by disk reads or executions. Now you really know what SQL statements are consuming resources and waiting the most. These are the SQL that you really need to tune because the fact that a SQL statement performs 20,000 reads does not mean that it is a bottleneck in your system.


1 select active_session_history.user_id,

2 dba_users.username,

3 sqlarea.sql_text,

4 sum(active_session_history.wait_time +

5 active_session_history.time_waited) ttl_wait_time

6 from v$active_session_history active_session_history,

7 v$sqlarea sqlarea,

8 dba_users

9 where active_session_history.sample_time between sysdate - 60/2880 and sysdate

10 and active_session_history.sql_id = sqlarea.sql_id

11 and active_session_history.user_id = dba_users.user_id

12 group by active_session_history.user_id,sqlarea.sql_text, dba_users.username

13* order by 4


USER_ID User SQL_TEXT TTL_WAIT_TIME

------- ------ ----------------------------------------------------- -------------

57 SCOTT insert into sys.sourcetable (select * from sys.source$) 304169752

What object is currently causing the highest resource waits?

This is a great query. Now we can actually see which objects a SQL statement is hitting. Moreover, if you take a further look at the V$ACTIVE_SESSION_HISTORY view you will see that you can tailor this query to report on the actual blocks that are being accessed within the objects for the SQL statement. This is great help in determining if you may need to reorg your object or redistribute to reduce the contention on that object.


1 select dba_objects.object_name,

2 dba_objects.object_type,

3 active_session_history.event,

4 sum(active_session_history.wait_time +

5 active_session_history.time_waited) ttl_wait_time

6 from v$active_session_history active_session_history,

7 dba_objects

8 where active_session_history.sample_time between sysdate - 60/2880 and sysdate

9 and active_session_history.current_obj# = dba_objects.object_id

10 group by dba_objects.object_name, dba_objects.object_type, active_session_history.event

11* order by 4


OBJECT_NAME OBJECT_TYPE Wait Event TTL_WAIT_TIME

-------------- ------------ ------------------------------ -------------

SOURCE$ TABLE ksfd: async disk IO 23

SOURCE TABLE TABLE ksfd: async disk IO 453

SOURCE TABLE TABLE latch: cache buffers chains 33258

SOURCE TABLE TABLE db file scattered read 82593

SOURCE TABLE TABLE db file sequential read 111202

SOURCE TABLE TABLE control file parallel write 137237

SOURCE TABLE TABLE read by other session 165501

SOURCE TABLE TABLE log file sync 612660

SOURCE$ TABLE log buffer space 932308

SOURCE TABLE TABLE control file sequential read 1428575

SOURCE$ TABLE log file switch completion 1856281

SOURCE$ TABLE read by other session 1858741

SOURCE$ TABLE db file scattered read 4275060

SOURCE TABLE TABLE rdbms ipc reply 4283877

SOURCE TABLE TABLE log buffer space 14152000

SOURCE$ TABLE enq: HW - contention 25483656

SOURCE TABLE TABLE log file switch completion 59228080

SOURCE TABLE TABLE enq: HW - contention 75357823

SOURCE TABLE TABLE buffer busy waits 114055403

The new information that Oracle has given us through the V$ACTIVE_SESSION_HISTORY view is invaluable. Now we can really zero in on those resources, session, users, SQL statements, and objects that are causing our systems to be resource intensive. By reducing the resource usage, you can in fact tune your systems better.

REFERENCES

http://www.dba-support.com

No comments:

Post a Comment