Tuesday, November 17, 2009

Mining for Clues


Use Oracle LogMiner to recreate transactions and undo damage.

Paul is on the other end of the line and is irate. He has been the account manager for a customer named Ellen Smith for the past 20 years. But today, when Ellen walked into the bank, the system said Paul was no longer the account manager! Embarrassed and justifiably upset, he is screaming for answers.

John, the lead DBA at the bank, needs to act fast. He performs a quick query on the ACCOUNTS table, which shows that the value of the ACC_MGR_ID column for account number 4567 (Ellen's account) is 1, whereas Paul's ID is 5. The value of the column was definitely 5 before. Who could have changed it, and why? Furthermore, a separate query shows that Paul was assigned another customer (with account number 9876), whom he does not recognize. Why did this happen, and who is the real account manager for account 9876?

Digging Through the Archives

What options does John have for rectifying the situation?

Let's start by looking at Oracle's undo and redo logs. Note that Oracle Database generates an undo and a redo entry for each change made to the database. In simple terms, an undo entry provides the values of data stored before a change and the redo entry provides the values of data stored after a change. The redo entries are written to log buffers in the system global area (SGA) whenever a change occurs. The log writer (LGWR) process then writes the redo entries to the online redo logs, and, if the database is in ARCHIVELOG mode, the archiver process (ARC0) writes the redo entries to the archived logs. The purpose of the redo entries is to ensure that database activities can be replayed in case a disaster occurs in the database. In the event of a complete loss, a DBA can restore a previous backup of the database and apply all the archived logs generated since that time to roll it forward to the time of failure.
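
Incidentally, whether the archiver is producing archived logs at all depends on the log mode of the database, which is easy to verify. A quick check, shown here only for illustration:

SELECT log_mode FROM v$database;

If the result is ARCHIVELOG, archived logs are being generated and will be available for mining later.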

What about undo? According to Oracle's read consistency model, a user sees the value of a database change only after the change is committed; otherwise, the database returns the previous (unchanged) values. These previous values are retrieved from the undo segments. Furthermore, during data recovery after a disaster, Oracle Database must roll back all changes that were not committed, to maintain consistency and data integrity. These entries are also obtained from the undo segments. Because undo segments are also database changes, they generate redo entries as well, which go into online redo logs and then to archived logs.

As a result, online and archived logs contain both undo and redo information. DBAs can use them to see both old and new data values. However, online and archived logs have an unpublished format and are not human-readable. To read them, DBAs use a tool called Oracle LogMiner, which makes it possible to read and search both online and archived logs. It is implemented via a supplied package named DBMS_LOGMNR, introduced in Oracle8i. This article describes how to use this powerful tool to solve common and not-so-common problems involving changes made to an Oracle database.

Start Mining

Let's return to the issue of Ms. Smith's account. John can start looking for how and when the account manager was changed by using Flashback Query to select prior values of the record in this table. Flashback Query is an Oracle feature that lets a user see the value of a column as it existed at some point in the past, even if the value has since been changed and committed.

However, the Flashback Query approach works only if a change occurred within the period specified by the UNDO_RETENTION parameter, which is often as short as 30 minutes for an OLTP database. In this case, there's a good chance that Flashback Query will not find the change.
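
For illustration, the Flashback Query John might try could look like the following sketch; the 30-minute window is hypothetical, and ACC_NO is the account number column used elsewhere in this article:

select acc_mgr_id
from accounts
as of timestamp (systimestamp - interval '30' minute)
where acc_no = 4567;

If the required undo data has already been overwritten, this query fails with ORA-01555 (snapshot too old), which is exactly the limitation John runs into here.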

John can find the answer to the question at hand by looking directly at the online redo log files, using LogMiner. Here are the steps for setting up and using LogMiner:
1. Install the DBMS_LOGMNR package. John installs the package by logging on to SQL*Plus as SYS and executing the $ORACLE_HOME/rdbms/admin/dbmslm.sql file.

2. Grant the role. John then grants the user who will do the mining—in this case, himself—the appropriate role to execute this package:

GRANT EXECUTE_CATALOG_ROLE TO JOHN;

3. Create the synonym. John creates a public synonym:

CREATE PUBLIC SYNONYM DBMS_LOGMNR FOR SYS.DBMS_LOGMNR;

These first three steps are required just once.

4. Specify the scope of the mining. Because the error may have happened just recently, it's possible that the change is still within the online redo log files. So, for the first pass, John decides to mine only the online redo log files. He identifies the files by running the following query:

SELECT distinct member LOGFILENAME FROM V$LOGFILE;

LOGFILENAME
________________________
/dev/vgredo01/rlog1a
/dev/vgredo01/rlog1b

These are the two redo log files for the database. John limits the scope of mining to only these files by executing the following:

BEGIN
   DBMS_LOGMNR.ADD_LOGFILE
      ('/dev/vgredo01/rlog1a');
   DBMS_LOGMNR.ADD_LOGFILE
      ('/dev/vgredo01/rlog1b');
END;

5. Start the LogMiner session and specify a dictionary. John executes the following SQL to start the LogMiner session:

BEGIN
   DBMS_LOGMNR.START_LOGMNR
      (options =>
         dbms_logmnr.dict_from_online_catalog);
END;

Using the OPTIONS parameter, he also specifies that Oracle Database read dictionary information from the online catalog while starting LogMiner, so that object names can be translated.

As previously mentioned, redo log entries are not written in clear text. LogMiner provides human-readable reporting from the redo log files, but some reported values may not look like the original objects. For example, the owner, table, and column names are not recorded in the online redo log in the same format users input; they are instead converted to hexadecimal values. For instance, the ACCOUNTS table may be recorded as OBJ#45C1. To make the redo log data more readable, John can ask LogMiner to convert these values to more recognizable ones by translating them from a dictionary.

6. Check contents. Starting the LogMiner session populates a view named V$LOGMNR_CONTENTS, which John can then query to find out what happened to Ellen Smith's account. The query looks for the users and times of any updates against the ACCOUNTS table. The query against V$LOGMNR_CONTENTS and the output are shown in Listing 1. Note that John must query this view from the same session in which he started the LogMiner session. Because he is looking only for updates, he uses the predicate in Listing 1, line 6.

In the Listing 1 output, John sees that a user named Joe updated the table at 2:16 p.m. The SID and SERIAL# columns identify the session in which Joe was logged in.

Code Listing 1: Retrieving information from V$LOGMNR_CONTENTS
 
SQL>  select username, to_char(timestamp,'mm/dd/yy hh24:mi:ss') timestamp,
  2  seg_type_name, seg_name, table_space, session# SID, serial#
  3  from v$logmnr_contents
  4  where table_name = 'ACCOUNTS'
  5  and seg_owner = 'ARUP'
  6  and operation = 'UPDATE';
 
USERNAME  TIMESTAMP          SEG_TYPE_N  SEG_NAME     TABLE_SPAC  SID  SERIAL#
________  _________________  __________  ___________  __________  ___  _______
JOE       01/26/05 14:16:41  TABPART     ACCOUNTS,P1  ACC_D1      532       62
JOE       01/26/05 14:16:41  TABPART     ACCOUNTS,P2  ACC_D2      532       62
JOE       01/26/05 14:16:41  TABPART     ACCOUNTS,P3  ACC_D3      532       62
 
...

Another column in this view, SESSION_INFO, provides additional information on Joe's session. The value of the SESSION_INFO column for all the rows is
 
login_username=JOE
client_info=
OS_username=jsmoe
Machine_name=ACMENTNY\JSMOE
OS_terminal=
OS_process_id=4080:3096
OS_program name=sqlplus.exe

Based on this information, John knows that Joe updated some records in the ACCOUNTS table from a client machine named JSMOE connected to the ACMENTNY network. The last piece of information, OS_program name=sqlplus.exe, provides another big clue. Because Joe was using SQL*Plus, he probably performed an ad hoc update, but because Joe is a developer, he is not supposed to perform updates to a production database except in a dire emergency. Why would he have made these changes?
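
To isolate just that session's activity, John could also filter V$LOGMNR_CONTENTS on the session identifiers reported in Listing 1; a sketch using the SID and SERIAL# values shown above:

select sql_redo
from v$logmnr_contents
where session# = 532
and serial# = 62
and operation = 'UPDATE';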

In any event, John needs to know what Joe updated as well as the old values stored before the update. To find this information, John modifies the query in Listing 1 to select the two very important columns in the V$LOGMNR_CONTENTS view: SQL_REDO, which shows a statement that can be executed to cause the same change made by the user, and SQL_UNDO, a statement that can be executed to undo the changes. Listing 2 shows the output of this modified query.

Code Listing 2: Output of the SQL_UNDO and SQL_REDO columns
 
SQL>  select sql_undo, sql_redo
  2  from v$logmnr_contents
  3  where table_name = 'ACCOUNTS'
  4  and seg_owner = 'ARUP'
  5  and operation = 'UPDATE';
 
SQL_UNDO                                           SQL_REDO
_________________________________                 _________________________________
update "ARUP"."ACCOUNTS" set "ACC_M               update "ARUP"."ACCOUNTS" set "ACC_M
GR_ID" = '2' where "ACC_MGR_ID" = '               GR_ID" = '6' where "ACC_MGR_ID" = '
6' and ROWID = 'AAOKdAALAAAGcEAB'                 2' and ROWID = 'AAOKdAALAAAGcEAB'
 
update "ARUP"."ACCOUNTS" set "ACC_M               update "ARUP"."ACCOUNTS" set "ACC_M
GR_ID" = '1' where "ACC_MGR_ID" = '               GR_ID" = '6' where "ACC_MGR_ID" = '
6' and ROWID = 'AAAOKdAALAAAGcEAB0'               1' and ROWID = 'AAAOKdAALAAAGcEAB0'
 
update "ARUP"."ACCOUNTS" set "ACC_M               update "ARUP"."ACCOUNTS" set "ACC_M
GR_ID" = '5' where "ACC_MGR_ID" = '               GR_ID" = '6' where "ACC_MGR_ID" = '
6' and ROWID = 'AAAOKdAALAAAGcEAB5'               5' and ROWID = 'AAAOKdAALAAAGcEAB5'
 
...

When confronted, Joe admits that he made the changes. In his defense, however, Joe provides a strong rationale. A very important customer—Ellen Smith—had an unpleasant experience with her assigned account manager and demanded to deal with a new one. These changes are usually made by the general manager, but because he was away, the new account manager had called Joe and asked him if he could somehow "work his magic." Joe did so by issuing the following statement:
 
UPDATE ACCOUNTS
SET acc_mgr_id = 6
WHERE first_name = 'Ellen'
AND last_name = 'Smith';

Joe did not notice, however, that the statement updated 340 records, one for each customer named Ellen Smith, when he intended to update the record of only one customer. Among those affected was Paul's customer, who clearly didn't want her account manager changed.

Note that in the above example, John mined the online redo logs. However, if the change had occurred earlier, its contents would already have moved from the online redo logs to the archived log files. In that case, he could have mined the archived logs instead: he would simply have supplied the archived log file names in the call to the DBMS_LOGMNR.ADD_LOGFILE procedure, and the rest of the tasks would remain the same. He can locate the archived log file names in the V$ARCHIVED_LOG data dictionary view.
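
For example, a sketch of mining yesterday's changes from archived logs might look like this; the file name and time window are hypothetical:

select name
from v$archived_log
where first_time > sysdate - 1;

begin
   dbms_logmnr.add_logfile('/arch/1_1234_567890.arc');
   dbms_logmnr.start_logmnr(
      options => dbms_logmnr.dict_from_online_catalog);
end;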

Undoing the Damage

John's next task is to update the other 339 records to their old values immediately. Note that in the output in Listing 2, the text under SQL_REDO shows what originally happened and the text under SQL_UNDO shows the statements that can be used to reverse those changes. Although Joe originally used a single statement, LogMiner reconstructs the statements to modify one record per statement, as shown. To reinstate the old account manager IDs, John can execute each of the statements listed under SQL_UNDO. This will undo the changes Joe made.

Useful Output

The output in the SQL_UNDO and SQL_REDO columns in Listing 2 is not very easy to read, because everything is on one line. John can use the print_pretty_sql option to solve that problem. He also wants to execute the SQL statements in a PL/SQL loop, using dynamic SQL, and this approach requires that the semicolon at the end of each statement be removed. The no_sql_delimiter option handles that task.

John decides to start a LogMiner session, using the options as follows:
 
begin
   dbms_logmnr.start_logmnr(
      options =>
         dbms_logmnr.dict_from_online_catalog +
         dbms_logmnr.print_pretty_sql +
         dbms_logmnr.no_sql_delimiter
   );
end;

After this command runs, the output for the SQL_UNDO column from the query in Listing 2 against V$LOGMNR_CONTENTS will look like the following:
 
SQL_UNDO
______________________________
update "ARUP"."ACCOUNTS"
  set
    "ACC_MGR_ID" = 2
  where
     "ACC_MGR_ID" = 6 and
     ROWID = 'AAOKdAALAAAGcEAB'  
...

Selective Correction

The previous sections discussed a way to retrieve the SQL statements to undo all the update operations made to the ACCOUNTS table. Of course, many other users may have updated the table as well, and John would not want to undo their changes. John can filter the contents of the V$LOGMNR_CONTENTS view to show only the pertinent ones, as shown in Listing 3.

Code Listing 3: Selective output from V$LOGMNR_CONTENTS
 
select
   acc_type, first_name, last_name,
   dbms_logmnr.mine_value(redo_value,'ARUP.ACCOUNTS.ACC_MGR_ID') new_value,
   dbms_logmnr.mine_value(undo_value,'ARUP.ACCOUNTS.ACC_MGR_ID') old_value
from v$logmnr_contents l, accounts a
where table_name = 'ACCOUNTS'
and operation = 'UPDATE'
and dbms_logmnr.column_present(undo_value,'ARUP.ACCOUNTS.ACC_MGR_ID') =1
and dbms_logmnr.column_present(redo_value,'ARUP.ACCOUNTS.ACC_MGR_ID') =1
and a.rowid = l.row_id;
 
A  FIRST_NAME  LAST_NAME  NEW_VALUE  OLD_VALUE
_  __________  _________  _________  _________
D  Ellen       Smith              5          6
M  Ellen       Smith              5         10
D  Ellen       Smith              5          7
 
...

Review this SQL carefully. It uses additional functions provided in the DBMS_LOGMNR package. The first function, MINE_VALUE, selects the value of the specified column as recorded before or after the change. For instance, DBMS_LOGMNR.MINE_VALUE (UNDO_VALUE, 'ARUP.ACCOUNTS.ACC_MGR_ID') shows the value of the ACC_MGR_ID column before the change. If the UNDO_VALUE parameter is changed to REDO_VALUE, the statement will instead return the value after the change.

Furthermore, the COLUMN_PRESENT function returns 1 if the column is present in the redo record and 0 otherwise. Joining V$LOGMNR_CONTENTS with the ACCOUNTS table on the ROW_ID column brings in other pertinent details of the account.

The output of Listing 3 shows the old and updated values of ACC_MGR_ID, the ACC_TYPE, and the name (FIRST_NAME and LAST_NAME). This output helps John assess, in one glance, the overall damage done and provides a way to correct it. Using the SQL statement in Listing 3, John can write some simple PL/SQL code to update the ACC_MGR_ID values to their original values selectively after checking other columns such as ACC_TYPE or FIRST_NAME, as shown in Listing 4.

Code Listing 4: PL/SQL for executing SQL_UNDO values
 
SQL>  begin
  2      for undo_rec in (
  3        select sql_undo, row_id
  4        from v$logmnr_contents
  5        where table_name='ACCOUNTS' and seg_owner='ARUP' and operation='UPDATE'
  6        and dbms_logmnr.column_present(undo_value,'ARUP.ACCOUNTS.ACC_MGR_ID')=1
  7        and dbms_logmnr.column_present(redo_value,'ARUP.ACCOUNTS.ACC_MGR_ID')=1
  8      )
  9      loop
 10        execute immediate undo_rec.sql_undo;
 11        -- a DML statement that matches no rows does not raise an error,
 12        -- so check the row count rather than trapping NO_DATA_FOUND
 13        if sql%rowcount = 0 then
 14           dbms_output.put_line('RowId '||undo_rec.row_id||' matched no row.');
 15        end if;
 16      end loop;
 17    end;
 18  /
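
One housekeeping note, not shown in the listings: when the mining work is complete, the LogMiner session can be closed to release the resources it holds. A minimal sketch:

begin
   dbms_logmnr.end_logmnr;
end;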

Remote Mining

The foregoing discussion showed how to use LogMiner on the source database in the production environment. For increased security, John may want to transfer the archived logs to a different database and perform the mining there. Because this remote database would not be accessible to the DBAs of the source database, this approach minimizes the chances that a malicious DBA will remove his trail of commands. Another motivation behind remote mining is to limit the performance impact on the source database by offloading mining activity to a separate database.

With the present setup, however, there is one minor issue John must first address. In the scenario above, John used the online data dictionary as the dictionary for translating the redo log object names from hexadecimal values to readable names. That approach works fine when mining on the source database, but when the mining is performed on a remote database, the online dictionary of the source database is not available.

If John uses remote mining, he has two alternatives for solving this problem:
1.     He can make LogMiner write the online data dictionary information to the redo log files, which go into the archived logs and can be used at the remote location.
2.     He can download the source location's data dictionary to a flat file and use it at the remote location.

Both options require downloading the online data dictionary, as either a redo log or a flat file. This task can be performed with a supplied package called DBMS_LOGMNR_D. If the package is not already present, John can create it by running the dbmslmd.sql and prvtlmd.sql scripts in the $ORACLE_HOME/rdbms/admin directory.

If John chooses the first option (extracting the source data dictionary to the redo logs), he will execute the following command:
 
begin
   dbms_logmnr_d.build (
      options => dbms_logmnr_d.store_in_redo_logs
   );
end;

Note that the database must be in ARCHIVELOG mode for this option. The BUILD procedure adds the dictionary information to the online redo logs, which, in turn, go into the archived logs. When he is ready to mine, John will transfer these archived log files, using a file transfer utility such as FTP, and include them for mining in the ADD_LOGFILE procedure in the DBMS_LOGMNR package.

After that, he starts the LogMiner session, but with a slightly different parameter:
 
begin
   dbms_logmnr.start_logmnr(
      options => dbms_logmnr.dict_from_redo_logs
   );
end;

If John chooses the second option (storing the data dictionary in a flat file), he will specify a file where the new dictionary should be generated. Oracle Database must be able to write to the directory, using the UTL_FILE package, so the file location must be specified in the UTL_FILE_DIR initialization parameter. If it is not set already, John will set it to a directory such as /tmp (in UNIX):
 
utl_file_dir = '/tmp'
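
If the instance uses a server parameter file (spfile) instead of a text parameter file, the same change can be made with ALTER SYSTEM; a sketch (UTL_FILE_DIR is a static parameter, so SCOPE=SPFILE is required and the restart mentioned below is still needed):

alter system set utl_file_dir = '/tmp' scope=spfile;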

John will then restart the source database instance for this change to take effect and generate the new dictionary by executing
 
begin
   dbms_logmnr_d.build (
      dictionary_filename => 'ora_dict.txt',
      dictionary_location => '/tmp'
   );
end;

This command generates a file named ora_dict.txt in the /tmp directory. John will then transfer the /tmp/ora_dict.txt file from the source to the remote database server in the /tmp directory and start the LogMiner session by executing
begin
   dbms_logmnr.start_logmnr(
      dictfilename => '/tmp/ora_dict.txt'
   );
end;

Note the new dictfilename parameter, which points to the dictionary file created earlier. There is no need for the OPTIONS parameter, because John is not reading from the online dictionary or the redo logs. He can still use the OPTIONS parameter for other things such as formatting the SQL, removing semicolons at the end, and so on.
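
For instance, a sketch combining the flat-file dictionary with the formatting options shown earlier:

begin
   dbms_logmnr.start_logmnr(
      dictfilename => '/tmp/ora_dict.txt',
      options =>
         dbms_logmnr.print_pretty_sql +
         dbms_logmnr.no_sql_delimiter
   );
end;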

Which approach is better? With the first approach, the redo stream contains the dictionary, so the log files that hold the dictionary must be included in the mining set; when you are mining only a small set of log files, keeping those extra files in the set may be inconvenient. With the second approach, the flat file can be made available regardless of the number of redo log files to manage. If the UTL_FILE_DIR initialization parameter is not set, however, the flat-file option becomes unavailable, and the only option is to use redo logs.

Supplemental Logging

Let's look at a slightly different scenario for the "Ellen Smith Account Manager" story. Suppose that the ACCOUNTS table is range-partitioned on the column ACC_MGR_ID, with one partition for each value of ACC_MGR_ID. In this case, when Joe updated the ACC_MGR_ID column, the updated rows moved to different partitions. For instance, a row with ACC_MGR_ID = 6 was in the sixth partition, but when Joe updated the column value to 3, the row moved to the third partition. Because a ROWID is based on the physical location of a row, this physical movement changed the ROWID of the row.

In this case, how would you proceed?

If you execute the following undo statement retrieved from the LogMiner session (similar to the statements shown in Listing 2)—
 
update "ARUP"."ACCOUNTS" set "ACC_MGR_ID" = '6' where "ACC_MGR_ID" = '3' and ROWID =  'AAAOKeAAcAAAH8MAA7';

—it may not update anything, because this ROWID may not correspond to any current ROWID in the table. Worse, it may correspond to an actual ROWID, but it will be the wrong row to update, because the ROWID has definitely changed for the row in question. So you cannot depend on ROWID to get to the changed row.

To uniquely identify the row, you would use the primary key of the table. But the primary key is not recorded in the redo log files by default.

The way to get to the primary key information in the log files is to enable supplemental logging, which records additional column values in the redo logs. You would issue the statement
 
alter database add supplemental log data (primary key) columns;

to begin supplemental logging. This statement causes the primary key values to be recorded along with the other changes in the redo logs. Note that the statement takes effect only for changes occurring after it is issued, not retroactively. If you had issued the statement before the original changes were made, you would have been able to mine the logs and see the contents of the SQL_UNDO column in the Listing 2 output as
 
update "ARUP"."ACCOUNTS" set "ACC_MGR_ID" = '6' where "ACC_NO" = '5720' and "ACC_MGR_ID" = '3' and ROWID =  'AAAOKeAAcAAAH8MAA7';

Note how this SQL_UNDO column output now lists the primary key, ACC_NO, in addition to ROWID, which makes it possible to identify the correct rows.

With supplemental logging, John can get the primary key values from LogMiner, which helps him uniquely identify a row. If a table does not have a primary key, John can enable supplemental logging of all columns, which records every column value and makes each row identifiable. On the other hand, recording this extra information places an additional burden on the database server, affecting performance, so it should be evaluated carefully.
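
For reference, the all-columns variant of the earlier statement is shown below, along with an illustrative query to verify which levels of supplemental logging are enabled:

alter database add supplemental log data (all) columns;

select supplemental_log_data_pk, supplemental_log_data_all
from v$database;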

Conclusion

Oracle redo logs are effectively a transcript of what goes on inside the database and contain a mine of information. LogMiner is the tool for retrieving that information from the log files and using it to identify user transactions or even undo them.

Mining for Clues By Arup Nanda

1 comment:

  1. Hi,

    Is there a way to filter the logminer records to display the data changes that were made yesterday?
    For example, today is Nov 3, 2011, and I only want to see changes made yesterday, Nov 2, 2011. How can I configure the logminer statement to do that? Many thanks,
    Eric
