Monday, November 16, 2009

Speed your backup and recovery with new features in Oracle Database 10g



John, the lead DBA at Acme Bank, is at the center of IT attention in a high-level company acquisition. Acme Bank recently acquired another bank, taking over its entire technology infrastructure. One area of special interest is the other bank's data warehouse database, which was recently upgraded to Oracle Database 10g. After the takeover, Acme's management asked John for some major enhancements in the operation of that data warehouse database. In the high-availability area, John has been asked to substantially reduce the recovery time from a database failure, because the current recovery time is not acceptable. And of course, having just paid for the acquisition, Acme has asked John to improve the database recovery time without investing in new servers or software licenses.

With the many powerful new features that Recovery Manager (RMAN) provides in Oracle Database 10g, John has several enhancements in mind, the combined result of these enhancements being faster backup and recovery without investing in expensive new hardware or database licenses. This article shows John's thought process, objectives, and strategy for meeting the company's goal of faster recoveries, while discussing RMAN and other key Oracle Database 10g features available to him. This article is not an introduction to RMAN; refer to the link in Next Steps at the end of this article for information on RMAN basics.


Brainstorming Recovery

The current backup strategy for the acquired bank's data warehouse uses standard RMAN backup features: a full backup once a month and an incremental backup each week. The archived logs are backed up once a day. All backups are sent over the network to the server that hosts the tape drives. The backups go directly to tape; they do not get copied to the server's hard disks and therefore do not require any disk storage space on that server.

So, if a recovery is needed, the full backup set has to be restored first; then all the incremental backup sets have to be applied to it; and finally, the archived logs have to be applied to bring the database to the prefailure state. With the application of so many backup sets and archived logs, the recovery naturally takes considerable time.

Looking at the limitations of the current RMAN backup setup and the requirement from Acme Bank upper management to reduce recovery time, John decides on three main objectives to satisfy management's requirements:
  • Increase the frequency of incremental backups without affecting database performance, and do the backups in less time. This will require a change in the current use of RMAN for incremental backups.
  • Reduce recovery time using only RMAN—no additional servers and software—using a strategy of backing up to disk first and then to tape.
  • Reduce the size of the incremental backups and therefore reduce the database CPU cycles and the network traffic caused by RMAN incremental backups.

Use Block Change Tracking

The acquired bank's data warehouse runs on Oracle Database 10g, which offers a few tools John can use to achieve his objectives. For the first objective, he decides to use the Block Change Tracking feature, new in this version. Block change tracking causes the changed database blocks to be flagged in a special file.

All John needs to do is to enable block change tracking by issuing the following command:
 
ALTER DATABASE ENABLE 
BLOCK CHANGE TRACKING USING FILE 
'/oracle/admin/SMILEY/bct.dbf';

Now any time a block is changed in the database, the fact is recorded in this special file: /oracle/admin/SMILEY/bct.dbf.

During an incremental backup, RMAN checks this file to determine which blocks need to be backed up, instead of scanning every block of each data file. This dramatically reduces CPU consumption and speeds up the incremental backup. The block change tracking file, if present, is used automatically by the incremental backup; no special RMAN syntax is required. Using a block change tracking file saves enough time and CPU cycles for John to take incremental backups every day (or night) instead of once a week.
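John can verify that the tracking file is actually paying off by checking the dynamic performance view V$BACKUP_DATAFILE after an incremental backup. The query below is only a quick sanity check, and the column list reflects the Oracle Database 10g version of the view:

-- Was change tracking used, and how few blocks were read per data file?
SELECT file#, used_change_tracking, blocks_read, datafile_blocks
FROM v$backup_datafile
WHERE incremental_level = 1;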

How big will this block change tracking file be? The size of this file does not depend on the frequency or the size of updates to the database. For John's data warehouse database, a single instance with one old backup retained, the file will be approximately 10MB for 1TB of total database size, 20MB for 2TB, and so on.

John can ensure that block change tracking is turned on and check the location and name of the file used by querying a new dictionary view:
 
SELECT * FROM V$BLOCK_CHANGE_TRACKING;
The output is shown in vertical format:
STATUS  : ENABLED
FILENAME: /oracle/admin/SMILEY/bct.dbf
BYTES   : 11599872

Here the file size is about 11MB, for John's test database of about 1TB. John can also change the location of this file. One option is to do a normal data file rename when the database is in the MOUNTED state:
 
ALTER DATABASE RENAME FILE 
'/oracle/admin/SMILEY/bct.dbf' 
TO '/tmp/bct.dbf';

If a database shutdown is not an option, John can disable the tracking and re-enable it with a different filename as follows:
 
SQL> ALTER DATABASE 
DISABLE BLOCK CHANGE TRACKING;
Database altered.
 
SQL> ALTER DATABASE 
ENABLE BLOCK CHANGE TRACKING 
USING FILE '/tmp/bct.dbf';
 
Database altered.

However, re-enabling block change tracking this way will reset the tracking, erasing all history of the earlier backups.


Using Incremental Merge

Now that John has a plan to address the first objective—to reduce the time required for incremental backup—he moves on to the next objective—to reduce the recovery time using only RMAN. Since incremental backups will be feasible every day thanks to the block change tracking file, John proposes a new backup strategy and schedule, shown in Table 1.

With the new strategy, every Sunday the Level 0 (full) backup will be taken and tagged WEEKLY. Every day the Level 1 (incremental) backups will be taken with tags that show the day of the week; for example, Monday's backup set is tagged MON. In this strategy, the RMAN command used for the Level 0 backup will be:
 
BACKUP INCREMENTAL 
LEVEL 0 TAG = WEEKLY DATABASE; 

This will create an RMAN backup set with the tag WEEKLY. The daily RMAN command for Monday will look like the following:

BACKUP INCREMENTAL LEVEL 1 TAG = MON DATABASE;

The daily command will be repeated for all other days, with appropriate tag values representing the weekday. After a week, if John queries the RMAN recovery catalog, he will see something similar to Listing 1.
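Listing 1 is not reproduced in this post. A summary of the backup sets and their tags can be pulled from RMAN itself with a command along these lines (the exact output will, of course, differ from John's):

LIST BACKUP OF DATABASE SUMMARY;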

Now let's examine a failure scenario with John's revised incremental backup schedule (which includes daily incremental backups). If the data file for tablespace USERS fails on Saturday, John has to restore the Level 0 backup (tagged WEEKLY) taken on Sunday, then apply all the incremental backups (keys 36 through 46 in the output shown in Listing 1), and finally apply the archived logs. The restore and recovery operations are shown in Listing 2. Note how all the incremental backups with tags MON, TUE, and so on are applied one after the other. Only after all of the incremental backups have been applied does media recovery begin and apply the information in the archived logs.
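Listing 2 is likewise not reproduced here, but in outline the restore and recovery of the USERS tablespace would resemble the following sketch; RMAN picks the Level 0 backup and the incremental backups automatically, and the tablespace name comes from the scenario above:

RUN {
  SQL 'ALTER TABLESPACE users OFFLINE IMMEDIATE';
  RESTORE TABLESPACE users;
  RECOVER TABLESPACE users;
  SQL 'ALTER TABLESPACE users ONLINE';
}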

If a full backup had been taken every day, the recovery would not have had to apply all the incremental backups, saving considerable time. But since that is not possible, what if John could apply the incremental backups taken on weekdays to the Level 0 backup taken earlier? This would make the Level 0 backup up-to-date with the latest day's changes and eliminate the need to apply all the subsequent incremental backups during recovery. In Oracle Database 10g, using a disk-based backup strategy, John can accomplish that using the Incremental Merge feature of the RMAN backup. With such an approach, the daily RMAN incremental backup script will look like this:
 
BACKUP INCREMENTAL LEVEL 1 
FOR RECOVER OF COPY 
WITH TAG WEEKLY DATABASE;
 
RECOVER COPY OF DATABASE 
WITH TAG WEEKLY;

When this command pair is executed every day, the incremental backup is applied to the Level 0 backup with the tag WEEKLY. (On the very first run, when no data file copy with the tag WEEKLY exists yet, the BACKUP ... FOR RECOVER OF COPY command creates that copy instead of taking an incremental backup.) Each incremental backup is merged into the current full backup, making it a complete, up-to-date backup as of that day. For instance, running the commands on Monday makes the Level 0 backup taken on Sunday current, as if it were a Level 0 (full) backup taken on Monday. In this situation, if the data file for tablespace USERS fails on Saturday, the RMAN restore operation restores from the copy last updated by Friday's merge, since that is the most up-to-date backup, and then applies only the archived logs generated since then. Both the traditional RMAN operation (done by the old bank) and the Incremental Merge operation available in Oracle Database 10g are compared in Table 2.

The biggest time savings in Table 2 is in Step 2: the application of incremental backups is not necessary in John's proposed strategy, where the daily incremental backups are merged with the weekly full backup. This makes recovery very fast, satisfying his second objective and Acme Bank's main requirement.

Adding ASM

John's proposed use of disk-based backups and the addition of daily incremental backups requires a set of disks large enough to hold the full and at least one incremental backup temporarily. Since the old bank's backup solution wrote backups directly to tape and Acme's management wants to keep costs down, how does John justify a disk-based backup approach?

Enterprise-class disks in a storage area network (SAN), typically used with enterprise databases, are expensive. However, for the backup location, John proposes adding low-cost Serial ATA (SATA) disks to the data warehouse server to be the staging and merging location for backups.

SATA disks have a reputation of being slower and less reliable than enterprise-class disks, and John acknowledges that in his recommendations. He also mentions the fact that SATA disks cost much less than enterprise-class disks and that their reliability and performance can be significantly increased under Automatic Storage Management (ASM).

John is not asking for a new server or operating system; however, he is proposing that the SATA drives be located under ASM on the same server as the data warehouse database. Using ASM on a set of inexpensive disks like SATA makes the disk group mirrored, increasing its reliability, and striped, increasing the performance over that of a single disk. Other built-in features of ASM, such as kernelized asynchronous I/O and direct I/O, make the performance comparable to high-end disk systems. Best of all, ASM comes bundled with Oracle Database 10g with no extra cost—hence, no need to invest in an expensive file system.

John proposes these three steps:
1. Create an ASM instance on the data warehouse server. You do this by creating a parameter file with a special parameter named INSTANCE_TYPE=ASM and bringing up the instance (a sketch of such a parameter file appears after this list). Note that there are no control files, redo log files, or data files for ASM instances.

2. Once the ASM instance is started, create a disk group RMANDG as follows:

CREATE DISKGROUP rmandg
NORMAL REDUNDANCY
FAILGROUP failgrp1 DISK
'/dev/d1',
'/dev/d2'
FAILGROUP failgrp2 DISK
'/dev/d3',
'/dev/d4';

This assumes that John has four disks with which he wants to make an ASM disk group; he can expand it to as many disks as he likes. The disk group above is analogous to mirrored disks through the creation of failure groups, with each failure group striped across two disks.

3. Change the flash recovery area of the database to point to the disk group:

ALTER SYSTEM SET DB_RECOVERY_FILE_DEST = '+RMANDG';

Note the plus (+) sign before the disk group name, which indicates an ASM disk group. (If a flash recovery area has not been configured on this database before, the DB_RECOVERY_FILE_DEST_SIZE parameter must be set before this command will succeed.)
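For step 1, the parameter file of the ASM instance can be minimal. The sketch below is illustrative only; the file name and the ASM_DISKSTRING value are assumptions, not values from John's environment:

# init+ASM.ora -- minimal parameter file for the ASM instance (illustrative)
INSTANCE_TYPE=ASM
# Where ASM should look for candidate disks (assumed device path):
ASM_DISKSTRING='/dev/d*'

With the ORACLE_SID environment variable set to +ASM, John connects AS SYSDBA and issues STARTUP; the instance then mounts disk groups rather than opening a database.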

RMAN backups by default go to the location specified by the parameter DB_RECOVERY_FILE_DEST, which now points to the disk group RMANDG. In addition to the backups, the disk group can hold archived logs and control file backups. If Flashback Database is enabled, the flashback logs are stored here as well.
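Because the recovery area now holds backups, archived logs, and possibly flashback logs, John will want to keep an eye on its space consumption. Assuming the standard Oracle Database 10g view, a quick check looks like this:

-- Overall space usage of the flash recovery area
SELECT name, space_limit, space_used, space_reclaimable, number_of_files
FROM v$recovery_file_dest;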

Files inside an ASM location are not accessible as operating system files. To get the backups and archived logs from the ASM disk group to regular disks and tapes, John will use RMAN to back up the recovery area from ASM to tape (at the network backup server) through the following command:
 
BACKUP RECOVERY AREA;

This backs up the entire recovery area, including data file backups, archived logs, and control file backups from the ASM disk group to the tape.
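In Oracle Database 10g, BACKUP RECOVERY AREA writes only to the SBT (tape) device, so a tape channel for the media management software must already be configured. A generic sketch follows; the PARMS string is entirely vendor-specific, and the value shown is just a placeholder:

CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
CONFIGURE CHANNEL DEVICE TYPE sbt PARMS 'ENV=(...)';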

There is no need to keep multiple copies of the backups on tape. The following RMAN command deletes the old backup sets:
 
DELETE OBSOLETE DEVICE TYPE sbt;
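DELETE OBSOLETE decides what is "old" based on RMAN's configured retention policy, so John needs a policy that matches his weekly cycle. One possible setting, shown here only as an example, is a recovery window of seven days:

CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;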

John proposes to back up the ASM disks to tape every week, so the tapes will contain a week's worth of data. If the data warehouse database fails, recovery from the SATA disks will be quick; if those disks themselves fail, the database can still be recovered from tape.

Reducing Traffic with Compression

John's last objective is to reduce the size of the incremental backups significantly so that they do not overwhelm the network. John addresses this issue by proposing to enable the compression feature of RMAN, which uses a proprietary algorithm to compress the backup sets. John's proposed backup scripts change a little to add the compression feature:
 
BACKUP AS COMPRESSED BACKUPSET 
INCREMENTAL LEVEL 1 TAG = WEEKLY DATABASE;

The AS COMPRESSED BACKUPSET clause compresses data in the backup sets before they are sent to the backup location. A compressed backup set means fewer bytes are transferred across the network, allowing incremental backups to be taken every day without interfering with the regular network traffic. When the time comes to restore the backup sets, RMAN reads the compressed backup sets directly, without having to uncompress them first. RMAN compression is also compatible with the Incremental Merge approach described earlier.
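Instead of editing every backup script, John could also make compressed backup sets the default for a device type with a one-time configuration command such as:

CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET;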


Conclusion

Reducing cost and time has been the goal of many projects at Acme Bank. John summarizes his proposed backup strategy in Table 3, presenting his plan in terms of the resources it saves. With significantly reduced backup and recovery times, minimal disk costs, and reduced network bandwidth requirements, John's proposals are accepted with much appreciation.

REFERENCES

Faster Backup, Faster Recovery, by Arup Nanda, Oracle Corporation
