Thursday, November 12, 2009

A Closer Look at ADDM



New diagnostics engine in Oracle 10g helps detect and diagnose performance problems
In my last column, I focused on one of Oracle Database 10g's quick wins for DBAs—using the new SQL Tuning Advisor to quickly tune poorly performing SQL statements—and touched only briefly on the new built-in diagnostic engine, Automatic Database Diagnostic Monitor (ADDM), which helps you easily identify problematic SQL statements (see "Advice and Consent" in the March/April 2004 issue for more information about using the SQL Tuning Advisor and the DBMS_SQLTUNE package). Identifying high-load SQL statements is just one of ADDM's many capabilities. A self-diagnostic engine built into the Oracle Database 10g kernel, ADDM automatically detects and diagnoses common performance problems, including:
  • Hardware issues related to excessive I/O
  • CPU bottlenecks
  • Connection management issues
  • Excessive parsing
  • Concurrency issues, such as contention for locks
  • PGA, buffer-cache, and log-buffer-sizing issues
  • Issues specific to Oracle Real Application Clusters (RAC) deployments, such as global cache hot blocks and objects and interconnect latency issues
Let's continue exploring the new performance tuning capabilities of Oracle Database 10g by taking a look at ADDM.


Automatic, Effective Problem Diagnosis
ADDM automatically analyzes the performance of the database at regular intervals (once an hour by default) and identifies any performance issues. ADDM's diagnostic engine is modeled as a classification system that quickly and effectively sifts through the statistical data in Automatic Workload Repository (AWR)—a newly introduced built-in repository in Oracle Database 10g containing performance statistics and workload information—and evaluates problem areas, based on their impact on overall database performance. ADDM's processing logic encompasses best practices and the accumulated expertise of tuning professionals throughout the industry. Ultimately ADDM presents a set of "findings" that not only identifies the cause of performance problems but can also inform administrators that certain database subcomponents, such as I/O, are functioning properly and do not require any further investigation.
Automation is just one of several unique benefits of ADDM. Another important benefit is ADDM's ability to identify the root cause of problems. Just as physicians achieve better results for their patients by treating causes rather than symptoms, DBAs can achieve better database performance by finding the root cause of performance issues before making changes to the system. However, finding a root cause can be difficult and time-consuming, given that symptoms can sometimes mask the real problem.

ADDM can distinguish between the cause and symptoms of a problem. It's able to do so, in part, by using the extensive new data (events, statistics) generated by the database at runtime. Oracle database kernel code has always been instrumented to provide raw performance data, but in this release, the instrumentation is more comprehensive.
For example, wait events are more granular, with numerous locks and latches now separated into distinct wait events. Furthermore, wait events are now grouped into classes—Application, Administration, Commit, Concurrency, Network, User I/O, System I/O, and Configuration, for example—to facilitate processing through ADDM.
In addition to the new wait-event model, Oracle Database 10g also includes an array of new statistics that provide performance data on everything from operating system performance—hard disk I/O statistics, CPU utilization, and network statistics, for example—to cumulative statistics at the system and session levels. One of the most important new statistics is database time, or DB time—the amount of time the database is actually working on database calls.

It's About Time
ADDM uses DB time to measure throughput; it's the total time spent by foreground processes waiting for a database resource (such as a read I/O), running on the CPU, and waiting for a free CPU. The overarching goal of ADDM is to reduce the value of DB time across the system, thereby improving overall throughput.
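You can also look at the accumulated time-model statistics, including DB time, directly with a simple query such as the following (the division merely converts microseconds to seconds):

select stat_name, value/1000000 as seconds
from v$sys_time_model
where stat_name in ('DB time', 'DB CPU')
order by value desc;
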
Operational information about active sessions—those using the CPU as well as sessions waiting for the CPU—is sampled every second and held in a scrolling buffer in the server memory, the Active Session History (ASH), an important new feature of Oracle Database 10g. ASH (V$ACTIVE_SESSION_HISTORY) maintains historical data; in previous Oracle releases, DBAs had information about currently active sessions only, not historical data, so figuring out the cause of performance problems after the fact wasn't easy.
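For example, a quick summary of what active sessions have been doing recently can be pulled straight from the ASH buffer with a query along these lines (the 30-minute window is arbitrary):

select session_state, event, count(*) as samples
from v$active_session_history
where sample_time > sysdate - 30/1440
group by session_state, event
order by samples desc;
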
Data from ASH, along with other important database statistics, persists on the disk in the form of "snapshots." Snapshots are taken every hour and stored for seven days, by default, but DBAs can change settings for both snapshot frequency and storage duration. AWR provides the raw information for ADDM analysis, which begins automatically as soon as a new AWR snapshot is taken.
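If the defaults don't suit your environment, both the snapshot interval and the retention period can be changed with the DBMS_WORKLOAD_REPOSITORY package; for example, the following call (illustrative values, both expressed in minutes) sets a 30-minute interval and a 14-day retention:

begin
   dbms_workload_repository.modify_snapshot_settings(
      interval  => 30,          -- take a snapshot every 30 minutes
      retention => 14*24*60);   -- keep snapshots for 14 days
end;
/
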


Getting Started with ADDM
Because ADDM runs automatically after each new AWR snapshot is taken, no manual steps are required to generate its findings. But you can run ADDM on demand by creating a new snapshot manually, by using either Oracle Enterprise Manager (OEM) or the command-line interface. The following shows creation of a snapshot from the command line:
 
SQL> exec dbms_workload_repository.create_snapshot();
PL/SQL procedure successfully completed.

After the new snapshot is created, its information (snap_id, begin_interval_time, and the like) is populated in the DBA_HIST_SNAPSHOT dictionary view. A few seconds after you take the snapshot, ADDM brings new findings to the surface, based on analysis of that snapshot. The findings are available on the new OEM Web-based console, on the Database home page.
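A quick dictionary query confirms that the new snapshot has been recorded; for example:

select snap_id, begin_interval_time, end_interval_time
from dba_hist_snapshot
order by snap_id desc;
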


When you access the database instance in the new OEM console, you see at a glance how the database is operating in general, on a dashboard-style home page. You can quickly verify that the server has adequate CPU and memory resources and get a general overview of the system from this one page. The Active Sessions pie chart shows the current distribution across the server in three key areas: Using CPU, Waiting I/O, and Waiting Other. You can drill down on any of these items to see a line graph of activity over the last 24 hours (and change this to values such as 31 days, 7 days, or just a few minutes).

The results of the latest ADDM run are displayed in the Performance Analysis section of the home page, which provides a summary view of the top ADDM findings: the impact of the diagnosed problem as a percentage of overall database performance, a description of the finding, and a recommendations summary. The text of each finding is a hyperlink to more-detailed information on that particular finding. ADDM recommendations can include running one of the new advisors—SQL Tuning Advisor, SQL Access Advisor, Space Management Advisor, and so on—when applicable. For example, a solution for a particular finding can involve using SQL Tuning Advisor to tune a particular SQL statement or adding more resources to the machine. Clicking on the recommendation link displays the detailed finding view, which includes the full text of any recommendations.


You can also generate an ADDM report that summarizes performance data and provides a list of all findings and recommendations. You can access ADDM reports through the Web-based OEM console or from a SQL*Plus command line by using the new DBMS_ADVISOR built-in package. For example, here's how to use the command line to create an ADDM report quickly (based on the most recent snapshot):
 
set long 1000000
set pagesize 50000
column get_clob format a80
select dbms_advisor.get_task_report(task_name, 'TEXT', 'ALL') as ADDM_report
from dba_advisor_tasks
where task_id = (
      select max(t.task_id)
      from dba_advisor_tasks t, dba_advisor_log l
      where t.task_id = l.task_id
      and t.advisor_name = 'ADDM'
      and l.status = 'COMPLETED');

The ALL parameter generates additional information about the meaning of some of the elements in the report.
Findings report the impact of the identified problem as a percentage of DB time, which correlates with the expected benefit, based on the assumption that the problem described by the finding will be solved if the recommended action is taken. For example, here's a finding that identifies a configuration issue and recommends adjusting the sga_target value in the parameter file:
 
FINDING 3: 5.2% impact (147 seconds)
---------------------------------------
The buffer cache was undersized causing significant additional read I/O.
RECOMMENDATION 1: DB Configuration, 5.2% benefit (147 seconds)
ACTION: Increase SGA target size by increasing the value of parameter "sga_target" by 24 M.
SYMPTOMS THAT LED TO THE FINDING:
Wait class "User I/O" was consuming significant database time. (5.3% impact [150 seconds])
...

As you can see, DBAs have the flexibility to decide how to resolve ADDM findings. In this case, the DBA might decide to use the new automatic SGA sizing feature, rather than modifying the parameter manually.
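For instance, applying the recommendation above directly from SQL*Plus would look something like this (assuming a server parameter file is in use and the current sga_target is 1000M; the figures are illustrative only):

alter system set sga_target=1024M scope=both;
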

Conclusion
Oracle Database 10g provides DBAs with comprehensive, quantifiable diagnostic information that can be used to proactively monitor and prevent problems as well as respond to them when they arise. There's an enormous amount of new statistical information available to help keep the database running optimally. More important, the automatic performance-diagnostic capabilities of AWR and ADDM will not only help you resolve critical performance problems quickly but will also provide you with hard-and-fast, quantifiable data for making the business case for new hardware, additional staff, application development initiatives, or better training for employees. And Oracle plans to continually update ADDM's problem classification tree and diagnostic rules to keep pace with new features and technologies introduced in the database and underlying operating platform. 

REFERENCES

A Closer Look at ADDM, by Kimberly Floss, Oracle Corporation

Wednesday, November 11, 2009

Oracle Enterprise Manager Grid Control Architecture for Very Large Sites

What is a "very large site" as far as Oracle Enterprise Manager Grid Control is concerned? It simply means an enormous collection of targets to be monitored and managed.

My team was directly involved with the first Grid Control production site in the world, at a large telco in Australia (with a lot of assistance from Oracle). The project involved setting up a central Grid Control site managing disparate servers and databases under different DBA teams in the corporate organization. The teams, servers and databases were scattered across the Australian continent.

Management's motivation came down to two primary reasons: it needed a view of all Oracle versions and licenses, and it needed a way to reduce expensive SAN storage usage. The space was being paid for at a high annual cost but either lay unallocated at the Unix level or was over-allocated at the database level, with a lot of free, unused space in tablespaces. To handle this particular aspect, Oracle assisted with specially developed Grid Control storage reports. These reports were so useful that they were eventually incorporated into Release 2 of Grid Control.

The other goals driving this project were to have one single database management tool throughout the enterprise, and to use Grid Control's excellent features such as database performance analysis, RMAN backup setup and scheduling, Oracle Data Guard setup and monitoring, and cloning of Oracle databases and Oracle homes from what we called a "gold copy," which would save a lot of internal database consulting time. Prior to Grid Control, all these tasks were performed using the traditional manual approach, which consumed a lot of time besides being prone to human error.

Performing all these activities for literally more than two thousand database servers, monitoring them continuously, and having multiple DBA teams use the central site for this purpose required a particularly well-architected site, so that it would scale as necessary through stages I, II, and III as more and more targets were brought on, the goal being 2,000 target servers or more.

In this article I will offer an overview of the architecture used to achieve this high scalability in Grid Control. This kind of information will be useful for customers that are contemplating the use of Grid Control but need guidance about properly architecting their solutions.


The Wrong Architecture


Suppose a DBA team, or its management, decides to implement Grid Control. The normal tendency would be to use a test or development server to install the product, be it on a flavor of Unix, Linux, or Windows. This means all Grid Control components (the current release at the time of writing being Release 4) are placed on a single server. This includes the repository database, Oracle Management Service (OMS), and the EM agent.

Then, EM Agents would be installed, by either the push or pull method, on a few other development and test database servers. After the DBA team experiments with the functionality of Grid Control, it might tentatively decide to install an agent on a production server for the first time.

Let's say eventually management decides to move the whole shebang of Grid Control to production, but it now makes the mistake of assuming that what works for a few development servers would also work for production. It authorizes the DBA team to install Grid Control on a production server, again a single server.

The team installs all the components again on a single server, perhaps sharing the Grid Control install with a production or test database. This is followed by EM agents being installed on all the production and test database servers pointing back to the Grid Control server.

Things work for a while. But as the Grid Control workload gradually increases, as more and more databases are managed by more DBAs, as more and more monitoring is performed, as Grid Control is used more and more for RMAN backups, Data Guard setup and monitoring, cloning of databases and homes and so on, the Grid Control system grinds to a halt.

Why would this happen? For the answer, we need to understand the Grid Control internals. The main working component of Grid Control, the engine as it were, is the OMS. This is a J2EE application deployed on Oracle Application Server 10g; the member components are the Oracle HTTP Server, Oracle Application Server Containers for J2EE (OC4J), and OracleAS Web Cache. In effect, Grid Control includes a reduced version of Oracle Application Server itself.

At the Unix server level, the OMS appears as a single Unix process, the actual OC4J_EM process. This can also be seen when the opmnctl command is executed:


./opmnctl status

Processes in Instance: EnterpriseManager0.GridMgt001.in.mycompany.com
-------------------+--------------------+-------+---------
ias-component      | process-type       |   pid | status
-------------------+--------------------+-------+---------
WebCache           | WebCacheAdmin      |  2071 | Alive
WebCache           | WebCache           |  2099 | Alive
OC4J               | OC4J_EM            | 27705 | Alive
OC4J               | home               |   N/A | Down
dcm-daemon         | dcm-daemon         |   N/A | Down
LogLoader          | logloaderd         |   N/A | Down
HTTP_Server        | HTTP_Server        |  2072 | Alive

A small digression at this stage: since the OMS runs on Oracle Application Server, you can control it just as you would the application server: use the Application Server Control console, or at the command line use opmnctl (the Oracle Process Manager and Notification control utility) or dcmctl (the Distributed Configuration Management utility). This is in addition to the Enterprise Manager control (emctl) utility.
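For example, the following commands, run from the OMS Oracle home, are typically used for this purpose (the exact output varies by release):

./opmn/bin/opmnctl status                            # list OPMN-managed processes, as shown above
./opmn/bin/opmnctl restartproc process-type=OC4J_EM  # restart just the OC4J_EM process
./bin/emctl status oms                               # check the Oracle Management Service
./bin/emctl stop oms                                 # stop the OMS
./bin/emctl start oms                                # start the OMS
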

Thus, OC4J_EM is only a single Unix process with its own PID. The memory available to this process is also limited; it is set in the file $ORACLE_HOME/opmn/conf/opmn.xml. You could perhaps increase the memory allotted to the process, but it remains just a single process. Imagine that one process managing numerous databases and servers, performing tasks such as Data Guard setups, cloning, and so on, and you can understand why such a setup will simply not scale.

Obviously, if the database itself were to run on a single process, with the db writer, the log writer, the archiver, and numerous other process functions all performed by that one process, then the database would be far less efficient and scalable. This is the primary reason why, if all Grid Control components are placed on a single server, only limited scalability will be achieved: you would be limited to one OC4J_EM process with its own limits of memory and processor speed. If the OC4J_EM process were to reach the limits of its memory under heavy load, and the process were to slow down or stop responding, then other DBAs would not be able to log in to the Grid Control Console for their own database management work.

Placing Grid Control components on a single server is not recommended in production, and neither is sharing that server with a production or test database. Grid Control needs its own server, and in a properly architected solution it needs its own set of servers. It is recommended that some time be spent planning the Grid Control site being contemplated for production. Senior management should be convinced of the need for this initial study; it should approve the budget for the solution, and the work should then be scoped out and performed as a professional project, since Grid Control is an enterprise solution and not a minor tool to deploy on a DBA workstation.

Grid Control Internals

Grid Control is drastically different from previous incarnations of Enterprise Manager. In the past, Enterprise Manager was not so scalable, simply because it was not N-tiered. The oldest avatar was Server Manager, which was a PC executable utility. Immediately before Grid Control, there was OEM 9i, which was a bulky Java beast sitting cross-legged on the PC's memory with numerous issues as a result.

When Grid Control was created, the internal architecture was drastically altered to the N-tier model. Oracle's vision is broadly N-tier, which is in line with and also sets the direction for modern IT thought. Grid Control became the three components mentioned previously, and because the main engine, the OMS, now runs on the application server as an OC4J application, it instantly became scalable.

Why is this possible? First of all, Grid Control is not tied to one PC or one server; multiple OC4J_EM applications can be deployed on the application server on different machines, and they can all point to the same EM repository.

Herein lies the secret of the immense scalability of Grid Control. The boundaries were broken, and horizontal scaling was opened up to the EM world. The more OMS servers you add to the EM site, the more targets you can manage.

The Right Architecture

Our real-life large site implementation example will illustrate this concept more clearly. At the foundation of the implementation, industry-standard and open architecture can be utilized, such as Linux servers with the following configuration:


Specification Type      Specification Details
Hardware                Any industry vendor
OS                      Linux (any version certified with Grid Control)
CPU                     4 (2.2 GHz or above)
Memory Requirement      8GB
Disk Space              10GB free space

There is no need to deploy powerful, expensive servers (beefy beasts that typically have 24 or more CPUs and 32GB or more of memory). Smaller four-CPU machines with 8GB of memory are used instead, since the intention is to scale horizontally and not vertically.

The "Free Space" mentioned in the specification table is for the Oracle software, such as the Oracle Database Home, the Oracle Management Service Home, and the Agent Home. It does not include the database, which will be placed on either a SAN or a NAS (Netapps filer). The database space requirement for the EM Repository would be approximately 60 to 70GB, with an equal amount of space reserved for the Flash Recovery Area, where all archive logs and RMAN backups will be stored. Oracle recommends database backups to disk (the Flash Recovery Area), so that fast disk-based recovery is possible..

Even with a large number of targets being monitored and managed, the database size rarely goes above 60 to 70GB with out-of-the-box functionality. A new feature of Grid Control is that the EM repository database (10g) manages itself as far as space is concerned, in the sense that it performs rollups of metric data at predetermined intervals. Hence the metric data that is collected continuously from the targets does not drastically increase the database size. On the other hand, it is possible to manually create extra metrics for monitoring, and this may lead to an increase in the database size greater than this example figure.

During the installation phase, the full Grid Control software is installed first of all on one of the servers, using the Grid Control installation CDs. This is done by selecting the "Enterprise Manager 10g Grid Control Using a New Database" installation type. This server becomes the repository server, since the repository database is created on this machine. Being a full install, an OMS and EM Agent are also installed on the same repository server. (You can ignore the OMS at this point: more on this later.)

Next, an additional OMS is installed on each of the other servers. This is done using the same Grid Control installation CDs, but selecting the Additional Management Service installation type. During the installation of the additional service, you are asked to point to an existing repository, so point to the repository database on the first server. The repository database must be up and running at this stage, with the repository successfully installed in the SYSMAN schema.

The Additional Management Service installation type installs only the management service (OMS) and the EM agent. This is completed on three or more additional servers; these servers now become the management server pool.

The repository database server can be complemented with a standby database server using Oracle Data Guard, or optionally with an Oracle RAC cluster on multiple nodes if the repository database needs to scale horizontally. But a noteworthy point is that in Grid Control, the performance requirement is not so much on the database side as on the management server side. The highest scalability is achieved on the management servers, since OC4J_EM is where the bulk of the Grid Control work is performed. This is the reason why the architecture should include three or more management servers that are load balanced for a large Grid Control setup.

Load balancing the pool of management servers forms an integral part of this architecture. A hardware load balancer, such as a Big IP Application Switch Load Balancer from F5 Networks, can be used for this purpose. (This company's flagship product is the BIG-IP network appliance. The network appliance was originally a network load balancer, but now also offers more functionality such as access control and application security.)

The load balancer is set up with its own IP address and domain name, for example gridcentral.in.mycompany.com. The load balancer in turn points to the IP addresses of the three management servers. When a service request is received at the load balancer's IP address or domain name (on a particular port, which can be configured at the balancer level), the balancer distributes the incoming request to any of the three simultaneously active management servers in its pool, at the specified port. Grid Control uses various ports for different purposes; for example, one port is used for Console logons, and a different port is used for Agent uploads of target metric data. The Big IP must be set up for all these ports so that load balancing occurs for Grid Control Console logons as well as for Agent uploads of target metric data.

An additional benefit is that this gives excellent redundancy to the Grid Control system. If one of the management servers stops functioning for any reason (under heavy load, for example, the OC4J_EM process may need to be restarted using opmnctl), that management server can be taken out of service while the other active management servers continue to service requests distributed by the Big-IP load balancer. The load balancer automatically ignores the unreachable IP (discovered by its own monitors, which check the pool members on an ongoing basis at predetermined intervals). So, failure of any of the existing management server instances simply results in the load balancer directing all subsequent service requests to the surviving active instances. When the Big IP monitor detects that the node is back online, the node or service is automatically added back into the pool.

Software load balancing could alternatively be used, instead of hardware load balancing. This is a simple solution that uses software, such as network domain names, to route requests to the three management servers. The hardware solution is more expensive, but it is recommended since it is a more powerful solution. A hardware load balancer responsible for load balancing as well as failover capabilities should form an integral part of the total architecture solution, making the solution much more robust and flexible.

If further redundancy is required at the Big IP load balancer level, then a standby load balancer can be deployed. The standby shadows the production load balancer in all configuration changes and takes over seamlessly if anything goes wrong with the production machine. This is an added precaution. When a number of production database teams use the centralized Grid Control to manage and monitor production systems, this kind of high-availability architecture, in which even the load balancer is deployed redundantly, becomes essential.

To manage the Big IP load balancers, internal IPs must be assigned to both the primary and the standby load balancers, and a floating IP address must be assigned that points to either the primary or the standby load balancer, depending on which balancer is active. You would then manage the load balancer via the floating IP using the URL as listed in the table below. This is the Big IP management utility, or Web console. Log in to this console using the Admin password or the Support password. (New users can be created in the Big IP Web console with read-only rights if required.)

The Big IP root password is used for logging in at the Linux level using SSH. The balancer runs Linux but with a reduced command set shell. This is the command-line interface (CLI) of Big IP. Commands differ slightly from normal Linux; for example, in the CLI the command "bigtop" is used to monitor the load balancer.

The internal IPs and Floating IP are illustrated in the following table (each IP address is shown as nnn.nnn.nnn.nn but is implicitly unique):

Hostname     IP Address        Description           Big IP Management URL
GridBal001   nnn.nnn.nnn.nn    Unit 1 IP Address     https://Unit1IP
GridBal002   nnn.nnn.nnn.nn    Unit 2 IP Address     https://Unit2IP
GridBal003   nnn.nnn.nnn.nn    Floating IP Address   https://FloatingIP

Of the two load balancer units GridBal001 and GridBal002, either one could be active (actually handling the load balancing). Typically the two units will have three addresses associated with them: Unit 1 IP, Unit 2 IP, and the Floating IP. The Floating IP is a shared IP address and will only "exist" on the unit that is active at that time. So, if you would like to manage only the active device, you would connect to https://FloatingIP.

However, if you would like to manage the units directly, you could do so by accessing them via https://Unit1IP or https://Unit2IP.

The other servers in the Grid Control configuration are illustrated by the following table:


Hostname     IP Address        Description
GridMgt001   nnn.nnn.nnn.nn    Management Server One (OMS 1)
GridMgt002   nnn.nnn.nnn.nn    Management Server Two (OMS 2)
GridMgt003   nnn.nnn.nnn.nn    Management Server Three (OMS 3)
GridMgt100   nnn.nnn.nnn.nn    Virtual Management Server (Virtual OMS)
GridDb001    nnn.nnn.nnn.nn    Database Server One (DBS 1) (Primary or RAC node)
GridDb002    nnn.nnn.nnn.nn    Database Server Two (DBS 2) (Standby or RAC node)

For the purposes of load balancing, Big IP uses the concepts of virtual servers, pools, associated nodes (members), and rules to guide the load balancing. A virtual OMS server is set up at the Big IP level with its own IP address; this in turn points to a pool of Oracle management servers with their own IP addresses. Therefore the outside world has merely to point to the virtual OMS server's IP address or domain name, for both Grid Console logons and Agent uploads from multiple targets. The pool of Oracle management servers is set up using the IP address:port combination, which means you can have one pool set up for Grid Console logons, and another pool set up for Agent uploads to the OMS.

Keeping this in mind, and after studying the recommendations on load balancing in the Enterprise Manager Advanced Configuration Guide, the following setup was performed using the Big IP Management console:

Two new pools were created, EMAgentUploads and EMConsoles. Each pool has the three OMS nodes (the three active ones; however, you could add a node that is still being set up and keep it as "forced down" in Big IP so it won't be monitored). The difference between the pools is at the port level. The pool EMAgentUploads uses port 4889 for Agent uploads, and the pool EMConsoles uses port 7777 for console access (7777 is the default port for Oracle Web Cache).

At the pool level, Big IP also allows you to define persistence (stickiness), that is, whether subsequent service requests should be routed to the same pool member or not. While Grid Console logons do not require stickiness (we do not care if the console uses a different OMS each time the DBA connects), it was decided that agent uploads could benefit from it. The pools were modified accordingly, and "simple persistence" was set up for the agent uploads pool, but none for the console logons pool.

Two new Virtual OMS servers were created, the first using port 4889 for agent uploads using the EMAgentUploads pool, and the second using port 7777 for the Web Cache EM Console using the EMConsoles pool. Both virtual servers are using the same reserved IP address (but the ports are different).

Big IP monitors that continuously inspect the status of pool members can also be set up. One such monitor, EMMon, was set up using the send string "GET /em/upload" and the receive rule "Http XML File receiver," as per the Enterprise Manager Advanced Configuration Guide. However, this monitor worked for the 4889 ports but not the 7777 ports. Therefore a new monitor, "EMConsoleMonitor," was created based on http with the send string "GET /"; this was used to successfully monitor the 7777 ports.

The URL http://GridMgt100:7777/em now worked successfully and load balanced Grid Console logons across all three management servers in the pool.

The URL http://GridMgt100:4889/em was tested and also successfully load balanced in a similar manner, but this URL is to be used for agent uploads only.

Now, when the corporate network alias "gridcentral.in.mycompany.com" is switched to point to the virtual OMS server GridMgt100, the Big IP load balancer starts being used by production.

A point to note is that the initial changes, seen as successful at the Big IP management console, were not effective at the URL level (the URLs didn't work) until the Big IP was failed over to its standby and back again. Any configuration changes performed on the active load balancer should be propagated to the standby load balancer. This is done in the Big-IP configuration utility: go to Redundant Properties and click Synchronize Configuration. This makes the standby balancer configuration the same as the active one, including all pools, virtual servers, and rules, so the standby will be ready to take over the load balancing in the event of a failover. Another notable point is that when changing the admin password, because the admin user is configured as the configsync user, you must change the password to match on the peer controller in order for configsync to work.

It is also possible to fail over manually. Before any failover to the standby Big IP, it is recommended to mirror all connections; however, be aware that this setting has a CPU performance cost. It is selected under the properties of the virtual server (Mirror Connections).

Recall that a management server was also installed on the Grid Control repository server during the initial install. Since the management server function has been separated from the repository function in this architecture, it is not recommended to use that extra management server; simply dedicate the server to the repository. For this reason, only the three stand-alone management servers were placed in the Big IP load balancer pools.

The extra management server is a Java process that runs on the repository server and takes up memory and processing power, so it may be a good idea to use opmnctl on this server and shut down the management server (OC4J_EM). Or, if Unix reboot scripts are being written that start up the OMS, Agent, and database on the servers whenever there is a reboot, simply leave out starting the OMS in the case of the repository server: just start the listener, the database, and then the Agent. On the other management servers, start the OMS and the Agent. A sketch of both follows.
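The following is a minimal sketch only; the OMS_HOME and AGENT_HOME variables are placeholders for the actual Oracle home paths on your servers:

# On the repository server: stop the unused OMS process
$OMS_HOME/opmn/bin/opmnctl stopproc process-type=OC4J_EM

# Reboot sequence for the repository server: listener, database, then agent
lsnrctl start
sqlplus -s / as sysdba <<EOF
startup
EOF
$AGENT_HOME/bin/emctl start agent
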

A diagram of this architecture is shown below:





Conclusion


This kind of horizontal scaling architecture using hardware load balancers and multiple management servers is immensely powerful and fits in very well with Oracle's Grid vision. You can easily manage hundreds or even thousands of targets with such an architecture. The large corporation that deployed this project scaled easily up to managing 600 to 700 targets with a pool of just three management servers, and the future plan is to manage 2,000 or more targets, which is quite achievable.

REFERENCES

Oracle Enterprise Manager Grid Control Architecture for Very Large Sites, by Porus Homi Havewala, Oracle Corporation

Oracle RMAN Backups: Pushing the "Easy" Button

The day has come when DBA teams can no longer afford to ignore Oracle Enterprise Manager Grid Control (current release being 10g Release 5 at the time of this writing). It is not practical to do so, due to the immense complexity of today’s Oracle enterprise database software. The core database facilities are becoming more and more sophisticated, and it is extremely tedious to use only the command line to manage these complex options. Some DBAs may write and maintain scripts to do so; others may buy scripts from the market, and such scripts may not be guaranteed to work for new releases of the database. Thus, they have to be endlessly tested and maintained.

Furthermore, Oracle strongly recommends the use of Oracle Grid Control for managing sophisticated options like its fully active-active clustered database option, Oracle Real Application Clusters (Oracle RAC). This is especially so in the area of analyzing cluster performance, which is greatly aided by Oracle Grid Control.

In this article, you will learn the traditional use of UNIX shell scripts and crontab to set up and schedule Oracle Recovery Manager (Oracle RMAN) backups, including a technical explanation of the supplied scripts (in Part I). You will also see how the same end result of setting up and scheduling Oracle RMAN backups can be achieved seamlessly and easily via Oracle Grid Control, without using any UNIX shell scripts or cron (in Part II). The demonstration presented in this article is drawn from the real-life experiences of a DBA working at a large corporate site, who used both approaches in his career history.

Background

When the earliest Oracle databases were being backed up, they were offline ("cold") backups. The DBA manually performed the backup as an OS file copy of all database files after shutting down the database. Some enterprising administrators started writing UNIX shell scripts to do the work of shutting down the database, copying the files using OS commands, and then starting the database again. The traditional UNIX scheduler cron was used to call the script at the appropriate time. So there was some level of automation, if one discounted the manual effort of writing, implementing, and testing the scripts. DBAs also had to maintain the scripts as things changed: the database name might change, or more databases needing backup might appear on the same server, so the scripts would need to be modified.

In Oracle Database Version 6, online (“hot”) backups were introduced. This meant that the database could stay open and process transactions at the same time as the backup was being performed. But the caveat was that each tablespace had to be placed in a special backup mode before the OS file copy could take place. So, the DBAs modified their scripts to connect to the database, get a list of tablespaces, put each tablespace in backup mode, perform the OS file copy backup, and finally take the tablespaces off backup mode. The scripts were becoming slightly more complex.
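In outline, the per-tablespace logic of those hot-backup scripts looked something like the following sketch (the tablespace name is just an example):

ALTER TABLESPACE users BEGIN BACKUP;
-- copy the datafiles of the USERS tablespace with OS commands (cp, dd, and so on)
ALTER TABLESPACE users END BACKUP;
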

Eventually in Oracle Database Version 8, there was a groundbreaking event—Oracle Recovery Manager (RMAN) was introduced as the recommended backup method for Oracle databases. However, even though it was technically superior to the older backup mode online backups, there was some resistance to Oracle RMAN due to Oracle’s initial requirement for a separate catalog database to hold a history of the backups. Many DBAs thought it was illogical to have a second database to back up the first database.

Oracle realized this and soon made it possible to use the control file of the database itself to hold the history of the backups. This reduced the reliance on a separate catalog database. In Oracle Database Versions 8i, 9i, and 10g, the result was that the control file method of Oracle RMAN backups started being accepted by mainstream DBAs, and Oracle RMAN started being used to a greater extent on production databases.
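With the backup history held in the control file, no catalog database is needed; the DBA simply connects to the target database in nocatalog mode, for example:

rman target / nocatalog
RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
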

However, because UNIX shell scripts and cron had been used in the past to execute and schedule offline and online database backups, Oracle RMAN inherited this history. The older UNIX shell scripts were modified to connect to Oracle RMAN and perform the backup. There was no need any longer to put the tablespaces in backup mode, but  everything else remained the same.

So, for a number of years, the traditional approach to Oracle’s RMAN utility was exactly the same as before: the use of UNIX shell scripts and cron.

In this age of sophisticated database management tools such as Oracle Grid Control, this may seem archaic, but there are a large number of companies who still use a shell script to call Oracle RMAN, and there are some who use layer upon layer of shell scripts.

Oracle purposely simplified the Oracle RMAN syntax to make the job of backup and recovery much easier, but the purpose is defeated if the Oracle RMAN commands are hidden under sublayers of OS shell scripts. Even experienced DBAs who walk into these large companies find it difficult to understand the customized shell scripts, although they know how to use Oracle RMAN.

Thus productivity in such companies is reduced, and the act of debugging errors is made much more difficult due to the script layers. And if the scripts are to be modified in the future to cater to changed requirements, there are high maintenance costs.

Next, on to the comparative approaches.

Part I: The Traditional Approach

For the purpose of our demonstration, the production server is proddb001 and the database on this server is FIN1P, a production Oracle database with the company’s financial information. The DBA has been asked to set up and schedule Oracle RMAN backups for this database. The DBA writes the shell scripts and places them on the server following the setup steps below.

These steps must be followed for every database server requiring Oracle RMAN backups for its databases. If the databases are on an active-passive cluster—for example, a SUN HA cluster or any such technology—then the steps would be followed for each server.
  1. As the root UNIX user, add the line "oracle" to /etc/cron.d/cron.allow in order to allow the oracle UNIX user to use the cron utility in UNIX.

  2. As the oracle UNIX user, add the following to the crontab:



    30 21 * * * dba/scripts/rman_backup_db.sh FIN1P

    As per crontab syntax, this calls the rman_backup_db.sh script at 21:30 hours each day. The script is asked to execute against the FIN1P database by specifying this database name as the first and only argument.



  3. Log on as a DBA to database FIN1P in SQL*Plus and create an externally identified user:



    create user ops$oracle identified externally;
    grant dba to ops$oracle;

    This has the effect that the Oracle UNIX user can log on to SQL*Plus as the Oracle database user without specifying the password—in other words, the user is identified externally.

    This is the technique used most often with scripts that log in to the database, especially those that require DBA rights for database-level backups. It is not a good idea to hard-code database passwords in scripts, because the scripts are UNIX files and may be readable by anyone on the computer (unless the file permissions are locked down). So as a safety precaution, an externally identified Oracle user can be used to log in as the DBA and perform the backup. This circumvents the need for specifying the database password in the script.



  4. In this approach you would use a Filer volume for the Oracle RMAN backups. Database standards for the corporate environment maintain that the backup volume must be mounted as /U99 at the server level.
    In /etc/vfstab, make sure that there is an entry for the backup mount point:



    ausmelb-corp-netappsfiler-tier3:/vol/vol1/dbbackup - /U99 nfs -  yes
                 hard,vers=3,intr,suid,proto=udp,rsize=32768,wsize=32768
    

    NetApp documentation should be consulted on the appropriate mount options, because these may differ as per the UNIX flavor of the host, the version of the filer, and the technology in use.

    If there is no such entry, consult with the storage department in corporate IT and ask them to allocate a volume to the server and add it to this file. The mount point will now be mounted every time the server is rebooted, or it can be mounted manually using the mount command as the root UNIX user:



    mkdir /U99
    mount /U99



  5. As the root UNIX user, execute the following at the UNIX command prompt:



    chown -R oracle:dba /U99

    This is to make sure that the /U99 backup mount point and all subdirectories (as specified by the -R argument) under this mount point are owned by the oracle UNIX user and dba UNIX group, so Oracle RMAN is able to create backup pieces under this mount point.



  6. Assuming that /home/oracle is the Oracle UNIX user’s home directory, perform the following steps as the Oracle UNIX user:



    mkdir /home/oracle/dba
    mkdir /home/oracle/dba/scripts
    mkdir /home/oracle/dba/logs
    mkdir /home/oracle/dba/work
    mkdir /U99/FIN1P
    mkdir /U99/FIN1P/rmancmd
    mkdir /U99/FIN1P/log
    

    These commands create the UNIX subdirectories into which the UNIX script is placed and to which the logs are written, as well as creating temporary working directories and the directories for the generated Oracle RMAN command script and Oracle RMAN runtime log file.



  7. Under the subdirectory /home/oracle/dba/scripts, create the file rman_backup_db.sh using vi or any UNIX editor. The script, when executed every day at the specified time set in cron, will generate the following Oracle RMAN command file /U99/FIN1P/rmancmd/rman_FIN1P.cmd (a minimal sketch of the wrapper script itself is shown after these setup steps):



    #
    #        Creation                       Porus Homi Havewala      07/06/2008
    #
    
    
    # Configure RMAN settings
    CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;
    CONFIGURE CONTROLFILE AUTOBACKUP ON;
    CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE  DISK TO '$BACKUPDIR/cf_%F';
    CONFIGURE BACKUP OPTIMIZATION ON;
    CONFIGURE DEVICE TYPE disk PARALLELISM 3;
    CONFIGURE DEFAULT DEVICE TYPE TO disk; 
    CONFIGURE CHANNEL 1 DEVICE TYPE DISK FORMAT  '$BACKUPDIR/b_%U';
    CONFIGURE CHANNEL 2 DEVICE TYPE DISK FORMAT  '$BACKUPDIR/b_%U';
    CONFIGURE CHANNEL 3 DEVICE TYPE DISK FORMAT  '$BACKUPDIR/b_%U';
    CONFIGURE SNAPSHOT CONTROLFILE NAME TO  '$BACKUPDIR/snapcf_${DBNAME}.f';
    
    
    # Perform backup of database and archivelogs, deleting  backed up archivelogs
    BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
    
    
    # Maintenance commands for crosschecks and deleting expired backups
    ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK;
    CROSSCHECK BACKUP;
    DELETE NOPROMPT EXPIRED BACKUP;
    DELETE NOPROMPT OBSOLETE DEVICE TYPE DISK;
    CROSSCHECK  ARCHIVELOG ALL;
    DELETE NOPROMPT EXPIRED ARCHIVELOG ALL;
    # End of RMAN command script
    

    As can be seen from the Oracle RMAN commands, a recovery window of three days is specified as the Oracle RMAN retention policy, which means that all backups necessary to recover up to the last three days of data will be retained. Obviously this setting may need to be changed, depending on the size of the database and the free space available in the backup mount point /U99 at any point. The DBA needs to closely monitor this space and change the recovery window if necessary.

    Control file autobackup is also configured, so the control file will be automatically backed up along with every database or archive log backup. This is especially important because we are using the nocatalog mode of Oracle RMAN, so all the history of our backups is only in the control file.

    A parallelism setting of 3 is used, which means that three Oracle RMAN channels will be used to actually create the backup pieces. This will increase the speed of the backup, depending on the structure of the disk subsystem.

    A full backup of the database and the archive logs is then performed, and the archive logs that are backed up are deleted immediately by Oracle RMAN due to the delete input clause.

    After the backups are completed, maintenance jobs are performed that cross-check the existing backups and archive logs and make sure that they are present on-disk. All expired (not found) and obsolete (as per the retention policy) database, archive log, and control file backup entries are deleted from the control file records. The maintenance commands also ensure that the obsolete database backups and control file autobackups are physically deleted from the disk.



  8. Make the rman_backup_db.sh script executable at the UNIX level by setting the user-executable file permission:



    cd /home/oracle/dba/scripts
    chmod u+x rman_backup_db.sh



  9. Test out the rman_backup_db.sh script by calling the script from the UNIX command prompt manually, instead of waiting for the cron scheduled execution to take place:



    cd /home/oracle/dba/scripts
    ./rman_backup_db.sh FIN1P

    This should execute the Oracle RMAN command script on the FIN1P database and take a complete database and archive log backup. After the backup is completed, inspect the Oracle RMAN log file generated to make sure that there are no errors and that the backup has completed successfully with Oracle RMAN backup pieces created in the backup directory /U99/FIN1P.



  10. On the next day, make sure that the scheduled Oracle RMAN backup in the UNIX cron has been started and completed successfully. If not, correct the cron entry.
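For reference, here is a minimal sketch of what a wrapper script such as rman_backup_db.sh might contain. The real production script is longer (logging, error checking, e-mail alerts), and the variable names and the generation of the command file shown here are illustrative only:

#!/bin/sh
# rman_backup_db.sh -- minimal sketch; takes the database name as its only argument
DBNAME=$1
BACKUPDIR=/U99/${DBNAME}
export ORACLE_SID=${DBNAME}
# ...set ORACLE_HOME and PATH as appropriate for this database...

# Names of the generated RMAN command file (see step 7) and the runtime log
CMDFILE=${BACKUPDIR}/rmancmd/rman_${DBNAME}.cmd
LOGFILE=${BACKUPDIR}/log/rman_${DBNAME}_`date +%Y%m%d`.log
# ...write the RMAN commands shown in step 7 to ${CMDFILE}, expanding $BACKUPDIR and $DBNAME...

# Run RMAN in nocatalog mode against the target database
rman target / nocatalog cmdfile=${CMDFILE} log=${LOGFILE}
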

Conclusions

As you can see, there are a number of manual steps in the traditional approach. The DBA using this approach estimates that two to three hours will be required to set up and test the Oracle RMAN backup in this manner on every newly provisioned computer.

Of course a rush job can be performed, but because rushing is more prone to human error, management takes two hours as the final estimate. This is the time consideration.

If there are a number of Oracle RMAN deployments happening on projects every week in a large corporate site, the time quickly adds up to a number of expensive DBA man-days.

The other consideration is the maintenance aspect of the UNIX shell scripts. Management must retain DBAs who are familiar with UNIX shell scripting. In some cases, the initial scripts have been written in Perl. So there would be a need for familiarity with that language. Because the choice of scripting language is open, there may be sites using other scripting languages as well. Also, scripts may be written in totally different ways to do the same job, because there are no scripting standards in place. The code may be totally uncommented or sparsely commented, and there may be missing or incomplete documentation. Everything is up to the initial script writer. New DBAs inherit the scripts and must try their own hand at fixing things, spending many hours understanding the scripts, experimenting with the code, and making any future maintenance changes. These are very common scenarios in the real world.

Oracle Enterprise Manager Grid Control aims to eliminate most of these issues. Next, we will see how to set up and schedule Oracle RMAN backups in a straightforward, no-nonsense way using the vast capabilities of Oracle Enterprise Manager Grid Control.

Part II: The Oracle Enterprise Manager Grid Control Approach

As noted previously, the major benefit of this approach is the time savings achieved in the tasks of setting up, testing, and day-to-day maintenance of the Oracle RMAN backups. Removing the dependence on UNIX-level scripts eliminates the necessity of hiring and retaining shell script specialists. Using Oracle Grid Control results in greatly simplified debugging of errors, because there are no shell scripting issues—the DBA can concentrate solely on the Oracle RMAN commands. Future maintenance and changes to the Oracle RMAN backups will be easier.

This full comparison of both approaches, when demonstrated in the real world,  has already convinced many major companies and financial institutions to consider the use of Oracle Grid Control for Oracle RMAN backups and other database management tasks.

For the purpose of this demonstration, the production database is FIN1P—a production Oracle database containing the company’s financial data.  The DBA’s task is to set up and schedule Oracle RMAN backups for the database.

The DBA has access to a central Oracle Grid Control site for managing and monitoring the multiple database targets for which his/her team has responsibility. The presumption is that the Oracle Grid Control site has been deployed using best practices and scalable architecture, so it can handle multiple database management tasks such as Oracle RMAN backups, Oracle Data Guard setup and monitoring, database cloning, and so on.

For the appropriate architecture, see my recently published article on Oracle Technology Network, “Grid Control Architecture for Very Large Sites”, which explains how a large central Oracle Grid Control site was able to manage and monitor 600 to 700 targets including databases, servers, and listeners.

It is always preferable to use the latest version of Oracle Grid Control at the central site. To find the exact version of Oracle Grid Control installed, go to the Oracle Grid Control home page and select the About Oracle Enterprise Manager link at the bottom of the page. In the case of this demonstration, the version is Oracle Enterprise Manager 10g Release 4 Grid Control (10.2.0.4.0), which was the latest version available at the time the demonstration was installed.

Often security is cited as a concern for a central management site: what if one group of DBAs gains access to production databases for which they are not responsible? Oracle Grid Control, however, is sophisticated enough to allow the creation of multiple administrator logins that have access to specific groups of targets. This allows multiple database teams in the corporate environment to log in and have management access or view access only to their own group of targets—whether it be a database, application server, host server, or listener.

Thus potentially any of the DBA team members responsible for the FIN1P database could log in to the Oracle Grid Control console with their administrator login and would have Oracle Grid Control management rights to the FIN1P database, host, and associated listener targets.

Preinstall Tasks

Initially, there are a few preinstall tasks required to set up the environment, as follows:
  1. A separate Filer volume will be used for the Oracle RMAN backups. As per corporate database standards, the backup volume must be mounted as /U99 at the server level. The entry for the backup mount point should be present in /etc/vfstab, and the mount point should appear in the list when the df -k (disk free in kilobytes) command is executed at the UNIX prompt. If not, follow the UNIX-level steps as the root UNIX user to add the entry in /etc/vfstab, and then mount the volume as /U99. Also change the ownership via this command:



    chown -R oracle:dba /U99

    This is to ensure that the /U99 backup mount point and all subdirectories (as specified by the -R argument) under this mount point are owned by the Oracle UNIX user and DBA UNIX group, so that Oracle RMAN is able to create backup pieces under this mount point.



  2. The Oracle Enterprise Manager Agent is preinstalled on each database server that is to be accessed via Oracle Grid Control. The most popular method to perform this install is via the pull method, which uses an AgentDownload script and the wget utility to pull the Agent install files from the central site directly onto the server. Other common methods are the push method, as well as the normal GUI installation using the CD or software directly on the target server.

    Ideally, if the Provisioning Pack of Oracle Grid Control is used, it is possible to keep a "Gold Copy" of Oracle installations; in this case, the Agent can already be preinstalled in the gold copy. The gold copy is then provisioned onto new servers.

    The Agent is to be installed in its own Agent home on the server, for which at least 2GB of space should have been allocated. Once installed and configured, the Agent starts to upload XML files to the central Oracle Management Service (OMS) server, and metrics about the targets on this database server begin to be stored in the central Oracle Grid Control repository.

  3. In Oracle Grid Control, a collection of associated targets is called a target group. This is normally used for grouping together targets to be associated with a particular corporate department, application, or DBA team.

    Such a target group, FINGRP, has been created by the Oracle Grid Control owner SYSMAN (password-controlled by the super-administrator of the central Oracle Grid Control site, normally a DBA of the central database team). This target group contains the FIN1P database, its listener, and also the host the database runs on.

    A new Oracle Grid Control administrator, FINADM, has also been created. This administrator is assigned full management privileges over the target group FINGRP.
The following steps must now be followed for the database requiring Oracle RMAN backups to be set up and scheduled. If the database is on an active-passive cluster—for example, a SUN HA cluster or any such technology—then these steps would be followed for each server, because the database could potentially run on either of the nodes in this active-passive technology and have a different target name for each node in Oracle Grid Control. In the case of Oracle RAC there is no such issue, because there is only one database shared by both or multiple nodes, and it can be backed up easily and efficiently by Oracle RMAN.

Setup Steps
  1. Log in to the Oracle Grid Control console using the FINADM login. You are able to see only the targets for which your team is responsible. Move to the Targets tab, click Databases, and then select the FIN1P database. This now displays the FIN1P database home page.
    The name of the user logged on to the console is clearly visible in the topmost bar of the browser as Oracle Enterprise Manager (FINADM) – Database Instance FIN1P.

  2. Select the Maintenance tab. On this tab, under High Availability, the options visible are Backup/Recovery, Backup/Recovery Settings, Oracle Secure Backup, and Data Guard. Select Backup Settings under Backup/Recovery Settings. At this point, the Oracle Grid Control login FINADM is asked to log in to the actual database itself.
    Log in as a user with DBA privileges, to set up and schedule the Oracle RMAN backup. It is not required to log in with SYSDBA or SYSOPER rights unless the intention is to start up, shut down, perform complete or incomplete recovery, change the archive log mode, or perform other such database administrative work. Connect as NORMAL instead.

    Do not check Save as Preferred Credential, because that would mean the login would be available without a password to any DBA who has logged on to the console as FINADM.
    Note that this procedure signifies that there are two levels of security in Oracle Grid Control—the first level to log in to the console and the second to log in to the database itself.

  3. Backup Settings is now displayed as three tabs: Device, Backup Set, and Policy. Under Device, enter 3 as the Parallelism (concurrent streams to disk drives). This means that three Oracle RMAN channels will be used to actually create the backup pieces. This will increase the speed of the backup, depending on the structure of the disk subsystem.
    The Disk Backup Location is not specified, so it uses the flash recovery area set up at the database level for the purpose of the backup. In this case, we presume that the flash recovery area of the database has been set up as /U99 as per the corporate standard.
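
    As an aside, pointing the flash recovery area at /U99 is an initialization-parameter change made at the database level, typically something like the following sketch (the 200G size is purely illustrative and should be large enough to hold the retained backups plus archive logs):


    -- Run as a DBA user in SQL*Plus; the size shown is illustrative
    ALTER SYSTEM SET db_recovery_file_dest_size = 200G SCOPE=BOTH;
    ALTER SYSTEM SET db_recovery_file_dest = '/U99' SCOPE=BOTH;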

  4. The Disk Backup Type has three radio buttons: Backup Set, Compressed Backup Set, or Image Copy. Compressed Oracle RMAN backups were first made available in Oracle Database 10g. Select Compressed Backup Set for the most efficient use of backup space.

  5. Enter the host credentials on the lower portion of this tab, and check Save as Preferred Credential. For a database based on a UNIX host, use the Oracle UNIX user. For a Windows host, use \ to log on.
    The logon credentials in Windows occasionally do not work, even though \ is used. The solution is to grant the Log on as a Batch Job privilege to the user: in the Windows Control Panel -> Administrative Tools -> Local Security Policy -> Local Policies -> User Rights Assignment -> Log on as a Batch Job, add the Windows user that is being used in the host credentials.

    Note that the menu path mentioned here pertains to Windows XP Professional and may be slightly different for other Windows versions. This solution is as per Oracle MetaLink Note 109188.1.

  6. To make sure that the host credentials work correctly, click Test Disk Backup at the upper corner of the screen. This runs a physical backup test using Oracle RMAN and verifies that the actual backup can be physically created at the disk backup location, which is the flash recovery area—/U99 in this case. The backup test runs and, when completed successfully, displays the message Disk Backup Test Successful! at the top of the Backup Settings -> Device tab.

  7. Move to the Backup Set tab. At this point, nothing is to be changed on this tab, which specifies the maximum backup piece (file) size and certain tape settings such as copies of datafile and archive log backups. These settings are kept at the default.

  8. Move to the Policy tab. This is where the backup policy is specified, and it is particularly important for the control of Oracle RMAN backups.
    First check the box to Automatically backup the control file and server parameter file (SPFILE) with every backup and database structural change.

    This is the control file and SPFILE autobackup. You can specify the location of the autobackup or default to the flash recovery area /U99. Checking this option is highly recommended so that the control file and SPFILE will be automatically backed up along with every database or archive log backup. This is especially important because we are using the nocatalog mode of Oracle RMAN, so the history of our backups is in the control file. This is also another reason that the binary SPFILE is recommended for every database instead of the earlier text PFILE, because the SPFILE can easily be backed up by Oracle RMAN in this way.

    Check Optimize the whole database backup by skipping unchanged files such as read-only and offline datafiles that have been backed up. This is known as backup optimization.

    Check Enable block change tracking for faster incremental backups. This is a new feature in Oracle RMAN 10g that keeps track of all changed blocks in a small file (approximately 11MB) so that incremental backups do not need to scan the whole database for block changes, as was the case in Oracle9i Database. This technique results in a much faster incremental backup, making a compelling reason to upgrade to the latest versions of the database. (The SQL equivalent of this setting is sketched at the end of this step.)

    For the block change tracking file, specify the location and filename as D:\ORADATA\FIN1P\BLKCHGTRCK.DBF. This must be specified because the database area (the db_create_file_dest parameter) is not set for the FIN1P database.

    Under Retention Policy, select Retain at least the specified number of full backups for each datafile and specify 1 as the number of backups to retain. This means that the retention policy is based on redundancy instead of a recovery window.

    The redundancy is specified in terms of the number of backups to keep at any one time. If a new backup is made, then the previous backup is marked as obsolete and can be deleted. The recovery window is specified in terms of the number of days; backups are retained so that recovery is possible up to this many days in the past. For example, if a recovery window of 3 days is specified as the Oracle RMAN retention policy, it means that all backups necessary to recover up to the last three days’ worth of data will be retained. Obviously this setting may need to be changed depending on the size of the database and the free space available in the backup mount point /U99 at any time. The DBA needs to closely monitor this space and change the recovery window if necessary.
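
    The block change tracking option checked above corresponds to a simple SQL command at the database level, shown here as a sketch; the file name is the one specified in this step, and the query against V$BLOCK_CHANGE_TRACKING is just a quick way to confirm that the feature is enabled:


    -- Enable block change tracking, using the file specified in this step
    ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
      USING FILE 'D:\ORADATA\FIN1P\BLKCHGTRCK.DBF';

    -- Confirm that change tracking is now enabled
    SELECT status, filename FROM v$block_change_tracking;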

  9. Click OK; this saves the backup settings. The settings are actually saved in the control file of the database. This can be confirmed by logging in to Oracle RMAN at the command prompt and looking at the configuration settings via the show all command:



    -- Set the ORACLE_SID in Windows
    C:\>set ORACLE_SID=FIN1P
    
    
    -- Move to the 11g Oracle Home
    C:\>cd C:\app\HaviPoriH\product\11.1.0\db_1\bin
    
    -- Start RMAN in nocatalog mode, connecting to the ORACLE_SID set above
    C:\app\HaviPoriH\product\11.1.0\db_1\BIN> rman target=/ nocatalog
    
    Recovery Manager: Release 11.1.0.6.0 - Production on Sat Jun 14 22:01:47 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    connected to target database: FIN1P (DBID=3660855775)
    using target database control file instead of recovery catalog
    
    RMAN> show all;
    RMAN configuration parameters for database with db_unique_name FIN1P are:
    CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
    CONFIGURE BACKUP OPTIMIZATION ON;
    CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
    CONFIGURE CONTROLFILE AUTOBACKUP ON;
    CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
    CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET PARALLELISM 3;
    CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
    CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
    CONFIGURE MAXSETSIZE TO UNLIMITED; # default
    CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
    CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
    CONFIGURE COMPRESSION ALGORITHM 'BZIP2'; # default
    CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
    CONFIGURE SNAPSHOT CONTROLFILE NAME TO 'C:\APP\HAVIPORIH\PRODUCT\11.1.0\DB_1\DATABASE\SNCFFIN1P.ORA'; # default
    

    It is obvious that the changes in the Oracle Grid Control Backup Settings screen have filtered down to the configuration settings in Oracle RMAN, which are stored in the control file of the FIN1P database. The settings without the # default comment are the ones that have changed. For example, the backup optimization is shown as ON, the control file autobackup is shown as ON, and parallelism is set to 3 as a compressed backup set—all settings that were changed in Oracle Grid Control.



  10. Now click Schedule Backup on the Maintenance tab of the FIN1P database. Under Customized Backup, select Whole Database and click Schedule Customized Backup.
    It is also possible to back up individual tablespaces, datafiles, or only the archive logs when selecting the Customized Backups. Another option is to back up All Recovery Files on Disk; this would back up the flash recovery area itself to tape.

    Oracle’s visionary approach is to back up the database to disk—in other words, the flash recovery area—to enable fast disk-based recovery for the client and to minimize business downtime. This is the main purpose of the flash recovery area.

    Once a week or at any other interval of time, the disk-based database backup—in other words, the flash recovery area—can be backed up to tape using this option.
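
    On the command line, the rough equivalent of backing up All Recovery Files on Disk is the backup recovery area command, sketched below; it copies the contents of the flash recovery area to tape and therefore needs an SBT (tape) channel configured for the media management software in use:


    # Back up the flash recovery area to tape (requires an SBT channel
    # configured for your media management software)
    backup recovery area;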

  11. On the Scheduled Customized Backup: Options page, select Full Backup, and Online Backup as the backup mode.
    Under Advanced, check the following options:



    • Also back up all archived logs on disk
    • Delete all archived logs from disk after they are successfully backed up
    • Delete obsolete backups

    The first two options enable the backup of all archive logs along with the full database backup, and the deletion of the archive logs from disk after they are backed up. The Delete obsolete backups option enables the deletion of backups that are marked as obsolete—those backups that are no longer required to satisfy the retention policy.
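
    For reference, the same choices expressed directly in Oracle RMAN would look something like the following sketch; this is not the exact script that Oracle Grid Control generates (that is shown later), just the classic hand-written equivalent:


    # Back up the database and all archive logs, deleting the logs from
    # disk once they are backed up
    backup database plus archivelog delete input;

    # Delete backups no longer needed to satisfy the retention policy
    delete noprompt obsolete;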



  12. Click the Encryption expandable icon at the end of the page. This expands the encryption fields that are new in Oracle RMAN 11g.
    To create encrypted backups on-disk with Oracle RMAN, the database must have the Advanced Security database option installed. This is a licensable option; an extra license fee is payable to Oracle in addition to the Enterprise Edition license.

    For creating encrypted Oracle RMAN backups directly onto tape, Oracle Secure Backup is known to be the only supported interface. This is another licensable option.

    Select Secure the backup using Recovery Manager encryption and specify a password for the user-supplied password option. Backups will be encrypted using this password. It is also possible to use the Oracle encryption wallet to encrypt the backup, and it is advisable to use both encryption approaches, for flexibility in using either the wallet or the password when restoring the Oracle RMAN backup.

    Note that AES256 has been selected as the encryption algorithm.
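
    In command-line terms, the wizard choices on this page map onto standard Oracle RMAN encryption settings, roughly as sketched below; the password shown is a placeholder, and with dual-mode encryption either the open wallet or the password can be used to restore the backups:


    # Persistent settings: encryption algorithm and wallet-based encryption
    CONFIGURE ENCRYPTION ALGORITHM 'AES256';
    CONFIGURE ENCRYPTION FOR DATABASE ON;

    # Per-session password, giving dual-mode (wallet or password) encryption
    SET ENCRYPTION ON IDENTIFIED BY "MyBackupPassword";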

  13. On the next page confirm that the Disk Backup Location is the flash recovery area and proceed to the following page.

  14. The Scheduling page appears, where it is possible to enter the time zone of your location, the start date and time of the backup, the repeat frequency, and the end date (or if the schedule is to continue indefinitely).
    The default suggested setting is 2 a.m. the following morning. Accept the default so that the backup will start at this time.

    The interval is selected as Frequency 1 Day, because as per the corporate standard a full database backup should be performed each day.

    Then select Repeat until Indefinite, so that the schedule is continued indefinitely.

  15. Click Next. The Review screen now appears, on which the following settings are displayed:

    Destination: Disk
    Backup Type: Full Backup
    Backup Mode: Online Backup
    Encryption Algorithm: AES256
    Encryption Mode: Oracle Encryption Wallet, Password
    Flash Recovery Area: /U99
    Disk Parallelism: 3

    The Oracle RMAN script is generated by Oracle Grid Control and displayed on the screen:



    set encryption on for all tablespaces algorithm 'AES256' identified by '%PASSWORD';
    backup device type disk tag '%TAG' database;
    backup device type disk tag '%TAG' archivelog all not backed up delete all input;
    allocate channel for maintenance type disk;
    delete noprompt obsolete device type disk;
    release channel;

    These commands perform the actions selected in the Oracle Grid Control screens, such as turning on encryption when performing a full database and archive log backup, deleting archive logs after backup, and then deleting obsolete backups.

    It is possible to click Edit RMAN Script and modify the script before submitting; however, this prevents a return to the previous screens in this wizard. Nevertheless, the Oracle RMAN script is modified manually as follows—with some extra maintenance commands added to cross-check backups and archive logs and to delete expired backups and archive logs:



    set encryption on for all tablespaces algorithm 'aes256' identified by '%password';
    backup device type disk tag '%tag' database;
    backup device type disk tag '%tag' archivelog all not backed up delete all input;
    allocate channel for maintenance device type disk;
    crosscheck backup;
    delete noprompt expired backup;
    delete noprompt obsolete device type disk;
    crosscheck archivelog all;
    delete noprompt expired archivelog all;
    release channel;



  16. Click Submit Job. Once the job is successfully submitted, click View Job to see its progress. The job now appears under the Jobs tab of Oracle Grid Control, under Job Activity, as a Scheduled job.

    At this point, it is possible to click Backup Script -> Show to display the backup script used. This shows the Oracle RMAN script wrapped in Perl:



    $rman_script="set encryption on for all tablespaces algorithm 'aes256' identified by '%password';
    backup device type disk tag '%tag' database;
    backup device type disk tag '%tag' archivelog all not backed up delete all input;
    allocate channel for maintenance device type disk;
    crosscheck backup;
    delete noprompt expired backup;
    delete noprompt obsolete device type disk;
    crosscheck archivelog all;
    delete noprompt expired archivelog all;
    release channel;
    ";
    &br_save_agent_env();
    &br_prebackup($l_db_connect_string, $l_is_cold_backup, $l_use_rcvcat, $l_db_10_or_higher, $l_backup_strategy, "FALSE");
    my $result = &br_backup();
    exit($result);



  17. It is possible, if so desired, to edit the scheduled job and change the schedule, the credentials, or the access. In the latter case, different access levels of View or Full can be allocated to different Oracle Grid Control administrators. FINADM is seen as the owner of the job on this page.

  18. To set the Preferred Credentials for the database and host, click Preferences at the top of the Oracle Grid Control console and then click the Preferred Credentials link. This allows setting the logon name and password for both the database and the host. The Test button can be used to verify the username and password. After verification, click Apply.

  19. After setting the Preferred Credentials, click the Jobs tab of the Oracle Grid Control console. This shows the scheduled Oracle RMAN backup job.

  20. For the purpose of a backup test, change the schedule of the job to make it run immediately. When this is done and the job is submitted, the job appears with status Running. Press F5 every few seconds to refresh the browser and watch the progress of the job.

  21. The job disappears from the Job Activity screen. Change the status to All and click Go. This shows all jobs, including jobs with a problem. The FIN1P backup job appears as a problem.

    Click the Problem link. The steps of the job appear, showing exactly where the job has had a problem.

    The steps for Prebackup and Postbackup appear to have succeeded. The step for Backup appears to have failed. Click this step, and the entire Oracle RMAN output is now displayed. An examination of the output reveals the error immediately.



    Output Log 
    Recovery Manager: Release 11.1.0.6.0 - Production on Sat Jun 14 23:29:03 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    
    RMAN> 
    connected to target database: FIN1P (DBID=3660855775)
    using target database control file instead of recovery catalog
    
    RMAN> 
    echo set on
    
    RMAN> set encryption on for all tablespaces algorithm 'aes256' identified by *;
    executing command: SET encryption
    
    RMAN> backup device type disk tag '%tag' database;
    Starting backup at 14-JUN-08
    allocated channel: ORA_DISK_1
    channel ORA_DISK_1: SID=125 device type=DISK
    allocated channel: ORA_DISK_2
    channel ORA_DISK_2: SID=123 device type=DISK
    allocated channel: ORA_DISK_3
    channel ORA_DISK_3: SID=122 device type=DISK
    channel ORA_DISK_1: starting compressed full datafile backup set
    channel ORA_DISK_1: specifying datafile(s) in backup set
    input datafile file number=00001 name=D:\ORADATA\FIN1P\SYSTEM01.DBF
    channel ORA_DISK_1: starting piece 1 at 14-JUN-08
    channel ORA_DISK_2: starting compressed full datafile backup set
    channel ORA_DISK_2: specifying datafile(s) in backup set
    input datafile file number=00002 name=D:\ORADATA\FIN1P\SYSAUX01.DBF
    input datafile file number=00004 name=D:\ORADATA\FIN1P\USERS01.DBF
    channel ORA_DISK_2: starting piece 1 at 14-JUN-08
    channel ORA_DISK_3: starting compressed full datafile backup set
    channel ORA_DISK_3: specifying datafile(s) in backup set
    input datafile file number=00005 name=D:\ORADATA\FIN1P\EXAMPLE01.DBF
    input datafile file number=00003 name=D:\ORADATA\FIN1P\UNDOTBS01.DBF
    channel ORA_DISK_3: starting piece 1 at 14-JUN-08
    RMAN-03009: failure of backup command on ORA_DISK_1 channel at 06/14/2008 23:29:35
    ORA-19914: unable to encrypt backup
    ORA-28365: wallet is not open
    continuing other job steps, job failed will not be re-run
    RMAN-03009: failure of backup command on ORA_DISK_2 channel at 06/14/2008 23:29:35
    ORA-19914: unable to encrypt backup
    ORA-28365: wallet is not open
    continuing other job steps, job failed will not be re-run
    
    RMAN-00571: =================================================
    RMAN-00569: ======== ERROR MESSAGE STACK FOLLOWS ==========
    RMAN-00571: =================================================
    
    RMAN-03009: failure of backup command on ORA_DISK_3 channel at 06/14/2008 23:29:37
    ORA-19914: unable to encrypt backup
    ORA-28365: wallet is not open
    

    This error has occurred because the Oracle wallet is not open or has not been created for the database. To create the encryption wallet for the FIN1P database, the following lines are first added to the sqlnet.ora file on the database server:


     
    ENCRYPTION_WALLET_LOCATION= 
    (SOURCE=(METHOD=FILE)(METHOD_DATA=
    (DIRECTORY=C:\app\HaviPoriH\product\11.1.0\db_1)))

    The listener for the database is then restarted so that these changes take effect.

    To create a new master key and begin using encrypted RMAN backups as well as Transparent Data Encryption (TDE) for the data, issue the following command:



    ALTER SYSTEM SET ENCRYPTION KEY IDENTIFIED BY "";
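
    To confirm that the wallet has been created and is open before retrying the backup, a quick check can be run from SQL*Plus (a simple sketch; the view shows the wallet location and whether its status is OPEN or CLOSED):


    SELECT wrl_parameter, status FROM v$encryption_wallet;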
    

    Click the Retry button on the Job page for the failed job. The backup job now executes again, and this time it completes successfully.



    Examine the Oracle RMAN output by clicking the Step Backup link on this screen. This shows that the backup of the database has indeed completed successfully. Observe each step of the output in detail.

    The Oracle RMAN output is seen to first set on encryption using the AES256 algorithm. This is followed by creation of a compressed full backup set for the database, and a control file and SPFILE autobackup. After this, the archive logs are backed up as a compressed archive log backup set, and the archive logs are then deleted from disk by Oracle RMAN due to the delete input clause.

    Whenever Oracle RMAN wants to perform recovery, it is able to go to the archive log backup sets and extract the archive log it requires. This is one of the great benefits of Oracle RMAN. It contrasts with the manual recovery that DBAs performed in the past, extracting the archive logs from backup archives (UNIX tape archives or “tars”) and placing them where the recovery mechanism of Oracle could find them and use them.
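
    For example, during a recovery Oracle RMAN can pull individual logs back out of the archive log backup sets with a command along these lines (the sequence numbers are purely illustrative):


    # Restore a range of archived logs from the backup sets
    restore archivelog from sequence 1234 until sequence 1240;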

    The Oracle RMAN backup pieces for the database and archive logs, and the control file and SPFILE autobackups, are created in their respective directories under the flash recovery area (/U99 in this case):



    \FIN1P\BACKUPSET\2008_06_15\
    \FIN1P\AUTOBACKUP\2008_06_15\

    The creation of the subdirectories for the database name, backup set, autobackup, and the date the backups have taken place is performed automatically by Oracle, because the database is using the flash recovery area.

    The archive log backup is followed by another control file and SPFILE backup. This happens because autobackup of these important files occurs after any backup, whether it’s a backup of the database or of the archive logs. This is then followed by the maintenance actions. These cross-check the existing backups and archive logs and make sure that they are present on-disk. All expired (not found) and obsolete (as per the retention policy) database, archive log, and control file backup entries are deleted from the control file records. The maintenance commands also ensure that the obsolete database backups and control file autobackups are physically deleted from the disk.
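
    If the DBA wishes to verify the same information from the Oracle RMAN command line, a couple of standard reporting commands can be used, as a quick sketch:


    # Summarize existing backups and show which ones the retention policy
    # now considers obsolete
    list backup summary;
    report obsolete;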

    An important point to note is that if the database is restarted at any time in the future, the wallet must be reopened. Otherwise, the Oracle RMAN backup fails with these errors:



    ORA-19914: unable to encrypt backup
    ORA-28365: wallet is not open

    This is resolved by opening the wallet again using the following command:



    alter system set encryption wallet open identified by "";



  22. The Oracle RMAN output has been examined in detail, and the backup job has been verified as fully successful. At the Oracle Grid Control console, move to the Jobs tab. Select status All and click Go to find the verified job.
    If you edit this job again, you are only able to change the access, not the schedule. So you need to re-create the job using the same steps, but this time use the schedule of the daily early morning run, repeated each day, indefinitely.

    When the job is scheduled thus, it appears in the Jobs tab of the Oracle Grid Control console with the status of Scheduled.

    When the job is thus set up as a scheduled execution, it runs each day, and the Oracle Grid Control repository stores the Oracle RMAN log of each run, whether successful or not. The DBA is able to examine the status of all Oracle RMAN backup jobs for all databases from the central Oracle Grid Control console and can take corrective action if necessary. It is also possible to receive e-mail notifications from Oracle Grid Control on the failure of any scheduled jobs.

Conclusion

Although at first glance it seems that the Oracle Grid Control approach to setting up Oracle RMAN backups has a far greater number of steps than the traditional UNIX shell script and cron approach, in the practical sense these steps can be performed well within 10 to 15 minutes, because they are wizard driven and autogenerate the necessary scripts. This does not include the setup of the Oracle Enterprise Manager Agent on each server. The Agent is not just for Oracle RMAN backup purposes but is the foundation of all Oracle Grid Control management and monitoring activities for that database server.

Thus there are considerable time savings in this approach, as compared to the two or more hours required for the manual setup and testing of Oracle RMAN UNIX-scripted backups that were discussed in Part I. This leads to savings on expensive DBA man-days, especially if there are a large number of projects requiring databases to be provisioned. Backup strategy has always been a very important aspect of Oracle Database, and it is essential that backups are easily set up in a consistent manner for each production and test database.
Because the Oracle RMAN backup setup is largely wizard driven, the chances of human error are minimal. The scripts are generated by Oracle Grid Control and scheduled by its own job scheduling mechanism, eliminating the need to hire and retain scarce shell-scripting skills. There is no need for any other scripting language, because Oracle Grid Control does all the script generation. Debugging is also greatly simplified. For any new requirement that may arise—for example, incremental backups or image copies—a new backup job can easily be created in the Oracle Grid Control console and the old job deleted. So maintenance and future enhancements are also made a great deal easier.

One other great advantage of the Oracle Grid Control approach is that the latest advances in database technology are available as options in the wizards, leading to a new awareness of the possibilities. In the real world, it is often seen that old Oracle RMAN scripts written for Oracle8i Database are still being used unchanged in Oracle9i Database and Oracle Database 10g, mainly because of a lack of knowledge among the DBAs maintaining the scripts. The result is that new Oracle RMAN features, such as backup compression, block change tracking for incremental backups, and encryption, are never used. When the Oracle Grid Control approach is used instead, DBAs become aware of the new Oracle RMAN possibilities and can use them to the fullest benefit.

When Oracle Grid Control is showcased to management and DBAs through the demonstration in this article, there is no need for further proof—the manifold benefits of this approach are visible beyond any doubt. More and more companies will necessarily set up and schedule Oracle RMAN backups using the vast capabilities of Oracle Enterprise Manager Grid Control. There will be no stopping this immensely beneficial juggernaut.

REFERENCES

Oracle RMAN Backups: Pushing the "Easy" Button, by Porus Homi Havewala, Oracle Corporation