Wednesday, November 18, 2009

Data Guard Broker High Availability



Doc ID: 275977.1
Type: BULLETIN
Modified Date: 11-JAN-2006
Status: PUBLISHED

Introduction
In a Data Guard environment, the primary and standby databases, as well as their various interactions, may be managed using SQL*Plus. However, for easier manageability, Data Guard offers a distributed management framework, called the Data Guard Broker, which automates and centralizes the creation, maintenance, and monitoring of a Data Guard configuration, and abstracts the various complexities associated with SQL*Plus commands. Administrators may use either Oracle Enterprise Manager or the Broker’s own specialized command-line interface (DGMGRL) to take advantage of the Broker’s management capabilities.
This article will focus on how the Broker has been enhanced in Oracle Database 10g to tightly integrate with Real Application Clusters (RAC) and ensure seamless high availability in the event of failures of one or more instances in a RAC standby.
Data Guard Broker and the Apply Instance
In Oracle Database 10g Release 1, one of the significant features of Data Guard is that the Data Guard Broker now fully supports RAC. In a Broker-enabled Data Guard configuration, if the standby database is a RAC database, the redo data from the primary is shipped to a single standby instance. Furthermore, the apply process, whether it is the Managed Recovery Process (MRP) for Redo Apply or the Logical Standby Process (LSP) for SQL Apply, is started on this same instance. This instance is referred to as the apply instance.
For purposes of this article, let’s assume the following Data Guard configuration:
                   RAC Primary    RAC Standby
Node 1 Instance    Dallas_N1      Chicago_N1
Node 2 Instance    Dallas_N2      Chicago_N2
Before enabling the RAC-enabled standby database, the administrator may indicate the preferred apply instance by setting the Data Guard Broker PreferredApplyInstance property to the preferred instance (SID):
DGMGRL> EDIT DATABASE 'Chicago' SET PROPERTY 'PreferredApplyInstance'='Chicago_N1';
This choice may also be made through the GUI, using Oracle Enterprise Manager (EM), which leverages the Broker. If the administrator has no preference as to which instance of a RAC standby database should be the apply instance, the Broker picks one at random.
Once the apply instance is selected, and as long as it is still running, any subsequent change to the value of the PreferredApplyInstance property does not take effect until the apply instance fails, or until a role change (if the property was changed for the primary database). If the administrator does want to change the apply instance while one is already selected and running, this can be done through the “[WITH APPLY INSTANCE = ]” clause of the “EDIT DATABASE” DGMGRL command:
DGMGRL> EDIT DATABASE 'Chicago' SET STATE='ONLINE' WITH APPLY INSTANCE='Chicago_N2';
This command resets redo data transmission to the new instance, and starts the apply process in that instance.
Data Guard Broker and Apply Instance Failovers For RAC Standby Databases
When the apply instance fails, not only do log apply services stop applying redo data to the standby database, but redo data transmission to the standby database also stops, because the apply instance is not available to receive and store redo data locally. To tolerate a failure of the apply instance, the Broker leverages the availability of the RAC standby database by automatically failing over log apply services to a different standby instance. This apply instance failover capability provided by the Broker in Oracle Database 10g makes the data protection and high availability capabilities of Data Guard even more robust. Let’s look into it in more detail.
To be prepared for this automatic failover, the administrator may set the Broker’s PreferredApplyInstance property to indicate which instance should take over if the current apply instance were to fail. For example, in the configuration above, if the current apply instance is ‘Chicago_N1’, the administrator may enter the following command to specify the failover instance:
DGMGRL> EDIT DATABASE 'Chicago' SET PROPERTY 'PreferredApplyInstance'='Chicago_N2';
If the current apply instance were to fail, the Data Guard Monitor (DMON) process (this is an Oracle background process that runs for every database instance that is managed by the Broker) in each of the other surviving standby instances becomes aware of this. After waiting for a predetermined amount of time that is configurable by means of the Broker’s ApplyInstanceTimeout property (default value = 2 minutes), the Broker selects a new apply instance according to the following rule: if the PreferredApplyInstance property indicates an instance that is currently running, select it as the new apply instance; else pick a random instance that is currently running to be the new apply instance. The Broker coordinates this such that the primary database now starts shipping redo data to the new instance, and also MRP or LSP is started on the new instance.
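
The selection rule described above can be sketched in a few lines of Python. This is only an illustrative model of the rule as stated in this article, not Broker code: the function name select_apply_instance and its parameters are hypothetical, and the Broker performs this selection internally once the ApplyInstanceTimeout period has elapsed.

```python
import random
from typing import Optional, Sequence


def select_apply_instance(
    preferred: Optional[str],
    running: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Model of the rule: preferred instance if it is running, else a random one.

    `preferred` stands in for the PreferredApplyInstance property and
    `running` for the standby instances that are currently up. Both names
    are assumptions made for this sketch.
    """
    if not running:
        # No surviving standby instance, so there is nothing to fail over to.
        return None
    # Rule 1: honor PreferredApplyInstance when that instance is running.
    if preferred is not None and preferred in running:
        return preferred
    # Rule 2: otherwise pick any running instance at random.
    chooser = rng if rng is not None else random
    return chooser.choice(list(running))
```

With the configuration used in this article, calling select_apply_instance("Chicago_N2", ["Chicago_N2"]) after Chicago_N1 fails returns "Chicago_N2". With no preference set, any running instance may be returned.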

This automatic failover behavior works irrespective of the process that is shipping redo data to the standby, whether Log Writer (LGWR) or Archiver (ARCH), and irrespective of the protection mode (Maximum Protection, Maximum Availability, Maximum Performance) of the Data Guard configuration. The behavior in Maximum Protection mode is, however, somewhat specialized. If only one standby supports Maximum Protection mode and that standby is a RAC database, then when the apply instance fails and LGWR on the primary encounters an error, the Broker immediately chooses another instance of this standby as the new apply instance, instead of waiting for the timeout specified in the ApplyInstanceTimeout property. The LGWR process on the primary reattaches to the standby redo log (SRL) through this new instance and continues shipping redo data to it. Additionally, the Broker automatically starts the apply process on this instance. Thus, with the Broker’s seamless functionality, the primary database is prevented from being terminated, no data is lost, and the standby database continues to be a viable switchover or failover candidate.

Note:
If Data Guard Broker is not enabled for this configuration, Oracle’s High Availability best practices recommend setting up a list of destination connect identifiers in the tnsnames.ora file on the primary, e.g.:

CHICAGO=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=(PROTOCOL=tcp)(HOST=chicago_n1-server)(PORT=1521))
      (ADDRESS=(PROTOCOL=tcp)(HOST=chicago_n2-server)(PORT=1521)))
    (CONNECT_DATA=
      (SERVICE_NAME=CHICAGO)))


In this case, LGWR will choose the next entry in the list to send redo data to, after the timeout period specified in the NET_TIMEOUT attribute of the LOG_ARCHIVE_DEST_n parameter (if NET_TIMEOUT is not specified, it will wait for the system’s TCP/IP timeout). However, the apply process (Redo Apply or SQL Apply) would need to be started manually on the new instance, through SQL*Plus.

If such a connect identifier list is not set up, and Broker is not enabled for this configuration, upon failure of the apply instance of the last standby database available in the Maximum Protection mode, and upon timing out of the LGWR, the primary database will be brought down to ensure no data is lost.

If there are multiple standby databases available in the Maximum Protection mode, upon failure of the apply instance in one of the standby databases, the Broker would not immediately initiate an automatic failover. Instead, it will wait for the timeout value specified in the ApplyInstanceTimeout property, just as in the case of other protection modes. When the failover instance comes up, it will resynchronize with the primary database through Data Guard’s gap resolution mechanism.
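
The difference between the immediate failover and the timed failover can be summarized in a small Python sketch. It is a simplified model of the behavior described in this article, assuming a RAC standby managed by the Broker. The function name, parameter names, and the mode string "MaxProtection" are hypothetical labels chosen for illustration, and the 120-second default mirrors the 2-minute ApplyInstanceTimeout default mentioned earlier.

```python
def apply_failover_delay_seconds(
    protection_mode: str,
    standbys_supporting_mode: int,
    apply_instance_timeout: int = 120,
) -> int:
    """Return how long the Broker waits before choosing a new apply instance.

    Assumes the failed apply instance belongs to a RAC standby. The mode
    string "MaxProtection" is an illustrative label used by this sketch.
    """
    # Immediate failover applies only when this standby is the sole standby
    # supporting Maximum Protection mode.
    sole_max_protection_standby = (
        protection_mode == "MaxProtection" and standbys_supporting_mode == 1
    )
    if sole_max_protection_standby:
        return 0
    # In every other case the Broker waits for ApplyInstanceTimeout.
    return apply_instance_timeout
```

For example, a single RAC standby in Maximum Protection mode yields a delay of 0, while the same failure with two standbys supporting that mode yields the configured timeout.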

Coordinating Database Shutdowns
In a Maximum Protection mode configuration, the Broker coordinates the planned shutdown of standby databases initiated through SQL*Plus, SRVCTL (the RAC management interface), or the Broker itself. If the standby database is the last standby supporting the Maximum Protection mode of the primary, and is not a RAC database, the Broker prevents a planned shutdown of the standby database. If it is a RAC database, only the apply instance is prevented from shutting down. If this standby is not the last standby in the Maximum Protection mode configuration, planned shutdown is not prevented.
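
The shutdown rules in the paragraph above can be written as a small decision function. This Python sketch is illustrative only: the function and parameter names are hypothetical, and it assumes that no shutdown is blocked outside Maximum Protection mode, which the text implies but does not state explicitly.

```python
def planned_shutdown_allowed(
    protection_mode: str,
    is_last_standby: bool,
    is_rac: bool,
    is_apply_instance: bool,
) -> bool:
    """Return True if a planned shutdown of a standby instance would be permitted.

    The mode string "MaxProtection" is an illustrative label for this sketch.
    """
    # Assumption: coordination only blocks shutdowns in Maximum Protection
    # mode, and only for the last standby supporting that mode.
    if protection_mode != "MaxProtection" or not is_last_standby:
        return True
    # A non-RAC last standby cannot be shut down at all.
    if not is_rac:
        return False
    # For a RAC last standby, only the apply instance is protected.
    return not is_apply_instance
```

With the configuration in this article, and Chicago as the only standby in Maximum Protection mode with Chicago_N1 as the apply instance, a planned shutdown of Chicago_N2 would be allowed while a shutdown of Chicago_N1 would be prevented.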

A planned shutdown of a standby also results in that destination being reset such that the primary database stops shipping redo data to this standby till it is restarted. This prevents the primary from unnecessarily attempting to ship redo data to the standby, which may result in errors. When the standby is restarted, the destination is automatically reactivated by the Broker and redo data is shipped to it. The Broker also starts the apply process on the standby. The underlying gap resolution mechanism ensures all missing logs are fetched and applied so that this standby can be maintained as a transactionally consistent copy of the primary database.

Conclusion
Data Guard Broker considerably simplifies the usability, administration and monitoring of a Data Guard configuration. In Oracle Database 10g, the additional functionality provided by the Broker regarding seamless integration with RAC and support of automatic apply instance failovers in case of a RAC standby database, is a compelling reason why administrators should enable the Broker and use Enterprise Manager and/or DGMGRL to manage their Data Guard configurations.
