Wednesday, November 11, 2009

Oracle Enterprise Manager Grid Control Architecture for Very Large Sites

What is a "very large site" as far as Oracle Enterprise Manager Grid Control is concerned? It simply means an enormous collection of targets to be monitored and managed.

My team was directly involved with the first Grid Control production site in the world, at a large telco in Australia (with a lot of assistance from Oracle). The project involved setting up a central Grid Control site managing disparate servers and databases under different DBA teams in the corporate organization. The teams, servers and databases were scattered across the Australian continent.

Management's motivation was twofold. It needed a consolidated view of all Oracle versions and licenses, and it needed a means of reducing expensive SAN storage usage. That storage was being paid for at a high annual cost, yet it either lay unallocated at the Unix level or was over-allocated at the database level, with a lot of unused free space in tablespaces. To address this particular aspect, Oracle assisted with specially developed Grid Control storage reports. These reports proved so useful that they were eventually incorporated into Release 2 of Grid Control.

The other goals driving this project were to have one single database management tool throughout the enterprise, and to use Grid Control's excellent features such as database performance analysis, RMAN backup setup and scheduling, Oracle Data Guard setup and monitoring, and cloning of Oracle databases and Oracle Homes from what we called a "gold copy", which would save a lot of internal database consulting time. Prior to Grid Control, all these tasks were performed using the traditional manual approach, which consumed a lot of time, besides being prone to human error.

Performing all these activities for more than two thousand database servers, monitoring them continuously, and having multiple DBA teams use the central site for this purpose required a particularly well-architected site, one that would scale as necessary through stages I, II, and III as more and more targets were brought on, the goal being 2,000 target servers or more.

In this article I will offer an overview of the architecture used to achieve this high scalability in Grid Control. This kind of information will be useful for customers who are contemplating the use of Grid Control but need guidance on properly architecting their solution.


The Wrong Architecture


Suppose a DBA team, or its management, decides to implement Grid Control. The normal tendency would be to use a test or development server to install the product, be it on a flavor of Unix, Linux, or Windows. This means all Grid Control components (the current release at the time of writing being Release 4) are placed on a single server: the repository database, the Oracle Management Service (OMS), and the EM agent.

Then, EM agents would be installed, by either the push or the pull method, on a few other development and test database servers. After the DBA team has experimented with the functionality of Grid Control, it would tentatively decide to install an agent on a production server for the first time.

Let's say eventually management decides to move the whole shebang of Grid Control to production, but it now makes the mistake of assuming that what works for a few development servers would also work for production. It authorizes the DBA team to install Grid Control on a production server, again a single server.

The team installs all the components again on a single server, perhaps sharing the Grid Control install with a production or test database. This is followed by EM agents being installed on all the production and test database servers pointing back to the Grid Control server.

Things work for a while. But as the Grid Control workload gradually increases, as more and more databases are managed by more DBAs, as more and more monitoring is performed, as Grid Control is used more and more for RMAN backups, Data Guard setup and monitoring, cloning of databases and homes and so on, the Grid Control system grinds to a halt.

Why would this happen? For the answer, we need to understand the Grid Control internals. The main working component of Grid Control, the engine as it were, is the OMS. This is a J2EE application deployed on Oracle Application Server 10g; the member components are the Oracle HTTP Server, Oracle Application Server Containers for J2EE (OC4J), and OracleAS Web Cache. Grid Control therefore includes a reduced version of Oracle Application Server itself.

At the Unix level, the OMS engine appears as a single OC4J_EM process. This can be seen when the opmnctl command is executed:


./opmnctl status

Processes in Instance: EnterpriseManager0.GridMgt001.in.mycompany.com
-------------------+--------------------+-------+---------
ias-component      | process-type       |   pid | status
-------------------+--------------------+-------+---------
WebCache           | WebCacheAdmin      |  2071 | Alive
WebCache           | WebCache           |  2099 | Alive
OC4J               | OC4J_EM            | 27705 | Alive
OC4J               | home               |   N/A | Down
dcm-daemon         | dcm-daemon         |   N/A | Down
LogLoader          | logloaderd         |   N/A | Down
HTTP_Server        | HTTP_Server        |  2072 | Alive

A small digression at this stage: since the OMS runs on Oracle Application Server, you can control it just as you would control the Application Server itself, using the EM Application Server Control, or at the command line using opmnctl (Oracle Process Manager and Notification control) or dcmctl (Distributed Configuration Management control). This is in addition to the Enterprise Manager Control (emctl) utility.
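For illustration, here is how the same OMS might be checked with each of the three utilities. This is only a sketch: the paths assume a default OMS home layout, and dcmctl syntax can vary between Application Server releases, so verify against your own install.

# Enterprise Manager control utility (run from the OMS home)
cd $ORACLE_HOME/bin
./emctl status oms          # reports whether the Management Service is up

# Oracle Process Manager and Notification control
cd $ORACLE_HOME/opmn/bin
./opmnctl status            # lists the OC4J_EM, HTTP_Server and WebCache processes

# Distributed Configuration Management control
cd $ORACLE_HOME/dcm/bin
./dcmctl getstate           # shows the state of the deployed OC4J instances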

Thus, OC4J_EM is only a single Unix process with its own PID. The memory used by this process is also limited; it is set in the file $ORACLE_HOME/opmn/conf/opmn.xml. You could perhaps increase the memory used by the process, but it remains just a single process. Imagine that one process being used to manage numerous databases and servers, performing various tasks such as Data Guard setups, cloning, and so on, and you can understand why such a setup will simply not scale.
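As a quick sketch of where this limit lives (the exact java-options line differs from install to install, so treat the value shown in the comment as an example only):

# locate the Java heap settings for the OC4J_EM process
grep -n "Xmx" $ORACLE_HOME/opmn/conf/opmn.xml

# a typical java-options entry looks something like:
#   <data id="java-options" value="... -Xms256M -Xmx512M ..."/>
# raising -Xmx buys some headroom, but OC4J_EM still remains one process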

By analogy, if the database itself were to run as a single process, with the database writer, the log writer, the archiver, and numerous other functions all performed by that one process, the database would be far less efficient and scalable. This is the primary reason why, if all Grid Control components are placed on a single server, only limited scalability will be achieved: you are limited to one OC4J_EM process with its own limits of memory and processor speed. If the OC4J_EM process were to reach its memory limits under heavy load and slow down or stop responding, other DBAs would not be able to log in to the Grid Control Console for their own database management work.

Placing all Grid Control components on a single server is not recommended in production, and neither is sharing that server with a production or test database. Grid Control needs its own server, and indeed its own set of servers in a properly architected solution. It is recommended that some time be spent planning the Grid Control site being contemplated for production. Senior management should be convinced of the need for this initial study, should approve the budget for the solution, and the work should then be scoped out and performed as a professional project, since Grid Control is an enterprise solution and not a minor tool to deploy on a DBA workstation.

Grid Control Internals

Grid Control is drastically different from previous incarnations of Enterprise Manager. In the past, Enterprise Manager was not so scalable, simply because it was not N-tiered. The oldest avatar was Server Manager, which was a PC executable utility. Immediately before Grid Control, there was OEM 9i, which was a bulky Java beast sitting cross-legged on the PC's memory with numerous issues as a result.

When Grid Control was created, the internal architecture was drastically altered to the N-tier model. Oracle's vision is broadly N-tier, which is in line with and also sets the direction for modern IT thought. Grid Control became the three components mentioned previously, and because the main engine, the OMS, now runs on the application server as an OC4J application, it instantly became scalable.

Why is this possible? First of all, Grid Control is no longer tied to one PC or one server: multiple OMS instances, each running as an OC4J_EM application on the application server, can be placed on different servers, and they can all point to the same EM repository.

Herein lies the secret of the immense scalability of Grid Control. The boundaries were broken, and horizontal scaling was opened up to the EM world. The more OMS servers you add to the EM site, the more targets you can manage.

The Right Architecture

Our real-life large-site implementation illustrates this concept more clearly. The foundation of the implementation can be industry-standard, open architecture, such as Linux servers with the following configuration:


Specification Type  | Specification Details
--------------------+-------------------------------------------------
Hardware            | Any industry vendor
OS                  | Linux (any version certified with Grid Control)
CPU                 | 4 (2.2 GHz or above)
Memory Requirement  | 8GB
Disk Space          | 10GB free space

There is no need to deploy powerful expensive servers (beefy beasts that typically have 24 or more CPUs and 32GB or more memory). Smaller 4 CPU machines with 8 GB memory are being used, since the intention is to scale horizontally and not vertically.

The "Free Space" mentioned in the specification table is for the Oracle software, such as the Oracle Database Home, the Oracle Management Service Home, and the Agent Home. It does not include the database, which will be placed on either a SAN or a NAS (Netapps filer). The database space requirement for the EM Repository would be approximately 60 to 70GB, with an equal amount of space reserved for the Flash Recovery Area, where all archive logs and RMAN backups will be stored. Oracle recommends database backups to disk (the Flash Recovery Area), so that fast disk-based recovery is possible..

Even with a large number of targets being monitored and managed, the database size rarely goes above 60 to 70GB with out-of-the-box functionality. A feature of Grid Control is that the EM repository database (10g) manages its own space, in the sense that it performs rollups of metric data at predetermined intervals. Hence the metric data that is continuously collected from the targets does not drastically increase the database size. On the other hand, it is possible to manually create extra metrics for monitoring, and this may push the database size above this example figure.

During the installation phase, the full Grid Control software is first installed on one of the servers, using the Grid Control installation CDs and selecting the "Enterprise Manager 10g Grid Control Using a New Database" installation type. This server becomes the repository server, since the repository database is created on this machine. Being a full install, an OMS and an EM agent are also installed on the same repository server. (You can ignore this OMS at this point; more on it later.)

Next, an additional OMS is installed on each of the other servers. This is done using the same Grid Control installation CDs, but selecting the "Additional Management Service" installation type. During the installation of the additional service, you are asked to point to an existing repository, so point to the repository database on the first server. The repository database must be up and running at this stage, with the repository successfully installed in the SYSMAN schema.
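Before launching the Additional Management Service install, a quick sanity check on the repository server along these lines can save a failed installation attempt. This is a sketch only; it assumes you are running it as the oracle user with a working SYSDBA login:

# the listener and database must be up, and the SYSMAN schema must exist
lsnrctl status
sqlplus -s "/ as sysdba" <<EOF
SELECT username, account_status FROM dba_users WHERE username = 'SYSMAN';
EOF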

With the Additional Management Service installation type, only the management service (OMS) and the EM agent are installed. This is done on three or more additional servers, and these servers now become the management server pool.
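Once each additional OMS is installed, it is worth confirming that it is up and that it points at the one shared repository. The following sketch assumes the 10g OMS home layout and that release's emoms.properties parameter names; verify against your own install:

# run on each management server
$ORACLE_HOME/bin/emctl status oms
grep emdRepConnectDescriptor $ORACLE_HOME/sysman/config/emoms.properties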

The repository database server can be complemented with a standby database server using Oracle Data Guard, or optionally with an Oracle RAC cluster on multiple nodes if the repository database performance needs to be scaled horizontally. A noteworthy point, however, is that in Grid Control the performance requirement is not so much on the database side as on the management server side. The highest scalability is achieved on the management servers, since the OC4J_EM process is where the bulk of the Grid Control work is performed. This is the reason the architecture should include three or more load-balanced management servers for a large Grid Control setup.

Load balancing the pool of management servers forms an integral part of this architecture. A hardware load balancer, such as a Big IP Application Switch Load Balancer from F5 Networks, can be used for this purpose. (This company's flagship product is the BIG-IP network appliance. The network appliance was originally a network load balancer, but now also offers more functionality such as access control and application security.)

The load balancer is set up with its own IP address and domain name, for example gridcentral.in.mycompany.com, and in turn points to the IP addresses of the three management servers. When a service request is received at the load balancer's IP address or domain name, on a particular port configured at the balancer level, the balancer distributes the incoming request to any of the three simultaneously active management servers in its pool, at the port specified. Grid Control uses various ports for different purposes: for example, one port is used for Console logons and a different port is used for agent uploads of target metric data. The Big IP must be set up for all these ports, so that load balancing occurs for Grid Control Console logons as well as for agent uploads of target metric data.
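On the target side, each agent should then upload to the load balancer name rather than to an individual management server. A rough sketch for a 10g agent follows; $AGENT_HOME stands for the agent install directory, gridcentral.in.mycompany.com is the example alias above, and emd.properties should be backed up before editing:

cd $AGENT_HOME/sysman/config
grep REPOSITORY_URL emd.properties
# edit the line to read, for example:
#   REPOSITORY_URL=http://gridcentral.in.mycompany.com:4889/em/upload

cd $AGENT_HOME/bin
./emctl stop agent
./emctl start agent
./emctl upload agent      # force an immediate upload to verify the new path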

An additional benefit is excellent redundancy for the Grid Control system. If one of the management servers stops functioning for any reason (under heavy load, for instance, the OC4J_EM process may hang and need to be restarted using opmnctl), that management server can be taken out of service while the other active management servers continue to service requests as distributed by the Big-IP load balancer. The load balancer automatically ignores the unreachable IP, discovered to be so by its own monitors, which check the pool members on an ongoing basis at predetermined intervals. So failure of any of the management server instances simply results in the load balancer directing all subsequent service requests to the surviving active instances. When the Big IP monitor detects that the node is back online, the node or service is automatically added back into the pool.
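A hedged example of bouncing a single hung management server while the balancer routes around it (the paths assume the default OMS home on that node):

# on the affected management server only; the other OMS nodes keep
# servicing requests through the load balancer in the meantime
cd $ORACLE_HOME/opmn/bin
./opmnctl status
./opmnctl restartproc process-type=OC4J_EM
./opmnctl status          # OC4J_EM should report Alive again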

Software load balancing could be used instead of hardware load balancing. This is a simpler solution that uses software mechanisms, such as network domain name resolution, to route requests to the three management servers. The hardware solution is more expensive, but it is recommended since it is more powerful. A hardware load balancer providing both load balancing and failover capabilities should form an integral part of the total architecture, making the solution much more robust and flexible.

If further redundancy is required at the Big IP load balancer level, a standby load balancer can be deployed. The standby shadows the production load balancer in all configuration changes and takes over seamlessly if anything goes wrong with the production unit. This is an added precaution. When a number of production database teams use the centralized Grid Control to manage and monitor production systems, this kind of high-availability architecture, in which even the load balancer is deployed redundantly, becomes absolutely important.

To manage the Big IP load balancers, internal IPs must be assigned to both the primary and the standby load balancers, along with a floating IP address that points to whichever balancer is currently active. You then manage the load balancer via the floating IP, using the URL listed in the table below. This is the Big IP management utility, or web console. Log in to this console using the Admin password or the Support password. (New users can be created in the Big IP web console with read-only rights if required.)

The Big IP root password is used for logging in at the Linux level using SSH. The balancer runs Linux, but with a reduced command set shell; this is the command line interface (CLI) of Big IP. Commands differ slightly from normal Linux: for example, in the CLI the command "bigtop" is used to monitor the load balancer.
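For example (GridBal001 is one of the unit hostnames from the table below):

# log in to a load balancer unit at the Linux level (reduced shell)
ssh root@GridBal001

# once logged in, monitor throughput and connections per pool member
bigtop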

The internal IPs and Floating IP are illustrated in the following table (each IP address is shown as nnn.nnn.nnn.nn but is implicitly unique):

Hostname   | IP Address     | Description         | Big IP Management URL
-----------+----------------+---------------------+----------------------
GridBal001 | nnn.nnn.nnn.nn | Unit 1 IP Address   | https://Unit1IP
GridBal002 | nnn.nnn.nnn.nn | Unit 2 IP Address   | https://Unit2IP
GridBal003 | nnn.nnn.nnn.nn | Floating IP Address | https://FloatingIP

Of the two load balancer units, GridBal001 and GridBal002, either one could be active (actually handling the load balancing) at any given time. Typically the two units have three addresses associated with them: the Unit 1 IP, the Unit 2 IP, and the Floating IP. The Floating IP is a shared IP address and will only "exist" on the unit that is active at that time. So, if you would like to manage only the active device, connect to https://FloatingIP.

However, if you would like to manage the units directly, you could do so by accessing them via https://Unit1IP or https://Unit2IP.

The other servers in the Grid Control configuration are illustrated by the following table:


Hostname   | IP Address     | Description
-----------+----------------+---------------------------------------------------
GridMgt001 | nnn.nnn.nnn.nn | Management Server One (OMS 1)
GridMgt002 | nnn.nnn.nnn.nn | Management Server Two (OMS 2)
GridMgt003 | nnn.nnn.nnn.nn | Management Server Three (OMS 3)
GridMgt100 | nnn.nnn.nnn.nn | Virtual Management Server (Virtual OMS)
GridDb001  | nnn.nnn.nnn.nn | Database Server One (DBS 1) (Primary or RAC node)
GridDb002  | nnn.nnn.nnn.nn | Database Server Two (DBS 2) (Standby or RAC node)

For the purposes of load balancing, Big IP uses the concepts of virtual servers, pools, associated nodes (members), and rules. A virtual OMS server is set up at the Big IP level with its own IP address, and this in turn points to a pool of Oracle Management Servers with their own IP addresses. The outside world therefore has merely to point to the virtual OMS server's IP address or domain name, for both Grid Console logons and agent uploads from multiple targets. The pool of Oracle Management Servers is set up using IP address:port combinations, which means you can have one pool for Grid Console logons and another pool for agent uploads to the OMS.

Keeping this in mind, and after studying the recommendations on load balancing in the Enterprise Manager Advanced Configuration Guide, the following setup was performed using the Big IP management console:

Two new pools were created, EMAgentUploads and EMConsoles. Each pool contains the three OMS nodes (the three active ones; you could also add a node that is still being set up and keep it as "forced down" in Big IP so it won't be monitored). The difference between the pools is at the port level: the EMAgentUploads pool uses port 4889 for agent uploads, and the EMConsoles pool uses port 7777 for console access (7777 is the default port for Oracle Web Cache).

At the pool level, Big IP also allows you to define persistence (stickiness), that is, whether subsequent service requests should be routed to the same pool member or not. While Grid Console logons do not require stickiness (we do not care if the console uses a different OMS each time the DBA connects), it was decided that agent uploads could benefit from it. The pools were modified accordingly: "simple persistence" was set up for the agent uploads pool, but none for the console logons pool.

Two new Virtual OMS servers were created, the first using port 4889 for agent uploads using the EMAgentUploads pool, and the second using port 7777 for the Web Cache EM Console using the EMConsoles pool. Both virtual servers are using the same reserved IP address (but the ports are different).

Big IP monitors that continuously inspect the status of pool members can also be set up. One such monitor, EMMon, was set up with the send string "GET /em/upload" and the receive rule "Http XML File receiver", as per the Enterprise Manager Advanced Configuration Guide. However, this monitor worked for the 4889 ports but not for the 7777 ports. Therefore a new monitor, EMConsoleMonitor, was created based on http with the send string "GET /", and this was used to successfully monitor the 7777 ports.
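The same checks the monitors perform can be run by hand against an individual OMS node, which is handy when a monitor marks a member down and you want to see why. A sketch, assuming curl is available on the administration host:

# probe the agent upload servlet (what EMMon checks on port 4889)
curl -s -o /dev/null -w "%{http_code}\n" http://GridMgt001:4889/em/upload

# probe the Web Cache front end (what EMConsoleMonitor checks on port 7777)
curl -s -o /dev/null -w "%{http_code}\n" http://GridMgt001:7777/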

The URL http://GridMgt100:7777/em now worked successfully and load balanced Grid Console logons across all three management servers in the pool.

The URL http://GridMgt100:4889/em was tested and also successfully load balanced in a similar manner, but this URL is to be used for agent uploads only.
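A simple way to watch the balancing in action is to hit the virtual server repeatedly; since the EMConsoles pool has no persistence, successive requests can land on any of the three management servers. A sketch, again assuming curl on the administration host:

# several requests through the virtual OMS server on the console port
for i in 1 2 3 4 5 6; do
  curl -s -o /dev/null -w "%{http_code}\n" http://GridMgt100:7777/em
done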

Now, when the corporate network alias "gridcentral.in.mycompany.com" is switched to point to the virtual OMS server GridMgt100, the Big IP load balancer starts being used by production.

A point to note is that the initial changes, although shown as successful in the Big IP management console, were not effective at the URL level (the URLs did not work) until the Big IP was failed over to its standby and back again. Any configuration changes performed on the active load balancer should be propagated to the standby load balancer. This is done in the Big-IP configuration utility: go to Redundant Properties and click Synchronize Configuration. This makes the standby balancer's configuration identical to the active unit's, including all pools, virtual servers, and rules, so the standby is ready to take over the load balancing in the event of a failover. Another notable point is that when changing the admin password, because the admin user is configured as the configsync user, you must change the password to match on the peer controller in order for configsync to work.

It is also possible to fail over manually. Before any failover to the standby Big IP, it is recommended to mirror all connections; be aware, however, that this setting carries a CPU performance hit. It is selected under the properties of the virtual server (Mirror Connections).

Recall that a management server was installed on the Grid Control repository server during the initial install. Since the management server function has been separated from the repository function in this architecture, it is not recommended to use that extra management server; simply dedicate that server to the repository alone. For this reason, only the three stand-alone management servers were placed in the Big IP load balancer pools.

The extra management server is a Java process that runs on the repository server and takes up memory and processing power, so it is a good idea to use opmnctl on this server and shut down the management server (OC4J_EM). Alternatively, if Unix reboot scripts are being written to start the OMS, agent, and database on the servers after a reboot, simply leave out starting the OMS in the case of the repository server: just start the listener, the database, and then the agent. On the other management servers, start the OMS and the agent.
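A hypothetical reboot script along those lines is sketched below. The Oracle home paths, the hostnames, and the use of dbstart are placeholders to adapt to your own environment:

#!/bin/sh
# boot-time startup sketch: repository servers skip the OMS,
# management servers start the OMS and the agent
AGENT_HOME=/u01/app/oracle/agent10g
OMS_HOME=/u01/app/oracle/oms10g
DB_HOME=/u01/app/oracle/db10g

case `hostname -s` in
  GridDb001|GridDb002)                 # repository (database) servers
    su - oracle -c "$DB_HOME/bin/lsnrctl start"
    su - oracle -c "$DB_HOME/bin/dbstart"
    su - oracle -c "$AGENT_HOME/bin/emctl start agent"
    ;;
  GridMgt001|GridMgt002|GridMgt003)    # management servers
    su - oracle -c "$OMS_HOME/bin/emctl start oms"
    su - oracle -c "$AGENT_HOME/bin/emctl start agent"
    ;;
esac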

A diagram of this architecture is shown below:

[Architecture diagram: the Big IP load balancer pair (active and standby) in front of the pool of three Oracle Management Servers, all pointing to the repository database server and its standby.]
Conclusion


This kind of horizontal scaling architecture, using hardware load balancers and multiple management servers, is immensely powerful and fits in very well with Oracle's Grid vision. You can easily manage hundreds or even thousands of targets with such an architecture. The large corporation that deployed this project scaled easily up to managing 600 to 700 targets with a pool of just three management servers, and the future plan to manage 2,000 or more targets is quite achievable.

REFERENCES

Oracle Enterprise Manager Grid Control Architecture for Very Large Sites, by Porus Homi Havewala, Oracle Corporation
