Wednesday, 14 December 2011

Manually control Oracle Clusterware Stack (10G R2)


In this article I show how to use commands to control and manage Oracle Clusterware (10g R2).
The article draws on Oracle documentation and other Oracle resources (Oracle Metalink).
CRS is the primary program for managing High Availability operations of applications within the cluster. Applications that CRS manages are called resources. By default, CRS can manage RAC resources such as database instances, ASM instances, listeners, instance VIPs, services, ONS, and GSD. CRS is also able to manage other types of application processes and application VIPs.
CRS resources are managed according to their configuration parameters (the resource profile), which are stored in the OCR, and an action script, which can be stored anywhere you want.
CRS provides the following commands to support the life cycle of a resource:
CRS_PROFILE creates and edits a resource profile.
CRS_REGISTER adds the resource to the list of applications managed by CRS.
CRS_START starts the resource.
CRS_STAT reports the current status of a list of resources.
CRS_RELOCATE moves the resource to another node of the cluster.
CRS_UNREGISTER removes the resource from CRS.
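The life cycle above can be sketched as a short script. This is only an illustration: the resource name myapp, the action script path /opt/myapp/myapp.scr, the node name node2, the check interval of 30 seconds, and the DRY_RUN switch are all made-up placeholders. The script only prints the commands unless DRY_RUN=0 is set, so it can be read through on a machine without Clusterware.

```shell
#!/bin/sh
# Sketch of a CRS resource life cycle. All names below are placeholders.
RES=${RES:-myapp}
SCRIPT=${SCRIPT:-/opt/myapp/myapp.scr}
TARGET_NODE=${TARGET_NODE:-node2}

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

crs_lifecycle() {
    run crs_profile -create "$RES" -t application -a "$SCRIPT" -o ci=30
    run crs_register "$RES"
    run crs_start "$RES"
    run crs_stat "$RES"
    run crs_relocate "$RES" -c "$TARGET_NODE"
    run crs_stop "$RES"
    run crs_unregister "$RES"
}

# Usage: crs_lifecycle            (prints the commands)
#        DRY_RUN=0 crs_lifecycle  (runs them on a cluster node)
```

Printing first and executing second is a deliberate choice here: it lets you review the exact command lines before anything touches the OCR.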
RAC RESOURCES
The CRS_STAT -t command shows you all the resources currently under Oracle Clusterware control. The resources whose names start with the prefix ora. are the resources that implement RAC HA in a cluster environment.
The state of a resource can be ONLINE, OFFLINE, or UNKNOWN. UNKNOWN results from a failed start or stop action, and can be reset with the CRS_STOP -f command.
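To find resources stuck in the UNKNOWN state, the plain crs_stat output can be filtered. The following is a minimal sketch, assuming crs_stat prints one block of NAME=, TYPE=, TARGET=, and STATE= lines per resource; the function name list_unknown is made up for illustration.

```shell
#!/bin/sh
# Read crs_stat output on stdin and print the names of resources
# whose STATE is UNKNOWN. Assumes the NAME=/STATE= block layout.
list_unknown() {
    awk -F= '
        $1 == "NAME"  { name = $2 }
        $1 == "STATE" && $2 ~ /^UNKNOWN/ { print name }
    '
}

# Usage on a cluster node (not run here):
#   crs_stat | list_unknown
#   crs_stop -f <resource_name>    # resets the UNKNOWN state
```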
You can use the CRS_STAT -p resource_name command to show the OCR contents for the named resource. Here is a brief description of the most important attributes:
NAME is the name of the application resource.
TYPE is always APPLICATION for all CRS resources.
ACTION_SCRIPT is the name and location of the action script.
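ACTIVE_PLACEMENT defaults to 0. When set to 1, Oracle Clusterware reevaluates the placement of the resource when a cluster node is added or restarted.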
AUTO_START is a flag indicating whether Oracle Clusterware should automatically start a resource after a cluster restart. When set to 0, Oracle Clusterware starts the resource only if it had been running before the restart. When set to 1, Oracle Clusterware always starts the resource after a restart. When set to 2, Oracle Clusterware never starts the resource.
CHECK_INTERVAL is the time interval, in seconds, between repeated executions of the check command for the application.
FAILOVER_DELAY is the amount of time, in seconds, that Oracle Clusterware waits before trying to restart or fail over a resource.
HOSTING_MEMBERS defines an ordered list of nodes that can host the resource.
PLACEMENT defines the placement policy (BALANCED, FAVORED, or RESTRICTED) that specifies how Oracle Clusterware (OC) chooses the cluster node on which to start or restart the resource:
BALANCED: OC favors starting or restarting the application on the node that is currently running the fewest resources.
FAVORED: OC refers to the list of nodes in the HOSTING_MEMBERS attribute of the application profile. If none of the nodes in the hosting node list are available, then OC places the application on any available node.
RESTRICTED: similar to the FAVORED policy, except that if none of the nodes in the hosting list are available, then OC does not start or restart the application.
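The difference between the three policies is easier to see in a toy model. The sketch below is not how Oracle Clusterware is implemented; it only mirrors the rules described above. The function name choose_node, the "node:count" input format, and the choice of the least loaded node as the FAVORED fallback ("any available node") are my own assumptions.

```shell
#!/bin/sh
# Toy model of the three placement policies; not Oracle's implementation.
# $1 policy: BALANCED | FAVORED | RESTRICTED
# $2 hosting members, in order, e.g. "node1 node2"
# $3 available nodes with resource counts, e.g. "node2:5 node3:1"
choose_node() {
    policy=$1
    hosting=$2
    avail=$3

    # Available node currently running the fewest resources.
    least_loaded=$(printf '%s\n' $avail | sort -t: -k2,2n | head -n 1 | cut -d: -f1)

    if [ "$policy" = "BALANCED" ]; then
        echo "$least_loaded"
        return
    fi

    # FAVORED and RESTRICTED: first hosting member that is available.
    for h in $hosting; do
        for a in $avail; do
            if [ "${a%%:*}" = "$h" ]; then
                echo "$h"
                return
            fi
        done
    done

    # No hosting member is up: FAVORED falls back to another node
    # (assumption: the least loaded one), RESTRICTED prints nothing.
    if [ "$policy" = "FAVORED" ]; then
        echo "$least_loaded"
    fi
}

# Usage: choose_node FAVORED "node1 node2" "node2:5 node3:1"
```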
VOTING DISK
CSS is the service that determines which nodes in the cluster are available, and it provides cluster group membership and locking services. CSS determines node availability by sending heartbeat messages over a dedicated private network, with a voting disk used as a secondary communication mechanism. The voting disk is a shared raw disk partition or a file on a cluster file system that is available to all nodes in the cluster. It is used to communicate the node state information that determines which nodes go offline. Without a voting disk, a node cannot determine whether it is experiencing a network failure or whether the other nodes are no longer available.
CSS has two important parameters:
MISSCOUNT represents the maximum time, in seconds, that a network heartbeat across the interconnect can be missed before CSS enters a cluster reconfiguration to evict a node. The default value is 30 seconds.
DISKTIMEOUT represents the maximum time, in seconds, that a disk heartbeat can be missed before CSS enters a cluster reconfiguration to evict a node. The default value is 200 seconds.
MULTIPLEXING VOTING DISK
The voting disk is a vital resource for cluster availability. It is desirable to use multiple voting disks when using less reliable storage. OC needs at least three voting disks to avoid a single point of failure. Multiplexed voting disks should be located on physically independent storage devices. You can have up to 32 voting disks; use the following formula to determine how many you need: V = F*2 + 1, where V is the number of voting disks and F is the number of disk failures you want to survive.
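The formula is simple enough to check with shell arithmetic. In the sketch below, voting_disks_needed applies V = F*2 + 1 directly; failures_tolerated is the same formula solved for F and rounded down, which assumes that a strict majority of voting disks must stay accessible. Both function names are made up for illustration.

```shell
#!/bin/sh
# Number of voting disks needed to survive F disk failures: V = F*2 + 1.
voting_disks_needed() {
    echo $(( $1 * 2 + 1 ))
}

# Failures tolerated with V voting disks: F = (V - 1) / 2, rounded down.
failures_tolerated() {
    echo $(( ($1 - 1) / 2 ))
}

# Usage: voting_disks_needed 2    # disks required to survive two failures
#        failures_tolerated 3     # failures survivable with three disks
```

Note that under this reading an even number of voting disks buys nothing: four disks tolerate the same single failure as three.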
CHANGE VOTING DISK CONFIGURATION
To add a new voting disk:
    # crsctl add css votedisk <new voting disk path>
To remove a voting disk:
    # crsctl delete css votedisk <old voting disk path>
If your cluster is down, then you can use the -force option:
    # crsctl add css votedisk <new voting disk path> -force
    # crsctl delete css votedisk <old voting disk path> -force
Note that you cannot change your voting disk configuration online. To work around this limitation, perform the configuration change with the -force option while Clusterware is down on all nodes.
BACKUP and RECOVER YOUR VOTING DISKS
It is recommended to use symbolic links to specify your voting disk path because voting disk paths are stored in OCR, and it is not supported to edit the OCR file directly.
Take a new backup of one of your available voting disks any time a new node is added or an existing node is removed. Use the dd command (ocopy in a Windows environment). A backup taken with the dd command can be a hot backup. Use the following commands to complete this task:
List the voting disks currently used by CSS:
    $ crsctl query css votedisk
Back up a voting disk using the command:
    $ dd if=<voting disk path> of=<backup path> bs=4k
The block size for the dd command should be 4K.
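The two commands above can be combined so that every voting disk gets backed up in one pass. This is a minimal sketch, assuming the query output has lines such as " 0.     0    /dev/raw/raw1" with the path in the last field; the function names and the votedisk_<index>.bak file naming are made up for illustration.

```shell
#!/bin/sh
# Extract voting disk paths from "crsctl query css votedisk" output on stdin.
# Assumes lines such as " 0.     0    /dev/raw/raw1" where the last field
# is the path; the summary line ("located N votedisk(s).") is skipped.
votedisk_paths() {
    awk '$1 ~ /^[0-9]+\.$/ { print $NF }'
}

# Back up every voting disk into directory $1 (hypothetical layout:
# one file per disk, named votedisk_<index>.bak).
backup_votedisks() {
    dest=$1
    i=0
    crsctl query css votedisk | votedisk_paths | while read -r disk; do
        dd if="$disk" of="$dest/votedisk_$i.bak" bs=4k
        i=$((i + 1))
    done
}

# Usage on a cluster node (not run here):
#   backup_votedisks /backup/votedisks
```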
Recover voting disks by restoring the first one with the dd command, and then multiplex it if necessary. If no voting disk is available, reinstall Oracle Clusterware.
MANAGING OCR FILES and LOCATIONS
The ocrconfig tool is the main configuration tool for OCR. The ocrcheck tool verifies the integrity of both the OCR and its mirror. Use the ocrdump tool to write the OCR contents, or part of them, to a text or XML file. The main ocrconfig options are:
    -export generates a logical backup.
    -import restores OCR information from a file created with -export.
    -upgrade and -downgrade upgrade or downgrade the OCR.
    -showbackup lists the automatically generated backups in the default location. You can change the backup location using the -backuploc option.
    -replace ocr or -replace ocrmirror adds, removes, or replaces the OCR file or its mirror.
    -repair changes the OCR configuration parameters.
The default location of each automatically generated OCR backup file is the <CRS Home>/cdata/<cluster_name> directory.
OCR content is automatically backed up physically every four hours, and CRS keeps the last three copies. At the end of every day, CRS keeps the last two copies. At the end of every week, CRS keeps the last two copies.
BACK UP OCR MANUALLY
Copy your automatic OCR backups to a different storage device every day.
Take a logical backup of your OCR before and after making significant changes, using the command:
    # ocrconfig -export file_name
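Since exports are taken both before and after a change, it helps to give each file a name that says when and why it was taken. A small sketch; the function name ocr_export_file, the directory /backup/ocr, and the label are hypothetical.

```shell
#!/bin/sh
# Build a timestamped file name for a logical OCR export.
# $1 directory, $2 free-form label such as "before_add_node".
ocr_export_file() {
    echo "${1}/ocr_${2}_$(date +%Y%m%d_%H%M%S).exp"
}

# Usage as root on a cluster node (not run here):
#   ocrconfig -export "$(ocr_export_file /backup/ocr before_add_node)"
```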