Friday, March 19, 2010

RAC QUESTIONS AND ANSWERS

Q 1)How do we find the location of OCR and voting disks?

Ans: The location of OCR and Voting disks can be found as follows:

OCR location:

[oracle@dbcl1n1 AUCS1 ~]$ cat /etc/oracle/ocr.locO/P:
ocrconfig_loc=/export/ocw/oracle/ocr1
ocrmirrorconfig_loc=/export/ocw/oracle/ocr2
local_only=FALSE

Voting disk location:

[oracle@dbcl1n1 AUCS1 ~]$ cd $ORA_CRS_HOME/bin
[oracle@dbcl1n1 AUCS1 bin]$ ./crsctl query css votedisk
0. 0 /export/ocw/oracle/vote1
1. 0 /export/ocw/oracle/vote2
2. 0 /export/ocw/oracle/vote3


Q 3) What is cache fusion?
Ans :Cache fusion is the block shipping from one instance to another instance in a RAC environment.Due to Cache Fusion and the elimination of disk writes that occur when other instances request blocks for modifications, the performance overhead to manage shared data between instances is greatly diminished. Not only do Cache Fusion's concurrency controls greatly improve performance, but they also reduce the administrative effort for Real Application Clusters environments.

Cache Fusion addresses several types of concurrency as described under the following headings:

Concurrent Reads on Multiple Nodes
Concurrent Reads and Writes on Different Nodes
Concurrent Writes on Different Nodes

Q 4)Which process is responsible for cache fusion mechanism?
Ans:Global cache service(GCS)

A read request from an instance for a block that was modified by another instance and not yet written to disk can be a request for either the current version of the block or for a read-consistent version. In either case, the Global Cache Service Processes (LMSn) transfer the block from the holding instance's cache to the requesting instance's cache over the interconnect.

Q 5)If we perform a DML acivity in a 2 node RAC environment,if that node disconnect due to some reason,What will be the result?
Ans: The DML statement will execute successfully since Database is there on the shared device.

Note: Thanks Junad for noticing the above answer and helping me in clarifying that in actual:
We neeed to write the callout functions using FAST APPLICATION NOTIFICATION (FAN) Notifications FOR RAC.The callout can be written in OCI, for example JAVA for dml failover to work.

http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/hafeats.htm

If you are using TAF (Transparent Application failover) with RAC Than only: session(alter session is not) failover,select failover,pre-connect and Basic Failover are supported.

http://download.oracle.com/docs/cd/B28359_01/network.111/b28316/advcfg.htm#NETAG338
http://www.scribd.com/doc/19211546/Failover-for-Rac

In Oracle 11g rel2:
http://www.oracle.com/technetwork/database/app-failover-oracle-database-11g-173323.pdf

Q 6)How do we backup OCR and Voting disk?
Ans:VOTING DISK BACKUP:

To make a backup copy of the voting disk, use the Linux dd command. Perform this operation on every voting disk as needed where voting_disk_name is the name of the active voting disk and backup_file_name is the name of the file to which you want to back up the voting disk contents:

$dd if=voting_disk_name of=backup_file_name
If your voting disk is stored on a raw device, use the device name in place of voting_disk_name. For example:

dd if=/dev/sdd1 of=/tmp/voting.dmp

When you use the dd command for making backups of the voting disk, the backup can be performed while the Cluster Ready Services (CRS) process is active; you do not need to stop the crsd.bin process before taking a backup of the voting disk

OCR BACKUP:

Viewing Available OCR Backups
To find the most recent backup of the OCR, on any node in the cluster, use the following command:

$ocrconfig -showbackup

Backing Up the OCR
Because of the importance of OCR information, Oracle recommends that you use the ocrconfig tool to make copies of the automatically created backup files at least once a day.

In addition to using the automatically created OCR backup files, you should also export the OCR contents to a file before and after making significant configuration changes, such as adding or deleting nodes from your environment, modifying Oracle Clusterware resources, or creating a database. Exporting the OCR contents to a file lets you restore the OCR if your configuration changes cause errors. For example, if you have unresolvable configuration problems, or if you are unable to restart your cluster database after such changes, then you can restore your configuration by importing the saved OCR content from the valid configuration.

To export the contents of the OCR to a file, use the following command, where backup_file_name is the name of the OCR backup file you want to create:

$ocrconfig -exportbackup_file_name

Q 7)What is global cache service,global enque service and global resource directory?
Ans: Global Cache Service
GCS is the main controlling process that implements Cache Fusion. GCS tracks the location and the status (mode and role) of the data blocks, as well as the access privileges of various instances. GCS is the mechanism, which guarantees the data integrity by employing global access levels. GCS maintains the block modes for data blocks in the global role. It is also responsible for block transfers between the instances. Upon a request from an Instance GCS organizes the block shipping and appropriate lock mode conversions. The Global Cache Service is implemented by various background processes, such as the Global Cache Service Processes (LMSn) and Global Enqueue Service Daemon (LMD).

Global Enqueue Service

The Global Enqueue Service (GES) manages or tracks the status of all the Oracle enqueuing mechanisms. This involves all non Cache-fusion intra-instance operations. GES performs concurrency control on dictionary cache locks, library cache locks and the transactions. GES does this operation for resources that are accessed by more than one instance.

GES/GCS Areas

GES and GCS have the memory structures associated with global resources. It is distributed across all instances in a cluster. This area is located in the variable or shared pool section of the SGA. The list below shows the additions.

Global Resource Directory Global Resource Directory (GRD) is the internal database that records and stores the current status of the data blocks. Whenever a block is transferred out of a local cache to another instance’s cache, the GRD is updated. The following resources information is available in GRD.

* Data Block Identifiers (DBA)
* Location of most current version
* Modes of the data blocks ( (N)Null, (S)Shared, (X)Exclusive )
*The Roles of the data blocks (local or global) held by each instance
*Buffer caches on multiple nodes in the cluster

7 comments:

  1. Hi Rafi Salam,

    This is Alam from pune,got ur blogs.
    Thaks man

    ReplyDelete
  2. Walkslm Shadab,
    Ok cool....Its my pleasure...


    Best regards,

    Rafi.

    ReplyDelete
  3. It's really been a fascinating experience reading your blog & You 've done a great job for budding Apps DBA's like me.

    I wanna draw your attention to below ans that u provided

    Q 5)If we perform a DML acivity in a 2 node RAC environment,if that node disconnect due to some reason,What will be the result?

    Ans: The DML statement will execute successfully since Database is there on the shared device.

    After reading the same about DML statement failover at the link below
    http://www.dba-oracle.com/art_oramag_rac_taf.htm

    Kindly explain what should i assume that The DML statement will execute successfully in case of node failure ?

    Thanks!!

    ReplyDelete
    Replies
    1. DML statement will be rollback and SELECT will be failed over to the surviving node.
      Regards,
      Sharif,Abudhabi

      Delete
    2. In rac dml statements will not execute whenever node disconnect or evicted from cluster.at that time dml statements get rollback.there is no dml failover in rac .

      Delete
  4. This Nice stuff you have bloged, please keep sharing it.

    Thanks,
    nikhil

    ReplyDelete
  5. Hi Sharif,
    We don't have direct DML failover method from Oracle you are correct.

    Select statement uses the benefits of oracle net service and keep the track of cursor that how much rows has been returned to client associated with cursor. So when a select statement fails (due to network) oracle net service will connect to other node in rac and attach to previous cursor. That is called repositioning the cursor to client. Then oracle can resume (it know how many rows has been returned) select statement and fetching rows.

    There is a way for this to work, however it must be a part of your application code. This is over-simplified but you should be able to follow the logic.

    open connection
    start AUTONOMOUS transaction
    do stuff
    receive disconnect error
    log this error in your app log
    test connection
    if dead AND cnt < {n} #test 5 times before completely failing
    cnt=cnt+1
    close current connection
    go to open connection and start again
    commit
    if return code is failure on connection here, again, go to open connection as this transaction is no longer active.

    Thiis works due to the fact that during instance failure in the middle of this transaction, the first attempt (that failed) will be rolled back during instance recovery which is performed by a surviving node.

    If the entire cluster is down, then subsequent attempts will also fail and the app should fail completely at that point. otherwise, it will connect to one of the other nodes and simply continue on.

    ReplyDelete