Many racgmain(check) processes at HP-UX 11iv3

August 17th, 2009 Sve

I was called in because some commands for controlling the cluster and Oracle were not working. This was a two-node cluster with Oracle 10.2.0.4 RAC on HP-UX 11.31 Data Center OE (December 2008) that had already been running for a month.

Arriving at the customer site, I noticed a lot (around 500) of hanging racgmain(check) processes, which were apparently blocking some of the cluster commands. Errors could also be seen in this log: $CRS_HOME/log/$HOSTNAME/crsd/crsd.log:

2009-04-08 15:22:01.700: [  CRSEVT][90801] CAAMonitorHandler :: 0:Action Script /oracle/ora10g/bin/racgwrap(check) timed
out for ora.ORCL.ORCL1.inst! (timeout=600)
2009-04-08 15:22:01.700: [  CRSAPP][90801] CheckResource error for ora.ORCL.ORCL1.inst error code = -2
2009-04-08 15:25:42.180: [  CRSEVT][90811] CAAMonitorHandler :: 0:Could not join /oracle/ora10g/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child
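A quick way to confirm the symptom is to count the check processes and the check timeouts in crsd.log. This is only a sketch: the helper names are mine, and the match patterns are assumptions based on the process name and log text shown above, so adjust them to what ps -ef actually prints on your system.

```shell
# Count racgmain "check" processes in ps -ef style output read from stdin.
# Assumption: the process line contains "racgmain" followed by "check".
# The [r] bracket keeps grep from matching its own command line.
count_racgmain_check() {
    grep -c '[r]acgmain.*check' || true
}

# Count check-timeout messages in the crsd.log file given as $1.
# Matches the literal text seen in the excerpt above.
count_check_timeouts() {
    grep -F -c 'racgwrap(check) timed' "$1" || true
}

# Usage (not run here):
#   ps -ef | count_racgmain_check
#   count_check_timeouts "$CRS_HOME/log/$HOSTNAME/crsd/crsd.log"
```

A count in the hundreds from the first helper would match what we saw; a handful of processes is probably normal.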

There are a lot of related bugs on Metalink, but no documents or suggestions on how to fix this.

Fortunately we found a solution:

1. Stop CRS on all nodes.

2. Make a backup copy of racgwrap, located under $ORACLE_HOME/bin and $CRS_HOME/bin, on all nodes.

3. Edit the file racgwrap and modify the last 3 lines from:

$ORACLE_HOME/bin/racgmain "$@"
status=$?
exit $status

to:

exec $ORACLE_HOME/bin/racgmain "$@"

4. Restart CRS and make sure that all the resources start.
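Steps 2 and 3 can be scripted so the same change is applied identically in both homes on every node. The sketch below is not what we ran at the time (we edited the files by hand); patch_racgwrap and the .orig/.new suffixes are hypothetical names, and it assumes the last three lines of racgwrap look exactly like the ones shown above, with no leading whitespace. Run it only while CRS is stopped.

```shell
# Back up a racgwrap script and replace its last three lines
# with a single "exec" line. $1 is the path to racgwrap.
patch_racgwrap() {
    file=$1

    # Already patched? Leave the file untouched.
    if grep -q '^exec \$ORACLE_HOME/bin/racgmain "\$@"' "$file"; then
        return 0
    fi

    # Step 2: keep a copy of the original next to it.
    cp -p "$file" "$file.orig" || return 1

    # Step 3: prefix the racgmain call with exec and drop the two
    # status lines that follow it; every other line is copied as-is.
    awk '
        $0 == "$ORACLE_HOME/bin/racgmain \"$@\"" { print "exec " $0; seen = 1; next }
        seen && ($0 == "status=$?" || $0 == "exit $status") { next }
        { print }
    ' "$file.orig" > "$file.new" || return 1

    # Overwrite in place so owner and permissions stay the same.
    cat "$file.new" > "$file" && rm -f "$file.new"
}

# Usage (not run here), on each node:
#   patch_racgwrap "$ORACLE_HOME/bin/racgwrap"
#   patch_racgwrap "$CRS_HOME/bin/racgwrap"
```

Compare the result against the .orig copy before restarting CRS; if your racgwrap differs from the lines above, the function changes nothing except creating the backup.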

We were lucky to hit the bug just before the migration, when restarting the instances/servers was easy enough. I don’t know if this really solves the problem, but we never hit the bug again.
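What the change does at the process level is easy to show, even if I can't say why it avoids the hang. Without exec, the wrapper shell stays alive as a separate process and waits for the command it started; with exec, the shell is replaced by that command and keeps the same PID. The small demo below uses plain sh instead of racgmain, and the function names are made up for illustration.

```shell
# Each function prints two PIDs: the wrapper shell's, then the PID
# of the command the wrapper launches (a second sh standing in for racgmain).

# Original racgwrap style: run the child, collect its status, exit.
without_exec() {
    sh -c 'echo $$; sh -c "echo \$\$"; status=$?; exit $status'
}

# Patched style: the wrapper shell is replaced by the child.
with_exec() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}

# Usage (not run here):
#   without_exec   # two different PIDs
#   with_exec      # the same PID twice
```

So after the change there is one process per check instead of two, and no intermediate shell that has to wait for its child.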

Categories: hp-ux, oracle
  1. October 26th, 2009 at 16:23 | #1

    Hi,
    Thanks for the article. I always like reading your posts.
    Have a nice day
    GlenStef

  2. October 30th, 2009 at 13:58 | #2

    Hi there,
    It is a shame!
    Worker

  3. November 2nd, 2009 at 21:14 | #3

    Amazing! It’s not clear to me how often you update your sve.to.
    Thanks
    Boldy

  4. sve
    November 3rd, 2009 at 14:07 | #4

    Hi there,

    Thanks for the interest. I’m checking the blog every day and I’m trying to post a few articles every month.

    Regards,
    Sve

  5. A.Wahab
    November 18th, 2009 at 13:09 | #5

    Hello,
    I am getting the same errors in crsd.log. I checked the racgwrap scripts in the Oracle and CRS homes. The last line is already
    exec $ORACLE_HOME/bin/racgmain "$@"
    !!
    Any ideas?

  6. sve
    November 18th, 2009 at 16:20 | #6

    Hi A.Wahab, thanks for asking.
    Just to ask you:
    - What is the output of ps -ef? Do you see a lot of these processes?
    - Have you changed the script and then restarted CRS?

    Regards,
    sve

  7. A.Wahab
    November 19th, 2009 at 07:55 | #7

    Hi sve,
    I saw only two racgmain processes in the ps -ef output.
    I did not change anything, as it was already what you mentioned:
    exec $ORACLE_HOME/bin/racgmain "$@"

  8. sve
    November 19th, 2009 at 23:01 | #8

    Hi Wahab,
    Well, our problem was that some of the commands for controlling the cluster were not working because this script was hanging. We had around 500 hanging processes and we were seeing these errors in crsd.log. After fixing the script we never hit the bug again.

    Are you having any problems, or do you just see these errors in the log?

    Regards,
    sve

  9. June 26th, 2010 at 05:52 | #9

    Such a specific fix for such a specific problem. How did you come up with that idea?
    I tried it in my cluster and it didn’t work.

  10. Sve
    June 28th, 2010 at 01:13 | #10

    Hi, thanks for reading. Well, I didn’t; actually Oracle Support came up with the idea. Did you restart the whole CRS so it could load the change?

    At the time we hit the bug there was only an internal bug, but searching Oracle Support now I’m able to find this note:

    Many Orphaned Or Hanging “racgmain” processes Running [ID 732086.1]

    Another solution would be to apply the latest patch set (10.2.0.4) and then apply one of the CRS bundle patches from bundle #2 onwards. Either way, if you go with this solution, make a backup and be very careful.

    Regards,
    Sve
