Exhaustion of Windows 2008 heap memory with Oracle Database 11.2.0.2

Recently I had an interesting setup for one of our customers. Because they had Oracle Standard Edition and Windows 2008 Server R2 Standard Edition licenses, I was asked to create an HA database installation. After looking around I found a few docs about installing Standard Edition with Clusterware and I had some ideas. In the end I installed Grid Infrastructure and the Oracle Database binaries on both servers, then created a single-instance database on the second server and replicated the configuration to the first one. Currently the relocation of the database is done manually, but one could create start/stop/monitor scripts and integrate them with GI. Once the database starts it registers with the SCAN listener, so in theory it’s running in HA (just the relocation is manual) :)

So during the weekend I received mail from my colleagues about error messages they had received from the database: connect error, Socket read timed out. It wasn’t urgent as the database is not yet in production, but go-live is ahead, so this was the first task for Monday. The next day I looked around and everything was up and running, except that I wasn’t able to log in through the listener and I also wasn’t able to stop or relocate it. Looking at the logs I found the following message at some point: TNS-12531: TNS:cannot allocate memory, which explains the earlier errors.

That was weird: the server on which the error appeared was the first one, and it had only GI and the SCAN listener running. This really looked like a memory leak; it’s Windows, so maybe that was obvious. I decided to look at the processes using Resource Monitor, where I found a lot of cmd.exe processes. To confirm the problem I used Process Explorer, which is a very nice tool for Windows. It showed plenty of cmd processes that had been spawned but (obviously) not closed after completion.

It turned out that this is a bug in 11.2.0.2 on Windows (64-bit). The Oracle CVU resource (ora.cvu), which by default is started on the first node in the cluster (this makes sense now), runs its checks every six hours (CHECK_INTERVAL=21600) and leaves processes open. Because of this the heap memory is exhausted, and that’s the reason why the SCAN listener is failing with the error message TNS-12531: TNS:cannot allocate memory.
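If you want to watch for the same symptom without opening Process Explorer, something like the following Python sketch could count the cmd.exe processes on the node. It assumes the standard Windows `tasklist /FO CSV /NH` output, where the first CSV column is the image name; the function names and the sample data are made up for illustration and are not from the original setup.

```python
import csv
import io
import subprocess
from collections import Counter


def count_images(tasklist_csv: str) -> Counter:
    """Count processes per image name from `tasklist /FO CSV /NH` output."""
    counts = Counter()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row:  # csv.reader yields [] for blank lines; skip them
            counts[row[0].lower()] += 1
    return counts


def current_cmd_count() -> int:
    """Run tasklist (Windows only) and return the number of cmd.exe processes."""
    out = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_images(out)["cmd.exe"]


# Hypothetical sample output, for illustration only.
SAMPLE = '''"cmd.exe","4100","Services","0","2,560 K"
"cmd.exe","7340","Services","0","2,548 K"
"lsnrctl.exe","7852","Services","0","9,112 K"
'''

if __name__ == "__main__":
    print(count_images(SAMPLE)["cmd.exe"])
```

Running `current_cmd_count()` a few times across a CVU check interval should show whether the number keeps growing.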

 

The following errors could be seen in the Windows Event Log; once the patch was applied, the errors disappeared:
Faulting application lsnrctl.exe, version 11.2.0.2, time stamp 0x4cea8f55, faulting module kernel32.dll, version 6.0.6001.18538, time stamp 0x4cb73957, exception code 0xc0000142, fault offset 0x00000000000b1b48, process id 0x1eac, application start time 0x01cc6ab588f992c0.

Faulting application cmd.exe, version 6.0.6001.18000, time stamp 0x47918bde, faulting module kernel32.dll, version 6.0.6001.18538, time stamp 0x4cb733e1, exception code 0xc0000142, fault offset 0x0006f1e7, process id 0x1004, application start time 0x01cc6af0fa982500.

Faulting application sclsspawn.exe, version 0.0.0.0, time stamp 0x4ce622a7, faulting module kernel32.dll, version 6.0.6001.18538, time stamp 0x4cb73957, exception code 0xc0000142, fault offset 0x00000000000b1b48, process id 0x1ca0, application start time 0x01cc6c0e5efd5380.
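All three entries share the same exception code, 0xc0000142, which is the NTSTATUS value STATUS_DLL_INIT_FAILED: the process could not initialize at startup. A small Python sketch like the one below could pull the application name and exception code out of such messages when going through an exported log. The regular expression is only an assumption based on the three messages above, and the helper name is hypothetical.

```python
import re

# Pattern derived from the three Event Log messages above (an assumption,
# not an official message format).
FAULT_RE = re.compile(
    r"Faulting application (?P<app>\S+), version (?P<version>[\d.]+),.*?"
    r"exception code (?P<code>0x[0-9a-fA-F]+)"
)

# 0xC0000142 is the NTSTATUS value STATUS_DLL_INIT_FAILED.
KNOWN_CODES = {0xC0000142: "STATUS_DLL_INIT_FAILED"}


def parse_fault(message: str):
    """Return (application, exception code, symbolic name) or None."""
    match = FAULT_RE.search(message)
    if match is None:
        return None
    code = int(match.group("code"), 16)
    return match.group("app"), code, KNOWN_CODES.get(code, "unknown")


if __name__ == "__main__":
    msg = (
        "Faulting application lsnrctl.exe, version 11.2.0.2, time stamp "
        "0x4cea8f55, faulting module kernel32.dll, version 6.0.6001.18538, "
        "time stamp 0x4cb73957, exception code 0xc0000142, fault offset "
        "0x00000000000b1b48, process id 0x1eac, application start time "
        "0x01cc6ab588f992c0."
    )
    print(parse_fault(msg))
```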

This is the bug at MOS:
Bug 12529945: CVU HEALTH CHECKS EXHAUST WINDOWS HEAP MEMORY

The bug should have been fixed in BP8, but I applied the latest one, BP10:
Patch 12849789: ORACLE 11G 11.2.0.2 PATCH 10 BUG FOR WINDOWS (64-BIT AMD64 AND INTEL EM64)

 

Regards,
Sve

  1. September 29th, 2011 at 16:38 | #1

    Hi Svetoslav,

    nice idea for HA solution – very cheap :)

    hm… very nasty bug for 11.2.0.2 – have to remember that.
    I would never expect such a dangerous bug in 11gR2.

    Thanks for sharing.
    Regards,
    Marko

  2. Svetoslav Gyurov
    September 29th, 2011 at 16:45 | #2

    Hi Marko,

    Thanks for reading! Well, it was not the best solution; I would prefer SE+RAC or Windows Enterprise, because of the cluster software.

    Anyway, the bug should have been fixed, but yesterday I got the same errors. Probably it’s another bug, or this one wasn’t fixed at all.

    Regards,
    Sve

  3. JC
    December 25th, 2011 at 13:40 | #3

    Hi Sve,

    We have a similar setup of a single instance on Grid Infrastructure, and I believe we are hitting the same bug.

    First, I tried to apply patch 8 to the grid home and got an application error, which stopped us from applying the Oracle patch to the grid home, yet I had no problem applying the same patch to the db home.

    Anyway, did you have the same problem applying patch 10 to the grid home? And after 3 months, do you have a fix yet?

    JC

  4. Svetoslav Gyurov
    January 2nd, 2012 at 12:58 | #4

    Hi JC, Happy New Year!

    When I found this to be a bug, it was already fixed in BP8. By the time we decided to patch, BP10 had been released, and as the BPs are cumulative, we decided to apply BP10. I also got some errors when I tried to apply the BP to the grid home, which I described here. After applying the patch the errors disappeared.

    If you haven’t applied the patch yet and you find yourself in an urgent situation, you could kill all the cmd processes using Process Explorer. This is not a solution, just a temporary workaround.

    Regards,
    Sve
