Archive for the ‘oracle’ Category

Shared disk support for VirtualBox

August 9th, 2010 Sve 1 comment

I’m very happy to announce that VirtualBox now supports shared disks. Finally we can attach one disk to several virtual machines and run Oracle RAC and other clusters. As Oracle promised, this feature is released with the next maintenance patch (thanks!).

There is a new image write mode called shareable, and this option is now available for the createhd and modifyhd commands of VBoxManage. To create a new shared image, use VBoxManage createhd with type shareable; creating a shared disk from the GUI is not possible. To mark an existing image as shared, use VBoxManage modifyhd with type shareable.
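A rough sketch of those two commands follows. The option names are the 3.2-era VBoxManage syntax as far as I know it, so treat them as an assumption and check the usage output of your own version. The function names, the image path and the size are only illustrative. By default nothing is executed; the commands are just printed so they can be reviewed first.

```shell
#!/bin/sh
# Dry run by default: the commands are printed, not executed.
# Set VBOX=VBoxManage in the environment to really run them.
VBOX=${VBOX:-echo VBoxManage}

# Create a new fixed-size image ($1 = path, $2 = size in MB) in shareable mode.
# Option names are assumed from the 3.2-era syntax; verify them on your version.
create_shared_disk() {
    $VBOX createhd --filename "$1" --size "$2" --variant Fixed --type shareable
}

# Switch an existing fixed-size image ($1 = path) to shareable mode.
mark_disk_shareable() {
    $VBOX modifyhd "$1" --type shareable
}

# Illustrative values only.
create_shared_disk /home/vm/ora11g_shared.vdi 2048
mark_disk_shareable /home/vm/ora11g_shared.vdi
```

Only one of the two calls is needed for a given image: the first for a new disk, the second for an existing fixed-size disk.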

Note that only fixed-size disks are supported. If the disk is dynamic, you will get the following error when you try to modify the image:
ERROR: Cannot change type for medium ‘/home/vm/ora11g_shared.vdi’ to ‘Shareable’ since it is a dynamic medium storage unit

There is another minor issue: if the image is already attached to two virtual machines, the modifyhd command will also fail:
ERROR: Cannot change the type of medium ‘/home/vm/ora11g_shared.vdi’ because it is attached to 2 virtual machines

And finally, YES it works, I have tested it already!

sve@host:~$ VBoxManage showhdinfo /home/vm/ora11g_shared.vdi
Oracle VM VirtualBox Command Line Management Interface Version 3.2.8
(C) 2005-2010 Oracle Corporation
All rights reserved.

UUID:                     7521f059-1196-4d68-a1a6-cf0082fb446a
Accessible:               yes
Description:          
Logical size:             2048 MBytes
Current size on disk:     2048 MBytes
Type:                     shareable
Storage format:           VDI
In use by VMs:            labs1 (UUID: 25475ff4-70bc-4e2e-aa38-d8fae289273e)
                          labs2 (UUID: e4441f4c-1ef9-42e0-8e54-d2aec2c6cf4f)
Location:                 /home/vm/ora11g_shared.vdi

Regards and happy migration ;)
Sve

Categories: oracle, virtualization

Patch Set 10.2.0.5 for Oracle Database Server re-released on Linux x86

August 9th, 2010 Sve No comments

A week ago Oracle re-released patch set 10.2.0.5 for Oracle Database on Linux x86 (32-bit). It seems that some additional bug fixes were added to the patch set, but I was unable to find out exactly which ones. The patch set is available for download from My Oracle Support under the same number, 8202632. There is also an alert, MOS ID 1156958.1, regarding the re-release of the patch set.

Regards,
Sve

Categories: linux, oracle

Many open files on HP-UX after RAC upgrade to 10.2.0.4 – racgimon file handle leak

July 23rd, 2010 Sve No comments

Two months after patching a customer database to 10.2.0.4, I received a call telling me that the database was hanging. Usually this happens when they have missed the backup of the archive logs and the database stops. This time there was enough space available, so that was not the problem. I logged in to the first node and started looking around. Weird things were happening: some commands were failing and others were hanging. Then I realized that this was not an ordinary case and started looking deeper. It turned out to be an Oracle bug on HP-UX, and there is both a patch and a workaround.

The customer was running HP-UX 11.23 (September 2006) with patch bundles from September 2008. The database was Oracle RAC Enterprise Edition 10.2.0.2.

This problem had a very big impact: although the database was running in RAC, it was not accessible and there were a lot of locks. Rebooting the node or killing the processes does the job.

After some reading I figured out that this happens only on HP-UX, after patching the database to 10.2.0.4, and only on the first node.

Here are some symptoms:


Executing sar -v shows the current and maximum size of the system file table (the file-sz column, which is the last used/maximum pair on each line):

12:00:00   N/A   N/A 328/4200  0  1374/286108 0  41906/65536 0
12:02:00   N/A   N/A 330/4200  0  1376/286108 0  41944/65536 0
12:04:00   N/A   N/A 336/4200  0  1390/286108 0  41999/65536 0
12:06:00   N/A   N/A 331/4200  0  1377/286108 0  41983/65536 0
12:08:00   N/A   N/A 330/4200  0  1376/286108 0  41976/65536 0
12:10:00   N/A   N/A 330/4200  0  1377/286108 0  41935/65536 0


With lsof the following open files are seen:

racgimon   3506 oracle   14u   REG             64,0x9        1552   29678 /oracle/ora10g/dbs/hc_baandb1.dat
racgimon   3506 oracle   28u   REG             64,0x9        1552   29678 /oracle/ora10g/dbs/hc_baandb1.dat
racgimon   3506 oracle   30u   REG             64,0x9        1552   29678 /oracle/ora10g/dbs/hc_baandb1.dat
racgimon   3506 oracle   37u   REG             64,0x9        1552   29678 /oracle/ora10g/dbs/hc_baandb1.dat


The process holding the open files:

 oracle  3506     1  0  Nov  5  ?        18:16 /oracle/ora10g/bin/racgimon startd baandb


In the log $ORACLE_HOME/log/<NodeName>/racg/imon_<InstanceName>.log the following error appears every minute:

2009-12-02 12:12:35.454: [    RACG][73] [3506][73][ora.baandb.baandb1.inst]: GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:mmap failed with status: 12
GIM-00091: OS failure message: Not enough space
GIM-00092: OS failure occurred at: sskgmsmr_13
2009-12-02 12:13:35.474: [    RACG][73] [3506][73][ora.baandb.baandb1.inst]: GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:mmap failed with status: 12
GIM-00091: OS failure message: Not enough space
GIM-00092: OS failure occurred at: sskgmsmr_13


When the file table gets full, weird things start to happen. The syslog shows the following:

Nov  5 08:00:02 db1 vmunix: file: table is full
Nov  5 08:00:03 db1 vmunix: file: table...
Nov  5 08:00:03 db1 vmunix: file...
Nov  5 08:00:03 db1 vmunix: file...
Nov  5 08:01:13 db1 vmunix: file: table is full
Nov  5 08:11:15 db1  above message repeats 34260 times


The alert log also shows the following:

ORA-00603: ORACLE server session terminated by fatal error
ORA-27544: Failed to map memory region for export
ORA-27300: OS system dependent operation:socket failed with status: 23
ORA-27301: OS failure message: File table overflow
ORA-27302: failure occurred at: sskgxpcre1


Solution:
The base bug is 6931689 (SS10204-HP-PARISC64-080216.080324 HEALTH CHECK FAILED TO CONNECT TO INSTANCE), but it is not public. It is fixed in CRS 10.2.0.4 Bundle Patch #2; the current CRS bundle is PSU2, patch# 8705958 (TRACKING BUG FOR 10.2.0.4.2 PSU FOR CRS), which is around 41 MB.
Patch# 8705958 should be applied to all Oracle homes: although the bug is in the database, CRS should always be at the higher version.

To apply this patch, the OPatch version must be at least 10.2.0.4.7, which can be downloaded as patch# 6880880. At the time of writing the latest version was 10.2.0.4.9, and it is 34 MB. To install it, simply download it and unzip it under ORACLE_HOME.

I didn’t go with the patch because I read some scary stuff on OTN; instead, thanks to Ivan Kartik, I put a dirty workaround in place. He proposed a very good script that checks whether there are more than 20000 open files and, if so, kills the racgimon process. The sar -v output shows the file table usage dropping once racgimon is killed:

13:56:00   N/A   N/A 307/4200  0  1352/286108 0  44102/65536 0
13:58:00   N/A   N/A 307/4200  0  1353/286108 0  44119/65536 0
14:00:01   N/A   N/A 309/4200  0  1355/286108 0  44135/65536 0
14:02:01   N/A   N/A 307/4200  0  1353/286108 0  44153/65536 0
14:04:01   N/A   N/A 301/4200  0  1336/286108 0  2583/65536 0
14:06:01   N/A   N/A 306/4200  0  1347/286108 0  2610/65536 0
14:08:01   N/A   N/A 299/4200  0  1333/286108 0  2583/65536 0
14:10:01   N/A   N/A 300/4200  0  1335/286108 0  2571/65536 0

The workaround fixed the problem. This article was written half a year ago; according to MOS, the bug is now fixed in 10.2.0.5, which was released at the beginning of June.
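The original script is not reproduced here, so what follows is only a minimal sketch of the check it is described as doing, not Ivan Kartik's code. It assumes the sar -v line layout shown above, where the file table usage is the eighth whitespace-separated field. The function names are made up, and the kill itself is deliberately left out.

```shell
#!/bin/sh
# Threshold from the description above (more than 20000 open files).
LIMIT=20000

# Print the used entries of the system file table from one "sar -v" data line.
# Assumes the layout shown above, where file-sz (used/max) is field 8.
file_table_used() {
    echo "$1" | awk '{ split($8, sz, "/"); print sz[1] }'
}

# Succeeds when the used count on the given line exceeds LIMIT.
over_limit() {
    [ "$(file_table_used "$1")" -gt "$LIMIT" ]
}

# Example with two lines taken from the output above.
over_limit "14:02:01   N/A   N/A 307/4200  0  1353/286108 0  44153/65536 0" && echo "kill racgimon"
over_limit "14:04:01   N/A   N/A 301/4200  0  1336/286108 0  2583/65536 0" || echo "leave it alone"
```

In a cron job the input line would come from a live sar -v sample instead of a literal string, and the echo would be replaced by whatever finds and kills the racgimon process on the node.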

Regards,
Sve

Categories: hp-ux, oracle

Oracle will bring back VirtualBox shared disk capability

July 1st, 2010 Sve No comments

During the question section of the last webinar, Introducing Oracle VM VirtualBox 3.2, Oracle said that they had received complaints from a lot of customers using VirtualBox regarding the installation of Oracle RAC. RAC requires a shared disk that the nodes (VMs) of the cluster can access simultaneously, but this cannot be achieved directly. There is a workaround using iSCSI, but that is not the point.

Achim Hasenmueller from the VirtualBox engineering team said that they plan to deliver this capability very soon, with the next maintenance release, rather than wait for the next major update. I was surprised to hear that they used to have this feature working but lost it during one of the major changes to the storage stack. I was not able to find this in the changelogs, but by accident I found the announcement of this limitation in a Debian bug report log:

From: "VirtualBox" <trac@virtualbox.org>
Cc: vbox-trac@virtualbox.org
Subject: Re: [VirtualBox] #1188: Please support to share a disk image
 between two guests
Date: Wed, 08 Apr 2009 15:24:49 -0000

#1188: Please support to share a disk image between two guests
-----------------------------+----------------------------------------------
Reporter:  bzed              |        Owner:
    Type:  enhancement       |       Status:  closed
Priority:  minor             |    Component:  VM control
 Version:  VirtualBox 1.5.4  |   Resolution:  wontfix
Keywords:                    |        Guest:  other
    Host:  other             |
-----------------------------+----------------------------------------------
Changes (by frank):

  * status:  new => closed
  * resolution:  => wontfix

Comment:

 Starting with 2.1.0, a disk image can be attached to two VMs at the same
 time, but only one of these two VMs can be powered on at the same time.
 Klaus already explained why we wouldn't implement sharing an image between
 running VMs. Closing

I’ve been using VirtualBox for a year now, but only recently decided to install Oracle RAC. Like most ex-VMware users, I just created a new disk and added it to two virtual machines. The first one started normally, but when I tried to start the second one I got the following error:

Result Code: VBOX_E_INVALID_OBJECT_STATE (0x80BB0007)
Component: Machine
Interface: IMachine {6d9212cb-a5c0-48b7-bbc1-3fa2ba2ee6d2}

It turns out that VirtualBox will not allow more than one running VM to use a VDI file. The solution I found most useful is to set up a third server (or VM) with Openfiler as an iSCSI host. VirtualBox can then transparently present an iSCSI disk to a virtual machine as a virtual hard disk; the guest operating system will not see any difference between a virtual disk image (VDI file) and an iSCSI target. To achieve this, VirtualBox has an integrated iSCSI initiator.

Regards,
Sve

Categories: oracle, virtualization

Patch Set 10.2.0.5 for Oracle Database Server

June 9th, 2010 Sve No comments

Just a note that a few days ago patch set 10.2.0.5 was released for HP-UX Itanium and IBM AIX systems. The patch set is available for download from My Oracle Support under number 8202632.

Regards,
Sve

Categories: hp-ux, oracle

Oracle 11g R2 installer fails on HP-UX 11iv3

May 20th, 2010 Sve 4 comments

Running the installer of any of the products (client, grid, database) of Oracle Database 11g Release 2 on HP-UX 11iv3 (Itanium) fails with:
“An internal error occurred within cluster verification framework”

After starting ./runInstaller, the following error window pops up:
[screenshot: runInstaller error]

The installAction$DATE.log also shows the following error:

SEVERE: [FATAL] An internal error occurred within cluster verification framework
Unable to get the current group.

This happens because patch PHCO_40381 is not installed. The list of patches to be installed is in section 2.3.4 Patch Requirement of the Database Installation Guide for HP-UX.

The first one is:
PHCO_40381 11.31 Disk Owner Patch

The patch is available from ITRC. It is 205 KB in size and fixes the behavior of the diskowner command. Installing the patch does not require a reboot of the server.

After the installation of the patch, runInstaller starts successfully.
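To check up front whether the patch is already on a box, something like the sketch below could be used. It assumes that the patch listing comes from swlist on the HP-UX host; the helper name is made up, and the sample input line is a stand-in, not real swlist output.

```shell
#!/bin/sh
# Succeeds if the patch ID given as $1 appears in the listing read from stdin.
patch_listed() {
    grep -q "$1"
}

# On the HP-UX host itself the listing would be fed from swlist, for example:
#   swlist -l patch | patch_listed PHCO_40381 && echo "patch is installed"
# Stand-in input so the sketch runs anywhere (made-up line, not real swlist output):
echo "PHCO_40381 (made-up sample line)" | patch_listed PHCO_40381 && echo "patch is installed"
```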

There is also a MOS document regarding this problem:
HP-UX: 11gR2 runInstaller Fails with “An internal error occurred within cluster verification framework” [ID 983713.1]


Regards,
Sve

Categories: hp-ux, oracle

Visiting BGOUG

April 22nd, 2010 Sve No comments

I’ll be visiting the spring conference of the BGOUG this weekend. It will be very interesting, since there are some topics related to Sun technologies. Once again we have a strong foreign presence.

Categories: Uncategorized, oracle

Presentation about Oracle on HP-UX and Linux

January 16th, 2010 Sve No comments

At the last BGOUG I talked about some of the differences between HP-UX and Linux, although they cannot really be compared because they run on different platforms. I tried to figure out how Linux has penetrated the enterprise OS market in recent years, what is still missing, and which features that HP-UX has had for a long time I would like to see in Linux. I also discussed memory, best practices in multipathing and networking, storage options, asmlib tips and tricks, and said a few words about backup and recovery.

The presentation can be found here

Categories: hp-ux, linux, oracle

Many racgmain(check) processes at HP-UX 11iv3

August 17th, 2009 Sve 10 comments

I got a call that some commands for controlling the cluster and Oracle were not working. This was a two-node cluster installed with Oracle 10.2.0.4 RAC on HP-UX 11.31 Data Center OE (December 2008), which had been running for a month already.

Arriving at the customer site, I noticed a lot (around 500) of hanging racgmain(check) processes, which were obviously blocking some of the cluster commands. Errors can also be seen in the log $CRS_HOME/log/$HOSTNAME/crsd/crsd.log:

2009-04-08 15:22:01.700: [  CRSEVT][90801] CAAMonitorHandler :: 0:Action Script /oracle/ora10g/bin/racgwrap(check) timed out for ora.ORCL.ORCL1.inst! (timeout=600)
2009-04-08 15:22:01.700: [  CRSAPP][90801] CheckResource error for ora.ORCL.ORCL1.inst error code = -2
2009-04-08 15:25:42.180: [  CRSEVT][90811] CAAMonitorHandler :: 0:Could not join /oracle/ora10g/bin/racgwrap(check) category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

There are a lot of bugs on Metalink, but no documents or suggestions on how to fix this.

Fortunately we found a solution:

1. Stop CRS on all nodes.

2. Make a copy of racgwrap located under $ORACLE_HOME/bin and $CRS_HOME/bin on all nodes

3. Edit the file racgwrap and modify the last 3 lines from:

$ORACLE_HOME/bin/racgmain "$@"
status=$?
exit $status

to:

exec $ORACLE_HOME/bin/racgmain "$@"

4. Restart CRS and make sure that all the resources start.
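Steps 2 and 3 can be sketched as a small helper. This is only an illustration under the assumption that the file ends with exactly the three lines shown above. The function name is made up, and it refuses to touch a file whose ending looks different.

```shell
#!/bin/sh
# Replace the last three lines of a racgwrap script with a single exec line.
# Refuses to touch the file unless its ending is exactly the three lines below.
patch_racgwrap() {
    f=$1
    expected='$ORACLE_HOME/bin/racgmain "$@"
status=$?
exit $status'
    if [ "$(tail -n 3 "$f")" != "$expected" ]; then
        echo "unexpected ending in $f, leaving it untouched" >&2
        return 1
    fi
    cp -p "$f" "$f.orig" || return 1           # step 2: keep a copy
    total=$(wc -l < "$f.orig")
    head -n "$((total - 3))" "$f.orig" > "$f"  # everything except the last 3 lines
    echo 'exec $ORACLE_HOME/bin/racgmain "$@"' >> "$f"
}

# Usage, with CRS stopped, once per home and node:
#   patch_racgwrap "$ORACLE_HOME/bin/racgwrap"
#   patch_racgwrap "$CRS_HOME/bin/racgwrap"
```

Writing back into the existing file rather than moving a new one over it keeps the original owner and permissions, and the .orig copy makes it easy to roll back.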

We were lucky to hit the bug just before the migration, when restarting the instances/servers was easy enough. I don’t know if this really solves the problem, but we never hit the bug again.

Categories: hp-ux, oracle

Changing physical path of ASM disk group

August 11th, 2009 Sve No comments

The purpose of this document is to show that changing the physical path of ASM disk MEMBERS is possible and there is no risk.

For the purpose of the test, we create one logical volume called lvora and give ownership of its raw device file to oracle:
root@node1:/# lvcreate -n lvora -L 1024 vg00
root@node1:/# chown oracle:dba /dev/vg00/rlvora

Start DBCA and create ASM instance:
- set sys password
- set data group name to DATA
- set redundancy to External
- set Disk Discovery Path to /dev/vg00/rlv*

At this stage only /dev/vg00/rlvora is a CANDIDATE disk for the disk group, with a size of 1 GB.
Select the disk and create the disk group. Now we have one mounted disk group called DATA with external redundancy, using /dev/vg00/rlvora as a MEMBER of the disk group.

To simulate a change (or failure) of the physical disk, or moving data from one physical disk to another, we use dd to copy the raw data from /dev/vg00/rlvora to /dev/rdsk/c0t2d0 and then delete the logical volume.

We shut down the ASM instance and copy the contents of the logical volume to the raw physical disk using dd:

oracle@node1:/home/oracle$ export ORACLE_HOME=/oracle/ora10g
oracle@node1:/home/oracle$ export ORACLE_SID=+ASM
oracle@node1:/home/oracle$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Dec 13 01:50:38 2007

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to:
Oracle Database 10g Release 10.2.0.1.0 - 64bit Production
With the Real Application Clusters option

SQL> select GROUP_NUMBER, NAME, STATE, TYPE from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE       TYPE
------------ ------------------------------ ----------- ------
           1 DATA                           MOUNTED     EXTERN

SQL> select GROUP_NUMBER, DISK_NUMBER, MODE_STATUS, STATE, NAME, PATH from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MODE_ST STATE    NAME      PATH
------------ ----------- ------- -------- --------- ----------------
1             0          ONLINE  NORMAL   DATA_0000 /dev/vg00/rlvora

SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL>  exit

oracle@node1:/home/oracle$ exit

root@node1:/root# chown oracle:dba /dev/rdsk/c0t2d0

root@node1:/root# dd if=/dev/vg00/rlvora of=/dev/rdsk/c0t2d0 bs=1024k
1024+0 records in
1024+0 records out
root@node1:/root#  lvremove /dev/vg00/lvora
The logical volume "/dev/vg00/lvora" is not empty;
do you really want to delete the logical volume (y/n) : y
Logical volume "/dev/vg00/lvora" has been successfully removed.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf

We have moved data to /dev/rdsk/c0t2d0 and we have removed the logical volume.
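Because the source volume is removed right after the copy, one could confirm the copy before running lvremove. The helper below is a minimal sketch and was not part of the original test; its name is made up. It compares checksums of the first N blocks only, since the target disk may be larger than the logical volume.

```shell
#!/bin/sh
# Succeeds when the first COUNT blocks of 1024k in SRC and DST have the same checksum.
# Only the copied length is compared, since the target disk may be larger than the source.
same_first_blocks() {
    src=$1; dst=$2; count=$3
    # Bail out early if either side cannot be read, so two failed reads never "match".
    [ -r "$src" ] && [ -r "$dst" ] || return 1
    a=$(dd if="$src" bs=1024k count="$count" 2>/dev/null | cksum)
    b=$(dd if="$dst" bs=1024k count="$count" 2>/dev/null | cksum)
    [ "$a" = "$b" ]
}

# With the devices from this test it would be called (as root, ASM shut down) like:
#   same_first_blocks /dev/vg00/rlvora /dev/rdsk/c0t2d0 1024 && echo "copy matches"
```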

Now if you try to mount the disk group or start the instance you will get the following error:

oracle@node1:/home/oracle$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Dec 13 02:05:48 2007

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ASM instance started

Total System Global Area  130023424 bytes
Fixed Size                  1991968 bytes
Variable Size             102865632 bytes
ASM Cache                  25165824 bytes
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"

SQL> select GROUP_NUMBER, NAME, STATE, TYPE from v$asm_diskgroup;

no rows selected

SQL> select GROUP_NUMBER, DISK_NUMBER, MODE_STATUS, STATE, NAME, PATH from v$asm_disk;

no rows selected

SQL> show parameter diskstring

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/vg00/rlv*

As you can see, the discovery path is still pointing to /dev/vg00/rlv*. Now we change the disk discovery path by pointing the asm_diskstring parameter to the new location of the disk, and then mount the disk group:

SQL> alter system set asm_diskstring='/dev/rdsk/*' scope=both;

System altered.

SQL> select GROUP_NUMBER, DISK_NUMBER, MODE_STATUS, STATE, NAME, PATH from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MODE_ST STATE    NAME      PATH
------------ ----------- ------- -------- --------- ----------------
0            0           ONLINE  NORMAL             /dev/rdsk/c0t2d0

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> select GROUP_NUMBER, DISK_NUMBER, MODE_STATUS, STATE, NAME, PATH from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MODE_ST STATE    NAME      PATH
------------ ----------- ------- -------- --------- ----------------
1            0           ONLINE  NORMAL   DATA_0000 /dev/rdsk/c0t2d0

SQL> select GROUP_NUMBER, NAME, STATE, TYPE from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE       TYPE
------------ ------------------------------ ----------- ------
           1 DATA                           MOUNTED     EXTERN

SQL> show parameter diskstring;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/rdsk/*

Final test to show that the changes are applied:

SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL> startup
ASM instance started

Total System Global Area  130023424 bytes
Fixed Size                  1991968 bytes
Variable Size             102865632 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL> exit
Disconnected from Oracle Database 10g Release 10.2.0.1.0 - 64bit Production
With the Real Application Clusters option
oracle@node1:/home/oracle$

Conclusion
ASM does not keep track of the physical disks of the disk groups. In other words, neither the path nor the major and minor numbers of the physical disks matter, because the metadata is kept on the disk itself and there is nothing in the dictionary. When you start an ASM instance, it scans the disks matched by the asm_diskstring parameter and reads the header information of the discovered disks.

Categories: hp-ux, oracle
