Archive

Posts Tagged ‘rac’

Grid Infrastructure 12c installation fails because of 255 in the subnet ID

August 25th, 2016 No comments

I was doing another GI 12.1.0.2 cluster installation last month when I got a really weird error.

While root.sh was running on the first node I got the following error:

2016/07/01 15:02:10 CLSRSC-343: Successfully started Oracle Clusterware stack
2016/07/01 15:02:23 CLSRSC-180: An error occurred while executing the command '/ocw/grid/bin/oifcfg setif -global eth0/10.118.144.0:public eth1/10.118.255.0:cluster_interconnect' (error code 1)
2016/07/01 15:02:24 CLSRSC-287: FirstNode configuration failed
Died at /ocw/grid/crs/install/crsinstall.pm line 2398.

I was surprised to find the following error in the rootcrs log file:

2016-07-01 15:02:22: Executing cmd: /ocw/grid/bin/oifcfg setif -global eth0/10.118.144.0:public eth1/10.118.255.0:cluster_interconnect
2016-07-01 15:02:23: Command output:
> PRIF-15: invalid format for subnet
>End Command output

A quick MOS search suggested that my installation failed because I had 255 in the subnet ID:
root.sh fails with CLSRSC-287 due to: PRIF-15: invalid format for subnet (Doc ID 1933472.1)

Indeed, we had 255 in the private network subnet (10.118.255.0). Fortunately this was our private network, which was easy to change, but you will still hit the issue if your public network has 255 in its subnet ID.
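
A quick pre-install check along these lines could have caught it (a sketch; the subnet list is hypothetical and would come from your planned network configuration):

# Warn if any planned subnet ID contains a 255 octet (see Doc ID 1933472.1)
for subnet in 10.118.144.0 10.118.255.0; do
  case "$subnet" in
    255.*|*.255.*|*.255) echo "WARNING: $subnet contains a 255 octet" ;;
  esac
done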

Categories: oracle

Oracle GI 12.1 error when using NFS

January 16th, 2014 No comments

I had quite an interesting case recently where I had to build a stretched cluster for a customer using Oracle GI 12.1, placing the quorum voting disk on NFS. There is a document at OTN about stretched clusters that covers using NFS as a third location for the voting disk, but at the moment it has information for 11.2 only. Assuming there is no difference in the NFS parameters, I used the Linux parameters from that document and mounted the NFS share on the cluster nodes.

Later, when I tried to add the third voting disk to the ASM disk group, I got this strange error:

SQL> ALTER DISKGROUP OCRVOTE ADD  QUORUM DISK '/vote_nfs/vote_3rd' SIZE 10000M /* ASMCA */
Thu Nov 14 11:33:55 2013
NOTE: GroupBlock outside rolling migration privileged region
Thu Nov 14 11:33:55 2013
Errors in file /install/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_26408.trc:
ORA-17503: ksfdopn:3 Failed to open file /vote_nfs/vote_3rd
ORA-17500: ODM err:Operation not permitted
Thu Nov 14 11:33:55 2013
Errors in file /install/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_33427.trc:
ORA-17503: ksfdopn:3 Failed to open file /vote_nfs/vote_3rd
ORA-17500: ODM err:Operation not permitted
NOTE: Assigning number (1,3) to disk (/vote_nfs/vote_3rd)
NOTE: requesting all-instance membership refresh for group=1
Thu Nov 14 11:33:55 2013
ORA-15025: could not open disk "/vote_nfs/vote_3rd"
ORA-17503: ksfdopn:3 Failed to open file /vote_nfs/vote_3rd
ORA-17500: ODM err:Operation not permitted
WARNING: Read Failed. group:1 disk:3 AU:0 offset:0 size:4096
path:Unknown disk
incarnation:0xeada1488 asynchronous result:'I/O error'
subsys:Unknown library krq:0x7f715f012d50 bufp:0x7f715e95d600 osderr1:0x0 osderr2:0x0
IO elapsed time: 0 usec Time waited on I/O: 0 usec
NOTE: Disk OCRVOTE_0003 in mode 0x7f marked for de-assignment
Errors in file /install/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_33427.trc  (incident=83441):
ORA-00600: internal error code, arguments: [kfgscRevalidate_1], [1], [0], [], [], [], [], [], [], [], [], []
ORA-15080: synchronous I/O operation failed to read block 0 of disk 3 in disk group OCRVOTE

This happens because with 12c Direct NFS is used by default, and it initiates connections from ports above 1024. On the other hand, the NFS server has a default export option, secure, which requires incoming connections to originate from ports below 1024:
secure: This option requires that requests originate on an internet port less than IPPORT_RESERVED (1024). This option is on by default. To turn it off, specify insecure.

The solution is to add the insecure option to the export on the NFS server, remount the NFS share and then retry the operation, as sketched below.
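
For illustration, a minimal sketch of the change (the export path and client names here are assumptions; insecure is the relevant addition, and the mount options should stay as given in the OTN paper):

# /etc/exports on the NFS server (hypothetical path and clients)
/export/vote  racnode1(rw,sync,insecure)  racnode2(rw,sync,insecure)

# Re-export on the server, then remount on each cluster node
exportfs -ra
umount /vote_nfs && mount /vote_nfs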

For more information refer to:
12c GI Installation with ASM on NFS Disks Fails with ORA-15018 ORA-15072 ORA-15080 (Doc ID 1555356.1)

Categories: linux, oracle

How I run Oracle VM 2.2 guests with custom network configuration

August 15th, 2012 No comments

Recently I was given three virtual machines running Oracle Enterprise Linux 5 and Oracle 11gR2 RAC on Oracle VM 2.2.1, copied straight from /OVS/running_pool/. I had to get these machines up and running in my lab environment, but I found it hard to set up the network. I spent half a day debugging without success, but finally found a workaround, which I'll explain here.

First, a few technical notes: Oracle VM (Xen) has three main network configurations within /etc/xen/xend-config.sxp:

Bridged Networking – this is the default configuration and the simplest to set up. Using this type of networking means that the VM guest gets an IP from the same network as the VM host; the VM guest can also take advantage of DHCP, if any. The following lines should be uncommented in /etc/xen/xend-config.sxp:
(network-script network-bridge)
(vif-script vif-bridge)

Routed Networking with NAT – this configuration is most common where a private LAN must be used, for example when the VM host runs on your notebook and you cannot get another IP from the corporate or lab network. For this you have to set up a private LAN and NAT the VM guests so they can access the rest of the network. The following lines should be uncommented in /etc/xen/xend-config.sxp:
(network-script network-nat)
(vif-script vif-nat)

Two-way Routed Network – this configuration requires more manual steps, but offers greater flexibility. It is exactly the same as the second one, except that the VM guests are exposed on the external network: when a VM guest makes a connection to an external machine, its original IP is seen. The following lines should be uncommented in /etc/xen/xend-config.sxp:
(network-script network-route)
(vif-script vif-route)

Typically only one of the above can be used at a time, and the choice depends on the network setup. For the second and third configurations to work, a route must be added on the default gateway. For example, if my Oracle VM host has the IP address 192.168.143.10, then on the default gateway (192.168.143.1) a route has to be added to explicitly route all connection requests for my VM guests through my VM host, something like this:
route add -net 10.0.1.0 netmask 255.255.255.0 gw 192.168.143.10

Now back to the case itself. Each of the RAC nodes had two NICs – one for the public connections and one for the private network used by GI and RAC. The public network was 10.0.1.X and the private one 192.168.1.X. What I wanted was to run the VM guests in my lab and access them directly with IP addresses from the lab network, which was 192.168.143.X. As we know, the default network configuration is bridged networking, so I started with that one. Having the VM guests' config files, all I had to do was change the first address of every guest:

From:
vif = ['mac=00:16:3e:22:0d:04, ip=10.0.1.11, bridge=xenbr0', 'mac=00:16:3e:22:0d:14, ip=192.168.1.11',]

To:
vif = ['mac=00:16:3e:22:0d:04, ip=192.168.143.151, bridge=xenbr0', 'mac=00:16:3e:22:0d:14, ip=192.168.1.11',]

This turned out to be a real nightmare: I spent half a day looking into why my VM guests didn't have access to the lab network. They had access to the VM host, but not to the outside world. Maybe it was because I was running Oracle VM on top of VMware, but I finally gave up on this configuration.

Thus I had to use one of the other two network configurations – Routed Networking with NAT or Two-way Routed Network. In either case I didn't have access to the default gateway and would not be able to put a static route to my VM guests there.

Here is how I solved it – running a three-node RAC on Oracle VM Server 2.2.1, keeping the original network configuration and accessing the nodes with IP addresses from my lab network (192.168.143.X). I put logical IPs for the VM guests on the VM host using ip (ifconfig could also be used) and then, using iptables, changed the packet destination to the VM guests themselves (10.0.1.X).

1. Change the Oracle VM configuration to Two-way Routed Network: comment the lines for the default bridged configuration and uncomment the ones for routed networking:
(network-script network-route)
(vif-script vif-route)

2. Configure the VM host itself for forwarding:
echo 1 > /proc/sys/net/ipv4/conf/all/proxy_arp
iptables -t nat -A POSTROUTING -s 10.0.1.0/24 -j MASQUERADE
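
Plain IP forwarding must be on as well; the Xen network-route script normally enables it at startup, so treat this as a belt-and-braces check rather than an extra step:

cat /proc/sys/net/ipv4/ip_forward    # should print 1
echo 1 > /proc/sys/net/ipv4/ip_forward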

3. Set network aliases with the IP addresses that you want to use for the VM guests:
ip addr add 192.168.143.151/32 dev eth0 label eth0:1
ip addr add 192.168.143.152/32 dev eth0 label eth0:2
ip addr add 192.168.143.153/32 dev eth0 label eth0:3

4. Create iptables rules in the PREROUTING chain that redirect requests arriving at the lab network IPs to the VM guests' original IPs:
iptables -t nat -A PREROUTING -d 192.168.143.151 -i eth0 -j DNAT --to-destination 10.0.1.11
iptables -t nat -A PREROUTING -d 192.168.143.152 -i eth0 -j DNAT --to-destination 10.0.1.12
iptables -t nat -A PREROUTING -d 192.168.143.153 -i eth0 -j DNAT --to-destination 10.0.1.13

5. Untar the VM guest under /OVS/running_pool/:

[root@ovm22 running_pool]# ls -al /OVS/running_pool/dbnode1/
total 26358330
drwxr-xr-x 2 root root        3896 Aug  6 17:27 .
drwxrwxrwx 6 root root        3896 Aug  3 11:18 ..
-rw-r--r-- 1 root root  2294367596 May 16  17:27 swap.img
-rw-r--r-- 1 root root  4589434792 May 16  17:27 system.img
-rw-r--r-- 1 root root 20107128360 May 16  17:27 u01.img
-rw-r--r-- 1 root root         436 Aug  6 11:20 vm.cfg

6. Run the guest:
xm create /OVS/running_pool/dbnode1/vm.cfg

Now I have a three-node RAC; the nodes keep their original public IPs and I can access them using my lab network IPs. The mapping works like this:

Request to 192.168.143.151 -> the IP address is up on the VM host -> iptables on the VM host takes action -> the packet destination IP address is changed to 10.0.1.11 -> a static route already in place on the VM host routes the packet to the vif interface of the VM guest.

Now I can access my dbnode1 (10.0.1.11) directly with its lab network IP 192.168.143.151.

Regards,
Sve

Categories: linux, oracle, virtualization

Change of network interfaces in Oracle 10g RAC

July 12th, 2011 No comments

I was doing planned downtime on one of the 10.2.0.4 RAC systems, and just before starting the second node I was told that during the downtime the network interfaces of the second node had been aggregated. These servers run HP-UX, where the default network interfaces are lan0 for the public network and lan1 for the interconnect. After aggregation they became lan900 and lan901 respectively, so I asked the guys to turn things back, as I knew the Clusterware would suffer from this change.

I decided to create a test scenario at the office, but with Linux (it was faster to deploy and test). Except for the interface names, everything else should be the same. I'm using eth0 for public and eth1 for private. Then, for the purpose of the demonstration, on the second node I'm going to change the network interface used for the public network from eth0 to eth2. This also requires modifying nodeapps, as the VIP runs on this interface.

I installed Oracle 10.2.0.4 RAC on two nodes, oelvm5 and oelvm6, with an orcl database. This is how the cluster configuration looks before changing the interface:

[oracle@oelvm5 bin]$ ./oifcfg getif
eth0 192.168.143.0 global public
eth1 172.16.143.0 global cluster_interconnect

[oracle@oelvm5 bin]$ srvctl config nodeapps -n oelvm5 -a
VIP exists.: /oelvm5-vip/192.168.143.159/255.255.255.0/eth0

[oracle@oelvm5 bin]$ srvctl config nodeapps -n oelvm6 -a
VIP exists.: /oelvm6-vip/192.168.143.160/255.255.255.0/eth0

At this point I changed interface eth0 to eth2 on the second node and restarted it. After the change, the listener is unable to start and the VIP is relocated to the first node. I'm using a very handy script that prints the cluster resource status as formatted output (a minimal sketch of such a wrapper follows the listing); here is its output after the node boot:

[oracle@oelvm5 bin]$ crsstatus
HA Resource Target State
----------- ------ -----
ora.orcl.db ONLINE ONLINE on oelvm5
ora.orcl.orcl1.inst ONLINE ONLINE on oelvm5
ora.orcl.orcl2.inst ONLINE ONLINE on oelvm6
ora.oelvm5.ASM1.asm ONLINE ONLINE on oelvm5
ora.oelvm5.LISTENER_OELVM5.lsnr ONLINE ONLINE on oelvm5
ora.oelvm5.gsd ONLINE ONLINE on oelvm5
ora.oelvm5.ons ONLINE ONLINE on oelvm5
ora.oelvm5.vip ONLINE ONLINE on oelvm5
ora.oelvm6.ASM2.asm ONLINE ONLINE on oelvm6
ora.oelvm6.LISTENER_OELVM6.lsnr ONLINE OFFLINE
ora.oelvm6.gsd ONLINE ONLINE on oelvm6
ora.oelvm6.ons ONLINE ONLINE on oelvm6
ora.oelvm6.vip ONLINE ONLINE on oelvm5
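
For reference, crsstatus is essentially a formatter around crs_stat output; a minimal sketch of such a wrapper (the column widths are arbitrary):

#!/bin/bash
# Print HA resources as "name target state" columns, parsed from crs_stat output
printf "%-32s %-8s %-20s\n" "HA Resource" "Target" "State"
printf "%-32s %-8s %-20s\n" "-----------" "------" "-----"
$ORA_CRS_HOME/bin/crs_stat | awk -F= '
  $1 == "NAME"   { name = $2 }
  $1 == "TARGET" { target = $2 }
  $1 == "STATE"  { printf "%-32s %-8s %-20s\n", name, target, $2 }'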

The following can also be observed in $ORA_CRS_HOME/log/{HOST}/racg/ora.{HOST}.vip.log:
2011-05-28 16:20:39.157: [ RACG][3909306080] [4865][3909306080][ora.oelvm6.vip]: checkIf: interface eth0 is down
Invalid parameters, or failed to bring up VIP (host=node2)

So now it's obvious: the VIP could not be started on the second node because interface eth0 is down. To change the public network interface, one has to use oifcfg to first delete the current interface and then add the correct one. Then, for the node on which the interface changed, the Clusterware has to be stopped and nodeapps updated from the other node.

If you are running in production and not using services, consider using crs_relocate on the VIP resource. It immediately relocates the VIP address to the other node, so no client suffers a connection timeout. In my lab the VIP was easily relocated with just crs_relocate, but in the production environment ASM and the LISTENER were dependent on the VIP and I had to stop them first. I'm not sure, but I think this was because there were two homes, one for ASM and one for the DB.
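
For example (the resource name follows the naming used in this lab):

crs_relocate ora.oelvm6.vip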

Then change the public interface/subnet for the affected node. While the Clusterware is running, delete the interface using oifcfg and then add it back with the correct interface:

[oracle@oelvm6 ~]$ ./oifcfg delif -global eth0
[oracle@oelvm5 ~]$ oifcfg getif
eth1 172.16.143.0 global cluster_interconnect
[oracle@oelvm6 ~]$ oifcfg setif -global eth2/192.168.143.0:public

Now we have a correct configuration:

[oracle@oelvm5 ~]$ oifcfg getif
eth2 192.168.143.0 global public
eth1 172.16.143.0 global cluster_interconnect

Because this is the interface on which the VIP runs, the nodeapps for this node have to be updated as well. To do this, stop the Clusterware on the affected node and execute srvctl from the other node, which has to be up and running in order to make the change:

[root@oelvm6 ~]# /etc/init.d/init.crs stop
Shutting down Oracle Cluster Ready Services (CRS):
May 30 11:33:15.380 | INF | daemon shutting down
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.

[oracle@oelvm5 ~]$ srvctl config nodeapps -n oelvm6 -a
VIP exists.: /oelvm6-vip/192.168.143.160/255.255.255.0/eth0
[root@oelvm5 ~]# srvctl modify nodeapps -n oelvm6 -A oelvm6-vip/255.255.255.0/eth2
[oracle@oelvm5 ~]$ srvctl config nodeapps -n oelvm6 -a
VIP exists.: /oelvm6-vip/192.168.143.160/255.255.255.0/eth2

Finally, start the Clusterware on the second node. It will automatically relocate its VIP address and start all the resources:

[root@oelvm6 ~]# /etc/init.d/init.crs start
Startup will be queued to init within 30 seconds.

The change is now reflected and the node applications are running fine:

[oracle@oelvm6 racg]$ crsstatus
HA Resource Target State
----------- ------ -----
ora.orcl.db ONLINE ONLINE on oelvm5
ora.orcl.orcl1.inst ONLINE ONLINE on oelvm5
ora.orcl.orcl2.inst ONLINE ONLINE on oelvm6
ora.oelvm5.ASM1.asm ONLINE ONLINE on oelvm5
ora.oelvm5.LISTENER_OELVM5.lsnr ONLINE ONLINE on oelvm5
ora.oelvm5.gsd ONLINE ONLINE on oelvm5
ora.oelvm5.ons ONLINE ONLINE on oelvm5
ora.oelvm5.vip ONLINE ONLINE on oelvm5
ora.oelvm6.ASM2.asm ONLINE ONLINE on oelvm6
ora.oelvm6.LISTENER_OELVM6.lsnr ONLINE ONLINE on oelvm6
ora.oelvm6.gsd ONLINE ONLINE on oelvm6
ora.oelvm6.ons ONLINE ONLINE on oelvm6
ora.oelvm6.vip ONLINE ONLINE on oelvm6

Regards,
Sve

Categories: hp-ux, oracle

Oracle DB 10.2.0.3 LISTENER (VIP) goes down on HP-UX 11.23 without reason

January 5th, 2011 No comments

Happy New Year!

For a long time I had been receiving complaints that the listener on one of the nodes in a two-node RAC goes offline from time to time. Without any obvious reason the VIP of the second node fails, the listener is stopped and the VIP is relocated to the first node. Since the VIP gets relocated there are no problems as long as all clients are configured correctly; in this case, however, some of the clients were connecting explicitly to the second node and were unable to connect to the database. The database version is 10.2.0.3 RAC installed on two nodes running HP-UX 11.23 with the December 2008 bundle patches.

The following can be observed in $CRS_HOME/log/$HOSTNAME/crsd/crsd.log:
2010-10-25 06:11:12.492: [ CRSAPP][8336] CheckResource error for ora.db2.vip error code = 1
2010-10-25 06:11:12.522: [ CRSRES][8336] In stateChanged, ora.db2.vip target is ONLINE
2010-10-25 06:11:12.522: [ CRSRES][8336] ora.db2.vip on db2 went OFFLINE unexpectedly
2010-10-25 06:11:12.523: [ CRSRES][8336] StopResource: setting CLI values
2010-10-25 06:11:12.527: [ CRSRES][8336] Attempting to stop `ora.db2.vip` on member `db2`
2010-10-25 06:11:13.182: [ CRSRES][8336] Stop of `ora.db2.vip` on member `db2` succeeded.
2010-10-25 06:11:13.185: [ CRSRES][8336] ora.db2.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2010-10-25 06:11:13.188: [ CRSRES][8336] ora.db2.vip failed on db2 relocating.
2010-10-25 06:11:13.231: [ CRSRES][8336] StopResource: setting CLI values
2010-10-25 06:11:13.235: [ CRSRES][8336] Attempting to stop `ora.db2.LISTENER_DB2.lsnr` on member `db2`
2010-10-25 06:12:31.183: [ CRSRES][8336] Stop of `ora.db2.LISTENER_DB2.lsnr` on member `db2` succeeded.
2010-10-25 06:12:31.211: [ CRSRES][8336] Attempting to start `ora.db2.vip` on member `db1`
2010-10-25 06:12:38.327: [ CRSRES][8336] Start of `ora.db2.vip` on member `db1` succeeded.

The following can be seen in the alert log:
ALTER SYSTEM SET service_names='' SCOPE=MEMORY SID='oradb2';

There are a couple of bugs logged about this, and a MOS note regarding the problem:
HP-UX Itanium: RACGMAIN Received SIGSEGV On CheckResource Causing a Crash of a Resource [ID 763724.1]

The solution is to change the shared library binding mode of the affected executables from deferred ("delayed") binding to immediate binding using the following bash snippet. It has to be applied to both the CRS and DB homes, and all Oracle processes should be stopped first:

cd $ORACLE_HOME/bin/
for i in crs_relocate.bin crs_start.bin crs_stop.bin crsd.bin evmd.bin racgons.bin racgeut racgevtf racgmain; do chatr -B immediate $i; done

cd $CRS_HOME/bin/
for i in crs_relocate.bin crs_start.bin crs_stop.bin crsd.bin evmd.bin racgons.bin racgeut racgevtf racgmain; do chatr -B immediate $i; done
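
To verify that the change took effect, chatr can be run with just the file name against one of the binaries (a quick check; the exact output format varies between HP-UX releases):

chatr $ORACLE_HOME/bin/racgmain    # the "shared library binding" section should now say immediate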

In the three months since implementing this solution I haven't seen the problem again!

Regards,
Sve

Categories: hp-ux, oracle

Shared disk support for VirtualBox

August 9th, 2010 2 comments

I'm very happy to announce that VirtualBox now supports shared disks. Finally we can attach one disk to several virtual machines and run Oracle RAC and other clusters. As Oracle promised, the feature shipped with the next maintenance release (thanks!).

There is a new image write mode called shareable, and the option is available for the VBoxManage commands createhd and modifyhd. To create a new shared image, use VBoxManage createhd with type shareable (creating a shared disk from the GUI is not possible). To mark an existing image as shared, use VBoxManage modifyhd with type shareable.
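
For illustration, a minimal sketch of the workflow (the storage controller name is an assumption from my setup; labs1 and labs2 are the VMs shown in the output below):

# Create a fixed-size image and mark it shareable
VBoxManage createhd --filename /home/vm/ora11g_shared.vdi --size 2048 --variant Fixed
VBoxManage modifyhd /home/vm/ora11g_shared.vdi --type shareable

# Attach the same image to both cluster nodes
VBoxManage storageattach labs1 --storagectl "SATA Controller" --port 1 --device 0 --type hdd --medium /home/vm/ora11g_shared.vdi
VBoxManage storageattach labs2 --storagectl "SATA Controller" --port 1 --device 0 --type hdd --medium /home/vm/ora11g_shared.vdi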

Something important is that only fixed-size disks are supported. If the disk is dynamic, you will encounter the following error when you try to modify the image:
ERROR: Cannot change type for medium '/home/vm/ora11g_shared.vdi' to 'Shareable' since it is a dynamic medium storage unit

There is another minor issue: if the image is already attached to two virtual machines, the modifyhd command will also fail:
ERROR: Cannot change the type of medium '/home/vm/ora11g_shared.vdi' because it is attached to 2 virtual machines

And finally, YES, it works – I have already tested it!

sve@host:~$ VBoxManage showhdinfo /home/vm/ora11g_shared.vdi
Oracle VM VirtualBox Command Line Management Interface Version 3.2.8
(C) 2005-2010 Oracle Corporation
All rights reserved.

UUID:                     7521f059-1196-4d68-a1a6-cf0082fb446a
Accessible:               yes
Description:          
Logical size:             2048 MBytes
Current size on disk:     2048 MBytes
Type:                     shareable
Storage format:           VDI
In use by VMs:            labs1 (UUID: 25475ff4-70bc-4e2e-aa38-d8fae289273e)
                          labs2 (UUID: e4441f4c-1ef9-42e0-8e54-d2aec2c6cf4f)
Location:                 /home/vm/ora11g_shared.vdi

Regards and happy migration 😉
Sve

Categories: oracle, virtualization

Many open files on HP-UX after RAC upgrade to 10.2.0.4 – racgimon file handle leak

July 23rd, 2010 No comments

Two months after patching a customer database to 10.2.0.4 I received a call telling me that the database was hanging. Usually this happens when they miss the backup of the archive logs and the database stops; this time there was enough space available, so that was not the problem. I logged on to the first node and started looking around: weird things were happening, some commands were failing and others were hanging. Then I realized that this was not an ordinary case and started looking deeper. It turned out to be an Oracle bug on HP-UX, for which there is a patch and a workaround too.

The customer was running HP-UX 11.23 (September 2006) with patch bundles from September 2008. The database was Oracle RAC Enterprise Edition, originally 10.2.0.2.

This problem had a very big impact, because although the database was running in RAC it was not accessible and there were a lot of locks. Rebooting the node or killing the processes did the job.

After some reading I figured out that this happens only on HP-UX, only after patching the database to 10.2.0.4, and only on the first node.

Here are some symptoms:


Executing sar -v shows the current size and maximum size of the system file table:

12:00:00   N/A   N/A 328/4200  0  1374/286108 0  41906/65536 0
12:02:00   N/A   N/A 330/4200  0  1376/286108 0  41944/65536 0
12:04:00   N/A   N/A 336/4200  0  1390/286108 0  41999/65536 0
12:06:00   N/A   N/A 331/4200  0  1377/286108 0  41983/65536 0
12:08:00   N/A   N/A 330/4200  0  1376/286108 0  41976/65536 0
12:10:00   N/A   N/A 330/4200  0  1377/286108 0  41935/65536 0


With lsof the following open files are seen:

racgimon   3506 oracle   14u   REG             64,0x9        1552   29678 /oracle/ora10g/dbs/hc_baandb1.dat
racgimon   3506 oracle   28u   REG             64,0x9        1552   29678 /oracle/ora10g/dbs/hc_baandb1.dat
racgimon   3506 oracle   30u   REG             64,0x9        1552   29678 /oracle/ora10g/dbs/hc_baandb1.dat
racgimon   3506 oracle   37u   REG             64,0x9        1552   29678 /oracle/ora10g/dbs/hc_baandb1.dat


The process holding the open files:

 oracle  3506     1  0  Nov  5  ?        18:16 /oracle/ora10g/bin/racgimon startd baandb


In the log "$ORACLE_HOME/log/<NodeName>/racg/imon_<InstanceName>.log" the following error can be seen every minute:

2009-12-02 12:12:35.454: [    RACG][73] [3506][73][ora.baandb.baandb1.inst]: GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:mmap failed with status: 12
GIM-00091: OS failure message: Not enough space
GIM-00092: OS failure occurred at: sskgmsmr_13
2009-12-02 12:13:35.474: [    RACG][73] [3506][73][ora.baandb.baandb1.inst]: GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:mmap failed with status: 12
GIM-00091: OS failure message: Not enough space
GIM-00092: OS failure occurred at: sskgmsmr_13


When the file table gets full, weird things start to happen; in the syslog the following can be seen:

Nov  5 08:00:02 db1 vmunix: file: table is full
Nov  5 08:00:03 db1 vmunix: file: table...
Nov  5 08:00:03 db1 vmunix: file...
Nov  5 08:00:03 db1 vmunix: file...
Nov  5 08:01:13 db1 vmunix: file: table is full
Nov  5 08:11:15 db1  above message repeats 34260 times


The following can also be seen in the alert log:

ORA-00603: ORACLE server session terminated by fatal error
ORA-27544: Failed to map memory region for export
ORA-27300: OS system dependent operation:socket failed with status: 23
ORA-27301: OS failure message: File table overflow
ORA-27302: failure occurred at: sskgxpcre1


Solution:
The base bug is 6931689 (SS10204-HP-PARISC64-080216.080324 HEALTH CHECK FAILED TO CONNECT TO INSTANCE), but it's not public. It's fixed in CRS 10.2.0.4 Bundle Patch #2, but the actual CRS bundle is PSU2, Patch# 8705958: TRACKING BUG FOR 10.2.0.4.2 PSU FOR CRS, which is around 41 MB.
Patch# 8705958 should be applied to all Oracle homes: although the bug is in the database home, CRS should always be at the same or a higher version.

To apply this patch, the OPatch version must be at least 10.2.0.4.7; OPatch can be downloaded as patch# 6880880. At the moment of writing, the latest version was 10.2.0.4.9, which is 34 MB. To install it, simply download it and unzip it under ORACLE_HOME, as sketched below.
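
Something like this (the zip name is hypothetical; check the actual file name of the patch# 6880880 download for your platform):

cd $ORACLE_HOME
unzip -o /tmp/p6880880_102000_HPUX-IA64.zip    # overwrites the existing OPatch directory
OPatch/opatch version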

I didn't go with the patch because I read some scary stuff at OTN, so thanks to Ivan Kartik I implemented a dirty workaround instead. He proposed a very good script which checks whether the open files exceed 20000 and, if so, kills the racgimon process (a sketch of such a watchdog follows the sar output below). The effect is visible in sar -v:

13:56:00   N/A   N/A 307/4200  0  1352/286108 0  44102/65536 0
13:58:00   N/A   N/A 307/4200  0  1353/286108 0  44119/65536 0
14:00:01   N/A   N/A 309/4200  0  1355/286108 0  44135/65536 0
14:02:01   N/A   N/A 307/4200  0  1353/286108 0  44153/65536 0
14:04:01   N/A   N/A 301/4200  0  1336/286108 0  2583/65536 0
14:06:01   N/A   N/A 306/4200  0  1347/286108 0  2610/65536 0
14:08:01   N/A   N/A 299/4200  0  1333/286108 0  2583/65536 0
14:10:01   N/A   N/A 300/4200  0  1335/286108 0  2571/65536 0
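
For illustration, a minimal sketch of such a watchdog (not Ivan's original script; the threshold and the kill -9 are assumptions, and it relies on CRS respawning racgimon):

#!/bin/sh
# Kill racgimon when it holds too many open file handles
THRESHOLD=20000
PID=$(ps -ef | grep '[r]acgimon startd' | awk '{print $2}')
[ -z "$PID" ] && exit 0
OPEN=$(lsof -p "$PID" 2>/dev/null | wc -l)
if [ "$OPEN" -gt "$THRESHOLD" ]; then
  kill -9 "$PID"    # leaked handles are released; racgimon is respawned
fi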

The workaround fixed the problem. This article was written half a year ago, and reading MOS now, the bug is reported as fixed in 10.2.0.5, which was released at the beginning of June.

Regards,
Sve

Categories: hp-ux, oracle

Oracle will bring back VirtualBox shared disk capability

July 1st, 2010 No comments

During the questions section of the last webinar, Introducing Oracle VM VirtualBox 3.2, Oracle said that they had received complaints from a lot of customers using VirtualBox regarding the installation of Oracle RAC. RAC requires a shared disk to be accessed by the nodes (VMs) of the cluster simultaneously, but this cannot be achieved directly. There is a workaround using iSCSI, but that is beside the point.

Achim Hasenmueller from the VirtualBox engineering team said that they plan to deliver this capability very soon, with the next maintenance release, rather than waiting for the next major update. I was surprised to hear that they used to have this feature working, but lost it during one of the major changes to the storage stack. I was not able to find this in the changelogs, but by accident I found the announcement of the limitation in a Debian bug report log:

From: "VirtualBox" <trac@virtualbox.org>
Cc: vbox-trac@virtualbox.org
Subject: Re: [VirtualBox] #1188: Please support to share a disk image
 between two guests
Date: Wed, 08 Apr 2009 15:24:49 -0000

#1188: Please support to share a disk image between two guests
-----------------------------+----------------------------------------------
Reporter:  bzed              |        Owner:
    Type:  enhancement       |       Status:  closed
Priority:  minor             |    Component:  VM control
 Version:  VirtualBox 1.5.4  |   Resolution:  wontfix
Keywords:                    |        Guest:  other
    Host:  other             |
-----------------------------+----------------------------------------------
Changes (by frank):

  * status:  new => closed
  * resolution:  => wontfix

Comment:

 Starting with 2.1.0, a disk image can be attached to two VMs at the same
 time, but only one of these two VMs can be powered on at the same time.
 Klaus already explained why we wouldn't implement sharing an image between
 running VMs. Closing

I've been using VirtualBox for a year now, but only recently decided to install Oracle RAC. Like most ex-VMware users, I just created a new disk and added it to two virtual machines. The first one started normally, but when I tried to start the second one I got the following error:

Result Code: VBOX_E_INVALID_OBJECT_STATE (0x80BB0007)
Component: Machine
Interface: IMachine {6d9212cb-a5c0-48b7-bbc1-3fa2ba2ee6d2}

It turns out that VirtualBox will not allow more than one running VM to use a VDI file. The solution I found most useful is to set up a third server (or VM) as an Openfiler iSCSI host. VirtualBox can then transparently present an iSCSI disk to a virtual machine as a virtual hard disk; the guest operating system will not see any difference between a virtual disk image (VDI file) and an iSCSI target. To achieve this, VirtualBox has an integrated iSCSI initiator (see the sketch below).
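
For example, attaching an iSCSI target directly (a sketch; the VM name, controller name, server IP and target IQN are all hypothetical):

VBoxManage storageattach rac1 --storagectl "SATA Controller" --port 1 --device 0 --type hdd --medium iscsi --server 192.168.143.20 --target iqn.2006-01.com.openfiler:racvol1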

Regards,
Sve

Categories: oracle, virtualization