Troubleshooting ASM 11.2 disk discovery

I was doing an installation at a customer site when they asked whether there was anything specific about running GI 11.2 on HP-UX, as this was their first encounter with 11g. I replied that there was nothing special: just make sure the ownership of the raw disk is correct and the ASM discovery string is set properly. They said all of this had been done as described in the documentation, but the disks still could not be discovered. This made me curious, so I asked them to log me in to the system so I could have a look.

The system was running the latest HP-UX 11.31 and we were going to install Oracle GI 11.2.0.2; the LUN was presented from HP EVA storage.

I couldn't believe what they were saying and wanted them to show me exactly what they were doing. Unfortunately, they were right: after a software-only installation of GI 11.2.0.2, we tried to create an ASM instance with asmca, but no disks were discovered although everything looked correct.

While looking around I remembered that the disk owner patch is mandatory on HP-UX and must be installed, as the installation guide states explicitly. I asked the customer, who said that all required patches were installed, but when I checked, this patch was missing. The patch number given in the installation guide is PHCO_41479, but the latest version is PHCO_41903. Also, running kfed against a disk on a system where the patch is not installed shows the following:

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

I installed the patch, double-checked everything, and, thinking this could be the reason we were not seeing the disk, tried to discover it again, but still without success. The disk still could not be seen by ASM, so I had to dig deeper and see what asmca was actually doing. For that I had to trace its system calls, and on HP-UX the utility for this is tusc. There is a MOS note describing how to trace system calls and which utilities to use on different Unix platforms [ID 110888.1].

I ran asmca, attached tusc to its process, and then changed the discovery string to point exactly at the disk I wanted to use (in my case /dev/rdisk/disk3). This is the part of the trace that made sense to me:

access("/dev/rdisk/disk3", W_OK|R_OK) ............................. = 0
.......
open("/dev/rdisk/disk3", O_RDONLY|O_NDELAY|0x800, 0) .............. = 7
lseek(7, 8192, SEEK_SET) .......................................... = 8192
read(7, "L V M R E C 0 1 \r/ % aeN e2\va0".., 1024) ............... = 1024
lseek(7, 73728, SEEK_SET) ......................................... = 73728
read(7, "L V M R E C 0 1 \r/ % aeN e2\va0".., 1024) ............... = 1024
close(7) .......................................................... = 0

The disk is first successfully tested for read and write access and is then opened read-only in non-blocking mode. Next, 1024 bytes are read at offset 8192 of /dev/rdisk/disk3 (and again at offset 73728). This looked like an LVM header, AHA! So it seems the disk was once used as an LVM physical volume. Although the disk is not part of any volume group, it still has an LVM header, and that is why asmca is not showing it as CANDIDATE. It turned out that the storage admins had not recreated the virtual disk on the storage; the LUN had previously been used for LVM on another server.
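The same check can be done by hand without tracing asmca. The following is only a minimal sketch: the function name has_lvm_header is made up for illustration, and it assumes the signature is the 8-byte string LVMREC01 (the spaces between the letters in the trace are taken to be display formatting) located at offset 8192, as seen in the tusc output.

```shell
# has_lvm_header DEVICE  (hypothetical helper, not an Oracle or HP-UX tool)
# Succeeds if the 1 KB block at offset 8192 starts with "LVMREC01".
# Assumption: the offset and signature are the ones from the tusc trace.
has_lvm_header() {
    # Skip 8 blocks of 1024 bytes (= offset 8192), keep the first 8 bytes,
    # and drop NUL bytes so an all-zero header yields an empty string.
    sig=$(dd if="$1" bs=1024 skip=8 count=1 2>/dev/null \
          | dd bs=8 count=1 2>/dev/null \
          | tr -d '\0')
    [ "$sig" = "LVMREC01" ]
}

# Example use (read-only, so safe to run against the raw device):
#   has_lvm_header /dev/rdisk/disk3 && echo "old LVM header found"
```

Because it only reads from the device, this check can be run before deciding whether the disk needs to be wiped.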

After running dd against the disk, the header looked better and the disk could be seen as CANDIDATE:

oracle@vm:/$ dd if=/dev/zero of=/dev/rdisk/disk3 bs=1024k count=10

Now the tusc output shows that the header is filled with zeros:

read(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0".., 1024) ........................................................ = 1024

Just for troubleshooting purposes, I tried to read the disk header with kfed; both before and after the dd it showed the same error:

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

If you are not sure whether the disk contains valuable information, you could import the physical volume and activate the volume group to check. In my case I was sure that the disk could be wiped, and a simple dd did the job.
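If you want a way back without importing the volume group, one option is to save the region you are about to overwrite before zeroing it. This is not what I did on the customer system, just a sketch under stated assumptions: the function name backup_and_wipe and the backup path are hypothetical, and the 10 MB size simply mirrors the dd command above.

```shell
# backup_and_wipe DEVICE BACKUP_FILE [MB]  (hypothetical helper)
# Copies the first MB megabytes (default 10) of DEVICE to BACKUP_FILE,
# then overwrites that same region with zeros.
# WARNING: destructive; double-check the device name before running.
backup_and_wipe() {
    dev=$1
    backup=$2
    mb=${3:-10}
    # Stop here if the backup copy fails, so nothing is wiped without it.
    dd if="$dev" of="$backup" bs=1024k count="$mb" 2>/dev/null || return 1
    dd if=/dev/zero of="$dev" bs=1024k count="$mb" 2>/dev/null
}

# Example use (the backup path is an assumption):
#   backup_and_wipe /dev/rdisk/disk3 /tmp/disk3_header.bak
```

Writing the saved file back to the same device with dd should put the old header back, since only that region was overwritten.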

Regards,
Sve