Archive for the ‘storage’ Category

Troubleshooting ASM 11.2 disk discovery

December 14th, 2011

I was doing an installation at a customer site when they asked whether there was anything specific about running GI 11.2 on HP-UX, as this was their first encounter with 11g. Of course I replied that there is nothing specific: just make sure the ownership of the raw disks is correct and use a correct ASM discovery string. They said that all of this had been done as written in the documentation, but the disks still could not be discovered. This made me curious, so I asked them to log me into the system so I could have a look.

The system was running the latest HP-UX 11.31, we were going to install Oracle GI 11.2.0.2, and the LUN was presented from an HP EVA storage array.

I couldn't believe what they were saying and wanted them to show me exactly what they were doing. Unfortunately they were correct: after installing the GI 11.2.0.2 software only, we tried to create an ASM instance with asmca, but no disks were discovered, although everything looked correct.

While I was looking around I remembered that the disk owner patch on HP-UX is mandatory; the installation guide says this explicitly. I asked the customer and he said that all the required patches were installed, but when I checked, this one wasn't. The patch number as per the installation guide is PHCO_41479, but the latest version is PHCO_41903. Also, running kfed against a disk on a system where the patch is not installed shows the following:

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
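
To double-check for yourself whether the patch is actually on the system, something along these lines can be used (a sketch; swlist output formatting differs between HP-UX releases):

# List installed patches and look for the disk owner patch or its successor
swlist -l patch | grep -E 'PHCO_41479|PHCO_41903'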

I installed the patch, double-checked everything and thought this could be the reason why we were not seeing the disk, so I tried to discover it again, but without success. The disk still couldn't be seen by ASM, so I had to go deeper and see what asmca was actually doing. For that purpose I had to trace its system calls, and on HP-UX the utility capable of doing this is tusc. There is a MOS note describing how to trace system calls and which utilities should be used with the different Unix distributions [ID 110888.1].
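
The attach step itself looked roughly like this (a sketch; the flags and the output file are illustrative, check the tusc man page on your system):

# Find the asmca process, then attach tusc to it and log its system calls to a file
ps -ef | grep asmca
tusc -f -o /tmp/asmca.tusc <pid_of_asmca>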

I ran asmca, attached tusc to its process and then changed the discovery string to point exactly to the disk I wanted to use (in my case /dev/rdisk/disk3). This is the part of the trace that made sense to me:

access("/dev/rdisk/disk3", W_OK|R_OK) ........................................................................... = 0
.......
open("/dev/rdisk/disk3", O_RDONLY|O_NDELAY|0x800, 0) ............................................................ = 7
lseek(7, 8192, SEEK_SET) ........................................................................................ = 8192
read(7, "L V M R E C 0 1 \r/ % aeN e2\va0".., 1024) ............................................................. = 1024
lseek(7, 73728, SEEK_SET) ....................................................................................... = 73728
read(7, "L V M R E C 0 1 \r/ % aeN e2\va0".., 1024) ............................................................. = 1024
close(7) ........................................................................................................ = 0

The disk is first successfully tested for read and write access and is then opened read-only in non-blocking mode. Next, 1024 bytes are read from offset 8192 of /dev/rdisk/disk3. That looked like an LVM header, aha! So it seems the disk had once been used as an LVM physical volume. Although the disk was not part of any volume group, it still carried an LVM header, and that's why asmca was not showing it as CANDIDATE. It turned out that the storage admins had not recreated the virtual disk on the storage; the LUN had previously been used for LVM on another server.
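
A quick way to check a disk for such a leftover header is to read the same 1024 bytes that asmca reads and look for the LVM record signature from the trace above (a sketch, using the device from my case):

# Read 1 KB at offset 8192 (bs=1024, skip=8) and look for the HP-UX LVM signature
dd if=/dev/rdisk/disk3 bs=1024 skip=8 count=1 2>/dev/null | strings | grep LVMREC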

After doing a dd on the disk, the header looks better and the disk can be seen as CANDIDATE:

oracle@vm:/$ dd if=/dev/zero of=/dev/rdisk/disk3 bs=1024k count=10

Now the tusc output shows that the header is filled with zeros:

read(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0".., 1024) ........................................................ = 1024

Just for troubleshooting purposes I tried to read the disk header with kfed; both before and after the dd it showed the same error:

KFED-00322: file not found; arguments: [kfbtTraverseBlock] [Invalid OSM block type] [] [0]
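
For reference, the header read itself was along these lines (assuming $ORACLE_HOME points at the Grid Infrastructure home, where kfed lives in 11.2):

# Dump the ASM disk header block from the raw device
$ORACLE_HOME/bin/kfed read /dev/rdisk/disk3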

If you are not sure whether the disk contains valuable information, you can import the physical volume and activate the volume group to check. In my case I was sure the disk could be wiped, so a simple dd did the job.
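
If you do need to look at the contents first, a rough outline on HP-UX would be the following; the volume group name and the minor number are made up, and without a map file the logical volumes come back with default lvolN names:

# Create the group device file for a scratch volume group (pick an unused minor number)
mkdir /dev/vgcheck
mknod /dev/vgcheck/group c 64 0x050000
# Import the VG from the suspect disk (block device path), activate and inspect it
vgimport /dev/vgcheck /dev/disk/disk3
vgchange -a y /dev/vgcheck
vgdisplay -v /dev/vgcheck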

Regards,
Sve

Categories: hp-ux, oracle, storage

HP EVA4400/6400/8400 now ship with XCS v10000000

September 7th, 2011

Two months ago HP released new firmware for the EVA family, including the 4400/6400/8400, P6300 and P6500: version XCS 10000000. As I blogged previously, the EVA4400 could not compare with the midrange storage systems from other vendors: it introduced a few new features which turned out to be incomplete, and some other features weren't included at all. Now, apart from re-branding the EVA family to HP P6000 EVA, HP also introduced a lot of new features with this release. Some of the really useful ones:

  • Thin provisioning – Something the other vendors have had for a long time; it was about time the EVA got such a feature. It dynamically increases the space allocated to a virtual disk.
  • Large LUN support – Because this feature was missing, last year I spent a month migrating data from several big disks to another disk group with LVM. This release supports snapshot replication of LUNs greater than 2 TB; expanding and shrinking large LUNs is also supported.
  • Online virtual disk migration (change Vraid or disk group) – Another ‘must have’ feature: change a virtual disk’s redundancy level or disk group membership without impacting host I/O. Previously one had to create another virtual disk at the required level and move the data across.

There are also a lot of fixes included in this release, and some of them sound really scary. For more information refer to the release notes.

Regards,
Sve

Categories: hp-ux, storage

HP EVA4400 support for big LUNs, no thanks

May 11th, 2011

I was part of a project to upgrade an EVA4400, which at the time had only FATA drives, with FC drives. The final goal was to have an equal count of FATA and FC drives in the system (8 cages). Because the FATA disk group was not fully allocated, we decided to create an FC disk group with some of the drives and, using the snapclone functionality, replicate the LUNs from the FATA group to the FC group. Later we would delete the FATA disk group, add more FC drives, and create a new FATA disk group with fewer drives.

Going with this setup, I decided to see how long it would take to replicate one LUN and do the calculations. Then guess what: I got a message saying that we cannot create a snapclone of the disk because it is too big; the LUN was just a little bigger than 2 TB. This EVA had been upgraded with the latest firmware, which gives you the ability to create LUNs of up to 32 TB. It was already written in the release notes, although in different words:

  • Virtual disk size. Create virtual disks (LUNs) that are larger than 2 TB (maximum of 32 TB). You can only create, present, and delete virtual disks that are 2 TB or greater. You cannot perform replication tasks or extend and shrink virtual disks that are 2 TB or greater.

So what happens if I have a database running on top of a 5 TB LUN and I want to create a snapshot, for reporting let’s say … it would be just impossible.

The vdisk size must be an integer between 1 and 2047 (i.e. below 2 TB) for this functionality to work; otherwise you’ll get the following error:

Operation failed!
The requested operation is not currently support by the installed firmware for luns 2TB in size.

I’m not completely sure whether this issue also exists on the EVA6400/8400.

Regards,
Sve

Categories: storage

LVM adventures with SLES 10 SP2

February 16th, 2011

Recently I was asked to move a database and a few file systems, all residing on one server, from one storage system to another. The source storage system was an EVA4400 with FATA drives and the destination was again an EVA4400, but running FC drives. Both storage systems had 8 Gbit/s connections. The storage layout was: three physical volumes of 2 TB each and one PV of 2.5 TB, separated into three volume groups, with five logical volumes created on top of them.

For completing the task I had several options:
1. Regular file system copy (offline)
2. Using dd (offline)
3. Using BCV (almost online)
4. LVM mirroring (online)

So I started testing and weighing the pros and cons of every option:

1. Because I had already moved files on other systems, I knew the speed of copying files from one file system to another. Doing the calculations, it turned out that we would need three, or at the very least two and a half, days to copy the data from one file system (FATA) to the other (FC). Another problem was the number of files in one of the file systems: 3M (a lot of small) files! which meant that three days probably wouldn’t be enough for the process to complete. On top of this, the process would have to be offline because of database consistency.

2. The other choice was doing a dd of all the volumes. Doing dd would be better than a file system copy, but again it has to be offline and we have no control over the process. What’s more, we had LUNs bigger than 2 TB on the first system but were unable to create LUNs bigger than 2 TB on the second storage system because of a firmware issue. That’s something I’m going to blog about later; for the same reason we were unable to use Business Copy (snapshots and snapclones).

3. We had the option of moving the data within the same storage system using BCV: with snapclone we could move data from one disk group to another. This would definitely be the fastest way, and only a little downtime would be required, just to remount the new file systems and start the database and applications. But because with the latest firmware we had LUNs bigger than 2 TB, we were unable to use any replication features on them. Again, I’ll blog about this one soon.

4. So the last technology left was LVM mirroring. I’ve had a lot of experience with LVM on HP-UX systems and I really like it, so I decided to give it a try on Linux; well, I had worked with LVM on Linux before, but nothing beyond create/extend and resize. From here a month-long process full of adventures began:

The first difference from HP-UX was that I needed one additional disk for the mirror log. In Linux LVM, if you want to create a mirror, an additional disk is needed for the log; otherwise the log is kept in memory and, if the server is restarted, the mirror synchronization must be repeated. The error message is the following:
sles10:/ # lvconvert -m1 vg01/lvtest /dev/sdc
Not enough PVs with free space available for parallel allocation.
Consider --alloc anywhere if desperate.
sles10:/ # lvconvert -m1 vg01/lvtest /dev/sdc /dev/sdd
Logical volume lvtest converted.

What’s disturbing is that I wasn’t able to find out how big the mirror log should be. It’s not in the docs; some folks on the forums said it should be 1% of the size of the mirrored logical volume. Actually it’s one extent big:
[lvtest_mlog]     vg01 lwi-ao    1 linear  4.00M /dev/sdd(0)
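
That listing comes from lvs; something like the following shows the hidden mirror image and log volumes together with the devices they sit on (a sketch; field names can differ slightly between lvm2 versions):

# Show all LVs, including hidden _mimage/_mlog volumes, with their backing devices
lvs -a -o lv_name,vg_name,lv_attr,stripes,segtype,lv_size,devices vg01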

After a week spent creating virtual disks and mirroring logical volumes, I got to the point where I had to break the mirrors and remove the physical volumes of the first storage system. For this purpose I had two options:
4.1. lvconvert (online)
4.2. vgsplit (almost online)

4.1. Using lvconvert -m0 I was supposed to remove the first physical volume and leave the file system on the second storage system. For no obvious reason I got the following error when I tried to break the mirror, i.e. convert the logical volume back to linear:
sles10:/ # lvconvert -m0 vg01/lvtest /dev/sdc
No free extents on physical volume "/dev/sdc"
No specified PVs have space available

Sure, I don’t have free extents, but why do I need them when I’m breaking the mirror, not creating it? I searched a lot and didn’t find any solution; this is probably a bug in the current version of SLES. Either way, I decided to test this in a lab environment and figure out what could be done to finish the process. I created one volume group with two physical volumes of 1 GB each and then created a mirrored logical volume of exactly the same size. It was the same, I wasn’t able to break the mirror:
--- Physical volumes ---
PV Name               /dev/sdb
Total PE / Free PE    256 / 0
PV Name               /dev/sdc
Total PE / Free PE    256 / 0

I wasn’t able to remove the first physical volume, and if I executed just lvconvert -m0 vg01/lvtest, without the third argument, I was back at the starting point.

I took the message literally: I didn’t have enough physical extents. Back in the lab environment, resizing the first physical volume by just one extent resolved the problem:
--- Logical volume ---
LV Name                /dev/vg01/lvtest
Current LE             257
--- Physical volumes ---
PV Name               /dev/sdb
Total PE / Free PE    258 / 1
PV Name               /dev/sdc
Total PE / Free PE    257 / 0

sles10:~ # lvconvert -m0 vg01/lvtest /dev/sdb
Logical volume lvtest converted.
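
In the lab, the extra extent came from growing the backing device and then having LVM pick up the new physical volume size; with a recent enough lvm2 that is roughly:

# After extending the underlying disk by at least one extent (4 MB here),
# refresh the PV size so the volume group sees the extra free extent
pvresize /dev/sdb
vgdisplay -v vg01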

I was hopeful that I would get around this error by just resizing the LUNs by their minimal step (on the EVA this is 1 GB), but this was not possible: again because of the firmware issues, I was not able to extend the LUNs. At this point I decided to do it the hard way, which was to unpresent the LUNs from the first storage system without breaking the mirrors. This worked great: just by unpresenting the LUNs, LVM detected it and removed the failed physical volumes, and all the file systems continued working without interruption.
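
If LVM had not cleaned up on its own, the manual equivalent would have been roughly the following; --removemissing drops all references to the lost devices, so it needs care and a recent metadata backup:

# Rescan the block devices so LVM notices the unpresented LUNs are gone,
# then remove the missing physical volumes from the volume group
pvscan
vgreduce --removemissing vg01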

This was possible with four of the LUNs, but the last one was bigger and carried parts of two logical volumes. Because of this, or for some other reason, just unpresenting that LUN didn’t work out, so I decided to go on with the last option.

4.2. Using vgsplit sounded promising; some of the guides on the Internet showed this as the way to break a mirror. The steps are almost the same: using lvconvert, remove the mirror image from the second physical volume, delete any leftover lvol_mimage_X volumes, and then with vgsplit create another volume group containing the second physical volume. If the file systems are open during lvconvert, an lvol_mimage volume will certainly be left behind. After splitting the volume group, the same logical volumes, with exactly the same number of logical extents, have to be created. At this point a regular file system check should be enough and the file systems can be mounted. Well, it took me more than an hour to check the 2 TB file system, but otherwise everything went fine.
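
A rough outline of that route, assuming /dev/sdc is the physical volume on the new array, the logical volume is 257 extents and was allocated contiguously from the start of the PV (device names and sizes are illustrative):

# Drop the mirror image held on the new-array PV; LVM forgets about that copy,
# but the data is still physically present in those extents
lvconvert -m0 vg01/lvtest /dev/sdc
# Make sure no leftover _mimage/_mlog volumes still reference /dev/sdc
# (remove them with lvremove if any remain)
lvs -a -o +devices vg01
# Move the now-unused PV into its own volume group
vgsplit vg01 vg02 /dev/sdc
# Recreate the logical volume over exactly the same number of extents,
# then check and mount the file system the mirror copy left behind
lvcreate -l 257 -n lvtest vg02
fsck -f /dev/vg02/lvtest
mount /dev/vg02/lvtest /mnt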

As a conclusion, I would say that BCV would be the fastest way to move data within the same storage system. Of course, we are only talking about online or almost-online migration, so the other option is LVM mirroring. Depending on the Linux/Unix distribution, this can be done either with lvconvert, reducing the mirror and removing the first physical volume from the volume group, or with vgsplit, moving the second physical volume to another volume group and recreating the logical volumes.


Regards,
Sve

Categories: linux, storage