ASM diskgroup unable to mount because of duplicate disk
This is something I would like to share for a long time, but for one or another reason I didn’t. The error is nothing special, but then I run into very interesting case.
This happen when I had to shutdown one server for maintenance and then boot it again. Once the OS started I notice that the database didn’t ran automatically. When I try to run it manually I got the following error:
SQL> startup ORA-01078: failure in processing system parameters ORA-01565: error in identifying file '+DATA/orcl/spfileorcl.ora' ORA-17503: ksfdopn:2 Failed to open file +DATA/orcl/spfileorcl.ora ORA-15077: could not locate ASM instance serving a required diskgroup
The reason for this was that the ASM diskgroup wasn’t mounted, again I tried to mount it manually, when I got this error:
SQL> alter diskgroup data mount; alter diskgroup data mount * ERROR at line 1: ORA-15032: not all alterations performed ORA-15024: discovered duplicately numbered ASM disk 29
So how could this happen as I didn’t change any of the parameter and configuration, the database was started for just three months. Running quick script with kfed:
ls -1 /dev/oracle/*for i in `ls -1 /dev/oracle/*`; do echo /dev/oracle/$i; kfed read /dec/oracle/$i | grep kfdhdb.dskname; done
showed me that I there are really two duplicate disks. ASM does not support disks with the same disk names, even if this is the same disk, but accessed by different path. The duplicate disk was an old disk, which wasn’t removed at the time of the change, only its filename was changed so once ASM discovers disks in the future this one would be excluded.
Looking at the asm_diskstring I found that it has changed to ‘/dev/oracle/*’ and should be ‘/dev/oracle/orcl*’ and that’s why ASM was scanning all the disks. This made me think what could happen for the time database was running and why this happen.
After looking at the logs and made few tests I had a conclusion. At some point when the DBA added new disks to ASM he done it through the dbca and he changed the discovery string for unknown reason. What happens in background is that dbca, implicitly is recreating the ASM parameter file with the new discovery string entered by the DBA. This is why ASM started with wrong discovery string when I rebooted the server and it discovered more disks.
The database ran at 26 March and a month later at 29 April new disks were added to the ASM and its parameter was recreated, almost two months later at 7 July I turned it back with correct values:
-rw-r----- 1 oracle oinstall 1336 Jul 7 18:42 init.ora -rw-r----- 1 oracle oinstall 1327 Apr 29 11:32 init.ora.662011221038