Archive

Posts Tagged ‘10.2’

Failed creating physical standby database

June 5th, 2012 No comments

These days I’m implementing Oracle Dataguard for two Oracle databases 10.2 as part of disaster recovery project, one of them is around 1.7TB, not yet production. As part of the DG setup backups have to be available for both primary and standby. I preferred to use ASM and was able to negotiate with the storage admin to run storage replication for the FRA disks during the backup. This way I would have the same structure and files, locally at the disaster site immediately after the backup of the primary database is completed.

Unfortunately two weeks passed and by the time (this morning) I had to start with the DG setup the FRA (2TB big) got exhausted, because of too many archivelogs. When I saw this the first thing it came to my mind was to delete the backup as it was too big and I already had it at the disaster site!?. Deleting archivelogs was not an option as I need these archivelogs so the standby could catch up with the primary. So what I’ve done was to delete the backup without thinking and then moved on to duplicate the primary database for standby.

I’ve setup the standby instance and when issued “DUPLICATE TARGET DATABASE FOR STANDBY NOFILENAMECHECK DORECOVER;” I got the following errors:

RMAN-03002: failure of Duplicate Db command at 06/05/2012 11:12:04
RMAN-03015: error occurred in stored script Memory Script
RMAN-06136: ORACLE error from auxiliary database: ORA-01180: can not create datafile 1
ORA-01110: data file 1: '+DATA/orcl/datafile/system.347.666704609'

I was surprised by this error and started thinking if there wasn’t a problem with the storage (it happened twice before to have read only disks), but this was not the case. Once the diskgroup is mounted it means that ASM can read/write to its disks.

It’s obvious what my mistake is, but it took me a while until I realize what I’ve done. Deleting backup at the primary database means that it no longer knows for my backup and for that reason I could not clone the primary database. That’s why I got error that datafile 1 could not be created, simply it has no backups to restore it from. Thinking now what if the backup was located at NFS share, then I would definitely not delete it, but maybe move the archivelogs and manually register them later on the standby.

So now I started a new backup, waiting for it to finish and got replicated to the disaster site.

Regards,
Sve

Categories: oracle Tags: , ,

Oracle Database 10.2 datapump job fails because of invalid queue

June 4th, 2012 No comments

Recently I had to clone one schema from Oracle Database 10.2.0.3 on HP-UX to 10.2.0.4 on Linux. The first problem came when I tried to export it and got the following errors:

UDE-00008: operation generated ORACLE error 31623
ORA-31623: a job is not attached to this session via the specified handle
ORA-06512: at "SYS.DBMS_DATAPUMP", line 2315
ORA-06512: at "SYS.DBMS_DATAPUMP", line 3185
ORA-06512: at line 1

 

These gave me no sign of what could be the problem, but when I tried to export only the metadata I got ORA-00600 which helped me finding the solution:

Tue May 29 12:21:13 2012
Errors in file /oracle/admin/brego/udump/brego_ora_21785.trc:
ORA-00600: internal error code, arguments: [kwqbgqc: bad state], [1], [1], [], [], [], [], []
Tue May 29 12:22:01 2012
kwqiclode Warning : 600 happened during Queue                         load

The error was caused by invalid DataPump queue, refer to the following MOS notes to resolve the error:
DataPump Import (IMPDP) Fails With Errors UDI-8 OCI-22303 ORA-600 [kwqbgqc: bad state] [ID 1086334.1]
Errors ORA-31623 And ORA-600 [kwqbgqc: bad state] During DataPump Export Or Import [ID 754401.1]

Regards,
Sve

Categories: oracle Tags: , , ,

ASM diskgroup unable to mount because of duplicate disk

March 6th, 2012 2 comments

This is something I would like to share for a long time, but for one or another reason I didn’t. The error is nothing special, but then I run into very interesting case.

This happen when I had to shutdown one server for maintenance and then boot it again. Once the OS started I notice that the database didn’t ran automatically. When I try to run it manually I got the following error:

SQL> startup
 ORA-01078: failure in processing system parameters
 ORA-01565: error in identifying file '+DATA/orcl/spfileorcl.ora'
 ORA-17503: ksfdopn:2 Failed to open file +DATA/orcl/spfileorcl.ora
 ORA-15077: could not locate ASM instance serving a required diskgroup

The reason for this was that the ASM diskgroup wasn’t mounted, again I tried to mount it manually, when I got this error:

SQL> alter diskgroup data mount;
 alter diskgroup data mount
 *
 ERROR at line 1:
 ORA-15032: not all alterations performed
 ORA-15024: discovered duplicately numbered ASM disk 29

So how could this happen as I didn’t change any of the parameter and configuration, the database was started for just three months. Running quick script with kfed:

ls -1 /dev/oracle/*for i in `ls -1 /dev/oracle/*`; do echo /dev/oracle/$i; kfed read /dec/oracle/$i | grep kfdhdb.dskname; done

showed me that I there are really two duplicate disks. ASM does not support disks with the same disk names, even if this is the same disk, but accessed by different path. The duplicate disk was an old disk, which wasn’t removed at the time of the change, only its filename was changed so once ASM discovers disks in the future this one would be excluded.

Looking at the asm_diskstring I found that it has changed to ‘/dev/oracle/*’ and should be ‘/dev/oracle/orcl*’ and that’s why ASM was scanning all the disks. This made me think what could happen for the time database was running and why this happen.

After looking at the logs and made few tests I had a conclusion. At some point when the DBA added new disks to ASM he done it through the dbca and he changed the discovery string for unknown reason. What happens in background is that dbca, implicitly is recreating the ASM parameter file with the new discovery string entered by the DBA. This is why ASM started with wrong discovery string when I rebooted the server and it discovered more disks.

The database ran at 26 March and a month later at 29 April new disks were added to the ASM and its parameter was recreated, almost two months later at 7 July I turned it back with correct values:

-rw-r----- 1 oracle oinstall 1336 Jul 7 18:42 init.ora
-rw-r----- 1 oracle oinstall 1327 Apr 29 11:32 init.ora.662011221038

Regards,
Sve

Categories: oracle Tags: , ,