It's been a while since my last post, as I've been really busy working on a number of projects.
The purpose of this post is to highlight an issue I ran into while building a standby database. The environment consisted of three 18.104.22.168 databases on host A (primary), which were restored from backup on another host B (standby). Both hosts were running Linux. It's important to mention that the two hosts were located in different data centers.
Once a standby database was mounted, we would start shipping archive log files from the primary without adding it to the Data Guard Broker config at that point. We wanted to touch production as little as possible, so we would add the database to the broker config only just before doing the switchover. In the meantime we would manually recover the standby database, so that the apply lag would be small by the time the database was added to the broker config. This approach worked fine for two of the databases, but we got this error for the third one:
Fri Mar 13 13:33:43 2015
RFS: Assigned to RFS process 29043
RFS: Opened log for thread 1 sequence 29200 dbid -707326650 branch 806518278
CORRUPTION DETECTED: In redo blocks starting at block 20481count 2048 for thread 1 sequence 29200
Deleted Oracle managed file +RECO01/testdb/archivelog/2015_03_13/thread_1_seq_29200.8481.874244023
RFS: Possible network disconnect with primary database
Fri Mar 13 13:42:45 2015
Errors in file /u01/app/oracle/diag/rdbms/testdb/testdb/trace/testdb_rfs_31033.trc:
Looking through the trace file, the first thing I noticed was:
Corrupt redo block 5964 detected: BAD CHECKSUM
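If you need to check several alert logs for the same symptom, a small script can pull out the affected thread and sequence numbers. This is only a minimal sketch: the pattern is based on the single message in the alert log excerpt above, the function name is my own, and other Oracle versions may word the message differently.

```python
import re

# Pattern derived from the one alert log message quoted in this post.
# Note that Oracle prints the block number and "count" without a space.
CORRUPTION_RE = re.compile(
    r"CORRUPTION DETECTED: In redo blocks starting at block\s*(\d+)\s*count\s*(\d+)"
    r"\s+for thread\s+(\d+)\s+sequence\s+(\d+)"
)


def find_redo_corruption(lines):
    """Return (start_block, count, thread, sequence) for each matching line."""
    hits = []
    for line in lines:
        match = CORRUPTION_RE.search(line)
        if match:
            hits.append(tuple(int(group) for group in match.groups()))
    return hits


if __name__ == "__main__":
    sample = [
        "RFS: Opened log for thread 1 sequence 29200 dbid -707326650 branch 806518278",
        "CORRUPTION DETECTED: In redo blocks starting at block 20481count 2048 "
        "for thread 1 sequence 29200",
        "RFS: Possible network disconnect with primary database",
    ]
    for start_block, count, thread, sequence in find_redo_corruption(sample):
        print(f"thread {thread} sequence {sequence}: "
              f"{count} blocks starting at block {start_block}")
```

In practice you would pass it the lines of the real alert log, for example `find_redo_corruption(open(path))`, where `path` points at your own alert log file.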
We already had two databases running from host A to host B, so we ruled out a firewall issue. Then we tried a couple of other things: we manually recovered the standby with an incremental backup, recreated the standby, and cleared all the redo/standby log groups, but nothing helped. I found only one note in MOS with a similar symptom, and it was for Streams in 10.2.
In the end the network admins were asked to check the config of the firewalls one more time. There were two firewalls, one at host A's location and another at host B's location.
It turned out that the firewall at host A's location had SQLnet class inspection enabled, which was causing the corruption. Once this firewall feature was disabled, the logs were shipped successfully from the primary database. The strange thing was that we hadn't had any issues with the other two databases running on the same hosts. Well, what can I say: smart firewalls.
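To illustrate why a device that rewrites traffic in flight shows up as BAD CHECKSUM on the receiving side, here is a small sketch. It is an assumption-laden toy: it uses CRC32 from Python's zlib, which is not Oracle's redo block checksum algorithm, and the payload and function names are made up for the example. The point is only that a checksum computed by the sender no longer matches once even one byte is changed along the way.

```python
import zlib


def seal(block):
    """Sender side: return the block together with its checksum.

    CRC32 is used purely for illustration; it is NOT the algorithm
    Oracle uses for redo blocks.
    """
    return block, zlib.crc32(block)


def verify(block, checksum):
    """Receiver side: recompute the checksum and compare."""
    return zlib.crc32(block) == checksum


if __name__ == "__main__":
    payload = b"hypothetical redo block payload"
    block, checksum = seal(payload)

    # Untouched in transit: verification passes.
    print("intact block ok:", verify(block, checksum))

    # A middlebox flips a single bit of the payload in transit.
    tampered = bytearray(block)
    tampered[0] ^= 0x01
    print("tampered block ok:", verify(bytes(tampered), checksum))
```

The receiver cannot tell what changed the data, only that it changed, which is why the symptom looked like corruption on the standby rather than a network problem.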