Exadata onecommand fails at cell disk creation

February 3rd, 2016 No comments

I was installing another Exadata last month when I got an error at the create cell disks step. I had seen the same error before when extending a two-rack Exadata configuration to three racks, but thought it was a one-off.

The cell disk creation failed as below:

[root@exa01db01 linux-x64]# ./install.sh -cf Customer-exa01.xml -s 8

 Initializing
 Executing Create Cell Disks
 Checking physical disks for errors before creating celldisks.........................
 Restarting cell services....................................................
 ERROR:

 Stopping the RS, CELLSRV, and MS services...
 The SHUTDOWN of services was successful.
 Starting the RS, CELLSRV, and MS services...
 Getting the state of RS services...  running
 Starting CELLSRV services...
 The STARTUP of CELLSRV services was not successful.
 CELL-01533: Unable to validate the IP addresses from the cellinit.ora file because the IP addresses may be down or misconfigured.
 Starting MS services...
 The STARTUP of MS services was successful.
 ERROR:

Going through the cell configuration, it's obvious why the process failed. The cell still had the default name, and the IP addresses that the cell services should use were still the default ones:

CellCLI> list cell detail
         name:                   ru02
         ipaddress1:             192.168.10.1/24
         ipaddress2:             192.168.10.2/24
         cellsrvStatus:          stopped
         msStatus:               running
         rsStatus:               running

In short, when you see an error like the one below, your ipaddress1 and/or ipaddress2 fields are most probably wrong:

         2       2015-12-15T17:57:03+00:00       critical        "ORA-00700: soft internal error, arguments: [main_6a], [3], [IP addresses in cellinit.ora not operational], [], [], [], [], [], [], [], [], []"

The solution to that is simple. You need to alter the cell name and IP addresses manually:

CellCLI> alter cell name=exa01cel02a,ipaddress1='192.168.10.13/22',ipaddress2='192.168.10.14/22'
Network configuration altered. Please issue the following commands as root to restart the network and open IB stack:
service openibd restart
service network restart
A restart of all services is required to put new network configuration into effect. MS-CELLSRV communication may be hampered until restart.
Cell exa01cel02a successfully altered

CellCLI> alter cell restart services all
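
Before re-running the step, you can quickly verify the name and IP addresses on every cell from the first database node (assuming you have the usual cell_group file listing the storage servers):

[root@exa01db01 ~]# dcli -g cell_group -l root "cellcli -e list cell attributes name,ipaddress1,ipaddress2"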

Make sure all cells are fixed and re-run the onecommand step; this time it will succeed:

 Successfully completed execution of step Create Cell Disks [elapsed Time [Elapsed = 128338 mS [2.0 minutes] Thu Dec 17 14:26:59 GMT 2015]]

I've checked some older deployments and it's the same step that should change the cell name and restart the cell services. For some reason this didn't happen for me. For both deployments I used OEDA v15.300 (Oct 2015), so this might be a bug in that version.

Categories: oracle Tags:

Issues with Oracle Direct NFS

January 28th, 2016 No comments

This is a quick post to highlight two issues I had with Oracle dNFS. Both relate to wrong entries in the oranfstab file.

One might encounter ORA-00600 during database creation:

DBCA_PROGRESS : 7%
DBCA_PROGRESS : 8%
ORA-01501: CREATE DATABASE failed
ORA-00600: internal error code, arguments: [KSSRMP1], [], [], [], [], [], [], [], [], [], [], []
ORA-17502: ksfdcre:3 Failed to create file /oracle/ORCL/database/ORCL/controlfile/o1_mf_%u_.ctl

This was caused by a wrong entry in oranfstab – there was a difference between fstab and oranfstab for the same record:

server: zfs01
path: 192.168.10.100
export: /export/OTHERDB/database mount: /oracle/ORCL/database

The second issue was that the database wasn't using dNFS. A simple query of v$dnfs_servers returned no rows and there were no errors in the alert log. However, looking around the trace files, one can easily spot the following repetitive error in all of them:

KGNFS_NFSPROC3_MNT FAIL 13

This was caused by trying to mount a share we didn't have access to, or a non-existent share, from the NFS server:

server: zfs01
path: 192.168.10.100
export: /export/NON_EXIST/database mount: /oracle/ORCL/database

The issue was fixed after correcting the typos in the oranfstab file and restarting the database.
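
Once the database is back up, a quick way to confirm that dNFS is actually in use is to query v$dnfs_servers again; with a correct oranfstab it should return one row per NFS server:

SQL> select svrname, dirname from v$dnfs_servers;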

The bottom line: make sure that fstab and oranfstab match and have correct entries.
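
For reference, here is a minimal sketch of a matching pair (the hostname, export and mount point are just examples, and the mount options are the usual Oracle recommendations):

# /etc/fstab
zfs01:/export/ORCL/database  /oracle/ORCL/database  nfs  rw,bg,hard,nointr,rsize=1048576,wsize=1048576,tcp,vers=3,timeo=600  0 0

# oranfstab
server: zfs01
path: 192.168.10.100
export: /export/ORCL/database mount: /oracle/ORCL/database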

Categories: oracle Tags:

Come and hear me speak at UKOUG Tech 15

December 4th, 2015 No comments

It's that time of the year again when one of the biggest Oracle user group events, and the last one for the year, takes place: the UK Oracle User Group Conference.

I'm very grateful and proud that I'll be speaking at this great conference; here are my talks:

Presentation Title: Oracle Exadata Meets Elastic Configurations
Description: With the release of Exadata X5, Oracle announced elastic configurations to allow a mixed number of database and cell nodes. This session will go through the implementation process of an X5 with two database nodes and four cells.
Date: Monday 7th December
Time: 14:10 – 15:00
Hall: Media Suite B

Presentation Title: Oracle DataGuard Fast-Start Failover: Live Demo
Description: Come and see a live demo of Oracle Fast-Start Failover and hear why a private bank moved from RAC to FSFO.
Date: Tuesday 8th December
Time: 16:30 – 17:20
Hall: Hall 11A

See you there!

Categories: oracle Tags:

How to rename ASM disk groups in Exadata

November 25th, 2015 No comments

Deployment of Exadata requires you to generate a configuration using the Oracle Exadata Deployment Assistant (OEDA). Within it, the default ASM disk group names are DBFS_DG, RECOC1 and DATAC1. I usually change those to RECO01 and DATA01, as the defaults don't make sense to me and the only place where I see them is on Exadata.

I had an incident last year where the Exadata deployment failed halfway through and the names were left at their defaults, so I had to delete the configuration and start from scratch.

To my big surprise, I recently got a request where the customer wanted to change RECO01 and DATA01 to RECOC1 and DATAC1! This was a pre-prod system, already deployed and with a few databases running. The Exadata was an X5-2 running ESS 12.1.2.1.2 and GI 12.1.0.2.

If this ever happens to you, here is what you need to do:

  1. Rename grid disks.
  2. Rename ASM disk groups and ASM disk names.
  3. Modify all databases to point to the new disk groups.

Rename grid disks

Since grid disk names contain the disk group name, they need to be changed too. Although this is not mandatory, I strongly recommend it to avoid any confusion in the future.

The grid disks can be renamed very easily using cellcli, but they should NOT be in use by GI at that time. Thus Grid Infrastructure has to be stopped on all servers; stop it as root:

[root@exa01db01 ~]# /u01/app/12.1.0.2/grid/bin/crsctl stop cluster -all

Then run the following magic command to get the list of all grid disks and replace the disk group names with the new ones:

[root@exa01db01 ~]# for i in `dcli -g cell_group -l root cellcli -e list griddisk | awk -F":" '{print $2}' | awk '{print $1}'`; do echo "cellcli -e alter griddisk $i name=$i"; done | grep -v DBFS | sed -e "s/RECO01/RECOC1/2" -e "s/DATA01/DATAC1/2"

You'll get a long list of cellcli commands, 12 for each cell, which you need to run locally on each cell.
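
Each generated line renames one grid disk in place; a typical one looks like this (the grid disk name is just an example):

cellcli -e alter griddisk DATA01_CD_00_exa01cel01 name=DATAC1_CD_00_exa01cel01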

Once it's done, start GI again and make sure all disk groups are mounted successfully:

[root@exa01db01 ~]# /u01/app/12.1.0.2/grid/bin/crsctl start cluster

Rename ASM disk groups and ASM disk names

Next is to rename the disk groups. To do so, they must be dismounted on ALL cluster nodes before running renamedg on a disk group. Connect to each ASM instance and dismount the disk groups:

SQL> alter diskgroup datac1 dismount;

Diskgroup altered.

SQL> alter diskgroup recoc1 dismount;

Diskgroup altered.

At this point you can run renamedg to rename the disk groups; here is an example renaming DATA01 to DATAC1:

[oracle@exa01db01 ~]$ renamedg -dgname DATA01 -newdgname DATAC1

Parsing parameters..
renamedg operation: -dgname DATA01 -newdgname DATAC1
Executing phase 1
Discovering the group
Checking for hearbeat...
Re-discovering the group
Generating configuration file..
Completed phase 1
Executing phase 2
Completed phase 2

Do the same for RECO01 and after that make sure that both disk groups can be mounted successfully on all database nodes, then dismount them again so you can rename the ASM disks. In general there is a command to rename all the disks (ALTER DISKGROUP XX RENAME DISKS ALL), but it renames them to names of the form diskgroupname_####, where #### is the disk number. However, ASM disks follow a different naming convention on Exadata (e.g. RECO01_CD_01_EXA01CEL01) and that's why we need to rename them manually.

To rename the disks, the disk group has to be mounted in restricted mode (so only one node in the cluster can mount it). Then run the two statements below to generate the new ASM disk names:

SQL> alter diskgroup datac1 mount restricted;

Diskgroup altered.

SQL> select 'alter diskgroup datac1 rename disk ''' || name || ''' to ''' || REPLACE(name,'DATA01','DATAC1') || ''';' from v$asm_disk where name like 'DATA%';

SQL> select 'alter diskgroup recoc1 rename disk ''' || name || ''' to ''' || REPLACE(name,'RECO01','RECOC1') || ''';' from v$asm_disk where name like 'RECO%';

Execute the alter statements generated by the above two queries and mount both disk groups on all database nodes again.
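
For completeness, this is roughly what getting out of restricted mode and back to a normal mount looks like:

-- On the node that holds the restricted mount:
SQL> alter diskgroup datac1 dismount;
SQL> alter diskgroup recoc1 dismount;

-- Then on every database node:
SQL> alter diskgroup datac1 mount;
SQL> alter diskgroup recoc1 mount;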

There is no command to add the disk groups back to Clusterware; they will be added automatically the first time they are mounted. However, you need to remove the old disk group resources:

[oracle@exa01db01 ~]$ srvctl remove diskgroup -g DATA01
[oracle@exa01db01 ~]$ srvctl remove diskgroup -g RECO01

Modify all databases to point to the new disk groups

The last step is to change the datafiles, tempfiles and redo logs of all databases to point to the new disk groups. Make sure you disable block change tracking and flashback first, as the database might not open because the location of the block change tracking file has changed:

SQL> alter database disable block change tracking;
SQL> alter database flashback off;

Next, create a pfile from the spfile and substitute all occurrences of RECO01 and DATA01, or modify the spfile just before you shut the database down. Let's assume you have created a pfile; make sure all the parameters refer to the new disk group names. Here are the default ones you need to check, with a quick scan-and-fix sketch after the list:

*.control_files
*.db_create_file_dest
*.db_create_online_log_dest_1
*.db_create_online_log_dest_2
*.db_recovery_file_dest
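
A quick way to scan and fix the pfile, assuming it was created as /tmp/initdbm01.ora (the path is just an example):

[oracle@exa01db01 ~]$ grep -iE 'DATA01|RECO01' /tmp/initdbm01.ora
[oracle@exa01db01 ~]$ sed -i -e 's/DATA01/DATAC1/g' -e 's/RECO01/RECOC1/g' /tmp/initdbm01.ora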

Start the database in mount state and generate all the alter statements for datafiles/tempfiles and redologs:

[oracle@exa01db01 ~]$ sqlplus -s / as sysdba
set heading off
set echo off
set pagesize 140
set linesize 140
spool /tmp/rename.sql

select 'alter database rename file ''' || name || ''' to ''' || REPLACE(name,'DATA01','DATAC1') || ''';' from v$datafile;
select 'alter database rename file ''' || name || ''' to ''' || REPLACE(name,'DATA01','DATAC1') || ''';' from v$tempfile;
select 'alter database rename file ''' || member || ''' to ''' || REPLACE(member,'DATA01','DATAC1')|| ''';' from v$logfile where member like '%DATA%';
select 'alter database rename file ''' || member || ''' to ''' || REPLACE(member,'RECO01','RECOC1')|| ''';' from v$logfile where member like '%RECO%';
exit

Start another sqlplus session and run the file spooled by the above operation (rename.sql). At this point you can open the database (alter database open;). Once the database is open, make sure you enable block change tracking and flashback:

SQL> alter database enable block change tracking;
SQL> alter database flashback on;

Finally change the database dependencies and spfile location:

For 12c databases:

[oracle@exa01db01 dbs]$ srvctl modify database -d dbm01 -nodiskgroup
[oracle@exa01db01 dbs]$ srvctl modify database -d dbm01 -diskgroup "DATAC1,RECOC1"
[oracle@exa01db01 dbs]$ srvctl modify database -d dbm01 -spfile +DATAC1/DBM01/spfiledbm01.ora

For 11g databases:

[oracle@exa01db01 dbs]$ srvctl modify database -d dbm01 -z
[oracle@exa01db01 dbs]$ srvctl modify database -d dbm01 -x "DATAC1,RECOC1"
[oracle@exa01db01 dbs]$ srvctl modify database -d dbm01 -p +DATAC1/DBM01/spfiledbm01.ora
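
A quick sanity check that the database now points to the new disk groups and spfile (output format varies slightly by release):

[oracle@exa01db01 dbs]$ srvctl config database -d dbm01 | grep -iE 'spfile|disk group'
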
Categories: oracle Tags:

How to move OEM12c management agent to new location

October 29th, 2015 2 comments

While working on another Exadata project recently, I found that the OEM12c agents on the compute nodes were installed in different locations on each of the three Exadatas. On one of them they were under /home/oracle/agent, another one had them under /opt/oracle/agent and the third one had them under /oracle/agent. Obviously this was not the standard and the agents had to be moved under /u01/app/oracle/agent. The only problem was that the three Exadatas were already discovered, along with some database targets. Fortunately this wasn't production yet, but it would still require all the agents to be reinstalled and all targets rediscovered.

Fortunately there is an easier way to move the OEM management agents to a new location without all the hassle of reinstalling the agents and rediscovering the targets. In the following example the agent was installed in /home/oracle/agent/ and I had to move it to /u01/app/oracle/agent/.

First you need to download the ConvertToStandalone.pl utility from MOS note 2021782.1 and then upload it to the server under /home/oracle.

You need to create a list of plugins, otherwise the move process will fail:

[oracle@exa01db01 ~]$ /home/oracle/agent/core/12.1.0.5.0/perl/bin/perl /home/oracle/agent/core/12.1.0.5.0/sysman/install/create_plugin_list.pl -instancehome /home/oracle/agent/core/12.1.0.5.0

This will create a file /home/oracle/agent/plugins.txt which is used by the perl script later.

Export the following variables:

export OLD_AGENT_HOME=/home/oracle/agent/core/12.1.0.5.0
export ORACLE_HOME=/u01/app/oracle/agent/core/12.1.0.5.0

Another thing you need to do is modify SBIN_MODIFIED_VERSION from 12.1.0.4.0 to 12.1.0.5.0 in /home/oracle/agent/agentimage.properties, otherwise the process will fail.

Then run the perl script which will migrate the agent home to the new location:

[oracle@exa01db01 ~]$ /home/oracle/agent/core/12.1.0.5.0/perl/bin/perl /home/oracle/ConvertToStandalone.pl -instanceHome /home/oracle/agent/agent_inst -newAgentBaseDir /u01/app/oracle/agent

Pay attention that the script accepts two arguments: instanceHome is the agent instance home directory (e.g. /home/oracle/agent/agent_inst/) and newAgentBaseDir is the new base directory for the agent (/u01/app/oracle/agent/).

After the command completes you need to run root.sh as root:

[oracle@exa01db01 ~]# /u01/app/oracle/agent/core/12.1.0.5.0/root.sh
Finished product-specific root actions.
/etc exists

Deinstall the old agent:

[oracle@exa01db01 ~]$ /home/oracle/agent/core/12.1.0.5.0/perl/bin/perl /home/oracle/agent/core/12.1.0.5.0/sysman/install/AgentDeinstall.pl -agentHome /home/oracle/agent/core/12.1.0.5.0

Finally remove the old agent directory where a log file from the deinstall process is left:

[oracle@exa01db01 ~]$ rm -rf /home/oracle/agent
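
Finally, it's worth checking that the relocated agent is up and still talking to the OMS; assuming the new instance home ended up under /u01/app/oracle/agent/agent_inst, something like this:

[oracle@exa01db01 ~]$ /u01/app/oracle/agent/agent_inst/bin/emctl status agent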

The beauty of this process is that the script creates a node-level blackout (AGT_CNT_BLK_OUT) and then stops the agent. It then migrates the agent to the new home, starts the agent and finally removes the blackout. The whole process takes less than five minutes.

Categories: oracle Tags:

Introducing Oracle ASM Filter Driver

October 27th, 2015 3 comments

The Oracle ASMFD (ASM Filter Driver) was introduced in Oracle Database 12.1.0.2 and at the moment it is available on Linux systems only.

Oracle ASM Filter Driver is a kernel module, very much like ASMLIB, that resides in the I/O path of the Oracle ASM disks. It provides an interface between the Oracle binaries and the underlying operating environment.

Here are some of the features of ASMFD:

  • Reject non-Oracle I/O

The ASM filter driver rejects write I/O operations issued by non-Oracle commands. This prevents non-Oracle applications from writing to ASM disks and protects ASM from accidental corruption.

  • Device name persistence

Similarly to ASMLIB, you don't have to configure device name persistence using UDEV.

  • Faster node recovery

According to the documentation, ASMFD allows Oracle Clusterware to perform node-level fencing without a reboot. So if CSS is not running or a node is fenced, the Oracle stack will be restarted instead of the node being rebooted. This greatly reduces recovery time, as some enterprise servers might take up to 10 minutes to boot.

  • Reduce OS resource usage

ASMFD exposes a portal device that can be used for all I/O on a particular host, thus decreasing the number of open file descriptors. Without it, each ASM process needs an open file descriptor for each ASM disk. I'm not sure how much this will save you, but it might be useful if you have hundreds of ASM disks.

  • Thin Provisioning & Data Integrity

This is another new and cool feature which is very popular in the virtualization world. When enabled, disk space that is no longer in use can be returned to the array, also known as thin provisioning. This attribute can be set only if the ASM compatibility is greater than or equal to 12.1.0.0 and it requires ASMFD!
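
For reference, enabling it is a single disk group attribute; a sketch (the disk group name is just an example and COMPATIBLE.ASM must already be 12.1 or higher):

SQL> alter diskgroup data01 set attribute 'thin_provisioned'='TRUE';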

In a way ASMFD is a replacement for ASMLIB, as it includes the base ASMLIB features. However, ASMFD takes it one step further by protecting the ASM disks from non-Oracle write I/O operations to prevent accidental damage. Unlike ASMLIB, ASMFD is installed as part of the Oracle Grid Infrastructure installation.

Brief history of ASM and the need for ASM Filter Driver

To understand ASMFD better we need to understand where the need comes from. It's important to say that this is very specific to Linux, as other platforms have other methods to fulfill the same requirements. Because that's not the main purpose of this post, and it's rather long, I decided to keep it at the end of the post.

In Linux, as on any other platform, there is user separation, which implies access restrictions. In Linux we usually install Oracle Database under the oracle user, and to do so we need writable access to the directories we plan to use. By default that would be /home/oracle/ and, as you can imagine, that's not very handy; you might also want to install the database in a separate partition or file system. For this reason the root user creates the required directories, usually /u01 or /opt, and changes their ownership to oracle.

That would work if you want to store your database files in a file system. However, traditional file systems were not designed for database files: they need a file system check on a regular basis and sometimes they might get corrupted. For that reason, and for performance, many people moved to raw devices in the past. Another case is if you want to run RAC: you need either a cluster file system or raw devices.

Historically, with 9i and 10g, we used to create raw devices, which are a one-to-one mapping between a device file and a logical name. For example, you would create a partition on each device (/dev/sda1, /dev/sdb1) and then map those to /dev/raw/raw1, /dev/raw/raw2 and so on. Additionally, because in Linux the device files are rebuilt each time the system boots, you need to make sure the permissions and ownership are preserved and persist after a reboot. This was achieved with additional rules in the scripts run last during boot (often rc.local). On other platforms, HP-UX for example, one had to buy an additional license (HP Serviceguard Extension for RAC) which gave you the ability to have shared LVM groups across two or more servers.

However, the support and maintenance of raw devices was really difficult, and Oracle came up with the idea of creating its own volume manager to simplify database administration and eliminate the need to manage thousands of database files: Automatic Storage Management, or ASM for short. A simple description is that ASM is a very sophisticated volume manager for Oracle data. ASM can also be used if you deploy RAC, so you don't need cluster file systems or raw devices anymore. Additionally, it provides redundancy, so if you have JBOD storage you can use ASM to mirror the data. Another important feature is that you don't need persistent device naming anymore: upon startup ASM reads all the disk devices specified by asm_diskstring and uses the ones on which an ASM header is found. Although ASM was released in 10.1, people were still using raw devices at the time because ASM was too new and unknown to many DBAs.

So ASM logically groups all the disks (LUNs presented from the storage) into what are called ASM disk groups, and because it uses Oracle Managed Files you don't really care anymore where your files are or what their names are. ASM is just another abstraction layer in the database file storage. ASM is available on all platforms, so in a way it standardizes the administration of database files. Often the DBAs will also administer ASM, but it could be the storage team managing it. You still have to make sure the device files have the correct permissions before ASM can use them, otherwise no disk group will be available and hence the database cannot start.

At the same time, back in 2004, Oracle released another product, ASMLib, whose only purpose was to persist the device naming and preserve the device file permissions. I don't want to go into details about ASMLib here, but there is an old and very good post on ASMLib from Wim Coekaerts (HERE). Just to mention that ASMLib is also available under RHEL; more can be found HERE.

In recent years many people, like myself, have used UDEV to persist the permissions and ownership of the device files used by ASM. I really like to have a one-to-one match between device files and ASM disk names for better understanding and to ease any future troubleshooting.
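
For illustration, a UDEV rule of that kind might look something like this (the WWID, symlink name and group are made up for the example):

# /etc/udev/rules.d/99-oracle-asmdevices.rules
KERNEL=="sd?1", ENV{ID_SERIAL}=="36000c29f0123456789abcdef0000001", SYMLINK+="oracleasm/data01", OWNER="oracle", GROUP="dba", MODE="0660"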

ASM Filter Driver takes this one step further by introducing the features above. I can see people starting to use ASMFD to take advantage of thin provisioning or to make sure no one overwrites the ASM device files by mistake. Yes, this happens, and it happened to me recently.

Categories: oracle Tags:

Database system target in pending status for standby database in OEM 12c

October 6th, 2015 No comments

That's not really a problem, but an annoying issue I had with OEM 12c. Once a standby database target is promoted, the database system for it shows a metric collection error or stays in status Pending.

The standby database doesn’t need its own system since it will join the primary database system. The solution is to associate the standby database with the primary system and then remove the standby database system.

For example, we've got primary and standby databases TESTDB_LON and TESTDB_RDG. Once promoted, the following targets are also created in OEM: TESTDB_LON_sys and TESTDB_RDG_sys.

The second one will always have status Pending:
Status Pending (Post Blackout)

The way to resolve that is to associate the standby database with the primary database system. I usually rename the primary database system as well to omit the location (LON and RDG):
– Go to the Targets -> Systems and choose the system you want to edit
– Then go to Database System -> Target Setup -> Edit system
– Rename the system name from TESTDB_LON_sys to TESTDB_sys
– Save changes
– Go to Database System again, Target Setup -> Edit system
– Click next to go to Step 2
– Add the standby database to the Standby Database Associations table
– Save changes

At this point we’ve got one system TESTDB_sys with two database members TESTDB_LON and TESTDB_RDG.

The next step is to remove the database system of the standby using emcli:

[oracle@oem12c ~]$ /opt/app/oracle/em12cr4/middleware/oms/bin/emcli login -username=sysman
Enter password :
Login successful

[oracle@oem12c ~]$ /opt/app/oracle/em12cr4/middleware/oms/bin/emcli delete_target -name="TESTDB_RDG_sys" -type="oracle_dbsys"
Target "TESTDB_RDG_sys:oracle_dbsys" deleted successfully
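
To double-check that only the merged system is left, you can list the remaining database system targets (a sketch):

[oracle@oem12c ~]$ /opt/app/oracle/em12cr4/middleware/oms/bin/emcli get_targets -targets="oracle_dbsys"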

Now it’s all sorted and hopefully all targets are “green”.

Categories: oracle Tags:

Exadata X5 PDU – CLI already in use

September 18th, 2015 No comments

Exadata X5-2 and X4-8B racks are delivered with the “Enhanced” PDU metering units connected via the Cisco switch. Although the documentation says they should have static addresses, they don't. You need to configure them manually using a serial console connection; this is described in my earlier post here.

However, if you forget to exit the serial console connection to the PDU and then try to log in over SSH later, you'll get the following message:

login as: admin
admin@192.168.1.10's password:

CLI already in use!!!
Please try again later .....

Then someone has to go all the way to the data centre and reset the PDU or exit from the serial console.

Categories: oracle Tags:

Start of ‘ora.crf’ failed after update to 12.1.0.2 DBBP7

July 25th, 2015 2 comments

This happened to me a month ago right after I applied DBBP7 on 12.1.0.2. For some reason the ora.crf resource didn’t start automatically:

CRS-5013: Agent "ORAROOTAGENT" failed to start process "/u01/app/12.1.0.2/grid/bin/osysmond" for action "start": details at "(:CLSN00008:)" in "/u01/app/oracle/diag/crs/exa01db01/crs/trace/ohasd_orarootagent_root.trc"
CRS-2674: Start of 'ora.crf' on 'exa01db01' failed

Checking the trace file for more details, you can immediately spot where the problem is:

2015-06-04 10:35:51.156513 :CLSDYNAM:3286230784: [ ora.crf]{0:0:8275} [start] (:CLSN00008:)Utils:execCmd scls_process_spawn() failed 1
2015-06-04 10:35:51.156520 :CLSDYNAM:3286230784: [ ora.crf]{0:0:8275} [start] (:CLSN00008:) category: -1, operation: fail, loc: canexec2, OS error: 0, other: no exe permission, file [/u01/app/12.1.0.2/grid/bin/osysmond]

Indeed, osysmond is owned by the oracle user, whereas it should be owned by root:

[root@exa01db01 ~]# ls -al /u01/app/12.1.0.2/grid/bin/osysmond
-rwxr-x--- 1 oracle oinstall 9441 Jun  4 10:42 /u01/app/12.1.0.2/grid/bin/osysmond

The fix for that is simple – you need to unlock and then lock GI again:

[root@exa01db01 ~]# /u01/app/12.1.0.2/grid/crs/install/rootcrs.pl -unlock
[root@exa01db01 ~]# /u01/app/12.1.0.2/grid/crs/install/rootcrs.pl -patch

Now osysmond has the correct permissions and the ora.crf resource starts successfully:

[root@exa01db01 ~]# ls -al /u01/app/12.1.0.2/grid/bin/osysmond
-rwxr-x--- 1 root oinstall 9533 Jun  4 10:48 /u01/app/12.1.0.2/grid/bin/osysmond
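
A quick check that the resource came back online (ora.crf is an OHASD resource, hence the -init flag):

[root@exa01db01 ~]# /u01/app/12.1.0.2/grid/bin/crsctl stat res ora.crf -init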

 

Categories: oracle Tags:

Exadata’s onecommand fails to validate NTP servers on storage servers

July 6th, 2015 No comments

This will be a simple and short post on an issue I had recently. I got the following error while running the first step of onecommand, Validate Configuration File:

2015-07-01 12:31:03,712 [INFO  ][    main][     ValidationUtils:761] SUCCESS: NTP servers on machine exa01db02.local.net verified successfully
2015-07-01 12:31:03,713 [INFO  ][    main][     ValidationUtils:761] SUCCESS: NTP servers on machine exa01db01.local.net verified successfully
2015-07-01 12:31:03,714 [INFO  ][    main][     ValidationUtils:778] Following errors were found...
2015-07-01 12:31:03,714 [INFO  ][    main][     ValidationUtils:783] ERROR: Encountered error while running NTP validation error on host: exa01cel03.local.net
2015-07-01 12:31:03,714 [INFO  ][    main][     ValidationUtils:783] ERROR: Encountered error while running NTP validation error on host: exa01cel02.local.net
2015-07-01 12:31:03,714 [INFO  ][    main][     ValidationUtils:783] ERROR: Encountered error while running NTP validation error on host: exa01cel01.local.net

Right, so my NTP servers were accessible from the database nodes but not from the cells. When I queried the NTP server from the cells I got the following error:

# ntpdate -dv ntpserver1
1 Jul 09:00:09 ntpdate[22116]: ntpdate 4.2.6p5@1.2349-o Fri Feb 27 14:50:33 UTC 2015 (1)
Looking for host ntpserver1 and service ntp
host found : ntpserver1.local.net
transmit(172.16.1.100)
transmit(172.16.1.100)
transmit(172.16.1.100)
transmit(172.16.1.100)
transmit(172.16.1.100)
172.16.1.100: Server dropped: no data
server 172.16.1.100, port 123

Perhaps I should have mentioned that the cells have their own firewall (cellwall), which only allows certain inbound/outbound traffic. During boot, a script builds all the rules dynamically and applies them. Now, the above error occurred for two reasons:

A) The NTP servers were specified using hostnames instead of IP addresses in OEDA
B) The management network was NOT available after the initial config (applyElasticConfig) was applied

Because of that, cellwall was not able to resolve the NTP servers' IP addresses and thus they were omitted from the firewall configuration. You can safely proceed with the deployment, but if you want to get rid of the annoying message the solution is simply to restart the cell firewall with /etc/init.d/cellwall restart.
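
If you want to do it on all cells at once and then re-test the NTP servers, something like this should do (assuming the usual cell_group file):

[root@exa01db01 ~]# dcli -g cell_group -l root "/etc/init.d/cellwall restart"
[root@exa01db01 ~]# dcli -g cell_group -l root "ntpdate -q ntpserver1"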

Categories: oracle, Uncategorized Tags: