applyElasticConfig.sh fails with Unable to locate any IB switches
With the release of Exadata X5, Oracle introduced elastic configurations and changed how the initial configuration is performed. Previously you had to run applyconfig.sh, which would go across the nodes and change all the settings according to your config. That script has since evolved into applyElasticConfig.sh, which is part of OEDA (onecommand). During a recent deployment I ran into the problem below:
[root@node8 linux-x64]# ./applyElasticConfig.sh -cf Customer-exa01.xml
Applying Elastic Config...
Applying Elastic configuration...
Searching Subnet 172.16.2.x..........
5 live IPs in 172.16.2.x.............
Exadata node found 172.16.2.46.
Collecting diagnostics...
Errors occurred. Send /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/Diag-150512_160716.zip to Oracle to receive assistance.
Exception in thread "main" java.lang.NullPointerException
    at oracle.onecommand.commandexec.utils.CommonUtils.getStackFromException(CommonUtils.java:1579)
    at oracle.onecommand.deploy.cliXml.ApplyElasticConfig.doDaApply(ApplyElasticConfig.java:105)
    at oracle.onecommand.deploy.cliXml.ApplyElasticConfig.main(ApplyElasticConfig.java:48)
Going through the logs we can see the following message:
2015-05-12 16:07:16,404 [FINE ][ main][ OcmdException:139] OcmdException from node node8.my.company.com return code = 2 output string: Unable to locate any IB switches... stack trace = java.lang.Throwable
The problem was caused by the IB switch names: the names in my OEDA XML file were different from the ones actually in the rack. More precisely, the IB switch hostnames were missing from the hosts file. So if you ever run into this problem, make sure your IB switch hosts file (/etc/hosts) has the correct hostname in the proper format:
#IP FQDN ALIAS
192.168.1.100 exa01ib01.local.net exa01ib01
Also make sure to reboot the IB switch after any change to the hosts file.
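If you want to check a hosts file for this before re-running the script, something like the following might help. It is only a rough sketch: check_hosts_entry is a hypothetical helper of my own, not part of OEDA, and it assumes the "IP FQDN ALIAS" layout shown above, with the FQDN starting with the short switch name.

```shell
#!/bin/sh
# check_hosts_entry FILE NAME  (hypothetical helper, not part of OEDA)
# Succeeds if FILE contains a non-comment line of the form
#   IP FQDN ALIAS
# where ALIAS equals NAME and FQDN starts with "NAME.".
check_hosts_entry() {
    hosts_file=$1
    short_name=$2
    awk -v name="$short_name" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        NF >= 3 && $3 == name && index($2, name ".") == 1 { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}

# Example call (the switch name is the one from the sample entry above):
#   check_hosts_entry /etc/hosts exa01ib01 || echo "no proper entry for exa01ib01"
```

The function only looks at the file's contents; it does not confirm that the name matches what is in the OEDA XML file, so that comparison still has to be done by hand.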