Archive for the ‘oracle’ Category

opatch fails with System Configuration Collection Failed

June 1st, 2015 No comments

I was recently upgrading an Exadata from DBBP6 to DBBP7 and, as usual, I went for the latest opatch version available at that time (Apr 2015).

After running opatchauto apply or opatchauto apply -analyze I got the following error:

System Configuration Collection failed: oracle.osysmodel.driver.sdk.productdriver.ProductDriverException: java.lang.NullPointerException
Exception in thread "main" java.lang.RuntimeException: Stream closed
Caused by: Stream closed
        ... 2 more

opatchauto failed with error code 1.

This was a known problem caused by opatch bugs that are fixed in a later opatch release. As that release was not yet available at the time, the workaround is to use a lower version of opatch, which can be downloaded from this note:
Opatchauto Gives “System Configuration Collection Failed” Message (Doc ID 2001933.1)

You might run into the same problem if you are applying a PSU.
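
For reference, here is a minimal sketch of how the workaround looks in practice, assuming the lower opatch version from the note has been downloaded and staged. The hostname, zip name, staging paths and patch directory below are examples, not taken from the original run:

[root@exa01db01 ~]# export ORACLE_HOME=/u01/app/12.1.0.2/grid        # the home being patched
[root@exa01db01 ~]# unzip -o /u01/stage/p6880880_Linux-x86-64.zip -d $ORACLE_HOME    # swap OPatch for the lower version
[root@exa01db01 ~]# $ORACLE_HOME/OPatch/opatch version                               # confirm the version now in use
[root@exa01db01 ~]# $ORACLE_HOME/OPatch/opatchauto apply /u01/stage/dbbp7 -analyze   # re-run the analyze step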

Categories: oracle Tags:

Speaking at UKOUG Systems Event and BGOUG

May 19th, 2015 No comments

I’m pleased to say that I will be speaking at the UKOUG Systems Event 2015, held at the Cavendish Conference Center in London on 20 May 2015. My session “Oracle Exadata Meets Elastic Configurations” starts at 10:15 in the Portland Suite. Here is the agenda of the UKOUG Systems Event.

In a month’s time I’ll also be speaking at the Spring Conference of the Bulgarian Oracle User Group. The conference will be held from 12th to 14th June 2015 at the Novotel hotel in Plovdiv, Bulgaria. I’ve got the conference opening slot at 11:00 in hall Moskva, and my session topic is “Oracle Data Guard Fast-Start Failover: Live Demo”. Here is the agenda of the conference.

I would like to thank E-DBA for making this happen!

Categories: oracle Tags:

applyElasticConfig.sh fails with Unable to locate any IB switches

May 15th, 2015 No comments

With the release of Exadata X5, Oracle introduced elastic configurations and changed how the initial configuration is performed. Previously you had to run a script that would go across the nodes and change all the settings according to your configuration. This script has now evolved into applyElasticConfig.sh, which is part of OEDA (onecommand). During one of my recent deployments I ran into the problem below:

[root@node8 linux-x64]# ./applyElasticConfig.sh -cf Customer-exa01.xml

Applying Elastic Config...
Applying Elastic configuration...
Searching Subnet 172.16.2.x..........
5 live IPs in 172.16.2.x.............
Exadata node found
Collecting diagnostics...
Errors occurred. Send /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/ to Oracle to receive assistance.
Exception in thread "main" java.lang.NullPointerException
at oracle.onecommand.commandexec.utils.CommonUtils.getStackFromException(
at oracle.onecommand.deploy.cliXml.ApplyElasticConfig.doDaApply(
at oracle.onecommand.deploy.cliXml.ApplyElasticConfig.main(

Going through the logs we can see the following message:

2015-05-12 16:07:16,404 [FINE ][ main][ OcmdException:139] OcmdException from node return code = 2 output string: Unable to locate any IB switches... stack trace = java.lang.Throwable

The problem was caused by the IB switch names in my OEDA XML file being different from the ones physically in the rack; in fact, the IB switch hostnames were missing from the switches’ hosts file. So if you ever run into this problem, make sure the IB switch hosts file (/etc/hosts) has the correct hostname in the proper format:

#IP                 FQDN                      ALIAS
<switch IP>         <switch FQDN>             exa01ib01

Also make sure to reboot the IB switch after any change to its hosts file.
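
A quick sanity check from one of the compute nodes could look like this (the switch name is only an example; use the names from your OEDA XML file):

[root@node8 ~]# ssh root@exa01ib01 hostname
[root@node8 ~]# ssh root@exa01ib01 cat /etc/hosts
[root@node8 ~]# ssh root@exa01ib01 reboot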

Categories: oracle Tags:

How to configure Link Aggregation Control Protocol on Exadata

May 13th, 2015 2 comments

During a recent X5 installation I had to configure Link Aggregation Control Protocol (LACP) on the client network of the compute nodes. Although the ports were running at 10Gbit and the default active/passive configuration works perfectly fine, the customer wanted an even distribution of traffic and workload across their core switches.

Link Aggregation Control Protocol (LACP), also known as 802.3ad, is a method of combining multiple physical network connections into one logical connection to increase throughput and provide redundancy in case one of the links fails. The protocol requires both the server and the switch(es) to have matching settings for LACP to work properly.

To configure LACP on Exadata you need to change the bondeth0 parameters.

On each of the compute nodes open the following file:

/etc/sysconfig/network-scripts/ifcfg-bondeth0

and replace the line starting with BONDING_OPTS with this one:

BONDING_OPTS="mode=802.3ad xmit_hash_policy=layer3+4 miimon=100 downdelay=200 updelay=5000 num_grat_arp=100"

and then restart the network interface:

ifdown bondeth0
ifup bondeth0
Determining if ip address is already in use for device bondeth0...
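
For reference, this is roughly what the complete ifcfg-bondeth0 could look like with LACP enabled. The addressing below is made up for illustration; the only line that actually changes for LACP is BONDING_OPTS:

DEVICE=bondeth0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=802.3ad xmit_hash_policy=layer3+4 miimon=100 downdelay=200 updelay=5000 num_grat_arp=100"
IPADDR=10.10.10.10
NETMASK=255.255.255.0
GATEWAY=10.10.10.1
IPV6INIT=no
HOTPLUG=no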

You can check the status of the interface by querying the proc filesystem. Make sure both slave interfaces are up and running at the same speed. The essential part confirming that LACP is working is shown below:

cat /proc/net/bonding/bondeth0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 33
Partner Key: 34627
Partner Mac Address: 00:23:04:ee:be:c8
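
To double-check that both slaves are up and have negotiated the same speed you can also query them directly. The interface names below are an assumption; check first which slaves belong to bondeth0:

[root@node8 ~]# grep -A1 "Slave Interface" /proc/net/bonding/bondeth0
[root@node8 ~]# ethtool eth4 | grep -E "Speed|Link detected"
[root@node8 ~]# ethtool eth5 | grep -E "Speed|Link detected"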

I had a problem where the client network did NOT come up after a server reboot. This was happening because during system boot the 10Gbit interfaces go through multiple resets, causing very fast link state changes. Here is the status of the bond at that time:

cat /proc/net/bonding/bondeth0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
bond bondeth0 has no active aggregator

The solution for that was to decrease downdelay to 200, which is already reflected in the BONDING_OPTS line above. The issue is described in this note:
Bonding Mode 802.3ad Using 10Gbps Network – Slave NICs Fail to Come Up Consistently after Reboot (Doc ID 1621754.1)


Categories: linux, oracle Tags:

Smart firewalls

April 15th, 2015 No comments

It’s been a while since my last post, but I’ve been really busy working on a number of projects.

The purpose of this post is to highlight an issue I had while building a standby database. The environment consisted of three primary databases on host A, which were restored from backup onto another host B to serve as standbys; both hosts were running Linux. It’s important to mention that the two hosts were located in different data centers.

Once a standby database was mounted we would start shipping archive log files from the primary, without adding it to the Data Guard Broker configuration at that point. We wanted to touch production as little as possible and would add the database to the broker configuration only just before the switchover. In the meantime we would manually recover the standby database so that the apply lag would already be small by the time it was added to the broker configuration. This approach worked fine for two of the databases, but for the third one we got this error:

Fri Mar 13 13:33:43 2015
RFS[33]: Assigned to RFS process 29043
RFS[33]: Opened log for thread 1 sequence 29200 dbid -707326650 branch 806518278
CORRUPTION DETECTED: In redo blocks starting at block 20481 count 2048 for thread 1 sequence 29200
Deleted Oracle managed file +RECO01/testdb/archivelog/2015_03_13/thread_1_seq_29200.8481.874244023
RFS[33]: Possible network disconnect with primary database
Fri Mar 13 13:42:45 2015
Errors in file /u01/app/oracle/diag/rdbms/testdb/testdb/trace/testdb_rfs_31033.trc:
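
As an aside, before digging into the trace file: the redo shipping and manual recovery outside the broker were set up roughly as in the sketch below. The destination number, service name and exact syntax are illustrative rather than the exact commands used at the time:

[oracle@hostA ~]$ sqlplus / as sysdba <<EOF
alter system set log_archive_dest_2='SERVICE=testdb_stby ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=testdb_stby' scope=both;
alter system set log_archive_dest_state_2=ENABLE scope=both;
EOF

[oracle@hostB ~]$ sqlplus / as sysdba <<EOF
alter database recover managed standby database disconnect from session;
EOF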

Running through the trace file, the first thing I noticed was:

Corrupt redo block 5964 detected: BAD CHECKSUM

We already had two databases shipping from host A to host B, so we ruled out a firewall issue. We then tried a couple of other things – manually recovered the standby with an incremental backup, recreated the standby, cleared all the redo/standby log groups – but nothing helped. I found only one note in MOS with a similar symptom, for Streams in 10.2.

In the end the network admins were asked to check the firewall configuration one more time. There were two firewalls: one at host A’s location and another at host B’s.

It turned out that the firewall at host A’s location had SQL*Net class inspection enabled, which was causing the corruption. The logs were shipped successfully from the primary database once this firewall feature was disabled. The strange thing was that we hadn’t had any issues with the other two databases running on the same hosts – well, what can I say, smart firewalls.


Categories: oracle Tags:

RHEL6 udev and EMC PowerPath

January 26th, 2015 No comments

I’m working on an Oracle database migration project where the customer has chosen commodity x86 hardware with RHEL6 and EMC storage.

I’ve done many similar installations in the past and have always used the native Linux MPIO (DM-Multipath) to load balance and fail over I/O paths. This time, however, EMC PowerPath is doing the load balancing and failover and the native MPIO is disabled. From my point of view it doesn’t matter much whether I use /dev/emcpower* or /dev/mapper/* devices. PowerPath presumably has some advantages over the native MPIO, although I can’t speak to them yet. There is a good paper from EMC comparing PowerPath with the native MPIO in different operating systems.

As mentioned before, with EMC PowerPath the aggregated logical names (pseudo names) can be found under /dev/emcpowerX. I partitioned the disks with GPT tables and aligned the first partition to match the storage sector size. I also added the following line to the udev rules to make sure my devices would get the proper permissions:

ACTION=="add", KERNEL=="emcpowerr1", OWNER:="oracle", GROUP:="dba", MODE="0600"

I restarted the server, and later udev, to make sure the ownership and permissions were picked up correctly. Upon running asmca to create ASM with the first disk group, I got the following errors:

Configuring ASM failed with the following message:
One or more disk group(s) creation failed as below:
Disk Group DATA01 creation failed with the following message:
ORA-15018: diskgroup cannot be created
ORA-15031: disk specification '/dev/emcpowerr1' matches no disks
ORA-15025: could not open disk "/dev/emcpowerr1"
ORA-15056: additional error message

Well, that’s strange: I was sure the file had the correct permissions. However, listing the file proved that it didn’t. I repeated the process several times and always got the same result; you can use a simple touch command to reproduce it:

[root@testdb ~]# ls -al /dev/emcpowerr1
brw-rw---- 1 oracle dba 120, 241 Jan 23 12:35 /dev/emcpowerr1
[root@testdb ~]# touch /dev/emcpowerr1
[root@testdb ~]# ls -al /dev/emcpowerr1
brw-rw---- 1 root root 120, 241 Jan 23 12:35 /dev/emcpowerr1

Something was changing the ownership of the file and I didn’t know what. You’ll be no less surprised than I was to find that Linux has an auditing framework similar to the one in the Oracle database.

auditctl allows you to audit any file for any syscall run against it. In my case I wanted to know which process was changing the ownership and permissions of my device file. Another helpful command is ausyscall, which allows you to map syscall names to numbers. In other words, I wanted to know the chmod syscall number on a 64-bit platform (it does matter):

[root@testdb ~]# ausyscall x86_64 chmod --exact
90

Then I set up auditing for all chmod calls against the device file:

[root@testdb ~]# auditctl -a exit,always -F path=/dev/emcpowerr1 -F arch=b64 -S chmod
[root@testdb ~]# touch /dev/emcpowerr1
[root@testdb ~]# tail -f /var/log/audit/audit.log
type=SYSCALL msg=audit(1422016631.416:4208): arch=c000003e syscall=90 success=yes exit=0 a0=7f3cfbd36960 a1=61b0 a2=7fff5c59b830 a3=0 items=1 ppid=60056 pid=63212 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="udevd" exe="/sbin/udevd" key=(null)
type=CWD msg=audit(1422016631.416:4208):  cwd="/"
type=PATH msg=audit(1422016631.416:4208): item=0 name="/dev/emcpowerr1" inode=28418 dev=00:05 mode=060660 ouid=54321 ogid=54322 rdev=78:f1 nametype=NORMAL
[root@testdb ~]# auditctl -D
No rules

Gotcha! So it was udev changing the permissions, but why?

I spent half a day going through logs and tracing udev but couldn’t find anything.
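
In hindsight, udevadm test can help with this kind of investigation: it simulates a udev event for a device and prints the rules it would apply (a sketch, using the same device as above):

[root@testdb ~]# udevadm info --query=path --name=/dev/emcpowerr1
[root@testdb ~]# udevadm test --action=change $(udevadm info --query=path --name=/dev/emcpowerr1)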

At the end of the day I found a Red Hat article describing exactly the same problem. The solution was to have “add|change” in the ACTION directive instead of only “add”.

So here is the rule you need in order for udev to set persistent ownership/permissions on EMC PowerPath device files in RHEL 6:

[root@testdb ~]# cat /etc/udev/rules.d/99-oracle-asm.rules
ACTION=="add|change", KERNEL=="emcpowerr1", OWNER:="oracle", GROUP:="dba", MODE="0600"

Hope this helps and you don’t have to spend half a day on it as I did.


Categories: linux, oracle Tags:

runInstaller fails at CreateOUIProcess with permission denied

January 16th, 2015 No comments

Just a short post on a problem I encountered recently.

I had to install 11.2 GI, and right after launching the installer I got a message saying “permission denied”. Below is the exact error:

[oracle@testdb grid]$ ./runInstaller -silent -showProgress -waitforcompletion -responseFile /u01/software/grid/response/grid_install_20140114.rsp
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 120 MB.   Actual 7507 MB    Passed
Checking swap space: must be greater than 150 MB.   Actual 8191 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2015-01-15_12-12-20PM. Please wait ...Error in CreateOUIProcess(): 13
: Permission denied

Quickly tracing the process, I could see that it fails to execute the Java installer:

27316 execve("/tmp/OraInstall2015-01-15_12-05-40PM/jdk/jre/bin/java", ["/tmp/OraInstall2015-01-15_12-05-"..., "-Doracle.installer.library_loc=/"..., "-Doracle.installer.oui_loc=/tmp/"..., "-Doracle.installer.bootstrap=TRU"..., "-Doracle.installer.startup_locat"..., "-Doracle.installer.jre_loc=/tmp/"..., "-Doracle.installer.nlsEnabled=\"T"..., "-Doracle.installer.prereqConfigL"..., "-Doracle.installer.unixVersion=2"..., "-mx150m", "-cp", "/tmp/OraInstall2015-01-15_12-05-"..., ""..., "-scratchPath", "/tmp/OraInstall2015-01-15_12-05-"..., "-sourceLoc", ...], [/* 22 vars */]) = -1 EACCES (Permission denied)

I had never had this problem before. You might see similar behaviour with SELinux enabled, but that wasn’t the case here.

Then I remembered that while formatting a partition for /u01 and adding it to fstab, I had noticed that /tmp didn’t have the default mount options:

/dev/mapper/vglocal00-tmp00 /tmp                    ext4    defaults,noexec 1 2

Indeed, the noexec option will not let you execute binaries located on that partition. This server was built by a hosting provider, and I guess this was part of their default deployment process.
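
For reference, the corrected fstab entry would simply drop the noexec option (a sketch based on the line above):

/dev/mapper/vglocal00-tmp00 /tmp                    ext4    defaults        1 2

If /tmp has to stay noexec for policy reasons, another commonly used workaround is to point the installer at a different scratch area by setting the TMP and TMPDIR environment variables to a directory on an executable filesystem before launching runInstaller.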

After removing the option and remounting /tmp (mount -o remount /tmp), the installer was able to run successfully.

Categories: linux, oracle Tags:

2014 in review

January 12th, 2015 No comments

Happy New Year!

So many things happened in the past six months that I can’t quite tell how quickly the time passed. As people say, life is what happens to you while you are making plans for the future.

I wish I had had the time to blog more in the past year, and I plan to change that in the New Year!

2014 was really successful for me. I worked on some really interesting projects, configured my first Exadata, migrated a few more databases to Exadata, and faced some challenging problems. This year looks to be no different, and I’ve already started another interesting and challenging migration.

The same year I presented at Oracle Open World, for which I would like to thank Jason Arneil for the joint presentation and E-DBA for making it happen! At the same time, E-DBA was awarded the Oracle Excellence Award for Specialised Global Partner of the Year for Oracle Engineered Systems.

Last but not least, I was honoured with the Employee of the Year award last month. Again, thank you, E-DBA team!



Categories: oracle, personal Tags:

Speaking at Oracle Open World 2014

September 25th, 2014 No comments

I’m more than happy that I will be speaking at this year’s Oracle Open World. The first and only time I attended was back in 2010 and now I’m not only attending but speaking as well!

Together with Jason Arneil, I will talk about what we’ve learned from our Exadata implementations with two of the biggest UK retailers, so please join us:
Session ID: CON2224
Session Title: Oracle Exadata Migrations: Lessons Learned from Retail
Venue / Room: Moscone South – 310
Date and Time: 9/30/14, 15:45 – 16:30

I would like to thank E-DBA and especially Jason for making this happen!

I’m also planning to attend Oaktable World 2014 and the Oracle OpenWorld 2014 Bloggers Meetup for the best part of OOW – really technical sessions and networking!

See you there!



Categories: oracle Tags:

Speaking at BGOUG 2014 Spring conference

May 30th, 2014 No comments

I’ll be speaking at the spring conference of the BGOUG, held between the 13th and 15th of June. I was a regular attendee of the conference for eight years in a row, but since I moved to the UK I have had to skip the last two. My session is about Oracle GoldenGate – it will cover the basics, components, usage scenarios, installation and configuration, trail files and GoldenGate records, and more.

See you there in two weeks.



Categories: oracle Tags: