Archive for the ‘oracle’ Category

RMAN fails to allocate channel with Tivoli Storage Manager

February 6th, 2014

I was recently configuring backups on a customer's Exadata with IBM TSM Data Protection for Oracle and ran into a weird RMAN error. The configuration was Oracle Database 11.2, TSM client version 6.1, and TSM Server version 5.5, and this was the error:

[oracle@oraexa01 ~]$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Wed Jan 29 16:41:54 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: TESTDB (DBID=2128604199)

RMAN> run {
2> allocate channel c1 device type 'SBT_TAPE';
3> }

using target database control file instead of recovery catalog
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of allocate command on c1 channel at 01/29/2014 16:42:01
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27000: skgfqsbi: failed to initialize storage subsystem (SBT) layer
Linux-x86_64 Error: 106: Transport endpoint is already connected
Additional information: 7011
ORA-19511: Error received from media manager layer, error text:
SBT error = 7011, errno = 106, sbtopen: system error

You get this message because the Tivoli Storage Manager API error log file (the errorlogname option specified in the dsm.sys file) is not writable by the Oracle user.

Just change the file permissions, or change the option to point to a file under /<writable_path>/, and retry your backup:

[root@oraexa01 ~]# chmod a+w /usr/tivoli/tsm/client/ba/bin/dsmerror.log
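To catch this condition before RMAN hits it again (for example after the log is recreated by root), a small pre-flight check can be scripted. This is only a sketch; the dsm.sys location and the simple option parsing are assumptions, so adjust them to your TSM client installation:

```shell
#!/bin/sh
# Sketch: verify that the TSM API error log named by the 'errorlogname'
# option in dsm.sys is writable by the current (oracle) user.
check_tsm_errlog() {
    dsm_sys=$1
    # take the first errorlogname value (option name is case-insensitive)
    errlog=$(awk 'tolower($1) == "errorlogname" { print $2; exit }' "$dsm_sys" 2>/dev/null)
    if [ -z "$errlog" ]; then
        echo "no errorlogname option found in $dsm_sys"
    elif [ -w "$errlog" ]; then
        echo "OK: $errlog is writable"
    else
        echo "NOT writable: $errlog (expect ORA-27000 / SBT error 7011 from RMAN)"
    fi
}

# hypothetical dsm.sys path -- adjust to your installation:
check_tsm_errlog /usr/tivoli/tsm/client/api/bin64/dsm.sys
```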

This time RMAN allocates the channel successfully:

[oracle@oraexa01 ~]$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Wed Jan 29 16:42:52 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: TESTDB (DBID=2128604199)

RMAN> run {
2> allocate channel c1 device type 'SBT_TAPE';
3> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=807 instance=TESTDB device type=SBT_TAPE
channel c1: Data Protection for Oracle: version 5.5.1.0
released channel: c1
Categories: linux, oracle

Oracle GI 12.1 error when using NFS

January 16th, 2014

I had quite an interesting case recently where I had to build a stretched cluster for a customer using Oracle GI 12.1, placing the quorum voting disk on NFS. There is a document on OTN about stretched clusters and using NFS as a third location for the voting disk, but at the moment it covers 11.2 only. Assuming there is no difference in the NFS parameters, I used the Linux parameters from that document and mounted the NFS share on the cluster nodes.

Later on, when I tried to add the third voting disk within the ASM disk group, I got this strange error:

SQL> ALTER DISKGROUP OCRVOTE ADD  QUORUM DISK '/vote_nfs/vote_3rd' SIZE 10000M /* ASMCA */
Thu Nov 14 11:33:55 2013
NOTE: GroupBlock outside rolling migration privileged region
Thu Nov 14 11:33:55 2013
Errors in file /install/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_26408.trc:
ORA-17503: ksfdopn:3 Failed to open file /vote_nfs/vote_3rd
ORA-17500: ODM err:Operation not permitted
Thu Nov 14 11:33:55 2013
Errors in file /install/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_33427.trc:
ORA-17503: ksfdopn:3 Failed to open file /vote_nfs/vote_3rd
ORA-17500: ODM err:Operation not permitted
NOTE: Assigning number (1,3) to disk (/vote_nfs/vote_3rd)
NOTE: requesting all-instance membership refresh for group=1
Thu Nov 14 11:33:55 2013
ORA-15025: could not open disk "/vote_nfs/vote_3rd"
ORA-17503: ksfdopn:3 Failed to open file /vote_nfs/vote_3rd
ORA-17500: ODM err:Operation not permitted
WARNING: Read Failed. group:1 disk:3 AU:0 offset:0 size:4096
path:Unknown disk
incarnation:0xeada1488 asynchronous result:'I/O error'
subsys:Unknown library krq:0x7f715f012d50 bufp:0x7f715e95d600 osderr1:0x0 osderr2:0x0
IO elapsed time: 0 usec Time waited on I/O: 0 usec
NOTE: Disk OCRVOTE_0003 in mode 0x7f marked for de-assignment
Errors in file /install/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_33427.trc  (incident=83441):
ORA-00600: internal error code, arguments: [kfgscRevalidate_1], [1], [0], [], [], [], [], [], [], [], [], []
ORA-15080: synchronous I/O operation failed to read block 0 of disk 3 in disk group OCRVOTE

This happens because 12c uses Direct NFS by default, and Direct NFS initiates connections from ports above 1024. On the other hand, the NFS server has a default export option, secure, which requires incoming connections to originate from ports below 1024:

secure  This option requires that requests originate on an Internet port less than IPPORT_RESERVED (1024). This option is on by default. To turn it off, specify insecure.

The solution is to add the insecure option to the export on the NFS server, remount the NFS share, and then retry the operation above.
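For illustration, the change on the NFS server might look like the following. The export path and the other options here are assumptions; insecure is the only required addition:

```shell
# /etc/exports on the NFS server (hypothetical path and options):
#   before -- the implicit 'secure' default rejects source ports >= 1024:
# /export/vote_nfs  *(rw,sync,no_root_squash)
#   after -- 'insecure' lets Direct NFS connect from unprivileged ports:
# /export/vote_nfs  *(rw,sync,no_root_squash,insecure)
#
# then re-export on the server, and remount /vote_nfs on each cluster node:
# exportfs -ra
```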

For more information refer to:
12c GI Installation with ASM on NFS Disks Fails with ORA-15018 ORA-15072 ORA-15080 (Doc ID 1555356.1)

 

Categories: linux, oracle

EMDIAG Repvfy 12c kit – basics

October 30th, 2013

This is the second post in the emdiag repvfy series, covering the basics of the tool. With the kit installed in the earlier post, it's time to get some basics down before we start troubleshooting.

There are three main commands with repvfy:

  • verify – repository-wide verification
  • analyze – object-specific verification/analysis
  • dump – dump a specific repository object

As you can tell from the descriptions, verify runs against the whole repository and doesn't require any arguments by default, while the analyze and dump commands require a specific object. To get a list of all available commands of the kit, run repvfy -h1.

 

Verify command

Let's make something clear at the beginning: the verify command runs repository-wide verification using many tests, which are first grouped into modules and second categorized into several levels. To get a list of available modules, run repvfy -h4. There are more than 30 modules and I won't go into detail on each, but the most useful are Agents, Plugins, Exadata, Repository, and Targets. The list of levels can be found at the end of the post; it's important to note that levels are cumulative and by default tests are run at level 2!

When investigating or debugging a problem with the repository, always start with the verify command. It's a good starting point to run verify without any arguments: it will go through all modules, give you a summary of any problems (violations) present, and provide an initial look at the health of the repository before you start debugging a specific problem.

So here is how verify output looked for my OEM repository:

[oracle@oem bin]$ ./repvfy verify

Please enter the SYSMAN password:

-- --------------------------------------------------------------------- --
-- REPVFY: 2013.1008     Repository: 12.1.0.3.0     29-Oct-2013 11:30:37 --
---------------------------------------------------------------------------
-- Module:                                          Test:   0, Level: 2 --
-- --------------------------------------------------------------------- --

verifyAGENTS
verifyASLM
verifyAVAILABILITY
1002. Disabled response metrics (16570376): 2
verifyBLACKOUTS
verifyCAT
verifyCORE
verifyECM
1002. Unregistered ECM metadata tables: 2
verifyEMDIAG
verifyEVENTS
verifyEXADATA
2001. Exadata plugin version mismatches: 5
verifyJOBS
verifyJVMD
verifyLOADERS
verifyMETRICS
verifyNOTIFICATIONS
verifyOMS
verifyPLUGINS
1003. Plugin metadata versions out of sync: 13
verifyREPOSITORY
verifyTARGETS
1021. Composite metric calculation with inconsistent dependant metadata versions: 3
2004. Targets without an ORACLE_HOME association: 2
2007. Targets with unpromoted ORACLE_HOME target: 2
verifyUSERS

The verify command can also be run with the -detail argument to get more details on the problems found. It will also show which test found each problem and what actions can be taken to correct it. That's useful for another reason: it prints the target name and guid, which can be used for further analysis with the analyze and dump commands.

The command can also be run with the -level argument, starting at zero for fatal errors and increasing to nine for minor errors and best practices; the list of levels can be found at the end of the post.

 

Analyze command

The analyze command is run against a specific target, which can be specified either by its name or by its unique identifier (guid). To get a list of supported targets, run repvfy -h5. The analyze command is very similar to the verify command, except that it runs against a specific target. Again, it can be run with the -level and -detail arguments, like this:

[oracle@oem bin]$ ./repvfy analyze exadata -guid 6744EED794F4CCCDBA79EC00332F65D3 -level 9

Please enter the SYSMAN password:

-- --------------------------------------------------------------------- --
-- REPVFY: 2013.1008     Repository: 12.1.0.3.0     29-Oct-2013 12:00:09 --
---------------------------------------------------------------------------
-- Module: EXADATA                                  Test:   0, Level: 9 --
-- --------------------------------------------------------------------- --

analyzeEXADATA
2001. Exadata plugin version mismatches: 1
6002. Exadata components without a backup Agent: 4
6006. Check for DB_LOST_WRITE_PROTECT: 1
6008. Check for redundant control files: 5

For that Exadata target we can see that level 9 found a few more problems in addition to the plugin version mismatch found earlier at level 2.

One of the next posts will be dedicated to troubleshooting and fixing problems in Exadata module.

 

Dump command

The dump command is used to dump all the information about a specific repository object; like the analyze command, it expects either a target name or a target guid. For a list of supported targets, run repvfy -h6.

I won't show an example because it would dump all the details about the target – more than 2000 lines. If you run the dump command against the same target used in analyze, you will get a ton of information: targets associated with this Exadata (hosts, ILOMs, databases, instances), the list of monitoring agents, plugin versions, some address details, and a long list of target alerts/warnings.

It may seem rather useless because it just dumps a lot of information, but it actually helped me identify the problem I had with the plugin version mismatch in the Exadata module.

 

Repository verification and object analysis levels:

0  - Fatal issues (functional breakdown)
 These tests highlight fatal errors found in the repository. These errors will prevent EM from functioning
 normally and should be addressed straight away.

1  - Critical issues (functionality blocked)

2  - Severe issues (restricted functionality)

3  - Warning issues (potential functional issue)
 These tests are meant as 'warning', to highlight issues which could lead to potential problems.

4 - Informational issues (potential functional issue)
 These tests are informational only. They represent best practices, potential issues, or just areas to verify.

5 - Currently not used

6  - Best practice violations
 These tests highlight discrepancies between the known best practices and the actual implementation
 of the EM environment.

7  - Purging issues (obsolete data)
 These tests highlight failures to clean up (all of the) historical data, or problems with orphan data
 left behind in the repository.

8  - Failure Reports (historical failures)
 These tests highlight historical errors that have occurred.

9  - Tests and internal verifications
 These tests are internal tests, or temporary and diagnostics tests added to resolve specific problems.
 They are not part of the 'regular' kit, and are usually added while debugging or testing specific issues.

 

In the next post I’ll troubleshoot and fix the errors I had within the Availability module – Disabled response metrics.

 

For more information and examples refer to following notes:
EMDIAG Repvfy 12c Kit – How to Use the Repvfy 12c kit (Doc ID 1427365.1)
EMDIAG REPVFY Kit – Overview (Doc ID 421638.1)

 

EMDIAG repvfy blog series:
  • EMDIAG Repvfy 12c kit – installation
  • EMDIAG Repvfy 12c kit – basics
  • EMDIAG Repvfy 12c kit – troubleshoot Availability module
  • EMDIAG Repvfy 12c kit – troubleshoot Exadata module
  • EMDIAG Repvfy 12c kit – troubleshoot Plugins module
  • EMDIAG Repvfy 12c kit – troubleshoot Targets module

 

Categories: oracle

Why is my EM12c giving Metric evaluation errors for Exadata cell targets?

October 25th, 2013

As part of my Cloud Control journey I encountered a strange problem where I got the following error for a few Exadata Storage Server (cell) targets:

Metric evaluation error start - oracle.sysman.emSDK.agent.fetchlet.exception.FetchletException: em_error=Failed to execute_exadata_response.pl ssh -q -o ConnectTimeout=60 -o BatchMode=yes -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -i /home/oracle/.ssh/id_dsa -l cellmonitor 10.141.8.68 cellcli -xml -e ' list cell attributes msStatus ':

Another symptom was that I received two emails from OEM, one saying that the cell and its services are up:

EM Event: Clear:exacel05.localhost.localdomain - exacel05.localhost.localdomain is Up. MS Status is RUNNING and Ping Status is SUCCESS.

and another one saying there is Metric evaluation error for the same target:

EM Event: Critical:exacel05.localhost.localdomain - Metric evaluation error start - oracle.sysman.emSDK.agent.fetchlet.exception.FetchletException: em_error=Failed to execute_exadata_response.pl ssh -q -o ConnectTimeout=60 -o BatchMode=yes -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -i /home/oracle/.ssh/id_dsa -l cellmonitor 10.141.8.68 cellcli -xml -e ' list cell attributes msStatus ':

I have to say that the error didn't come up by itself; it manifested after I had to redeploy the Exadata plugin on a few agents. If you've ever had to do this, you know that before removing the plugin from an agent you need to make sure the agent is not the primary monitoring agent for any Exadata targets. In my case a few of the agents were Monitoring Agents for the cells, and I had to swap them with the Backup Monitoring Agent so I would be able to redeploy the plugin on the primary monitoring agent.

After I redeployed the plugin, I tried to revert to the initial configuration, but for some reason the configuration got messed up and I ended up with different agents monitoring different cell targets from what was there at the beginning.

It turned out that one of the monitoring agents wasn't able to connect to the cell, and that's why I got the email notifications and the Metric evaluation errors for the cells. Although that's not a real problem, it's quite annoying to receive such alerts and to have all these targets showing Metric collection error icons in OEM, or reported with status Down.

Let's first check, from the OEM repository, which agents are monitoring that cell target:

SQL> select target_name, target_type, agent_name, agent_type, agent_is_master
from MGMT$AGENTS_MONITORING_TARGETS
where target_name = 'exacel05.localhost.localdomain';

TARGET_NAME                      TARGET_TYPE     AGENT_NAME                         AGENT_TYPE AGENT_IS_MASTER
-------------------------------- --------------- ---------------------------------- ---------- ---------------
exacel05.localhost.localdomain   oracle_exadata  exadb03.localhost.localdomain:3872 oracle_emd               0
exacel05.localhost.localdomain   oracle_exadata  exadb02.localhost.localdomain:3872 oracle_emd               1

Looking at the cell's secure log, we can see that one of the monitoring agents wasn't able to connect because of failed publickey authentication:

Oct 23 11:39:54 exacel05 sshd[465]: Connection from 10.141.8.65 port 14594
Oct 23 11:39:54 exacel05 sshd[465]: Failed publickey for cellmonitor from 10.141.8.65 port 14594 ssh2
Oct 23 11:39:54 exacel05 sshd[466]: Connection closed by 10.141.8.65
Oct 23 11:39:55 exacel05 sshd[467]: Connection from 10.141.8.66 port 27799
Oct 23 11:39:55 exacel05 sshd[467]: Found matching DSA key: cf:99:0a:37:1a:e5:84:dc:a8:8a:b9:6f:0c:fd:05:c5
Oct 23 11:39:55 exacel05 sshd[468]: Postponed publickey for cellmonitor from 10.141.8.66 port 27799 ssh2
Oct 23 11:39:55 exacel05 sshd[467]: Found matching DSA key: cf:99:0a:37:1a:e5:84:dc:a8:8a:b9:6f:0c:fd:05:c5
Oct 23 11:39:55 exacel05 sshd[467]: Accepted publickey for cellmonitor from 10.141.8.66 port 27799 ssh2
Oct 23 11:39:55 exacel05 sshd[467]: pam_unix(sshd:session): session opened for user cellmonitor by (uid=0)

That's confirmed by checking the ssh authorized_keys file, which also shows which monitoring agents were configured initially:

[root@exacel05 .ssh]# grep exadb /home/cellmonitor/.ssh/authorized_keys | cut -d = -f 2
oracle@exadb03.localhost.localdomain
oracle@exadb04.localhost.localdomain

Another way to check which monitoring agents were configured initially is to check the snmpSubscriber attribute for that specific cell:

[root@exacel05 ~]# cellcli -e list cell attributes snmpSubscriber
((host=exadb03.localhost.localdomain,port=3872,community=public),(host=exadb04.localhost.localdomain,port=3872,community=public))
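To compare that attribute with the repository view above, the agent host names can be extracted from the cellcli output. A small sketch, using the exact string shown above:

```shell
# Pull the host= entries out of the snmpSubscriber attribute string so they
# can be compared against the MGMT$AGENTS_MONITORING_TARGETS query output.
subs='((host=exadb03.localhost.localdomain,port=3872,community=public),(host=exadb04.localhost.localdomain,port=3872,community=public))'
echo "$subs" | grep -o 'host=[^,]*' | cut -d= -f2
# prints the two subscriber hosts, one per line
```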

It's obvious that exadb02 shouldn't be monitoring this target; it should be exadb04 instead. I believe that when I redeployed the Exadata plugin this agent was no longer eligible to monitor Exadata targets and was replaced with another one, but that's just a guess.

There are two solutions for that problem:

1. Move (relocate) target definition and monitoring to the correct agent:

I wasn't able to find a way to do that through the OEM Console, so I used emcli. Based on the MGMT$AGENTS_MONITORING_TARGETS query and the snmpSubscriber attribute, I was able to find which agent was configured initially and which had to be removed. Then I used emcli to relocate the target to the correct monitoring agent, the one which was configured initially:

[oracle@oem ~]$ emcli relocate_targets -src_agent=exadb02.localhost.localdomain:3872 -dest_agent=exadb04.localhost.localdomain:3872 -target_name=exacel05.localhost.localdomain -target_type=oracle_exadata -copy_from_src
Moved all targets from exadb02.localhost.localdomain:3872 to exadb04.localhost.localdomain:3872

2. Reconfigure the cell to use the new monitoring agent:

Add the current monitoring agent's ssh public key into the cell's authorized_keys file:

Place the oracle user's DSA public key (/home/oracle/.ssh/id_dsa.pub) from exadb02 into exacel05:/home/cellmonitor/.ssh/authorized_keys

and also change the cell's snmpSubscriber attribute:

[root@exacel05~]# cellcli -e "alter cell snmpSubscriber=((host='exadb03.localhost.localdomain',port=3872,community=public),(host='exadb02.localhost.localdomain',port=3872,community=public))"
Cell exacel05 successfully altered
[root@exacel05~]# cellcli -e list cell attributes snmpSubscriber
((host=exadb03.localhost.localdomain,port=3872,community=public),(host=exadb02.localhost.localdomain,port=3872,community=public))

After that, the status of the Exadata Storage Server (cell) target in OEM became Up, and the metrics were fine as well.

 

Categories: oracle

EMDIAG Repvfy 12c kit – installation

October 21st, 2013

Recently I've been doing a lot of OEM work for a customer and decided to tidy things up a little. The OEM version was 12cR2; it was used for monitoring a few Exadatas and had around 650 targets. I planned to upgrade OEM to 12cR3, upgrade all agents and plugins, promote any unpromoted targets, and delete any old/stale targets. At the end of the day I wanted an up-to-date version of OEM and up-to-date information about the monitored targets.

The upgrade to 12cR3 went easy and smooth (described here in detail), and the upgrade of agents and plugins went fairly easily as well. After some time everything was looking fine, but I wanted to be sure I hadn't missed anything, and going from one thing to another I found the EMDIAG Repository Verification utility. It's something I first heard Julian Dontcheff mention at one of the BGOUG conferences, and something I had always wanted to try.

So in this series of posts I will describe the installation of the emdiag kit and how I fixed some of the problems I found in my OEM repository.

What is the EMDIAG kit

Basically EMDIAG consists of three kits:
– Repository verification (repvfy) – extracts data from the repository and runs a series of tests against it to help diagnose problems with OEM
– Agent verification (agtvfy) – troubleshoots problems with OEM agents
– OMS verification (omsvfy) – troubleshoots problems with the Oracle Management Service (OMS)

In this and the following posts I'll be referring to the first one – the repository verification kit.

The EMDIAG repvfy kit consists of a set of tests run against the OEM repository to help the EM administrator troubleshoot, analyze, and resolve OEM-related problems.

The kit uses shell and Perl scripts as wrappers and a lot of SQL scripts to run the different tests and collect information from the repository.

It's good to know that the emdiag kit has been around for quite a long time and is also available for Grid Control 10g and 11g and DB Control 10g and 11g. These posts refer to the EMDIAG Repvfy 12c kit, which can be installed only in a Cloud Control Management Repository 12c. The repvfy 12c kit will also be included in RDA Release 4.30.

Apart from finding and resolving specific problems with the emdiag kit, it's good practice to run it at least once per week or month to check for any newly reported problems.

Installing EMDIAG repvfy kit

Installation is pretty simple and straightforward. First, download the EMDIAG repvfy 12c kit from the following MOS note:
EMDIAG REPVFY Kit for Cloud Control 12c – Download, Install/De-Install and Upgrade (Doc ID 1426973.1)

Just set your ORACLE_HOME to the database hosting the Cloud Control Management Repository and create a new directory, emdiag, where the tool will be unzipped:

[oracle@em ~]$ . oraenv
ORACLE_SID = [oracle] ? GRIDDB
The Oracle base for ORACLE_HOME=/opt/oracle/product/11.2.0/db_1 is /opt/oracle

[oracle@em ~]$ cd $ORACLE_HOME
[oracle@em db_1]$ mkdir emdiag
[oracle@em db_1]$ cd emdiag/
[oracle@em emdiag]$ unzip -q /tmp/repvfy12c20131008.zip
[oracle@em emdiag]$ cd bin
[oracle@em bin]$ ./repvfy install

Please enter the SYSMAN password:

...
...

COMPONENT            INFO
 -------------------- --------------------

EMDIAG Version       2013.1008
EMDIAG Edition       2
Repository Version   12.1.0.3.0
Database Version     11.2.0.3.0
Test Version         2013.1015
Repository Type      CENTRAL
Verify Tests         496
Object Tests         196
Deployment           SMALL

[oracle@em ~]$

And that's it: the emdiag kit for repository verification is now installed and we can start digging into the OEM repository. In the next post we'll cover the basics and the commands used for verification and diagnostics.

EMDIAG repvfy blog series:
  • EMDIAG Repvfy 12c kit – installation
  • EMDIAG Repvfy 12c kit – basics
  • EMDIAG Repvfy 12c kit – troubleshoot Availability module
  • EMDIAG Repvfy 12c kit – troubleshoot Exadata module
  • EMDIAG Repvfy 12c kit – troubleshoot Plugins module
  • EMDIAG Repvfy 12c kit – troubleshoot Targets module

 

Categories: oracle

Upgrade to Oracle Enterprise Manager Cloud Control 12c Release 3 (12.1.0.3)

August 15th, 2013

Just a quick wrap-up on the EM12cR3 upgrade. I have to say I was pleasantly surprised that everything went so smoothly. I didn't expect anything else, but with so many products and components in there I had a few concerns. The version I had was already upgraded to 12.1.0.2, so it was really easy for me to run the upgrade.

Things to watch out for:
– You need to already be running OEM version 12.1.0.2 to be able to upgrade to 12.1.0.3. If not, you must apply BP1 to your 12.1.0.1 installation and then patch to 12.1.0.3. Remember to patch the agents as well.
– The upgrade to 12.1.0.3 is an out-of-place upgrade, so you need to point to a new middleware home and you'll need an additional 15 GB for the installation.
– The installation takes between one and two hours to complete, depending on your machine's power.
– I didn't stop any of the agents during the upgrade.
– After the upgrade, all OMS components were started automatically.

Here is what I’ve done:
1. Definitely take a backup of the middleware home, and of the database as well. You don't want to end up removing the agents and reinstalling the OMS; I had 400+ targets and failure wasn't an option. For the middleware home I used a simple tar, and RMAN for the repository database.
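For the middleware home part, the tar backup can be as simple as the sketch below. The paths in the example are assumptions (adjust them to your installation), and the OMS should be stopped first:

```shell
# Sketch: cold-backup a directory tree with tar before the upgrade.
backup_home() {  # usage: backup_home <home_dir> <archive.tar.gz>
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# hypothetical paths -- adjust to your installation:
# backup_home /opt/oracle/Middleware /backup/mw_home_$(date +%Y%m%d).tar.gz
```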

2. Stop the OMS and other components:

cd $OMS_HOME/bin
./emctl stop oms -all

3. It's required that the EMKey be copied to the repository prior to the upgrade; if you miss that, the installer will kindly remind you. There is also a note in the documentation:

$OMS_HOME/bin/emctl config emkey -copy_to_repos_from_file -repos_host [repository_host] -repos_port [port] -repos_sid [sid] -repos_user [username] -emkey_file $OMS_HOME/sysman/config/emkey.ora

4. The only command I ran during the upgrade was a simple grant; everything else was done by the installer:

 SQL> grant execute on dbms_random to dbsnmp;

Grant succeeded.

5. Once the upgrade is complete:
– upgrade all the agents from the console:

Setup -> Manage Cloud Control -> Upgrade agents

– as a post-upgrade step, the old agent homes should be deleted:

Setup -> Manage Cloud Control -> Upgrade agents -> Post Agent Upgrade Tasks

– and secure the EMKey:

emctl config emkey -remove_from_repos

The upgrade guide is very useful; go through it before doing the upgrade.

Also have a look at New Features In Oracle Enterprise Manager Cloud Control 12.1.0.3, where the most notable item is support for Oracle Database 12c and its new features, though there are plenty of other new features and improvements as well.

Categories: oracle

Oracle EM auto discovery fails if the host is under blackout

August 5th, 2013

During an Exadata project I had to put things in order and tidy up the Enterprise Manager targets. I decided to discover all targets, promote the ones which were missing, and delete old/stale ones. I got a strange error and decided to share it in case someone else hits it. I'm running Oracle Enterprise Manager Cloud Control 12c Release 2.

When you try to run auto discovery from the console for a host, you almost immediately get the following error:

Run Discovery Now failed on host oraexa201.host.net: oracle.sysman.core.disc.common.AutoDiscoveryException: Unable to run on discovery on demand.RunCollection: exception occurred: oracle.sysman.gcagent.task.TaskPreExecuteCheckException: non-existent, broken, or not fully loaded target

When reviewing the agent log, the following exception can be seen:

tail $ORACLE_HOME/agent/sysman/log/gcagent.log

2013-07-16 11:26:06,229 [34:2C47351F] INFO - >>> Reporting exception: oracle.sysman.emSDK.agent.client.exception.NoSuchMetricException: the DiscoverNow metric does not exist for host target oraexa201.host.net (request id 1) <<<
 oracle.sysman.emSDK.agent.client.exception.NoSuchMetricException: the DiscoverNow metric does not exist for host target oraexa201.host.net

I got another error message during my second (test) run:

2013-08-02 15:20:47,155 [33:B3DBCC59] INFO - >>> Reporting response: RunCollectionResponse ([DiscoverTargets : host.oraexa201.host.net oracle.sysman.emSDK.agent.client.exception.RunCollectionItemException: Metric evaluation failed : RunCollection: exception occurred: oracle.sysman.gcagent.task.TaskPreExecuteCheckException: non-existent, broken, or not fully loaded target @ yyyy-MM-dd HH:mm:ss,SSS]) (request id 1) <<<

Although emctl status agent shows that the last successful heartbeat and upload are up to date, you still cannot discover targets on the host.

This is caused by the fact that the host is under BLACKOUT!

1. Through the console, end the blackout for that host:

Go to Setup -> Manage Cloud Control -> Agents, find the agent for which you experience the problem and click on it. You can then clearly see that the status of the
agent is "Under Blackout". Simply select the Agent drop-down menu -> Control -> End Blackout.

2. Using emcli: first log in, list the blackouts, and then stop the blackout:

[oracle@em ~]$ emcli login -username=sysman
Enter password

Login successful

[oracle@em ~]$ emcli get_blackouts
Name                                   Created By  Status   Status ID  Next Start           Duration  Reason                      Frequency  Repeat  Start Time           End Time             Previous End         TZ Region      TZ Offset

test_blackout                          SYSMAN      Started  4          2013-08-02 15:06:43  01:00     Hardware Patch/Maintenance  once       none    2013-08-02 15:06:43  2013-08-02 16:06:43  none                 Europe/London  +00:00

List the targets which are under the blackout, then stop it:

[oracle@em ~]$ emcli get_blackout_targets -name="test_blackout"
Target Name                                         Target Type      Status       Status ID
has_oraexa201.host.net                              has              In Blackout  1
oraexa201.host.net                                  host             In Blackout  1
TESTDB_TESTDB1                                      oracle_database  In Blackout  1
oraexa201.host.net:3872                             oracle_emd       In Blackout  1
Ora11g_gridinfrahome1_1_oraexa201                   oracle_home      In Blackout  1
OraDb11g_home1_2_oraexa201                          oracle_home      In Blackout  1
agent12c1_3_oraexa201                               oracle_home      In Blackout  1
sbin12c1_4_oraexa201                                oracle_home      In Blackout  1
LISTENER_oraexa201.host.net                         oracle_listener  In Blackout  1
+ASM_oraexa2-cluster                                osm_cluster      In Blackout  1
+ASM4_oraexa201.host.net                            osm_instance     In Blackout  1

[oracle@em ~]$ emcli stop_blackout -name="test_blackout"
Blackout "test_blackout" stopped successfully

And now when the discovery is run again:

Run Discovery Now – Completed Successfully

Initially, when I set the blackout, I was unable to reproduce the error, but I got it after restarting the EM agent.

 

22 Oct 2013 Update:

After the upgrade to 12cR3 (described here), a meaningful error is now raised when you try to discover targets during an agent blackout:

Run Discovery Now failed on host oraexa201.host.net: oracle.sysman.core.disc.common.AutoDiscoveryException: Unable to run on discovery on demand.the target is currently blacked out


Categories: oracle

Missing IPV6 address for local network interfaces causes service timeout

October 4th, 2012

A year ago I installed a WebLogic server and an Oracle database server for a customer. A few months later my colleagues asked me whether something had changed in the environment or something had happened at the data center, because they had started to see Java exceptions for an unknown hostname for a web service they were calling, and it was happening only from time to time. We checked the firewall rules, DNS and all the rest, but everything seemed to be working fine. The only workaround they came up with at the time was to add the hostname to the hosts file of the WebLogic server. These were the versions of the software:
Oracle Enterprise Linux 6.2 (64bit)
Weblogic Server 10.3.5
JDK 1.6.0_31

Then six months later the problem showed up again. The new version of the application was calling another web service, which obviously was missing from the hosts file, and this time I decided to investigate and find out what was really happening. After I received the email I immediately logged in to the server and fired several nslookup and ping requests at the host that was causing problems; both were successful and returned correct results. I double-checked the hosts file, the nsswitch.conf file and all the network settings; everything was correct. Meanwhile the WebLogic server log kept getting java.net.UnknownHostException for the very same host.

Obviously the problem required a different approach. I found a useful Java program that calls getByName and thereby simulates the application's behavior (the web service hostname has been intentionally changed). This is the program:

[root@srv tmp]# cd /tmp/
cat > DomainResolutionTest.java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.io.PrintWriter;
import java.io.StringWriter;

public class DomainResolutionTest {

    public static void main(String[] args) {
        if (args.length == 0) args = new String[] { "sve.to" };

        try {
            InetAddress ip = InetAddress.getByName(args[0]);
            System.out.println(ip.toString());
        } catch (UnknownHostException uhx) {
            System.out.println("ERROR: " + uhx.getMessage() + "\n" + getStackTrace(uhx));
            Throwable cause = uhx.getCause();
            if (cause != null) System.out.println("CAUSE: " + cause.getMessage());
        }
    }

    public static String getStackTrace(Throwable t)
    {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw, true);
        t.printStackTrace(pw);
        pw.flush();
        sw.flush();
        return sw.toString();
    }
}

Then just compile the program and run it:

[root@srv tmp]# javac DomainResolutionTest.java
[root@srv tmp]# java DomainResolutionTest
sve.to/95.154.250.125
[root@srv tmp]# java DomainResolutionTest
sve.to/95.154.250.125

Running the program several times returned the correct address with no error, but looping it for a while reproduced the exception I was looking for:

while 1>0; do java DomainResolutionTest; done > 2
^C
[root@srv tmp]# wc -l 2
2648 2
[root@srv tmp]# grep Unknown 2
java.net.UnknownHostException: sve.to
[root@srv tmp]# less 2
......
sve.to/95.154.250.125
ERROR: sve.to
java.net.UnknownHostException: sve.to
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:849)
at java.net.InetAddress.getAddressFromNameService(InetAddress.java:1202)
at java.net.InetAddress.getAllByName0(InetAddress.java:1153)
at java.net.InetAddress.getAllByName(InetAddress.java:1083)
at java.net.InetAddress.getAllByName(InetAddress.java:1019)
at java.net.InetAddress.getByName(InetAddress.java:969)
at DomainResolutionTest.main(DomainResolutionTest.java:12)

sve.to/95.154.250.125
....

From what I understand, the Java process tries to look up the IP address of the requested host, but first it takes all IP addresses of the local interfaces (eth0 and lo), including their default IPv6 addresses, and tries to resolve those addresses back to hostnames. Although I hadn't configured IPv6 addresses on the interfaces, they already had default ones: the OS had IPv6 enabled by default, so the interfaces automatically got link-local IPv6 addresses. During the installation I had removed the localhost6 (::1) record from the hosts file, which later caused this error; the record for the eth0 IP address was missing as well.

The problem may be that the JVM performs both IPv6 and IPv4 queries, and if the DNS server is not configured to handle IPv6 queries properly, the application can get an unknown host exception, or at best has to wait for the IPv6 query to time out. The usual workaround is to make Java use only the IPv4 stack by running the Java process with the -Djava.net.preferIPv4Stack=true parameter, thus avoiding the IPv6 lookup altogether. Unfortunately, running the above Java program with this parameter still returned UnknownHostException.
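For completeness, here is how the flag would typically be passed, either directly on the command line or through a startup script's JAVA_OPTIONS variable. This is only a sketch (the exact variable name your startup scripts read is an assumption), and as noted above it did not fix the lookups in this case:

```shell
# Sketch: append the IPv4-only flag; JAVA_OPTIONS is the variable Weblogic's
# startup scripts commonly read -- treat the exact variable name as an assumption.
JAVA_OPTIONS="${JAVA_OPTIONS:-} -Djava.net.preferIPv4Stack=true"
# Printed rather than executed here:
echo "java ${JAVA_OPTIONS} DomainResolutionTest"
```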

It looks like a genuine bug with IPv6 in Java; I also saw a few bug reports filed with Sun about this behavior, but there was no solution. Finally, after adding hostnames for the local interfaces' IPv6 addresses to the hosts file, the exceptions disappeared:

::1                                          localhost6
fe80::20c:29ff:fe36:4144                     srv6

So for future installations I'll be explicitly disabling IPv6 on the installed systems. The easiest way to do that is:

cat >> /etc/sysctl.conf
#disable all ipv6 capabilities on a kernel level
net.ipv6.conf.all.disable_ipv6 = 1
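A variant is to stage the settings in a separate file and load it explicitly with sysctl -p. The sketch below writes to /tmp purely for illustration (the path, filename and the extra "default" key are my additions, not from the original setup):

```shell
# Sketch: stage the IPv6-disable settings in a separate file (illustrative path)
conf=/tmp/99-disable-ipv6.conf
cat > "$conf" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
# On the real system you would apply it with: sysctl -p "$conf"  (as root)
cat "$conf"
```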

Regards,
Sve

Categories: linux, oracle Tags: , ,

How to run standalone Oracle APEX Listener 2.0 with Oracle 11g XE and APEX 4.1.1

August 23rd, 2012 2 comments

This is a short guide on how to run the standalone Oracle APEX Listener 2.0 beta with Oracle 11g XE. I'm using Oracle Enterprise Linux 5.7 to run the Oracle APEX Listener, and Oracle Database 11g XE with APEX 4.1.1. Although running the APEX Listener standalone is not supported, I'm using it to run several internal applications for company needs.

When using the APEX Listener with Oracle XE, APEX won't work properly and a white screen appears when APEX is opened. This is because the APEX images are stored in the XML DB repository, but the APEX Listener has to be run with the --apex-images parameter pointing to a directory containing the images on the filesystem. To solve this I downloaded the latest APEX patch and copied the images from it.

If you have another database running on the same machine, keep this in mind.

 

Install Oracle 11g XE and update Oracle APEX to latest version:

1. Download Oracle Database Express Edition 11g Release 2 for Linux x64

2. Install Oracle 11g XE:
rpm -ivh oracle-xe-11.2.0-1.0.x86_64.rpm

3. Configure Express Edition:
/etc/init.d/oracle-xe configure
Port: 1522
Password: secret

4. Update APEX to 4.1

Download APEX 4.1
cd /tmp
unzip -q apex_4.1.zip
cd apex
sqlplus / as sysdba
@apexins SYSAUX SYSAUX TEMP /i/
@apxldimg.sql /tmp/apex

5. Update APEX to version 4.1.1
Download patch set 13331096 from MOS

Disable Oracle XML DB HTTP server:

SQL> EXEC DBMS_XDB.SETHTTPPORT(0);
PL/SQL procedure successfully completed.

SQL> COMMIT;
Commit complete.

SQL> SELECT DBMS_XDB.GETHTTPPORT FROM DUAL;
GETHTTPPORT
-----------
0

Run apxpatch.sql to patch the system:

SQL> @apxpatch.sql

Update the Images Directory When Running the Embedded PL/SQL Gateway:

@apxldimg.sql /tmp/patch

Commit complete.

Once the update is finished, do not enable the Oracle XML DB HTTP server, because we'll be using the Oracle APEX Listener, which we will set up next.

 

Install APEX Listener 2.0.0

1. Download Oracle APEX Listener 2.0.0 beta

2. Download and install the latest JRE 1.6 version; at the time of writing that is 1.6.0_34

Unpack to /opt/jre1.6.0_34

3. Unlock and set a password for APEX_PUBLIC_USER in the Oracle XE database:
alter user APEX_PUBLIC_USER account unlock;
alter user APEX_PUBLIC_USER identified by secret;

4. Patch Oracle APEX to support RESTful Services:
cd /oracle/apxlsnr/apex_patch/
sqlplus / as sysdba @catpatch.sql

Set passwords for both users APEX_LISTENER and APEX_REST_PUBLIC_USER.

5. Install Oracle APEX Listener:
mkdir /oracle/apxlsnr/
cd /oracle/apxlsnr/
unzip apex_listener.2.0.0.215.16.35.zip

Now this is the tricky part: for the XE edition the images are kept in the XML DB repository, so the images have to be copied from the patch into the listener home:
cp -r /tmp/patch/images .

6. Configure Oracle APEX Listener:
export JAVA_HOME=/opt/jre1.6.0_34
export PATH=$JAVA_HOME/bin:$PATH

Set APEX listener config dir:
java -jar apex.war configdir $PWD/config

Configure the listener:
java -jar apex.war

Once configuration is complete, the listener starts. It then has to be stopped and started again with the appropriate parameters; use Ctrl-C to stop it.

7. Finally, start the listener:
java -jar apex.war standalone --apex-images /oracle/apxlsnr/images

In case you want to run it in the background, here's how:
nohup java -jar apex.war standalone --apex-images /oracle/apxlsnr/images > apxlsnr.log &
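If you start it often, a tiny wrapper keeps the invocation in one place. This is just a sketch, not an official APEX Listener script; the paths match this post, and the command is printed rather than executed:

```shell
# Sketch of a start wrapper; the command string mirrors this post's invocation.
APXLSNR_HOME=/oracle/apxlsnr
JAVA_HOME=/opt/jre1.6.0_34
CMD="$JAVA_HOME/bin/java -jar $APXLSNR_HOME/apex.war standalone --apex-images $APXLSNR_HOME/images"
# Printed here; on the real host you would run it under nohup as above.
echo "nohup $CMD > $APXLSNR_HOME/apxlsnr.log &"
```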

 

Periodically I was seeing exceptions like these:
ConnectionPoolException [error=BAD_CONFIGURATION]

Caused by: oracle.ucp.UniversalConnectionPoolException: Universal Connection Pool already exists in the Universal Connection Pool Manager. Universal Connection Pool cannot be added to the Universal Connection Pool Manager

I found that if the APEX Listener is not configured with RESTful Services, these messages appear in the log and can be safely ignored.

 

Regards,
Sve

Categories: linux, oracle Tags: ,

How I run Oracle VM 2.2 guests with custom network configuration

August 15th, 2012 No comments

Recently I was given three virtual machines running Oracle Enterprise Linux 5 and Oracle 11gR2 RAC on Oracle VM 2.2.1, copied straight from /OVS/running_pool/. I had to get these machines up and running in my lab environment, but I found it hard to set up the networking. I spent half a day debugging without success, but finally found a workaround, which I'll explain here.

A few technical notes first – Oracle VM (Xen) has three main network configurations in /etc/xen/xend-config.sxp:

Bridged Networking – this configuration is the default and the simplest to set up. Using this type of networking means the VM guest gets an IP from the same network as the VM host; the VM guest can also take advantage of DHCP, if any. The following lines should be uncommented in /etc/xen/xend-config.sxp:
(network-script network-bridge)
(vif-script vif-bridge)

Routed Networking with NAT – this configuration is most common where a private LAN must be used; for example, the VM host runs on your notebook and you can't get another IP from the corporate or lab network. You set up a private LAN and NAT the VM guests so they can access the rest of the network. The following lines should be uncommented in /etc/xen/xend-config.sxp:
(network-script network-nat)
(vif-script vif-nat)

Two-way Routed Network – this configuration requires more manual steps but offers greater flexibility. It is exactly the same as the second one, except that the VM guests are exposed on the external network; for example, when a VM guest makes a connection to an external machine, its original IP is seen. The following lines should be uncommented in /etc/xen/xend-config.sxp:
(network-script network-route)
(vif-script vif-route)

Typically only one of the above can be used at a time, and the choice depends on the network setup. For the second and third configurations to work, a route must be added on the default gateway. For example, if my Oracle VM host has the IP address 192.168.143.10, then on the default gateway (192.168.143.1) a route has to be added to explicitly route all connection requests for my VM guests through my VM host, something like this:
route add -net 10.0.1.0 netmask 255.255.255.0 gw 192.168.143.10

Now back to the case itself. Each of the RAC nodes had two NICs – one for public connections and one for the private network used by GI and RAC. The public network was 10.0.1.X and the private one 192.168.1.X. What I wanted was to run the VM guests in my lab and access them directly with IP addresses from the lab network, which was 192.168.143.X. As we know, the default network configuration is bridged networking, so I went with that one. Having the VM guests' config files, all I had to do was change the first IP address of every guest:

From:
vif = ['mac=00:16:3e:22:0d:04, ip=10.0.1.11, bridge=xenbr0', 'mac=00:16:3e:22:0d:14, ip=192.168.1.11',]

To:
vif = ['mac=00:16:3e:22:0d:04, ip=192.168.143.151, bridge=xenbr0', 'mac=00:16:3e:22:0d:14, ip=192.168.1.11',]

This turned out to be a real nightmare: I spent half a day looking into why my VM guests had no access to the lab network. They had access to the VM host, but not to the outside world (maybe because I was running Oracle VM on top of VMware). I finally gave up on this configuration.

Thus I had to use one of the other two network configurations – Routed Networking with NAT or Two-way Routed Network. In either case I didn't have access to the default gateway and would not be able to add a static route to my VM guests there.

Here is how I solved it – running the three-node RAC on Oracle VM Server 2.2.1, keeping the original network configuration, and accessing the nodes with IP addresses from my lab network (192.168.143.X). I put logical IPs for the VM guests on the VM host using ip (ifconfig could also be used), and then used iptables to change the packet destination to the VM guests' real addresses (10.0.1.X).

1. Change the Oracle VM configuration to Two-way Routed Network: comment out the lines for the default bridge configuration and uncomment those for routed networking:
(network-script network-route)
(vif-script vif-route)
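The edit can also be scripted. The sketch below runs against a scratch copy in /tmp so the real /etc/xen/xend-config.sxp stays untouched; only the two line pairs shown above are involved:

```shell
# Sketch: comment out the bridge lines and enable the routed ones,
# demonstrated on a scratch copy of the config file.
conf=/tmp/xend-config.sxp
printf '(network-script network-bridge)\n(vif-script vif-bridge)\n' > "$conf"
sed -i -e 's/^(network-script network-bridge)/#(network-script network-bridge)/' \
       -e 's/^(vif-script vif-bridge)/#(vif-script vif-bridge)/' "$conf"
printf '(network-script network-route)\n(vif-script vif-route)\n' >> "$conf"
cat "$conf"
```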

2. Configure the VM host itself for forwarding:
echo 1 > /proc/sys/net/ipv4/conf/all/proxy_arp
iptables -t nat -A POSTROUTING -s 10.0.1.0/24 -j MASQUERADE

3. Set network alias with the IP address that you want to use for the VM guests:
ip addr add 192.168.143.151/32 dev eth0:1
ip addr add 192.168.143.152/32 dev eth0:2
ip addr add 192.168.143.153/32 dev eth0:3

4. Create iptables rules in the PREROUTING chain that redirect requests arriving at a lab network IP to the VM guest's original IP:
iptables -t nat -A PREROUTING -d 192.168.143.151 -i eth0 -j DNAT --to-destination 10.0.1.11
iptables -t nat -A PREROUTING -d 192.168.143.152 -i eth0 -j DNAT --to-destination 10.0.1.12
iptables -t nat -A PREROUTING -d 192.168.143.153 -i eth0 -j DNAT --to-destination 10.0.1.13
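Steps 3 and 4 follow a simple per-node pattern, so the commands can be generated in one loop. This is only a sketch that prints the commands instead of executing them, using the same addresses as above:

```shell
# Sketch: generate the alias and DNAT commands for all three nodes.
# Commands are collected and printed, not executed (they would need root).
rules=""
i=1
for guest in 10.0.1.11 10.0.1.12 10.0.1.13; do
  lab="192.168.143.15${i}"
  rules="${rules}ip addr add ${lab}/32 dev eth0:${i}
iptables -t nat -A PREROUTING -d ${lab} -i eth0 -j DNAT --to-destination ${guest}
"
  i=$((i+1))
done
printf '%s' "$rules"
```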

5. Just untar the VM guest in /OVS/running_pool/

[root@ovm22 running_pool]# ls -al /OVS/running_pool/dbnode1/
total 26358330
drwxr-xr-x 2 root root        3896 Aug  6 17:27 .
drwxrwxrwx 6 root root        3896 Aug  3 11:18 ..
-rw-r--r-- 1 root root  2294367596 May 16  17:27 swap.img
-rw-r--r-- 1 root root  4589434792 May 16  17:27 system.img
-rw-r--r-- 1 root root 20107128360 May 16  17:27 u01.img
-rw-r--r-- 1 root root         436 Aug  6 11:20 vm.cfg

6. Run the guest:
xm create /OVS/running_pool/dbnode1/vm.cfg

Now I have a three-node RAC; the nodes keep their original public IPs and I can access them using my lab network IPs. The mapping works like this:

Request to 192.168.143.151 -> the IP address is up on the VM host -> iptables on the VM host takes action -> the packet destination IP is changed to 10.0.1.11 -> a static route already in place on the VM host routes the packet to the vif interface of the VM guest.

Now I can access my dbnode1 (10.0.1.11) directly with its lab network IP 192.168.143.151.

Useful links:
http://wiki.kartbuilding.net/index.php/Xen_Networking
http://wiki.xensource.com/xenwiki/XenNetworking

Regards,
Sve

Categories: linux, oracle, virtualization Tags: ,