Archive

Posts Tagged ‘em12c’

EMDIAG Repvfy 12c kit – troubleshooting part 1

March 26th, 2014 No comments

The following blog post continues the EMDIAG repvfy kit series and focuses on how to troubleshoot and solve the problems reported by the kit.

The repository verification kit reports a number of problems with our repository, which we are about to troubleshoot and solve one by one. It’s important to note that some of the problems are related, so solving one problem could also solve another.

Here is the output I’ve got for my OEM repository:

-- --------------------------------------------------------------------- --
-- REPVFY: 2014.0114     Repository: 12.1.0.3.0     23-Jan-2014 13:35:41 --
---------------------------------------------------------------------------
-- Module:                                          Test:   0, Level: 2 --
-- --------------------------------------------------------------------- --

verifyAGENTS

1004. Stuck PING jobs: 10
verifyASLM
verifyAVAILABILITY
1002. Disabled response metrics (16570376): 2
verifyBLACKOUTS
verifyCAT
verifyCORE
verifyECM
1002. Unregistered ECM metadata tables: 2
verifyEMDIAG
1001. Undefined verification modules: 1
verifyEVENTS
verifyEXADATA
2001. Exadata plugin version mismatches: 5
verifyJOBS
2001. System jobs running for more than 24hr: 1
verifyJVMD
verifyLOADERS
verifyMETRICS
verifyNOTIFICATIONS
verifyOMS
1002. Stuck PING jobs: 10
verifyPLUGINS
1003. Plugin metadata versions out of sync: 13
verifyREPOSITORY
verifyTARGETS
1021. Composite metric calculation with inconsistent dependant metadata versions: 3
2004. Targets without an ORACLE_HOME association: 2
2007. Targets with unpromoted ORACLE_HOME target: 2
verifyUSERS

I usually follow this sequence of actions when troubleshooting repository problems (a short sketch of the first two steps follows the list):
1. Verify the module with the -detail option. Increasing the level might also reveal more problems, or problems related to the current one.
2. Dump the module and check for any unusual activity.
3. Check repository database alert log for any errors.
4. Check emagent logs for any errors.
5. Check OMS logs for any errors.
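
As a rough sketch of steps 1 and 2, using the JOBS module as an example (these forms of the commands appear later in this post; output omitted):

./repvfy verify jobs -detail        # step 1: verify a single module with more detail
./repvfy verify jobs -level 9       # higher levels may surface additional, related problems
./repvfy dump job_health            # step 2: one useful dump - job system activity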

Troubleshoot Stuck PING jobs

Looking at the first problem reported for verifyAGENTS – Stuck PING jobs – we can easily spot the relation between the verifyAGENTS, verifyJOBS and verifyOMS modules, where the same problem is occurring. For some reason there are ten PING jobs which are stuck and have been running for more than 24 hours.

The best approach would be to run verify against any of these modules with the -detail option. This will show more information and eventually help analyze the problem. Running a detailed report for AGENTS and OMS didn’t help and didn’t show much information related to the stuck PING jobs. However, running a detailed report for JOBS allowed us to identify the job_id, the job_name and when the job was started:

[oracle@oem bin]$ ./repvfy verify jobs -detail

JOB_ID                           EXECUTION_ID                     JOB_NAME                                 START_TIME
-------------------------------- -------------------------------- ---------------------------------------- --------------------
ECA6DE1A67B43914E0432084800AB548 ECA6DE1A67B63914E0432084800AB548 PINGCFMJOB_ECA6DE1A67B33914E0432084800AB 03-DEC-2013 19:02:29

So we can see that the stuck job was started at 19:02 on the 3rd of December, while the check was run on the 23rd of January.

Now we can say that the problem lies with the jobs rather than with the agents or the OMS; the problems in those two modules appeared as a result of the stuck job, so we should focus on the JOBS module.

Running analyze against the job will show the same thing as verify with the -detail option; its usage would be appropriate if we had multiple job issues and wanted to see the details for a particular one.
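
As a sketch only, and assuming job is one of the object types supported by analyze (see repvfy -h5), the invocation would look something like this, using the job_id reported above:

[oracle@oem bin]$ ./repvfy analyze job -guid ECA6DE1A67B43914E0432084800AB548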

Dumping the job will show a lot of useful information from the MGMT_ tables; of particular interest are the details of the execution:

[oracle@oem bin]$ ./repvfy dump job -guid ECA6DE1A67B43914E0432084800AB548

[----- MGMT_JOB_EXEC_SUMMARY ------------------------------------------------]

EXECUTION_ID                     STATUS                           TLI QUEUE_ID                         TIMEZONE_REGION                SCHEDULED_TIME       EXPECTED_START_TIME  START_TIME           END_TIME                RETRIED
-------------------------------- ------------------------- ---------- -------------------------------- ------------------------------ -------------------- -------------------- -------------------- -------------------- ----------
ECA6DE1A67B63914E0432084800AB548 02-Running                         1                                  +00:00                         03-DEC-2013 18:59:25 03-DEC-2013 18:59:25 03-DEC-2013 19:02:29                               0

Again we can confirm that the job is still running. The next step would be to dump the execution, which will show on which step the job is waiting/hanging. The commands below are just examples, because in my case the job execution didn’t have any steps:

[oracle@oem bin]$ ./repvfy dump execution -guid ECA6DE1A67B43914E0432084800AB548
[oracle@oem bin]$ ./repvfy dump step -id 739148

Checking the job system health could also be useful, as it shows some job history, scheduled jobs and some performance metrics:

[oracle@oem bin]$ ./repvfy dump job_health

Back to our problem, we may query MGMT_JOB to get the job name and confirm that it’s a system job run by SYSMAN:

SQL> SELECT JOB_ID, JOB_NAME,JOB_OWNER, JOB_DESCRIPTION,JOB_TYPE,SYSTEM_JOB,JOB_STATUS FROM MGMT_JOB WHERE UPPER(JOB_NAME) like '%PINGCFM%';

JOB_ID                           JOB_NAME                                                     JOB_OWNER  JOB_DESCRIPTION                                              JOB_TYPE        SYSTEM_JOB JOB_STATUS
-------------------------------- ------------------------------------------------------------ ---------- ------------------------------------------------------------ --------------- ---------- ----------
ECA6DE1A67B43914E0432084800AB548 PINGCFMJOB_ECA6DE1A67B33914E0432084800AB548                  SYSMAN     This is a Confirm EMD Down test job                          ConfirmEMDDown            2          0

We may try to stop the job using emcli and the job name:

[oracle@oem bin]$ emcli stop_job -name=PINGCFMJOB_ECA6DE1A67B33914E0432084800AB
Error: The job/execution is invalid (or non-existent)

If that doesn’t work, then use the emdiag kit to clean up the repository part:

./repvfy verify jobs -test 1998 -fix

Please enter the SYSMAN password:

-- --------------------------------------------------------------------- --
-- REPVFY: 2014.0114     Repository: 12.1.0.3.0     27-Jan-2014 18:18:36 --
---------------------------------------------------------------------------
-- Module: JOBS                                     Test: 1998, Level: 2 --
-- --------------------------------------------------------------------- --
-- -- -- - Running in FIX mode: Data updated for all fixed tests - -- -- --
-- --------------------------------------------------------------------- --

The repository is now OK, but this will not remove the stuck thread at the OMS level. In order for the OMS to become healthy again it needs to be restarted:

cd $OMS_HOME/bin
emctl stop oms
emctl start oms

After the OMS was restarted there were no stuck jobs anymore!

I still wanted to know why that had happened. Although there were a few bugs at MOS, they were not really applicable and I didn’t find any of their symptoms in my case. After checking the repository database alert log I found a few disturbing messages:

  Tns error struct:
Time: 03-DEC-2013 19:04:01
TNS-12637: Packet receive failed
ns secondary err code: 12532
.....
opiodr aborting process unknown ospid (15301) as a result of ORA-609
 opiodr aborting process unknown ospid (15303) as a result of ORA-609
 opiodr aborting process unknown ospid (15299) as a result of ORA-609
2013-12-03 19:07:58.156000 +00:00

I also found a lot of similar messages on the target databases:

  Time: 03-DEC-2013 19:05:08
TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

That pretty much matches the time when the job got stuck – 19:02:29. So I assume there was a network glitch at that time, causing the PING job to get stuck. The solution was simply to run repvfy with the -fix option and then restart the OMS.

If the job gets stuck again after the restart, consider increasing the OMS property oracle.sysman.core.conn.maxConnForJobWorkers (there is a MOS note covering that scenario).
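
A minimal sketch of changing that property with emctl, assuming it is set at the OMS level; the value of 128 is purely illustrative and should be sized for your job system:

cd $OMS_HOME/bin
./emctl set property -name oracle.sysman.core.conn.maxConnForJobWorkers -value 128

Depending on the property, an OMS restart may be needed for the new value to take effect.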

EMDIAG repvfy blog series:
  • EMDIAG Repvfy 12c kit – installation
  • EMDIAG Repvfy 12c kit – basics
  • EMDIAG Repvfy 12c kit – troubleshoot Availability module
  • EMDIAG Repvfy 12c kit – troubleshoot Exadata module
  • EMDIAG Repvfy 12c kit – troubleshoot Plugins module
  • EMDIAG Repvfy 12c kit – troubleshoot Targets module


Categories: oracle

OEM 12c installation fails if parallel_max_servers too high

February 21st, 2014 No comments

Just a quick post regarding an OEM 12c installation: recently I had to install OEM 12c, and during the repository configuration step the installation failed with the following error:

ORA-12801: error signaled in parallel query server P151

This was caused by a known bug; the workaround is to decrease the number of parallel query servers of the repository database and start the installation over. The database had cpu_count set to 64 and parallel_max_servers set to 270. After setting parallel_max_servers to a lower value the installation completed successfully.
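
As a sketch of the workaround (the value of 64 is just an example; pick one suited to your environment and restore the original value after the installation):

sqlplus -s / as sysdba <<EOF
ALTER SYSTEM SET parallel_max_servers=64 SCOPE=BOTH;
EOF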

For more information refer to:
EM 12c: Enterprise Manager Cloud Control 12c Installation Fails At Repository Configuration With Error: ORA-12805: parallel query server died unexpectedly (Doc ID 1539444.1)

 

Categories: linux, oracle

EMDIAG Repvfy 12c kit – basics

October 30th, 2013 No comments

This is the second post in the emdiag repvfy kit series, covering the basics of the tool. Having already installed the kit in the earlier post, it is now time to go through some basics before we start troubleshooting.

There are three main commands with repvfy:

  • verify – repository-wide verification
  • analyze – object-specific verification/analysis
  • dump – dump of a specific repository object

As you can tell from the descriptions, verify is run against the whole repository and doesn’t require any arguments by default, while the analyze and dump commands require a specific object to be given. To get a list of all available commands of the kit, run repvfy -h1.
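
The general invocation shapes look like this (the guid values are placeholders; concrete examples follow later in the post):

./repvfy verify                                  # repository-wide, no arguments required
./repvfy analyze <object_type> -guid <guid>      # analysis of one specific object
./repvfy dump <object_type> -guid <guid>         # dump of one specific object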

 

Verify command

Let’s make something clear at the beginning: the verify command runs a repository-wide verification consisting of many tests, which are first grouped into modules and then categorized into several levels. To get a list of the available modules run repvfy -h4; there are more than 30 modules and I won’t go into detail for each, but the most useful ones are Agents, Plugins, Exadata, Repository and Targets. The list of levels can be found at the end of the post; it’s important to say that levels are cumulative and by default tests are run at level 2!

When investigating or debugging a problem with the repository, always start with the verify command. It’s a good starting point to run verify without any arguments: it goes through all the modules, gives you a summary of any problems (violations) that are present and an initial look at the health of the repository, and from there you can start debugging a specific problem.

So here is how verify output looked for my OEM repository:

[oracle@oem bin]$ ./repvfy verify

Please enter the SYSMAN password:

-- --------------------------------------------------------------------- --
-- REPVFY: 2013.1008     Repository: 12.1.0.3.0     29-Oct-2013 11:30:37 --
---------------------------------------------------------------------------
-- Module:                                          Test:   0, Level: 2 --
-- --------------------------------------------------------------------- --

verifyAGENTS
verifyASLM
verifyAVAILABILITY
1002. Disabled response metrics (16570376): 2
verifyBLACKOUTS
verifyCAT
verifyCORE
verifyECM
1002. Unregistered ECM metadata tables: 2
verifyEMDIAG
verifyEVENTS
verifyEXADATA
2001. Exadata plugin version mismatches: 5
verifyJOBS
verifyJVMD
verifyLOADERS
verifyMETRICS
verifyNOTIFICATIONS
verifyOMS
verifyPLUGINS
1003. Plugin metadata versions out of sync: 13
verifyREPOSITORY
verifyTARGETS
1021. Composite metric calculation with inconsistent dependant metadata versions: 3
2004. Targets without an ORACLE_HOME association: 2
2007. Targets with unpromoted ORACLE_HOME target: 2
verifyUSERS

The verify command can also be run with the -detail argument to get more details about the problems found. It will also show which test found the problem and what actions can be taken to correct it. That’s useful for another reason – it prints the target name and guid, which can be used for further analysis with the analyze and dump commands.

The command can also be run with the -level argument, starting at zero for fatal errors and increasing to nine for more minor errors and best practices; the list of levels can be found at the end of the post.
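
For example, using modules reported above (output omitted):

[oracle@oem bin]$ ./repvfy verify targets -detail
[oracle@oem bin]$ ./repvfy verify exadata -level 9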

 

Analyze command

The analyze command is run against a specific target, which can be specified either by its name or by its unique identifier (guid). To get a list of supported targets run repvfy -h5. The analyze command is very similar to the verify command, except that it is run against a specific target. Again it can be run with the -level and -detail arguments, like this:

[oracle@oem bin]$ ./repvfy analyze exadata -guid 6744EED794F4CCCDBA79EC00332F65D3 -level 9

Please enter the SYSMAN password:

-- --------------------------------------------------------------------- --
-- REPVFY: 2013.1008     Repository: 12.1.0.3.0     29-Oct-2013 12:00:09 --
---------------------------------------------------------------------------
-- Module: EXADATA                                  Test:   0, Level: 9 --
-- --------------------------------------------------------------------- --

analyzeEXADATA
2001. Exadata plugin version mismatches: 1
6002. Exadata components without a backup Agent: 4
6006. Check for DB_LOST_WRITE_PROTECT: 1
6008. Check for redundant control files: 5

For that Exadata target we can see that a few more problems are found at level 9, in addition to the plugin version mismatch found earlier at level 2.

One of the next posts will be dedicated to troubleshooting and fixing problems in Exadata module.

 

Dump command

The dump command is used to dump all the information about a specific repository object; like the analyze command, it expects either a target name or a target guid. For a list of supported targets run repvfy -h6.

I won’t show any example output because it dumps all the details about the target – more than 2000 lines. If you run the dump command against the same target used in the analyze example, you will get a ton of information: the targets associated with this Exadata (hosts, ILOMs, databases, instances), the list of monitoring agents, the plugin version, some address details and a long list of target alerts/warnings.
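
Just for reference, the invocation against the same Exadata target would be the following (assuming exadata is among the object types listed by repvfy -h6; output omitted):

[oracle@oem bin]$ ./repvfy dump exadata -guid 6744EED794F4CCCDBA79EC00332F65D3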

It may seem rather useless because it just dumps a lot of information, but it actually helped me identify the plugin version mismatch problem I had within the Exadata module.

 

Repository verification and object analysis levels:

0  - Fatal issues (functional breakdown)
 These tests highlight fatal errors found in the repository. These errors will prevent EM from functioning
 normally and should be addressed straight away.

1  - Critical issues (functionality blocked)

2  - Severe issues (restricted functionality)

3  - Warning issues (potential functional issue)
 These tests are meant as 'warning', to highlight issues which could lead to potential problems.

4 - Informational issues (potential functional issue)
 These tests are informational only. They represent best practices, potential issues, or just areas to verify.

5 - Currently not used

6  - Best practice violations
 These tests highlight discrepancies between the known best practices and the actual implementation
 of the EM environment.

7  - Purging issues (obsolete data)
 These tests highlight failures to clean up (all the) historical data, or problems with orphan data
 left behind in the repository.

8  - Failure Reports (historical failures)
 These tests highlight historical errors that have occurred.

9  - Tests and internal verifications
 These tests are internal tests, or temporary and diagnostics tests added to resolve specific problems.
 They are not part of the 'regular' kit, and are usually added while debugging or testing specific issues.

 

In the next post I’ll troubleshoot and fix the errors I had within the Availability module – Disabled response metrics.

 

For more information and examples refer to following notes:
EMDIAG Repvfy 12c Kit – How to Use the Repvfy 12c kit (Doc ID 1427365.1)
EMDIAG REPVFY Kit – Overview (Doc ID 421638.1)

 

EMDIAG repvfy blog series:
  • EMDIAG Repvfy 12c kit – installation
  • EMDIAG Repvfy 12c kit – basics
  • EMDIAG Repvfy 12c kit – troubleshoot Availability module
  • EMDIAG Repvfy 12c kit – troubleshoot Exadata module
  • EMDIAG Repvfy 12c kit – troubleshoot Plugins module
  • EMDIAG Repvfy 12c kit – troubleshoot Targets module

 

Categories: oracle

Why is my EM12c giving Metric evaluation errors for Exadata cell targets?

October 25th, 2013 1 comment

As part of my Cloud Control journey I encountered a strange problem where I got the following error for a few Exadata Storage Server (cell) targets:

Metric evaluation error start - oracle.sysman.emSDK.agent.fetchlet.exception.FetchletException: em_error=Failed to execute_exadata_response.pl ssh -q -o ConnectTimeout=60 -o BatchMode=yes -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -i /home/oracle/.ssh/id_dsa -l cellmonitor 10.141.8.68 cellcli -xml -e ' list cell attributes msStatus ':

Another symptom was that I received two mails from OEM, one saying that the cell and its services were up:

EM Event: Clear:exacel05.localhost.localdomain - exacel05.localhost.localdomain is Up. MS Status is RUNNING and Ping Status is SUCCESS.

and another one saying there was a Metric evaluation error for the same target:

EM Event: Critical:exacel05.localhost.localdomain - Metric evaluation error start - oracle.sysman.emSDK.agent.fetchlet.exception.FetchletException: em_error=Failed to execute_exadata_response.pl ssh -q -o ConnectTimeout=60 -o BatchMode=yes -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -i /home/oracle/.ssh/id_dsa -l cellmonitor 10.141.8.68 cellcli -xml -e ' list cell attributes msStatus ':

I have to say that the error didn’t come up by itself; it manifested after I had to redeploy the Exadata plugin on a few agents. If you have ever had to do this, you will know that before removing the plugin from an agent you need to make sure the agent is not the primary monitoring agent for any Exadata targets. In my case a few of the agents were Monitoring Agents for the cells, and I had to swap them with the Backup Monitoring Agent so I would be able to redeploy the plugin on the primary monitoring agent.

After I redeployed the plugin, I tried to revert to the initial configuration, but for some reason the configuration got messed up and I ended up with different agents monitoring the cell targets than at the beginning.

It turned out that one of the monitoring agents wasn’t able to connect to the cell, and that’s why I got the email notifications and the Metric evaluation errors for the cells. Although that’s not a real problem, it’s quite annoying to receive such alerts and to have all these targets showing Metric collection error icons in OEM or being reported with status Down.

Let’s first check, from the OEM repository, which agents are monitoring that cell target:

SQL> select target_name, target_type, agent_name, agent_type, agent_is_master
from MGMT$AGENTS_MONITORING_TARGETS
where target_name = 'exacel05.localhost.localdomain';

TARGET_NAME                      TARGET_TYPE     AGENT_NAME                         AGENT_TYPE AGENT_IS_MASTER
-------------------------------- --------------- ---------------------------------- ---------- ---------------
exacel05.localhost.localdomain   oracle_exadata  exadb03.localhost.localdomain:3872 oracle_emd               0
exacel05.localhost.localdomain   oracle_exadata  exadb02.localhost.localdomain:3872 oracle_emd               1

Looking at the cell secure log, we can see that one of the monitoring agents wasn’t able to connect because publickey authentication failed:

Oct 23 11:39:54 exacel05 sshd[465]: Connection from 10.141.8.65 port 14594
Oct 23 11:39:54 exacel05 sshd[465]: Failed publickey for cellmonitor from 10.141.8.65 port 14594 ssh2
Oct 23 11:39:54 exacel05 sshd[466]: Connection closed by 10.141.8.65
Oct 23 11:39:55 exacel05 sshd[467]: Connection from 10.141.8.66 port 27799
Oct 23 11:39:55 exacel05 sshd[467]: Found matching DSA key: cf:99:0a:37:1a:e5:84:dc:a8:8a:b9:6f:0c:fd:05:c5
Oct 23 11:39:55 exacel05 sshd[468]: Postponed publickey for cellmonitor from 10.141.8.66 port 27799 ssh2
Oct 23 11:39:55 exacel05 sshd[467]: Found matching DSA key: cf:99:0a:37:1a:e5:84:dc:a8:8a:b9:6f:0c:fd:05:c5
Oct 23 11:39:55 exacel05 sshd[467]: Accepted publickey for cellmonitor from 10.141.8.66 port 27799 ssh2
Oct 23 11:39:55 exacel05 sshd[467]: pam_unix(sshd:session): session opened for user cellmonitor by (uid=0)

That’s confirmed by checking the ssh authorized_keys file, which also shows which monitoring agents were initially configured:

 [root@exacel05 .ssh]# grep exadb /home/cellmonitor/.ssh/authorized_keys | cut -d = -f 2
oracle@exadb03.localhost.localdomain
oracle@exadb04.localhost.localdomain

Another way to check which monitoring agents were configured initially is to check the snmpSubscriber attribute of that specific cell:

[root@exacel05 ~]# cellcli -e list cell attributes snmpSubscriber
((host=exadb03.localhost.localdomain,port=3872,community=public),(host=exadb04.localhost.localdomain,port=3872,community=public))

It’s obvious that exadb02 shouldn’t be monitoring this target – it should be exadb04 instead. I believe that when I redeployed the Exadata plugin this agent wasn’t eligible to monitor Exadata targets any more and was replaced with another one, but that’s just a guess.

There are two solutions for that problem:

1. Move (relocate) target definition and monitoring to the correct agent:

I wasn’t able to find a way to do that through the OEM Console, so I used emcli. Based on the MGMT$AGENTS_MONITORING_TARGETS query and the snmpSubscriber attribute I was able to find which agent was configured initially and which one had to be removed. Then I used emcli to relocate the target to the correct monitoring agent, the one which was configured initially:

[oracle@oem ~]$ emcli relocate_targets -src_agent=exadb02.localhost.localdomain:3872 -dest_agent=exadb04.localhost.localdomain:3872 -target_name=exacel05.localhost.localdomain -target_type=oracle_exadata -copy_from_src
Moved all targets from exadb02.localhost.localdomain:3872 to exadb04.localhost.localdomain:3872

2. Reconfigure the cell to use the new monitoring agent:

Add the current monitoring agent’s ssh public key to the authorized_keys file on the cell:

Place the oracle user DSA public key (/home/oracle/.ssh/id_dsa.pub) from exadb02 into exacel05:/home/cellmonitor/.ssh/authorized_keys
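
For example, one way to do that (a sketch only, assuming root ssh access to the cell; paths as above):

[oracle@exadb02 ~]$ cat /home/oracle/.ssh/id_dsa.pub | ssh root@exacel05 "cat >> /home/cellmonitor/.ssh/authorized_keys"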

Also change the cell snmpSubscriber attribute:

[root@exacel05~]# cellcli -e "alter cell snmpSubscriber=((host='exadb03.localhost.localdomain',port=3872,community=public),(host='exadb02.localhost.localdomain',port=3872,community=public))"
Cell exacel05 successfully altered
[root@exacel05~]# cellcli -e list cell attributes snmpSubscriber
((host=exadb03.localhost.localdomain,port=3872,community=public),(host=exadb02.localhost.localdomain,port=3872,community=public))

After that, the status of the Exadata Storage Server (cell) target in OEM became Up and the metrics were fine again.

 

Categories: oracle

EMDIAG Repvfy 12c kit – installation

October 21st, 2013 No comments

Recently I’ve been doing a lot of OEM work for a customer and I decided to tidy and clean things up a little bit. The OEM version was 12cR2, it was used for monitoring a few Exadatas and it had around 650 targets. I planned to upgrade OEM to 12cR3, upgrade all agents and plugins, promote any unpromoted targets and delete any old/stale targets; at the end of the day I wanted an up-to-date version of OEM and up-to-date information about the monitored targets.

The upgrade to 12cR3 went easily and smoothly (described here in detail), and the upgrade of the agents and plugins went fairly easily as well. After some time everything was looking fine, but I wanted to be sure I hadn’t missed anything, and from one thing to another I found the EMDIAG Repository Verification utility. I first heard Julian Dontcheff mention it at one of the BGOUG conferences, and it is something I had always wanted to try.

So in a series of posts I will describe the installation of the emdiag kit and how I fixed some of the problems I faced with my OEM repository.

What is EMDIAG kit

Basically EMDIAG consists of three kits:
– Repository verification (repvfy) – extracts data from the repository and runs a series of tests against it to help diagnose problems with OEM
– Agent verification (agtvfy) – troubleshoots problems with OEM agents
– OMS verification (omsvfy) – troubleshoots problems with the OEM management service (OMS)

In this and the following posts I’ll be referring to the first one – the repository verification kit.

The EMDIAG repvfy kit consists of a set of tests which are run against the OEM repository to help the EM administrator troubleshoot, analyze and resolve OEM-related problems.

The kit uses a shell script and a Perl script as wrappers, and a lot of SQL scripts to run the different tests and collect information from the repository.

It’s good to know that the emdiag kit has been around for quite a long time and is also available for Grid Control 10g and 11g and DB Control 10g and 11g. These posts refer to the EMDIAG Repvfy 12c kit, which can be installed only in a Cloud Control Management Repository 12c. The emdiag repvfy 12c kit will also be included in RDA Release 4.30.

Apart from using the emdiag kit to find and resolve a specific problem, it would be good practice to run it at least once per week/month to check for any newly reported problems.

Installing EMDIAG repvfy kit

Installation is pretty simple and straightforward; first download the EMDIAG repvfy 12c kit from the following MOS note:
EMDIAG REPVFY Kit for Cloud Control 12c – Download, Install/De-Install and Upgrade (Doc ID 1426973.1)

Just set your ORACLE_HOME to the database hosting the Cloud Control Management Repository and create a new directory emdiag where the tool will be unzipped:

[oracle@em ~]$ . oraenv
ORACLE_SID = [oracle] ? GRIDDB
The Oracle base for ORACLE_HOME=/opt/oracle/product/11.2.0/db_1 is /opt/oracle

[oracle@em ~]$ cd $ORACLE_HOME
[oracle@em db_1]$ mkdir emdiag
[oracle@em db_1]$ cd emdiag/
[oracle@em emdiag]$ unzip -q /tmp/repvfy12c20131008.zip
[oracle@em emdiag]$ cd bin
[oracle@em bin]$ ./repvfy install

Please enter the SYSMAN password:

...
...

COMPONENT            INFO
 -------------------- --------------------

EMDIAG Version       2013.1008
EMDIAG Edition       2
Repository Version   12.1.0.3.0
Database Version     11.2.0.3.0
Test Version         2013.1015
Repository Type      CENTRAL
Verify Tests         496
Object Tests         196
Deployment           SMALL

[oracle@em ~]$

And that’s it – the emdiag kit for repository verification is now installed and we can start digging into the OEM repository. In the next post we’ll cover the basics and the commands used for verification and diagnostics.

EMDIAG repvfy blog series:
  • EMDIAG Repvfy 12c kit – installation
  • EMDIAG Repvfy 12c kit – basics
  • EMDIAG Repvfy 12c kit – troubleshoot Availability module
  • EMDIAG Repvfy 12c kit – troubleshoot Exadata module
  • EMDIAG Repvfy 12c kit – troubleshoot Plugins module
  • EMDIAG Repvfy 12c kit – troubleshoot Targets module

 

Categories: oracle

Upgrade to Oracle Enterprise Manager Cloud Control 12c Release 3 (12.1.0.3)

August 15th, 2013 4 comments

Just a quick wrap-up on the EM12cR3 upgrade. I have to say that I was pleasantly surprised that everything went so smoothly. I didn’t expect anything else, but with so many products and components in there I had a few concerns. The version I had was already upgraded to 12.1.0.2, so it was really easy for me to run the upgrade.

Things to watch out for:
– You need to be already running OEM version 12.1.0.2 to be able to upgrade to 12.1.0.3. If not, you must apply BP1 to your 12.1.0.1 installation and then patch to 12.1.0.3. Remember to patch the agents as well.
– The upgrade to 12.1.0.3 is an out-of-place upgrade, so you need to point to a new middleware home and you’ll need an additional 15 GB for the installation.
– The installation takes between 1 and 2 hours to complete, depending on your machine’s power.
– I didn’t stop any of the agents during the upgrade.
– After the upgrade all OMS components were started automatically.

Here is what I’ve done:
1. Definitely take a backup of the middleware home and of the database as well. You don’t want to end up removing the agents and reinstalling the OMS. I had 400+ targets and failure wasn’t an option. For the middleware home I used a simple tar archive, and RMAN for the repository database (a minimal sketch follows).
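
A minimal sketch of those two backups (the middleware home path and the backup destination are hypothetical; adjust them to your environment):

tar -czf /backup/mw_home_pre_12103.tar.gz -C /opt/oracle middleware
rman target / <<EOF
BACKUP DATABASE PLUS ARCHIVELOG;
EOF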

2. Stop the OMS and other components:

cd $OMS_HOME/bin
./emctl stop oms -all

3. It’s required that the EMKey be copied to the repository prior to the upgrade; if you miss that, the installer will kindly remind you. There is also a note about it in the documentation:

$OMS_HOME/bin/emctl config emkey -copy_to_repos_from_file -repos_host [repository_host] -repos_port [port] -repos_sid [sid] -repos_user [username] -emkey_file $OMS_HOME/sysman/config/emkey.ora

4. The only command I ran during the upgrade was a simple grant; everything else was done by the installer:

 SQL> grant execute on dbms_random to dbsnmp;

Grant succeeded.

5. Once the upgrade is complete:
– upgrade all the agents from the console:

Setup -> Manage Cloud Control -> Upgrade agents

– as a post upgrade step the old agent homes should be deleted:

Setup -> Manage Cloud Control -> Upgrade agents -> Post Agent Upgrade Tasks

– and secure the EMKey:

emctl config emkey -remove_from_repos

The upgrade guide is very useful; review it before doing the upgrade.

Also have a look at New Features In Oracle Enterprise Manager Cloud Control 12.1.0.3, where the most notable thing is support for Oracle Database 12c and its new features, but there are plenty of other new features and improvements as well.

Categories: oracle

Oracle EM auto discovery fails if the host is under blackout

August 5th, 2013 No comments

During an Exadata project I had to put things in order and tidy up the Enterprise Manager targets. I decided to discover all targets, promote the ones which were missing and delete the old/stale ones. I got a strange error and decided to share it in case someone else hits it. I’m running Oracle Enterprise Manager Cloud Control 12c Release 2.

When you try to run auto discovery from the console for a host you almost immediately get the following error:

Run Discovery Now failed on host oraexa201.host.net: oracle.sysman.core.disc.common.AutoDiscoveryException: Unable to run on discovery on demand.RunCollection: exception occurred: oracle.sysman.gcagent.task.TaskPreExecuteCheckException: non-existent, broken, or not fully loaded target

When reviewing the agent log, the following exception could be seen:

tail $ORACLE_HOME/agent/sysman/log/gcagent.log

2013-07-16 11:26:06,229 [34:2C47351F] INFO - >>> Reporting exception: oracle.sysman.emSDK.agent.client.exception.NoSuchMetricException: the DiscoverNow metric does not exist for host target oraexa201.host.net (request id 1) <<<
 oracle.sysman.emSDK.agent.client.exception.NoSuchMetricException: the DiscoverNow metric does not exist for host target oraexa201.host.net

Got another error message during my second (test) run:

2013-08-02 15:20:47,155 [33:B3DBCC59] INFO - >>> Reporting response: RunCollectionResponse ([DiscoverTargets : host.oraexa201.host.net oracle.sysman.emSDK.agent.client.exception.RunCollectionItemException: Metric evaluation failed : RunCollection: exception occurred: oracle.sysman.gcagent.task.TaskPreExecuteCheckException: non-existent, broken, or not fully loaded target @ yyyy-MM-dd HH:mm:ss,SSS]) (request id 1) <<<

Although emctl status agent shows that the last successful heartbeat and upload are up to date, you still cannot discover targets on the host.

This is caused by the fact that the host is under BLACKOUT!

1. Through the console end the blackout for that host:

Go to Setup -> Manage Cloud Control -> Agents, find the agent for which you experience the problem and click on it. You can then clearly see that the status of the
agent is "Under Blackout". Simply open the Agent drop-down menu -> Control and then End Blackout.

2. Using emcli, first log in, list the blackouts and then stop the blackout:

[oracle@em ~]$ emcli login -username=sysman
Enter password

Login successful

[oracle@em ~]$ emcli get_blackouts
Name                                   Created By  Status   Status ID  Next Start           Duration  Reason                      Frequency  Repeat  Start Time           End Time             Previous End         TZ Region      TZ Offset

test_blackout                          SYSMAN      Started  4          2013-08-02 15:06:43  01:00     Hardware Patch/Maintenance  once       none    2013-08-02 15:06:43  2013-08-02 16:06:43  none                 Europe/London  +00:00

List the targets which are under the blackout and then stop it:

[oracle@em ~]$ emcli get_blackout_targets -name="test_blackout"
Target Name                                         Target Type      Status       Status ID
has_oraexa201.host.net                              has              In Blackout  1
oraexa201.host.net                                  host             In Blackout  1
TESTDB_TESTDB1                                      oracle_database  In Blackout  1
oraexa201.host.net:3872                             oracle_emd       In Blackout  1
Ora11g_gridinfrahome1_1_oraexa201                   oracle_home      In Blackout  1
OraDb11g_home1_2_oraexa201                          oracle_home      In Blackout  1
agent12c1_3_oraexa201                               oracle_home      In Blackout  1
sbin12c1_4_oraexa201                                oracle_home      In Blackout  1
LISTENER_oraexa201.host.net                         oracle_listener  In Blackout  1
+ASM_oraexa2-cluster                                osm_cluster      In Blackout  1
+ASM4_oraexa201.host.net                            osm_instance     In Blackout  1

[oracle@em ~]$ emcli stop_blackout -name="test_blackout"
Blackout "test_blackout" stopped successfully

And now when the discovery is run again:

Run Discovery Now – Completed Successfully

I was unable to get an error initially when I set the blackout, but then got the error after restarting the EM agent.

 

22 Oct 2013 Update:

After the update to 12c (described here), a meaningful error is now raised when you try to discover targets during an agent blackout:

Run Discovery Now failed on host oraexa201.host.net: oracle.sysman.core.disc.common.AutoDiscoveryException: Unable to run on discovery on demand.the target is currently blacked out

 

 

Categories: oracle

Installing Oracle Enterprise Manager Cloud Control 12 on OEL 6.1

October 7th, 2011 2 comments

A few days ago Oracle announced the release of Oracle Enterprise Manager Cloud Control 12c. I tried to summarize most of the information in my previous post, so I won’t discuss the announcement details here; I’ll only go through a few details regarding the EM12c installation.

For the purpose I have set up a VMware virtual machine with 2 CPUs, 4 GB RAM, a 32 GB HDD and one network interface. I installed Oracle Enterprise Linux 6.1 (64-bit) with the following parameters:

  • Perform a custom disk layout; I dedicated 4 GB for swap and the rest for the root (/) file system, formatted with ext4.
  • Perform a default installation; the needed packages will be installed later.
  • Set the hostname, timezone and root password.
  • After installation, disable the firewall and some of the services, like IPv6.

After installation the network adapter won’t be available; that’s why you have to install several packages and then compile the VMware Tools. Insert the installation DVD/ISO and install the following packages:

mount /dev/cdrom /mnt
rpm -ivh gcc-4.4.5-6.el6.x86_64.rpm cloog-ppl-0.15.7-1.2.el6.x86_64.rpm cpp-4.4.5-6.el6.x86_64.rpm glibc-devel-2.12-1.25.el6.x86_64.rpm \
    glibc-headers-2.12-1.25.el6.x86_64.rpm kernel-uek-headers-2.6.32-100.34.1.el6uek.x86_64.rpm ppl-0.10.2-11.el6.x86_64.rpm \
    mpfr-2.4.1-6.el6.x86_64.rpm kernel-uek-devel-2.6.32-100.34.1.el6uek.x86_64.rpm
umount /dev/cdrom

Then disconnect the drive, go to VM -> Guest -> Install/Upgrade VMware Tools from the console, and install the VMware Tools:

mount /dev/cdrom /mnt
cp /mnt/VMwareTools-8.3.2-257589.tar.gz /tmp
umount /mnt
cd /tmp
tar xfz VMwareTools-8.3.2-257589.tar.gz
cd vmware-tools-distrib
./vmware-install.pl

At this point you should be able to configure the network interfaces.

Before starting the installation, download the packages from OTN and transfer them to the server. The installation consists of two zip packages, 5.5 GB in total, but this includes Oracle WebLogic Server 10.3.5, which is installed by default by the wizard.

 

Oracle Enterprise Manager Cloud Control 12c installation prerequisites

For the installation of Enterprise Manager Cloud Control I’m following the documentation:
Oracle® Enterprise Manager Cloud Control Basic Installation Guide 12c Release 1 (12.1.0.1)

1. From Oracle Database 10.2.0.5 onwards, all versions are certified for the Management Repository. The last two releases, 11.2.0.2 and 11.2.0.3, do not need additional patches to be configured successfully. For the rest of the versions additional patches are needed; refer to MOS for more information.

Apart from the Management Repository certification, a few more database parameters need to be set. They can be set before or after the installation. For the database initialization parameters refer to Table-6 or Table-7 in Appendix A of the documentation; an example of the syntax is shown below.
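
As an illustration only – the parameters that actually need changing and their minimum values are the ones listed in Table-6/Table-7; these two are just examples of the syntax:

sqlplus -s / as sysdba <<EOF
ALTER SYSTEM SET session_cached_cursors=200 SCOPE=SPFILE;
ALTER SYSTEM SET processes=300 SCOPE=SPFILE;
EOF

Parameters changed with SCOPE=SPFILE require a restart of the repository database to take effect.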

Once you are ready you could run the EM Prerequisite Kit which is run by the wizard during the installation.

2. According to the Oracle documentation, for a small environment you need the following server resources:
For the OMS: 2 CPUs, 4 GB RAM and 7 GB of space, excluding the installation media which is 5.5 GB.
For the Management Repository: 2 CPUs, 2 GB RAM and 50 GB of space.

3. Packages and kernel parameters required for the OMS:
The following packages should be installed, either from the ISO or from the public yum server:

yum install make.x86_64 binutils.x86_64 libaio.x86_64 glibc-common.x86_64 libstdc++.x86_64 sysstat.x86_64 glibc-devel.i686 glibc-devel.x86_64

The shmmax kernel parameter should be set to a value bigger than 4 GB. In OEL 6.1 this parameter is already far beyond that, as it is set to 64 GB. Its current value can be retrieved with the following command:

cat /proc/sys/kernel/shmmax
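
If the value were below the requirement, it could be raised persistently like this (illustrative only; not needed on OEL 6.1 where the default is already higher):

echo "kernel.shmmax = 8589934592" >> /etc/sysctl.conf
sysctl -p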

4. Create a group and a user for the installation of Enterprise Manager 12c
The installation cannot be done as root, so an oracle user has to be created. I’m using the same group id and user id as would be created by the oracle-validated package (which is not yet available for OEL 6.x).

groupadd -g 54321 oinstall
useradd -u 54321 -g oinstall -s /bin/bash -d /home/oracle -m oracle
passwd oracle

5. Configure the limits.conf file:
The following two parameters have to be set in the /etc/security/limits.conf file:
oracle soft nofile 4096
oracle hard nofile 4096

 

Oracle Enterprise Manager Cloud Control 12c installation

Proceed with the default installation, following the installation wizard.

If the wizard gives you a warning at "Checking whether required GLIBC installed on system" although you have installed all the prerequisites, you can ignore it. The installer checks whether the package glibc-devel.i386 is installed, but you have already installed glibc-devel.i686.

Supply the repository details and be sure to check that Database Control does not exist in the repository database. Otherwise, after supplying the database credentials, you’ll get an error telling you to drop Database Control from the repository database:
$ORACLE_HOME/bin/emca -deconfig dbcontrol db -repos drop -SYS_PWD <sys_password> -SYSMAN_PWD <sysman_password>

Once the installation is complete you’ll get a screen with the installation summary and details on how to access the console.

Then you can log in and select a default home page. This is how the console looks:

 

Meanwhile I just saw two useful installation guides:

Regards,
Sve

Categories: linux, oracle

Oracle announces Enterprise Manager Cloud Control 12c

October 4th, 2011 No comments

Yesterday at OpenWorld, Oracle announced the release of Enterprise Manager Cloud Control 12c Release 1 (version 12.1.0.1), where the c stands for cloud. EM12c is a management solution providing centralised monitoring, administration and lifecycle management functionality for the complete IT infrastructure.

A lot of organisations deploy private cloud environments to offer greater flexibility to business users and to meet SLAs and security requirements. Oracle’s response to these demands is Oracle Enterprise Manager Cloud Control 12c, which offers complete cloud lifecycle management – setting up, delivering and managing clouds.

Once deployed, IT must first set up the cloud infrastructure by defining shared storage pools and compute resources. Administrators can also design and publish a catalogue of virtual machines, assemblies, databases and applications.

It allows administrators to monitor usage, set up metering and chargeback for users, manage deployed applications, manage assets and link applications to MOS. It also allows them to quickly identify, diagnose and resolve incidents from a single console.

 Keynotes:

  • The press release, Oracle Unveils Oracle Enterprise Manager 12c.
  • Short flash presentation of Cloud Management with Oracle Enterprise Manager.
  • Oracle Enterprise Manager Cloud Control 12c Video Series are available here.
  • Meanwhile Oracle Learning is adding more videos to YouTube and they could be found here.
  • The full installer (OMS, Agent and Repository) is available for Linux x86-64 (64-bit) and can be downloaded from here.
  • The Oracle Management Agent software can be downloaded using the Self Update feature within the Enterprise Manager Cloud Control console, or you can use the Self Update feature in offline mode and then manually download the .sar files available here.
  • Documentation is available here, thanks to Gokhan Atil for the link.
  • Installation requires Oracle Weblogic Server 10.3.5, which is already included in the package and installed by default.
  • Installation requires a certified Oracle Database on which the Oracle Management Repository will be configured. The certified versions of Oracle Database are 10.2.0.5 onwards. EM12c itself can manage databases from 9.2.0.8 onwards.
  • You could upgrade your EM 10g GC Release 5 (10.2.0.5.0) or EM 11g GC Release 1 (11.1.0.1.0) to Enterprise Manager Cloud Control 12c (12.1.0.1).
  • I was initially unable to find any MOS notes regarding EM12c – a few notes have since appeared at MOS; refer to the last update for more information.

 

UPDATE 1:

  • What’s New in Enterprise Manager 12c Install, here.
  • Documentation library is now available here.

 UPDATE 2:

  • Enterprise Manager Grid Control and Database Control Certification with 11g R2 Database [ID 1266977.1]
  • If the repository database is lower than 11.2.0.2, additional patches have to be applied. Refer to MOS Certification for more information.

UPDATE 3:

  • How to Install Enterprise Manager Cloud Control 12.1.0.1 (12c) [ID 1359176.1]
  • EM12c: How to install Enterprise Manager Cloud Control 12c Agent [ID 1360183.1]
  • Master Index for Cloud Control Agent Installation, Upgrade and Patching [ID 1363767.1]
  • FAQ: Enterprise Manager Cloud Control 12c Install / Upgrade Frequently Asked Questions [ID 1363863.1]
  • Master Index for Cloud Control Oracle Management Service (OMS) and Repository Installation, Upgrade and Patching [ID 1363769.1]

Regards,
Sve


Categories: oracle