Installing OEM13c management agent in silent mode

November 1st, 2016

For many of the customers I work with, SSH isn't available between the OEM host and the monitored hosts, so you cannot push the management agent onto a host from the OEM console. The customer might have to raise a change request to allow SSH between the hosts, but that can take a while and it's really unnecessary.

In that case the management agent has to be installed in silent mode. That is, instead of being pushed from the OEM to the host, the agent is pulled by the host and installed locally. There are three ways to do that – using the AgentPull script, the agentDeploy script, or an RPM file.

When using the AgentPull script, you download a script from the OMS and then run it; it downloads the agent package and installs it on the local host. Using the agentDeploy script is very similar, but to obtain it you use EMCLI. The third method, using an RPM file, is similar again – you download the RPM file with EMCLI and install it on the system. All three methods require HTTP/HTTPS access to the OMS, and the agentDeploy and RPM methods also require EMCLI to be installed. For that reason I always use the AgentPull method – it's quicker and really straightforward. Another benefit of the AgentPull method is that if you don't have HTTP/HTTPS access to the OMS you can simply copy and paste the script.

First, download the script from the OMS using curl or wget. The monitored hosts usually don't have direct HTTP access to the OMS, but many of them can reach it through an HTTP proxy. Download the script and make it executable:

curl "https://oem13c.local.net:7802/em/install/getAgentImage" --insecure -o AgentPull.sh
chmod +x AgentPull.sh
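If your host can only reach the OMS through an HTTP proxy, curl can be pointed at it – a sketch with a hypothetical proxy address:

curl --proxy http://proxy.local.net:3128 "https://oem13c.local.net:7802/em/install/getAgentImage" --insecure -o AgentPull.sh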

If a proxy server is not available, you can simply copy and paste the script; its location on the OMS server is:

/opt/app/oracle/em13c/middleware/oms/install/unix/scripts/AgentPull.sh.template

Make sure you edit the file and change the OMS host and OMS port parameters.

Check which platforms are available:

[oracle@exa01db01 ~]$ ./AgentPull.sh -showPlatforms

Platforms    Version
Linux x86-64    13.1.0.0.0

Run the script, supplying the sysman password, the platform you want to install, the agent registration password and the agent base directory:

[oracle@exa01db01 ~]$ ./AgentPull.sh LOGIN_USER=sysman LOGIN_PASSWORD=welcome1 \
PLATFORM="Linux x86-64" AGENT_REGISTRATION_PASSWORD=welcome1 \
AGENT_BASE_DIR=/u01/app/oracle/agent13c

It takes less than two minutes to install the agent and then at the end you’ll see the following messages:

Agent Configuration completed successfully
The following configuration scripts need to be executed as the "root" user. Root script to run : /u01/app/oracle/agent13c/agent_13.1.0.0.0/root.sh
Waiting for agent targets to get promoted...
Successfully Promoted agent and its related targets to Management Agent

Now log in as root and run the root.sh script.
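For completeness, that's just a matter of running the script reported in the output above:

[root@exa01db01 ~]# /u01/app/oracle/agent13c/agent_13.1.0.0.0/root.sh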

To be honest, I use this method regardless of the circumstances – it's so much easier and faster.

 


Exadata memory configuration

October 28th, 2016

Read this post if your Exadata compute nodes have 512/768GB of RAM or you plan to upgrade to the same.

There has been a lot written about hugepages, so I won't go into too much detail. For efficiency, the (x86) CPU allocates RAM in chunks (pages) of 4KB, and those pages can be swapped to disk. For example, if your SGA allocates 32GB, that's 8388608 pages, and given that each page table entry consumes 8 bytes, that's 64MB of page table to look up. Hugepages, on the other hand, are 2MB each. Pages used as hugepages are reserved inside the kernel and cannot be used for other purposes. Hugepages cannot be swapped out under memory pressure, the page table overhead is obviously smaller, and page lookups are not required since the pages are not subject to replacement. The bottom line is that you need to use them, especially with the amount of RAM we get nowadays.

For every new Exadata deployment I usually set the amount of hugepages to 60% of the physical RAM:
256GB RAM = 150 GB (75k pages)
512GB RAM = 300 GB (150k pages)
768GB RAM = 460 GB (230k pages)
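To reserve them, set vm.nr_hugepages and reload the kernel settings – a minimal sketch for the 768GB case (230000 x 2MB pages). On a system that has been running for a while, a reboot may be needed before the full amount can be allocated:

echo "vm.nr_hugepages = 230000" >> /etc/sysctl.conf
sysctl -p
grep Huge /proc/meminfo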

This allows the databases to allocate their SGAs from the hugepages. If you want to allocate exactly the number of hugepages you need, Oracle has a script that walks through all instances and gives you the number of hugepages to set on the system; you can find the Doc ID in the references below.

This also brings up an important point – to make sure your databases don't allocate from both 4K and 2M pages, set the parameter use_large_pages to ONLY for all databases. Starting with 11.2.0.3 (I think) you'll find hugepages information in the alert log when the database starts:

************************ Large Pages Information *******************
Per process system memlock (soft) limit = 681 GB

Total Shared Global Region in Large Pages = 2050 MB (100%)

Large Pages used by this instance: 1025 (2050 MB)
Large Pages unused system wide = 202863 (396 GB)
Large Pages configured system wide = 230000 (449 GB)
Large Page size = 2048 KB
********************************************************************

 

Now there is one more parameter you need to change if you deploy or upgrade an Exadata with 512/768GB of RAM. That is kernel.shmall – the total amount of shared memory, in pages, that the system can use at one time. On Exadata this parameter is set to the equivalent of 214GB by default, which is enough if your compute nodes have only 256GB of RAM. As long as the sum of all databases' SGAs is less than 214GB you are fine, but the moment you try to start another database you'll get the following error:

Linux-x86_64 Error: 28: No space left on device

For that reason, if you deploy or upgrade an Exadata with 512GB/768GB of physical RAM, make sure you increase kernel.shmall too!

Some Oracle docs suggest this parameter should be set to half of the physical memory, others suggest it should cover all available memory. Here's how to calculate it:

kernel.shmall = physical RAM size / pagesize

To get the page size, run getconf PAGE_SIZE at the command prompt. You need to set shmall to at least match the size of the hugepages, because that's where the SGA memory is allocated from. So if you run Exadata with 768GB of RAM and have 460GB of hugepages, you'd set shmall to 120586240 (460GB / 4KB page size).
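As a quick sketch for that 768GB example (getconf PAGE_SIZE returns 4096 on these systems, and on Exadata the kernel parameters live in /etc/sysctl.conf):

echo "kernel.shmall = 120586240" >> /etc/sysctl.conf
sysctl -p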

Using HUGEPAGES does not alter the calculation for configuring shmall!

Reference:
HugePages on Linux: What It Is… and What It Is Not… (Doc ID 361323.1)
Upon startup of Linux database get ORA-27102: out of memory Linux-X86_64 Error: 28: No space left on device (Doc ID 301830.1)


Speaking at BGOUG and UKOUG

October 17th, 2016

It’s my pleasure to be speaking at BGOUG and UKOUG again this year.

This coming Wednesday, 19th Oct, I'll be speaking at the UKOUG Systems SIG event here in London (agenda). I'll talk about the Exadata implementations I did last year and the issues I encountered, as well as things to keep in mind when you plan to extend the system or attach it to a ZFS Storage Appliance or Exalytics.

Next is my all-time favorite user group conference, BGOUG. It's held in Pravetz, Bulgaria between 11-13 November and, with an excellent line-up of speakers, it's one not to miss (agenda). I'll be speaking on Saturday at 10:00 about Protecting single instance databases with Oracle Clusterware 12c – for the case where you don't have RAC, RAC One Node or 3rd party cluster licenses but still need high availability for your database. I'll go through cluster basics, the difference between single instance, RAC and RAC One Node, and then more technical details around the implementation of a single instance failover cluster.

Finally, it's UKOUG Tech 16 with its massive 14 streams of sessions between 5-7 December and speakers from around the world (agenda). I'll be speaking on Tuesday at 11:35 about Exadata extension use cases – the Exadata extensions I've done and what to keep in mind if you plan one. In particular: extending an eighth rack to a quarter rack, expanding Exadata with more compute nodes or storage cells, and extending an X3-8 two-rack configuration with another X4-8 rack.

I’d like to thank my company (Red Stack) for the support and BGOUG and UKOUG committees for accepting my sessions.

See you there!

 


OTN Appreciation Day: Oracle Data Guard Fast-Start Failover

October 11th, 2016

Thank you, Tim, for the great idea.

There are so many cool database features one could spend weeks blogging about them.

A feature which I like very much is Oracle Data Guard Fast-Start Failover, or FSFO for short.

Oracle Data Guard Fast-Start Failover was one of the many new features introduced in Oracle Database 10.2. It's an addition to the already available Data Guard functionality for maintaining standby databases. FSFO automatically, quickly, and reliably fails over to a designated, synchronized standby database in the event of loss of the production database, without requiring manual intervention.

In an FSFO configuration there are three participants – the primary database, the standby database and an observer – and they follow a very simple rule: whichever two can communicate with each other determine the outcome of fast-start failover. The observer usually runs on a third machine, requires only the Oracle client, and continuously monitors the primary and standby databases for possible failure conditions.

FSFO solves the problem we used to have with clusters before – a “split brain” scenario where after a failure of the connection between the cluster nodes we end up having two primary databases. FSFO also gives you the option to establish an acceptable time limit (in seconds) that the designated standby is allowed to fall behind the primary database (in terms of redo applied), beyond which time a fast-start failover will not be allowed.

Oracle Data Guard Fast-Start Failover can be used only in a broker configuration, in either maximum availability or maximum performance protection mode.
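As a rough sketch of what enabling it looks like from DGMGRL – the threshold value is just an example and assumes the broker configuration is already in place:

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 30;
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> START OBSERVER;

The observer command is run from the third machine and keeps running, so in practice it's usually started in the background or as a service.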

I don't have a post on FSFO (yet) but here are the links to the documentation:

Oracle Database 12.1 Data Guard Concepts and Administration

Oracle Database 12.1 Data Guard Broker

Oracle Database 12.1 Fast-Start Failover


How to enable Exadata Write-Back Flash Cache

October 10th, 2016

Yes, this is well-known and the process has been described in Exadata Write-Back Flash Cache – FAQ (Doc ID 1500257.1), but what the note fails to make clear is that you no longer have to restart the cell services, and hence resync the grid disks!

I've had to enable WBFC many times before and every time I'd restart the cell services, as the note suggests. Well, this is not required anymore – starting with 11.2.3.3.1 it is no longer necessary to shut down the cellsrv service on the cells when changing the flash cache mode. This is not a big deal if you are deploying the Exadata right now, but it makes enabling/disabling WBFC on existing systems quicker and much easier.

The best way to do that is to use the script that Oracle has provided – setWBFC.sh. It will do all the work for you – pre-checks and changing the mode, either rolling or non-rolling.

Here are the checks it does for you:

  • Storage cells are valid storage nodes running 11.2.3.2.1 or later across all cells.
  • Grid disk status is ONLINE across all cells.
  • No ASM rebalance operations are running.
  • Flash cache state across all cells is “normal”.

Enable Write-Back Flash Cache using a ROLLING method

Before you enable WBFC run a precheck to make sure the cells are ready and there are no faults.

./setWBFC.sh -g cell_group -m WriteBack -o rolling -p

The script takes less than two minutes to run and at the end you'll see whether the storage cells passed the prechecks:

All pre-req checks completed:                    [PASSED]
2016-10-10 10:53:03
exa01cel01: flashcache size: 5.82122802734375T
exa01cel02: flashcache size: 5.82122802734375T
exa01cel03: flashcache size: 5.82122802734375T

There are 3 storage cells to process.

Then, once you are ready you run the script to enable the WBFC:

./setWBFC.sh -g cell_group -m WriteBack -o rolling

The script will go through the following steps on each cell, one cell at a time:

1. Recheck griddisks status to make sure none are OFFLINE
2. Drop flashcache
3. Change WBFC flashcachemode to WriteBack
4. Re-create the flashcache
5. Verify flashcachemode is in the correct state

On a quarter rack it took around four minutes to enable WBFC and you'll see this message at the end:

2016-10-10 11:23:24
Setting flash cache to WriteBack completed successfully.
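You can then confirm the new mode across all cells (assuming the same cell_group file used by the script):

dcli -l root -g cell_group cellcli -e "list cell attributes name,flashcachemode"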

Disable Write-Back Flash Cache using a ROLLING method

Disabling WBFC is not something you do every day, but sooner or later you might have to. I had to do it once for a customer who wanted to go back to WriteThrough because Oracle ACS said this was the default?!

The steps to disable WBFC are the same as enabling it except that we need to flush all the dirty blocks off the flashcache before we drop it.

Again, run the precheck script to make sure everything looks good:

./setWBFC.sh -g cell_group -m WriteThrough -o rolling -p

If everything looks good, run the script:

./setWBFC.sh -g cell_group -m WriteThrough -o rolling

The script will first FLUSH flashcache across all cells in parallel and wait until the flush is complete!

You can monitor the flush process using the following commands:

dcli -l root -g cell_group cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD
dcli -l root -g cell_group cellcli -e "list metriccurrent attributes name,metricvalue where name like \'FC_BY_DIRTY.*\' "

The script will then go through the following steps on each cell, one cell at a time:

1. Recheck griddisks status to make sure none are OFFLINE
2. Drop flashcache
3. Change WBFC flashcachemode to WriteThrough
4. Re-create the flashcache
5. Verify flashcachemode is in the correct state

The time it takes to flush the cache depends on how many dirty blocks you've got in the flash cache and on the machine workload. I did two eighth racks and, unfortunately, I didn't check the number of dirty blocks, but it took 75 minutes on the first one and 4 hours on the second.


Extending an Exadata Eighth Rack to a Quarter Rack

October 3rd, 2016

In the past year I’ve done a lot of Exadata deployments and probably half of them were eighth racks. It’s one of those temporary things – let’s do it now but we’ll change it later. It’s the same with the upgrades – I’ve never seen anyone doing an upgrade from an eighth rack to a quarter. However, a month ago one of our customers asked me to upgrade their three X5-2 HC 4TB units from an eighth to a quarter rack configuration.

What’s the difference between an eighth rack and a quarter rack

X5-2 Eighth Rack and X5-2 Quarter rack have the same hardware and look exactly the same. The only difference is that only half of the compute power and storage space on an eighth rack is usable. In an eighth rack the compute nodes have half of their CPUs activated – 18 cores per server. It’s the same for the storage cells – 16 cores per cell, six hard disks and two flash cards are active.

While this is true for X3, X4 and X5 things have slightly changed for X6. Up until now, eighth rack configurations had all the hard disks and flash cards installed but only half of them were usable. The new Exadata X6-2 Eighth Rack High Capacity configuration has half of the hard disks and flash cards removed. To extend X6-2 HC to a quarter rack you need to add high capacity disks and flash cards to the system. This is only required for High Capacity configurations because X6-2 Eighth Rack Extreme Flash storage servers have all flash drives enabled.

What are the main steps of the upgrade:

  • Activate Database Server Cores
  • Activate Storage Server Cores and disks
  • Create eight new cell disks per cell – six hard disks and two flash disks
  • Create all grid disks (DATA01, RECO01, DBFS_DG) and add them to the disk groups
  • Expand the flashcache onto the new flash disks
  • Recreate the flashlog on all flash cards

Here are a few things you need to keep in mind before you start:

  • The compute node upgrade requires a reboot for the changes to take effect.
  • The storage cell upgrade does NOT require a reboot – it is an online operation.
  • The upgrade is low risk – your data is secure and redundant at all times.
  • This post is about an X5 upgrade. If you were upgrading an X6, before you begin you would need to install the six 8TB disks in HDD slots 6 – 11 and the two F320 flash cards in PCIe slots 1 and 4.

Upgrade of the compute nodes

Well, this is really straightforward and you can do it at any time. Remember that you need to restart the server for the change to take effect:

dbmcli -e alter dbserver pendingCoreCount=36 force
DBServer exa01db01 successfully altered. Please reboot the system to make the new pendingCoreCount effective.

Reboot the server to activate the new cores. It will take around 10 minutes for the server to come back online.

Check the number of cores after the server comes back:

dbmcli -e list dbserver attributes coreCount
cpuCount:               36/36

 

Make sure you've got the right number of cores. These systems allow capacity on demand (CoD) and in my case the customer wanted me to activate only 28 cores per server.

Upgrade of the storage cells

Like I said earlier, the upgrade of the storage cells does NOT require a reboot and can be done online at any time.

The following needs to be done on each cell. You can, of course, use dcli but I wanted to do that cell by cell and make sure each operation finishes successfully.

1. First, upgrade the configuration from an eighth to a quarter rack:

[root@exa01cel01 ~]# cellcli -e list cell attributes cpuCount,eighthRack
cpuCount:               16/32
eighthRack:             TRUE

[root@exa01cel01 ~]# cellcli -e alter cell eighthRack=FALSE
Cell exa01cel01 successfully altered

[root@exa01cel01 ~]# cellcli -e list cell attributes cpuCount,eighthRack
cpuCount:               32/32
eighthRack:             FALSE

 

2. Create cell disks on top of the newly activated physical disks

Like I said – this is an online operation and you can do it at any time:

[root@exa01cel01 ~]# cellcli -e create celldisk all
CellDisk CD_06_exa01cel01 successfully created
CellDisk CD_07_exa01cel01 successfully created
CellDisk CD_08_exa01cel01 successfully created
CellDisk CD_09_exa01cel01 successfully created
CellDisk CD_10_exa01cel01 successfully created
CellDisk CD_11_exa01cel01 successfully created
CellDisk FD_02_exa01cel01 successfully created
CellDisk FD_03_exa01cel01 successfully created

 

3. Expand the flashcache onto the new flash cards

This is again an online operation and it can be run at any time:

[root@exa01cel01 ~]# cellcli -e alter flashcache all
Flash cache exa01cel01_FLASHCACHE altered successfully

 

4. Recreate the flashlog

The flash log is always 512MB in size, but to make use of the new flash cards it has to be recreated. Use the DROP FLASHLOG command to drop the flash log, and then use the CREATE FLASHLOG command to create a new one. The DROP FLASHLOG command can be run at runtime, but it does not complete until all redo data on the flash disk is written to hard disk.

Here is an important note from Oracle:

If FORCE is not specified, then the DROP FLASHLOG command fails if there is any saved redo. If FORCE is specified, then all saved redo is purged, and Oracle Exadata Smart Flash Log is removed.

[root@exa01cel01 ~]# cellcli -e drop flashlog
Flash log exa01cel01_FLASHLOG successfully dropped
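Once the drop completes, recreate the flash log so it spans all four flash cards (it defaults to 512MB again):

[root@exa01cel01 ~]# cellcli -e create flashlog all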

 

5. Create grid disks

The best way to do that is to query the size of the current grid disks and use it to create the new ones. Use the following queries to obtain the size for each grid disk. We use disk 02 because the first two cell disks don't have DBFS_DG grid disks on them.

[root@exa01db01 ~]# dcli -g cell_group -l root cellcli -e "list griddisk attributes name, size where name like \'DATA.*02.*\'"
exa01cel01: DATA01_CD_02_exa01cel01        2.8837890625T
[root@exa01db01 ~]# dcli -g cell_group -l root cellcli -e "list griddisk attributes name, size where name like \'RECO.*02.*\'"
exa01cel01: RECO01_CD_02_exa01cel01        738.4375G
[root@exa01db01 ~]# dcli -g cell_group -l root cellcli -e "list griddisk attributes name, size where name like \'DBFS_DG.*02.*\'"
exa01cel01: DBFS_DG_CD_02_exa01cel01       33.796875G

Then you can either generate the commands and run them on each cell or use dcli to create them on all three cells (note the grid disk names keep the same prefixes as the existing ones – DATA01, RECO01 and DBFS_DG):

dcli -g cell_group -l celladmin "cellcli -e create griddisk DATA01_CD_06_\`hostname -s\` celldisk=CD_06_\`hostname -s\`,size=2.8837890625T"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DATA01_CD_07_\`hostname -s\` celldisk=CD_07_\`hostname -s\`,size=2.8837890625T"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DATA01_CD_08_\`hostname -s\` celldisk=CD_08_\`hostname -s\`,size=2.8837890625T"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DATA01_CD_09_\`hostname -s\` celldisk=CD_09_\`hostname -s\`,size=2.8837890625T"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DATA01_CD_10_\`hostname -s\` celldisk=CD_10_\`hostname -s\`,size=2.8837890625T"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DATA01_CD_11_\`hostname -s\` celldisk=CD_11_\`hostname -s\`,size=2.8837890625T"
dcli -g cell_group -l celladmin "cellcli -e create griddisk RECO01_CD_06_\`hostname -s\` celldisk=CD_06_\`hostname -s\`,size=738.4375G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk RECO01_CD_07_\`hostname -s\` celldisk=CD_07_\`hostname -s\`,size=738.4375G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk RECO01_CD_08_\`hostname -s\` celldisk=CD_08_\`hostname -s\`,size=738.4375G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk RECO01_CD_09_\`hostname -s\` celldisk=CD_09_\`hostname -s\`,size=738.4375G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk RECO01_CD_10_\`hostname -s\` celldisk=CD_10_\`hostname -s\`,size=738.4375G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk RECO01_CD_11_\`hostname -s\` celldisk=CD_11_\`hostname -s\`,size=738.4375G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DBFS_DG_CD_06_\`hostname -s\` celldisk=CD_06_\`hostname -s\`,size=33.796875G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DBFS_DG_CD_07_\`hostname -s\` celldisk=CD_07_\`hostname -s\`,size=33.796875G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DBFS_DG_CD_08_\`hostname -s\` celldisk=CD_08_\`hostname -s\`,size=33.796875G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DBFS_DG_CD_09_\`hostname -s\` celldisk=CD_09_\`hostname -s\`,size=33.796875G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DBFS_DG_CD_10_\`hostname -s\` celldisk=CD_10_\`hostname -s\`,size=33.796875G"
dcli -g cell_group -l celladmin "cellcli -e create griddisk DBFS_DG_CD_11_\`hostname -s\` celldisk=CD_11_\`hostname -s\`,size=33.796875G"

6. The final step is to add the newly created grid disks to ASM

Connect to the ASM instance using sqlplus as sysasm and disable the appliance mode:

SQL> ALTER DISKGROUP DATA01 set attribute 'appliance.mode'='FALSE';
SQL> ALTER DISKGROUP RECO01 set attribute 'appliance.mode'='FALSE';
SQL> ALTER DISKGROUP DBFS_DG set attribute 'appliance.mode'='FALSE';

Add the disks to the disk groups; you can either queue them on one instance or run them on both ASM instances in parallel:

SQL> ALTER DISKGROUP DATA01 ADD DISK 'o/*/DATA01_CD_0[6-9]*','o/*/DATA01_CD_1[0-1]*' REBALANCE POWER 128;
SQL> ALTER DISKGROUP RECO01 ADD DISK 'o/*/RECO01_CD_0[6-9]*','o/*/RECO01_CD_1[0-1]*' REBALANCE POWER 128;
SQL> ALTER DISKGROUP DBFS_DG ADD DISK 'o/*/DBFS_DG_CD_0[6-9]*','o/*/DBFS_DG_CD_1[0-1]*' REBALANCE POWER 128;
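The rebalance can take a while; a simple way to keep an eye on it from either ASM instance is something like:

SQL> select inst_id, operation, state, power, sofar, est_work, est_minutes from gv$asm_operation;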

Monitor the rebalance using gv$asm_operation (as shown above) and once it completes change the appliance mode back to TRUE:

SQL> ALTER DISKGROUP DATA01 set attribute 'appliance.mode'='TRUE';
SQL> ALTER DISKGROUP RECO01 set attribute 'appliance.mode'='TRUE';
SQL> ALTER DISKGROUP DBFS_DG set attribute 'appliance.mode'='TRUE';

And at this point you are done with the upgrade. I strongly recommend running the latest exachk report and making sure there are no issues with the configuration.

A problem you might encounter is that the flash is not fully utilized; in my case I had 128MB free on each flash disk:

[root@exa01db01 ~]# dcli -g cell_group -l root "cellcli -e list celldisk attributes name,freespace where disktype='flashdisk'"
exa01cel01: FD_00_exa01cel01         128M
exa01cel01: FD_01_exa01cel01         128M
exa01cel01: FD_02_exa01cel01         128M
exa01cel01: FD_03_exa01cel01         128M
exa01cel02: FD_00_exa01cel02         128M
exa01cel02: FD_01_exa01cel02         128M
exa01cel02: FD_02_exa01cel02         128M
exa01cel02: FD_03_exa01cel02         128M
exa01cel03: FD_00_exa01cel03         128M
exa01cel03: FD_01_exa01cel03         128M
exa01cel03: FD_02_exa01cel03         128M
exa01cel03: FD_03_exa01cel03         128M

This seems to be a known bug (see the last reference below) and to fix it you need to recreate both the flash cache and the flash log.
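The exact steps are in the MOS note referenced below, but roughly it comes down to the following on each cell, one cell at a time (flush first if write-back flash cache is enabled):

[root@exa01cel01 ~]# cellcli -e alter flashcache all flush
[root@exa01cel01 ~]# cellcli -e drop flashcache
[root@exa01cel01 ~]# cellcli -e drop flashlog
[root@exa01cel01 ~]# cellcli -e create flashlog all
[root@exa01cel01 ~]# cellcli -e create flashcache all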

References:
Extending an Eighth Rack to a Quarter Rack in Oracle Exadata Database Machine X4-2 and Later
Oracle Exadata Database Machine exachk or HealthCheck (Doc ID 1070954.1)
Exachk fails due to incorrect flashcache size after upgrading from 1/8 to a 1/4 rack (Doc ID 2048491.1)


Grid Infrastructure 12c installation fails because of 255 in the subnet ID

August 25th, 2016

I was doing another GI 12.1.0.2 cluster installation last month when I got a really weird error.

While root.sh was running on the first node I got the following error:

2016/07/01 15:02:10 CLSRSC-343: Successfully started Oracle Clusterware stack
2016/07/01 15:02:23 CLSRSC-180: An error occurred while executing the command '/ocw/grid/bin/oifcfg setif -global eth0/10.118.144.0:public eth1/10.118.255.0:cluster_interconnect' (error code 1)
2016/07/01 15:02:24 CLSRSC-287: FirstNode configuration failed
Died at /ocw/grid/crs/install/crsinstall.pm line 2398.

I was surprised to find the following error in the rootcrs log file:

2016-07-01 15:02:22: Executing cmd: /ocw/grid/bin/oifcfg setif -global eth0/10.118.144.0:public eth1/10.118.255.0:cluster_interconnect
2016-07-01 15:02:23: Command output:
> PRIF-15: invalid format for subnet
>End Command output

A quick MOS search suggested that my installation failed because I had 255 in the subnet ID:
root.sh fails with CLSRSC-287 due to: PRIF-15: invalid format for subnet (Doc ID 1933472.1)

Indeed, we had 255 in the private network subnet (10.118.255.0). Fortunately this was our private network, which was easy to change, but you will still hit this issue if your public network has 255 in its subnet ID.
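A quick way to check a subnet ID up front is ipcalc – the address and netmask here are just an example; it prints the network ID, e.g.:

ipcalc -n 10.118.255.10 255.255.255.0
NETWORK=10.118.255.0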


How to resolve missing dependency on exadata-sun-computenode-minimum

August 18th, 2016

I've been really busy the last few months – apart from spending a lot of time on the M25 I've been doing a lot of Exadata installations and consolidations. I haven't posted for some time now, but the good news is that I've got many drafts and presentation ideas.

This is a quick post about an issue I had recently. I had to integrate AD authentication over Kerberos on the compute nodes (blog post to follow) but had to do a compute node upgrade before that. This was an Exadata X5-2 quarter rack running 12.1.2.1.1 which had to be upgraded to 12.1.2.3.1, and I was surprised when dbnodeupdate.sh failed the 'Minimum' dependency check. You'll also notice the following in the logs:

exa01db01a: Exadata capabilities missing (capabilities required but not supplied by any package)
exa01db01a NOTE: Unexpected configuration - Contact Oracle Support

Starting with 11.2.3.3.0, the exadata-*computenode-exact and exadata-*computenode-minimum RPMs were introduced. An update to 11.2.3.3.0 or later assumes by default that the 'exact' RPM will be used for the yum update, hence before running the upgrade dbnodeupdate.sh checks whether any packages or dependencies are missing.

The best way to check what is missing is to run yum check:

[root@exa01db01a ~]# yum check
Loaded plugins: downloadonly
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of elfutils-libelf-devel >= ('0', '0.158', '3.2.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of elfutils-libelf-devel(x86-64) >= ('0', '0.158', '3.2.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of glibc-devel(x86-32) >= ('0', '2.12', '1.149.el6_6.5')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of libsepol(x86-32) >= ('0', '2.0.41', '4.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of libselinux(x86-32) >= ('0', '2.0.94', '5.8.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of elfutils-libelf(x86-32) >= ('0', '0.158', '3.2.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of libcom_err(x86-32) >= ('0', '1.42.8', '1.0.2.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of e2fsprogs-libs(x86-32) >= ('0', '1.42.8', '1.0.2.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of libaio(x86-32) >= ('0', '0.3.107', '10.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of libaio-devel(x86-32) >= ('0', '0.3.107', '10.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of libstdc++-devel(x86-32) >= ('0', '4.4.7', '11.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of compat-libstdc++-33(x86-32) >= ('0', '3.2.3', '69.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of zlib(x86-32) >= ('0', '1.2.3', '29.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of libxml2(x86-32) >= ('0', '2.7.6', '17.0.1.el6_6.1')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of elfutils >= ('0', '0.158', '3.2.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of elfutils(x86-64) >= ('0', '0.158', '3.2.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of ntsysv >= ('0', '1.3.49.3', '2.el6_4.1')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of ntsysv(x86-64) >= ('0', '1.3.49.3', '2.el6_4.1')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of glibc(x86-32) >= ('0', '2.12', '1.149.el6_6.5')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of nss-softokn-freebl(x86-32) >= ('0', '3.14.3', '18.el6_6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of libgcc(x86-32) >= ('0', '4.4.7', '11.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of libstdc++(x86-32) >= ('0', '4.4.7', '11.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of compat-libstdc++-296 >= ('0', '2.96', '144.el6')
exadata-sun-computenode-minimum-12.1.2.1.1.150316.2-1.x86_64 has missing requires of compat-libstdc++-296(x86-32) >= ('0', '2.96', '144.el6')
Error: check all

Somehow all the x86-32 packages and three x86-64 packages had been removed. The x86-32 packages would be removed as part of the upgrade anyway – they were not present after the upgrade. I didn't spend too much time trying to understand why or how the packages were removed. I was told additional packages had been installed earlier and then removed; perhaps one of them had a few dependencies and things got messed up when it was removed.

Anyway, to solve this you need to download the patch for the same version (12.1.2.1.1). The p20746761_121211_Linux-x86-64.zip patch is still available from MOS note 888828.1. After that you unzip it, mount the ISO, test-install all the packages to make sure nothing is missing and there are no conflicts, and then finally install them.
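Getting to the RPMs looks roughly like this – the ISO file name inside the patch and the mount path are placeholders:

[root@exa01db01a ~]# unzip p20746761_121211_Linux-x86-64.zip
[root@exa01db01a ~]# mount -o loop <iso_from_the_patch>.iso /mnt/iso
[root@exa01db01a ~]# cd /mnt/iso/x86_64

With the RPMs in place, the test install and the actual install look like this: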

[root@exa01db01a x86_64]# rpm -ivh --test zlib-1.2.3-29.el6.i686.rpm glibc-2.12-1.149.el6_6.5.i686.rpm nss-softokn-freebl-3.14.3-18.el6_6.i686.rpm libaio-devel-0.3.107-10.el6.i686.rpm libaio-0.3.107-10.el6.i686.rpm e2fsprogs-libs-1.42.8-1.0.2.el6.i686.rpm libgcc-4.4.7-11.el6.i686.rpm libcom_err-1.42.8-1.0.2.el6.i686.rpm elfutils-libelf-0.158-3.2.el6.i686.rpm libselinux-2.0.94-5.8.el6.i686.rpm libsepol-2.0.41-4.el6.i686.rpm glibc-devel-2.12-1.149.el6_6.5.i686.rpm elfutils-libelf-devel-0.158-3.2.el6.x86_64.rpm libstdc++-devel-4.4.7-11.el6.i686.rpm libstdc++-4.4.7-11.el6.i686.rpm compat-libstdc++-296-2.96-144.el6.i686.rpm compat-libstdc++-33-3.2.3-69.el6.i686.rpm libxml2-2.7.6-17.0.1.el6_6.1.i686.rpm elfutils-0.158-3.2.el6.x86_64.rpm ntsysv-1.3.49.3-2.el6_4.1.x86_64.rpm
Preparing...                ########################################### [100%]

[root@exa01db01a x86_64]# rpm -ivh zlib-1.2.3-29.el6.i686.rpm glibc-2.12-1.149.el6_6.5.i686.rpm nss-softokn-freebl-3.14.3-18.el6_6.i686.rpm libaio-devel-0.3.107-10.el6.i686.rpm libaio-0.3.107-10.el6.i686.rpm e2fsprogs-libs-1.42.8-1.0.2.el6.i686.rpm libgcc-4.4.7-11.el6.i686.rpm libcom_err-1.42.8-1.0.2.el6.i686.rpm elfutils-libelf-0.158-3.2.el6.i686.rpm libselinux-2.0.94-5.8.el6.i686.rpm libsepol-2.0.41-4.el6.i686.rpm glibc-devel-2.12-1.149.el6_6.5.i686.rpm elfutils-libelf-devel-0.158-3.2.el6.x86_64.rpm libstdc++-devel-4.4.7-11.el6.i686.rpm libstdc++-4.4.7-11.el6.i686.rpm compat-libstdc++-296-2.96-144.el6.i686.rpm compat-libstdc++-33-3.2.3-69.el6.i686.rpm libxml2-2.7.6-17.0.1.el6_6.1.i686.rpm elfutils-0.158-3.2.el6.x86_64.rpm ntsysv-1.3.49.3-2.el6_4.1.x86_64.rpm
Preparing...              ########################################### [100%]
1:libgcc                  ########################################### [  5%]
2:elfutils-libelf-devel   ########################################### [ 10%]
3:nss-softokn-freebl      ########################################### [ 15%]
4:glibc                   ########################################### [ 20%]
5:glibc-devel             ########################################### [ 25%]
6:elfutils                ########################################### [ 30%]
7:zlib                    ########################################### [ 35%]
8:libaio                  ########################################### [ 40%]
9:libcom_err              ########################################### [ 45%]
10:libsepol               ########################################### [ 50%]
11:libstdc++              ########################################### [ 55%]
12:libstdc++-devel        ########################################### [ 60%]
13:libaio-devel           ########################################### [ 65%]
14:libselinux             ########################################### [ 70%]
15:e2fsprogs-libs         ########################################### [ 75%]
16:libxml2                ########################################### [ 80%]
17:elfutils-libelf        ########################################### [ 85%]
18:compat-libstdc++-296   ########################################### [ 90%]
19:compat-libstdc++-33    ########################################### [ 95%]
20:ntsysv                 ########################################### [100%]

[root@exa01db01a x86_64]# yum check
Loaded plugins: downloadonly
check all

After that, the dbnodeupdate.sh check completed successfully and I upgraded the node to 12.1.2.3.1 in no time.

With Exadata you are allowed to install packages on the compute nodes, as long as they don't break any dependencies, but you cannot install anything on the storage cells. Here's Oracle's official statement:
Is it acceptable / supported to install additional or 3rd party software on Exadata machines and how to check for conflicts? (Doc ID 1541428.1)

Update 23.08.2016:
You might also get errors for two more packages if you have updated from OEL5 to OEL6 and are now trying to patch the compute node:

fuse-2.8.3-4.0.2.el6.x86_64 has missing requires of kernel >= ('0', '2.6.14',
None)
2:irqbalance-1.0.7-5.0.1.el6.x86_64 has missing requires of kernel >= ('0',
'2.6.32', '358.2.1')

Refer to the following note for more information and how to fix it:


Dead Connection Detection in Oracle Database 12c

April 7th, 2016

In an earlier post I discussed what Dead Connection Detection is and why you should use it – read more here: Oracle TNS-12535 and Dead Connection Detection.

The pre-12c implementation of DCD used TNS packets to “ping” the client and relied on the underlying TCP stack, which sometimes can take longer. In 12c this has changed and the DCD probes are implemented by the TCP stack itself: they use the TCP KEEPALIVE socket option to check whether the connection is still usable.

To use the new implementation set the SQLNET.EXPIRE_TIME in sqlnet.ora to the amount of time between the probes in minutes. If the operating system supports TCP keep-alive tuning then Oracle Net automatically uses the new method. The new mechanism is supported on all platforms except on Solaris.
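For example, a server-side sqlnet.ora entry that probes after ten minutes of idle time looks like this:

SQLNET.EXPIRE_TIME = 10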

The following parameters are associated with the TCP keep-alive probes:
TCP_KEEPIDLE  – specifies how long the connection must be idle before a probe is sent. The parameter takes its value from SQLNET.EXPIRE_TIME.
TCP_KEEPCNT   – the number of keep-alive probes to be sent; it is always set to 10.
TCP_KEEPINTVL – specifies the delay between probes if a keep-alive packet is sent and no acknowledgment is received; it is always set to 6.

If you need to revert to the pre-12c DCD mechanism (the 10-byte TNS packet), add the following parameter to sqlnet.ora:
USE_NS_PROBES_FOR_DCD=true

 


Oracle Exadata X6 released

April 5th, 2016

Oracle has just announced the next generation of Exadata Database Machine – X6-2 and X6-8.

Here are the changes for Exadata X6-2:
1) X6-2 Database Server: As always the hardware has been updated and the 2-socket database servers are now equipped with the latest twenty-two-core Intel Xeon E5-2699 v4 “Broadwell” processors, compared to the eighteen-core Intel Xeon E5-2699 v3 processors in the X5. The memory is still DDR4; the default configuration comes with 256GB and can be expanded to 768GB. The local storage can now be upgraded from the default of 4 drives to 8, to allow more local storage in case of consolidation with Oracle VM.
2) X6-2 Storage Server HC: The storage server gets new-generation CPUs as well – the ten-core Intel Xeon E5-2630 v4 processor (it was an eight-core Intel Xeon E5-2630 v3 in the X5). The flash cards are upgraded too, to the 3.2TB Sun Accelerator Flash F320 NVMe PCIe card, for a total of 12.8TB of flash cache (twice the capacity of the X5, which had 1.6TB F160 cards).
2.1) X6-2 Storage Server EF: Similarly to the High Capacity storage server, this one gets the CPU and flash cards upgraded. The NVMe PCIe flash drives grow from 1.6TB to 3.2TB, which gives you a total raw capacity of 25.6TB per server.

This time Oracle released Exadata X6-8 together with the X6-2. The changes aren't many – I have to say the X6-8 compute node looks exactly the same as the X5-8 in terms of specs, so I guess Exadata X6-8 actually consists of X5-8 compute nodes with X6-2 storage servers. Oracle's vision for these big monsters is that they are specifically optimized for Database as a Service (DBaaS) and Database In-Memory. Indeed, with 12TB of memory we can host hundreds of databases or load a whole database into memory.

By the looks of it, Exadata X6-2 and X6-8 will require the latest Exadata 12.1.2.3.0 software. This software has been around for some time now and brings some new features:
1) Performance Improvements for Software Upgrades – I can confirm that; in a recent upgrade to 12.1.2.3.0 the cell upgrade took a bit more than an hour.
2) VLAN tagging support in OEDA – not a fundamentally new feature, since VLAN tagging was available before, but now it can be done through OEDA and hence be part of the deployment.
3) Quorum disks on database servers to enable high redundancy on quarter and eighth racks – you can now use the database servers to deploy quorum disks and enable placement of the voting disks on high redundancy disk groups on smaller (quarter and eighth) racks. Here is more information – Managing Quorum Disks Using the Quorum Disk Manager Utility
4) Storage Index preservation during rebalance – this feature enables Storage Indexes to be moved along with the data when a disk hits a predictive or true failure.
5) ASM Disk Size Checked When Reducing Grid Disk Size – a check on the storage server to make sure you cannot shrink a grid disk before decreasing the size of the corresponding ASM disk.

Capacity-On-Demand Licensing:
1) For Exadata X6-2 a minimum of 14 cores must be enabled per server.
2) For Exadata X6-8 a minimum of 56 cores must be enabled per server.

Here’s something interesting:
OPTIONAL CUSTOMER SUPPLIED ETHERNET SWITCH INSTALLATION IN EXADATA DATABASE MACHINE X6-2
Each Exadata Database Machine X6-2 rack has 2U available at the top of the rack that can be used by customers to optionally install their own client network Ethernet switches in the Exadata rack instead of in a separate rack. Some space, power, and cooling restrictions apply.
