Archive

Posts Tagged ‘bug’

Grid Infrastructure 12c installation fails because of 255 in the subnet ID

August 25th, 2016

I was doing another GI 12.1.0.2 cluster installation last month when I got a really weird error.

While root.sh was running on the first node I got the following error:

2016/07/01 15:02:10 CLSRSC-343: Successfully started Oracle Clusterware stack
2016/07/01 15:02:23 CLSRSC-180: An error occurred while executing the command '/ocw/grid/bin/oifcfg setif -global eth0/10.118.144.0:public eth1/10.118.255.0:cluster_interconnect' (error code 1)
2016/07/01 15:02:24 CLSRSC-287: FirstNode configuration failed
Died at /ocw/grid/crs/install/crsinstall.pm line 2398.

I was surprised to find the following error in the rootcrs log file:

2016-07-01 15:02:22: Executing cmd: /ocw/grid/bin/oifcfg setif -global eth0/10.118.144.0:public eth1/10.118.255.0:cluster_interconnect
2016-07-01 15:02:23: Command output:
> PRIF-15: invalid format for subnet
>End Command output

A quick MOS search suggested that my installation failed because I had 255 in the subnet ID:
root.sh fails with CLSRSC-287 due to: PRIF-15: invalid format for subnet (Doc ID 1933472.1)

Indeed, we had 255 in the private network subnet (10.118.255.0). Fortunately this was our private network, which was easy to change, but you will hit the same issue if your public network has 255 in the subnet ID.
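
A quick check before the installation can catch this earlier than root.sh does. The following is only a minimal sketch: has_255_octet is a hypothetical helper name, not part of any Oracle tool, and it flags any octet equal to 255, which is an assumption about how broadly the restriction applies (the MOS note only speaks of 255 in the subnet ID).

```shell
# Hypothetical helper (not part of any Oracle tool): succeeds if any octet
# of a dotted IPv4 subnet ID equals 255.
has_255_octet() {
    case ".$1." in
        *.255.*) return 0 ;;
        *)       return 1 ;;
    esac
}

# The two subnets from the failing oifcfg command above.
for subnet in 10.118.144.0 10.118.255.0; do
    if has_255_octet "$subnet"; then
        echo "$subnet: contains 255, oifcfg may reject it with PRIF-15"
    else
        echo "$subnet: ok"
    fi
done
```

Running the check against both the public and the private subnet before starting the installer is cheaper than cleaning up after a failed root.sh.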

Categories: oracle

MGMTDB not automatically created on Exadata X5 and GI 12.1.0.2

July 1st, 2015

While I was deploying an X5 Full Rack recently, the Grid Infrastructure Management Repository (GIMR) was not created by onecommand. The GIMR database was optional in 12.1.0.1, became mandatory in 12.1.0.2, and should be installed automatically with Oracle Grid Infrastructure 12c Release 1 (12.1.0.2). For a reason unknown to me that didn’t happen and I had to create it manually. I checked all the log files but couldn’t find any errors. For reference, the OEDA version used was Feb 2015 v15.050 and the image version on the Exadata was 12.1.2.1.0.141206.1.

To create the database, log in as the grid user and create a file holding the following variables:

cat > /tmp/cfgrsp.properties <<'EOF'
oracle.assistants.asm|S_ASMPASSWORD=[your ASM password]
oracle.assistants.asm|S_ASMMONITORPASSWORD=[your ASM password]
EOF

and run the following command:

GRID_HOME=/u01/app/12.1.0.2/grid
[oracle@exa01 ~]$ $GRID_HOME/cfgtoollogs/configToolAllCommands RESPONSE_FILE=/tmp/cfgrsp.properties

For reference, here is a similar bug I found on MOS:
MGMTDB Not Created When Using EM12c Provisioning (Doc ID 1983885.1)

Categories: oracle

dbnodeupdate.sh post upgrade step fails on Exadata storage software 12.1.2.1.1

June 23rd, 2015

I’ve done several Exadata deployments in the past two months and had to upgrade the Exadata storage software on half of them. The reason was that units shipped before May came with Exadata storage software version 12.1.2.1.0.

The upgrade of the database nodes ran fine, but when I ran dbnodeupdate.sh -c to complete the post-upgrade steps I got an error that the system wasn’t on the expected Exadata release or kernel:

(*) 2015-06-01 14:21:21: Verifying GI and DB's are shutdown
(*) 2015-06-01 14:21:22: Verifying firmware updates/validations. Maximum wait time: 60 minutes.
(*) 2015-06-01 14:21:22: If the node reboots during this firmware update/validation, re-run './dbnodeupdate.sh -c' after the node restarts..
(*) 2015-06-01 14:21:23: Collecting console history for diag purposes

ERROR: System not on expected Exadata release or kernel, exiting


ERROR: Correct error, or to override run: ./dbnodeupdate.sh -c -q -t 12.1.2.1.1.150316.2

Indeed, the database node was running the new Exadata software but still using the old kernel (2.6.39-400.243) and dbnodeupdate was expecting me to run the new 2.6.39-400.248 kernel:

imageinfo:
Kernel version: 2.6.39-400.243.1.el6uek.x86_64 #1 SMP Wed Nov 26 09:15:35 PST 2014 x86_64
Image version: 12.1.2.1.1.150316.2
Image activated: 2015-06-01 12:27:57 +0100
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

The reason was that the previous run of dbnodeupdate installed the new kernel package but failed to update grub.conf. The solution is to manually add the missing kernel entry to grub.conf and reboot the server to pick up the new kernel. For more information, here is the bug, which was still internal at the time I had this problem:

Bug 20708183 – DOMU:GRUB.CONF KERNEL NOT ALWAYS UPDATED GOING TO 121211, NEW KERNEL NOT BOOTED
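
Before re-running dbnodeupdate.sh -c it may help to confirm whether grub.conf actually references the new kernel. This is a rough sketch only: grub_has_kernel is a hypothetical helper, and the grub.conf path and the shape of the kernel line (a vmlinuz-<version> file name) are assumptions that should be checked against the real file.

```shell
# Hypothetical helper: succeeds if the given grub.conf has a "kernel" line
# that boots a vmlinuz whose name starts with the given version prefix.
grub_has_kernel() {
    conf=$1
    version=$2
    # Keep only the kernel lines, then look for the literal version string.
    grep -E '^[[:space:]]*kernel[[:space:]]' "$conf" | grep -Fq "vmlinuz-$version"
}

# Example usage (path and version prefix are assumptions, adjust as needed):
#   grub_has_kernel /boot/grub/grub.conf 2.6.39-400.248 || echo "kernel entry missing"
```

If the helper reports the entry as missing, that matches the situation described above: the kernel package is installed but the boot configuration still points at the old kernel.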

 

 

Categories: oracle

opatch 12.1.0.1.7 fails with System Configuration Collection Failed

June 1st, 2015

I was recently upgrading an Exadata 12.1.0.2 DBBP6 to DBBP7 and, as usual, I went for the latest opatch version, which was 12.1.0.1.7 (Apr 2015) at that time.

After running opatchauto apply or opatchauto apply -analyze I got the following error:

System Configuration Collection failed: oracle.osysmodel.driver.sdk.productdriver.ProductDriverException: java.lang.NullPointerException
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Stream closed
        at oracle.opatchauto.gi.GILogger.writeWithoutTimeStamp(GILogger.java:450)
        at oracle.opatchauto.gi.GILogger.printStackTrace(GILogger.java:465)
        at oracle.opatchauto.gi.OPatchauto.main(OPatchauto.java:97)
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedWriter.ensureOpen(BufferedWriter.java:98)
        at java.io.BufferedWriter.write(BufferedWriter.java:203)
        at java.io.Writer.write(Writer.java:140)
        at oracle.opatchauto.gi.GILogger.writeWithoutTimeStamp(GILogger.java:444)
        ... 2 more

opatchauto failed with error code 1.

This is a known problem caused by these bugs, which are fixed in 12.1.0.1.8:
Bug 20892488 : OPATCHAUTO ANALYZE FAILING WITH GIPATCHINGHELPER::CREATESYSTEMINSTANCE FAILED
Bug 20857919 – LNX64-121023GIPSU:APPLY GIPSU FAILED WITH SYSTEM CONFIGURATION COLLECTION FAILED

As 12.1.0.1.8 is not yet available, the workaround is to use the lower opatch version 12.1.0.1.6, which can be downloaded from this note:
Opatchauto Gives “System Configuration Collection Failed” Message (Doc ID 2001933.1)

You might run into the same problem if you are applying 12.1.0.2.3 PSU.
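
To avoid starting a patching run with the affected release, the opatch version can be checked first. The sketch below assumes that opatch version prints a line of the form "OPatch Version: 12.1.0.1.7"; both function names are hypothetical, and the affected version is hard-coded to the one release mentioned in this post.

```shell
# Read `opatch version` output on stdin and print just the version number.
# Assumes a line of the form "OPatch Version: 12.1.0.1.7".
opatch_version_from_output() {
    sed -n 's/^OPatch Version:[[:space:]]*//p' | head -n 1
}

# Succeeds only for the release that hit the problem described above.
is_affected_opatch() {
    [ "$1" = "12.1.0.1.7" ]
}

# Example usage (the OPatch location is an assumption):
#   ver=$("$ORACLE_HOME"/OPatch/opatch version | opatch_version_from_output)
#   is_affected_opatch "$ver" && echo "opatch $ver is affected, use 12.1.0.1.6"
```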

Categories: oracle

Not able to update Web service process in APEX 4.1

March 21st, 2012

Last month I created a simple APEX application with mobile support enabled and the latest version of jQuery, which integrates with HP Service Manager through web services. The purpose was to let company engineers open and update incidents from a mobile device in a few easy steps.

The first step was to create a form and report using the web service. At this point a web service request process is created, in which the authentication and input parameters are described. The problem appears if you try to change any parameter and update the web service process. This is the error:

 Error error updating web service parameters
ORA-01403: no data found

To fix this, one option is to apply patch 12934733 on top of APEX 4.1. The other option is to apply the latest patch set, APEX 4.1.1 (patch number 13331096).

At the time I got the error the patch set hadn’t been released yet, so I applied only the one-off patch to fix the issue. Later I decided to update APEX to the latest version, 4.1.1, and I’ll go through the update process briefly.

Before upgrading to APEX 4.1.1, make sure to review the release notes. The process is really simple and takes a few minutes.

Before applying the patch, make sure to prevent access to APEX. In my case I’m using Oracle Database 11g Express Edition with the embedded PL/SQL gateway. Then apply the patch using apxpatch.sql and update the images directory. Because I’m using Express Edition, my images are stored in the XML DB repository and the script apxldimg.sql has to be used to upload the new images into the repository.

 

Disabling Oracle XML DB HTTP Server:

SQL> SELECT DBMS_XDB.GETHTTPPORT FROM DUAL;

GETHTTPPORT
-----------
 0

SQL> EXEC DBMS_XDB.SETHTTPPORT(0);

PL/SQL procedure successfully completed.

SQL> COMMIT;

Commit complete.

SQL> SELECT DBMS_XDB.GETHTTPPORT FROM DUAL;

GETHTTPPORT
-----------
 0

 

Run apxpatch.sql to patch the system:

SQL> @apxpatch.sql

.......

timing for: Complete Patch
Elapsed: 00:06:25.48

 

Updating the Images Directory When Running the Embedded PL/SQL Gateway:

SQL> @apxldimg.sql /tmp/patch

.......

Commit complete.

timing for: Load Images
Elapsed: 00:04:12.56

Directory dropped.

 

Enabling Oracle XML DB HTTP Server:

SQL> EXEC DBMS_XDB.SETHTTPPORT(8080);

PL/SQL procedure successfully completed.

SQL> COMMIT;

Commit complete.

 

APEX is now updated to version 4.1.1

 

Regards,
Sve

Categories: oracle

Database 11.2 bug causes huge number of alert log entries

December 22nd, 2011

A few days ago I received a call from a customer about a problem with their EM console and messages about a full file system. They run DB 11.2.0.2 on OEL 5.7 with only the binaries installed on that file system, while the database itself uses ASM. I quickly logged on and found that the file system really was full, and after looking around I figured out that all the free space had been eaten by the alert and trace diagnostic directories. The trace directory was full of 10MB files and the alert log was growing quickly with the following messages:

 WARNING: failed to read mirror side 1 of virtual extent 2917 logical extent 0 of file 271 in group [1.2242406296] from disk DATA_0000 allocation unit 24394 reason error; if possible,will try another mirror side
Errors in file /oracle/app/oracle/diag/rdbms/baandb/baandb/trace/baandb_ora_17785.trc:
WARNING: Read Failed. group:1 disk:0 AU:24394 offset:1007616 size:8192
WARNING: failed to read mirror side 1 of virtual extent 2917 logical extent 0 of file 271 in group [1.2242406296] from disk DATA_0000 allocation unit 24394 reason error; if possible,will try another mirror side
Errors in file /oracle/app/oracle/diag/rdbms/baandb/baandb/trace/baandb_ora_17785.trc: 

At first I thought there was a storage problem, but according to the ASM views everything seemed to be all right and these seemed to be false messages. I deleted all the trace files, but a few minutes later the file system was full again. It turned out that more than 60MB of logs were generated per minute, or around 7GB in two hours, and because of this huge number of messages the machine was already loaded.
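
To get a feeling for how fast the messages pile up, the warnings can simply be counted. This is a minimal sketch; count_mirror_warnings is a hypothetical helper name and the alert log file name in the usage comment is an assumption (the trace directory is the one from the messages above).

```shell
# Hypothetical helper: print how many "failed to read mirror side" warnings
# a given alert log contains.
count_mirror_warnings() {
    grep -c 'failed to read mirror side' "$1"
}

# Example usage (the alert log file name is an assumption):
#   count_mirror_warnings /oracle/app/oracle/diag/rdbms/baandb/baandb/trace/alert_baandb.log
#   du -sh /oracle/app/oracle/diag/rdbms/baandb/baandb/trace
```

Running the count twice a minute apart, together with du on the trace directory, gives a rough growth rate to compare with the figures above.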

Then, after a quick MOS search, I found that this is Bug 10422126: FAILED TO READ MIRROR SIDE 1 and that there is a 70KB patch for 11.2.0.2.

The following MOS notes are also useful:
WARNING: ‘Failed To Read Mirror Side 1’ continuously reported in the alert log [ID 1289905.1]
Huge number of alert log entries: ‘WARNING: IO Failed…’ ‘WARNING: failed to read mirror side 1 of virtual extent …’ [ID 1274852.1]

After applying the patch everything became normal and no more false messages appeared in the logs. The bug is fixed in 11.2.0.3.

Regards,
Sve

Categories: linux, oracle

Cannot apply BP10 to Oracle Database 11.2.0.2 on Windows Server 2008 R2

November 9th, 2011

This happened when I tried to apply Bundle Patch 10 for Oracle Database 11.2.0.2 on Windows 2008, but I guess it could happen with any 11.x database version. I decided to apply this patch after I hit the bug in which the heap memory is exhausted because of CVU health checks (I described it here).

After running opatch apply I was told that the following files were still active:
d:\app\11.2.0\grid\bin\oraclient11.dll
d:\app\11.2.0\grid\bin\orageneric11.dll
d:\app\11.2.0\grid\bin\orapls11.dll
d:\app\11.2.0\grid\bin\oracommon11.dll
d:\app\11.2.0\grid\bin\oci.dll
d:\app\11.2.0\grid\bin\orahasgen11.dll
d:\app\11.2.0\grid\bin\oraocr11.dll
d:\app\11.2.0\grid\bin\oraocrb11.dll
d:\app\11.2.0\grid\bin\oraocrutl11.dll
d:\app\11.2.0\grid\bin\mDNSResponder.exe
d:\app\11.2.0\grid\bin\ocssd.exe
d:\app\11.2.0\grid\bin\cssdagent.exe
d:\app\11.2.0\grid\bin\cssdmonitor.exe
d:\app\11.2.0\grid\bin\evmd.exe
d:\app\11.2.0\grid\bin\evmlogger.exe
d:\app\11.2.0\grid\bin\gipcd.exe
d:\app\11.2.0\grid\bin\gpnpd.exe
d:\app\11.2.0\grid\bin\octssd.exe

It was unlikely that something was still running, because I had stopped all GI processes. Again, to find out which process was holding the DLLs, I used Process Explorer. It turned out that the process WmiPrvSE.exe had the DLLs open.

Description of WMI:
The wmiprvse.exe file is otherwise known as Windows Management Instrumentation. It is a Microsoft Windows-based component that provides control and information about management in an enterprise environment. Developers use the wmiprvse.exe file in order to develop applications used for monitoring purposes.

For some reason WMI is holding the CRS DLLs. Stop the WMI service or kill the process; this should release the lock on the DLLs and allow opatch to proceed.

Regards,
Sve

Categories: oracle, windows

Exhaust of Windows 2008 heap memory with Oracle Database 11.2.0.2

September 29th, 2011

Recently I had an interesting setup for one of our customers. Because they had Oracle Standard Edition and Windows Server 2008 R2 Standard Edition licenses, I was asked to create an HA database installation. After looking around I found a few docs about installing Standard Edition with Clusterware and I had some ideas. In the end I installed Grid Infrastructure and the Oracle Database binaries on both servers. Then I created a single-instance database on the second server and replicated the configuration to the first one. Currently the relocation of the database is done manually, but one could create start/stop/monitor scripts and integrate them with GI. Once the database starts it registers with the SCAN listener, so in theory it’s running in HA (just the relocation is manual) 🙂

So during the weekend I received mail from my colleagues about error messages they received from the database: connect error, Socket read timed out. It wasn’t urgent as the database is not yet in production, but go-live is coming up, so this was the first task for Monday. The next day I looked around and everything was up and running, except that I wasn’t able to log in through the listener and I also wasn’t able to stop or relocate it. Looking at the logs I found the following message at some point: TNS-12531: TNS:cannot allocate memory, which explains the previous message.

That was weird: the server on which the error appeared was the first one and had only GI and the SCAN listener running. This really looked like a memory leak; it’s Windows, so maybe that was obvious. I decided to look at the processes using Resource Monitor and found a lot of cmd.exe processes. To confirm the problem I used Process Explorer, which is a very nice tool for Windows. It showed plenty of cmd processes that had been spawned but (obviously) not closed after completion.

It turned out that this is a bug in 11.2.0.2 on Windows (64-bit). The Oracle CVU resource (ora.cvu), which by default is started on the first node in the cluster (this makes sense now), runs checks every six hours (CHECK_INTERVAL=21600) and leaves processes open. Because of this the heap memory is exhausted, and that’s why the SCAN listener is failing with the error message TNS-12531: TNS:cannot allocate memory.

 

The following errors could be seen in the Windows Event Log; once the patch was applied the errors disappeared:
Faulting application lsnrctl.exe, version 11.2.0.2, time stamp 0x4cea8f55, faulting module kernel32.dll, version 6.0.6001.18538, time stamp 0x4cb73957, exception code 0xc0000142, fault offset 0x00000000000b1b48, process id 0x1eac, application start time 0x01cc6ab588f992c0.

Faulting application cmd.exe, version 6.0.6001.18000, time stamp 0x47918bde, faulting module kernel32.dll, version 6.0.6001.18538, time stamp 0x4cb733e1, exception code 0xc0000142, fault offset 0x0006f1e7, process id 0x1004, application start time 0x01cc6af0fa982500.

Faulting application sclsspawn.exe, version 0.0.0.0, time stamp 0x4ce622a7, faulting module kernel32.dll, version 6.0.6001.18538, time stamp 0x4cb73957, exception code 0xc0000142, fault offset 0x00000000000b1b48, process id 0x1ca0, application start time 0x01cc6c0e5efd5380.

This is the bug at MOS:
Bug 12529945: CVU HEALTH CHECKS EXHAUST WINDOWS HEAP MEMORY

The bug should have been fixed in BP8, but I applied the latest one, BP10:
Patch 12849789: ORACLE 11G 11.2.0.2 PATCH 10 BUG FOR WINDOWS (64-BIT AMD64 AND INTEL EM64)

 

Regards,
Sve

Categories: oracle, windows

Unable to load Audit Vault console after login

September 7th, 2011

Well, this is a quick note in case someone else runs into this error. I have an Audit Vault server patched up to 10.2.3.2.5, with its repository database at 10.2.0.7. The problem is that I’m able to connect to the console as av_admin, but not as av_auditor. When I try to log in as av_auditor I get redirected to a wrong URL, like this one:

http://192.168.1.100:0/av/console/f?p=7700:100:::::

It’s obvious that this is wrong: port 0 cannot be used, and I get an Unable to connect error in the browser.

To confirm that this is the problem, check whether the output of lsnrctl status has a line like this one:

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=hostname)(PORT=5707))(Presentation=HTTP)(Session=RAW))
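
Picking that line out of a long lsnrctl status listing can be scripted. The sketch below is only an illustration: http_port_from_lsnrctl is a hypothetical helper, and it assumes the endpoint is printed in the (PORT=...)...(Presentation=HTTP) form shown above.

```shell
# Hypothetical helper: read `lsnrctl status` output on stdin and print the
# port of the first endpoint registered with Presentation=HTTP (if any).
http_port_from_lsnrctl() {
    grep 'Presentation=HTTP' | sed -n 's/.*(PORT=\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Example usage:
#   lsnrctl status | http_port_from_lsnrctl
```

An empty result means no HTTP endpoint is registered with the listener, which is consistent with the port being set to 0.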

Also use dbms_xdb.gethttpport to get the port on which the console is listening:

SELECT DBMS_XDB.gethttpport() from dual;

DBMS_XDB.GETHTTPPORT()
----------------------
0

 

These tips are described in the Oracle® Audit Vault Administrator’s Guide, in particular section A.3.6.1, Oracle Audit Vault Reports Not Displaying.

The correct port for Oracle Audit Vault Reports HTTP is 5707 and the above query should return exactly this port. If you get port 0 instead, log in as sysdba and set the correct port:

SQL> EXEC DBMS_XDB.SETHTTPPORT(5707);

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

 

Make sure the changes are applied:

SQL> SELECT DBMS_XDB.gethttpport() from dual;

DBMS_XDB.GETHTTPPORT()
----------------------
                  5707

 

And finally register the database:

SQL> ALTER SYSTEM REGISTER;

 

You’re now a happy Audit Vault auditor who can log in to the console successfully.

Regards,
Sve

Categories: oracle

Oracle DB 10.2.0.3 LISTENER (VIP) goes down on HP-UX 11.23 without reason

January 5th, 2011

Happy New Year!

For a long time I’ve been receiving complaints that the listener on one of the nodes of a two-node RAC goes offline from time to time. For no obvious reason the VIP of the second node fails, the listener is stopped and the VIP is relocated to the first node. Since the VIP is relocated, there are no problems as long as all the clients are configured correctly. In this case some of the clients were connecting explicitly to the second node and were unable to connect to the database. The database version is 10.2.0.3 RAC, installed on two nodes running HP-UX 11.23 with the December 2008 bundle patches.

The following can be observed in $CRS_HOME/log/$HOSTNAME/crsd/crsd.log:
2010-10-25 06:11:12.492: [ CRSAPP][8336] CheckResource error for ora.db2.vip error code = 1
2010-10-25 06:11:12.522: [ CRSRES][8336] In stateChanged, ora.db2.vip target is ONLINE
2010-10-25 06:11:12.522: [ CRSRES][8336] ora.db2.vip on db2 went OFFLINE unexpectedly
2010-10-25 06:11:12.523: [ CRSRES][8336] StopResource: setting CLI values
2010-10-25 06:11:12.527: [ CRSRES][8336] Attempting to stop `ora.db2.vip` on member `db2`
2010-10-25 06:11:13.182: [ CRSRES][8336] Stop of `ora.db2.vip` on member `db2` succeeded.
2010-10-25 06:11:13.185: [ CRSRES][8336] ora.db2.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2010-10-25 06:11:13.188: [ CRSRES][8336] ora.db2.vip failed on db2 relocating.
2010-10-25 06:11:13.231: [ CRSRES][8336] StopResource: setting CLI values
2010-10-25 06:11:13.235: [ CRSRES][8336] Attempting to stop `ora.db2.LISTENER_DB2.lsnr` on member `db2`
2010-10-25 06:12:31.183: [ CRSRES][8336] Stop of `ora.db2.LISTENER_DB2.lsnr` on member `db2` succeeded.
2010-10-25 06:12:31.211: [ CRSRES][8336] Attempting to start `ora.db2.vip` on member `db1`
2010-10-25 06:12:38.327: [ CRSRES][8336] Start of `ora.db2.vip` on member `db1` succeeded.
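
To see how often this happens, the unexpected OFFLINE events can be pulled out of crsd.log. This is a small sketch that assumes the log line layout shown above; offline_events is a hypothetical helper name.

```shell
# Hypothetical helper: print "date time resource node" for every
# "went OFFLINE unexpectedly" line in the given crsd.log.
offline_events() {
    sed -n 's/^\([0-9-]* [0-9:.]*\): .*] \([^ ]*\) on \([^ ]*\) went OFFLINE unexpectedly.*/\1 \2 \3/p' "$1"
}

# Example usage:
#   offline_events "$CRS_HOME/log/$HOSTNAME/crsd/crsd.log"
```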

The following can be seen in the alert log:
ALTER SYSTEM SET service_names='' SCOPE=MEMORY SID='oradb2';

There are a couple of bugs logged about this. There is also a MOS note regarding the problem:
HP-UX Itanium: RACGMAIN Received SIGSEGV On CheckResource Causing a Crash of a Resource [ID 763724.1]

The solution is to change the shared library binding mode of the executables from “delay binding” to “immediate binding” using the following bash script. It has to be applied to both the CRS and DB homes, with all Oracle processes stopped:

cd $ORACLE_HOME/bin/
for i in crs_relocate.bin crs_start.bin crs_stop.bin crsd.bin evmd.bin racgons.bin racgeut racgevtf racgmain; do chatr -B immediate $i; done

cd $CRS_HOME/bin/
for i in crs_relocate.bin crs_start.bin crs_stop.bin crsd.bin evmd.bin racgons.bin racgeut racgevtf racgmain; do chatr -B immediate $i; done

In the three months since implementing this solution I haven’t seen the problem again!

Regards,
Sve

Categories: hp-ux, oracle