Sunday, August 7, 2016

Exadata Two Node Cluster VKTM Process Trace File Warning

Problem Summary
---------------------------------------------------
Warning: VKTM detected a time drift

Problem Description
---------------------------------------------------
We found the following warning in the alert log files of both the DC and DR databases:
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.

Please assist us with the above warning. Will it impact the business?

Solution:

Time drifts of less than 1 second forward, or less than 5 seconds backward, are permissible and can be ignored.
If the trace files report drifts beyond these ranges, they need to be analyzed.
Most of the time such drifts appear under high load, when the underlying OS has problems caused by virtual memory pressure, improper Network Time Protocol (NTP) configuration, and similar issues.

In general, the VKTM process needs to be scheduled every 10 ms. If that does not happen for the reasons above, time drifts appear; up to the levels mentioned above they are permissible.
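
The rule above can be sketched in a few lines of Python. This is only an illustration of the two thresholds stated in this note, not an Oracle tool: the function name and the 'forward'/'backward' labels are assumptions, and the drift values are assumed to have been read from the VKTM trace file by hand.

```python
# Thresholds taken from the note above: drifts of less than 1 second forward
# and less than 5 seconds backward are treated as permissible.
FORWARD_LIMIT_SEC = 1.0
BACKWARD_LIMIT_SEC = 5.0


def drift_needs_analysis(drift_seconds, direction):
    """Return True when a drift falls outside the permissible range.

    direction is 'forward' or 'backward' (hypothetical labels chosen for
    this sketch, not something VKTM itself prints in this form).
    """
    if direction == "forward":
        limit = FORWARD_LIMIT_SEC
    elif direction == "backward":
        limit = BACKWARD_LIMIT_SEC
    else:
        raise ValueError("direction must be 'forward' or 'backward'")
    # Only drifts strictly below the limit are considered permissible.
    return abs(drift_seconds) >= limit


# Example: a 0.4 s forward drift is fine, a 6 s backward drift is not.
print(drift_needs_analysis(0.4, "forward"))
print(drift_needs_analysis(6.0, "backward"))
```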

Please upload the VKTM trace files for analysis.

You can also set the event below to suppress these warnings in the alert log:
==========================================
Event 10795 is not set in the database.
Event 10795 suppresses the VKTM warnings in the alert log file, so the event needs to be set.

$ sqlplus / as sysdba
alter system set event='10795 trace name context forever, level 2' scope=spfile;
shut immediate
startup
oradebug setmypid
oradebug eventdump system

The last command should show you the 10795 event set in your system.

Exadata ibcheckerrors Report

[root@drawdbadm02 ~]# ibcheckerrors

src/query_smp.c:196; umad (DR path slid 0; dlid 0; 0,1,13,29,36 Attr 0x11:0) bad status 110; Connection timed out
#warn: counter SymbolErrorCounter = 65532       (threshold 10) lid 19 port 255
#warn: counter LinkErrorRecoveryCounter = 16    (threshold 10) lid 19 port 255
#warn: counter PortRcvErrors = 260      (threshold 10) lid 19 port 255
#warn: counter PortXmitDiscards = 267   (threshold 100) lid 19 port 255
Error check on lid 19 (Infiniscale-IV Mellanox Technologies) port all:  FAILED
#warn: counter PortXmitDiscards = 204   (threshold 100) lid 19 port 7
Error check on lid 19 (Infiniscale-IV Mellanox Technologies) port 7:  FAILED
#warn: counter LinkErrorRecoveryCounter = 14    (threshold 10) lid 19 port 20
#warn: counter PortRcvErrors = 260      (threshold 10) lid 19 port 20
Error check on lid 19 (Infiniscale-IV Mellanox Technologies) port 20:  FAILED
#warn: counter PortRcvSwitchRelayErrors = 300   (threshold 100) lid 1 port 255
Error check on lid 1 (SUN DCS 36P QDR drawsw-ibb0 10.100.25.22) port all:  FAILED
#warn: counter PortRcvSwitchRelayErrors = 230   (threshold 100) lid 1 port 10
Error check on lid 1 (SUN DCS 36P QDR drawsw-ibb0 10.100.25.22) port 10:  FAILED
#warn: counter LinkDownedCounter = 10   (threshold 10) lid 2 port 255
#warn: counter PortRcvSwitchRelayErrors = 749   (threshold 100) lid 2 port 255
#warn: counter PortXmitDiscards = 290   (threshold 100) lid 2 port 255
Error check on lid 2 (SUN DCS 36P QDR drawsw-iba0 10.100.25.21) port all:  FAILED
#warn: counter PortRcvSwitchRelayErrors = 108   (threshold 100) lid 2 port 1
Error check on lid 2 (SUN DCS 36P QDR drawsw-iba0 10.100.25.21) port 1:  FAILED
#warn: counter PortXmitDiscards = 180   (threshold 100) lid 2 port 2
Error check on lid 2 (SUN DCS 36P QDR drawsw-iba0 10.100.25.21) port 2:  FAILED
#warn: counter PortRcvSwitchRelayErrors = 274   (threshold 100) lid 2 port 7
Error check on lid 2 (SUN DCS 36P QDR drawsw-iba0 10.100.25.21) port 7:  FAILED
#warn: counter PortRcvSwitchRelayErrors = 168   (threshold 100) lid 2 port 10
Error check on lid 2 (SUN DCS 36P QDR drawsw-iba0 10.100.25.21) port 10:  FAILED

## Summary: 11 nodes checked, 0 bad nodes found
##          46 ports checked, 7 ports have errors beyond threshold
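
A report like this is easier to read when the "#warn:" lines are grouped per switch port. The Python sketch below is one possible way to do that; it is not part of the InfiniBand tools. The regular expression is written against the line layout shown above, and the function name is made up for illustration. Port 255 rows are skipped on the assumption that they are the "port all" aggregate, since the output above pairs each port 255 warning with a "port all" check.

```python
import re
from collections import defaultdict

# Matches lines such as:
# #warn: counter PortXmitDiscards = 204   (threshold 100) lid 19 port 7
WARN_RE = re.compile(
    r"#warn: counter (\w+) = (\d+)\s+\(threshold (\d+)\) lid (\d+) port (\d+)"
)


def summarize_warnings(report_text):
    """Group '#warn:' counters by (lid, port).

    Rows for port 255 are skipped, assuming they are the per-switch
    aggregate that ibcheckerrors reports as 'port all'.
    """
    summary = defaultdict(dict)
    for line in report_text.splitlines():
        match = WARN_RE.search(line)
        if not match:
            continue
        counter, value, threshold, lid, port = match.groups()
        if int(port) == 255:
            continue
        summary[(int(lid), int(port))][counter] = (int(value), int(threshold))
    return dict(summary)


# A few lines copied from the report above.
sample = """\
#warn: counter PortXmitDiscards = 290   (threshold 100) lid 2 port 255
#warn: counter PortRcvSwitchRelayErrors = 108   (threshold 100) lid 2 port 1
#warn: counter PortXmitDiscards = 180   (threshold 100) lid 2 port 2
"""

for (lid, port), counters in sorted(summarize_warnings(sample).items()):
    print(lid, port, counters)
```

Run against the full report above, this grouping should give seven (lid, port) entries, which agrees with the "7 ports have errors beyond threshold" summary line.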

How To Collect Oracle Exadata Sundiag

[root@drawceladm01 ~]# /opt/oracle.SupportTools/sundiag.sh

Oracle Exadata Database Machine - Diagnostics Collection Tool

Last alert date is beyond 7 days. Skipping OSW/Metrics collection
Gathering Linux information

src/query_smp.c:196; umad (DR path slid 0; dlid 0; 0,1,13,29,36 Attr 0x11:0) bad status 110; Connection timed out
src/query_smp.c:196; umad (DR path slid 0; dlid 0; 0,1,13,29,36 Attr 0x11:0) bad status 110; Connection timed out
src/query_smp.c:196; umad (DR path slid 0; dlid 0; 0,1,13,29,36 Attr 0x11:0) bad status 110; Connection timed out
Skipping collection of OSWatcher/ExaWatcher logs, Cell Metrics and Traces
Skipping ILOM collection. Use the ilom or snapshot options, or login to ILOM
over the network and run Snapshot separately if necessary.

/var/log/exadatatmp/sundiag_drawceladm01_1320FM501H_2016_08_07_14_55
Gathering Cell information

Generating diagnostics tarball and removing temp directory

==============================================================================
Done. The report files are bzip2 compressed in /var/log/exadatatmp/sundiag_drawceladm01_1320FM501H_2016_08_07_14_55.tar.bz2
==============================================================================
Copy the report tarball from the cell to your workstation with FileZilla.
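
Once the tarball has been copied off the cell, it can be useful to check what was collected before uploading it. The sketch below uses Python's standard tarfile module to list the members of a bzip2-compressed archive. The function name is hypothetical, and the path in the usage comment is simply the one printed by sundiag.sh above.

```python
import tarfile


def list_sundiag_members(archive_path):
    """Return the member names of a bzip2-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        return archive.getnames()


# Usage, assuming the tarball from the run above is still in place:
# for name in list_sundiag_members(
#     "/var/log/exadatatmp/sundiag_drawceladm01_1320FM501H_2016_08_07_14_55.tar.bz2"
# ):
#     print(name)
```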

How To Set Oracle SYSMAN User Default Password

SYSMAN is the Enterprise Manager (OMS) administrator account. Depending on the release it may exist only as a console user and not appear in DBA_USERS; in the 11.2.0.4 database shown below it does exist as a database user.
Whenever the console asks for the password for the first time, enter the default password OEM_TEMP. It then prompts for a new password.
Change it and use the new password for subsequent logins.

It is very simple: first run the query below to check for the SYSMAN user, then connect as SYSMAN and enter your desired password.

[oracle@awback01 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Sun Aug 7 13:15:21 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select username
  2  from dba_users
  3  where username like 'SYSM%'
  4  order by 1;

USERNAME
------------------------------
SYSMAN

SQL> conn sysman
Enter password:
Connected.
SQL> show user
USER is "SYSMAN"



Monday, August 1, 2016

PCI Link error has been detected on a PCI card

The ILOM overall status of an Exadata database server reports: "A PCI Link error has been detected on a PCI card".

 Solution:
Connect to the ILOM with PuTTY (IP 10.100.*.*):
login as: root
Using keyboard-interactive authentication.
Password:*****
Oracle(R) Integrated Lights Out Manager
Version 3.2.4.76 r108980
Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Hostname: hostname-ilom

-> set /SYS/MB clear_fault_action=true
Are you sure you want to clear /SYS/MB (y/n)? y
Set 'clear_fault_action' to 'true'
-> set /SYS/MB/P0 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0 (y/n)? y
Set 'clear_fault_action' to 'true'

-> set /SYS/MB/RISER2/PCIE2 clear_fault_action=true
Are you sure you want to clear /SYS/MB/RISER2/PCIE2 (y/n)? y
Set 'clear_fault_action' to 'true'
->
Refresh the ILOM page; the overall status should now show OK.

[Image: Exadata ILOM overall status]