使用oradebug dump hanganalyze分析oracle hang系列一
背景
oracle db有可能因为诸多因系而hang,可以借助某些工具进行诊断分析,本文主要学习如何用不用oradebug dump hanganalyze 3进行分析产生的TRACE FILE.
以及所包含的内容结构。
结论
1,oradebug hanganalyze 3产生的TRACE FILE包括3部分内容
2,各部分内容如下:
第一部分
Chains most likely to have caused the hang:
[a] Chain 1 Signature: <='latch: cache buffers chains'
Chain 1 Signature Hash: 0xccebf225
[b] Chain 2 Signature: <='buffer busy waits'
Chain 2 Signature Hash: 0x9a13abed
[c] Chain 3 Signature: <='latch: cache buffers chains'
Chain 3 Signature Hash: 0xccebf225
第二部分(当然这里面又分为intersecting chains或者non-intersecting chains)
===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 10574
process id: 10, oracle@seconary (DBW0)
session id: 10
session serial #: 1
}
is waiting for 'latch: cache buffers chains' with wait info:
{
p1: 'address'=0xda4f8850
p2: 'number'=0x96
p3: 'tries'=0x0
time in wait: 0.012628 sec
timeout after: never
wait id: 50243
blocking: 0 sessions
current sql:
short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semop()+7<-skgpwwait()+156<-kslgess()+1799<-ksl_get_shared_latch()+620<-kcbbic1()+247<-kcbbiop()+868<-kcbbdrv()+566<-ksbabs()+465<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
wait history:
* time between current wait and wait #1: 0.000058 sec
1. event: 'db file async I/O submit'
time waited: 0.328988 sec
wait id: 50242 p1: 'requests'=0x1
p2: 'interrupt'=0x0
p3: 'timeout'=0x0
* time between wait #1 and #2: 0.000076 sec
2. event: 'db file async I/O submit'
time waited: 0.000748 sec
wait id: 50241 p1: 'requests'=0x1
p2: 'interrupt'=0x0
p3: 'timeout'=0x0
* time between wait #2 and #3: 0.000086 sec
3. event: 'db file async I/O submit'
time waited: 0.000177 sec
wait id: 50240 p1: 'requests'=0x1
p2: 'interrupt'=0x0
p3: 'timeout'=0x0
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 6664
process id: 91, oracle@seconary (J063)
session id: 32
session serial #: 9465
}
which is not in a wait:
{
last wait: 0.000000 sec ago
blocking: 6 sessions
current sql:
short stack:
}
Chain 1 Signature: <='latch: cache buffers chains'
Chain 1 Signature Hash: 0xccebf225
第三部分
State of LOCAL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[9]/1/10/1/0xdcba9740/10574/NLEAF/[31]
[11]/1/12/1/0xdcba3a60/10578/SINGLE_NODE/
中间类似内容略
[346]/1/347/9522/0xdc7d75d0/6413/NLEAF/[118]
[348]/1/349/7696/0xdc7d18f0/6433/NLEAF/[204]
关于各列含义见下:
根据其每列含义
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
数据库实例编号 会话sid 会话serial# 会话的地址saddr 会话对应的操作系统ID 表明会话是否等待 如有值,表明是持锁会话的CHAIN编号,否则为空
[163] / 1 / 164 / 10419 / 0xdc9ea7e0 / 4046 / NLEAF / [301]
3,non-intersecting chains的特色就是所包含的会话的阻塞会话,不隶属于任何chain
而intersecting chains包含的会话的阻塞会话隶属于另一个chains中的持锁会话,也就是说chains包含在另一个chains中
4,chains中包括每个会话以及阻塞会话的详细信息,包括SID,PID,SPID以及当前运行的SQL
所调用堆栈;以及近期的历史等待会话列表
5,关于 ([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]) 中的state列的各值含义如下,暂不全面(仍需要进一步测试)
NLEAF表明是等待会话
SINGLE_NODE表明它是等待任何会话或资源
LEAF表明它是持锁会话,即它不等待任何会话或资源
LEAF_NEW它也是持锁会话
测试
1,数据库版本
SQL> select * from v$version where rownum=1;
BANNER
----------------------------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
2,数据库未有任何HANG时,产生hanganalyze
SQL> oradebug setmypid
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /oracle/diag/rdbms/guowang/guowang/trace/guowang_ora_13018.trc
*** 2015-11-02 07:02:02.135
===============================================================================
HANG ANALYSIS:
instances (db_name.oracle_sid): guowang.guowang
oradebug_node_dump_level: 3 --trace level级别
analysis initiated by oradebug --标明hanganalyze由oradebug生成
===============================================================================
Chains most likely to have caused the hang:--导致HANG的原因
===============================================================================
No chains found.
===============================================================================
Extra information that will be dumped at higher levels:
State of ALL nodes --全部节点的状态(我估计适用于RAC)
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]): --每个列的含义
*** 2015-11-02 07:02:02.163
===============================================================================
END OF HANG ANALYSIS
===============================================================================
*** 2015-11-02 07:02:02.164
===============================================================================
HANG ANALYSIS DUMPS:
oradebug_node_dump_level: 3
===============================================================================
State of LOCAL nodes --本地节点的状态
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):--同state of all nodes
No processes qualify for dumping.
===============================================================================
HANG ANALYSIS DUMPS: END
===============================================================================
4,手工模拟产生锁等待,产生hanganalyze
测试会话1
SQL> select sid,serial# from v$session where sid=(select sid from v$mystat where rownum=1);
SID SERIAL#
---------- ----------
302 9226
SQL> select pid,spid from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));
PID SPID
---------- ------------------------------------------------
165 3812
SQL> create table t_hang(a int,b int);
Table created.
SQL> insert into t_hang values(1,1);
1 row created.
SQL> commit;
Commit complete.
测试会话2
SQL> select sid,serial# from v$session where sid=(select sid from v$mystat where rownum=1);
SID SERIAL#
---------- ----------
164 10419
SQL> select pid,spid from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));
PID SPID
---------- ------------------------------------------------
184 4046
hang住
SQL> update t_hang set a=2 where a=1;
SQL> oradebug setmypid
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /oracle/diag/rdbms/guowang/guowang/trace/guowang_ora_6813.trc
*** 2015-11-02 07:16:19.252
===============================================================================
HANG ANALYSIS:
instances (db_name.oracle_sid): guowang.guowang
oradebug_node_dump_level: 3
analysis initiated by oradebug
===============================================================================
Chains most likely to have caused the hang:
[a] Chain 1 Signature: <='latch: cache buffers chains'
Chain 1 Signature Hash: 0xccebf225
[b] Chain 2 Signature: <='buffer busy waits'
Chain 2 Signature Hash: 0x9a13abed
[c] Chain 3 Signature: <='latch: cache buffers chains'
Chain 3 Signature Hash: 0xccebf225
--经在TRACE FILE过滤等待会话的sid,即164,以后可以直接查session id:164或者对应的进程号process id:184,或者对应的操作系统进程os id:4096
发现它是在一个chain中,这个chain中清晰标明了,等待会话到底在等待什么资源,以及是哪个会话阻塞它,并且会列出阻塞会话的详细 信息
Chain 10:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 4046
process id: 184, oracle@seconary (TNS V1-V3)
session id: 164
session serial #: 10419
}
is waiting for 'enq: TX - row lock contention' with wait info:
{
p1: 'name|mode'=0x54580006
p2: 'usn<<16 | slot'=0xca001d
p3: 'sequence'=0x2842
time in wait: 1 min 23 sec
timeout after: never
wait id: 33
blocking: 0 sessions
current sql: update t_hang set a=2 where a=1 -会列出当前等待会话正在执行的SQL以及调用堆栈
short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10<-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-ksqcmi()+5641<-ksqgtlctx()+3401<-ksqgelctx()+552<-ktcwit1()+330<-kdddgb()+7099<-kdusru()+490<-updrowFastPath()+1075<-qerupFetch()+2171<-updaul()+1179<-updThreePhaseExe()+328<-updexe()+663<-opiexe()+14377<-kpoal8()+2288<-opiodr()+1149<-ttcpip()+1251<-opitsk()+1633<-opiino()+958<-opiodr()+1149<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+214<-main()+201<-__lib
wait history: --等待会话近期的等待事件历史列表,且基于时间反向排序
* time between current wait and wait #1: 0.043281 sec
1. event: 'Disk file operations I/O'
time waited: 0.000090 sec
wait id: 32 p1: 'FileOperation'=0x2
p2: 'fileno'=0x4
p3: 'filetype'=0x2
* time between wait #1 and #2: 0.000764 sec
2. event: 'SQL*Net message from client'
time waited: 1 min 0 sec
wait id: 31 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000007 sec
3. event: 'SQL*Net message to client'
time waited: 0.000003 sec
wait id: 30 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
}
and is blocked by --标明被哪个会话所阻塞
=> Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 3812
process id: 165, oracle@seconary (TNS V1-V3)
session id: 302
session serial #: 9226
}
which is waiting for 'SQL*Net message from client' with wait info: -同上理,也会列出持锁会话近期的历史等待事件列
{
p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
time in wait: 2 min 49 sec
timeout after: never
wait id: 42
blocking: 1 session
current sql:
short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-read()+14<-ntpfprd()+115<-nsbasic_brc()+338<-nsbrecv()+69<-nioqrc()+485<-__PGOSF24_opikndf2()+978<-opitsk()+824<-opiino()+958<-opiodr()+1149<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
wait history:
* time between current wait and wait #1: 0.000015 sec
1. event: 'SQL*Net message to client'
time waited: 0.000005 sec
wait id: 41 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.338132 sec
2. event: 'SQL*Net message from client'
time waited: 24.359100 sec
wait id: 40 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000013 sec
3. event: 'SQL*Net message to client'
time waited: 0.000005 sec
wait id: 39 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
}
经后查,hanganalyze会列出所有ORACLE进程的信息
并且chain又分为intersecting chains和non-intersecting chains
我们来分析下intersecting chains和non-intersecting chains的区别是什么,先看
non-intersecting chains
经过对比分析,non-intersecting chains的特色就是所包含的会话的阻塞会话,不隶属于任何chain
Chain 2:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 6728
process id: 146, oracle@seconary (J117)
session id: 29
session serial #: 10069
}
is waiting for 'buffer busy waits' with wait info:
{
p1: 'file#'=0x1
p2: 'block#'=0x15ab9
p3: 'class#'=0x1
time in wait: 0.012849 sec
timeout after: never
wait id: 31
blocking: 0 sessions
current sql:
short stack:
wait history:
* time between current wait and wait #1: 0.000028 sec
1. event: 'enq: TX - index contention'
time waited: 0.016712 sec
wait id: 30 p1: 'name|mode'=0x54580004
p2: 'usn<<16 | slot'=0x12a0009
p3: 'sequence'=0x1953
* time between wait #1 and #2: 0.000066 sec
2. event: 'buffer busy waits'
time waited: 0.552796 sec
wait id: 29 p1: 'file#'=0x1
p2: 'block#'=0x15ab9
p3: 'class#'=0x1
* time between wait #2 and #3: 0.000340 sec
3. event: 'buffer busy waits'
time waited: 0.497199 sec
wait id: 28 p1: 'file#'=0x1
p2: 'block#'=0x225eb
p3: 'class#'=0x1
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 6439 --也就是说持锁会话不隶属于chain
process id: 111, oracle@seconary (J083)
session id: 205
session serial #: 9696
再来看看intersecting chains
发现intersecting chains包含的会话的阻塞会话隶属于另一个chains中的持锁会话,也就是说chains包含在另一个chains中
-------------------------------------------------------------------------------
Chain 12:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 6612
process id: 59, oracle@seconary (J031)
session id: 33
session serial #: 11792
}
is waiting for 'buffer busy waits' with wait info:
{
p1: 'file#'=0x1
p2: 'block#'=0x15ab9
p3: 'class#'=0x1
time in wait: 0.007726 sec
timeout after: never
wait id: 0
blocking: 0 sessions
current sql:
short stack:
}
and is blocked by 'instance: 1, os id: 6439, session id: 205',
which is a member of 'Chain 2'.
Chain 12 Signature: <='buffer busy waits'
Chain 12 Signature Hash: 0x9a13abed
最后学习下state of all nodes(注:如果是单实例数据库,其state of all nodes和state of local nodes内容相同)
先摘录一节内容
State of ALL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[9]/1/10/1/0xdcba9740/10574/NLEAF/[31]
[11]/1/12/1/0xdcba3a60/10578/SINGLE_NODE/
[28]/1/29/10069/0xdcb724f0/6728/NLEAF/[204]
[31]/1/32/9465/0xdcb699a0/6664/LEAF_NW/
[32]/1/33/11792/0xdcb66b30/6612/NLEAF/[204]
[33]/1/34/3/0xdcb63cc0/10707/SINGLE_NODE/
[38]/1/39/9453/0xdcb55490/6754/NLEAF/[204]
其实每个列的含义都有注明,我们依次来分析,我采用的方法仍是基于我上述构建的测试2个会话即持锁会话以及等待锁会话,在TRACE FILE查等待会话164,匹配结果如下:
[163]/1/164/10419/0xdc9ea7e0/4046/NLEAF/[301]
根据其每列含义
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
我们来分解下:
SQL> select saddr from v$session where sid=164;
SADDR
----------------
00000000DC9EA7E0
数据库实例编号 会话sid 会话serial# 会话的地址saddr 会话对应的操作系统ID 表明会话是否等待 如有值,表明是持锁会话的CHAIN编号,否则为空
[163] / 1 / 164 / 10419 / 0xdc9ea7e0 / 4046 / NLEAF / [301]
但是163是什么含义呢,以及301是何义呢,会不会是chains的编号呢,还有NLEAF是何义,另急,我们试着分析下,我试着查301,定位到如下内容
[301]/1/302/9226/0xdc859f80/3812/LEAF/
经过对比分析,301就是持锁会话所在的chains的编号,所以可知NLEAF表明是等待锁的会话,而LEAF就是不等待锁的会话
还不不够,因为state列发现有几个不同的值:NLEAF,SINGLE_NODE,LEAF_NEW,继续分析
以chain 9为例
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[9]/1/10/1/0xdcba9740/10574/NLEAF/[31]
从adjlist 31定位到chain 31
[31]/1/32/9465/0xdcb699a0/6664/LEAF_NW/
先看chain 9
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 10574
process id: 10, oracle@seconary (DBW0)
session id: 10
session serial #: 1
}
is waiting for 'latch: cache buffers chains' with wait info:
{
p1: 'address'=0xda4f8850
p2: 'number'=0x96
p3: 'tries'=0x0
time in wait: 0.012628 sec
timeout after: never
wait id: 50243
blocking: 0 sessions
current sql:
short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semop()+7<-skgpwwait()+156<-kslgess()+1799<-ksl_get_shared_latch()+620<-kcbbic1()+247<-kcbbiop()+868<-kcbbdrv()+566<-ksbabs()+465<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
wait history:
* time between current wait and wait #1: 0.000058 sec
1. event: 'db file async I/O submit'
time waited: 0.328988 sec
wait id: 50242 p1: 'requests'=0x1
p2: 'interrupt'=0x0
p3: 'timeout'=0x0
* time between wait #1 and #2: 0.000076 sec
2. event: 'db file async I/O submit'
time waited: 0.000748 sec
wait id: 50241 p1: 'requests'=0x1
p2: 'interrupt'=0x0
p3: 'timeout'=0x0
* time between wait #2 and #3: 0.000086 sec
3. event: 'db file async I/O submit'
time waited: 0.000177 sec
wait id: 50240 p1: 'requests'=0x1
p2: 'interrupt'=0x0
p3: 'timeout'=0x0
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 6664
process id: 91, oracle@seconary (J063)
session id: 32
session serial #: 9465
再看下SINGLE_NODE的含义是什么
[33]/1/34/3/0xdcb63cc0/10707/SINGLE_NODE/
可见SINGLE_NODE表明会话不被任何会话或进程所阻塞
Chain 146:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 10707
process id: 29, oracle@seconary (CJQ0)
session id: 34
session serial #: 3
}
is waiting for 'job scheduler coordinator slave wait' with wait info:
{
time in wait: 0.449655 sec
timeout after: 15.550345 sec
wait id: 81976
blocking: 0 sessions
current sql:
short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10<-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-kslwait()+141<-jscrs_select0()+6801<-jscrs_select()+595<-rpiswu2()+1541<-kkjcjexe()+713<-kkjssrh()+559<-ksbcti()+212<-ksbabs()+1732<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
wait history:
* time between current wait and wait #1: 0.000108 sec
1. event: 'rdbms ipc message'
time waited: 2.064533 sec
wait id: 81975 p1: 'timeout'=0xc8
* time between wait #1 and #2: 0.006081 sec
2. event: 'rdbms ipc message'
time waited: 0.501429 sec
wait id: 81974 p1: 'timeout'=0x32
* time between wait #2 and #3: 0.000706 sec
3. event: 'job scheduler coordinator slave wait'
time waited: 16.003771 sec
wait id: 81973
}
Chain 146 Signature: 'job scheduler coordinator slave wait'
Chain 146 Signature Hash: 0x8c8a9c97
总结:NLEAF表明是等待会话
SINGLE_NODE表明它是等待任何会话或资源
LEAF表明它是持锁会话,即它不等待任何会话或资源
LEAF_NEW它也是持锁会话
LEAF和LEAF_NEW的区别是什么?我们将在下文进行分析与测试。
正文到此结束