使用oradebug dump hanganalyze 分析oracle hang系列三
背景
db hang时,分析下oradebug dump hanganalyze 产生的trace file第一部分内容与未HANG的区别,进一步理解其内容构成。
结论
1,DB 未HANG时,第一部分显示如下:
Chains most likely to have caused the hang:
[a] Chain 1 Signature: <='latch: cache buffers chains'
Chain 1 Signature Hash: 0xccebf225
[b] Chain 2 Signature: <='buffer busy waits'
Chain 2 Signature Hash: 0x9a13abed
[c] Chain 3 Signature: <='latch: cache buffers chains'
Chain 3 Signature Hash: 0xccebf225
2,DB HAGN时,第一部分显示如下
Chains most likely to have caused the hang:
*** 2015-11-06 03:42:00.529
[a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
Chain 1 Signature Hash: 0xcfe404d3
[b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
Chain 2 Signature Hash: 0xcfe404d3
[c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
Chain 3 Signature Hash: 0xcfe404d3
3,可见二者区别在于未HANG有not in a wait的字样,而HANG却是具体的等待事件
4, 可以基于第一部分的 Chain 1 Signature Hash: 0xcfe404d3直接定位到发生HANG的chain
然后再查找 is blocked by,即可找到持锁会话,进行针对性分析即可
测试
1,数据库未HANG时产生的oracle dump hanganalyze 3的trace file第一部分
Chains most likely to have caused the hang:
[a] Chain 1 Signature: <='latch: cache buffers chains'
Chain 1 Signature Hash: 0xccebf225
[b] Chain 2 Signature: <='buffer busy waits'
Chain 2 Signature Hash: 0x9a13abed
[c] Chain 3 Signature: <='latch: cache buffers chains'
Chain 3 Signature Hash: 0xccebf225
2,模拟DB HANG,分析TRACE FILE第一部分
--oradebug session1
SQL> select pname,pid,spid,addr from v$process where pname in ('LGWR','PMON');
PNAME PID SPID ADDR
---------- ---------- ------------------------------------------------ ----------------
PMON 2 3856 00000000DD59FC40
LGWR 11 3876 00000000DD5A8E80
SQL> select pid,spid,addr from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));
PID SPID ADDR
---------- ------------------------------------------------ ----------------
19 3952 00000000DD5B1080
SQL> select sid from v$session where sid=(select sid from v$mystat where rownum=1);
SID
----------
17
--session 2
SQL> select sid from v$session where sid=(select sid from v$mystat where rownum=1);
SID
----------
183
SQL> select pid,spid,addr from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));
PID SPID ADDR
---------- ------------------------------------------------ ----------------
35 6845 00000000DD5C1480
SQL> create table t_db_hang(a int);
Table created.
--session1
SQL> oradebug setospid 3876
Oracle pid: 11, Unix process pid: 3876, image: oracle@seconary (LGWR)
SQL> oradebug suspend
Statement processed.
--session2
---insert hang
SQL> insert into t_db_hang values(1);
-- session 3
--可见session 3不能产生新的会话,数据库已经HANG住
[oracle@seconary ~]$ sqlplus scott/system
SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 6 03:37:28 2015
Copyright (c) 1982, 2009, Oracle. All rights reserved.
--session1
--生成一个oradebug dump hanganalyze 3
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump hanganalyze 3
Statement processed.
SQL> oradebug tracefile_name
/oracle/diag/rdbms/guowang/guowang/trace/guowang_ora_3952.trc
--分析trace file
第一部分
Chains most likely to have caused the hang:
*** 2015-11-06 03:42:00.529
[a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
Chain 1 Signature Hash: 0xcfe404d3
[b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
Chain 2 Signature Hash: 0xcfe404d3
[c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
Chain 3 Signature Hash: 0xcfe404d3
基于等待会话183查找,发现如下信息,可见等待会话11,如果发现DB HANG了,也可以查找is blocked by,即可找到持锁会话,进行针对性分析即可
-------------------------------------------------------------------------------
Chain 80:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 6845
process id: 35, oracle@seconary (TNS V1-V3)
session id: 183
session serial #: 183
}
is waiting for 'log file sync' with wait info: --发现其在等待LGWR,其实这个和上面ORADEBUG 模拟LGWR被阻塞也有关
{
p1: 'buffer#'=0x22f7
p2: 'sync scn'=0x4705d0e
time in wait: 1 min 43 sec (last interval)
time in wait: 5 min 14 sec (total)
timeout after: never
wait id: 241
blocking: 0 sessions
wait history:
* time between current wait and wait #1: 0.000000 sec
1. event: 'latch free'
time waited: 0.004440 sec
wait id: 242 p1: 'address'=0xdf0e9298
p2: 'number'=0x2
p3: 'tries'=0x0
* time between wait #1 and #2: 0.000000 sec
2. event: 'log file sync'
time waited: 3 min 30 sec
wait id: 241 p1: 'buffer#'=0x22f7
p2: 'sync scn'=0x4705d0e
* time between wait #2 and #3: 0.120403 sec
3. event: 'db file sequential read'
time waited: 0.014195 sec
wait id: 240 p1: 'file#'=0x1
p2: 'block#'=0xaad
p3: 'blocks'=0x1
}
and is blocked by 'instance: 1, os id: 3876, session id: 11',
which is a member of 'Chain 1'.
看下等待会话11是什么,可见会话11就是后台进程LGWR
and is blocked by
=> Oracle session identified by:
{
instance: 1 (guowang.guowang)
os id: 3876
process id: 11, oracle@seconary (LGWR)
session id: 11
session serial #: 1
--既然lgwr是阻塞会话,恢复它的功能即可
--session1
[oracle@seconary ~]$ sqlplus -prelim '/as sysdba'
SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 6 03:51:57 2015
Copyright (c) 1982, 2009, Oracle. All rights reserved.
SQL>
SQL> oradebug setospid 3876
Oracle pid: 11, Unix process pid: 3876, image: oracle@seconary (LGWR)
SQL> oradebug resume
Statement processed.
---session 2
--hang dml执行成功
SQL> insert into t_db_hang values(1);
1 row created.
--session 3也可以新生成会话
[oracle@seconary ~]$ sqlplus scott/system
SQL*Plus: Release 11.2.0.1.0 Production on Fri Nov 6 03:58:36 2015
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL>
再学习下另一个知识
Chains most likely to have caused the hang:
*** 2015-11-06 03:42:00.529
[a] Chain 1 Signature: 'rdbms ipc message'<='log file sync'
Chain 1 Signature Hash: 0xcfe404d3
[b] Chain 2 Signature: 'rdbms ipc message'<='log file sync'
Chain 2 Signature Hash: 0xcfe404d3
[c] Chain 3 Signature: 'rdbms ipc message'<='log file sync'
Chain 3 Signature Hash: 0xcfe404d3
经过分析,这就是发生HANG的相关chain信息,可以由具体的chain signature hash对应值,直接定们到具体的chain即可
正文到此结束