DB Version:11.2.0.4+RAC
OS Version:Oracle Linux Server 6.7
在我的一个两节点RAC数据库上,alert日志中平均每3秒左右就会产生一条连接超时错误,如下:
- ***********************************************************************
- Fatal NI connect error 12170.
- VERSION INFORMATION:
- TNS for Linux: Version 11.2.0.4.0 - Production
- Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
- TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
- Time: 29-APR-2016 15:29:47
- Tracing not turned on.
- Tns error struct:
- ns main err code: 12535
-
- TNS-12535: TNS:operation timed out
- ns secondary err code: 12606
- nt main err code: 0
- nt secondary err code: 0
- nt OS err code: 0
- Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.xxx.xx.xx)(PORT=39766))
- WARNING: inbound connection timed out (ORA-3136)
登录数据库之后,检查会话的等待事件,发现很多会话的event为library cache lock,并且发现由于library cache lock造成锁阻塞的会话也很多,如下:
- SYS@xxxxxxx(dmxxxxx)> select sid,event,SECONDS_IN_WAIT from v$session where state='WAITING' and event not like 'SQL%' and event not like 'rdbms%' order by SECONDS_IN_WAIT;
-
- SID EVENT SECONDS_IN_WAIT
- ---------- ---------------------------------------- ---------------
- 921 PING 7
- 1382 library cache lock 10
- 2417 Streams AQ: qmn coordinator idle wait 21
- 692 library cache lock 35
- 349 library cache lock 35
- 2189 library cache lock 36
- 2419 library cache lock 46
- 235 library cache lock 46
- 2305 library cache lock 46
- 1613 library cache lock 51
- 1271 library cache lock 58
- 2303 library cache lock 74
- 1962 library cache lock 74
- 1732 library cache lock 79
- 468 library cache lock 95
- 926 library cache lock 95
- 6 library cache lock 106
- 812 library cache lock 108
- 348 library cache lock 118
- 1617 library cache lock 139
- 237 library cache lock 153
- 120 library cache lock 155
- 2534 library cache lock 166
- 469 library cache lock 171
- 2077 library cache lock 173
- 1501 library cache lock 173
- 696 library cache lock 173
- 583 library cache lock 173
- 2648 library cache lock 173
- 5 library cache lock 173
- 2647 library cache lock 173
- 2533 library cache lock 173
- 2422 library cache lock 173
- 1 smon timer 209
- 2532 Streams AQ: waiting for time management 1050393
- or cleanup tasks
-
- 461 VKTM Logical Idle Wait 2260216
-
- --锁阻塞查询结果
- INSTANCE SES SESS_SERIAL# BLOCKER_INSTANCE BLOCKER_SID BLOCKER_SESS_SERIAL#
---------- --------------- ------------ ---------------- ----------- --------------------
2 1729 22581
1 3 14009 2 1729 22581
1 4 9749 2 1729 22581
2 4 22677 2 1729 22581
1 119 6855 2 1729 22581
2 120 11049 2 1729 22581
1 237 1639 2 1729 22581
2 237 51763 2 1729 22581
1 349 5499 2 1729 22581
2 350 18479 2 1729 22581
1 351 6311 2 1729 22581
2 351 20871 2 1729 22581
1 464 21611 2 1729 22581
1 465 7039 2 1729 22581
1 466 8357 2 1729 22581
2 469 6163 2 1729 22581
1 580 7181 2 1729 22581
1 582 7545 2 1729 22581
2 583 15015 2 1729 22581
2 692 45275 2 1729 22581
1 694 6941 2 1729 22581
2 696 15463 2 1729 22581
1 810 28941 2 1729 22581
2 812 18891 2 1729 22581
2 926 8275 2 1729 22581
1 1155 5669 2 1729 22581
1 1270 5621 2 1729 22581
1 1385 9857 2 1729 22581
1 1501 3555 2 1729 22581
2 1501 21727 2 1729 22581
1 1617 1515 2 1729 22581
2 1617 14759 2 1729 22581
2 1728 43225 2 1729 22581
1 1846 3393 2 1729 22581
1 1961 5643 2 1729 22581
1 2077 4889 2 1729 22581
2 2077 20353 2 1729 22581
1 2191 5141 2 1729 22581
2 2303 21061 2 1729 22581
1 2306 5609 2 1729 22581
1 2419 5091 2 1729 22581
1 2421 7683 2 1729 22581
2 2422 17493 2 1729 22581
2 2533 15515 2 1729 22581
2 2534 19505 2 1729 22581
1 2536 6187 2 1729 22581
1 2647 25537 2 1729 22581
2 2647 41505 2 1729 22581
1 2649 7135 2 1729 22581
2 2650 21481 2 1729 22581
50 rows selected.
从AWR报告的Time Model Statistics部分可以看到,connection management call 消耗的时间占整个DB Time的99.95%,记得以前好象看到过由于11g的新特性密码错误可能造成大量library cache lock的文章,于是上MOS也查了查相关的文档,结合dba_audit_trail视图发现,从4月7日起到现在,平均每天约有九千多次的登录失败记录,那么是不是4月7日对PAMPAS_W用户更改密码了呢? 通过查询sys.user$也得到了验证,用户在4月7日9点31分更改过密码。到此问题查清了,解决方案也很简单,通知负责应用的工程师来改正密码。在数据库层面也可通过设置28401事件来屏蔽密码错误验证延迟功能。
alter system set event ="28401 TRACE NAME CONTEXT FOREVER, LEVEL 1" scope=spfile;
oracle采用的这种密码延时登录机制主要是为了防止口令猜测攻击,我们也可以通过profile,设置密码登录错误几次后被锁定的方式。不过不管怎样,都应该避免应用程序源源不断的向数据库发送无效的密码。 Time Model Statistics
- Total time in database user-calls (DB Time): 86942.6s
- Statistics including the word "background" measure background process time, and so do not contribute to the DB time statistic
- Ordered by % or DB time desc, Statistic name
Statistic Name | Time (s) | % of DB Time |
connection management call elapsed time | 86,899.87 | 99.95 |
DB CPU | 39.93 | 0.05 |
sql execute elapsed time | 26.45 | 0.03 |
parse time elapsed | 8.23 | 0.01 |
hard parse elapsed time | 7.58 | 0.01 |
PL/SQL execution elapsed time | 0.70 | 0.00 |
PL/SQL compilation elapsed time | 0.27 | 0.00 |
RMAN cpu time (backup/restore) | 0.17 | 0.00 |
failed parse elapsed time | 0.09 | 0.00 |
hard parse (sharing criteria) elapsed time | 0.07 | 0.00 |
inbound PL/SQL rpc elapsed time | 0.01 | 0.00 |
repeated bind elapsed time | 0.00 | 0.00 |
DB time | 86,942.64 | |
background elapsed time | 117.33 | |
background cpu time | 52.15 | |
- SYS@xxxndbx(dmxxxx2)> select os_username,USERNAME,userhost,to_char(TIMESTAMP,'yyyy-mm-dd') qq,count(*) from dba_audit_trail
- 2 where timestamp > '2016-01-01 12:00:00'
- 3 and returncode=1017 group by os_username,username,userhost,to_char(TIMESTAMP,'yyyy-mm-dd') order by 4;
-
- OS_USERNAME USERNAME USERHOST QQ COUNT(*)
- --------------- ------------------------------ -------------------- ---------- ----------
- qqpamp0 PAMPAS_R ltxshe0kqal2 2016-04-06 3
- qqpamp0 PAMPAS_W ltxshe0kqal2 2016-04-06 2
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-07 5495
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-08 9099
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-09 9105
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-10 9098
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-11 9070
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-12 9085
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-13 9088
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-14 9093
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-15 9117
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-16 9096
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-17 9099
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-18 9072
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-19 9121
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-20 9114
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-21 9118
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-22 9070
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-23 9112
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-24 9112
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-25 9091
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-26 9081
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-27 9105
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-28 9077
- qqpamp0 PAMPAS_W ltxshe0ipso2 2016-04-29 5306
-
- 26 rows selected.
-
SYS@xxxndbx(dmxxxx2)> select ptime , ctime from sys.user$ where name='PAMPAS_W';
PTIME CTIME
------------------- -------------------
2016-04-07 09:31:32 2015-02-09 14:01:28
Oracle 11g下要注意的几个关于密码方面问题:
1. 11g默认开始密码区分大小写,可以通过把参数设置为SEC_CASE_SENSITIVE_LOGON =FALSE 屏蔽
2. 11g密码默认有效期180天,可以通过修改ALTER PROFILE DEFAULT[根据实际的profile] LIMIT PASSWORD_LIFE_TIME UNLIMITED; 注意需要修改密码生效
3. 密码错误验证延迟,可以通过设置EVENT="28401 TRACE NAME CONTEXT FOREVER, LEVEL 1" 屏蔽
MOS上的相关文档:
High 'library cache lock' Wait Time Due to Invalid Login Attempts(1309738.1) BUG:11742803
- LOTS OF 'LIBRARY CACHE LOCK' DURING USER LOGON AUTHENTICATION NOTE:7715339.8
- Bug 7715339 - Logon failures causes "row cache lock" waits - Allow disable of logon delay BUG:19867671
- LIBRARY CACHE LOCK CAUSED BY WRONG PASSWORD LOGIN High Waits for 'library cache lock' and 'library cache: mutex' Contention with Application Client Connection Timeouts and Associated ORA-03136 Errors
(1620409.1)