Conclusions
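1. If DML operations across the nodes are very frequent, a large number of GES messages are passed between the nodes.
2. ges messages sent can be used to gauge whether DML activity or inter-node communication on the RAC nodes is frequent. If its value is small, the RAC database is not busy.
3. The LMD process is responsible for sending GES messages to the remote RAC nodes.
4. Suspending LMD with oradebug suspend blocks INSERT operations on the remote RAC node, which shows that LMD is what manages global lock resources.
5. While LMD is hung, ges messages sent still increases slightly.
6. More generally, the statistics in v$sysstat are probably, put simply, a way to assess the performance of the various RAC background processes. They help determine whether those processes are working normally, and from there they support further diagnosis of RAC database performance.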
Test
SQL> select * from v$version where rownum=1;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
SQL> select statistic#,name,class from v$statname where lower(name) like '%ges%';
STATISTIC# NAME CLASS
---------- ---------------------------------------------------------------- ----------
22 messages sent 128
23 messages received 128
44 gcs messages sent 32
45 ges messages sent 32 --- the statistic studied here
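So ges messages sent belongs to class 32 (Real Application Clusters), which makes it a cluster-level statistic. The rows messages sent and messages received match the filter only because "messages" contains the substring "ges".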
V$STATNAME.CLASS (NUMBER): A number representing one or more statistics classes. The following class numbers are additive:
1 - User
2 - Redo
4 - Enqueue
8 - Cache
16 - OS
32 - Real Application Clusters
64 - SQL
128 - Debug
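Because the class numbers are additive bit values, a single CLASS value can stand for several classes at once. The following minimal Python sketch decodes such a value. It assumes only the eight class values listed above, and decode_class is a hypothetical helper name, not part of any Oracle tool.

```python
from typing import List

# Class bit values as listed for V$STATNAME.CLASS above (assumed to be the full set here).
STAT_CLASSES = {
    1: "User",
    2: "Redo",
    4: "Enqueue",
    8: "Cache",
    16: "OS",
    32: "Real Application Clusters",
    64: "SQL",
    128: "Debug",
}


def decode_class(value: int) -> List[str]:
    """Return the names of all classes whose bit is set in a CLASS value."""
    return [name for bit, name in sorted(STAT_CLASSES.items()) if value & bit]


if __name__ == "__main__":
    print(decode_class(32))   # class of "ges messages sent"
    print(decode_class(128))  # class of "messages sent"
    print(decode_class(40))   # illustrative combined value: 8 + 32
```

Run as a script, it prints ['Real Application Clusters'], then ['Debug'], then ['Cache', 'Real Application Clusters'].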
--node1
SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (45);
NAME VALUE
---------------------------------------------------------------- ----------
ges messages sent 44374
SQL> create table t_ges(a int,b int);
Table created.
SQL> insert into t_ges select level,level from dual connect by level<=1000000;
1000000 rows created.
SQL> commit;
Commit complete.
After this large DML operation, the statistic rises sharply:
SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (45);
NAME VALUE
---------------------------------------------------------------- ----------
ges messages sent 44925
Truncating the large table also raises the statistic, though by less than the INSERT did:
SQL> truncate table t_ges;
Table truncated.
SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (45);
NAME VALUE
---------------------------------------------------------------- ----------
ges messages sent 45104
--node2
SQL> create table t_ges2(a int,b int);
Table created.
DML on the other node also raises the statistic on node1:
SQL> insert into t_ges2 select level,level from dual connect by level<=1000000;
1000000 rows created.
SQL> commit;
Commit complete.
--node1
SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (45);
NAME VALUE
---------------------------------------------------------------- ----------
ges messages sent 45373
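V$SYSSTAT values are cumulative since instance startup, so what matters is the difference between consecutive readings. The following small Python sketch computes those differences from the node1 readings recorded above. step_deltas is a hypothetical helper. The readings were taken by hand at irregular times, so each difference also includes any other GES traffic that node1 sent in between.

```python
# Values of "ges messages sent" read on node1, copied from the session above.
SNAPSHOTS = [
    ("baseline", 44374),
    ("after 1,000,000-row insert on node1", 44925),
    ("after truncate on node1", 45104),
    ("after 1,000,000-row insert on node2", 45373),
]


def step_deltas(snapshots):
    """Pair each snapshot label with the increase since the previous snapshot."""
    return [
        (label, value - prev_value)
        for (_, prev_value), (label, value) in zip(snapshots, snapshots[1:])
    ]


if __name__ == "__main__":
    for label, delta in step_deltas(SNAPSHOTS):
        print(f"{label}: +{delta}")
```

It reports +551 for the node1 insert, +179 for the truncate and +269 for the node2 insert. These figures match the observation that the truncate adds less than the insert.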
If LMD is suspended, does the statistic still change, and how does it behave? First, find the LMD0 process:
SQL> select addr,program,username,pid,spid from v$process where username='oracle' and pid=6;
ADDR PROGRAM USERNAME PID SPID
---------------- ------------------------------------------------ --------------- ---------- ------------
0000000083A585C8 oracle@jingfa1 (LMD0) oracle 6 15271
SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (45);
NAME VALUE
---------------------------------------------------------------- ----------
ges messages sent 45430
SQL> oradebug setospid 15271
Oracle pid: 6, Unix process pid: 15271, image: oracle@jingfa1 (LMD0)
SQL> oradebug suspend
Statement processed.
With LMD hung, the statistic increases only slightly:
SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (45);
NAME VALUE
---------------------------------------------------------------- ----------
ges messages sent 45678
SQL> oradebug resume
Statement processed.
SQL> select v$statname.name,v$sysstat.value from v$sysstat,v$statname where v$sysstat.statistic#=v$statname.statistic# and v$statname.statistic# in (45);
NAME VALUE
---------------------------------------------------------------- ----------
ges messages sent 45711
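--If LMD does not recover within a certain time limit, an IPC send timeout is raised, and the surviving node then evicts the instance whose LMD is hung from the cluster (instance 1 in the node2 alert log below)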
[oracle@jingfa2 bdump]$ tail -f alert_jingfa2.log
IPC Send timeout to 0.0 inc 8 for msg type 29 from opid 22
Wed Nov 11 03:29:07 2015
Communications reconfiguration: instance_number 1
Wed Nov 11 03:29:07 2015
Trace dumping is performing id=[cdmp_20151111032907]
Wed Nov 11 03:29:11 2015
IPC Send timeout detected.Sender: ospid 18114
Receiver: inst 1 binc 433078410 ospid 15271
Wed Nov 11 03:29:13 2015
IPC Send timeout to 0.0 inc 8 for msg type 12 from opid 18
Wed Nov 11 03:30:59 2015
Evicting instance 1 from cluster
Wed Nov 11 03:31:01 2015
Trace dumping is performing id=[cdmp_20151111033042]
Wed Nov 11 03:31:06 2015
Reconfiguration started (old inc 8, new inc 12)
List of nodes:
1
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Nov 11 03:31:06 2015
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
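When checking whether an LMD problem led to an eviction, it can help to pull the relevant lines out of the alert log. The following rough Python sketch does this. It assumes the message formats seen in the excerpt above, and other Oracle versions may word these messages differently. scan_alert_log is a hypothetical name, and the sample text is copied from the jingfa2 alert log above.

```python
import re

# Patterns based only on the alert log lines shown above; other releases may differ.
IPC_TIMEOUT = re.compile(r"IPC Send timeout")
EVICTION = re.compile(r"Evicting instance (\d+) from cluster")
# Timestamp lines in this excerpt look like "Wed Nov 11 03:29:07 2015".
TIMESTAMP = re.compile(r"^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")

# Part of the jingfa2 alert log excerpt above.
SAMPLE = """\
IPC Send timeout to 0.0 inc 8 for msg type 29 from opid 22
Wed Nov 11 03:29:07 2015
Communications reconfiguration: instance_number 1
Wed Nov 11 03:29:11 2015
IPC Send timeout detected.Sender: ospid 18114
Receiver: inst 1 binc 433078410 ospid 15271
Wed Nov 11 03:29:13 2015
IPC Send timeout to 0.0 inc 8 for msg type 12 from opid 18
Wed Nov 11 03:30:59 2015
Evicting instance 1 from cluster
"""


def scan_alert_log(lines):
    """Return (timestamp, event) pairs for IPC send timeouts and evictions.

    Each event carries the most recent timestamp line seen before it,
    or None if no timestamp line has been seen yet.
    """
    events = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            current_ts = line
            continue
        evicted = EVICTION.search(line)
        if evicted:
            events.append((current_ts, f"eviction of instance {evicted.group(1)}"))
        elif IPC_TIMEOUT.search(line):
            events.append((current_ts, "IPC send timeout"))
    return events


if __name__ == "__main__":
    for ts, event in scan_alert_log(SAMPLE.splitlines()):
        print(ts, event)
```

In practice, the lines would come from the alert log file itself, for example by iterating over the opened file. The first timeout in the sample is reported with None because its timestamp line lies before the excerpt.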