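Why an 11g Standby Database Could Not Enable ADG
Today I ran into a somewhat odd problem, although every odd symptom has a real cause behind it.
While checking an environment this afternoon, I found the standby database sitting in MOUNT state. This is an 11gR2 database, and not using ADG (Active Data Guard) on it is a real waste.
So I tried to open it, and found that the open mode stayed at READ ONLY, whereas with ADG it should be READ ONLY WITH APPLY.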
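For reference, this is roughly the manual sequence normally used to bring an 11g physical standby from MOUNT into Active Data Guard. It is a minimal sketch, assuming a SYSDBA session in SQL*Plus on the standby and that the broker is not managing the apply state at the same time; it is not a transcript from this environment.

```sql
-- Stop Redo Apply first if it is running (this returns an error if it is not active).
alter database recover managed standby database cancel;

-- Open the mounted physical standby read only.
alter database open read only;

-- Restart Redo Apply in the background with real-time apply.
alter database recover managed standby database using current logfile disconnect from session;

-- With Active Data Guard working, OPEN_MODE reports READ ONLY WITH APPLY.
select open_mode, database_role from v$database;
```

If the last query still reports plain READ ONLY, Redo Apply is not running on the open database, which is the symptom described here.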
I then used the DG broker to set the standby state to READ-ONLY. The broker log on the standby shows the following:
Standby Database: stestdb3, Enabled Physical Standby (0x02010000)
08/14/2014 16:03:28
version check on database stestdb3 detected stale metadata,
requesting update from primary database
Creating process RSM0
12/29/2015 16:28:11
Command EDIT DATABASE stestdb3 SET STATE = READ-ONLY completed
Read-Only state no longer supported
12/29/2015 16:29:10
Nothing obvious there. Checking the configuration through the DG broker, however, reported the following error.
DGMGRL> show configuration;
Configuration - testdb
Protection Mode: MaxPerformance
Databases:
testdbbak93 - Primary database
stestdb3 - Physical standby database
Error: ORA-16766: Redo Apply is stopped
Fast-Start Failover: DISABLED
Configuration Status:
ERROR
The standby's database alert log shows the following:
Data Guard Broker initializing...
Data Guard Broker initialization complete
Tue Dec 29 16:47:15 2015
SMON: enabling cache recovery
No Resource Manager plan active
Physical standby database opened for read only access.
Completed: alter database open
Tue Dec 29 16:47:16 2015
idle dispatcher 'D000' terminated, pid = (18, 1)
Tue Dec 29 16:51:40 2015
Primary database is in MAXIMUM PERFORMANCE mode
RFS[3]: Assigned to RFS process 3596
RFS[3]: Selected log 7 for thread 1 sequence 72606 dbid -1549369665 branch 746558785
Tue Dec 29 16:51:41 2015
RFS[4]: Assigned to RFS process 3590
RFS[4]: Selected log 8 for thread 1 sequence 72605 dbid -1549369665 branch 746558785
Tue Dec 29 16:51:42 2015
Archived Log entry 69432 added for thread 1 sequence 72605 ID 0xa829ec3b dest 2:
This makes it clear that MRP was indeed not running; only the RFS processes were receiving redo and archiving it.
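The same conclusion can be checked directly instead of being read out of the alert log. A minimal sketch, assuming a session on the standby with access to the V$ views:

```sql
-- List the Data Guard related processes on the standby.
-- An MRP0 row (for example with status APPLYING_LOG or WAIT_FOR_LOG) means Redo Apply is running.
-- If only RFS and ARCH rows come back, redo is being received but not applied.
select process, status, thread#, sequence#
from v$managed_standby
order by process;
```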
I then used the DG broker to set the standby back to the ONLINE state and checked the configuration again. This time the check came back clean.
DGMGRL> show configuration;
Configuration - testdb
Protection Mode: MaxPerformance
Databases:
testdbbak93 - Primary database
stestdb3 - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
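All in all, this did not behave like an 11g database.
I tried once more, opening the database manually, and the standby was still only READ ONLY. The problem persisted after a restart.
For a problem like this, the best approach is still to read the logs. The standby was last restarted about a year ago, and fortunately the alert log from that time was still there. The startup back then showed no other errors.
What I did notice was the compatible parameter, because its value looked out of place for an 11g database, and that made me curious.
A search on MOS turned up several related notes, including a related bug description, so this looked like yet another legacy issue.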
ACTIVE DATAGUARD (ADG) NOT POSSIBLE WITH COMPATIBLE < 11.1.0.0.0 (Doc ID 1363396.1)
BUG:13032521 - ADG PHYSICAL STANDBY GOES TO MOUNT STATE INSTEAD OF READ ONLY WITH APPLY
With the problem more or less pinned down, I checked the parameter on both the primary and the standby. It is 10.2.0.5.0 on both.
SQL> show parameter compa
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
compatible string 10.2.0.5.0
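According to the workaround in the bug description, compatible on the standby should be set to 11.1.0.7 or higher. Changing this parameter requires an instance restart, which is fairly disruptive, and the primary cannot be restarted for now.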
SQL> alter system set compatible='11.2.0.3.0';
alter system set compatible='11.2.0.3.0'
*
ERROR at line 1:
ORA-02095: specified initialization parameter cannot be modified
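ORA-02095 appears here because compatible is a static parameter. A quick way to confirm that before attempting the change is to look at how the instance classifies it. This is a small sketch using the standard V$PARAMETER view:

```sql
-- ISSYS_MODIFIABLE = FALSE means the parameter cannot be changed in the running
-- instance; it can only be written to the spfile (scope=spfile) and takes effect
-- at the next startup.
select name, value, issys_modifiable
from v$parameter
where name = 'compatible';
```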
So I changed it on the standby first, to see whether that alone would work.
SQL> alter system set compatible='11.2.0.3.0' scope=spfile;
System altered.
During the restart, the standby's alert log showed the following.
Tue Dec 29 17:25:26 2015
Spfile /U01/app/oracle/product/11.2.3/db_1/dbs/spfiletestdb.ora is in old pre-11 format and compatible >= 11.0.0; converting to new H.A.R.D. compliant format.
Completed: alter database mount
But after setting the state to ONLINE again, the open mode was still READ ONLY rather than READ ONLY WITH APPLY.
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ ONLY
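So changing only the standby is not enough; the primary has to be changed to match.
The alert log also contains the following, which shows MRP starting and then terminating with ORA-38800.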
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE
Attempt to start background Managed Standby Recovery process (testdb)
Tue Dec 29 17:57:03 2015
MRP0 started with pid=29, OS id=17740
MRP0: Background Managed Standby Recovery process started (testdb)
started logmerger process
Tue Dec 29 17:57:08 2015
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 16 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Log /U01/app/oracle/fra/StestDB3/archivelog/2015_12_29/o1_mf_1_72606_c84n0xml_.arc
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE
Errors with log /U01/app/oracle/fra/StestDB3/archivelog/2015_12_29/o1_mf_1_72606_c84n0xml_.arc
MRP0: Background Media Recovery terminated with error 38800
Errors in file /U01/app/oracle/diag/rdbms/stestdb3/testdb/trace/testdb_pr00_17745.trc:
ORA-38800: Cannot start Redo Apply on the open physical standby database
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
MRP0: Background Media Recovery process shutdown (testdb)
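This parameter change clearly has a big impact. I decided to get the standby back to its normal state first and deal with the rest once a restart of the primary could be arranged, so I set compatible back to 10.2.0.5.0.
But the next startup failed at the mount step, because the control file had already been converted to version 11.2.0.3.0.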
SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00201: control file version 11.2.0.3.0 incompatible with ORACLE version
10.2.0.5.0
ORA-00202: control file: '/U01/app/oracle/oradata/testdb/control01.ctl'
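This still looked recoverable: create a standby control file on the primary, transfer it to the standby, and the mount succeeds.
On the primary: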
SQL> alter database create standby controlfile as '/tmp/std1.ctl';
Database altered.
On the standby:
SQL> alter database mount standby database;
Database altered.
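The steps between those two commands are not shown above. A rough sketch of what they might look like in SQL*Plus on the standby follows. It assumes that /tmp/std1.ctl has already been copied from the primary to /tmp on the standby host, that control01.ctl (the file named in ORA-00202) is the only control file, and that the backup file name is a made-up example.

```sql
-- On the standby, as SYSDBA.
shutdown immediate
startup nomount

-- Check every location listed here; each one needs a copy of the new control file.
show parameter control_files

-- Keep the old 11.2.0.3.0 control file in case it is needed later (hypothetical name).
host cp /U01/app/oracle/oradata/testdb/control01.ctl /U01/app/oracle/oradata/testdb/control01.ctl.11g
-- Put the standby control file created on the primary in its place.
host cp /tmp/std1.ctl /U01/app/oracle/oradata/testdb/control01.ctl

alter database mount standby database;
```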
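But the standby's alert log then showed that things had become more complicated. The datafile headers had already been updated to 11.2.0.3.0, so they no longer matched the instance's compatible setting.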
ALTER DATABASE RECOVER managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (testdb)
Tue Dec 29 18:28:13 2015
MRP0 started with pid=30, OS id=24283
MRP0: Background Managed Standby Recovery process started (testdb)
started logmerger process
Tue Dec 29 18:28:18 2015
Managed Standby Recovery not using Real Time Apply
Read of datafile '/U01/app/oracle/oradata/testdb/system01.dbf' (fno 1) header failed with ORA-01130
Rereading datafile 1 header failed with ORA-01130
MRP0: Background Media Recovery terminated with error 1110
Errors in file /U01/app/oracle/diag/rdbms/stestdb3/testdb/trace/testdb_pr00_24288.trc:
ORA-01110: data file 1: '/U01/app/oracle/oradata/testdb/system01.dbf'
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/U01/app/oracle/oradata/testdb/system01.dbf'
ORA-01130: database file version 11.2.0.3.0 incompatible with ORACLE version 10.2.0.5.0
Slave exiting with ORA-1110 exception
Errors in file /U01/app/oracle/diag/rdbms/stestdb3/testdb/trace/testdb_pr00_24288.trc:
ORA-01110: data file 1: '/U01/app/oracle/oradata/testdb/system01.dbf'
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/U01/app/oracle/oradata/testdb/system01.dbf'
ORA-01130: database file version 11.2.0.3.0 incompatible with ORACLE version 10.2.0.5.0
Recovery Slave PR00 previously exited with exception 1110
MRP0: Background Media Recovery process shutdown (testdb)
Completed: ALTER DATABASE RECOVER managed standby database disconnect from session
The corresponding trace file shows the following:
*** 2015-12-29 18:28:18.495 4320 krsh.c
Managed Standby Recovery not using Real Time Apply
Read of datafile '/U01/app/oracle/oradata/testdb/system01.dbf' (fno 1) header failed with ORA-01130
Rereading datafile 1 header failed with ORA-01130
V10 STYLE FILE HEADER:
Compatibility Vsn = 186647296=0xb200300
Db ID=2745597631=0xa3a67ebf, Db Name='testDB'
Activation ID=0=0x0
Control Seq=1=0x1, File size=147200=0x23f00
File Number=1, Blksiz=8192, File Type=3 DATA
Tablespace #0 - SYSTEM rel_fn:1
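The "Compatibility Vsn" value in the file header dump lines up with the version in the ORA-01130 message. The query below is a small sketch that decodes it, assuming the number packs the five version components as 8, 4, 8, 4 and 8 bits from high to low. That layout is inferred from this trace and is not a documented format.

```sql
-- 186647296 is the "Compatibility Vsn" value from the trace (0x0B200300).
-- Assumed layout, high to low: 8 bits, 4 bits, 8 bits, 4 bits, 8 bits.
select trunc(v / power(2, 24))           || '.' ||
       mod(trunc(v / power(2, 20)), 16)  || '.' ||
       mod(trunc(v / power(2, 12)), 256) || '.' ||
       mod(trunc(v / power(2, 8)), 16)   || '.' ||
       mod(v, 256) as compat_version
from (select 186647296 as v from dual);
-- Returns 11.2.0.3.0
```

In other words, the datafile header already carries the 11.2.0.3.0 stamp, while the instance was started with compatible set to 10.2.0.5.0.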
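In this situation, restoring the standby's 11g control file and restarting the primary (with compatible raised to match) should resolve the problem. But restarting the primary means coordinating a maintenance window, so it cannot happen right away. In the meantime disaster recovery protection is the top priority: if the primary failed during that period, the impact would be serious. So the last resort was to rebuild the standby.
The rebuild can still use the 11g active duplication method.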
rman target sys@xxxxx auxiliary sys@xxxx nocatalog
RMAN> duplicate target database for standby from active database nofilenamecheck;
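Once the duplicate finishes, it is worth confirming that the new standby is actually applying redo. A minimal sketch, assuming the rebuilt standby is mounted and that apply is started by hand rather than by the broker:

```sql
-- Start Redo Apply in the background on the rebuilt standby.
alter database recover managed standby database disconnect from session;

-- Transport and apply lag as reported on the standby.
select name, value
from v$dataguard_stats
where name in ('transport lag', 'apply lag');
```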
And that was that: the standby was rebuilt successfully. It feels like a lot of work for nothing, and I am left with mixed feelings.