Oracle层次查询中connect_by_iscycle伪列的取值研究
这半年过的很充实,无暇顾及blog,还好阔别不长,兴致未减。
表里的记录若存在上下级关系,借助层次查询(Hierarchical query)能将记录按照树状形式输出,关于层次查询这里不展开介绍。
我们要研究的是当表中的上下级记录之间存在循环关系时,oracle是如何把这些引起循环的行标记出来的。
#####创建测试用表
drop table scott.t0704_1;
create table scott.t0704_1(tn varchar2(1),fatherid number,childid number);
insert into scott.t0704_1 values('A',null,1);
insert into scott.t0704_1 values('B',1,2);
insert into scott.t0704_1 values('C',1,3);
insert into scott.t0704_1 values('D',2,4);
insert into scott.t0704_1 values('E',4,1);
insert into scott.t0704_1 values('F',4,5);
commit;
select * from scott.t0704_1;
TN FATHERID CHILDID
-- ---------- ----------
A 1
B 1 2
C 1 3
D 2 4
E 4 1
F 4 5
6 rows selected
#####执行层次查询
SQL> select tn,fatherid,childid,ltrim(sys_connect_by_path(childid,'->'),'->') rel,level from scott.t0704_1 start with fatherid is null connect by prior childid=fatherid;
ERROR:
ORA-01436: CONNECT BY loop in user data
no rows selected
childid:2是childid:1的后代,childid:4是childid:2的后代,childid:1又是childid:4的后代,即1->2->4->1,其中1出现了两次,构成了一个循环,层次结构不确定,所以出现了ORA-01436错误
可以在connect by 之后加入nocycle,在表内记录层次结构出现循环的情况下依然打印出部分记录,同时利用connect_by_iscycle伪列协助标记循环是从哪一行开始的(connect_by_iscycle必须与nocycle连用)
>>>>> Example 1:
col rel format a15
select tn,fatherid,childid,connect_by_iscycle,ltrim(sys_connect_by_path(childid,'->'),'->') rel,level from scott.t0704_1 start with fatherid is null connect by nocycle prior childid=fatherid;
T FATHERID CHILDID CONNECT_BY_ISCYCLE REL LEVEL
- ---------- ---------- ------------------ --------------- ----------
A 1 0 1 1
B 1 2 0 1->2 2
D 2 4 1 1->2->4 3
F 4 5 0 1->2->4->5 4
C 1 3 0 1->3 2
SQL Reference上的对于connect_by_iscycle的解释是:
The CONNECT_BY_ISCYCLE pseudocolumn returns 1 if the current row has a child which is also its ancestor. Otherwise it returns 0. 当前行的后代同时也是当前行的祖先时,这一行就会被标示为connect_by_iscycle=1。按照这一逻辑,在处理到tn='D'这行时发现4的后代是1,而1又是4的祖先,所以tn='D'所在行的connect_by_iscycle=1,tn='E'这一行使得层次结构上出现了循环就没有输出,这样解释似乎很合情理
我们把查询稍微修改一下: start with fatherid is null=>start with fatherid=1
>>>>> Example 2:
col rel format a15
select tn,fatherid,childid,connect_by_iscycle,ltrim(sys_connect_by_path(childid,'->'),'->') rel,level from scott.t0704_1 start with fatherid=1 connect by nocycle prior childid=fatherid;
T FATHERID CHILDID CONNECT_BY_ISCYCLE REL LEVEL
- ---------- ---------- ------------------ --------------- ----------
B 1 2 0 2 1
D 2 4 0 2->4 2
E 4 1 1 2->4->1 3
C 1 3 0 2->4->1->3 4
F 4 5 0 2->4->5 3
C 1 3 0 3 1
按照上面的解释tn='D'所在行的connect_by_iscycle伪列应当被标记为1,但实际却是tn='E'这行的connect_by_iscycle=1。
官档对于connect_by_iscycle伪列的解释没错,但不足以解释上述两个查询,对于connect_by_iscycle列何时为1,我的理解如下:
因connect by是按照深度优先的原则进行遍历的,在Example 1里当遍历了tn='D'(fatherid=2、childid=4)后,再往深一层遍历的时候就轮到tn='E'(fatherid=4、childid=1)了,此时childid=1已经在tn='A'所在行输出过一次了,鉴于tn='E'所在行会导致层次结构上的循环,所以这一行不会被输出,其祖先tn='D'所在行的connect_by_iscycle=1。
在Example 2里当遍历了tn='E'(fatherid=4、childid=1)后,再往深一层遍历的时候就又轮到tn='B'(fatherid=1、childid=2)了,此时childid=2已经在tn='B'所在行输出过一次了,鉴于tn='B'所在行会导致层次结构上的循环,所以这一行不会被重复的输出第二遍,tn='E'所在行的connect_by_iscycle=1。
稍加总结:connect by prior c1=f1作为表内记录层次关联的条件时,在遍历过程中c1字段会与祖先节点的c1字段进行比较,在level=m时遍历到c1=k,在level=n时(n>m)又遍历到c1=k,那么level=n时的c1=k所在行不会输出,level>n时以c1=k作为祖先的行自然也不会输出;level=(n-1)时c1=k的祖先所在行输出且connect_by_iscycle=1
扩展一下:
若要在Example 1的查询中输出tn='E'所在的行,可以这样改写:
col rel format a15
select tn,fatherid,childid,connect_by_iscycle,ltrim(sys_connect_by_path(childid,'->'),'->') rel,level from scott.t0704_1 start with fatherid is null connect by nocycle prior childid=fatherid and (prior fatherid is null or prior fatherid is not null);
T FATHERID CHILDID CONNECT_BY_ISCYCLE REL LEVEL
- ---------- ---------- ------------------ --------------- ----------
A 1 0 1 1
B 1 2 0 1->2 2
D 2 4 0 1->2->4 3
E 4 1 1 1->2->4->1 4
C 1 3 0 1->2->4->1->3 5 <---因为'E'输出了,所以其child:C(level=5)也输出了
F 4 5 0 1->2->4->5 4
C 1 3 0 1->3 2
(prior fatherid is null or prior fatherid is not null)这个条件看似无意义,但却能让childid、fatherid两个列都加入到与祖先节点是否相等的判断中,只有这两个列的值都与祖先行相等这一行才不会被输出。概括地讲:connect nocycle by prior c1=f1 and prior c2=f2 ... and prior cn=fn,那么prior后的所有字段(c1,c2,...cn)都将被纳入循环的判断条件,使得即使循环的行也能被正常的输出
正文到此结束