转载

Scalaz（24）－泛函数据结构： Tree-数据游览及维护

上节我们讨论了Zipper－串形不可变集合（immutable sequencial collection）游标，在串形集合中左右游走及元素维护操作。这篇我们谈谈Tree。在电子商务应用中对于xml,json等格式文件的处理要求非常之普遍，scalaz提供了Tree数据类型及相关的游览及操作函数能更方便高效的处理xml,json文件及系统目录这些树形结构数据的相关编程。scalaz Tree的定义非常简单：scalaz/Tree.scala

 * A multi-way tree, also known as a rose tree. Also known as Cofree[Stream, A].  */ sealed abstract class Tree[A] {    import Tree._    /** The label at the root of this tree. */   def rootLabel: A    /** The child nodes of this tree. */   def subForest: Stream[Tree[A]] ...

Tree是由一个A值rootLabel及一个流中子树Stream[Tree[A]]组成。Tree可以只由一个A类型值rootLabel组成，这时流中子树subForest就是空的Stream.empty。只有rootLabel的Tree俗称叶（leaf），有subForest的称为节（node）。scalaz为任何类型提供了leaf和node的构建注入方法：syntax/TreeOps.scala

 final class TreeOps[A](self: A) {   def node(subForest: Tree[A]*): Tree[A] = Tree.node(self, subForest.toStream)    def leaf: Tree[A] = Tree.leaf(self) }  trait ToTreeOps {   implicit def ToTreeOps[A](a: A) = new TreeOps(a) }

实际上注入方法调用了Tree里的构建函数：

 trait TreeFunctions {   /** Construct a new Tree node. */   def node[A](root: => A, forest: => Stream[Tree[A]]): Tree[A] = new Tree[A] {     lazy val rootLabel = root     lazy val subForest = forest      override def toString = "<tree>"   }    /** Construct a tree node with no children. */   def leaf[A](root: => A): Tree[A] = node(root, Stream.empty)

Tree提供了构建和模式拆分函数：

 object Tree extends TreeInstances with TreeFunctions {   /** Construct a tree node with no children. */   def apply[A](root: => A): Tree[A] = leaf(root)    object Node {     def unapply[A](t: Tree[A]): Option[(A, Stream[Tree[A]])] = Some((t.rootLabel, t.subForest))   } }

我们可以直接构建Tree：

  1  Tree("ALeaf") === "ALeaf".leaf                  //> res5: Boolean = true  2   val tree: Tree[Int] =  3     1.node(  4       11.leaf,  5       12.node(  6         121.leaf),  7      2.node(  8       21.leaf,  9       22.leaf) 10      )                                            //> tree  : scalaz.Tree[Int] = <tree> 11   tree.drawTree                                   //> res6: String = "1 12                                                   //| | 13                                                   //| +- 11 14                                                   //| | 15                                                   //| +- 12 16                                                   //| |  | 17                                                   //| |  `- 121 18                                                   //| | 19                                                   //| `- 2 20                                                   //|    | 21                                                   //|    +- 21 22                                                   //|    | 23                                                   //|    `- 22 24                                                   //| "

Tree实现了下面众多的接口函数：

 sealed abstract class TreeInstances {   implicit val treeInstance: Traverse1[Tree] with Monad[Tree] with Comonad[Tree] with Align[Tree] with Zip[Tree] = new Traverse1[Tree] with Monad[Tree] with Comonad[Tree] with Align[Tree] with Zip[Tree] {     def point[A](a: => A): Tree[A] = Tree.leaf(a)     def cobind[A, B](fa: Tree[A])(f: Tree[A] => B): Tree[B] = fa cobind f     def copoint[A](p: Tree[A]): A = p.rootLabel     override def map[A, B](fa: Tree[A])(f: A => B) = fa map f     def bind[A, B](fa: Tree[A])(f: A => Tree[B]): Tree[B] = fa flatMap f     def traverse1Impl[G[_]: Apply, A, B](fa: Tree[A])(f: A => G[B]): G[Tree[B]] = fa traverse1 f     override def foldRight[A, B](fa: Tree[A], z: => B)(f: (A, => B) => B): B = fa.foldRight(z)(f)     override def foldMapRight1[A, B](fa: Tree[A])(z: A => B)(f: (A, => B) => B) = (fa.flatten.reverse: @unchecked) match {       case h #:: t => t.foldLeft(z(h))((b, a) => f(a, b))     }     override def foldLeft[A, B](fa: Tree[A], z: B)(f: (B, A) => B): B =       fa.flatten.foldLeft(z)(f)     override def foldMapLeft1[A, B](fa: Tree[A])(z: A => B)(f: (B, A) => B): B = fa.flatten match {       case h #:: t => t.foldLeft(z(h))(f)     }     override def foldMap[A, B](fa: Tree[A])(f: A => B)(implicit F: Monoid[B]): B = fa foldMap f     def alignWith[A, B, C](f: (/&/[A, B]) ⇒ C) = {        def align(ta: Tree[A], tb: Tree[B]): Tree[C] =         Tree.node(f(/&/(ta.rootLabel, tb.rootLabel)), Align[Stream].alignWith[Tree[A], Tree[B], Tree[C]]({           case /&/.This(sta) ⇒ sta map {a ⇒ f(/&/.This(a))}           case /&/.That(stb) ⇒ stb map {b ⇒ f(/&/.That(b))}           case /&/.Both(sta, stb) ⇒ align(sta, stb)         })(ta.subForest, tb.subForest))       align _     }     def zip[A, B](aa: => Tree[A], bb: => Tree[B]) = {       val a = aa       val b = bb       Tree.node(         (a.rootLabel, b.rootLabel),         Zip[Stream].zipWith(a.subForest, b.subForest)(zip(_, _))       )     }   }    implicit def treeEqual[A](implicit A0: Equal[A]): Equal[Tree[A]] =     new TreeEqual[A] { def A = A0 }    implicit def treeOrder[A](implicit A0: Order[A]): Order[Tree[A]] =     new Order[Tree[A]] with TreeEqual[A] {       def A = A0       import std.stream._       override def order(x: Tree[A], y: Tree[A]) =         A.order(x.rootLabel, y.rootLabel) match {           case Ordering.EQ =>             Order[Stream[Tree[A]]].order(x.subForest, y.subForest)           case x => x         }     }

那么Tree就是个Monad，也是Functor,Applicative，还是traversable,foldable。Tree也实现了Order,Equal实例，可以进行值的顺序比较。我们就用些例子来说明吧：

  1 // 是 Functor...  2     (tree map { v: Int => v + 1 }) ===  3     2.node(  4       12.leaf,  5       13.node(  6         122.leaf),  7      3.node(  8       22.leaf,  9       23.leaf) 10      )                                            //> res7: Boolean = true 11  12  // ...是 Monad 13     1.point[Tree] === 1.leaf                      //> res8: Boolean = true 14     val t2 = tree >>= (x => (x == 2) ? x.leaf | x.node((-x).leaf)) 15                                                   //> t2  : scalaz.Tree[Int] = <tree> 16     t2 === 1.node((-1).leaf, 2.leaf, 3.node((-3).leaf, 4.node((-4).leaf))) 17                                                   //> res9: Boolean = false 18     t2.drawTree                                   //> res10: String = "1 19                                                   //| | 20                                                   //| +- -1 21                                                   //| | 22                                                   //| +- 11 23                                                   //| |  | 24                                                   //| |  `- -11 25                                                   //| | 26                                                   //| +- 12 27                                                   //| |  | 28                                                   //| |  +- -12 29                                                   //| |  | 30                                                   //| |  `- 121 31                                                   //| |     | 32                                                   //| |     `- -121 33                                                   //| | 34                                                   //| `- 2 35                                                   //|    | 36                                                   //|    +- 21 37                                                   //|    |  | 38                                                   //|    |  `- -21 39                                                   //|    | 40                                                   //|    `- 22 41                                                   //|       | 42                                                   //|       `- -22 43                                                   //| " 44  // ...是 Foldable 45     tree.foldMap(_.toString) === "1111212122122"  //> res11: Boolean = true

说到构建Tree，偶然在网上发现了这么一个Tree构建函数：

   def pathTree[E](root: E, paths: Seq[Seq[E]]): Tree[E] = {     root.node(paths groupBy (_.head) map {       case (parent, subpaths) =>         pathTree(parent, subpaths collect {           case pp +: rest if rest.nonEmpty => rest         })     } toSeq: _*)   }

据说这个pathTree函数能把List里的目录结构转化成Tree。先看看到底是不是具备如此功能：

  1   val paths = List(List("A","a1","a2"),List("B","b1"))  2                                                   //> paths  : List[List[String]] = List(List(A, a1, a2), List(B, b1))  3   pathTree("root",paths) drawTree                 //> res0: String = ""root"  4                                                   //| |  5                                                   //| +- "A"  6                                                   //| |  |  7                                                   //| |  `- "a1"  8                                                   //| |     |  9                                                   //| |     `- "a2" 10                                                   //| | 11                                                   //| `- "B" 12                                                   //|    | 13                                                   //|    `- "b1" 14                                                   //| " 15  val paths = List(List("A","a1","a2"),List("B","b1"),List("B","b2","b3")) 16              //> paths  : List[List[String]] = List(List(A, a1, a2), List(B, b1), List(B, b2, 17                                                   //|  b3)) 18   pathTree("root",paths) drawTree                 //> res0: String = ""root" 19                                                   //| | 20                                                   //| +- "A" 21                                                   //| |  | 22                                                   //| |  `- "a1" 23                                                   //| |     | 24                                                   //| |     `- "a2" 25                                                   //| | 26                                                   //| `- "B" 27                                                   //|    | 28                                                   //|    +- "b2" 29                                                   //|    |  | 30                                                   //|    |  `- "b3" 31                                                   //|    | 32                                                   //|    `- "b1" 33                                                   //| "

果然能行，而且还能把"B"节点合并汇集。这个函数的作者简直就是个神人，起码是个算法和FP语法运用大师。我虽然还无法达到大师的程度能写出这样的泛函程序，但好奇心是挡不住的，总想了解这个函数是怎么运作的。可以用一些测试数据来逐步跟踪一下：

 1   val paths = List(List("A"))           //> paths  : List[List[String]] = List(List(A)) 2   val gpPaths =paths.groupBy(_.head)    //> gpPaths  : scala.collection.immutable.Map[String,List[List[String]]] = Map(A-> List(List(A))) 3   List(List("A")) collect { case pp +: rest if rest.nonEmpty => rest } 4                                                   //> res0: List[List[String]] = List()

通过上面的跟踪约化我们看到List(List(A))在pathTree里的执行过程。这里把复杂的groupBy和collect函数的用法和结果了解了。实际上整个过程相当于：

 1  "root".node( 2        "A".node(List().toSeq: _*) 3        ) drawTree                                 //> res3: String = ""root" 4                                                   //| | 5                                                   //| `- "A" 6                                                   //| "

如果再增加一个点就相当于：

 1  "root".node( 2      "A".node(List().toSeq: _*), 3      "B".node(List().toSeq: _*) 4      ) drawTree                                   //> res4: String = ""root" 5                                                   //| | 6                                                   //| +- "A" 7                                                   //| | 8                                                   //| `- "B" 9                                                   //| "

加多一层：

  1   val paths = List(List("A","a1"))                //> paths  : List[List[String]] = List(List(A, a1))  2   val gpPaths =paths.groupBy(_.head)              //> gpPaths  : scala.collection.immutable.Map[String,List[List[String]]] = Map(A  3                                                   //|  -> List(List(A, a1)))  4   List(List("A","a1")) collect { case pp +: rest if rest.nonEmpty => rest }  5                                                   //> res0: List[List[String]] = List(List(a1))  6   7 //化解成  8  "root".node(  9        "A".node( 10           "a1".node( 11            List().toSeq: _*) 12            ) 13        ) drawTree                                 //> res3: String = ""root" 14                                                   //| | 15                                                   //| `- "A" 16                                                   //|    | 17                                                   //|    `- "a1" 18                                                   //| "

合并目录：

  1   val paths = List(List("A","a1"),List("A","a2")) //> paths  : List[List[String]] = List(List(A, a1), List(A, a2))  2   val gpPaths =paths.groupBy(_.head)              //> gpPaths  : scala.collection.immutable.Map[String,List[List[String]]] = Map(A  3                                                   //|  -> List(List(A, a1), List(A, a2)))  4   List(List("A","a1"),List("A","a2")) collect { case pp +: rest if rest.nonEmpty => rest }  5                                                   //> res0: List[List[String]] = List(List(a1), List(a2))  6   7 //相当产生结果  8 "root".node(  9        "A".node( 10           "a1".node( 11            List().toSeq: _*) 12            , 13           "a2".node( 14            List().toSeq: _*) 15            ) 16        ) drawTree                                 //> res3: String = ""root" 17                                                   //| | 18                                                   //| `- "A" 19                                                   //|    | 20                                                   //|    +- "a1" 21                                                   //|    | 22                                                   //|    `- "a2" 23                                                   //| "

相信这些跟踪过程足够了解整个函数的工作原理了。
有了Tree构建方法后就需要Tree的游动和操作函数了。与串形集合的直线游动不同的是，树形集合游动方式是分岔的。所以Zipper不太适用于树形结构。scalaz特别提供了树形集合的定位游标TreeLoc，我们看看它的定义：scalaz/TreeLoc.scala

 final case class TreeLoc[A](tree: Tree[A], lefts: TreeForest[A],                             rights: TreeForest[A], parents: Parents[A]) { ... trait TreeLocFunctions {   type TreeForest[A] =   Stream[Tree[A]]    type Parent[A] =   (TreeForest[A], A, TreeForest[A])    type Parents[A] =   Stream[Parent[A]]

树形集合游标TreeLoc由当前节点tree、左子树lefts、右子树rights及父树parents组成。lefts,rights,parents都是在流中的树形Stream[Tree[A]]。
用Tree.loc可以直接对目标树生成TreeLoc：

  1 /** A TreeLoc zipper of this tree, focused on the root node. */  2   def loc: TreeLoc[A] = TreeLoc.loc(this, Stream.Empty, Stream.Empty, Stream.Empty)  3    4  val tree: Tree[Int] =  5     1.node(  6       11.leaf,  7       12.node(  8         121.leaf),  9      2.node( 10       21.leaf, 11       22.leaf) 12      )                           //> tree  : scalaz.Tree[Int] = <tree> 13  14   tree.loc                      //> res7: scalaz.TreeLoc[Int] = TreeLoc(<tree>,Stream(),Stream(),Stream())

TreeLoc的游动函数：

   def root: TreeLoc[A] =     parent match {       case Some(z) => z.root       case None    => this     }    /** Select the left sibling of the current node. */   def left: Option[TreeLoc[A]] = lefts match {     case t #:: ts     => Some(loc(t, ts, tree #:: rights, parents))     case Stream.Empty => None   }    /** Select the right sibling of the current node. */   def right: Option[TreeLoc[A]] = rights match {     case t #:: ts     => Some(loc(t, tree #:: lefts, ts, parents))     case Stream.Empty => None   }    /** Select the leftmost child of the current node. */   def firstChild: Option[TreeLoc[A]] = tree.subForest match {     case t #:: ts     => Some(loc(t, Stream.Empty, ts, downParents))     case Stream.Empty => None   }    /** Select the rightmost child of the current node. */   def lastChild: Option[TreeLoc[A]] = tree.subForest.reverse match {     case t #:: ts     => Some(loc(t, ts, Stream.Empty, downParents))     case Stream.Empty => None   }    /** Select the nth child of the current node. */   def getChild(n: Int): Option[TreeLoc[A]] =     for {lr <- splitChildren(Stream.Empty, tree.subForest, n)          ls = lr._1     } yield loc(ls.head, ls.tail, lr._2, downParents)

我们试着用这些函数游动：

  1  val tree: Tree[Int] =  2     1.node(  3       11.leaf,  4       12.node(  5         121.leaf),  6      2.node(  7       21.leaf,  8       22.leaf)  9      )                                            //> tree  : scalaz.Tree[Int] = <tree> 10   tree.loc                                        //> res7: scalaz.TreeLoc[Int] = TreeLoc(<tree>,Stream(),Stream(),Stream()) 11   val l = for { 12    l1 <- tree.loc.some 13    l2 <- l1.firstChild 14    l3 <- l1.lastChild 15    l4 <- l3.firstChild 16    } yield (l1,l2,l3,l4)                          //> l  : Option[(scalaz.TreeLoc[Int], scalaz.TreeLoc[Int], scalaz.TreeLoc[Int], 17                                                   //|  scalaz.TreeLoc[Int])] = Some((TreeLoc(<tree>,Stream(),Stream(),Stream()),T 18                                                   //| reeLoc(<tree>,Stream(),Stream(<tree>, <tree>),Stream((Stream(),1,Stream()), 19                                                   //|  ?)),TreeLoc(<tree>,Stream(<tree>, <tree>),Stream(),Stream((Stream(),1,Stre 20                                                   //| am()), ?)),TreeLoc(<tree>,Stream(),Stream(<tree>, ?),Stream((Stream(<tree>, 21                                                   //|  <tree>),2,Stream()), ?)))) 22    23   l.get._1.getLabel                               //> res8: Int = 1 24   l.get._2.getLabel                               //> res9: Int = 11 25   l.get._3.getLabel                               //> res10: Int = 2 26   l.get._4.getLabel                               //> res11: Int = 21

跳动函数：

   /** Select the nth child of the current node. */   def getChild(n: Int): Option[TreeLoc[A]] =     for {lr <- splitChildren(Stream.Empty, tree.subForest, n)          ls = lr._1     } yield loc(ls.head, ls.tail, lr._2, downParents)    /** Select the first immediate child of the current node that satisfies the given predicate. */   def findChild(p: Tree[A] => Boolean): Option[TreeLoc[A]] = {     @tailrec     def split(acc: TreeForest[A], xs: TreeForest[A]): Option[(TreeForest[A], Tree[A], TreeForest[A])] =       (acc, xs) match {         case (acc, Stream.cons(x, xs)) => if (p(x)) Some((acc, x, xs)) else split(Stream.cons(x, acc), xs)         case _                         => None       }     for (ltr <- split(Stream.Empty, tree.subForest)) yield loc(ltr._2, ltr._1, ltr._3, downParents)   }    /**Select the first descendant node of the current node that satisfies the given predicate. */   def find(p: TreeLoc[A] => Boolean): Option[TreeLoc[A]] =     Cobind[TreeLoc].cojoin(this).tree.flatten.find(p)

find用法示范：

  1   val tree: Tree[Int] =  2     1.node(  3       11.leaf,  4       12.node(  5         121.leaf),  6      2.node(  7       21.leaf,  8       22.leaf)  9      )                                            //> tree  : scalaz.Tree[Int] = <tree> 10   tree.loc                                        //> res7: scalaz.TreeLoc[Int] = TreeLoc(<tree>,Stream(),Stream(),Stream()) 11   val l = for { 12    l1 <- tree.loc.some 13    l2 <- l1.find{_.getLabel == 2} 14    l3 <- l1.find{_.getLabel == 121} 15    l4 <- l2.find{_.getLabel == 22} 16    l5 <- l1.findChild{_.rootLabel == 12} 17    l6 <- l1.findChild{_.rootLabel == 2} 18   } yield l6                                      //> l  : Option[scalaz.TreeLoc[Int]] = Some(TreeLoc(<tree>,Stream(<tree>, ?),St 19                                                   //| ream(),Stream((Stream(),1,Stream()), ?)))

注意：上面6个跳动都成功了。如果无法跳转结果会是None
insert,modify,delete这些操作函数：

   /** Replace the current node with the given one. */   def setTree(t: Tree[A]): TreeLoc[A] = loc(t, lefts, rights, parents)    /** Modify the current node with the given function. */   def modifyTree(f: Tree[A] => Tree[A]): TreeLoc[A] = setTree(f(tree))    /** Modify the label at the current node with the given function. */   def modifyLabel(f: A => A): TreeLoc[A] = setLabel(f(getLabel))    /** Get the label of the current node. */   def getLabel: A = tree.rootLabel    /** Set the label of the current node. */   def setLabel(a: A): TreeLoc[A] = modifyTree((t: Tree[A]) => node(a, t.subForest))    /** Insert the given node to the left of the current node and give it focus. */   def insertLeft(t: Tree[A]): TreeLoc[A] = loc(t, lefts, Stream.cons(tree, rights), parents)    /** Insert the given node to the right of the current node and give it focus. */   def insertRight(t: Tree[A]): TreeLoc[A] = loc(t, Stream.cons(tree, lefts), rights, parents)    /** Insert the given node as the first child of the current node and give it focus. */   def insertDownFirst(t: Tree[A]): TreeLoc[A] = loc(t, Stream.Empty, tree.subForest, downParents)    /** Insert the given node as the last child of the current node and give it focus. */   def insertDownLast(t: Tree[A]): TreeLoc[A] = loc(t, tree.subForest.reverse, Stream.Empty, downParents)    /** Insert the given node as the nth child of the current node and give it focus. */   def insertDownAt(n: Int, t: Tree[A]): Option[TreeLoc[A]] =     for (lr <- splitChildren(Stream.Empty, tree.subForest, n)) yield loc(t, lr._1, lr._2, downParents)    /** Delete the current node and all its children. */   def delete: Option[TreeLoc[A]] = rights match {     case Stream.cons(t, ts) => Some(loc(t, lefts, ts, parents))     case _                  => lefts match {       case Stream.cons(t, ts) => Some(loc(t, ts, rights, parents))       case _                  => for (loc1 <- parent) yield loc1.modifyTree((t: Tree[A]) => node(t.rootLabel, Stream.Empty))     }   }

用法示范：

  1   val tr = 1.leaf                                 //> tr  : scalaz.Tree[Int] = <tree>  2   val tl = for {  3     l1 <- tr.loc.some  4     l3 <- l1.insertDownLast(12.leaf).some  5     l4 <- l3.insertDownLast(121.leaf).some  6     l5 <- l4.root.some  7     l2 <- l5.insertDownFirst(11.leaf).some  8     l6 <- l2.root.some  9     l7 <- l6.find{_.getLabel == 12} 10     l8 <- l7.setLabel(102).some 11   } yield l8                                      //> tl  : Option[scalaz.TreeLoc[Int]] = Some(TreeLoc(<tree>,Stream(<tree>, ?),S 12                                                   //| tream(),Stream((Stream(),1,Stream()), ?))) 13    14   tl.get.toTree.drawTree                          //> res8: String = "1 15                                                   //| | 16                                                   //| +- 11 17                                                   //| | 18                                                   //| `- 102 19                                                   //|    | 20                                                   //|    `- 121 21                                                   //| " 22

setTree和delete会替换当前节点下的所有子树：

  1   val tree: Tree[Int] =  2     1.node(  3       11.leaf,  4       12.node(  5         121.leaf),  6      2.node(  7       21.leaf,  8       22.leaf)  9      )                                            //> tree  : scalaz.Tree[Int] = <tree> 10    def modTree(t: Tree[Int]): Tree[Int] = { 11       val l = for { 12         l1 <- t.loc.some 13         l2 <- l1.find{_.getLabel == 22} 14         l3 <- l2.setTree { 3.node (31.leaf) }.some 15       } yield l3 16       l.get.toTree 17    }                                              //> modTree: (t: scalaz.Tree[Int])scalaz.Tree[Int] 18    val l = for { 19    l1 <- tree.loc.some 20    l2 <- l1.find{_.getLabel == 2} 21    l3 <- l2.modifyTree{modTree(_)}.some 22    l4 <- l3.root.some 23    l5 <- l4.find{_.getLabel == 12} 24    l6 <- l5.delete 25   } yield l6                                      //> l  : Option[scalaz.TreeLoc[Int]] = Some(TreeLoc(<tree>,Stream(<tree>, ?),St 26                                                   //| ream(),Stream((Stream(),1,Stream()), ?))) 27   l.get.toTree.drawTree                           //> res7: String = "1 28                                                   //| | 29                                                   //| +- 11 30                                                   //| | 31                                                   //| `- 2 32                                                   //|    | 33                                                   //|    +- 21 34                                                   //|    | 35                                                   //|    `- 3 36                                                   //|       | 37                                                   //|       `- 31 38                                                   //| "

通过scalaz的Tree和TreeLoc数据结构，以及一整套树形结构游览、操作函数，我们可以方便有效地实现FP风格的不可变树形集合编程。

正文到此结束