转载

跳跃表

在看Redis底层实现的时候,看到一个数据结构“跳跃表”,随手学习了一下。大致结构如下图:

跳跃表

一个头指针数据控制着多个链表,分表用S0 - Sn 表示。有一堆数据节点用于链表节点的数据指针指向。最底层的S0链表内指向的数据节点内的值是有序的。

查找过程:从头指针列表控制的顶层链表开始一次查找,如果找到对应值就返回,否则找到一个不大于(或不小于,根据有序的方向决定)所查值的最大值的节点向下减一层继续查找,直到找到最底层S0也没找到既为空。时间复杂度期望为 O(logn)。

插入过程:先用查找算法找到相应的应该插入的有序链表的底层节点位置,新节点应该插入该节点后。其中关键是这里有一个随机化的跳跃算法,即插入的节点应该向上层链表跳跃几层。执行一个 i=0;while(p < random(1)) { i++;} 跳出循环得到的i即为应该跳跃的层次数。这样把相应层次和低于该层的所有链表内的相应位置加入一个指针指向该新插入节点。时间复杂度期望为 O(logn)。

删除操作根据数据结构特点相应的删除指针和值节点即可。时间复杂度期望为 O(logn)。

空间复杂度: O(n)           (期望)跳跃表高度: O(logn)        (期望)

下边是一个网上的统计P值对各个操作的影响列表:

P 平均操作时间 平均列高 总结点数

(平均值)

(平均值)

(平均值)

2/3 0.0024690 ms 3.004 91233 39.878 41.604 41.566
1/2 0.0020180 ms 1.995 60683 27.807 29.947 29.072
1/e 0.0019870 ms 1.584 47570 27.332 28.238 28.452
1/4 0.0021720 ms 1.330 40478 28.726 29.472 29.664
1/8 0.0026880 ms 1.144 34420 35.147 35.821 36.007

关于跳跃表比二叉树、红黑树等其他有序数据结构的优势看一下redis作者的使用跳跃表理由就了解了:

There are a few reasons:

1) They are not very memory intensive. It's up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.

2) A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.

3) They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

About the Append Only durability & speed, I don't think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hint is big anyway.

About threads: our experience shows that Redis is mostly I/O bound. I'm using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the "Redis Cluster" solution that I plan to develop in the future.

ps:其实也非常合竞赛使用,相比线段树红黑树平衡树等的学习成本、编码等简单方便非常多,调试等也会极大节省时间。推荐竞赛选手使用。

原文  http://blogread.cn/it/article/7940?f=hot1
正文到此结束
Loading...