转载

通过源码了解JDK8中的LongAdder为何性能优于AtomicLong

Doug lea 在jdk8中给我们带来了比AtomicLong性能更优的LongAdder,AtomicLong的实现方式是内部有个value 变量，当多线程并发自增，自减时，均通过CAS 指令从机器指令级别操作保证并发的原子性。AtomicLong这种通过乐观锁的方式相对于其他机制已经是很高效了，难不成LongAdder使用了更高效的指令？

猜的再多也不如去看一眼源码。LongAdder的继承树如下：

继承自Striped64，这个类包装了一些很重要的内部类和操作。

LongAdder的方法：

通过其中的 add 方法源码来了解内部原理，其他API也是差不多的。

add 方法：

/**
     * Adds the given value.
     *
     * @param x the value to add
     */
    publicvoidadd(longx){
        Cell[] as; long b, v; int m; Cell a;
        if ((as = cells) != null || !casBase(b = base, b + x)) {
            boolean uncontended = true;
            if (as == null || (m = as.length - 1) < 0 ||
                (a = as[getProbe() & m]) == null ||
                !(uncontended = a.cas(v = a.value, v + x)))
                longAccumulate(x, null, uncontended);
        }
    }

Cell 是在 Striped64 中的一个内部类， Cell 代表了一个单元，从字面上理解是这样的。

static final classCell{
        volatile long value;
        Cell(long x) { value = x; }
        finalbooleancas(longcmp,longval){
            return UNSAFE.compareAndSwapLong(this, valueOffset, cmp, val);
        }

        // Unsafe mechanics
        private static final sun.misc.Unsafe UNSAFE;
        private static final long valueOffset;
        static {
            try {
                UNSAFE = sun.misc.Unsafe.getUnsafe();
                Class<?> ak = Cell.class;
                valueOffset = UNSAFE.objectFieldOffset
                    (ak.getDeclaredField("value"));
            } catch (Exception e) {
                throw new Error(e);
            }
        }
    }

Cell内部有一个非常重要的value变量，并且通过CAS来更新其值。

首次调用 add 方法时， (as = cells) != null 为true,看下 casBase 方法，其定义在 Striped64 中。

finalbooleancasBase(longcmp,longval){
        return UNSAFE.compareAndSwapLong(this, BASE, cmp, val);
    }

BASE 是变量 base 在类中的偏移量, base 暂时认为就相当于”实际”存储的值。在这里如果cas更新成功了，则直接就returnl了。如果cas更新失败，说明有线程在竞争。此时AtomicLong则会死循环进行cas操作，而LongAdder使用了一种非常巧妙的思想。因为 as == null ,所以来看下接下去调用的 longAccumulate 方法：

/**
     * Handles cases of updates involving initialization, resizing,
     * creating new Cells, and/or contention. See above for
     * explanation. This method suffers the usual non-modularity
     * problems of optimistic retry code, relying on rechecked sets of
     * reads.
     *
     * @param x the value
     * @param fn the update function, or null for add (this convention
     * avoids the need for an extra field or function in LongAdder).
     * @param wasUncontended false if CAS failed before call
     */
    finalvoidlongAccumulate(longx, LongBinaryOperator fn,
                              boolean wasUncontended) {
        int h;
        if ((h = getProbe()) == 0) {
            ThreadLocalRandom.current(); // force initialization
            h = getProbe();
            wasUncontended = true;
        }
        boolean collide = false;                // True if last slot nonempty
        for (;;) {
            Cell[] as; Cell a; int n; long v;
            if ((as = cells) != null && (n = as.length) > 0) {
            		//h索引到的cell为null,则生成cell对象并设置
                if ((a = as[(n - 1) & h]) == null) {
                    if (cellsBusy == 0) {       // Try to attach new Cell
                        Cell r = new Cell(x);   // Optimistically create
                        if (cellsBusy == 0 && casCellsBusy()) {
                            boolean created = false;
                            try {               // Recheck under lock
                                Cell[] rs; int m, j;
                                if ((rs = cells) != null &&
                                    (m = rs.length) > 0 &&
                                    rs[j = (m - 1) & h] == null) {
                                    rs[j] = r;
                                    created = true;
                                }
                            } finally {
                                cellsBusy = 0;
                            }
                            if (created)
                                break;
                            continue;           // Slot is now non-empty
                        }
                    }
                    collide = false;
                }
                else if (!wasUncontended)       // CAS already known to fail
                    wasUncontended = true;      // Continue after rehash
                //a为索引到的cell，如果cas更新成功则结束循环
                else if (a.cas(v = a.value, ((fn == null) ? v + x :
                                             fn.applyAsLong(v, x))))
                    break;
                else if (n >= NCPU || cells != as)
                    collide = false;            // At max size or stale
                else if (!collide)
                    collide = true;
                //扩容，size变成之前的2倍
                else if (cellsBusy == 0 && casCellsBusy()) {
                    try {
                        if (cells == as) {      // Expand table unless stale
                            Cell[] rs = new Cell[n << 1];
                            for (int i = 0; i < n; ++i)
                                rs[i] = as[i];
                            cells = rs;
                        }
                    } finally {
                        cellsBusy = 0;
                    }
                    collide = false;
                    continue;                   // Retry with expanded table
                }
                h = advanceProbe(h);
            }
            //初始化Cell数组
            else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
                boolean init = false;
                try {                           // Initialize table
                    if (cells == as) {
                        Cell[] rs = new Cell[2];
                        rs[h & 1] = new Cell(x);
                        cells = rs;
                        init = true;
                    }
                } finally {
                    cellsBusy = 0;
                }
                if (init)
                    break;
            }
            else if (casBase(v = base, ((fn == null) ? v + x :
                                        fn.applyAsLong(v, x))))
                break;                          // Fall back on using base
        }
    }

局部变量h是和线程绑定的非0值，h用来定位到 Cell 数组的某一项。

LongAdder给了我们一个非常容易想到的解决方案：减少并发，将单一value的更新压力分担到多个value中去，降低单个value的 “热度”，分段更新！唯一会制约AtomicLong高效的原因是高并发，高并发意味着CAS的失败几率更高，重试次数更多，越多线程重试，CAS失败几率又越高，变成恶性循环，AtomicLong效率降低。 Cell 数组中的每一项就是一个段，当我们需要得到值时，将所有cell中的数据和base相加就可以了。

/**
     * Returns the current sum.  The returned value is <em>NOT</em> an
     * atomic snapshot; invocation in the absence of concurrent
     * updates returns an accurate result, but concurrent updates that
     * occur while the sum is being calculated might not be
     * incorporated.
     *
     * @return the sum
     */
    publiclongsum(){
        Cell[] as = cells; Cell a;
        long sum = base;
        if (as != null) {
            for (int i = 0; i < as.length; ++i) {
                if ((a = as[i]) != null)
                    sum += a.value;
            }
        }
        return sum;
    }

理解了LongAdder的设计思想， longAccumulate 方法应该也很容易看懂。LongAdder在第一次cas失败的时候，并没有选择死循环，而是通过 以空间换时间 的方式提高了性能。低并发时LongAdder和AtomicLong性能差不多，高并发时LongAdder更高效！

因本人水平有限，若文章内容存在问题，恳请指出。允许转载，转载请在正文明显处注明原站地址以及原文地址，谢谢！

原文 http://vinoit.me/2017/02/07/jdk8-LongAdder-sourceCode/

正文到此结束