转载

java7 HashMap 源码详解

Entry<K,V>[] table

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        //存储下一个节点地址，所以hashMap是单项链表
        Entry<K,V> next;
        //注意这个不是真的k的hashcode，是经过hashmap扰动之后的hashcode
        int hash;
    }
复制代码

问题，既然有数组，那么数组的大小怎么设置？

HashMap 容器初始化怎么设置

//如果对hashMap初始化传入这两个参数，那么数组大小多少呢，是传入多少就多少嘛？
  HashMap map = new HashMap(2,0.75);
  //下面是实例化的源码
      public HashMap(int initialCapacity, float loadFactor) {
       //加载因子
        this.loadFactor = loadFactor;
       //阈值    
        threshold = initialCapacity;
        init();
    }
 //所以我们只能设置，hashmap的加载因子和阈值，这两个跟数组大小什么关系
 //阈值 = 数组大小*加载因子
 //每次put的时候如果数组元素个数>=阈值就要进行扩容
 //现在问题是我们传入这个阈值，对数组大小有什么影响？
 //看下put的源码
     public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
        //会拿传入阈值去初始化数组大小，得到大于等于阈值的二的幂次方
            inflateTable(threshold);
        }
    }    
复制代码

put->inflateTable -> roundUpToPowerOf2

roundUpToPowerOf2 是设置容器初始值的关键
// Find a power of 2 >= toSize 在 roundUpToPowerOf2 的注释有这句话，说是容器初始值要2的幂次方大于 tosize，后面再来讨论为什么要这么做，先看下怎么做到的

private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }
复制代码

上面那段代码最关键的是 Integer.highestOneBit 写个测试方法简单测下这个方法的作用

public static void main(String[] args) {
        System.out.println(Integer.highestOneBit(15));//8
        System.out.println(Integer.highestOneBit(16));//16
        System.out.println(Integer.highestOneBit(20));//16
    }
复制代码

测试可知 Integer.highestOneBit 获得的结果 1.二的幂次方 2.大于等于传入值。点进入看下源码实现

public static int highestOneBit(int i) {
        // HD, Figure 3-1
        i |= (i >>  1);
        i |= (i >>  2);
        i |= (i >>  4);
        i |= (i >>  8);
        i |= (i >> 16);
        return i - (i >>> 1);
    }
复制代码

按照 highestOneBit 的移位和或运算举个列子传入值为17

17    0001 0001
>>    0000 1000  右移1位结果
|     0001 1001  或运算结果

>>    0000 0110  右移2位结果
|     0001 1111  或运算结果

>>    0000 0001  右移4位结果
|     0001 1111 或运算
........ 最后会发现，结果都是 0001 1111 也就是说把  0001 0001 的低位全部改成 1 

i - (i >>> 1)
i >>> 1               0000 1111
i - (i >>> 1)        0001 1111-0000 1111=0001 0000

复制代码

总结： highestOneBit 1.右移动+或运算的目的是将所有位改为1 这个称为全1 2. i - (i >>> 1) 首先 i >>> 1 的目的是获取比全1少一位，然后全1减去少一位的就是最高位是1，其他都是0，这样就可以拿到比传入值小的二的幂次方。

hashmap 是怎么获取比传入值大的二的幂次方： Integer.highestOneBit((number - 1) << 1)

加入传入的number = 15 要获得的是16
Integer.highestOneBit((number - 1) << 1)
number - 1  15-1 14
<< 1        14-> 28 
Integer.highestOneBit（28）= 16
复制代码

总结： Integer.highestOneBit((number - 1) 1.<< 1 ) 1.<< 1 左移一位比较好理解，因为 Integer.highestOneBit 获取的是小于等于传入值的二的幂次方，所以左移一位可以传入值变大二的幂次方，结果也会大于等于传入值 2. number-1 比较不好理解，这个主要是避免特殊情况，加入 number =16，如果没有减一 16<<1=32, Integer.highestOneBit(32) = 32,然而结果是要获取16 3.以上就是获取容器大小为二得幂次方得方式，挺复杂也挺牛逼得。

为什么数组长度是二的幂次方

跟计算hash槽位有关，往下看

java7 HashMap 是怎么计算key的哈希槽位的

前提需要了解 `HashMap` 数据结构：数组+链表

indexFor(hash, table.length)

indexFor 通过 hash 和数组长度计算

static int indexFor(int h, int length) {
        return h & (length-1);
    }
假如 h：0010 1011   length：16
length-1            0001 0000 -> 0000 1111 
h & (length-1)                   0010 1011
                                 0000 1011   转为十进制为11在0-15
复制代码

总结：1.要知道 indexFor 的目的是什么，是要获取0-（length-1）的数字 2.lengthe-1有什么含义，这就涉及到为什么length要是二的幂次方，从二进制的角度看，二的幂次方最高位是1其他都是0，减一这样就获得最高位是0，其他位都是1，hashcode & 上除了最高位的都是1之后结果的范围只能在0-（length-1），所以从这里就可以知道为什么上面获取数组的长度为二的幂次方。3.但是这个算法又有个问题，就是 hashCode & (length-1) 这个操作 hashCode 只有低位跟（length—1）发生 & 操作，参与到槽位的分配，高位参与不到，可能导致槽位分配不均匀，所以，需要进行 hash 扰动。

存入之得hash值是怎么计算得，如何更加均匀

问题：是直接通过 key.hashCode 嘛，如果是这样得话，可能存在一个问题，我们对象重新 hashcode 方法，但是大部分值都一样，这样就做不到 hash 均匀分撒，所以 HashMap 对 hashCode 之后得值进行了 hash 扰动，让其更加散列，看下 HashMap 具体是怎么实现得。

final int hash(Object k) {
//获得hash种子，默认为0
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
//0 异或 key的hashcode ，结果不变
        h ^= k.hashCode();
        //h >>> 的过程就是让高位参与hash槽计算
        //异或操作扰动hash
        h ^= (h >>>  20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
      
复制代码

总结： hash 这个方法大概就是通过左移，异或操作让原本的 hashCode 能够更加散列，让原本的 hashCode 高位能参与到hash槽位的计算

数组长度计算，槽位计算，hash取值这三个是HashMap的重点也是难点，同样也是互相联系的

链表是怎么插入的，经典面试题，头插入and尾插法

HashMap 遇到 hash 冲突怎么解决，用链表，用链表就涉及到一个问题，怎么插入链表比较合适

头插法

场景：往第2个槽位put数据
//头插法一行代码就可以搞定
//1.新建一个节点，next节点指向原本链表的头节点 next = table[1]
//2.头节点  table[1] = 新建节点
table[1] = new Entry（hash，key，value，next = table[1]）;

复制代码

尾插法

//获取头节点
//需要遍历节点，将最后一位节点的next指向新节点
for(Entry entry = table[1];entry = entry.next){
   if(entry.next == null){
       entry.next =  new Entry（hash，key，value，next = null）;
   }
}
复制代码

比较：很明显头插法比尾插法简单，效率更高，大概是因为这个原因， HahsMap 采用了 头插法 ，但是头插法也带来了多线程扩容问题:循环链表，导致后面如果有 get 或者 put 需要遍历链表的时候就会出现死循环。

put方法

public V put(K key, V value) {
//刚开始没有设置数组大小，就会进入`inflateTable`方法促使话数组大小
//inflateTable调用roundUpToPowerOf2方法获取到数组大小为二的幂次方
//threshold 默认16
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
//对 null 的设置，所以hashmap是可以传入null作为key的
//key = nulll 放到数组的第一位
            return putForNullKey(value);
//根据key获得到经过扰动的hashcode            
        int hash = hash(key);
//根据hashcode和数组长度获取hash槽位        
        int i = indexFor(hash, table.length);
//遍历所在的hash槽位上的链表，如果存在相同的key就替代   
//什么才算是相同的key，这就涉及到为什么重写hashCode之后还要重新equals
//通过比较hahsCode是否相等，之后还需要比较==或者通过equals比较是否相同
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
//添加元素        
        addEntry(hash, key, value, i);
        return null;
    }
复制代码

addEntry

void addEntry(int hash, K key, V value, int bucketIndex) {
    //数组扩容条件  (size >= threshold) && (null != table[bucketIndex])
    //数组元素大小大于等于阈值同时需要put的头节点不为空
        if ( (size >= threshold) && (null != table[bucketIndex])
        //) {
        //扩容大小是原来数组大小的两倍
            resize(2 * table.length);
        //扩容之后需要重新计算hash值    
            hash = (null != key) ? hash(key) : 0;
        //扩容之后重新计算槽位    
            bucketIndex = indexFor(hash, table.length);
        }
   //新建新的节点
        createEntry(hash, key, value, bucketIndex);
    }
复制代码

createEntry

void createEntry(int hash, K key, V value, int bucketIndex) {
    //头插法插入数据
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }
复制代码

HashMap 扩容，以及多线程下扩容的问题

addEntry

void addEntry(int hash, K key, V value, int bucketIndex) {
    //在添加元素的时候会进行判断是否扩容
        if ((size >= threshold) && (null != table[bucketIndex])) {
        //扩容
            resize(2 * table.length);
        //todo 不懂为什么还要重新计算hashCode    
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }
     //...
     }
复制代码

resize

void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
//新建一个数组
        Entry[] newTable = new Entry[newCapacity];
//进行数据迁移        
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
//新数组赋值给老数组        
        table = newTable;
//扩容之后重新设置阈值        
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
复制代码

transfer

void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        //两层for循环
        //第一层针对数组
        //第二层针对链表
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                //重新计算操作
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }
复制代码

**总结：**1.扩容数组长度为原来的两倍 2.扩容是新建一个新的数组，hashmap直接引用新的数组对象 3.扩容之后需要重新设置阈值 4.扩容多线程会产生循环链表

扩容之后迁移数据槽位分配有什么规律嘛

假设1：
hash： 0010 0011  
原数组长度16：0001 0000
原槽位： 0010 0011 & （0001 0000 - 0000 0001）
       0010 0011 & 0000 1111 = 0000 0011 (十进制为3)
现数组长度2*16=32： 0010 0000      
现槽位： 0010 0011 & （0010 0000 - 0000 0001）
        0010 0011 & 0001 1111 = 0000 0011 (十进制为3)
结果：发现槽位还是原来那个位置
假设2：
把hash从0010 0011  改为 0011 0011 
现槽位： 0010 0011 & （0010 0000 - 0000 0001）
        0011 0011 & 0001 1111 = 0001 0011 (十进制为3+16)
        
总结：数组扩容之后，迁移数据，槽位要么是在原来的地方，要么是原来的槽位+原来数组长度

复制代码

原文 https://juejin.im/post/5f1567bff265da22c7574fe4

正文到此结束