转载

数据结构之哈希表 Java中的经典实现HashMap分析

HashMap是最常用的Map族中的一个，Java Collection Framework 重要成员之一，HashMap 在项目中最常用到，既然HashMap如此重要，更应该了解HashMap的数据结构、实现原理、源码分析以及p如何实现快速的存取和扩容。

本文关于HashMap的源码是基于 Android 7.0的源码，不同版本的jdk 源码也会存在一些差异，这些都不重要，重点在与我们主要了解HashMap的数据结构以及原理实现。

哈希表知识回顾

在上一篇文章中讲解了Hash表的基础知识

为什么会出现哈希表？

已知顺序表（数组），查找容易，插入、删除困难消耗性能。然而链表虽然解决了顺序表插入删除的问题，但是链表的查找却降低了性能。

结合两者的优缺点，于是便出现了，查找容易同时插入删除容易的表Hash表。

哈希表有什么特点？

优点：查找、插入、删除只需要接近敞亮的时间O（1），实际上就是几条及其指令，在速度和易用性上是非常棒的。

缺点：基于数组，数组创建后都是固定大小的，难于拓展，当hash表被填满后，性能下降很严重，所以要求预先清楚表中存储多少数据。

HashMap 概述

HashMap是基于Map接口实现的，以Key-Value的形式存在，存储了 static class HashMapEntry<K,V> implements Map.Entry<K,V> HashMapEntry 对象，同时HashMap会根据hash算法计算Key-Vaule的存储位置实现快速存取，HashMap是不支持并发操作的。

HashMap的数据结构

数据结构之哈希表 Java中的经典实现HashMap分析

每一个小块中都嵌套Node的实例

static class HashMapEntry<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
    }

从上图可以看出，HashMap里有一个数组，并且数组中每一个元素都是一个单向链表。

HashMap hash冲突解决方案：拉链法。HashMap就是一个链表数组。

通过上图，可以推算出HashMap大致的实现原理：

HashMap 插入：通过hash函数 f(key)函数中有一个hash算法来得到一个hash值，hash值的范围一定在数组范围之内，通过hash值找到数组对应的下标，然后在该数组的单向链表插入表头(为什么插入表头呢，如果插入链表尾端就需要循环链表，很明显会造成性能上的消耗，如果插入表头，就直接插入操作便可以了)。

HashMap 查找：通过hash函数 f(key)函数中有一个hash算法来得到一个hash值，一般hash值就是数组的下标，但是还是需要判断看hash值是否超过了数组的长度，然后遍历该数组处的链表，直到 HashMapEntry.key == key

HashMap 删除：同样通过key得到一个hash值，找到数组的下标，然后遍历该数组处的链表，直到 HashMapEntry.key == key，然后就是链表的删除操作。

[图片上传失败…(image-ebe763-1532773587503)]

HashMap源码分析

AbstractMap 抽象类提供了 Map 接口的骨干实现，以最大限度地减少实现Map接口所需的工作

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

HashMap的属性

/**
   * 
   * 当前数组的容量，默认是4个数组，必须是2的n次方，扩容后的大小为当前的两倍
   */
  static final int DEFAULT_INITIAL_CAPACITY = 4;

  /**
   * 在构造函数中没有指定时使用的默认负载因子。
   */
  static final float DEFAULT_LOAD_FACTOR = 0.75f;

  /**
   * 一个空的数组
   */
  static final HashMapEntry<?,?>[] EMPTY_TABLE = {};

  /**
   * HashMap的数组，该数组根据需要调整大小，大小始终是2的N次方
   */
  transient HashMapEntry<K,V>[] table = (HashMapEntry<K,V>[]) EMPTY_TABLE;

  /**
   * HashMap的大小
   */
  transient int size;

  /**
   * 扩容的阈值 (capacity * load factor).
   * @serial
   */
  // If table == EMPTY_TABLE then this is the initial capacity at which the
  // table will be created when inflated.
  int threshold;

  /**
   * 负载因子，默认为 0.75。
   *
   */
  // Android-Note: We always use a load factor of 0.75 and ignore any explicitly
  // selected values.
  final float loadFactor = DEFAULT_LOAD_FACTOR;
  
  /**
  * 最大容量
  */
  static final int MAXIMUM_CAPACITY = 1 << 30;

HashMap 构造函数

HashMap(int initialCapacity, float loadFactor) 指定初始容量和负载因子

HashMap(int initialCapacity) 指定初始容量

public HashMap(int initialCapacity, float loadFactor) {
       //初始容量不能小于 0 
       if (initialCapacity < 0)
           throw new IllegalArgumentException("Illegal initial capacity: " +
                                              initialCapacity);
       //初始容量不能超过 2^30
       if (initialCapacity > MAXIMUM_CAPACITY) {
           initialCapacity = MAXIMUM_CAPACITY;
       } else if (initialCapacity < DEFAULT_INITIAL_CAPACITY) {
           initialCapacity = DEFAULT_INITIAL_CAPACITY;
       }
       //负载因子不能小于0
       if (loadFactor <= 0 || Float.isNaN(loadFactor))
           throw new IllegalArgumentException("Illegal load factor: " +
                                              loadFactor);
                                              
       //--------注意这一段是 在Android的源码中--------------- 
       //Android中的源码并没有改变负载因子，负载因子始终为0.75
       // Android-Note: We always use the default load factor of 0.75f.

       // This might appear wrong but it's just awkward design. We always call
       // inflateTable() when table == EMPTY_TABLE. That method will take "threshold"
       // to mean "capacity" and then replace it with the real threshold (i.e, multiplied with
       // the load factor). 
       //阈值便是初始的容量
       threshold = initialCapacity;
       //----------------------------------------------------
       
       //--------注意这一段是在Java的源码中--------------------------
       // HashMap 的容量必须是2的幂次方，超过 initialCapacity 的最小 2^n 
       int capacity = 1;
       while (capacity < initialCapacity)
           capacity <<= 1;   

       //负载因子
       this.loadFactor = loadFactor;

       //设置HashMap的容量极限，当HashMap的容量达到该极限时就会进行自动扩容操作
       threshold = (int)(capacity * loadFactor);

       // HashMap的底层实现仍是数组，只是数组的每一项都是一条链
       table = new Entry[capacity];
       //--------------------------------------------------------
       
       init();
   }

   
   public HashMap(int initialCapacity) {
       //指定容量大小
       this(initialCapacity, DEFAULT_LOAD_FACTOR);
   }
   
   public HashMap() {
       //使用默认值
       this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
   }
   
    public HashMap(Map<? extends K, ? extends V> m) {
       // 接收一个map m的大小/负载因子+1 如果小于 默认的容量大小则 使用默认的容量
       this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                     DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
       inflateTable(threshold);

       putAllForCreate(m);
   }
   
   init(){
       
   }

HashMap put过程分析，跟随源码走一遍

public V put(K key, V value) {
        //如果数组部分是一个空表，也可以说插入第一个元素的时候，需要先初始化数组大小
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        //如果key == null,最终会放到table[0]中
        if (key == null)
            return putForNullKey(value);
        // 计算key的hash值    
        int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);
        //找到数组的下标，一般hash为数组的下标，同时要防止hash > table.length ,这时取table.length
        int i = indexFor(hash, table.length);
        
        //遍历一下对应下标处数组的链表，查看是否重复key的存在如果有则直接覆盖，到此put结束
        for (HashMapEntry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        //如果没有重复的key 则添加到链表中
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

上述代码中可以看出，源码的逻辑非常严谨，首先判断table是否是一个空表，如果是空表，初始化数组的大小，然后在判断key是否为null，如果为null，就放到table[0]对应的链表中，如果key不为null，就根据key计算hash值，然后得到数组的下标，再遍历对应下标处数组存储的链表，看是否存在重复的key，如果存在就替换掉，如果不存在就添加到链表的头节点。这就是一个完整的put过程。

标注：不同版本的Java SDK 可能计算的hash值的算法不同

//Java jdk 8
static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
    
    
 //android 源码中计算hash值的算法    
 int hash = sun.misc.Hashing.singleWordWangJenkinsHash(key);

inflateTable 数组的初始化分析，我们看一下inflateTable的实现，代码如下：

private void inflateTable(int toSize) {
       // 保证数组的大小一定是 2^n
       int capacity = roundUpToPowerOf2(toSize);

       //计算扩容阈值 
       float thresholdFloat = capacity * loadFactor;
       if (thresholdFloat > MAXIMUM_CAPACITY + 1) {
           thresholdFloat = MAXIMUM_CAPACITY + 1;
       }
        threshold = (int) thresholdFloat;

       //初始化数组       
       table = new HashMapEntry[capacity];
   }
   
   //有兴趣可以了解这个方法的实现
    private static int roundUpToPowerOf2(int number) {
       // assert number >= 0 : "number must be non-negative";
       int rounded = number >= MAXIMUM_CAPACITY
               ? MAXIMUM_CAPACITY
               : (rounded = Integer.highestOneBit(number)) != 0
                   ? (Integer.bitCount(number) > 1) ? rounded << 1 : rounded
                   : 1;

       return rounded;
   }

indexFor 计算数组的具体位置

当length为2的n次方时，h&(length - 1)就相当于对length取模，而且速度比直接取模要快得多，这是HashMap在速度上的一个优化

//使用 key 的 hash 值对数组长度进行取模
static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length-1);
    }

hash()方法和indexFor()方法的作用只有一个：保证元素均匀分布到table的数组中以便充分利用空间。

putForNullKey key == null 时的处理

private V putForNullKey(V value) {
        //遍历数组的第一个下标的链表，查找是否存在key == null，如果存在替换之前的数据，至此结束
        for (HashMapEntry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        //如果不存在添加到数组的第一个下标的链表中 
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

addEntry 添加节点到链表

有一点值得注意，HashMap 永远都是在链表的表头添加新元素。

void addEntry(int hash, K key, V value, int bucketIndex) {
     //当前HashMap的大小已经达到了阈值 并且要插入的数组的位置已经有了元素，那么需要扩容
     if ((size >= threshold) && (null != table[bucketIndex])) {
         //扩容之前的两倍
         resize(2 * table.length);
         //扩容后重新计算hash值
         hash = (null != key) ? sun.misc.Hashing.singleWordWangJenkinsHash(key) : 0;
         //重新计算下标
         bucketIndex = indexFor(hash, table.length);
     }
     //这个方法很简单 将新值放到链表的表头
     createEntry(hash, key, value, bucketIndex);
 }

 
 void createEntry(int hash, K key, V value, int bucketIndex) {
     HashMapEntry<K,V> e = table[bucketIndex];// 之前的链表
     table[bucketIndex] = new HashMapEntry<>(hash, key, value, e);//将新的节点作为表头
     size++;
 }

数组扩容resize

如果插入新值是HashMap的size 达到了阈值（length * 负载因子），并且要插入的数组上已经有了元素，那么就需要进行扩容，扩容后数组的大小为原来的两倍。这也是为了保证HashMap的效率，尽可能小的影响HashMap的存取速度

void resize(int newCapacity) {
        HashMapEntry[] oldTable = table;//旧的数组
        int oldCapacity = oldTable.length;//旧的数组的大小
        //如果旧的数组的大小>最大容量，那么将阈值设置为最大整数
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        //新的数组
        HashMapEntry[] newTable = new HashMapEntry[newCapacity];
        //将原来数组的值迁移到新的数组中
        transfer(newTable);
        table = newTable;
        //重新计算阈值
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
    
    //将旧数组的值迁移到新数组上
    void transfer(HashMapEntry[] newTable) {
        int newCapacity = newTable.length;
        //遍历旧数组，将旧数组中的每个链表添加到新的数组中
        for (HashMapEntry<K,V> e : table) {
            while(null != e) {
                //将值进行迁移
                HashMapEntry<K,V> next = e.next;
                //数组的下标
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

扩容就是用新的大数组来代替原来的小数组，并将原来数组的值迁移到新数组中。

get过程分析，分析完put过程那么get很简单

原理：通过key，哈希函数计算得到hash值，然后通过hash值得到数组的下标，然后遍历该数组处的链表。

 public V get(Object key) {
        if (key == null)
            // 当key等于null时的处理，遍历table[0]的链表，找到key==null的节点。
            return getForNullKey();
        //通过key获取    
        HashMapEntry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

//这个方法就是遍历 table[0] 中的链表。
private V getForNullKey() {
        if (size == 0) {
            return null;
        }
        for (HashMapEntry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

getEntry获取节点

final Entry<K,V> getEntry(Object key) {
      //size==0 说明HashMap大小为0，返回null
      if (size == 0) {
          return null;
      }
      //如果key == null hash =0;否则就计算hash值
      int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
      //遍历对应数组下标的单向链表
      for (HashMapEntry<K,V> e = table[indexFor(hash, table.length)];
           e != null;
           e = e.next) {
          Object k;
          if (e.hash == hash &&
              ((k = e.key) == key || (key != null && key.equals(k))))
              return e;
      }
      return null;
  }

remove过程分析

实现过程同样也很简单，都需要得到hash值，可见hash值在哈希表中是非常重要的角色，通过hash值找到对应下标的链表，然后做链表删除操作。

public V remove(Object key) {
      Entry<K,V> e = removeEntryForKey(key);
      return (e == null ? null : e.getValue());
  }

removeEntryForKey 删除操作

final Entry<K,V> removeEntryForKey(Object key) {
       //同样需要判断HashMap大小是否为0
       if (size == 0) {
           return null;
       }
       //得到hash值
       int hash = (key == null) ? 0 : sun.misc.Hashing.singleWordWangJenkinsHash(key);
       //得到数组下标
       int i = indexFor(hash, table.length);
       //得到对应数组下标的链表
       HashMapEntry<K,V> prev = table[i];
       HashMapEntry<K,V> e = prev;
       //遍历链表找到对应的key
       while (e != null) {
           HashMapEntry<K,V> next = e.next;
           Object k;
           //链表中存在key，则删除这个节点
           if (e.hash == hash &&
               ((k = e.key) == key || (key != null && key.equals(k)))) {
               modCount++;
               size--;//HashMap大小减一
              
               if (prev == e)
                   table[i] = next; //删除头节点
               else
                   prev.next = next;//删除节点
               e.recordRemoval(this);
               return e;
           }
           prev = e;
           e = next;
       }

       return e;
   }