黄鹏宇
Published 2025-04-11

HashMap

Based on JDK 1.8

I. Overview

1. Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

It is a hash-table-based implementation of the Map interface. The implementation provides all of the optional Map operations (for example clear, put, remove, putAll, ...) and permits null values and the null key.

The figure below shows some classes that implement the Map interface but do not support all of the optional operations.

[Figure: Map implementations that do not support every optional operation]

This class is roughly equivalent to Hashtable, except that HashMap is not thread-safe and permits nulls.

HashMap makes no guarantees about the order of its key-value mappings; in particular, it does not guarantee that the order will stay constant over time.

Resizing (rehashing) redistributes the entries across the buckets, which is one reason the order does not stay constant.
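
A minimal sketch of the null handling and the Hashtable difference mentioned above (class and variable names are just for illustration):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "a null key is allowed");   // OK
        hashMap.put("k", null);                       // a null value is allowed
        System.out.println(hashMap.get(null));        // prints: a null key is allowed

        Map<String, String> hashtable = new Hashtable<>();
        // hashtable.put(null, "x");   // would throw NullPointerException
        // hashtable.put("k", null);   // would throw NullPointerException
    }
}
```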

2. This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.

Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

If the hash function spreads the elements evenly across the buckets, this implementation offers constant-time performance for the basic get and put operations.
Iterating over the collection views (keySet, values, entrySet) takes time proportional to the capacity (the number of buckets) plus the size (the number of key-value mappings). So if iteration performance matters, do not set the initial capacity too high or the load factor too low.
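
A small sketch of why an oversized initial capacity hurts iteration (the capacity 1 << 20 is just an exaggerated example):

```java
import java.util.HashMap;
import java.util.Map;

public class IterationCostDemo {
    public static void main(String[] args) {
        // Only 10 entries, but the table has about 2^20 buckets.
        Map<Integer, Integer> oversized = new HashMap<>(1 << 20);
        for (int i = 0; i < 10; i++) {
            oversized.put(i, i);
        }
        // Iterating a view walks the whole bucket array, so this loop
        // visits 10 entries but scans roughly a million buckets.
        for (Map.Entry<Integer, Integer> e : oversized.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```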

3. An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

Two parameters affect the performance of a HashMap instance: the initial capacity and the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the table is created. The load factor measures how full the table is allowed to get before its capacity is automatically increased. When the number of entries exceeds capacity × load factor, the table is rehashed (its internal structures are rebuilt) so that it ends up with roughly twice as many buckets.
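
For example, with an initial capacity of 16 and the default load factor of 0.75, the numbers work out as in this sketch (the exact resize point follows the JDK 1.8 implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class ThresholdDemo {
    public static void main(String[] args) {
        // threshold = capacity * loadFactor = 16 * 0.75 = 12
        Map<Integer, String> map = new HashMap<>(16);
        for (int i = 1; i <= 13; i++) {
            map.put(i, "v" + i);
        }
        // Inserting the 13th entry pushes the size past the threshold of 12,
        // so the table is rehashed into roughly twice as many buckets (32).
        System.out.println(map.size());   // 13
    }
}
```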

4. As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

As a general rule, the **default load factor (0.75)** offers a good trade-off between time and space costs. A higher load factor reduces the space overhead but increases the lookup cost (which shows up in most HashMap operations, including get and put).

When setting the initial capacity, take into account the expected number of entries and the load factor, so that the number of rehash operations is kept to a minimum.

If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur. Put simply: make the initial capacity large enough and the table never has to grow.
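
A hypothetical sizing helper based on that rule (the method name and the +1 margin are my own; HashMap still rounds the capacity up to the next power of two internally):

```java
import java.util.HashMap;

public final class Maps {
    /** Creates a HashMap large enough to hold expectedEntries without rehashing. */
    static <K, V> HashMap<K, V> withExpectedSize(int expectedEntries) {
        // Choose a capacity strictly greater than expectedEntries / loadFactor.
        int initialCapacity = (int) Math.ceil(expectedEntries / 0.75) + 1;
        return new HashMap<>(initialCapacity);
    }
}
```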

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table.

If a large number of mappings are to be stored in a HashMap, creating it with a sufficiently large initial capacity lets them be stored more efficiently than letting the table grow through repeated automatic rehashing, since every resize has to reallocate and rebuild the internal structures.

In addition, if many keys return the same hashCode() (hash collisions), the performance of any hash table drops sharply. A hash table relies on the keys' hash values being evenly distributed; otherwise it becomes very slow.

In short:

  • Set the initial capacity a bit larger: the table does less churning because it resizes fewer times.
  • Avoid hash collisions: keep the keys' hashCode() values evenly distributed, or performance collapses (a sketch of this worst case follows below).
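
A hypothetical key type showing that worst case: every instance returns the same hashCode(), so all entries pile into a single bucket and lookups are no longer constant time.

```java
// Illustrative only: a key whose hashCode() always collides.
final class ConstantHashKey {
    final int id;

    ConstantHashKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey && ((ConstantHashKey) o).id == this.id;
    }

    @Override
    public int hashCode() {
        return 1;   // every key hashes to the same bucket
    }
}
```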

To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

To lessen the impact of collisions, when keys implement the Comparable interface, this class may use the keys' comparison order to help break ties among colliding keys.
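
A sketch of what "breaking ties" means here: these keys all collide, but because they implement Comparable, a JDK 1.8 HashMap can fall back on compareTo() to organize the colliding keys once a bin has been converted to a tree, keeping lookups closer to O(log n) than O(n). The class is purely illustrative.

```java
// Illustrative only: colliding keys that are at least Comparable.
final class CollidingComparableKey implements Comparable<CollidingComparableKey> {
    final int id;

    CollidingComparableKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingComparableKey && ((CollidingComparableKey) o).id == this.id;
    }

    @Override
    public int hashCode() {
        return 1;   // all keys collide on purpose
    }

    @Override
    public int compareTo(CollidingComparableKey other) {
        return Integer.compare(this.id, other.id);   // comparison order breaks the tie
    }
}
```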

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.)

Note that this implementation is not synchronized, i.e. not thread-safe. If multiple threads access the map concurrently and at least one of them modifies it structurally, access must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that is already present is not a structural modification.)

This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

Map m = Collections.synchronizedMap(new HashMap(...));

This is usually done by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be wrapped with Collections.synchronizedMap, preferably at creation time to prevent accidental unsynchronized access:

Map m = Collections.synchronizedMap(new HashMap(...));
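
A slightly fuller sketch of the wrapping pattern; note that iterating over the synchronized wrapper still requires manually locking the map, as the Collections.synchronizedMap documentation also points out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapDemo {
    public static void main(String[] args) {
        // Wrap at creation time so no unsynchronized reference leaks out.
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());
        m.put("a", 1);
        m.put("b", 2);

        // Single operations (get/put/remove) are synchronized by the wrapper,
        // but iteration must hold the lock on the wrapper explicitly.
        synchronized (m) {
            for (Map.Entry<String, Integer> e : m.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}
```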

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way other than through the iterator's own remove method, the iterator throws a ConcurrentModificationException. Faced with concurrent modification, the iterator fails quickly and cleanly instead of risking arbitrary, non-deterministic behavior at some undetermined time in the future.
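
A minimal demonstration (key names are arbitrary): a structural modification made directly on the map during iteration makes the next call to next() throw, while removing through the iterator itself is fine.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // Not safe: modifying the map directly while iterating over a view.
        try {
            for (String key : map.keySet()) {
                map.put("c", 3);   // structural modification outside the iterator
            }
        } catch (ConcurrentModificationException expected) {
            System.out.println("fail-fast: " + expected);
        }

        // Safe: structural removal goes through the iterator itself.
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().equals("a")) {
                it.remove();
            }
        }
        System.out.println(map);   // "a" removed, no exception
    }
}
```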

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

Note that the fail-fast behavior of an iterator cannot be guaranteed: generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis, so it would be wrong to write a program whose correctness depends on this exception; the fail-fast behavior should be used only to detect bugs.

