Speeding up the creation of ConcurrentReferenceHashMap

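Greetings. In this article I will show how to speed up the creation of org.springframework.util.ConcurrentReferenceHashMap with little effort.

Interested in better performance? Welcome!

Reconnaissance

Naturally, we start by measuring and trying to understand what exactly we are going to improve. For this we use JMH 1.21, JDK 8 and JDK 11, and async-profiler.

To find out how much it costs to create an empty map, let's set up a simple experiment: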

    @Benchmark
    public Object original() {
        return new ConcurrentReferenceHashMap();
    }

The profile looks like this:

    55.21%  2429743  o.s.u.ConcurrentReferenceHashMap.calculateShift
    20.30%   891404  o.s.u.ConcurrentReferenceHashMap$Segment.<init>
     8.79%   387198  o.s.u.ConcurrentReferenceHashMap.<init>
     3.35%   147651  java.util.concurrent.locks.ReentrantLock.<init>
     2.34%   102804  java.lang.ref.ReferenceQueue.<init>
     1.61%    70748  o.s.u.ConcurrentReferenceHashMap.createReferenceManager
     1.53%    67265  o.s.u.ConcurrentReferenceHashMap$Segment.createReferenceArray
     0.78%    34493  java.lang.ref.ReferenceQueue$Lock.<init>
     0.76%    33546  o.s.u.ConcurrentReferenceHashMap$ReferenceManager.<init>
     0.36%    15948  o.s.u.Assert.isTrue

The direction is clear, so we can proceed.

Math

So, most of the time is spent in the calculateShift method. Here it is:

    protected static int calculateShift(int minimumValue, int maximumValue) {
        int shift = 0;
        int value = 1;
        while (value < minimumValue && value < maximumValue) {
            value <<= 1;
            shift++;
        }
        return shift;
    }
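To get a feel for what this loop returns, here is a minimal standalone copy of it with a few sample inputs. The class name CalculateShiftDemo and the upper bound of 1 << 16 are assumptions made for illustration only; the method body is the one shown above.

```java
// Standalone copy of the loop above, for experimentation only.
// CalculateShiftDemo is a hypothetical name, not part of Spring.
final class CalculateShiftDemo {

    // Returns the smallest shift such that (1 << shift) >= minimumValue,
    // capped so that (1 << shift) never exceeds maximumValue
    // (when maximumValue is itself a power of two).
    static int calculateShift(int minimumValue, int maximumValue) {
        int shift = 0;
        int value = 1;
        while (value < minimumValue && value < maximumValue) {
            value <<= 1;
            shift++;
        }
        return shift;
    }

    public static void main(String[] args) {
        int cap = 1 << 16; // illustrative upper bound (assumption)
        for (int min : new int[] {1, 2, 16, 17, 100_000}) {
            int shift = calculateShift(min, cap);
            System.out.println(min + " -> shift " + shift + ", size " + (1 << shift));
        }
    }
}
```

The number of iterations grows with the logarithm of the requested value, so a single call is cheap; the cost in the profile comes from how often the method is called.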

It is hard to come up with anything new here, so let's look at where the method is used:

    public ConcurrentReferenceHashMap(/*...*/ int concurrencyLevel, /*...*/) {
        //...
        this.shift = calculateShift(concurrencyLevel, MAXIMUM_CONCURRENCY_LEVEL);
        //...
    }

    // ConcurrentReferenceHashMap$Segment
    public Segment(int initialCapacity) {
        this.referenceManager = createReferenceManager();
        this.initialSize = 1 << calculateShift(initialCapacity, MAXIMUM_SEGMENT_SIZE);
        this.references = createReferenceArray(this.initialSize);
        this.resizeThreshold = (int) (this.references.length * getLoadFactor());
    }

Note how the Segment constructor is used:

    int roundedUpSegmentCapacity = (int) ((initialCapacity + size - 1L) / size);
    //...
    for (int i = 0; i < this.segments.length; i++) {
        this.segments[i] = new Segment(roundedUpSegmentCapacity);
    }

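The value of roundedUpSegmentCapacity stays constant throughout the loop, so the expression 1 << calculateShift(initialCapacity, MAXIMUM_SEGMENT_SIZE) evaluated in the Segment constructor always yields the same result as well. We can therefore hoist this expression out of the constructor and out of the loop.

The same holds for the expression (int) (this.references.length * getLoadFactor()): the references array is created from the initialCapacity variable, so its size is the same for every segment. We hoist this expression out of the constructor and the loop too.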
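The hoisting described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual Spring patch: HoistingSketch, createSegments, the two-argument Segment constructor and the shiftCalls counter are hypothetical names, and the value of MAXIMUM_SEGMENT_SIZE is assumed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the segment setup; not the Spring source.
final class HoistingSketch {

    static final int MAXIMUM_SEGMENT_SIZE = 1 << 30; // assumed value

    // Counts calls only so the effect of hoisting can be observed.
    static final AtomicInteger shiftCalls = new AtomicInteger();

    static int calculateShift(int minimumValue, int maximumValue) {
        shiftCalls.incrementAndGet();
        int shift = 0;
        int value = 1;
        while (value < minimumValue && value < maximumValue) {
            value <<= 1;
            shift++;
        }
        return shift;
    }

    static final class Segment {
        final Object[] references;
        final int resizeThreshold;

        // Receives the precomputed values instead of recomputing them.
        Segment(int initialSize, int resizeThreshold) {
            this.references = new Object[initialSize];
            this.resizeThreshold = resizeThreshold;
        }
    }

    static Segment[] createSegments(int count, int roundedUpSegmentCapacity, float loadFactor) {
        // Both values depend only on loop-invariant inputs, so compute them once.
        int initialSize = 1 << calculateShift(roundedUpSegmentCapacity, MAXIMUM_SEGMENT_SIZE);
        int resizeThreshold = (int) (initialSize * loadFactor);

        Segment[] segments = new Segment[count];
        for (int i = 0; i < segments.length; i++) {
            segments[i] = new Segment(initialSize, resizeThreshold);
        }
        return segments;
    }
}
```

With this shape calculateShift runs once per map instead of once per segment, regardless of how many segments are created.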

Arrays

Consider the createReferenceArray method:

    private Reference<K, V>[] createReferenceArray(int size) {
        return (Reference<K, V>[]) Array.newInstance(Reference.class, size);
    }

Using Array::newInstance here is clearly redundant; nothing prevents us from creating the array with a constructor:

    private Reference<K, V>[] createReferenceArray(int size) {
        return new Reference[size];
    }

At the C2 level the constructor performs no worse than a call to Array::newInstance, but for small arrays it is considerably faster in C1 mode (the -XX:TieredStopAtLevel=1 flag) and in the interpreter (the -Xint flag):

    // C2
                 length  Mode  Cnt  Score   Error  Units
    constructor      10  avgt   50    5,6 ±   0,0  ns/op
    constructor     100  avgt   50   29,7 ±   0,1  ns/op
    constructor    1000  avgt   50  242,7 ±   1,3  ns/op
    newInstance      10  avgt   50    5,5 ±   0,0  ns/op
    newInstance     100  avgt   50   29,7 ±   0,1  ns/op
    newInstance    1000  avgt   50  249,3 ±   9,6  ns/op

    // C1
                 length  Mode  Cnt  Score   Error  Units
    constructor      10  avgt   50    6,8 ±   0,1  ns/op
    constructor     100  avgt   50   36,3 ±   0,6  ns/op
    constructor    1000  avgt   50  358,6 ±   6,4  ns/op
    newInstance      10  avgt   50   91,0 ±   2,4  ns/op
    newInstance     100  avgt   50  127,2 ±   1,8  ns/op
    newInstance    1000  avgt   50  322,8 ±   7,2  ns/op

    // -Xint
                 length  Mode  Cnt  Score   Error  Units
    constructor      10  avgt   50  126,3 ±   5,9  ns/op
    constructor     100  avgt   50  154,7 ±   2,6  ns/op
    constructor    1000  avgt   50  364,2 ±   6,2  ns/op
    newInstance      10  avgt   50  251,2 ±  11,3  ns/op
    newInstance     100  avgt   50  287,5 ±  11,4  ns/op
    newInstance    1000  avgt   50  486,5 ±   8,5  ns/op

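The replacement does not affect our benchmark, but it speeds up the code at application startup, while C2 has not kicked in yet. More on this mode at the end of the article.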
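As a quick sanity check that the two ways of creating the array are interchangeable, the sketch below builds an array both ways and lets you compare the results. The nested Reference interface is a hypothetical stand-in for the real ConcurrentReferenceHashMap.Reference, and the class and method names are made up for this example.

```java
import java.lang.reflect.Array;

// Hypothetical demo class; Reference is a stand-in, not the Spring type.
final class ArrayCreationSketch {

    interface Reference<K, V> {
    }

    // Reflective creation, as in the original createReferenceArray.
    @SuppressWarnings("unchecked")
    static <K, V> Reference<K, V>[] viaReflection(int size) {
        return (Reference<K, V>[]) Array.newInstance(Reference.class, size);
    }

    // Plain array constructor; the raw array is converted to the generic
    // array type with an unchecked conversion.
    @SuppressWarnings("unchecked")
    static <K, V> Reference<K, V>[] viaConstructor(int size) {
        return new Reference[size];
    }

    public static void main(String[] args) {
        Object a = viaReflection(8);
        Object b = viaConstructor(8);
        // Both arrays have the same runtime class and the same length.
        System.out.println(a.getClass() == b.getClass());
        System.out.println(Array.getLength(a) == Array.getLength(b));
    }
}
```

Because generics are erased, both methods produce an array of the raw Reference type at runtime, so callers cannot tell them apart.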

A small but crucial detail

Let's return to the ConcurrentReferenceHashMap constructor:

    ConcurrentReferenceHashMap(/*...*/) {
        Assert.isTrue(initialCapacity >= 0, "Initial capacity must not be negative");
        Assert.isTrue(loadFactor > 0f, "Load factor must be positive");
        Assert.isTrue(concurrencyLevel > 0, "Concurrency level must be positive");
        Assert.notNull(referenceType, "Reference type must not be null");
        this.loadFactor = loadFactor;
        this.shift = calculateShift(concurrencyLevel, MAXIMUM_CONCURRENCY_LEVEL);
        int size = 1 << this.shift;
        this.referenceType = referenceType;
        int roundedUpSegmentCapacity = (int) ((initialCapacity + size - 1L) / size);
        this.segments = (Segment[]) Array.newInstance(Segment.class, size);
        for (int i = 0; i < this.segments.length; i++) {
            this.segments[i] = new Segment(roundedUpSegmentCapacity);
        }
    }

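Curiously, replacing Array.newInstance with a constructor here causes a compilation error (Segment is an inner class of a generic class, so new Segment[size] would be a generic array creation), so we leave Array.newInstance alone. The loop, however, is quite interesting, or more precisely its access to the segments field is. Nitsan Wakart's article "Volatile read surprise" gives an idea of how disastrous such an access can sometimes be for performance.

In my opinion, the case described in that article applies to the code in question. Let's focus on the segments: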

    this.segments = (Segment[]) Array.newInstance(Segment.class, size);
    for (int i = 0; i < this.segments.length; i++) {
        this.segments[i] = new Segment(roundedUpSegmentCapacity);
    }

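Right after the array is created, it is written to the ConcurrentReferenceHashMap.segments field, and the loop then works with that field. Inside the Segment constructor there is a write to the volatile field references: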

    private volatile Reference<K, V>[] references;

    public Segment(int initialCapacity) {
        //...
        this.references = createReferenceArray(this.initialSize);
        //...
    }

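This means that access to the segments field cannot be optimized; in other words, the field is re-read on every iteration of the loop. How can we verify this claim? The simplest way is to copy the code into a separate package and remove volatile from the declaration of the Segment.references field: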

    protected final class Segment extends ReentrantLock {
        // before:
        // private volatile Reference<K, V>[] references;
        // after:
        private Reference<K, V>[] references;
    }

Let's check whether anything has changed:

    @Benchmark
    public Object original() {
        return new tsypanov.map.original.ConcurrentReferenceHashMap();
    }

    @Benchmark
    public Object nonVolatileSegmentReferences() {
        return new tsypanov.map.nonvolatile.ConcurrentReferenceHashMap();
    }

We see a significant performance gain (JDK 8):

    Benchmark                     Mode  Cnt  Score   Error  Units
    original                      avgt  100  732,1 ±  15,8  ns/op
    nonVolatileSegmentReferences  avgt  100  610,6 ±  15,4  ns/op

On JDK 11 the time spent is lower, but the relative gap is almost unchanged:

    Benchmark                     Mode  Cnt  Score   Error  Units
    original                      avgt  100  473,8 ±  11,2  ns/op
    nonVolatileSegmentReferences  avgt  100  401,9 ±  15,5  ns/op

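Of course, volatile has to be put back, and we need to look for another way. The bottleneck has been found: it is the access to the field. If so, we can create a local segments variable, fill the array, and only then write it to the field: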

    Segment[] segments = (Segment[]) Array.newInstance(Segment.class, size);
    for (int i = 0; i < segments.length; i++) {
        segments[i] = new Segment(roundedUpSegmentCapacity);
    }
    this.segments = segments;
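The same pattern can be tried in isolation with a self-contained sketch like the one below. It only mirrors the shape of the code above: PublishAfterFillSketch, its simplified Segment class and the accessor methods are hypothetical, and the sketch makes no claim about how a particular JIT compiles it.

```java
// Hypothetical, simplified model of "fill a local array, then publish it".
final class PublishAfterFillSketch {

    static final class Segment {
        private volatile Object[] references;

        Segment(int size) {
            // Volatile write inside the constructor, as in the real Segment.
            this.references = new Object[size];
        }

        int capacity() {
            return this.references.length;
        }
    }

    private final Segment[] segments;

    PublishAfterFillSketch(int size, int segmentCapacity) {
        // Work on a local variable: the loop never touches this.segments,
        // so the volatile writes in new Segment(...) cannot force the field
        // to be re-read on each iteration.
        Segment[] segments = new Segment[size];
        for (int i = 0; i < segments.length; i++) {
            segments[i] = new Segment(segmentCapacity);
        }
        // Publish the fully populated array with a single field write.
        this.segments = segments;
    }

    int segmentCount() {
        return this.segments.length;
    }

    int segmentCapacity(int index) {
        return this.segments[index].capacity();
    }
}
```

A side benefit of this shape is that the field never refers to a partially filled array while the constructor is running.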

As a result, even such a simple improvement yields a decent gain:

JDK 8

    Benchmark                           Mode  Cnt  Score   Error  Units
    originalConcurrentReferenceHashMap  avgt  100  712,1 ±   7,2  ns/op
    patchedConcurrentReferenceHashMap   avgt  100  496,5 ±   4,6  ns/op

JDK 11

    Benchmark                           Mode  Cnt  Score   Error  Units
    originalConcurrentReferenceHashMap  avgt  100  536,0 ±   8,4  ns/op
    patchedConcurrentReferenceHashMap   avgt  100  486,4 ±   9,3  ns/op

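What replacing 'Array::newInstance' with 'new T[]' gives us

When launching a Spring Boot application from IDEA, developers often set the "Enable launch optimization" flag, which adds -XX:TieredStopAtLevel=1 -noverify to the VM arguments and speeds up startup by disabling profiling and C2. Let's measure with these arguments: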

    // JDK 8 -XX:TieredStopAtLevel=1 -noverify
    Benchmark                           Mode  Cnt   Score   Error  Units
    originalConcurrentReferenceHashMap  avgt  100  1920,9 ±  24,2  ns/op
    patchedConcurrentReferenceHashMap   avgt  100   592,0 ±  25,4  ns/op

    // JDK 11 -XX:TieredStopAtLevel=1 -noverify
    Benchmark                           Mode  Cnt   Score   Error  Units
    originalConcurrentReferenceHashMap  avgt  100  1838,9 ±   8,0  ns/op
    patchedConcurrentReferenceHashMap   avgt  100   549,7 ±   6,7  ns/op

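More than a threefold speedup!

Why this matters

In particular, this is needed to speed up Spring Data JPA queries that return projections.

A JMC profile shows that creating ConcurrentReferenceHashMap takes almost a fifth of the time spent executing a query of the form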

    public interface SimpleEntityRepository extends JpaRepository<SimpleEntity, Long> {
        List<HasIdAndName> findAllByName(String name);
    }

where HasIdAndName is a projection:

    public interface HasIdAndName {
        int getId();
        String getName();
    }

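Besides, ConcurrentReferenceHashMap is used dozens of times in the Spring code, so the improvement is certainly not superfluous.

Conclusions

Improving performance is not as hard as it seems at first glance
A volatile access near a loop is one of the possible bottlenecks
Look for invariants and hoist them out of loops

Further reading

Nitsan Wakart's article

Code examples

Changes: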
https://github.com/spring-projects/spring-framework/pull/1873
https://github.com/spring-projects/spring-framework/pull/2051
