How to make performance worse by improving it

We wanted the best, but it turned out as always.
Victor Chernomyrdin,
Russian politician

In life it sometimes happens that you seem to do everything right, and yet something goes wrong.
This story is about one such case.

Once I looked at this code and thought about how to speed it up:

```java
public String appendBounds(Data data) {
    int beginIndex = data.beginIndex;
    int endIndex = data.endIndex;

    return new StringBuilder()
            .append('L')
            .append(data.str, beginIndex, endIndex)
            .append(';')
            .toString();
}
```

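My first idea was to compute the total length of the resulting string from the beginIndex and endIndex variables (plus the fact that two more characters are appended to the StringBuilder besides the substring) and pass that value to the StringBuilder constructor, so that an array of the right size is allocated up front. That idea seemed too obvious to me, so I decided to try something else. What nudged me toward the right idea was that IDEA did not highlight this code at all, even though that clever IDE usually suggests replacing a short StringBuilder::append chain with string concatenation, which is shorter and easier to read.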
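For reference, the pre-sizing idea that was set aside might look like the sketch below. The class and method names are hypothetical, and the Data fields are passed in as plain parameters. This variant is not benchmarked anywhere in the article, so treat it as an illustration of the idea, not as a measured improvement.

```java
// Hypothetical helper; 'str', 'beginIndex' and 'endIndex' stand in for the Data fields.
final class PresizedAppend {
    static String appendBoundsPresized(String str, int beginIndex, int endIndex) {
        // Result = 'L' + substring + ';', i.e. the substring length plus 2 chars,
        // so the initial capacity already fits the whole result.
        int capacity = (endIndex - beginIndex) + 2;
        return new StringBuilder(capacity)
                .append('L')
                .append(str, beginIndex, endIndex)
                .append(';')
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(appendBoundsPresized("xabcy", 1, 4)); // prints Labc;
    }
}
```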

The obstacle to this simplification is the call to StringBuilder.append(CharSequence, int, int). Since the data.str field is a String, we can use String.substring(beginIndex, endIndex) to extract the substring and pass it to StringBuilder.append(String).

The transformed code:

```java
public String appendBounds(Data data) {
    int beginIndex = data.beginIndex;
    int endIndex = data.endIndex;

    String subString = data.str.substring(beginIndex, endIndex);

    return new StringBuilder()
            .append('L')
            .append(subString)
            .append(';')
            .toString();
}
```

Now IDEA does offer a simplification:

```java
public String appendBounds(Data data) {
    int beginIndex = data.beginIndex;
    int endIndex = data.endIndex;

    return 'L' + data.str.substring(beginIndex, endIndex) + ';';
}
```
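Before comparing speed, it is worth confirming that the three versions above really are interchangeable. The sketch below wraps each of them in a method so their results can be compared directly. The class name is hypothetical, and the Data fields are passed in as plain parameters.

```java
// Hypothetical wrapper around the three appendBounds variants shown above.
final class AppendBoundsVariants {
    // Original: append a range of the source string.
    static String viaAppendRange(String str, int beginIndex, int endIndex) {
        return new StringBuilder()
                .append('L')
                .append(str, beginIndex, endIndex)
                .append(';')
                .toString();
    }

    // Transformed: take the substring explicitly, then append(String).
    static String viaSubstring(String str, int beginIndex, int endIndex) {
        String subString = str.substring(beginIndex, endIndex);
        return new StringBuilder()
                .append('L')
                .append(subString)
                .append(';')
                .toString();
    }

    // IDEA's suggestion: plain string concatenation.
    static String viaConcatenation(String str, int beginIndex, int endIndex) {
        // 'L' + String is string concatenation because the right operand is a String.
        return 'L' + str.substring(beginIndex, endIndex) + ';';
    }
}
```

Note that the concatenation version works only because the middle operand is a String. Evaluated on its own, 'L' + ';' is int addition and yields 135.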

In this case, however, our goal is not readability but performance. Let's compare the two approaches:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
public class StringBuilderAppendBenchmark {

    @Benchmark
    public String appendSubString(Data data) {
        String latinStr = data.latinStr;
        String nonLatinStr = data.nonLatinStr;
        int beginIndex = data.beginIndex;
        int endIndex = data.endIndex;

        String substring = data.nonLatin
                ? nonLatinStr.substring(beginIndex, endIndex)
                : latinStr.substring(beginIndex, endIndex);

        return new StringBuilder()
                .append('L')
                .append(substring)
                .append(';')
                .toString();
    }

    @Benchmark
    public String appendBounds(Data data) {
        String latinStr = data.latinStr;
        String nonLatinStr = data.nonLatinStr;
        int beginIndex = data.beginIndex;
        int endIndex = data.endIndex;

        String appended = data.nonLatin ? nonLatinStr : latinStr;

        return new StringBuilder()
                .append('L')
                .append(appended, beginIndex, endIndex)
                .append(';')
                .toString();
    }

    @State(Scope.Thread)
    public static class Data {
        String latinStr;
        String nonLatinStr;

        @Param({"true", "false"})
        boolean nonLatin;

        @Param({"5", "10", "50", "100", "500", "1000"})
        private int length;

        private int beginIndex;
        private int endIndex;

        private ThreadLocalRandom random = ThreadLocalRandom.current();

        @Setup
        public void setup() {
            latinStr = randomString("abcdefghijklmnopqrstuvwxyz");
            // The non-Latin alphabet was lost from the extracted text; any characters
            // outside Latin-1 (Cyrillic here) produce a 2-bytes-per-char string.
            nonLatinStr = randomString("абвгдеёжзийклмнопрстуфхцчшщъыьэюя");
            beginIndex = 1;
            endIndex = length + 1;
        }

        private String randomString(String alphabet) {
            char[] chars = alphabet.toCharArray();

            StringBuilder sb = new StringBuilder(length + 2);
            for (int i = 0; i < length + 2; i++) {
                char c = chars[random.nextInt(chars.length)];
                sb.append(c);
            }

            return sb.toString();
        }
    }
}
```

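The benchmark is dead simple. A random string, whose size is set by the length field, is appended to a StringBuilder. Since it's 2019, this has to be checked both with a string made only of basic Latin characters (a so-called compact string, where each character takes 1 byte) and with a string containing non-Latin characters (where each character takes 2 bytes).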
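The 1-byte vs 2-byte distinction comes from compact strings (JEP 254, on by default via -XX:+CompactStrings). A string can use the 1-byte LATIN1 representation only if every one of its chars is at most 0xFF; otherwise the whole string is stored as UTF-16. The sketch below illustrates that rule. The helper names are hypothetical and not a JDK API, since the real check lives inside java.lang and is not public.

```java
// Hypothetical helpers illustrating the compact-strings rule; not a JDK API.
final class CompactStringSketch {
    // A string fits the 1-byte LATIN1 coder only if every char is at most 0xFF.
    static boolean fitsLatin1(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Size of the character payload in the backing byte[] (object headers not included),
    // assuming compact strings are enabled.
    static int payloadBytes(CharSequence s) {
        return fitsLatin1(s) ? s.length() : s.length() * 2;
    }
}
```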

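At first glance, appendSubString should be the slower one in our case. The amount of data appended is the same as in appendBounds, but appendSubString also creates a substring explicitly: it allocates memory for a new object and copies the contents of data.latinStr / data.nonLatinStr into it.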

All the more surprising (though only at first glance) are the results I measured with JDK 11 on my home computer (Intel Core i5-4690, 3.50 GHz):

```
Benchmark                            nonLatin  length    Score    Error  Units
appendBounds                            true       5     44,6 ±    0,4  ns/op
appendBounds                            true      10     45,7 ±    0,7  ns/op
appendBounds                            true      50    129,0 ±    0,5  ns/op
appendBounds                            true     100    218,7 ±    0,8  ns/op
appendBounds                            true     500    907,1 ±    5,5  ns/op
appendBounds                            true    1000   1626,4 ±   13,0  ns/op
appendSubString                         true       5     28,6 ±    0,2  ns/op
appendSubString                         true      10     30,8 ±    0,2  ns/op
appendSubString                         true      50     65,6 ±    0,4  ns/op
appendSubString                         true     100    106,6 ±    0,6  ns/op
appendSubString                         true     500    430,1 ±    2,4  ns/op
appendSubString                         true    1000    839,1 ±    8,6  ns/op
appendBounds:·gc.alloc.rate.norm        true       5    184,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm        true      10    200,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm        true      50    688,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm        true     100   1192,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm        true     500   5192,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm        true    1000  10200,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm     true       5    136,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm     true      10    160,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm     true      50    360,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm     true     100    608,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm     true     500   2608,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm     true    1000   5104,0 ±    0,0   B/op

appendBounds                           false       5     20,8 ±    0,1  ns/op
appendBounds                           false      10     24,0 ±    0,2  ns/op
appendBounds                           false      50     66,4 ±    0,4  ns/op
appendBounds                           false     100    111,0 ±    0,8  ns/op
appendBounds                           false     500    419,2 ±    2,7  ns/op
appendBounds                           false    1000    840,4 ±    7,8  ns/op
appendSubString                        false       5     25,3 ±    0,3  ns/op
appendSubString                        false      10     25,7 ±    0,2  ns/op
appendSubString                        false      50     36,0 ±    0,1  ns/op
appendSubString                        false     100     52,8 ±    0,4  ns/op
appendSubString                        false     500    206,1 ±    6,1  ns/op
appendSubString                        false    1000    388,1 ±    1,6  ns/op
appendBounds:·gc.alloc.rate.norm       false       5     80,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm       false      10     88,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm       false      50    320,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm       false     100    544,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm       false     500   2144,0 ±    0,0   B/op
appendBounds:·gc.alloc.rate.norm       false    1000   4152,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm    false       5     96,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm    false      10    112,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm    false      50    192,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm    false     100    288,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm    false     500   1088,0 ±    0,0   B/op
appendSubString:·gc.alloc.rate.norm    false    1000   2088,0 ±    0,0   B/op
```

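Contrary to our assumption, in most cases (and in every case for non-Latin strings) appendSubString turned out to be both faster and lighter on allocation, even though String::substring returns a new object. How did that happen?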

I look in the book and see a fig

Studying the StringBuilder source code will help lift the veil of secrecy. Both methods we use delegate the call to the corresponding methods of AbstractStringBuilder:

```java
public final class StringBuilder
    extends AbstractStringBuilder
    implements java.io.Serializable, Comparable<StringBuilder>, CharSequence
{
    @Override
    public StringBuilder append(String str) {
        super.append(str);
        return this;
    }

    @Override
    public StringBuilder append(CharSequence s, int start, int end) {
        super.append(s, start, end);
        return this;
    }
}
```

Let's go to AbstractStringBuilder.append(String):

```java
public AbstractStringBuilder append(String str) {
    if (str == null) {
        return appendNull();
    }
    int len = str.length();
    ensureCapacityInternal(count + len);
    putStringAt(count, str);
    count += len;
    return this;
}

private final void putStringAt(int index, String str) {
    if (getCoder() != str.coder()) {
        inflate();
    }
    str.getBytes(value, index, coder);
}
```

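What is interesting here? As the name suggests, AbstractStringBuilder::inflate expands the AbstractStringBuilder.value array when strings with different encodings are combined, i.e. when a Latin-1 builder receives a UTF-16 string. The data itself is copied in String::getBytes: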

```java
void getBytes(byte[] dst, int dstBegin, byte coder) {
    if (coder() == coder) {
        System.arraycopy(value, 0, dst, dstBegin << coder, value.length);
    } else {    // this.coder == LATIN && coder == UTF16
        StringLatin1.inflate(value, 0, dst, dstBegin, value.length);
    }
}
```

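What matters here? If the strings are homogeneous (same encoding), the data is moved with the intrinsic System::arraycopy. Otherwise StringLatin1::inflate is used, which delegates to StringUTF16::inflate: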

```java
// inflatedCopy byte[] -> byte[]
@HotSpotIntrinsicCandidate
public static void inflate(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
    // We need a range check here because 'putChar' has no checks
    checkBoundsOffCount(dstOff, len, dst);
    for (int i = 0; i < len; i++) {
        putChar(dst, dstOff++, src[srcOff++] & 0xff);
    }
}

@HotSpotIntrinsicCandidate
static void putChar(byte[] val, int index, int c) {
    assert index >= 0 && index < length(val) : "Trusted caller missed bounds check";
    index <<= 1;
    val[index++] = (byte)(c >> HI_BYTE_SHIFT);
    val[index]   = (byte)(c >> LO_BYTE_SHIFT);
}
```

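So if the strings are homogeneous, the data is moved by the platform-dependent System::arraycopy; otherwise a loop is used (also intrinsified). This means that appending two strings whose characters all belong to the basic Latin alphabet (in other words, each fits in 1 byte) should perform much better than appending heterogeneous strings. The benchmark confirms this (see the nonLatin = false output).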
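To make the difference concrete, here is a standalone, simplified analogue of the two branches of String::getBytes. It is not JDK code. The class is hypothetical, there are no intrinsics behind it, and it writes UTF-16 in little-endian order, whereas the JDK uses the platform byte order via HI_BYTE_SHIFT / LO_BYTE_SHIFT. It only shows that the same-encoding case is one bulk copy, while the Latin-1 to UTF-16 case has to widen every byte into two.

```java
// Hypothetical, simplified analogue of the two branches of String::getBytes; not JDK code.
final class InflateSketch {
    // Same encoding on both sides: a single bulk copy (the System::arraycopy branch).
    static void copySameCoder(byte[] src, byte[] dst, int dstByteOffset) {
        System.arraycopy(src, 0, dst, dstByteOffset, src.length);
    }

    // Latin-1 source, UTF-16 destination: every byte is widened into two
    // (the StringLatin1::inflate -> StringUTF16::inflate branch).
    static void inflateLatin1ToUtf16(byte[] src, byte[] dst, int dstCharOffset) {
        for (int i = 0; i < src.length; i++) {
            int c = src[i] & 0xFF;                // unsigned Latin-1 value
            int index = (dstCharOffset + i) << 1; // 2 bytes per UTF-16 char
            dst[index] = (byte) c;                // low byte first (little-endian assumption)
            dst[index + 1] = (byte) (c >> 8);     // high byte, always 0 for Latin-1 input
        }
    }
}
```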

Now let's look at AbstractStringBuilder.append(CharSequence, int, int):

```java
@Override
public AbstractStringBuilder append(CharSequence s, int start, int end) {
    if (s == null) {
        s = "null";
    }
    checkRange(start, end, s.length());
    int len = end - start;
    ensureCapacityInternal(count + len);
    appendChars(s, start, end);
    return this;
}

private final void appendChars(CharSequence s, int off, int end) {
    if (isLatin1()) {
        byte[] val = this.value;
        for (int i = off, j = count; i < end; i++) {
            char c = s.charAt(i);
            if (StringLatin1.canEncode(c)) {
                val[j++] = (byte)c;
            } else {
                count = j;
                inflate();
                StringUTF16.putCharsSB(this.value, j, s, i, end);
                count += end - i;
                return;
            }
        }
    } else {
        StringUTF16.putCharsSB(this.value, count, s, off, end);
    }
    count += end - off;
}
```

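The approach here is similar to the previous example: homogeneous strings take a simpler path (here, a char-by-char copy in a loop), and heterogeneous ones go through StringUTF16. Note, however, that the StringUTF16::putCharsSB being called is not intrinsified: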

```java
public static void putCharsSB(byte[] val, int index, CharSequence s, int off, int end) {
    checkBoundsBeginEnd(index, index + end - off, val);
    for (int i = off; i < end; i++) {
        putChar(val, index++, s.charAt(i));
    }
}
```
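The control flow of appendChars can be reproduced outside the JDK with a toy builder. This is a hypothetical, simplified model, written only to show the shape of the path. It uses little-endian UTF-16, a naive growth policy, and merges the inflate and putCharsSB steps into one loop. The builder starts in Latin-1 and copies char by char through charAt. On the first char above 0xFF it inflates the buffer and continues in UTF-16. Unlike String::getBytes, the characters themselves are never moved by a bulk array copy.

```java
// Hypothetical toy model of AbstractStringBuilder.appendChars; not JDK code.
final class AppendCharsSketch {
    private byte[] value = new byte[16]; // backing storage, like AbstractStringBuilder.value
    private boolean latin1 = true;       // stands in for coder == LATIN1
    private int count;                   // number of chars stored so far

    void appendChars(CharSequence s, int off, int end) {
        for (int i = off; i < end; i++) {
            char c = s.charAt(i);      // one call per char, no bulk copy of the chars
            if (latin1 && c > 0xFF) {  // like !StringLatin1.canEncode(c)
                inflate();             // switch the whole buffer to UTF-16
            }
            ensureCapacity(count + 1);
            if (latin1) {
                value[count] = (byte) c;
            } else {
                putChar(value, count, c);
            }
            count++;
        }
    }

    boolean isLatin1() {
        return latin1;
    }

    @Override
    public String toString() {
        char[] chars = new char[count];
        for (int i = 0; i < count; i++) {
            chars[i] = latin1
                    ? (char) (value[i] & 0xFF)
                    : (char) ((value[2 * i] & 0xFF) | ((value[2 * i + 1] & 0xFF) << 8));
        }
        return new String(chars);
    }

    // Grow the byte array when the next char would not fit (1 or 2 bytes per char).
    private void ensureCapacity(int chars) {
        int bytesNeeded = latin1 ? chars : chars << 1;
        if (bytesNeeded > value.length) {
            value = java.util.Arrays.copyOf(value, Math.max(bytesNeeded, value.length << 1));
        }
    }

    // Widen every stored Latin-1 byte into a 2-byte UTF-16 char.
    private void inflate() {
        byte[] wide = new byte[value.length << 1];
        for (int i = 0; i < count; i++) {
            putChar(wide, i, value[i] & 0xFF);
        }
        value = wide;
        latin1 = false;
    }

    // Little-endian here; the JDK uses the platform byte order.
    private static void putChar(byte[] val, int index, int c) {
        index <<= 1;
        val[index] = (byte) c;
        val[index + 1] = (byte) (c >> 8);
    }
}
```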

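So the internals of both methods, and the reason their performance differs, are now more or less clear to us. The question naturally arises: what do we do with this knowledge? Several options come to mind at once: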

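1) Keep it in mind and fix suspicious code by hand whenever you spot it
2) Go to Tagir and ask him to write an inspection that does it for us
3) Change the JDK so that the code doesn't need to change at all.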

Naturally, we start with the third. Shall we risk it?

The Abyss

We'll practice ~~on cats~~ on the Java 11 source code, which can be downloaded here.

The simplest and most obvious improvement is to take the substring right inside AbstractStringBuilder.append(CharSequence, int, int):

```java
// before
public AbstractStringBuilder append(CharSequence s, int start, int end) {
    if (s == null) {
        s = "null";
    }
    checkRange(start, end, s.length());
    int len = end - start;
    ensureCapacityInternal(count + len);
    appendChars(s, start, end);
    return this;
}

// after
public AbstractStringBuilder append(CharSequence s, int start, int end) {
    if (s == null) {
        s = "null";
    }
    checkRange(start, end, s.length());
    return append(s.subSequence(start, end).toString());
}
```

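Now we need to build the JDK, run the tests, and run the StringBuilderAppendBenchmark::appendBounds benchmark on it. Its results then have to be compared with those of the same benchmark on the original JDK: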

```
# before: original JDK
# after:  patched JDK
Benchmark           nonLatin  length   before    after  Units
avgt                    true       5     44,6     64,4  ns/op
avgt                    true      10     45,7     66,3  ns/op
avgt                    true      50    129,0    168,9  ns/op
avgt                    true     100    218,7    281,9  ns/op
avgt                    true     500    907,1   1116,2  ns/op
avgt                    true    1000   1626,4   2002,5  ns/op
gc.alloc.rate.norm      true       5    184,0    264,0   B/op
gc.alloc.rate.norm      true      10    200,0    296,0   B/op
gc.alloc.rate.norm      true      50    688,0    904,0   B/op
gc.alloc.rate.norm      true     100   1192,0   1552,0   B/op
gc.alloc.rate.norm      true     500   5192,0   6752,0   B/op
gc.alloc.rate.norm      true    1000  10200,0  13256,0   B/op

avgt                   false       5     20,8     38,0  ns/op
avgt                   false      10     24,0     37,8  ns/op
avgt                   false      50     66,4     82,9  ns/op
avgt                   false     100    111,0    138,8  ns/op
avgt                   false     500    419,2    531,9  ns/op
avgt                   false    1000    840,4   1002,7  ns/op
gc.alloc.rate.norm     false       5     80,0    152,0   B/op
gc.alloc.rate.norm     false      10     88,0    168,0   B/op
gc.alloc.rate.norm     false      50    320,0    440,0   B/op
gc.alloc.rate.norm     false     100    544,0    688,0   B/op
gc.alloc.rate.norm     false     500   2144,0   2688,0   B/op
gc.alloc.rate.norm     false    1000   4152,0   5192,0   B/op
```

Now that's a twist! Not only is there no improvement, things got worse. Damn, but how?

The thing is, at the very beginning, when describing StringBuilder::append, I made a small but crucial omission. The method was shown like this:

```java
public final class StringBuilder {
    @Override
    public StringBuilder append(String str) {
        super.append(str);
        return this;
    }
}
```

And here is its full form:

```java
public final class StringBuilder {
    @Override
    @HotSpotIntrinsicCandidate
    public StringBuilder append(String str) {
        super.append(str);
        return this;
    }
}
```

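The Java code we examined above (warmed up and compiled by C2) is irrelevant, because it is not what gets executed: the intrinsic is. This is easy to prove by taking a profile with async-profiler. Below is the profile for length = 1000 and nonLatin = true: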

```
# profile of `appendSubString`, original JDK
          ns  percent  samples  top
  ----------  -------  -------  ---
 19096340914   43.57%  1897673  jbyte_disjoint_arraycopy  <---------
 13500185356   30.80%  1343343  jshort_disjoint_arraycopy <---------
  4124818581    9.41%   409533  java.lang.String.<init>
  2177311938    4.97%   216375  java.lang.StringUTF16.compress
  1557269661    3.55%   154253  java.util.Arrays.copyOfRange
   349344451    0.80%    34823  appendSubString_avgt_jmhStub
   279803769    0.64%    27862  java.lang.StringUTF16.newString
   274388920    0.63%    27312  org.openjdk.jmh.infra.Blackhole.consume
   160962540    0.37%    15946  SpinPause
   122418222    0.28%    11795  __memset_avx2
```

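There is not even a whiff of StringBuilder (or AbstractStringBuilder) code here: almost 3/4 of the profile is taken up by intrinsic array-copy stubs. I would expect to see roughly the same picture in the profile of our "improved" StringBuilder.append(CharSequence, int, int).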

What we actually get is:

```
          ns  percent  samples  top
  ----------  -------  -------  ---
 19071221451   43.78%  1897827  jbyte_disjoint_arraycopy
  6409223440   14.71%   638348  jlong_disjoint_arraycopy
  3933622128    9.03%   387403  java.lang.StringUTF16.newBytesFor
  2067248311    4.75%   204193  java.lang.AbstractStringBuilder.ensureCapacityInternal
  1929218737    4.43%   194751  java.lang.StringUTF16.compress
  1678321343    3.85%   166458  java.util.Arrays.copyOfRange
  1621470408    3.72%   160849  java.lang.String.checkIndex
   969180099    2.22%    96018  java.util.Arrays.copyOf
   581600786    1.34%    57818  java.lang.AbstractStringBuilder.<init>
   417818533    0.96%    41611  appendBounds_jmhTest
   406565329    0.93%    40479  java.lang.String.<init>
   340972882    0.78%    33727  java.lang.AbstractStringBuilder.append
   299895915    0.69%    29982  java.lang.StringBuilder.toString
   183885595    0.42%    18136  SpinPause
   168666033    0.39%    16755  org.openjdk.jmh.infra.Blackhole.consume
```

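You'll say: "There they are, intrinsics, right at the top!" True, only they are not the same intrinsics (compare the name of the second entry with the profile above). Recall: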

```java
public final class StringBuilder {
    @Override
    @HotSpotIntrinsicCandidate
    public StringBuilder append(String str) {
        super.append(str);
        return this;
    }
}
```

Here the intrinsic replaces the call to StringBuilder.append(String), but in our patch that call never happens! What gets called is AbstractStringBuilder.append(String). The jbyte_disjoint_arraycopy call we observe is the intrinsic for StringLatin1::inflate, invoked from AbstractStringBuilder::putStringAt via String::getBytes. In other words, unlike StringBuilder::append, this path executes not only platform-specific code but Java code as well.

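Now that we understand why we failed, let's try to succeed. It's easy to guess that we need to somehow get StringBuilder::append itself called. We can do that by ripping out the previous patch and changing StringBuilder instead: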

```java
public final class StringBuilder {
    // before
    @Override
    public StringBuilder append(CharSequence s, int start, int end) {
        super.append(s, start, end);
        return this;
    }

    // after
    @Override
    public StringBuilder append(CharSequence s, int start, int end) {
        if (s == null) {
            s = "null";
        }
        checkRange(start, end, s.length());
        return this.append(s.subSequence(start, end).toString());
    }
}
```

Now everything is done properly: the intrinsified StringBuilder::append is called.
Rebuild, run, compare:

```
# before: original JDK
# after:  patched JDK
Benchmark           nonLatin  length   before    after  Units
avgt                    true       5     44,6     60,2  ns/op
avgt                    true      10     45,7     59,1  ns/op
avgt                    true      50    129,0    164,6  ns/op
avgt                    true     100    218,7    276,2  ns/op
avgt                    true     500    907,1   1088,8  ns/op
avgt                    true    1000   1626,4   1959,4  ns/op
gc.alloc.rate.norm      true       5    184,0    264,0   B/op
gc.alloc.rate.norm      true      10    200,0    296,0   B/op
gc.alloc.rate.norm      true      50    688,0    904,0   B/op
gc.alloc.rate.norm      true     100   1192,0   1552,0   B/op
gc.alloc.rate.norm      true     500   5192,0   6752,0   B/op
gc.alloc.rate.norm      true    1000  10200,0  13256,0   B/op

avgt                   false       5     20,8     37,9  ns/op
avgt                   false      10     24,0     37,9  ns/op
avgt                   false      50     66,4     80,9  ns/op
avgt                   false     100    111,0    125,6  ns/op
avgt                   false     500    419,2    483,6  ns/op
avgt                   false    1000    840,4    893,8  ns/op
gc.alloc.rate.norm     false       5     80,0    152,0   B/op
gc.alloc.rate.norm     false      10     88,0    168,0   B/op
gc.alloc.rate.norm     false      50    320,0    440,0   B/op
gc.alloc.rate.norm     false     100    544,0    688,0   B/op
gc.alloc.rate.norm     false     500   2144,0   2688,0   B/op
gc.alloc.rate.norm     false    1000   4152,0   5187,2   B/op
```

I'm truly very sad, but it has not gotten any better. Here is the new profile:

```
          ns  percent  samples  top
  ----------  -------  -------  ---
 19614374885   44.12%  1953620  jbyte_disjoint_arraycopy
  6645299702   14.95%   662146  jlong_disjoint_arraycopy
  4065789919    9.15%   400167  java.lang.StringUTF16.newBytesFor
  2374627822    5.34%   234746  java.lang.AbstractStringBuilder.ensureCapacityInternal
  1837858014    4.13%   183822  java.lang.StringUTF16.compress
  1472039604    3.31%   145956  java.util.Arrays.copyOfRange
  1316397864    2.96%   130747  appendBounds_jmhTest
   956823151    2.15%    94959  java.util.Arrays.copyOf
   573091712    1.29%    56933  java.lang.AbstractStringBuilder.<init>
   434454076    0.98%    43202  java.lang.String.<init>
   368480388    0.83%    36439  java.lang.AbstractStringBuilder.append
   304409057    0.68%    30442  java.lang.StringBuilder.toString
   272437989    0.61%    26833  SpinPause
   201051696    0.45%    19985  java.lang.StringBuilder.<init>
   198934052    0.45%    19810  appendBounds_avgt_jmhStub
```

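Almost nothing has changed. Frankly, it is not clear to me why the intrinsic does not kick in when StringBuilder.append(String) is called from inside StringBuilder. I suspect that pasting (inlining) the body of StringBuilder.append(String) into the body of StringBuilder.append(CharSequence, int, int) changes how the VM handles the method call.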

Either way, it's a fiasco, bro. Patching the JDK didn't work out, but we can still make the replacement by hand wherever it is needed.

A literary digression on failure

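The encrypted reply came two days later. Navigator did not want to part with Oto Velara, whose company builds astonishingly fast and powerful warships. Navigator did not want to read the cipher to me. He simply repeated the answer from the command post: "No." The cipher did not explain why "no". In any case, "no" means he is someone the big computer knows about. If nothing were known about him, the answer would have been "yes: go ahead and try". A pity. A pity to lose such an interesting person. The commander must be sorry for me. Perhaps it was a pity the first time too. He saw how I broke in on the Viking. He does not want to drive me into the hunt again.
He keeps silent. But I know how short of hands they are:
- Comrade, I'll be at work tomorrow. Let me go.
- Go. - Suddenly he smiled. "You know, every cloud has a silver lining."
"With me, Comrade General, the cloud never has a silver lining."
"Here it is. It's bad that you've been forbidden to meet him. But we have added one more grain to our treasury of experience."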

Conclusions:

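In some cases the code of a JDK method has nothing to do with what is actually executed, because an intrinsic hidden in the bowels of the VM may run instead of the method body.
Such methods can be recognized; in particular, they are marked with the @HotSpotIntrinsicCandidate annotation, although some methods are intrinsified without any hint, for example String::equals (and many others).
From the first two points it follows that our reasoning about how JDK code works may contradict reality. C'est la vie.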

P.S.
Another possible replacement:

```java
StringBuilder sb = new StringBuilder();
sb.append(str, 0, endIndex);
// -->
StringBuilder sb = new StringBuilder(str.substring(0, endIndex));
```

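P.P.S.
Oracle developers rightly point out:

It seems to me that introducing a code path into
sb.append(cs, int, int) that allocates memory in order to get a speedup
makes things run faster only some of the time. As you can see, the performance
tradeoffs are not obvious.

Instead, if we want to optimize sb.append(cs, int, int), maybe we should go
ahead and do that by adding or rearranging intrinsics.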

The proposed solution is to intrinsify StringBuilder.append(CharSequence, int, int).

→ Issue
→ Discussion

P.P.P.S.
Interestingly, at the moment, when you write something like

```java
StringBuilder sb = new StringBuilder();
sb.append(str.substring(0, endIndex));
```

IDEA suggests simplifying the code to

```java
StringBuilder sb = new StringBuilder();
sb.append(str, 0, endIndex);
```

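If performance at this spot is not critical for you, the second, simplified version is probably the more correct choice. After all, most of the code we write is for our fellow developers, not for the machine.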
