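String concatenation is something every Java programmer is supposed to know about, and almost every Java programmer has read the advice to build strings with StringBuffer/StringBuilder.
Most tutorials tell you that concatenating with the + operator creates many intermediate String objects and performs poorly, and that you should use StringBuffer/StringBuilder instead.
But is that really true?
This article runs the following experiment on JDK 8: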
    public static void main(String[] args) {
        String result = "";
        result += "some more data";
        System.out.println(result);
    }
Disassembling the class with javap -c gives:
    Code:
       0: aload_0               // Push 'this' on to the stack
       1: invokespecial #1      // Invoke Object class constructor
                                // pop 'this' ref from the stack
       4: return                // Return from constructor

    public static void main(java.lang.String[]);
    Code:
       0: ldc           #2      // Load constant #2 (the empty String) on to the stack
       2: astore_1              // Store it in local var 1 (pop #2)
       3: new           #3      // Push new StringBuilder ref on stack
       6: dup                   // Duplicate value on top of the stack
       7: invokespecial #4      // Invoke StringBuilder constructor
                                // pop object reference
      10: aload_1               // Push local var 1 (containing #2)
      11: invokevirtual #5      // Invoke method StringBuilder.append()
                                // pop obj reference + parameter
                                // push result (StringBuilder ref)
      14: ldc           #6      // Push "some more data" on the stack
      16: invokevirtual #5      // Invoke StringBuilder.append
                                // pop twice, push result
      19: invokevirtual #7      // Invoke StringBuilder.toString:();
      22: astore_1              // Store the resulting String in local var 1 (pop)
      23: getstatic     #8      // Push value System.out:PrintStream
      26: aload_1               // Push local var 1 (the result String)
      27: invokevirtual #9      // Invoke method PrintStream.println()
                                // pop twice (object ref + parameter)
      30: return                // Return void from method
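As you can see, the Java compiler optimized the generated bytecode: it created a StringBuilder automatically and performed the concatenation through append calls.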
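Because every piece of the final string appears in a single expression that is known when the code is compiled, the compiler can translate the whole expression into one StringBuilder chain that ends in a single toString() call. This is called static string concatenation optimization, and javac has applied it since JDK 5 (earlier compilers generated the same pattern with StringBuffer). Note that when all operands are compile-time constants, such as "a" + "b", javac goes further and folds them into a single literal, so no StringBuilder is needed at all.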
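To make the difference between a folded constant expression and a run-time concatenation concrete, here is a minimal sketch. The class name ConcatKinds and its two methods are made up for this illustration and are not part of the experiment above. It relies on the language rule that constant String expressions are interned, while a non-constant concatenation produces a newly created String.

```java
final class ConcatKinds {
    // Both operands are compile-time constants, so javac folds the
    // expression into the single literal "some more data".
    static String folded() {
        return "some " + "more data";
    }

    // 'prefix' is an ordinary parameter, so the concatenation can only
    // happen at run time (through a StringBuilder chain on JDK 8).
    static String runtime(String prefix) {
        return prefix + "more data";
    }

    public static void main(String[] args) {
        String literal = "some more data";
        // Same interned constant, so the references are identical.
        System.out.println(folded() == literal);
        // A newly created String: equal content, different object.
        System.out.println(runtime("some ") == literal);
        System.out.println(runtime("some ").equals(literal));
    }
}
```

Running main prints true, false, true: only the folded expression shares its object with the literal.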
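Does that mean that from JDK 5 onward we no longer need to create a StringBuilder by hand, and that + gives the same performance?
Let's try dynamic string concatenation.
Dynamic concatenation means that the pieces of the final string are only known at run time, for example when a string is extended inside a loop: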
    public static void main(String[] args) {
        String result = "";
        for (int i = 0; i < 10; i++) {
            result += "some more data";
        }
        System.out.println(result);
    }
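Disassembling again gives the listing below. Note that the listing compares i against the double constant 10e6 (i2d, ldc2_w, dcmpg), so it was evidently generated from a variant of the loop with a bound of 10e6 rather than 10; with an int bound javac emits an integer comparison instead, and the loop body is the same.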
    Code:
       0: aload_0               // Push 'this' on to the stack
       1: invokespecial #1      // Invoke Object class constructor
                                // pop 'this' ref from the stack
       4: return                // Return from constructor

    public static void main(java.lang.String[]);
    Code:
       0: ldc           #2      // Load constant #2 on to the stack
       2: astore_1              // Create local var from stack, pop #2
       3: iconst_0              // Push value 0 onto the stack
       4: istore_2              // Pop value and store it in local var
       5: iload_2               // Push local var 2 on to the stack
       6: i2d                   // Convert int to double on
                                // top of stack (pop + push)
       7: ldc2_w        #3      // Push constant 10e6 on to the stack
      10: dcmpg                 // Compare two doubles on top of stack
                                // pop twice, push result: -1, 0 or 1
      11: ifge          40      // if value on top of stack is greater
                                // than or equal to 0 (pop once)
                                // branch to instruction at code 40
      14: new           #5      // Push new StringBuilder ref on stack
      17: dup                   // Duplicate value on top of the stack
      18: invokespecial #6      // Invoke StringBuilder constructor
                                // pop object reference
      21: aload_1               // Push local var 1 (empty String)
                                // on to the stack
      22: invokevirtual #7      // Invoke StringBuilder.append
                                // pop obj ref + param, push result
      25: ldc           #8      // Push "some more data" on the stack
      27: invokevirtual #7      // Invoke StringBuilder.append
                                // pop obj ref + param, push result
      30: invokevirtual #9      // Invoke StringBuilder.toString
                                // pop object reference
      33: astore_1              // Create local var from stack (pop)
      34: iinc          2, 1    // Increment local variable 2 by 1
      37: goto          5       // Move to instruction at code 5
      40: getstatic     #10     // Push value System.out:PrintStream
      43: aload_1               // Push local var 1 (result String)
      44: invokevirtual #11     // Invoke method PrintStream.println()
                                // pop twice (object ref + parameter)
      47: return                // Return void from method
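At offset 14 a new StringBuilder is created, and the goto at offset 37 jumps back to offset 5, so the allocation sits inside the loop. The compiler does not hoist it out, and every iteration creates a fresh StringBuilder.
In other words, the code above is roughly equivalent to: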
    String result = "";
    for (int i = 0; i < 10; i++) {
        StringBuilder tmp = new StringBuilder();
        tmp.append(result);
        tmp.append("some more data");
        result = tmp.toString();
    }
    System.out.println(result);
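Each iteration allocates a new StringBuilder, and once toString() has been called that builder is no longer referenced and becomes garbage, which adds GC cost. Each iteration also copies the entire accumulated result into the new builder, so the amount of copying grows with the length of the string built so far.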
So in practice, when you cannot tell whether a concatenation is static or dynamic, it is still better to use StringBuilder.
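For comparison, here is a minimal sketch of the hand-written alternative for the loop case, in which one StringBuilder is created before the loop and reused. The class name LoopConcat and the method names are hypothetical and chosen only for this example; the += variant is included so the two results can be checked against each other.

```java
final class LoopConcat {
    // One StringBuilder for the whole loop: each iteration only appends,
    // and toString() is called once at the end.
    static String withBuilder(int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append("some more data");
        }
        return sb.toString();
    }

    // The += form from the article, kept for comparison. It builds a new
    // intermediate String on every iteration.
    static String withPlus(int times) {
        String result = "";
        for (int i = 0; i < times; i++) {
            result += "some more data";
        }
        return result;
    }

    public static void main(String[] args) {
        // Both approaches produce the same text; only the amount of
        // intermediate allocation and copying differs.
        System.out.println(withBuilder(10).equals(withPlus(10)));
    }
}
```

Both methods return the same string, so switching to the explicit builder changes only how much temporary work is done, not the result.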