Reduce your TCO of building and using Spring Native applications with just 3 steps. Switch to the best Unified Java Runtime. Learn More.

Efficient code vs string concatenation

Efficient code vs string concatenation. Solve This Java™ Snippet!


Published June 25, 2021


BellSoft is starting a series called Solve This Java™ Snippet! The premise is straightforward: our expert engineers will give a short piece of code and then explain what is wrong with it.

Our first candidate is this little guy:

  int n = 100;
  String s = "1";
  for(int i = 0; i < n; i++)
    s = s + 1;

A cycle that repeats an ‘n’ number of times and adds ‘1’ to an ‘s’ string with an assigned value of ‘1.’

Can you assess the algorithmic complexity?

This may not be as simple as it seems.

It may look like doing ‘n’ operations means O(n) complexity. Well, study the operation precisely. The thing you should notice is that it’s a concatenation of a string accumulator and a string representation of an integer. At each iteration of the loop, ‘s’ grows:

1
11
111
....

Since Java strings are immutable, growing means copying all previous contents. The number of copied characters will be

(1) + (1 + 1) + (1 + 1 + 1) + … ,

which is a well-known sum of an arithmetic progression calculated as (n^2-n)/2. It means time complexity will be quadratic: O(n^2).

Let’s check what happens under the hood and how the benchmarks behave. This explanation is relevant for Java versions newer than 8, where such code is more efficient. There are comprehensive materials online that explain the Java 8 situation in depth.1 All examples are made with Liberica JDK.

If we inspect the bytecode targeted to Java 9+, there will be fragments like the invokedynamic call for string concatenation:

  14: invokedynamic #4,  0              // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;

and

BootstrapMethods:
  0: #38 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;

But basically, the default behavior is similar to what see for Java 8 target when javac (the primary Java compiler) generates concatenation code equivalent to

  s = new StringBuilder().append(s).append(1).toString();

What does interest us the most in this example? The JVM performance of this code, and whether it’s good enough. Now we’ll run some tests.

@State(Scope.Thread)
public class ConcatBench {

  @Param({"100", "400"})
  int n;

  @Benchmark
  public String chain() {
    String s = "1";
    for(int i = 0; i < n; i++)
      s = s + 1;
    return s;
  }

  @Benchmark
  public String sameSB() {
    String s = "1";
    StringBuilder sb = new StringBuilder(s);
    for(int i = 0; i < n; i++)
      sb.append(1);
    return sb.toString();
  }

  @Benchmark
  public String newSB() {
    String s = "1";
    for(int i = 0; i < n; i++) {
      s = new StringBuilder().append(s).append(1).toString();
    }
    return s;
  }
}

This example also has a third code variant where StringBuilder is created outside of the loop.

If we target javac to Java 16 and run the benchmark on JDK 16, we get the following:

Benchmark           (n)        Score        Error  Units
ConcatBench.chain   100   940733.514 ±  39773.301  ops/s
ConcatBench.newSB   100   867987.506 ±  10770.221  ops/s
ConcatBench.sameSB  100  4937177.165 ± 166843.186  ops/s # 5.2x faster than chain

ConcatBench.chain   400   128595.073 ±    819.279  ops/s # 7.3x slower than n=100
ConcatBench.newSB   400   127075.294 ±   1383.764  ops/s
ConcatBench.sameSB  400  1403760.812 ±  37459.769  ops/s # 3.5x slower than n=100, 11x faster than chain

When ‘n’ grows by four times, the code variant where StringBuilder is reused slows down by 3.5x (close to linear), while the variant with re-created StringBuilder slows down by 7.3x. The exact quadratic slowdown would be 16-fold. However, there are additive costs and optimizations for data copying (see StubRoutines::jbyte_disjoint_arraycopy in the OpenJDK code).

As an experiment, you can run the benchmarks on your hardware. Try playing with javac and JDKs and compare 7, 8, 11, and 16 bytecode running on JDK 7, 8, 11, and 16.

In short, prepare for a higher cost

The complexity will be quadratic — expressed as a sum of an arithmetic sequence to n terms — and it will be in the same proportion for both the number of operations and memory. The reason is string concatenation, an infamous Java caveat.

Isn’t performance in the JVM exciting? Would you like to see more of such posts? Subscribe to our newsletter and get informed about new articles first!

  1. java.lang.String Catechism - Stay Awhile And Listen by Aleksey Shipilёv
  2. String concatenation with Java 8
  3. GitHub: jdk/stringopts.cpp
  4. StringConcatFactory#makeConcatWithConstants()
  5. src/hotspot/share/opto/stringopts.cpp
Author image

Dmitry Chuyko

Senior Performance Architect at BellSoft

 Twitter

 LinkedIn