Dr. Dobb's is part of the Informa Tech Division of Informa PLC


Optimized Java


String Building

A straightforward example for programming with strings of text shows how memory management, standard API selection, and Java runtime version affect performance.

When characters are read one at a time from an input stream, a string of text has to be assembled from the individual characters. If the total number of characters is not known in advance, the java.lang.String constructor that takes an array of characters as an input parameter is not an option. In this example, I examine only the time spent in string construction, so a constant 'a' character is used in place of reading from an input stream.

The first implementation leverages Java's ease of development to create a string of 100,000 characters without writing much code:

String string = "";
for (int i = 0; i < 100000; i++)
   string += 'a';

The Java "+" operator makes string concatenation trivial. Most Java developers use "+" or "+=" to concatenate strings when developing in a hurry: one or two characters for the operator is not much to type, and it gets the job done. Under the hood, however, Java strings are immutable objects. A String is never changed in place; any operation that appears to modify one (as "+=" does in this example) allocates a new String object and copies the existing characters into it. The variable then refers to the new object, and the old one becomes eligible for garbage collection. Over the course of building this string, 100,000 String objects are allocated, and 99,999 of them are eligible for garbage collection when execution finishes. Because every iteration copies all the characters accumulated so far, the total work grows with the square of the string length. This process ran in 325 seconds with a Java 5.0 runtime. The same code ran in 125 seconds with a Java SE 6 runtime on the same machine, so Java 5.0 took 160 percent more time than Java SE 6. Garbage-collection speed for this example is much improved in Java SE 6.
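The immutability described above can be checked directly. The following is a minimal sketch, not code from the original measurements; the class and method names are hypothetical. It keeps a second reference to the string, appends with "+=", and confirms that the variable now refers to a different object while the original is untouched.

```java
// Hypothetical demo class; the names here are illustrative only.
class ImmutabilityDemo {

    // Appends one character with += and reports whether a new object resulted.
    static boolean appendCreatesNewObject() {
        String before = "abc";
        String after = before;      // both variables refer to the same object
        after += 'a';               // allocates a new String; "abc" is not modified

        // Reference comparison (!=) is intentional: it checks object identity.
        return after != before
                && before.equals("abc")
                && after.equals("abca");
    }

    public static void main(String[] args) {
        System.out.println(appendCreatesNewObject()); // prints "true"
    }
}
```

Every pass through the 100,000-iteration loop performs this same allocate-and-copy step, which is where the time goes.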

Experienced Java developers immediately recognize that changing the implementation can improve performance significantly. The java.lang.StringBuffer class can be used to reduce the number of object allocations. This class uses an internal array of characters as a buffer, and that array may be larger than the number of characters currently stored in it. The advantage of this approach is that the internal array can be resized in chunks, so it accommodates many new characters before it has to be resized again. Consequently, the example code can be rewritten to use java.lang.StringBuffer:

StringBuffer buffer = new StringBuffer();
for (int i = 0; i < 100000; i++)
   buffer.append('a');
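The chunked resizing described above can be observed through StringBuffer's capacity() method. The sketch below is illustrative only, with hypothetical class and method names. It counts how many times the reported capacity changes while 100,000 characters are appended. The exact growth policy is an implementation detail of the runtime, so the sketch reports the count rather than assuming a particular value.

```java
// Hypothetical demo class; the names here are illustrative only.
class CapacityGrowthDemo {

    // Counts how often the buffer's capacity changes during n single-character appends.
    static int countResizes(int n) {
        StringBuffer buffer = new StringBuffer();
        int resizes = 0;
        int lastCapacity = buffer.capacity();
        for (int i = 0; i < n; i++) {
            buffer.append('a');
            int capacity = buffer.capacity();
            if (capacity != lastCapacity) {   // the internal array was replaced by a larger one
                resizes++;
                lastCapacity = capacity;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        int n = 100000;
        System.out.println(n + " appends caused " + countResizes(n) + " capacity changes");
    }
}
```

The number of capacity changes should be tiny compared with the number of appends, in contrast to the "+=" loop, which allocates on every iteration.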

Surprisingly, the new implementation executed in 0 milliseconds. In reality, some time was spent filling the StringBuffer, but java.lang.System.currentTimeMillis(), which was used to measure the elapsed time, has a resolution of approximately 10 milliseconds on Windows, so the run was too short to register. Increasing the number of characters from 100,000 to 10,000,000 yielded measurable times. The Java 5.0 runtime executed the loop with 10,000,000 characters in 1310 milliseconds, and the Java SE 6 runtime did the same in 1230 milliseconds. Java SE 6 still provides the better performance (Java 5.0 took roughly 6.5 percent more time), but the difference between the two versions is much narrower than with the original implementation.
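One way around the coarse timer is java.lang.System.nanoTime(), available since Java 5.0. This is a substitution for illustration, not the method used for the measurements above, and the class and method names are hypothetical. The precision of nanoTime() is also platform-dependent, and a single run like this ignores JIT warm-up, so treat the result as a rough indication only.

```java
// Hypothetical timing helper; the names here are illustrative only.
class AppendTimer {

    // Returns the elapsed time, in nanoseconds, for appending count characters.
    static long timeAppends(int count) {
        long start = System.nanoTime();
        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < count; i++) {
            buffer.append('a');
        }
        long elapsed = System.nanoTime() - start;

        // Sanity check; it also makes sure the buffer's contents are actually used.
        if (buffer.length() != count) {
            throw new IllegalStateException("unexpected length: " + buffer.length());
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long nanos = timeAppends(100000);
        System.out.println("100,000 appends took " + (nanos / 1000) + " microseconds");
    }
}
```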

Changing the implementation again to use java.lang.StringBuilder, a class introduced with the Java 5.0 API, leads to even better performance. java.lang.StringBuilder works as a drop-in replacement for java.lang.StringBuffer, with one important difference: StringBuffer is thread safe. Its methods that access or modify the contents synchronize on a monitor, so that other threads never see the buffer in an intermediate state. StringBuilder does not have those protections. It is suitable for single-threaded access, or for multithreaded access when the surrounding code explicitly guards against simultaneous access. If java.lang.StringBuffer is changed to java.lang.StringBuilder, this example executes in 810 milliseconds in Java 5.0 and 640 milliseconds in Java SE 6. Even with the code optimizations, Java 5.0 still requires 27 percent more time than Java SE 6.
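For the multithreaded case, the explicit protection mentioned above might look like the following sketch. It is one possible arrangement under stated assumptions, not a recommendation from the measurements; the class name, method name, and lock object are all hypothetical. Two threads append to a shared StringBuilder, and every append is wrapped in a synchronized block on a common lock.

```java
// Hypothetical demo class; the names here are illustrative only.
class GuardedBuilderDemo {

    // Two threads each append perThread characters; returns the final length.
    static int appendFromTwoThreads(final int perThread) {
        final StringBuilder builder = new StringBuilder();
        final Object lock = new Object();   // guards every access to builder

        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    synchronized (lock) {   // the protection StringBuilder itself lacks
                        builder.append('a');
                    }
                }
            }
        };

        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();                   // join() makes the threads' writes visible here
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return builder.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromTwoThreads(50000)); // prints "100000"
    }
}
```

Locking around each single append reintroduces roughly the cost that StringBuffer pays internally. In practice, the benefit of StringBuilder comes when the buffer is confined to one thread or when a lock can cover a larger unit of work.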

The new version of the Java runtime should provide faster execution for most string operations that involve memory allocations. Even though running in the new runtime produces noticeable improvements, it is no substitute for optimized programming. Awareness of how Java manages memory, combined with the newer API, improved string-building performance in this example by orders of magnitude. With 100,000 characters, the original implementation took minutes, while the improved implementations finished faster than the timer could resolve; with 10,000,000 characters, the original implementation would not finish in a reasonable time. Unfortunately, the slow implementation is also the simplest to write, and its performance implications are the easiest to overlook. Poor implementations like this have given Java a reputation for being slower than most programming languages. That reputation would be much improved if compilers could detect, or even replace, similarly slow code.

