Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Flexible C++ #2: Efficient Integer to String Conversions, part 3


Efficient Integer To String Conversions, Part 2

In this inaugural installment of a new Experts series, Matthew Wilson presents a fast and useful technique for converting integers to strings.

The source code for this article is available for download here. Part 1 of this article is available here.

(Read the Flexible C++ column philosophy before proceeding.)

In a previous article, [1] I described a technique for fast integer to string conversions, based on writing the least significant digit backwards into a caller-supplied buffer, dividing by the base (10) and repeating until all digits are written and the integer is fully converted to string form.

While this technique is fast, robust (thread-safe and type-safe) and works with arbitrary character-encodings (e.g., Unicode), it leaves a little to be desired in terms of usability and being clear from programmer error. This is because it requires a caller-supplied buffer within which it writes the converted integer, as in:

uint64_t      i = . . . 
wchar_t       buff[21];
wchar_t const *result = stlsoft::integer_to_string(buff, 21, i);

The second parameter, the buffer length (note: this is length in characters, not size in bytes), is used to determine the end-point of the resultant string form. In debug mode, there is an assertion to determine that the given length was sufficient, but (in release form) it is beholden on the programmer to provide a buffer of sufficient length. (This is one of the ways in which the technique derives its extra speed.) Although the required sizes for buffers of the various types are both small and constant (see Table 1), and developers who've used the function suite readily comprehend the idiom, it still has an uneasy sense of fragility. Furthermore, though no one who's used it has reported any error, there are many complaints regarding the verbosity of the code [2].

This article examines how we can extend the technique to address these issues. Three solutions are discussed, which sacrifice only a small part of the performance advantages of the original solution for substantial gains in usability. (The first two of these form part of the WinSTL [3] Conversion Library.)

The reason that client code of the integer_to_string<>() function family is verbose is that the buffer (and its length) is caller-supplied. This is to enable them to be re-entrant -- thread-safe in other words. But if we could somehow arrange to have a thread-safe internal buffer, then the parameter list would drop to one: the integer to convert. Sounds like a job for some Thread-Specific Storage (TSS) [4]. Using TSS, we can extend the original functions in a new function int_to_string<>() [5], which could be defined as:

template< typename C
        , typename I
        >
inline C const *int_to_string(I value)
{
  const size_t  CCH     = 21; // fits 64-bit + sign
  C             *buffer = i2str_get_tss_buffer<C, CCH>();

  return stlsoft::integer_to_string(buffer, CCH, value);
}

We make CCH = 21, to provide enough space for any integer type up to 64-bits (signed or unsigned). The implementation relies on the function i2str_get_tss_buffer<>() to return a buffer, of static linkage, of the given character type C on a thread-specific basis. It is the implementation of this function that forms the substance of the article.

__declspec(thread)

On the Win32 platform, TSS (or TLS - Thread-Local Storage - as Microsoft calls it) is available in two forms. For executables, and dynamic libraries that are explicitly loaded during application start-up [6], several Win32 compilers provide the Microsoft extension __declspec(thread), which declares a thread-specific variable. For example,
__declspec(thread) int t_index;

DWORD __stdcall ThreadProc(void *)
{
	. . .

t_index is a thread-local integer; each thread will get its own copy of the variable. Using __declspec(thread) we can offer an implementation of i2str_get_tss_buffer<>() as follows:

template< typename C
        , size_t   CCH
        >
inline C *i2str_get_tss_buffer()
{
  __declspec(thread) static C s_buffer[CCH];

  return s_buffer;
}

(Note that in the actual WinSTL implementation there are eight int_to_string<>() functions [5], and that they all use 21 as their buffer length. This actually saves space -- both code and data - since the number of instantiations of i2str_get_tss_buffer<>() drop from a potential maximum of 8 to just 2. In addition, it also reduces time costs where the implementation of the function contains non-trivial logic, as we see with the next implementation.)

This implementation has extremely good performance characteristics. Alas, __declspec(thread) is of largely specious utility, given a serious restriction to its use. Without delving into the gory details [6], the allocation of space in the .tls section for all decorated variables can only be carried out at application load time. What this means is that any DLLs that are loaded dynamically -- which, given the proliferation of component-based architectures and the trend for delay-loading, is a lot -- cannot contain __declspec(thread). If they do, the load fails.

This means that __declspec(thread) should not be used in any DLLs whatsoever, since one can always conceive of scenarios in which a DLL that is link-bound (i.e., linked via export library, and therefore loaded at start-up) to an application may later be used in circumstances in which it may be statically loaded (e.g., it may be linked to an OCX).

Win32 TLS

Win32 offers another form of TSS/TLS, in the guise of the TLS API. A TLS "index" is allocated by calling TlsAlloc(), which may subsequently be accessed from any thread by calling TlsGetValue() and TlsSetValue(), where the value (called a "slot") is maintained by the API on a thread-specific basis. (The index is released by calling TlsFree().) Using Win32 TLS, the implementation of i2str_get_tss_buffer<> becomes:
template< typename C
        , size_t   CCH
        >
inline C *i2str_get_tss_buffer()
{
  static Index<C, CCH>  s_index;
  Slot<C, CCH>          *slot = s_index.GetSlot();

  if(0 == slot)
  {
    slot = s_index.AllocSlot();
  }

  return slot->buff;
}

All the action is carried out by the Index<> and Slot<> classes, which are shown in Listing 1. The Index<> class allocates a TLS index by calling TlsAlloc(), which is then used in its GetSlot() and AllocSlot() methods. GetSlot() simply returns the suitably cast return from TlsGetValue(). AllocSlot() is a little more complicated, since it needs to allocate a Slot<> instance and add it onto Index<>'s linked-list of Slot<> instances, within a thread-safe block. This block only needs to guard the integrity of the linked-list (in the form of the m_top member), hence it does not include the call to TlsSetValue(). (The Slot<> instances are all destroyed in Index<>'s destructor, which occurs at module/process shutdown and does not need to use any thread-safe measures.)

There's a fair bit of code involved, here, but once the Index<> has been constructed the first time through the function (on any thread), and a Slot<> has been allocated the first time through for each thread there is very little further cost, as we'll see in the performance section. The very observant among you may have noticed a potential race-condition [7] in that there is no thread-serialisation protection visible in int_to_string<>() for the static construction of the Index<> instance. It's not terribly obvious, because we all become accustomed to thinking in C++ and think of statics as somehow handled for us magically, but remember that local statics are implemented by a flag plus the C runtime library's onexit(): the first time through a piece of code with a static class instance the instance is constructed and the flag is set; subsequent times through the flag is checked and the construction is skipped. However there is no thread-safety built into the instantiation of static instances; at least I've never heard of such. Hence, the race condition is that one thread could be part way through the is-constructed => construct => mark-constructed cycle, another thread (or threads) enters it. Needless to say, the minimum badness expected would be leaks, and we're far more likely to have a crash. I'm not going to go into how this is solved in the WinSTL libraries because (i) we're getting off topic in this limited space, and (ii) despite being correct, robust and portable (between Win32 compilers), it's very ugly, involving lots of in-place construction and spin-locks. (I probably will cover this in a later column; for now, impatient parties may consult the implementation in the STLSoft libraries, version 1.7.1 onwards.)

So the DLL dynamic-loading issue has been addressed, but the garden's not necessarily all green. The problem is that the number of slots on a given Win32 system is finite. On later operating systems this is not a big problem, but on earlier versions of Windows there are very few TLS slots available. Table 2 shows the number of available slots on the various Win32 operating systems I had to hand-obtain by a simple utility I wrote for that purpose (included in the archive). The API makes a guarantee of there being at least 64 slots available on any system, but it's not difficult to imagine very sophisticated software having a great many components that utilize TLS, so it is quite conceivable that exhaustion may occur.

Another downside is that this form of TSS is slower than __declspec(thread). As noted above, making all the int_to_string<>() overloads use CCH == 21 is efficient in space and time terms. However, there is another benefit. In light of what we now know about the potential scarcity of slots, we can see that we will now use a maximum of two TLS slots, rather than up to 8, making the catastrophic failure to allocate a slot significantly reduced.

Nonetheless, there is the issue of what to do if and when a TLS index is unavailable. From a practical point of view, one can ensure that both char and wchar_t variants of one of the conversion functions are called in the application initialization. While not proofing the application from failure, this will at least precipitate that failure sooner, so it is a significant help with testing of practical robustness. Naturally it does absolutely nothing to guarantee that exhaustion cannot happen. Where absolute robustness is required, we must take another approach.

One slight flaw with these more usable refinements is that there is no longer any character-based parameter from which the compiler can deduce the character type [8], which means that the template function must be explicitly parameterised, as in

uint64_t      i = . . . 
wchar_t const *result = int_to_string<wchar_t>(i);

If you don't like that, you can easily define some inline int_to_string_a and int_to_string_w templates.

Platform-independent APIs

The previous two solutions were both variants [9] of the same function(s) in the WinSTL project. In my commercial incarnation as a consultant for Synesis Software, I implemented a platform-independent version of the int-to-string functions, which have been recently updated to use the STLSoft integer_to_string<>() function(s). In one of the Synesis core DLLs exists the function LongToStringA() -- along with its unsigned, Unicode and 64-bit siblings -- defined within the SynesisStd namespace (see Listing 2).

It utilizes a platform-independent TSS API, which works in a logically similar way to Win32's TLS except that the creation of an "index" also involves specification of functions that are called when a thread-specific datum (slot) is overwritten or deleted, and when the index itself is destroyed. These functions enable non-leaking resource management without the need for explicit Get-delete-Set client code, and also ensure that the APIs & data associated with indexes are not unloaded prematurely if they reside in other dynamically loaded modules (a bit like COM's in-proc server module locking infrastructure).

The implementation involves intra-process mutual exclusion synchronisation objects and ordered thread-identifier slot lists, so it shouldn't come as a great surprise that the disadvantage with this approach is that it performs substantially less well than the other two approaches.

Performance

We've improved usability and removed any possibility for programmer error (in stipulating an erroneous buffer length), but how has this affected performance? Let's see how the three solutions stack up against stlsoft::integer_to_string<>() and sprintf(). Listing 3 shows an extract from the test program (included in the archive) where, for each function, 10 million 32-bit signed integer values are converted to string form. This is done twice, and the time of the second iteration is reported. All tests were carried out on a 2GHz 512MB Pentium IV machine in a single session with no other busy processes. (In fact the program main() contains a thread priority promotion, to further eliminate the effects of any other system activity.) (see Figure 1.)

There are a number of interesting facts apparent. First, sprintf() is undoubtedly the slowest way to convert integers to strings. Second, Borland and Digital Mars are not very good at optimizing the STLSoft/WinSTL template functions. Conversely, CodeWarrior, GCC and Intel are all very good at optimising the template functions. For each of these three compilers, stlsoft::integer_to_string<>() is between 12% and 25% that of sprintf(). This is in line with our expectations from the efficiency gains demonstrated in [1].

What we're keen to discover is whether the increases in usability we've effected in the new functions result in a loss of this performance supremacy. It's clear that SynesisStd::LongToStringA() has given away a lot of the performance superiority of stlsoft::integer_to_string<>() over sprintf(). All that platform-independency has a cost. In defense of this strategy, I would point out that the TssStore_ API has been unchanged for about five years, and having revisited it for this article, it is hardly what an older, wiser, me would call "efficiently implemented". I would assert that I could substantially trim its performance costs - and I probably soon will - which would show this technique in a much better light.

The picture is far better for the two WinSTL int_to_string<>() implementations. Using __declspec(thread) results in an additional overhead over stlsoft::integer_to_string<>() of no more than 1% for CodeWarrior, Digital Mars and Intel. For Visual C++ it is about 25%. Relative to sprintf() this implementation ranges from 12% to 34%, except for Digital Mars (79%).

The TlsAlloc() based version is also impressively efficient, costing, relative to stlsoft::integer_to_string<>(), between 2% and 11% for all compilers except CodeWarrior (where it is 51% extra). Relative to sprintf() it ranges from 13% to 34%, except for Borland (64%) and Digital Mars (80%). Intriguingly, both Visual C++ 7.0 and 7.1 perform significantly more efficiently with TlsAlloc() than with __declspec(thread), though this behavior is not observed with Visual C++ 6.0.

It's worthwhile to take a look at the absolute times. Clearly GCC and Intel respond very well in partnership with the STLSoft conversion components (with CodeWarrior somewhat less so), and perform substantially better than any permutation involving the other compilers.

Conclusion

So what is the recommended solution? Well as with most things in software engineering, the answer is: it depends. Where the __declspec(thread) limitations fall within your requirements, it is the most efficient. If you're working with DLLs on Win32, then the TlsAlloc() version probably provides your best mix of performance and an acceptably low risk of failure [9]. If you want platform-independence, or a virtual guarantee against failure (within the limits of available memory) to allocate TSS indexes, then you'll need some equivalent of the TssStore API. (Of course, where it's convenient and appropriate you can stick with the perfectly functional, albeit verbose, original stlsoft::integer_to_string<> functions, and get maximum performance.)

Notes and references

[1] Matthew Wilson. "Efficient Integer To String Conversions", C/C++ User's Journal, December 2002.

[2] I never reuse numeric literals in code, and would always use some dimensionof() pseudo-operator -- such as (sizeof(X)/sizeof(X[0])) -- so the example given would actually look like

result = stlsoft::integer_to_string(buff, stlsoft_num_elements(buff), i);
which, though less fragile, is even more verbose.

[3] WinSTL is a sub-project of STLSoft and deals with all things Win32 in an STL fashion. It is available at http://winstl.org/. The int_to_string<> components will be available with STLSoft v1.7.1, which is due out in August 2003.

[4] Douglas Schmidt, Nat Pryce & Timothy Harrison. "Thread-Specific Storage for C/C++", 1997, http://www.cs.wustl.edu/~schmidt/PDF/TSS-pattern.pdf

[5] Actually there are 8 functions, corresponding to the eight integer_to_string<> functions, for each numeric integer type (signed and unsigned 8, 16, 32 and 64-bit integers). It's more effort for the poor library writer (sigh), and occasionally inconvenient for the user, but this way helps ensure that other integral types, e.g., wchar_t and bool, are not used with these purely numeric functions. There are actually three choices in this guise.

[6] Jeffrey Richter."Advanced Windows", Third Edition, (Microsoft Press, 1997).

[7] I'm sure that those of you who spotted it will have reasoned just how incredibly unlikely this race condition is to be encountered; however "incredibly unlikely" doesn't cut the mustard in multi-threaded software development, so it must be rendered impossible.

[8] C++ standard ISO/IEC 14882:1998(E). See sections 13.1 and 14.8.

[9] The WinSTL implementation actually defaults to the TlsAlloc() version, but allows you to specify the appropriately ugly _WINSTL_INT_TO_STRING_USE_DECLSPECTHREAD_FOR_EXES to use __declspec(thread). If you do so in a DLL build, however, you'll receive some warnings advising you not to do so.

About the Author

Matthew Wilson is a software development consultant for Synesis Software, specializing in robustness and performance in C, C++, C# and Java. Matthew is the author of the STLSoft libraries, and the forthcoming book Imperfect C++ (to be published by Addison-Wesley, 2004). He can be contacted via [email protected] or at http://stlsoft.org/ .


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.