Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Flexible C++ #10: Beware Logical Constness


Introduction

This month's installment is a bit of a change, since I'm not going to be showing you a cunning new technique. Rather, it's a story of how a powerful and useful language feature, const [1], used together with its associated keyword mutable [1], can leave you open to race conditions in multithreaded contexts.

A Brief History of const

One of C++'s best features is its ability to express, via application of the const keyword, the semantic constraint on an object that it may not be modified. This constraint serves as a very useful and obvious documentation of the intent of the author of the code; it is also earnestly policed by the compiler. Consider the following examples of const:

1.
const size_t len = strlen(str);
. . .
. . .
  something = 1 + len;

2.
char const *find_char(char const *s, char ch);

3.
class String
{
  . . .
  size_t      length() const;
  char const  *c_str() const;
  . . .
};

In all three of the above cases, const serves to enhance program correctness. In the first case, len is declared const because the author of the algorithm does not intend to change its value during the processing of the algorithm. The compiler complements this intent by rejecting as an error any code that attempts to change the value of len. In long code blocks, this technique is very helpful in avoiding errors. (It's my practice to make all local variables const by default. When I later come back to change an algorithm, and that change requires removing the const from a variable, having to do so is an extra reminder to consider all the ramifications of the changes I'm about to make.)

The second case, the declaration of the function find_char(), uses const to indicate to users of find_char() that the contents of the C-string being searched for the given character will not be changed. The compiler acts on this to ensure that the implementation of the function cannot change anything that s points to. Furthermore, it will not allow the value of s (that is to say, the address of the thing to which s is pointing) to be assigned to a pointer to nonconst char without an explicit cast.

The third case shows a string class, whose length() and c_str() members are declared const. This tells users of the String class that calling either method will not alter the contents of the string on which it's called. As with find_char(), the compiler ensures that the implementations of the methods adhere to that commitment.

The three uses of const described above collaborate to give C++ a powerful advantage in expressiveness and behavior, as in:

bool string_has_character(String const &str, char ch)
{
  char const *const s = str.c_str(); // Get a pointer to str's C-string

  return NULL != find_char(s, ch);
}

const String  str("What you talkin' 'bout, Willis?");

bool bHasW = string_has_character(str, 'W');

You can see that we have a const instance of String, str. We can pass it to string_has_character(), confident that it will not be altered, because string_has_character() takes a reference to const String. Because the c_str() method is const, and therefore does not alter the contents of the string, the compiler allows it to be called on the const instance, str. The return value is assigned to s, a const pointer to const char, which is allowed because it means we will not be able to alter the contents of the C-string to which s points. (Making the pointer itself const, by placing const after the *, means that s cannot be changed to point to anything else. In this case it's not strictly necessary, but it's worth being aware of this syntax.)
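To make the example concrete, here is a minimal, compilable sketch. The String class below is a hypothetical, stripped-down stand-in that owns a null-terminated copy of its contents (its member order differs from the String shown later), and find_char() is implemented by hand so you can see that every pointer it touches is a pointer to const. It is an illustration under those assumptions, not a real string implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Searches the C-string s for ch; const promises that *s is never modified.
char const *find_char(char const *s, char ch)
{
  for(; '\0' != *s; ++s)
  {
    if(ch == *s)
    {
      return s; // still a pointer to const: callers cannot write through it
    }
  }
  return NULL;
}

// Hypothetical minimal String: owns a null-terminated copy of its contents.
class String
{
public:
  explicit String(char const *s)
    : m_length(std::strlen(s))
    , m_data(new char[1 + m_length])
  {
    std::memcpy(m_data, s, 1 + m_length); // copy including the terminator
  }
  ~String()
  {
    delete [] m_data;
  }

  std::size_t length() const { return m_length; } // physically const
  char const  *c_str() const { return m_data; }   // physically const

private:
  String(String const &);             // proscribed: copying not supported
  String &operator =(String const &); // proscribed

  std::size_t m_length; // declared first: m_data's initializer uses it
  char        *m_data;  // 1 + m_length chars, null-terminated
};

bool string_has_character(String const &str, char ch)
{
  char const *const s = str.c_str(); // OK: c_str() is a const method

  return NULL != find_char(s, ch);
}
```

Any attempt inside c_str() or length() to assign to m_length or m_data would be rejected by the compiler, which is exactly the guarantee string_has_character() relies on.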

There's one more use of the const keyword in C++: declaring compile-time constants, as in:

const size_t MAX_INT64_CHARS = 21;

This is a definite improvement over C's globally active #defines, in terms of locality-of-scope, type-safety, generic programming, and debuggability.
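As a small illustration of those advantages, the sketch below puts the constant in a namespace (acme is a hypothetical name), where it has a type and a scope, and uses it as an array bound, which a const integral initialized with a constant expression permits. The value 21 covers the 20 characters of the most negative 64-bit value, -9223372036854775808, plus the null terminator. The format_int64() helper is hypothetical, and the sketch uses C++11's <cstdint> and <cinttypes>.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>

namespace acme // hypothetical namespace: the constant does not leak globally
{
  // Longest decimal int64_t is "-9223372036854775808": 20 chars + '\0'.
  const std::size_t MAX_INT64_CHARS = 21;

  // Formats i into a buffer whose size the compiler checks against the
  // constant; returns the number of characters written.
  int format_int64(std::int64_t i, char (&buf)[MAX_INT64_CHARS])
  {
    return std::snprintf(buf, MAX_INT64_CHARS, "%" PRId64, i);
  }
}
```

Unlike a #define, MAX_INT64_CHARS is visible to the debugger, obeys namespace scope, and has a definite type (std::size_t).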

So const is a great thing, and C++ proponents make great play of the fact that most other important languages have no similar concept. Ever tried to get the compiler to help out clients of your Java/.NET library, rather than leaving them to pore over the fine, and possibly outdated, details of its documentation? Furthermore, as I bang on about at length in Imperfect C++ [2], applying const to a class's member data is a mechanism for enforcing the design assumptions of the class, proofing code against poorly executed maintenance and against unconscious or ignored design mistakes (such as forgetting to implement or proscribe the copy constructor/assignment operator). But the applause is not universal. One of my friends is a compiler writer (including C/C++ compilers), and he hates const with a passion. One of his favorite criticisms is that constness can be subverted without much effort. This is true. Indeed, since C++98, the language itself has facilitated this subversion with a dedicated keyword: mutable. What gives? To find out, we need to dig into the notion of 'constness'.

Physical constness versus Logical constness

So far we've discussed const things that are truly const. Well, duh! But there is, as always with C++, a little more to it. Let's look into a possible implementation for the c_str() and length() methods for our String class:

class String
{
 . . .

// Accessors
public:
  char const *c_str() const
  {
    return &m_data[0]; 
  }
  size_t      length() const
  {
    return m_length;
  }

  . . .

// Members
private:
  char   *m_data;  // ptr to contiguous array of (1 + m_length) char
  size_t m_length; // current length
  . . .

The implementations of these methods respect the constness of the string instance on which they're called because they do not alter its state in any way. c_str() merely returns a pointer to const char, which cannot be used to alter the contents of the array to which m_data points. length() returns the value of m_length and does not change it.

This adherence, within the implementation of the member functions, to the constness of the state of the instance on which the methods are called, is known as Physical Constness. I've never heard a sensible criticism of C++'s support for physical constness.

All peachy so far. The author of string_has_character() can use the c_str() method knowing that he/she will not invalidate his/her promise to users of the function not to alter the state of the str argument. However, physical constness is not the full picture. There's also a thing known as Logical Constness. Let me give you an example to illustrate.

As you may know, the Standard does not require that string contents are stored with a null-terminator, although almost all implementations out there do so. STLSoft [3] offers a basic_string_view class template that effectively acts like a slice; that is, it consists of a length and a pointer, and does not "own" any data of its own. This emulates the representation of a string in D [4] —where slices are a part of the language—as a slice of a (potentially larger) array of characters. (The important difference is that C++ does not use garbage collection, and thus it's possible to have a string view instance that refers to something that no longer exists, with the obvious consequences; it's like the relationship between iterators and their containers. Naturally, they are to be used with care.)

basic_string_view provides a class interface in accordance with the Standard Library's String model [5, 6], and therefore has a c_str() method, which returns a null-terminated array of characters representing the C-string form of the viewed string. Since the string view does not know whether the character one past the end of its slice is a null terminator (which would be unlikely in most cases anyway), it has to synthesize a null-terminated copy by allocating storage for its slice plus one character for the terminator. Furthermore, since the raison d'être of string views is to deal with slices of larger strings in situ, rather than paying the cost of copying those slices, they do not make this copy when created, but only when and if c_str() is called. Subsequent calls to c_str() then reuse this buffer, which is destroyed in ~basic_string_view().

But, I hear you cry, c_str() is a const method, and so cannot alter the state of basic_string_view. You are correct, and the assignment to m_cstr in the following implementation will result in a compile error:

template< typename C // character type
        , typename T // char traits type
        , typename A // allocator type
        >
C const *basic_string_view<C, T, A>::c_str() const
{
  if(NULL == m_ptr)
  {
    static const C s_emptyString[1] = { '\0' }; // == "" or L""

    return &s_emptyString[0]; // an empty string view returns 'the empty string'
  }
  else
  {
    if(NULL == m_cstr)
    {
      // Error: assigns to a member from within a const method
      m_cstr = . . . allocate a copy of the slice, and assign to m_cstr
    }

    return m_cstr;
  }
}

Except that this situation, known as lazy evaluation [7], is an exceedingly useful technique. So useful, in fact, that C++ provides direct support for it via the mutable keyword. If the m_cstr member of basic_string_view is declared mutable, the above code will compile.

(Note that it's also possible to achieve the same thing by casting away the constness, via const_cast, as in:

      const_cast<C*&>(m_cstr) = . . . allocate a copy of the slice, and assign to m_cstr

But this is frowned on; we have mutable to do that, and using it facilitates better grepability of this situation.)

All this actually represents is a separation of the object's physical constness from its logical constness. We're not changing the slice itself (in fact, string view instances are immutable, and cannot take part in (re-)assignment operations); we're merely computing some derived state later, only if and when it's needed. So logically we're not doing any mutation. Thus mutable is semantically OK when altering the mutable thingy does not change the object's observable state.
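The following is a minimal sketch of the idea, not STLSoft's actual basic_string_view: a hypothetical, char-only string_view_sketch class holding a pointer and a length, plus a mutable pointer to a lazily allocated, null-terminated copy. Its c_str() is logically, but not physically, const. Note that it contains no synchronization of any kind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified string view: a non-owning (pointer, length) slice.
class string_view_sketch
{
public:
  string_view_sketch(char const *p, std::size_t n)
    : m_ptr(p)
    , m_len(n)
    , m_cstr(NULL)
  {}
  ~string_view_sketch()
  {
    delete [] m_cstr; // the only thing the view owns is its lazily made copy
  }

  std::size_t length() const { return m_len; }

  // Logically const: the observable value (the slice) never changes.
  // Physically, the first call allocates and caches a null-terminated copy.
  char const *c_str() const
  {
    if(0 == m_len)
    {
      return ""; // empty view: no allocation needed
    }
    if(NULL == m_cstr)
    {
      char *copy = new char[1 + m_len];
      std::memcpy(copy, m_ptr, m_len);
      copy[m_len] = '\0';
      m_cstr = copy; // compiles only because m_cstr is mutable
    }
    return m_cstr;
  }

private:
  string_view_sketch(string_view_sketch const &);             // proscribed
  string_view_sketch &operator =(string_view_sketch const &); // proscribed

  char const    *m_ptr; // start of the viewed slice (not owned)
  std::size_t   m_len;  // length of the slice
  mutable char  *m_cstr; // lazily created null-terminated copy (owned)
};
```

Removing the mutable from m_cstr turns the assignment in c_str() back into a compile error, which is the whole point of the keyword.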

So far so good. If any of you are unfamiliar with this, it does take a little thinking about, but I'm guessing most C++ practitioners come across logical constness reasonably early in their experience.

Multithreading

Unfortunately, as with many aspects of the language, when multithreading enters the picture, things get complicated. (For a wider discussion of threading concerns in C++ check out Chapters 10, 11, and 31 of Imperfect C++ [2].) Again, I'll illustrate with an example.

I don't propose to go to town on multithreading here, but if you have worked on multithreaded developments, I think it's a fair assumption that you'll have heard of race conditions and deadlocks. A race condition is what happens when two threads of execution access the same resource at the same time, and at least one of them modifies it. (Notice I said "threads of execution," which is more general than simply threads, since race conditions are just as meaningful between two or more processes as between two or more threads.)

To prevent race conditions, access to the resource must be serialized. Typically, this is achieved by using a synchronization object, such as a mutex, which can be "acquired" by only one thread at a time. It works as follows:

  • Two threads attempt to acquire the mutex that protects the resource they wish to modify. Only one succeeds, and the other is forced to wait.
  • The first successful thread modifies the resource, and then releases the mutex.
  • The second thread is now allowed to acquire the mutex, and hence granted access to modify the resource, after which it releases the mutex.
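The three steps above map directly onto a mutex. The sketch below uses C++11's std::mutex, std::lock_guard and std::thread, which postdate this article, purely for brevity; the shared_counter class and run_two_writers() helper are hypothetical. Each thread acquires the mutex before modifying the counter, and the lock_guard releases it when it goes out of scope.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared resource: a counter whose every access is serialized.
class shared_counter
{
public:
  shared_counter() : m_value(0) {}

  void increment()
  {
    std::lock_guard<std::mutex> lock(m_mx); // acquire; waits while another thread holds it
    ++m_value;                              // modify the resource
  }                                         // release, as lock goes out of scope

  long value() const
  {
    std::lock_guard<std::mutex> lock(m_mx); // readers are serialized against writers too
    return m_value;
  }

private:
  mutable std::mutex m_mx; // mutable so that the const value() can lock it
  long               m_value;
};

// Runs two threads that each increment the same counter n times.
long run_two_writers(long n)
{
  shared_counter counter;
  auto work = [&counter, n]() { for(long i = 0; i != n; ++i) counter.increment(); };

  std::thread t1(work);
  std::thread t2(work);
  t1.join();
  t2.join();

  return counter.value();
}
```

Note that the mutex is itself declared mutable so that value() can be const: a benign use of logical constness, since the mutex exists precisely to make concurrent calls safe.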

In this way, no thread is allowed to modify a resource while another is partway through doing so. Data integrity is preserved. (Note: some operations can be carried out atomically, such as reading a (correctly aligned) 32-bit value on a 32-bit architecture, without requiring use of synchronization. If you're interested in reading up on some of the weird and wonderful things you can do with atomic operations, check out Chapter 10 of Imperfect C++ [2].)

Now consider the situation where there are one or more threads modifying the resource, and one or more threads only reading from it. Clearly, the readers' access also needs to be serialized, so that they do not end up reading halfway through a writer's modification.

But here's where it gets interesting. Consider if there are no writers, just readers. Since nothing is going to change the resource, the readers do not need to have their access serialized. It is perfectly safe for one reader to be partway through reading the resource while another reads from it. In such a circumstance, we're not just limited to a few bytes in memory, or a structure, or an instance of a single class. We can read from the contents of all of the parts of memory that we "know" are not going to change, without worrying about race-conditions. Because all threads within a process share the process memory, the 'resource' can include pointers, and pointers to pointers, and so on. (Indeed, this can even apply to threads within different processes if they are operating on shared memory. This requires extreme care, however, since the views on the shared memory will have to be mapped to the same address in all processes if there are pointers in the "resource." I'd advise you to give it a lot of thought before going down this path in your day job.)

Since C++ objects are, to the system, just memory like any other, we can safely share objects between threads if we know that they are not going to change. And we can know that a C++ object will not change by restricting ourselves to calling only its const methods. (This assumes it has no public data, of course. If it does, all bets are off.) Sounds marvellous, doesn't it? const has opened up a safer, simpler, and more efficient door into the multithreading world for us. Well, yes and no...
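Here is a sketch of the readers-only case, assuming C++11 threads: a hypothetical squares_table is fully built before any thread starts, and the threads then call only its physically const sum() method, with no mutex anywhere. It relies on the object being completely constructed before the threads are launched; in C++11, the completion of std::thread's constructor happens before the new thread starts running, which provides the needed ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical read-only table: all state is established in the constructor.
class squares_table
{
public:
  explicit squares_table(std::size_t n)
    : m_values(n)
  {
    for(std::size_t i = 0; i != n; ++i)
    {
      m_values[i] = i * i;
    }
  }

  // Physically const: reads m_values and changes nothing, so concurrent
  // calls need no synchronization as long as no thread ever writes.
  std::size_t sum(std::size_t first, std::size_t last) const
  {
    std::size_t total = 0;
    for(std::size_t i = first; i != last; ++i)
    {
      total += m_values[i];
    }
    return total;
  }

private:
  std::vector<std::size_t> m_values;
};

// Four reader threads share one const table with no locking at all.
std::size_t sum_in_parallel(squares_table const &table, std::size_t n)
{
  std::size_t partial[4] = { 0, 0, 0, 0 }; // each thread writes only its own slot
  std::vector<std::thread> readers;
  for(std::size_t t = 0; t != 4; ++t)
  {
    readers.emplace_back([&table, &partial, t, n]() {
      partial[t] = table.sum(t * n / 4, (t + 1) * n / 4);
    });
  }
  for(std::size_t t = 0; t != readers.size(); ++t)
  {
    readers[t].join();
  }
  return partial[0] + partial[1] + partial[2] + partial[3];
}
```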

Threading Dangers

I recently worked on a large-scale network infrastructure project, consisting of cooperating processes communicating via TCP and TIBCo's EMS message queueing middleware. The purpose of the processes on which I worked is to route the (EFTPOS-based financial protocol) messages based on their type, content, and other factors (such as time of day), between "legacy" mainframes and a new transactional system. Without getting bogged down in the details, think of the Messages as being read from the communication streams via a message factory. This factory—implementing the ImessageFactory interface (according to the Abstract Factory pattern [8]) —is effectively a Singleton, being created in main(), and passed down to the various subsystems, some of which execute in different threads.

So, at various points during the lifetime of the multithreaded process, the message factory will be called upon, via its CreateMessage() method, to read the communication stream (or a memory stream), and create a corresponding Message instance. These calls may come from any thread.

Reading a message's contents from the stream (or buffer) and instantiating the Message instance can take a nontrivial amount of time, so for performance reasons, it was desirable to allow multiple threads to be in the process of reading and creating messages concurrently, aiding throughput. This means that the CreateMessage() method does not have any kind of thread synchronization.

For this to work, no method on the factory may alter its state. Since CreateMessage() is the only method in the interface, we just need to ensure that it does not affect any of the message factory's state. So when we made this design decision, the very first thing we did was to declare CreateMessage() const, so that the compiler could help us out if we'd missed something obvious. We hadn't. (Naturally, we also undertook a mini code review at that time, but the const-ification gave us a good level of confidence.) The message factory class (called MessageFactory, would you believe?) does all its work in the constructor, because all the message types are defined in the financial protocol specification, and are therefore known at compile time.

Hence, we have what appears at first glance to be a dangerous thing: a class with no thread-synchronization protection, shared between several threads. It is safe because none of the code running in those threads calls a method that affects the state of the factory, and we can be sure of that because the only method they call is const.
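To make the design concrete, here is a heavily simplified sketch, assuming C++11. The MessageFactory, Message and CreateMessage() names come from the project described above, but every signature, the 4-character type codes, the creator functions and the create_concurrently() helper are hypothetical stand-ins. The table of creator functions is populated once, in the constructor; after that, CreateMessage() only reads it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical message: just a type name and a payload.
struct Message
{
  std::string type;
  std::string body;
};

typedef std::unique_ptr<Message> (*creator_fn)(std::string const &body);

std::unique_ptr<Message> create_auth(std::string const &body)
{
  return std::unique_ptr<Message>(new Message{"AUTH", body});
}
std::unique_ptr<Message> create_reversal(std::string const &body)
{
  return std::unique_ptr<Message>(new Message{"REVERSAL", body});
}

// All state is established in the constructor; thereafter it is read-only.
class MessageFactory
{
public:
  MessageFactory()
  {
    m_creators["0100"] = &create_auth;     // hypothetical type codes
    m_creators["0400"] = &create_reversal;
  }

  // Physically const: only reads m_creators, so it can be called
  // concurrently from many threads without synchronization.
  std::unique_ptr<Message> CreateMessage(std::string const &raw) const
  {
    if(raw.size() < 4)
    {
      return std::unique_ptr<Message>();
    }
    // Hypothetical wire format: 4-char type code followed by the body.
    std::map<std::string, creator_fn>::const_iterator it = m_creators.find(raw.substr(0, 4));
    if(it == m_creators.end())
    {
      return std::unique_ptr<Message>();
    }
    return it->second(raw.substr(4));
  }

private:
  std::map<std::string, creator_fn> m_creators;
};

// Usage: several threads share one factory, with no locking.
std::vector<std::string> create_concurrently(MessageFactory const &factory,
                                             std::vector<std::string> const &raws)
{
  std::vector<std::string> types(raws.size()); // one slot per thread
  std::vector<std::thread> threads;
  for(std::size_t i = 0; i != raws.size(); ++i)
  {
    threads.emplace_back([&factory, &raws, &types, i]() {
      std::unique_ptr<Message> m = factory.CreateMessage(raws[i]);
      types[i] = m ? m->type : std::string("?");
    });
  }
  for(std::size_t i = 0; i != threads.size(); ++i)
  {
    threads[i].join();
  }
  return types;
}
```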

But, correct as that rationale is, it should now be obvious to you that it's only sound because CreateMessage()'s const modifier indicates physical constness. Imagine a different system, where we might plug in message creation functions dynamically at runtime, perhaps by loading modules. We might be tempted to use mutable, and thereby have only logical constness. We would then have a race condition on our hands, and the time between failures in production would drop from weeks/months/years (hopefully years, but the system has only been in production for a month so far) to minutes/hours.

Table 1

Synchronized access to method?   Constness of method   Safe?
No                               Nonconst              No
No                               Logically const       No
No                               Physically const      Yes
Yes                              Nonconst              Yes
Yes                              Logically const       Yes
Yes                              Physically const      Yes

I think it's fair to say that, in multithreaded contexts, the possibility of logical constness makes relying on constness alone a potentially hazardous tactic, and one that should be used sparingly. Table 1 summarizes the situation for a class method in a multithreaded scenario, assuming the class has no other methods that are not physically const; naturally, any nonconst method can easily lead to mutating changes, mandating the use of synchronization. (Even more important: the class must not have public member variables or friends. Either of those and all bets are off, and not just for our const technique; even synchronization won't help you there!) There's one last caveat: the const methods must not touch unsynchronized shared state outside the purview of the class.
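If logical constness can't be avoided, the method has to pay for its mutable state with synchronization, which moves it into the "Yes / Logically const" row of Table 1. The sketch below revisits the lazy C-string idea with a hypothetical locked_string_view class (C++11): the mutable cache is guarded by a mutable std::mutex, so concurrent first calls to c_str() can neither both build the copy nor observe a half-built one. The c_str_from_threads() helper is likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical string view whose lazily built C-string is guarded by a mutex.
class locked_string_view
{
public:
  locked_string_view(char const *p, std::size_t n)
    : m_ptr(p)
    , m_len(n)
    , m_cached(false)
  {}

  // Logically const, and safe to call concurrently: the check-then-create
  // sequence on the mutable cache happens entirely under the lock.
  char const *c_str() const
  {
    std::lock_guard<std::mutex> lock(m_mx);
    if(!m_cached)
    {
      if(0 != m_len)
      {
        m_cache.assign(m_ptr, m_len); // the one and only mutation
      }
      m_cached = true;
    }
    return m_cache.c_str(); // stable: m_cache is never modified again
  }

private:
  char const          *m_ptr;  // viewed slice (not owned)
  std::size_t         m_len;
  mutable std::mutex  m_mx;    // mutable so that the const c_str() can lock it
  mutable std::string m_cache; // lazily created null-terminated copy
  mutable bool        m_cached;
};

// Calls v.c_str() from n threads at once; each thread writes only its own slot.
std::vector<char const *> c_str_from_threads(locked_string_view const &v, std::size_t n)
{
  std::vector<char const *> results(n);
  std::vector<std::thread> threads;
  for(std::size_t i = 0; i != n; ++i)
  {
    threads.emplace_back([&v, &results, i]() { results[i] = v.c_str(); });
  }
  for(std::size_t i = 0; i != n; ++i)
  {
    threads[i].join();
  }
  return results;
}
```

The price is a lock on every call, even long after the cache has been built; std::call_once is one alternative for one-time initialization. Either way, it is the synchronization, not the const, that makes this method safe to share.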

From a practical perspective, what's a programmer to do? There are a few measures that can be taken to avoid the problem. In a multithreaded development, for classes that need to be safely shareable between threads:

  1. By default, synchronize access to all nonstatic methods.
  2. Only in classes whose nonstatic methods are all const, consider omitting synchronization.
  3. From the classes that meet criterion 2, exclude any that have mutable members.

In addition, always subject your multithreaded designs to code review by your peers. (This is pretty good advice for all developments, provided the commercial exigencies of your organization permit.)

There's one trick that can help you discover such uses. In multithreaded builds, you can define mutable out of existence, as follows, whereupon the compiler will reject any const method that modifies a formerly mutable member (the only ones that matter, anyway):

#ifdef ACMESOFTWARE_BUILD_IS_MULTITHREADED
# define mutable
#endif /* ACMESOFTWARE_BUILD_IS_MULTITHREADED */

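This still doesn't help you find where const members have been mutated by const_cast, though. Sigh. (Note also that the standard does not permit a translation unit that includes a standard header to define a macro with the same name as a keyword, so treat this as a temporary diagnostic build rather than something to ship.)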

Conclusion

const is a wonderful tool and, when used in its physical constness guise, is an entirely positive thing. However, logical constness, though also very useful and generally a positive contribution to the language, does not always mix well with multithreading: it can lead to subtle race conditions, or else require locking in some or all of a type's ostensibly nonmutating methods.

Walter Bright reminds me that many of the features of C++ have made it into other languages, yet const has not. Walter wonders whether that's because the promise of const is not fulfilled. Since I'm a big fan of const, I think it may be because it's misunderstood by many programmers, and also because it complicates the implementation of compilers.

Acknowledgments

Thanks to Bjorn Karlsson, Garth Lancaster, John Torjo, and Walter Bright for their excellent criticisms and suggestions.

About the Author

Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books: a guide to extending the STL, and, with Walter Bright, an introduction to the D programming language. Matthew can be contacted via http://imperfectcplusplus.com/.

Notes & References

[1] Stroustrup, Bjarne. The C++ Programming Language, Special Edition, Addison-Wesley, 2000.

[2] Wilson, Matthew. Imperfect C++, Addison-Wesley, 2004. I can't recommend this book highly enough! :-)

[3] STLSoft is an open-source organization whose focus is the development of robust, lightweight, cross-platform STL-compatible software, and is located at http://www.stlsoft.org/. The basic_string_view component will be available from Version 1.8.3 onwards, which is due for release in late February 2005. It is also included in v1.8.3 Beta 1, which is available now.

[4] D is a new systems programming language, created by Walter Bright (of Digital Mars; http://www.digitalmars.com/), which merges many of the best features of C, C++, and other advanced languages. Walter and I are about to write a book on it, entitled D Programming Distilled, for Addison-Wesley.

[5] Austern, Matt. Generic Programming and the STL, Addison-Wesley, 1999.

[6] Musser, David, Atul Saini, and Gillmer Derge. STL Tutorial and Reference Guide, 2nd Edition, Addison-Wesley, 2001.

[7] Meyers, Scott. More Effective C++, Addison-Wesley, 1996.

[8] Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns, Addison-Wesley, 1995.

