Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

The Boost Multi-Index Containers Library


September, 2004: The Boost Multi-Index Containers Library

Joaquín Ma López Muñoz is a telecommunications engineer from Madrid Polytechnic University working at Telefónica, Investigación y Desarrollo. He is also the author of the Boost Multi-index Containers Library. He can be reached at [email protected].


The choice of a particular container among those provided by the STL is frequently driven by the access interface that best suits your needs. Sometimes, however, no single interface fits the bill—a particular constraint of one part of your design might call for a sorted view to the data, while you need a different sorting or a list-like interface somewhere else. The Boost Multi-index Containers Library [1], or Boost.MultiIndex for short, is a convenient framework that provides, according to your specification, multiple, distinct interfaces to a single underlying container. The result is a container that mimics the public interface of each of the various STL containers that correspond to your specification.

Each of the interfaces you specify for a multi-index container is called an "index" of the container. In most respects, indices [2] do not differ from the interface of the equivalent STL classes that they mimic, so you will quickly learn to use the library once you understand how to specify multi-index containers. Each index behaves like a separate container with its own interface, but the data isn't duplicated and all of the indices are synchronized automatically.

What's in a Container?

A container can be regarded as playing two distinct roles with respect to the elements it holds. First, it takes ownership of the objects managed, controlling the insertion and removal of elements. Second, it provides a public access interface to these elements, which determines how they are iterated and which operations can be performed on them. I refer to these two roles as "containment" and "access." STL containers do not make this distinction clear, as they provide a monolithic interface that takes on both responsibilities. However, considering containment and access as separate concerns in your program can make a lot of sense when designing data structures that depart from the trivial.

In some situations, it is desirable to handle the same collection of elements by means of different access interfaces, each of them suited to the needs of a particular part of the program. This involves somehow weakening the strong association between containment and access. There are two usual approaches to do so:

  • Views, as implemented, for instance, by the VTL library [3] or Maciej Sobczak's view library [4], let you adapt a container to a more suitable interface. For example, an std::vector can be adapted by a sorted view to obtain ordered access to the elements.
  • The element ownership imposed by the STL containers can be relaxed by having containers store (smart) pointers to the actual objects. Several access interfaces to the same data can be set up using different containers of pointers to the same objects.

These approaches present a fundamental problem: Without further effort from the programmer, synchronization among the various parts of these assemblies is lost as soon as an element is added or removed. The different access interfaces are not automatically aware of changes in the underlying collection and cannot impose any criterion on whether to permit a given element to be added. Instead, you have to manually keep the whole data structure in sync and ensure that only elements that maintain consistency in the whole assembly are added, which leads to tedious and error-prone code, especially if the program must be exception-safe. Boost.MultiIndex offers a solution to this problem.

Boost.MultiIndex in Action

Assume you are writing a payroll application in which employees are modeled with a class called, sensibly enough, employee. All the employees of the company are stored in a container that is sorted according to employee IDs, which are unique across the company. std::set is a perfect choice for this task; see Listing 1.

Now, suppose a requirement is added to supply functions print_by_name and print_by_ age producing listings of employees sorted by name and age, respectively. The order provided by company is based on the employee ID, so you'll have to set up an additional data structure (a vector of pointers to the employees, perhaps) and sort it by name or age in order to produce the corresponding listing. If the operation is done locally inside the functions, and these are called frequently, the incurred overhead can be large, so a slightly better solution is to maintain these additional data structures as global objects and take care that they are synchronized with the main company container. Boost.MultiIndex can handle this kind of situation in a more convenient manner through the class multi_index_container. Each access interface—the main one based on employee ID, plus one for sorting by age and another for sorting by name—is specified as a distinct index of the same multi-index container, as in Listing 2.

The core of a multi_index_container is described inside the indexed_by construct. In the case of company, three indices are specified, as the notation hopefully suggests, based on employee's operator <, comparison by employee::name, and comparison by employee::age, respectively. Accessing the different indices is accomplished by means of a simple get interface as shown in Listing 3. Like most things in C++, indices are numbered from zero, so cmp.get<1>() retrieves a reference to the second index, the one sorted by name. print_by_age would be implemented in the same way, but accessing the data through index 2 (the third one) instead. Each index in our example provides a public interface practically identical to that of an std::set, so your knowledge of the STL can be applied directly.

By design, multi_index_container inherits the public interface of index 0, so that, for instance:

cmp.begin();

is the same as:

cmp.get<0>().begin();

This special rule is provided as a convenient shorthand notation when migrating from legacy code to Boost.MultiIndex. In this example, company started out as an std::set and later changed to a multi_index_container. Most of the original code dealing with company defined as an std::set will work automatically after the change against index 0, without the need to insert get<0>() throughout.

It can be cumbersome to access every member function of an index using the get mechanism. Instead, you can retrieve and repeatedly use a reference to the index, as in Listing 4. Note that get returns a reference to the index being retrieved, and not a copy: Index objects cannot be constructed separately from their container and thus they cannot be copied.

// common error: the following fails as indices are not copyable
company::nth_index<1>::type name_index = cmp.get<1>();

One Container, Several Access Interfaces

In light of the previous discussion on separating containment from access, multi_index_container can be regarded as a container controlling a single collection of elements while providing several access interfaces to them. Unlike similar constructs based on views or containers of pointers, multi_index_container indices are aware of additions and deletions of elements, and can even ban the acceptance of a new element so that the semantic integrity of the container is maintained. In Listing 5, for instance, the last insertion won't succeed because index 0, which was declared as ordered_unique, preserves the uniqueness of IDs in the company. The interesting thing to note about this is that the insertion is not done through index 0, but rather through an index based on employee::name, which does not declare any uniqueness constraint. This may well be the strongest feature provided by Boost.MultiIndex: Indices are not just passive channels to the underlying data; they also impose their own constraints on the container. multi_index_container maintains your semantic restrictions automatically, which saves you a lot of trouble.

Ordered Indices

Let's examine in more detail how a multi_index_container is specified. The first template parameter is the type of the elements, much as in any STL container. The second template parameter, which is built with the indexed_by construct, describes the indices the container will provide.

In the example, I used two types of indices:

  • ordered_unique indices are sorted according to a given key, and no elements with the same key are allowed.
  • ordered_non_unique indices resemble the former, but they accept duplicates.

Ordered indices borrow the concept of a key from relational databases, conveniently abstracted to suit C++: A key is a piece of information extracted from the element through a key extractor. In the example, the identity extractor specified that the whole employee object is the key, which is how std::set works by default. The member extractor was used to stipulate that employee::name and employee::age are the keys for indices 1 and 2, respectively. The syntax of member is a little contrived, but conceptually simple. The three parameters required are, respectively: the element type, the key type, and a pointer to the member that provides the key [5].

Boost.MultiIndex provides other key extractors, and you can even specify your own should the necessity arise. An ordered index maintains its order based on the default comparison operator for the associated key or on a comparison predicate you supply. This is similar to how std::set accepts a comparison predicate that defaults to std::less. To illustrate this, consider the alternative definition of company with ages sorted in descending order in Listing 6.

The whole interface of std::set is replicated by ordered indices, with one important difference—query operations such as find or lower_bound, accept as the search parameter a key object rather than an entire element value; see Listing 7. Note that find is passed the string "Susan Kovac", and not an employee object with that name, as would happen with an std::set. This is more efficient because query operations only need the key [6].

Sequenced Indices

Ordered indices are not the only ones provided by Boost.MultiIndex. It also has sequenced indices, which are the alter ego of std::lists. Sequenced indices are not key based, so their specification is simpler than that of ordered indices. Consider a word-processing application in which each word in the text is stored in an std::list of strings:

typedef std::list<std::string> text;

If you equip the word processor with a means to compute the occurrences of a word in the text, you can write:

std::size_t occurrences (const text & txt, const std::string & word)
{
  return std::count(txt.begin(), txt.end(), word);
}

This is not an ideal implementation since counting the occurrences of a given word takes linear time on the total number of words, which is not acceptable if the word processor is to handle large documents. Now consider the definition in Listing 8 of text based on multi_index_container. Index 0 behaves much like std::list, so that, for instance:

for(text::iterator it = txt.begin(), end = txt.end(); it != end; ++it){
  std::cout << *it;
} 

produces the same output as an iteration over the original std::list (this example took advantage of the aforementioned rule that allows get<0>() to be omitted.) The interesting change lies in the specification of an additional ordered index, which simplifies and dramatically increases the performance of the occurrences function (see Listing 9).

Index 1 keeps the words sorted in alphabetical order, so counting word occurrences can be implemented in logarithmic time. The combination of a sequenced and an ordered index lets you retain the sequential nature of the words in the text while having an efficient query interface, too. Most uses of sequenced indices derive their utility from such combinations [7].

Updating

Indices take as much inspiration as possible from equivalent STL containers, but occasionally expand on those interfaces. Having the privilege to revisit the functionality provided by the STL allows for the addition of some extra features. Updating is one such addition. Elements of an associative container (such as std::set) are not mutable, and with good reason; if you were allowed to modify an element of an associative container, its basic invariant, preserving the order of the elements, could be breached. So, changing an element requires erasing it and inserting a modified copy. This is cumbersome and has several nasty complications. For instance, iterators to the original element become invalid. Also, if the modified object is no longer unique, you will not be able to insert it, leaving the container with neither the original nor the modified element.

All types of indices in Boost.MultiIndex feature the member function replace, which allows for the atomic updating of an element; see Listing 10. The call to replace first checks the new object against the constraints imposed by all indices. If it violates no constraints, then replace replaces the current element with the new object; otherwise, it returns false. The operation is atomic and exception safe, and iterator and reference validity are preserved.

The aforementioned code is not particularly efficient because it involves a preliminary copy of the employee to be updated and an assignment operation inside replace. If elements are expensive to copy, you could find this unacceptable. There exists an alternative updating mechanism called modify that avoids the need to incur such overhead at the expense of a more complicated interface and different semantics in case of failure. (Refer to the library documentation for further details.)

Complexity and Efficiency

Briefly put, Boost.MultiIndex is as efficient as it can be. Each index maintains its own internal data structure, so the complexity estimate of an insertion or removal operation is just the sum of the individual contributions by all indices of the container. For instance, if you have a multi_index_container with two ordered indices, inserting an element has the same algorithmic complexity as two insertions in an std::set.

On the other hand, read-only operations such as searching, only involve the index used for that operation, so their complexity is the same as the equivalent operation in an STL container.

As for memory consumption, Boost.MutiIndex outperforms equivalent constructs based on views or compositions of containers. The internal information maintained by all indices is stored in a single node per element, while a composition of containers allocates as many nodes per element as the number of containers involved. This optimization has a beneficial impact on the amount of memory required. The reduced number of allocation calls can also result in significantly faster execution times, especially if the memory routines provided by the C++ Standard Library are not very efficient.

Other Features

There are many other Boost.MultiIndex features that should at minimum be mentioned:

  • Index retrieval can be made more descriptive by using an alternative version of the get mechanism that accepts tags instead of numbers. A tag is any C++ type whose name may be used as a mnemonic for the associated index.
  • Boost.MultiIndex provides extractors for retrieving calculated keys. Rather than a particular data member of the element object, these keys are the value returned by a given member function.
  • There is a notion of a composite key modeled after the analogous concept found in relational databases. A composite key aggregates the results of several elementary key extractors.
  • The query interface (find, lower_bound, and so on) of ordered indices has been augmented to support searching based not only on keys, but also on compatible keys. A compatible key is any type for which comparison with values of the original key type is defined. Currently, STL associative containers impose the somewhat arbitrary restriction that the search parameter be a value of the key type, which makes some otherwise simple operations infeasible.
  • Boost.MultiIndex has a so-called safe mode, intended as a debugging aid. Safe mode detects at runtime many common errors in user code, such as attempts to use invalidated iterators or incorrect mixing of iterators with containers they don't belong to.

Conclusion

Boost.MultiIndex is a library designed to solve the recurrent need for different access interfaces to the same collection of elements. Its design is strongly based on that of the STL to allow interoperability with the standard facilities and to help the user become familiar with it rapidly. The benefits obtained from using Boost.MultiIndex, as compared with hand-made compositions of STL containers, are simplicity and robustness. Perhaps more importantly, the library provides a certain economy of expression in the definition of complex data structures, so that the resulting code more accurately reflects the underlying design. The library comes equipped with a number of additional features that are regarded as useful extensions to the interface provided by STL containers. These extensions can make Boost.MultiIndex a convenient choice even for single-index containers.

Currently, Boost.MultiIndex supports ordered indices, behaving like std::set, and sequenced indices, modeled after the interface of std::list. These types of indices can be freely mixed to compose arbitrarily complex instantiations of multi_index_container.

Boost.MultiIndex recently shipped with Boost 1.32, and the evolution roadmap for the following months is currently under specification. If you'd like to contribute your experience to the use of the library and ideas for future extensions, please contact me or engage in the Boost mailing lists [8][9].

References

  1. [1] Boost Multi-index Containers Library (http://www.boost.org/libs/multi_index).
  2. [2] An interesting discussion arose in the Boost mailing list about which plural form is preferred, "indices" or "indexes." Although the latter seems to be somehow more prevalent in modern English, I chose "indices" if only as a token of snobbism.
  3. [3] Powell, G. and M. Weiser. View Template Library (http://www.zib.de/ weiser/vtl).
  4. [4] Sobczak, M. "STL Sequences & the View Concept," C/C++ Users Journal, April 2004.
  5. [5] If you think specifying the element type in member is redundant since it has already been declared for the multi_index_container, you're right. Unfortunately, there is no easy way to eliminate this redundancy and keep the instantiation syntax generic. Also, specifying the key type seems unnecessary as it can be deduced from the pointer to member parameter. The sad fact is that the language does not provide a means to extract this implicit information: The duplication could only be avoided through some sort of typeof facility. Maybe C++0x will allow for a simplification of the syntax of member.
  6. [6] Actually, this is a consistent approach if you consider std::set as a degenerate case of an ordered index where the key is the entire object.
  7. [7] However, there is nothing that prevents us from defining a multi_index_container composed of, say, two sequenced indices. This construct would behave as a pair of lists sharing their elements.
  8. [8] Boost developers mailing list (http://lists .boost.org/mailman/listinfo .cgi/boost).
  9. [9] Boost users mailing list (http://lists.boost.org/ mailman/listinfo.cgi/ boost-users).


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.