yasli::vector Is on the Move

By Andrei Alexandrescu, June 01, 2004

According to Andrei, what std::vector cannot do well or at all, the legendary yasli::vector does like a champ.

June, 2004: yasli::vector Is on the Move

Andrei is a graduate student in Computer Science at the University of Washington and author of Modern C++ Design. He can be contacted at andreimetalanguage.com.

If the general mood at this Spring's Software Development Conference was any indication, C++ is in very good shape. The C++ track has been the strongest of the conference and the main reason for the unexpectedly high attendance. Old and new projects use C++ galore, and programmer enthusiasm in advanced C++ techniques is on the rise. (Even I felt jaded in comparison!) Moreover, I got word that C++ programmers are the most sought after, that C++ consultancy requests have suddenly ramped up, and that C++ books are selling best. I'm not sure why that is the case; proponents of other languages consider C++ an "efficient language" as opposed to other languages that are dubbed "productivity languages." Oh well.

Recently, there's been a surge of questions and discussions on the Usenet newsgroup comp.lang.c++.moderated about using std::vector effectively. It turns out that std::vector's current interface and implementation need some enhancements if we are to use it as an effective replacement for the built-in array, and even more enhancements if we are to use it multidimensionally (vector of vectors). Such needs have also been discussed in past articles [2] and in Herb Sutter's book [8].

What std::vector cannot do well or at all, yasli::vector does like a champ. The legendary YASLI (Yet Another Standard Library Implementation) is, by definition, the best implementation of the C++ Standard Library around. YASLI uses advanced C++ implementation techniques, optimizations, and system and architectural assumptions to achieve performance numbers that smoke any other implementation around, and in particular the one you might be currently using. Unfortunately—and here's where the "legendary" bit kicks in—YASLI, developed in Yasland, is not available to anyone, not even to yours truly. All we can do here on Earth is to write approximations of it. With that in mind, this month I present a subset of yasli::vector, and focus on the data moving aspect of its implementation.

Mailcontainer

Following my presentations at the aforementioned conference, there's been a lot of good feedback. First, many people liked flex_string [5] a lot. Given that the version published together with the original article had some bugs, I finally decided to publish it at http://moderncppdesign.com/. Click on "Code" and you can download the latest and greatest flex_string. I've also gotten favorable real-world usage data from Harmut Kaizer, who wrote me that in his C++ preprocessor [7], simply replacing std::string with flex_string boosted (no pun intended) speed by 5 to 10 percent, depending on input.

What They Wouldn't Want You to Know About std::vector

Do you want to know what they don't want you to know about std::vector? Do you know why they are afraid that the truth will escape out one day for everybody to know?

At the risk of being kidnapped and confined by them at a secure, top-secret location, I'll tell you now. The biggest problem with std::vector is that it conflates two different concepts—data moving and data copying.

There. I said it.

std::vector is overly generous with copying data around, when most of the time moving is the way to go. The untold assumption that the Standard spelling of std::vector makes is that copying vector elements around is not a major issue, and is not part of the asymptotic behavior (it's traditionally considered O(1)). That makes for nice numbers on paper, while the situation in the trenches might not be as bright as it could—and should. There are ways around that; they are not perfect, but they go a long way. Part of the solution is to define traits that give insights into the vector's element type, and also to use ideas from Mojo [6], which differentiates between copying and moving.

"Moving" means that I have an object obj at a location adr and I want to move it to a different address adr1. Now, there are four variations, depending on the pre- and post-move situation:

Move an object to an uninitialized location, and consider that the former seat of the object is junk and will never be read again.
Move an object to a location that already contains a valid object, after which the old location is junk.
Move an object to an uninitialized location, but after the move, consider the old location adr to still contain an "empty" object that might not hold the same state, but is still usable and will ultimately be destroyed elsewhere.
Move an object to a location that already contains a valid object, and consider the old location to still contain an "empty" object.

Because I'm an equal-opportunity name-coiner, let's call the four versions (1) destructive move, (2) destructive move assignment, (3) nondestructive move initialization, and (4) nondestructive move assignment. First, let's review the reasons (already mentioned in many places [4]) for which a move can often be less expensive than a copy. Oftentimes, a class just holds pointers or handles to expensive-to-duplicate resources, such as large chunks of memory or OS resources or whatnot. To C++, moving (reseating) is not a fundamental operation, and therefore std::vector typically implements it as a copy followed by the destruction of the source. During the copying, resources might be duplicated; but that's not needed, because in the next step, the just-duplicated resources will be destroyed. Obviously, it is more efficient to simply pass the "baton" from the source to the destination.

So the central concept is to differentiate between copying and moving data. However, it's all in the details. My previous articles [2,3,4] focused on moving data using memmove. The key questions in implementing typed buffers was, what conditions does a type have to meet to be moveable by simply copying its bits by using memmove(&obj, ptr1,sizeof(obj))? In theory, nothing but Plain Old Data (POD) types can be moved with memmove. POD types are, roughly speaking, all the types that would make sense in C: fundamental types, arrays thereof, and simple structs without embellishments such as virtuals or private data. In reality, it turns out that many types are moveable, but not all. Types that qualify don't have internal pointers (pointers that point to addresses inside the object would be totally messed if the object is simply memmoved). The most dangerous internal pointers are those that are also hidden—data added by the compiler (such as the pointers added by some compilers in the case of virtual inheritance). But then again, if we were to just follow the Standard, moving any fancy class via memmove is a no-no; the possibility of memmoveing classes is based on arguments like, "Yeah, it's illegal, but the compiler would be really crazy to cause any trouble."

But focusing on whether or not a type can be moved with memmove was simply wrong. The Yaslanders understood very well that the real focus is not to decide on memmoveing, but rather to define and follow a protocol between vector<T> and its hosted type T. Then, YASLI defines a default conservative implementation of the protocol. Any type that wants to achieve efficiency with vector only needs to override the default protocol implementation with the one of choice. In particular, that implementation could use memmove. If memmove doesn't work with a particular class, there still are other efficient ways. It's that simple.

yasli::vector's Protocol

Why all the fuss about moving data versus just copying it, you may ask? It can't be too bad, could it? Well, how about an improvement of about 23. And that's not percentage, it's times; that is, insertion using memmove, as implemented and measured by Howard Hinnant in CodeWarrior Pro 9 against a vector<string> (100 elements, 20 characters each) is 23 times faster than the politically correct version that copies strings around.

So, let's do a bit of analysis: We need to define a protocol for moving objects that yasli::vector will use. The protocol should have the following properties:

Genericity. We need the protocol to work for any type that yasli::vector can ever be instantiated with.
Flexibility. We'd like the protocol to be customizable so that we can implement it in various ways for types with various properties.
Reasonable defaults. This is an important point. The protocol must implement a conservative, reasonable default that works for any type. We don't want to require anyone using yasli::vector to define some special functionality.
Robust. That means the protocol should be easy to implement and use correctly, and hard to implement and use incorrectly. Robustness is somewhat in tension with reasonable defaults; if you implement the protocol incorrectly, there is a risk that the default protocol will be executed for your type without you realizing it.

Now, let's brainstorm some designs that would meet these requirements. For simplifying the discussion, I'll focus here on only one (destructive initialization) of the four move scenarios discussed earlier. The simplest approach is to arrange things such that yasli::vector calls a template function:

// yasli_protocols holds all of
// user-overridable portions of YASLI
namespace yasli_protocols
{
   template <class T>
   T* destructive_move(
      T* begin,
      T* end,
      void* dest);
}

The charter of destructive_move is to move the data range (begin, end) to a chunk of memory pointed to by dest. Then it returns dest, cast to T* so as to reveal the new type of the destination. After the operation has completed, the source range is considered to be bits; that it, it has lost type—hence the "destructive" particle in the function name. The default implementation of destructive_move would do the conservative thing: It copies the data element by element, then destroys each element in the source. Then, vector would use destructive_move religiously whenever it comes about moving data around. The users of vector can define their own overloads of destructive_move to implement efficient moves for their own types.

Unfortunately, this turns out to be a bad design that fails the flexibility and robustness test. In fact, it is the route taken by std::swap, with disastrous results. The short version of an explanation is that relying on function overloading is the wrong way to go due to the way name lookup works; I've erased all traces of that trauma from my memory, but you are welcome to read the details [1].

A nicer, tamer way to do things is to rely on a template class that defines a number of functions. Template classes obey simpler rules with regard to name lookup and instantiation. Here's how the declaration of such a protocol would look:

namespace yasli_protocols
{
   // that's more like it
   template <class T>
   struct move_traits
   {
      static T* destructive_move(
         T* begin,
         T* end,
         void* dest);
   };
}

Now yasli::vector<T> will use move_traits<T>::destructive_move whenever it comes to moving data around. Any type can simply specialize move_traits<T>::destructive_move to perform the move in some efficient way. The first candidates that come to mind would be primitive types such as integers or pointers, and here's how the specializations would look:

namespace yasli_protocols
{
   template <>
   struct move_traits<int>
   {
      static int* destructive_move(
         int* b,
         int* e,
         int* d)
     {
       return (int*)memmove(d,
         b,
         (e - b) *sizeof(int));
     }
};
template <class T>
struct move_traits<T*>
{
   static T** destructive_move(
      T** b,
      T** e,
      void* d)
   {
     return (T**)memmove(
        d,
        b,
        (e - b) * sizeof(T*));
   }
 };
}

The first is a total specialization applying to int alone, while the second is a partial specialization that applies to all pointer types. Nothing special (no pun intended). But there are two less-than-perfect things about this design. One is that there's some duplication creeping up. But hey, the real code generated is just one line, and if worst comes to worst, you know that yours truly won't be coy about defining a, um..., macro. But the more disturbing problem is the feel of a brute force approach. We want to implement move_traits in a certain way for types that have a specific property (are moveable by using memmove, property that all primitive types possess). We end up achieving that by implementing move_traits for each type sporting that property, and that doesn't bode well. Later, we may want to implement move_traits in a special way for all types that implement the Mojo protocol [6], and that set is simply not enumerable! In other words, this design is not very good at fulfilling the flexibility requirement.

Looks like relying on overloading or template specialization both have shortcomings. Fortunately, although we don't have full access to the Yaslander technology, they do have full access to ours. In particular, they've read an article of important impact, fatefully entitled "Function Overloading Based on Arbitrary Properties of Types" [10], which presents a simple template enable_if that lets you control when a specific overload kicks in. This concept is so interesting, and applies to our needs of defining a protocol so well, it has a short section of its own (at the cost of plagiarizing)—but do read the mentioned article for the full story.

Controlling Overloading

Several C++ luminaries arrived independently at the same interesting conclusion: When resolving function overloading in the presence of templates, the compiler might try and silently abandon a lot of dead ends. For example:

void transmogrify(unsigned int) { ... }
template <class T>
typename T::result
transmogrify(T) { ... }
 ...
transmogrify(5);

When seeing such code, the compiler gives the second overload of transmogrify a chance by attempting to instantiate it with int. It doesn't take a strike of genius, though, to realize that int::result is not a type. The interesting bit is that the compiler does not issue a compile-time error at this point, but instead it just removes the function from the set of candidate functions that might apply. This phenomenon, coined as the Latin-sounding SFINAE (Substitution Failure Is Not An Error) [9], led to a simple technique of controlling overloading. First, we define a template class enable_if as shown here:

template <bool B, class T = void> 
struct enable_if
{
   typedef T type;
};
template <class T>
struct enable_if<false, T> {};

So, enable_if defines the type if the compile-time Boolean value passed is true, or it doesn't if the Boolean value passed is false. Now all you have to do is to use enable_if<expression, T>::type instead of T somewhere in the function signature, and voilà! The Boolean now controls whether the function is ever considered or not.

The applicability of this idea to move_traits is hinted to in the "Future Work" part of [10], which says:

template <class T, class U=void>
struct move_traits
{
   static T* destructive_move(
      T* begin,
      T* end,
      void* dest);
};

Now, say we want to specialize for all primitive types, which we can do in a single shot!

tempalte <class T>
struct move_traits<T,
   typename enable_if
      <!is_class<T>::value>::type>
{
   static T* destructive_move(
     T* begin,
     T* end,
     void* dest)
  {
   return (T**)memmove(
       d,
       b,
       (e - b) * sizeof(T*));
  }
};

Unless you're familiar with Boost's type traits, there's more than a mouthful in the aforementioned snippet of code—but nothing you can't handle. The expression is_class<T>::value is a compile-time Boolean constant that evaluates to true if T is a class; that is, not a built-in type. Conversely, !is_class<T>::value is true for all built-in types. When you mention move_traits<floats>, the compiler attempts to instantiate the more specialized version of move_traits. That instantiation succeeds because typename enable_if<!is_class<float>::value>::type evaluates to void and, as such, takes over.

Next, we can easily customize move_traits for all classes that implement the Mojo protocol [6]:

template <class T>
struct move_traits<T,
   typename enable_if
     <is_base_and_derived<mojo::enabled,
     T>::value>::type>
 {
 ...
};

We also can specialize move_traits on its first argument by matching std::complex. Most of the time, std::complex<T> can be moved using memmove. The only condition is that the compiler doesn't insert some funky additional data with a complex object (which no compiler I know of does), a condition that can be easily checked with a sizeof test:

template <class T>
struct move_traits<std::complex<T>,
   typename enable_if
   <sizeof(std::complex<T>) ==
      2 * sizeof(T)>
        ::type>
{
   ... use memmove ...
};

To conclude this section, defining a protocol as a two-parameters template class as shown and using enable_if to guide specialization scores high on all of our requirements.

Resizing Without Initialization

Here's another common request made by users of std::vector. Many programmers want to create an uninitialized vector because, for example, they need to fill it with a C-style API function. There is no way of doing that with std::vector, and allowing it is tricky because it would create a type-unsafe hole inside std::vector. What to do? The approach of yasli::vector is particularly elegant in that it offers initialization functions putting both the responsibility and the accountability in its user's hands. Consider, for example, this resizing functor:

// ... inside yasli::vector ...
template <class F>
void resize_nstd(size_type newSize, F functor);

This nonstandard resizing function resizes the vector. If the vector grows, resize_nstd doesn't initialize the newly added elements; instead, it calls F(range_begin, range_end), where range_begin and range_end are pointers pointing to the newly added range of elements. If F fails by throwing an exception, yasli::vector aborts the resizing. The vector's size stays the same, but it is possible that capacity has increased.

If you want to leave them uninitialized, write a do-nothing function and pass it in. There's no better way out of the "who's responsible and who's accountable" conundrum.

Guarded Assumptions

If you look around yasli::vector's source code, you'll notice a funny-looking piece of code that is called whenever an empty object of type yasli::vector is being initialized:

// ...inside yasli::vector...
   void init_empty()
   {
#if YASLI_UNDEFINED_POINTERS_COPYABLE == 1
      end_ = beg_;
      eos_ = beg_;
#else
      beg_ = 0;
      end_ = 0;
      eos_ = 0;
#endif
   }

The three members beg_, end_, and eos_ are the classic "begin, end, and end-of-storage" pointers that all implementations of std::vector hold as members. The interesting bit is the true branch of the #if-guarded code. If YASLI_UNDEFINED_POINTERS_COPYABLE is 1, then the vector is initialized by copying the uninitialized pointer beg_ to the other two pointers! How can that be even close to correct?

The empty state of a vector is that its begin, end, and end-of-storage pointers have the same value. However, the actual value of the three pointers is irrelevant! The test for emptiness spells beg_ == end_, not beg_ == 0. Traditionally, the three pointers are initialized with the singular value NULL. But on some architectures (such as Intel's Pentium family), you can copy and compare for (in)equality any uninitialized pointers, as long as you don't dereference them. Some other processors don't allow such manipulation of uninitialized pointers. On the former processors, you can get away with initializing a vector with only two assignments instead of three—a nice speed improvement (30 percent as measured on a Pentium 4).

Exploiting uninitialized pointer manipulation when YASLI_UNDEFINED_POINTERS_COPYABLE is 1 is a good example of an important piece of Yaslander technology: guarded assumptions. A guarded assumption is a system-dependent assumption about the way a hardware-compiler-runtime combo works. The "guard" is a preprocessor definition; if that definition is absent, then YASLI implements conservative standard behavior. If the definition is present, then YASLI is free to exploit that assumption to achieve better execution speed and/or pack data better, with code that invokes undefined behavior according to the C++ Standard. Other examples of guarded assumptions, not necessarily present in vector per se, are:

YASLI_REALLOC_AFTER_NEW. When this preprocessor symbol is defined as 1, then YASLI will assume that it is safe to call realloc on a pointer obtained with the global operator new.
YASLI_STANDARD_OBJECT_LAYOUT. This means that fields are laid out in an unsurprising manner—there are no hidden fields and the declaration order is respected.
YASLI_FLOATS_BITWISE_ZERO. This means that a floating-point value filled with zero bitwise will also evaluate to 0.

yasli::vector—Standard or Not?

So the important question that arises is, "Is yasli::vector compliant with the C++ Standard's prescription?" The answer is either a qualified yes or qualified no—depending, of course, on what you qualify the answer with.

First, if you grep yasli::vector's header file for "nstd," you will find a number of functions and methods what are, well, not standard, but that ought to be if std::vector's interface would allow for a maximally efficient implementation. Such an example is the yasli::vector<T>::move_back_nstd(T&) function that appends an element to the vector by moving the content of the passed-in object—as opposed to copying it.

Second, as said, yasli::vector commits to using the yasli_protocol::move_traits protocol. That, by default, ensures Standard-compliant behavior. If you provide your own implementation of the protocol, of course, yasli::vector will use that one and therefore deviate from the letter of the Standard.

So we could say that yasli::vector provides some additional functionality that is nonstandard. However, in doing that, it requires the cooperation of the user, either in the form of providing a custom implementation of yasli_protocol::move_traits, or in the form of compulsively calling functions that have "nstd" spelled as part of their name. That makes it easy for you to make your choices.

Conclusion

This column presented the data moving protocol used by yasli::vector. Per popular request, we might run another installment regarding other std::vector implementation goodies, such as enhanced checking features. The existing std::vector is of good quality, but fails to address a number of important points, such as memory friendliness, distinction between moving and copying, and resizing without initialization. The actual code, as stolen from the Yaslanders, is available from http://moderncppdesign.com/code, and your comments are welcome.

Acknowledgments

In addition to authoring (for CodeWarrior 9) the best std::vector implementation available (well, barring the unavailable YASLI version), Howard Hinnant provided excellent input for the article that, hopefully, you enjoyed. Many thanks to Eric Niebler, who provided a valuable review (plus reassurance that English verb tenses are hard).

References

[1] ISO/IEC IS 14882:1998(E). Issue 226 in C++ Standard Library Active Issues List (Revision 28). Available at http://gcc.gnu.org/onlinedocs/libstdc++/ext/lwg-active.html#226.
[2] Andrei Alexandrescu. "Generic Programming: A Policy-Based basic_string Implementation," C++ Experts Online, June 2001. Available at http://moderncppdesign.com/publications/cuj-06-2001.html.
[3] Andrei Alexandrescu. "Generic Programming: Typed Buffers (II)," C++ Experts Online, October 2001. Available at http://moderncppdesign.com/publications/cuj-10-2001.html.
[4] Andrei Alexandrescu. "Generic Programming: Typed Buffers (III)," C++ Experts Online, December 2001. Available at http://moderncppdesign.com/publications/cuj-12-2001.html.
[5] Andrei Alexandrescu. "Generic Programming: A Policy-Based std::string Implementation," C++ Experts Online, August 2003. Available at http://www.cuj.com/documents/s=7992/cujcexp1908alexandr/ alexandr.htm.
[6] Andrei Alexandrescu. "Generic Programming: Move Constructors," C++ Experts Online, February 2003. Available at http://moderncppdesign.com/publications/cuj-02-2003.html.
[7] Harmut Kaizer. "The Boost Preprocessor." Available from http://boost.org/.
[8] Herb Sutter. Exceptional C++, Addison-Wesley Longman, 2000.
[9] Daveed Vandevoorde and Nicolai Josuttis. C++ Templates, Addison-Wesley Longman, 2003.
[10]Howard Hinnant, Jaakko Jrvi, Andrew Lumsdaine, and Jeremiah Willcock. "Function Overloading Based on Arbitrary Properties of Types," C/C++ Users Journal, June 2003.

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

yasli::vector Is on the Move

Mailcontainer

What They Wouldn't Want You to Know About std::vector

yasli::vector's Protocol

Controlling Overloading

Resizing Without Initialization

Guarded Assumptions

yasli::vector—Standard or Not?

Conclusion

Acknowledgments

References

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

yasli::vector Is on the Move

Mailcontainer

What They Wouldn't Want You to Know About std::vector

yasli::vector's Protocol

Controlling Overloading

Resizing Without Initialization

Guarded Assumptions

yasli::vector—Standard or Not?

Conclusion

Acknowledgments

References

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content