FREE Subscription to Dr. Dobb’s Digest: Same Great Content, New Digital Edition
Site Archive (Complete)
C++
Email
Print
Reprint

add to:
Del.icio.us
Digg
Google
Furl
Slashdot
Y! MyWeb
Blink
November 01, 2005
Flexible C++ #13: Beware Mixed Collection/Enumerator Interfaces

Matthew Wilson
When the semantics of collection and enumerator interfaces are blurred, either through mistakes that we (the programmers) make, or by less-than-optimal design of the interfaces themselves, things can get hairy.
Flexible C++ #13

This month I take a look at some of the scrapes we can find ourselves in when the semantics of collection and enumerator interfaces are blurred, either through mistakes that we (the programmers) make, or by less-than-optimal design of the interfaces themselves.

COM Collections and Enumerators

The COM Collection model represents collections with a loosely-defined interface containing some or all of the following members: the methods Add(), Clear(), and Remove(), and the properties Count and Item. The one member that is mandatory is _NewEnum (which may be expressed as either method or property), which returns an enumerator in the form of a COM Enumerator interface. Where _NewEnum is the COM analogue to STL's begin() and end() or .NET's GetEnumerator(), COM enumerator interfaces are the analogue of an STL iterator, or .NET's IEnumerator interface. They always contain the four methods Next(), Skip(), Reset() and Clone(), with the significant variation being in the value types returned by the Next() method. (Convention has it that COM Collections provide enumerators with the IEnumVARIANT interface to be compatible with late binding code.) This simple and powerful model, although cumbersome to work with from C and C++, has been tried and tested over the past decade or so, and underpins an enormous number of COM applications.

I was recently informed by Jean-Marie Auville, a user of the STLSoft libraries (http://stlsoft.org/), that one of the COM components in STLSoft's COMSTL sub-project, the collection_sequence class template, was malfunctioning when used with a collection from Microsoft's Debug Interface Access (DIA) SDK. The user suggested that either the DIA SDK had a bug, or the collection_sequence did. Hubris (and the fact that it's had a lot of use) precluded my accepting that the COMSTL component could be in error. And despite a natural cynicism about what comes out of any large software company, I was also pretty skeptical that the DIA SDK would have such a crippling bug. When I dug into the problem I was pleased (relieved, actually) to learn that the collection_sequence didn't have a bug, and nor did the DIA SDK. What I did learn, however, was a lesson in getting caught out by "helpful" changes to established collection idioms.

comstl::collection_sequence

The COMSTL collection_sequence class template provides an STL-like wrapper over a COM collection, implementing the Instance Adaptor Pattern [1]. It has seven (!) template parameters [2], of which only the first four are not defaulted. These four represent the collection interface, the enumerator interface for that collection, the value type of the collection, and a policy type for determining how the value types will be initialised and copied. For example, we might use it to manipulate a fictitious Thingy collection, for which we have a process_Thingy client function, as shown in Listing 1.

Listing 1: Enumerating the Thingy collection contents using comstl::collection_sequence.

using comstl::collection_sequence;
using comstl::interface_policy;

void process_Thingy(Thingy &);

typedef collection_sequence < ICollThingy               // collection interface
                            , IEnumThingy               // enumerator interface
                            , Thingy                    // value type
                            , interface_policy<Thingy>  // value type handling policy
                            >     Thingy_coll_t;

ICollThingy   *pcoll = . . .; // Acquire a COM collection of "Thingy"s
Thingy_coll_t coll(pcoll, true); // true => coll calls AddRef() on pcoll

std::for_each(coll.begin(), coll.end(), process_Thingy);
Inside the begin() method of the sequence class, the get__NewEnum() method of the collection interface (ICollThingy) is called to retrieve the enumerator object via its IUnknown interface. (The _NewEnum property/method always returns the enumerator object via a pointer to its IUnknown.) This is then queried for the enumerator interface, and the result passed to an instance of the sequence's iterator. (As I mentioned above, the associated enumerator interface for COM collections is usually IEnumVARIANT, but it does not have to be: in this case it's IEnumThingy.) The iterator instance then uses the enumerator interface's Next() method to implement its iteration functionality. The alternative longhand form would look something like that shown in Listing 2.

Listing 2: Enumerating the Thingy collection contents longhand.
void process_Thingy(Thingy &);
void release_Thingy(Thingy &);

ICollThingy   *pcoll = . . .; // Acquire a COM collection of "Thingy"s
IUnknown      *punk;
HRESULT       hr;

hr = pcoll->get__NewEnum(&punk);
if(SUCCEEDED(hr))
{
  IEnumThingy *penum;
  hr = punk->QueryInterface(IID_IEnumThingy, reinterpret_cast<void**>(&penum));
  if(SUCCEEDED(hr))
  {
    Thingy    thingy;
    for(penum->Reset(); S_OK == penum->Next(1, &thingy, NULL); )
    {
      process_Thingy(&thingy);
      release_Thingy(&thingy);
    }
    penum->Release();
  }
  punk->Release();
}
Not only is this tedious, verbose, error-prone (make sure you don't forget the & when calling QueryInterface()!), it is also not exception-safe. If process_Thingy() throws an exception, you've just experienced at least two resource leaks [3]. Had the enumerator interface been IEnumVARIANT there would need to have been further work to get a Thingy from a VARIANT.

Thus the collection_sequence class template depends on the relationship between the two interfaces specified as its first two arguments and, just as with any other COM programming, it works correctly when this relationship is respected. It provides a simple, useful, efficient and robust shorthand for handling what is very verbose and error-prone coding in the hands of even the most accomplished COM programmers. But for all its virtues, it is still dependent on the stipulated relationship between the collection and enumerator interfaces being correctly specified by the programmer, because this is not something that the compiler can help us with. This is where the reported problem manifests itself.

Microsoft's Debug Interface Access (DIA) SDK

In what I assume was an attempt to simplify the often onerous amount of code required to manipulate COM interfaces in C/C++, the author's of Microsoft's DIA SDK have defined the IDiaEnumSymbols interface to have the get__NewEnum(), get_Count() and Item() methods in addition to the usual enumerator methods Next(), Skip(), Reset() and Clone(). (Listing 3 shows the IDL form of the composite interface.) In a very real sense, then, IDiaEnumSymbols can act as both a collection and an enumerator. This saves a degree of coding, since one no longer needs to call get__NewEnum() and QueryInterface() (see Listing 2) to navigate from collection to enumerator when manipulating the interfaces manually.

Listing 3: IDL definition of composite IDiaEnumSymbols interface.
interface IDiaEnumSymbols
  : IUnknown
{
  // COM collection properties/methods
  [propget, id(DISPID_NEWENUM)]
  HRESULT _NewEnum([out, retval] IUnknown **pRetVal);
  [propget, id(1)]
  HRESULT Count([out, retval] LONG *pRetVal);
  [id(0)]
  HRESULT Item( [in]          DWORD       index
            ,   [out, retval] IDiaSymbol  **symbol);

  // COM enumerator methods
  HRESULT Next( [in]  ULONG       celt
            ,   [out] IDiaSymbol  **rgelt
            ,   [out] ULONG       *pceltFetched);
  HRESULT Skip([in] ULONG celt);
  HRESULT Reset();
  HRESULT Clone([out] IDiaEnumSymbols **ppenum);
}
Aficionados of COM may have alarm bells ringing in their heads at this point, and I'd concur. The problem is that, by blurring the distinction between collection and enumerator, developers can be led astray by their hard won experience. (And experience in COM is generally very hard won, which partly accounts for its having fallen out of favour this millennium.) It's not just the inexperienced that fall prey to this. Despite having a lot of COM experience, both Jean-Marie and myself were wrong-footed. The designers of the DIA SDK have ignored the Principle of the Separation of Concerns, and thereby set a trap for their users.

It took me a lot of head banging and a good night's sleep to make the leap to the solution: despite being an enumerator interface itself, IDiaEnumSymbol's own enumerator interface is not IDiaEnumSymbols, but IEnumVARIANT. Hence, the original parameterization of collection_sequence (as shown in Listing 4) was not valid.

Listing 4: Erroneous specialization of the collection_sequence instance adaptor class template with the IDiaEnumSymbols interface.
void process_Symbol(IDiaSymbol *);

typedef collection_sequence< IDiaEnumSymbols
                           , IDiaEnumSymbols  // **** HERE!! ****
                           , IDiaSymbol*
                           , interface_policy<IDiaSymbol>
                           >   dia_symbols_coll_t;

IDiaEnumSymbols     *pdes = . . . 
dia_symbols_coll_t  coll(pdes, false); // false => coll eats the reference 

std::for_each(coll.begin(), coll.end(), process_Symbol);
To be sure, it's obvious after the fact, and the DIA SDK documentation of IDiaEnumSymbols::get__NewEnum() mentions IEnumVARIANT, but hindsight's always 20/20. The unfortunate part about this mistake is that it manifests only at runtime. This is because the relationship between COM interfaces are only "discovered" at runtime via IUnknown::QueryInterface(), via an exception [4].

The solution to the apparent "IDiaEnumSymbols bug" is to specify IEnumVARIANT as the enumerator interface. Consequently the value type must be changed to VARIANT, which means that each enumerated value has to be processed to acquire the IDiaSymbol interface from the VARIANT. Naturally, this introduces more code into the solution, nullifying the attempts of IDiaEnumSymbols' design to reduce verbosity, as illustrated in Listing 5.

Listing 5: Valid, but verbose, use of the collection_sequence instance adaptor class template with the IDiaEnumSymbols interface.
void process_Symbol(IDiaSymbol *);

typedef collection_sequence< IDiaEnumSymbols
                           , IEnumVARIANT
                           , VARIANT
                           , comstl::VARIANT_policy
                           >   dia_symbols_coll_t;

IDiaEnumSymbols     *pdes = . . . 
dia_symbols_coll_t  coll(pdes, false); // false => coll eats the reference 

for(dia_symbols_coll_t::iterator b = coll.begin(); b != coll.end(); ++b)
{
  assert((*b).vt == VT_UNKNOWN || (*b).vt == VT_DISPATCH);

  VARIANT    &var = *b;
  IDiaSymbol *psymbol;
  HRESULT    hr;

  hr = (*it).punkVal->QueryInterface(IID_IDiaSymbol, reinterpret_cast<void**>(&psymbol));
  {
    process_Symbol(psymbol);
    psymbol->Release(); // Don't forget to release it! (Hope process_Symbol() doesn't throw)
  }
}
Again, we've exposed a potential resource leak in the case where process_Symbol() might throw an exception.

comstl::interface_cast

COMSTL provides interface_cast custom casts—template classes that emulate built-in C++ cast syntax—that help to enhance the succinctness of such code. (The business of writing custom cast classes and functions is described in detail in Chapter 19 of Imperfect C++ [5], including the COMSTL interface casts). Using the interface casts can help somewhat in this case, as in:

Listing 6: Improved use of the collection_sequence instance adaptor class template with the IDiaEnumSymbols interface, using the COMSTL interface cast.
void process_Symbol(IDiaSymbol *);

typedef collection_sequence< IDiaEnumSymbols
                           , IEnumVARIANT
                           , VARIANT
                           , comstl::VARIANT_policy
                           >   dia_symbols_coll_t;

IDiaEnumSymbols     *pdes = . . . 
dia_symbols_coll_t  coll(pdes, false); // false => coll eats the reference 

for(dia_symbols_coll_t::iterator b = coll.begin(); b != coll.end(); ++b)
{
  assert((*b).vt == VT_UNKNOWN || (*b).vt == VT_DISPATCH);

  process_Symbol(comstl::interface_cast_noaddref<IDiaSymbol*>((*b).punkVal));
}

comstl::enumerator_sequence

However, there's a better solution. There's another COMSTL sequence class template, enumerator_sequence, which provides an STL-like wrapper over a COM enumerator interface. For example, with IEnumThingy, we would use it as in Listing 7:

Listing 7: Example use of the enumerator_sequence instance adaptor class template.
using comstl::enumerator_sequence;
using comstl::interface_policy;

void process_Thingy(Thingy &);

typedef enumerator_sequence < IEnumThingy               // enumerator interface
                            , Thingy                    // value type
                            , interface_policy<Thingy>  // value type handling policy
                            >     Thingy_enum_t;

IEnumThingy   *penum = . . .; // Acquire a COM collection of "Thingy"s
Thingy_enum_t en(penum, true); // true => en calls AddRef() on penum

std::for_each(en.begin(), en.end(), process_Thingy);
Thus, with mixed Collection/Enumerator interfaces, we can just use enumerator_sequence, as shown in Listing 8:

Listing 8: Optimal use of the enumerator_sequence instance adaptor class template with the IDiaEnumSymbols interface.
void process_Symbol(IDiaSymbol *);

typedef enumerator_sequence< IDiaEnumSymbols
                           , IDiaSymbol*
                           , interface_policy<IDiaSymbol>
                           >    dia_symbols_coll_t;

IDiaEnumSymbols     *pdes = . . . 
dia_symbols_coll_t  coll(pdes, false); // false => coll eats the reference 

std::for_each(coll.begin(), coll.end(), process_Symbol);
As well as resulting in less code (including a smaller typedef), the advantage here is that the type of the interface passed to an instance of the enumerator_sequence is enforced at compile time, so you've got one less thing to be diligent about in your testing.

Mixing Metaphors in .NET

Collection/enumerator mix-ups aren't restricted to COM. An analogous situation can arise in .NET, where one might be tempted to implement the IEnumerable and IEnumerator interfaces in the same class, in order to reduce the amount of boilerplate required in the implementation of simple collections. While I can rightly assert that I've never deployed any production code doing this "trick," I must admit to trying it out to see what would happen, and investing some effort into seeing if I could persuade it to behave well. I couldn't.

Let's look at an example to see how it might work. Consider that we might want to write a collection adaptor that would filter out items in an enumeration. For example, we might want to filter a collection of strings and only enumerate those whose length is greater than a given number of characters. (Well, we might!). The filter interface and a string filter class are shown in Listing 9.

Listing 9: ICollectionFilter and an implementing class StringLengthFilter.
interface ICollectionFilter
{
  bool IsItemAccepted(object o);
}

class StringLengthFilter
    : ICollectionFilter
{
  public StringLengthFilter(int minLength)
  {
    m_minLength = minLength;
  }

  public bool IsItemAccepted(object o)
  {
    return ((string)o).Length >= m_minLength;
  }

  private readonly int m_minLength;
}
Listing 10 shows part of a simple test program that populates a container and then searches a filtered view of it.

Listing 10: First half of collection filter test program.
class Test
{
  public static void Main(string[] args)
  {
    ArrayList coll    = new ArrayList();

    coll.Add("123");
    coll.Add("abcdef");
    coll.Add("9876543210");
    coll.Add("NOPQRSTUVQXYZ");

    IEnumerable   filteredCollection  = new FilteredCollection(coll, new StringLengthFilter(5));

    System.Console.WriteLine("Unfiltered items:");
    foreach(string s in coll)
    {
      System.Console.WriteLine("  " + s);
    }

    System.Console.WriteLine("Filtered items:");
    foreach(string s in filteredCollection)
    {
      System.Console.WriteLine("  " + s);
    }

    System.Console.WriteLine("Filtered items (take 2):");
    foreach(string s in filteredCollection)
    {
      System.Console.WriteLine("  " + s);
    }

    . . .
  }
}
The wrong way to implement the FilteredCollection class is shown in Listing 11.

Listing 11: Incorrect way to implement FilteredCollection.
class FilteredCollection
  : IEnumerable
  , IEnumerator
{
  public FilteredCollection(IEnumerable coll, ICollectionFilter filter)
  {
    m_enumerator  = coll.GetEnumerator();
    m_filter    = filter;
  }

  public IEnumerator GetEnumerator()
  {
    return this;
  }

  public bool MoveNext()
  {
    for(; m_enumerator.MoveNext(); )
    {
      if(m_filter.IsItemAccepted(m_enumerator.Current))
      {
        return true;
      }
    }

    return false;
  }

  public object Current
  {
    get { return m_enumerator.Current; }
  }

  public void Reset()
  {
    m_enumerator.Reset();
  }
  
  private IEnumerator     m_enumerator;
  private ICollectionFilter m_filter;
}
Beguilingly, this works the first time the filtered items are enumerated. However, in the "Take 2" loop, nothing happens. This is because the underlying enumerator is now at its end position. Because the implicit call to GetEnumerator() in the foreach statement has returned "this" for both loops, the first loop leaves the filter at its end position. Thus, with a composite collection/enumerator, to enumerate as expected, one would be required to call IEnumerator.Reset() in order to be able to call IEnumerable.GetEnumerator() additional times.

Even worse, when such a collection is used to instantiate nested loops (as in the second half of the test program, shown in Listing 12) the client code can never perform as intended.

Listing 12: Second half of collection filter test program.
    . . .
    IEnumerable   filteredCollection2 = new FilteredCollection(coll, new StringLengthFilter(5));
    . . .
    System.Console.WriteLine("Enumerating with inner and outer loops:");
    foreach(string s1 in filteredCollection2)
    {
      foreach(string s2 in filteredCollection2)
      {
        if(Object.ReferenceEquals(s1, s2))
        {
          System.Console.WriteLine("  Same instance: [s1={0}, s2={1}]", s1, s2);
        }
        else
        {
          if(s1 == s2)
          {
            System.Console.WriteLine("  Equal instances: [s1={0}, s2={1}]", s1, s2);
          }
          else
          {
            System.Console.WriteLine("  Unequal instances: [s1={0}, s2={1}]", s1, s2);
          }
        }
      }
    }
  }
}
The right way is to do what's prescribed by .NET lore, and separate the collection from the enumerator, as shown in Listing 13.

Listing 13: Correct implementation of FilteredCollection class.
class FilteredEnumerator
  : IEnumerator
{
  public FilteredEnumerator(IEnumerator enumerator, ICollectionFilter filter)
  {
    m_enumerator  = enumerator;
    m_filter    = filter;
  }

  public bool MoveNext()
  {
    for(; m_enumerator.MoveNext(); )
    {
      if(m_filter.IsItemAccepted(m_enumerator.Current))
      {
        return true;
      }
    }
    return false;
  }

  public object Current
  {
    get { return m_enumerator.Current; }
  }

  public void Reset()
  {
    m_enumerator.Reset();
  }

  private IEnumerator     m_enumerator;
  private ICollectionFilter m_filter;
}

class FilteredCollection
  : IEnumerable
{
  public FilteredCollection(IEnumerable coll, ICollectionFilter filter)
  {
    m_collection  = coll;
    m_filter    = filter;
  }

  public IEnumerator GetEnumerator()
  {
    return new FilteredEnumerator(m_collection.GetEnumerator(), m_filter);
  }

  private IEnumerable     m_collection;
  private ICollectionFilter m_filter;
}
The output of the correct implementation is:

Unfiltered items:
  123
  abcdef
  9876543210
  NOPQRSTUVQXYZ
Filtered items:
  abcdef
  9876543210
  NOPQRSTUVQXYZ
Filtered items (take 2):
  abcdef
  9876543210
  NOPQRSTUVQXYZ
Enumerating with inner and outer loops:
  Same instance: [s1=abcdef, s2=abcdef]
  Unequal instances: [s1=abcdef, s2=9876543210]
  Unequal instances: [s1=abcdef, s2=NOPQRSTUVQXYZ]
  Unequal instances: [s1=9876543210, s2=abcdef]
  Same instance: [s1=9876543210, s2=9876543210]
  Unequal instances: [s1=9876543210, s2=NOPQRSTUVQXYZ]
  Unequal instances: [s1=NOPQRSTUVQXYZ, s2=abcdef]
  Unequal instances: [s1=NOPQRSTUVQXYZ, s2=9876543210]
  Same instance: [s1=NOPQRSTUVQXYZ, s2=NOPQRSTUVQXYZ] 
Contrast this with the strange behaviour of the incorrect version:

Unfiltered items:
  123
  abcdef
  9876543210
  NOPQRSTUVQXYZ
Filtered items:
  abcdef
  9876543210
  NOPQRSTUVQXYZ
Filtered items (take 2):
Enumerating with inner and outer loops:
  Unequal instances: [s1=abcdef, s2=9876543210]
  Unequal instances: [s1=abcdef, s2=NOPQRSTUVQXYZ] 

Though this behaviour is comprehensible to us, who are privy to the internals of this malfunctioning implementation, it could be quite jarring to a (rightly) unsuspecting user of such a class.

Summary

The conclusion is simple: Beware mixed Collection/Enumerator interfaces. Be extra careful when you are forced to use them, and don't ever inflict them on your users!

Acknowledgements

Thanks to Bjorn Karlsson, Christopher Diggins, Chuck Allison, Garth Lancaster, Greg Peet, John Torjo, Nevin Liber, Pablo Aguilar, Sean Kelly, Thorsten Ottosen, and Walter Bright.

About the Author

Matthew is a development consultant for Synesis Software, and creator of the STLSoft libraries. He is author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his modicum opus, Extended STL (to be published in 2006); the internals of the COM sequence adaptors feature in volume 1. Matthew can be contacted via http://imperfectcplusplus.com/.

Notes & References

[1] The Instance Adaptor pattern is described in Design Patterns, Gamma et al, Addison-Wesley, 1995, where it is known as Object Adaptor. The latter term is, in my opinion, ambiguous, so I prefer Instance Adaptor. Whatever the name, it refers to an instance of a class that takes an instance of another class, and adapts its interface (the source) to an interface (the destination) required by the application. In this case, the source interfaces are the COM collection and enumerator interfaces, and the destination interface is the STL iterator model.

[2] Kevlin Henney has postulated what I like to call Henney's Hypothesis: "For each additional [non-defaulted] template parameter, the potential number of users is halved." (Kevlin, perhaps rightly, demurs from my attempts to call it a Law, and I'm not too sure he likes it being called a Hypothesis, but I think it's a very important postulate, and worthy of the grandiloquent term, if only to seal it in our consciousness.) The weight of the hypothesis is obvious and compelling for anyone who makes it their business to write template libraries: struggling to understand the ramifications of several template parameters in one's own code is a sobering experience.

[3] The longhand version also retrieves the items one at a time, which can be inefficient in cases where the enumerator instance is in another apartment, or when the cost of accessing/synthesising its elements can be amortised by doing so in multiples. collection_sequence facilitates retrieving multiple elements at a time via a template parameter (which defaults to 8). To enumerate en bloc manually makes the code even more verbose and fragile; I leave this as an exercise for the enthusiastic reader.

[4] Like many parts of the STLSoft libraries, the collection_sequence class template has well-defined behaviour when compiled without exception support. In the case of the wrong enumerator being specified, the begin() method returns an "end()" iterator, indicating to client code that there are no items to be enumerated. In such compilation contexts, therefore, the misspecification of interfaces is even harder to detect.

[5] Imperfect C++, Matthew Wilson, Addison-Wesley, 2005.

TOP 5 ARTICLES
No Top Articles.



MICROSITES
FEATURED TOPIC

ADDITIONAL TOPICS

INFO-LINK