December 01, 2004
Flexible C++ #9: Succinct Options Validation with Expression Templates

Matthew Wilson
One of C++'s greatest strengths is its support for powerful and succinct expression. (It can, of course, be a weakness if abused, but we're going to be optimistic in this installment.) This support includes facilities such as function/method overloading, operator overloading, templates, and exceptions to name but a few.

This means we can write code such as the following snippet from a system on which I was recently working, which used the Open-RJ [1] library:

openrj::stl::database db(configFile, openrj::ORJ_FLAG_ELIDEBLANKRECORDS); 

for(size_t i = 0; i < db.size(); ++i)
{
  openrj::stl::record   record               = db[i];
  openrj::stl::string_t Name                 = record["Name"];

  . . . // More element retrieval

  openrj::stl::string_t UpstreamConnectivity = record["UpstreamConnectivity"];
  openrj::stl::string_t UpstreamPort         = record["UpstreamPort"];

  Channel *channel = new Channel(Name, . . ., UpstreamConnectivity, atoi(UpstreamPort.c_str()));
What's actually happening is that the database fields belonging to the record instance are searched for the matching name, and then the value of that field is returned in an openrj::stl::string_t instance [2]. As you'd expect, if a field with the given name is not located in that record, an exception is thrown; in this case, it's std::out_of_range. By encapsulating the lookup, test, and value-return in the subscript method, we get to write succinct and clear client code. This is a big boon over many other languages in this respect.

There are two downsides to using libraries with such operator overloading, although neither is necessarily dissuasive to using them in the coding form shown above. First, one needs to be aware of the exception type thrown by the library when named fields are not found in records. This is a really big issue, and not one I intend to cover here in depth at this time, so I'll just say that there's a trade-off between the generality of exception types, and the specificity with which one would wish, or need, to deal with exceptions.

If your library uses a generic exception, such as std::out_of_range, you need to catch exceptions closer to the throw site; otherwise, you may find yourself in a position where it is unclear from which subsystem you're catching the exception. Conversely, if your library uses a specific exception, you have to either leak out implementation knowledge to higher levels of the application, which introduces coupling and reduces encapsulation, or lose the specificity of your exception handling, and instead rely on the message accompanying all std::exception-derived exceptions, via the what() method.

The specific problem for us here is that such exceptions, unlike, say, std::bad_alloc, are not best served at the outermost application level with a terse message of doom and immediate application termination. One would rather wish to capture them and perhaps couch their message in a wider, and more user-friendly, context.
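As a rough illustration of catching a generic exception close to the throw site and couching it in a wider context, here is a minimal sketch. It does not use Open-RJ: record_sketch (a std::map), config_error, and required_field() are hypothetical names invented for this example, and std::map::at() merely stands in for a subscript operator that throws std::out_of_range.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a record: std::map::at() throws
// std::out_of_range for a missing key, much like the subscript
// operator described in the article.
typedef std::map<std::string, std::string> record_sketch;

// Hypothetical application-level exception with user-facing context.
class config_error
  : public std::runtime_error
{
public:
  explicit config_error(std::string const &message)
    : std::runtime_error(message)
  {}
};

// Translates the generic exception close to the throw site, where we
// still know which lookup failed and what it means to the application.
std::string required_field(record_sketch const &record, std::string const &name)
{
  try
  {
    return record.at(name);
  }
  catch(std::out_of_range const &)
  {
    throw config_error("channel record has no '" + name + "' field");
  }
}

// Usage: a present field is returned; a missing one surfaces as config_error.
void required_field_example()
{
  record_sketch record;
  record["Name"] = "feed-1";
  assert(required_field(record, "Name") == "feed-1");

  bool translated = false;
  try
  {
    required_field(record, "UpstreamPort");
  }
  catch(config_error const &)
  {
    translated = true;
  }
  assert(translated);
}
```

The cost of this approach is the one noted above: the try/catch lives next to every lookup, or inside a helper such as this one, rather than at the outermost level of the application.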
The second downside is that you may wish for default values; i.e., use a field value if that field is present, otherwise using a default value. Naturally, if you had to catch the "not found" exception in your client code, it would undermine the succinctness of the record's subscript operator:
  openrj::stl::record   record(*r);
  openrj::stl::string_t Name                 = record["Name"];

  . . . // More element retrieval

  openrj::stl::string_t UpstreamConnectivity = record["UpstreamConnectivity"];
  openrj::stl::string_t UpstreamPort         = record["UpstreamPort"];
  openrj::stl::string_t HighWatermark;
  try
  {
    HighWatermark                            = record["HighWatermark"];
  }
  catch(std::out_of_range &)
  {
    HighWatermark                            = "16384";
  }
Now that's truly awful stuff! Thankfully, the solution to this is reasonably simple: The record class has a get_field_value() method, whose second parameter is the default value if the named field does not exist:
  openrj::stl::record   record(*r);
  openrj::stl::string_t Name                 = record["Name"];

  . . . // More element retrieval

  openrj::stl::string_t UpstreamConnectivity = record["UpstreamConnectivity"];
  openrj::stl::string_t UpstreamPort         = record["UpstreamPort"];
  openrj::stl::string_t HighWatermark        = record.get_field_value("HighWatermark", "16384");
Thus, we can see how straightforward it is, in C++, to use very powerful functionality in clear and succinct client code. What we've been able to effect, with our use of the subscript operator and the get_field_value() method, is a set of constraints on the structure of our channel information records in the Open-RJ database configuration file, and validate these constraints by our code. So far we've validated the requisite record + field structure, but applications usually also need to ensure that the values are correct. This comes in two forms: validating that a value falls within a set of known options, and validating that (numeric) values fall within a given range. The latter can be done with C++'s built-in arithmetic range comparison operators, <, >=, and so on. The former is the challenge that we'll be addressing in the remainder of this month's article.
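To make the two ideas above concrete, the following sketch shows one way a default-value lookup and a numeric range check might look. It is not the Open-RJ implementation: record_sketch is a hypothetical std::map stand-in for a record, get_field_value_sketch() and parse_port() are names invented here, and the 1 to 65535 port range is an assumption chosen purely for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a record of named string fields.
typedef std::map<std::string, std::string> record_sketch;

// Returns the named field's value if present, otherwise defaultValue.
std::string get_field_value_sketch( record_sketch const &record
                                  , std::string const   &name
                                  , std::string const   &defaultValue)
{
  record_sketch::const_iterator it = record.find(name);

  return (it != record.end()) ? it->second : defaultValue;
}

// Range validation with the built-in comparison operators. The bounds
// are an assumption for this sketch, not taken from the article.
int parse_port(std::string const &text)
{
  int const port = std::atoi(text.c_str());

  if(port < 1 || port > 65535)
  {
    throw std::out_of_range("port out of range: '" + text + "'");
  }

  return port;
}

// Usage: an absent field falls back to its default value.
void default_value_example()
{
  record_sketch record;
  record["UpstreamPort"] = "8080";

  assert(get_field_value_sketch(record, "HighWatermark", "16384") == "16384");
  assert(parse_port(get_field_value_sketch(record, "UpstreamPort", "0")) == 8080);
}
```

Range checks like parse_port() need nothing beyond the language's own operators, which is why the rest of the article concentrates on the other case, membership in a set of options.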

Validating Value Options

In our application, the UpstreamConnectivity had to be one of two values: Active or Passive. Naturally, we could have had the field named UpstreamIsActive, and provided Boolean values, but that would have made migration to other connectivity models needlessly difficult. In any case, it's not hard for us to imagine a field with three or more valid values. Normally, one might validate using if or, in the case of integral / enum variables, switch statements. However, this will immediately break our nice succinct and readable code:
openrj::stl::database db(configFile, openrj::ORJ_FLAG_ELIDEBLANKRECORDS); 

for(size_t i = 0; i < db.size(); ++i)
{
  openrj::stl::record   record               = db[i];
  openrj::stl::string_t Name                 = record["Name"];

  . . . // More element retrieval

  openrj::stl::string_t UpstreamConnectivity = record["UpstreamConnectivity"];
  openrj::stl::string_t UpstreamPort         = record["UpstreamPort"];

  if( UpstreamConnectivity != "Active" &&
      UpstreamConnectivity != "Passive")
  {
    . . . // Throw a wobbly
  }
  Channel *channel = new Channel(Name, . . ., UpstreamConnectivity, atoi(UpstreamPort.c_str()));
How do we keep our succinctness? It'd be nice to have a one-liner to validate the values of UpstreamConnectivity, and really nice if the code largely pertained to the logic, with as little infrastructure as possible. Well, here's one I prepared earlier:
stlsoft::verify_options(UpstreamConnectivity, "Invalid 'UpstreamConnectivity' value"), "Active", "Passive";
That's pretty succinct and to the point, don't you agree? It reads:
Verify that UpstreamConnectivity is equal to one of the options "Active" or "Passive". 
If not, report that it is not with the message "Invalid 'UpstreamConnectivity' value".
So how does it work? Well, first let me address any qualms you may have that it's using arch macro trickery: It's not. There is no involvement from the preprocessor at all. This can be illustrated by rewriting it in method form:
stlsoft::verify_options(UpstreamConnectivity, "Invalid 'UpstreamConnectivity' value").test("Active").test("Passive");
verify_options() is a creator function that returns an instance of stlsoft::options_verifier, which is defined as shown in Listing 1:
Listing 1
template< typename T
        , typename XP  = option_verification_policy
        >
class options_verifier
  : private XP // Inherit, so we can utilise EBO
{
public:
  typedef T                       value_type;
  typedef XP                      exception_policy_type;
  typedef XP                      parent_class_type;
  typedef options_verifier<T, XP> class_type;

public:
  options_verifier(T const &value, char const *failureMessage)
    : parent_class_type()
    , m_value(value)
    , m_failureMessage(failureMessage)
    , m_bMatched(false)
  {}
  options_verifier(T const &value, exception_policy_type policy, char const *failureMessage)
    : parent_class_type(policy)
    , m_value(value)
    , m_failureMessage(failureMessage)
    , m_bMatched(false)
  {}
  options_verifier(class_type const &rhs)
    : parent_class_type(rhs)
    , m_value(rhs.m_value)
    , m_failureMessage(rhs.m_failureMessage)
    , m_bMatched(rhs.m_bMatched)
  {
    rhs.m_bMatched = true;
  }
  ~options_verifier()
  {
    // If we've not had a match, and we're not currently unwinding
    // from another exception, then we report the failure.
    if( !m_bMatched &&
# if defined(__MWERKS__)
        1)
# else /* ? compiler */
        !::std::uncaught_exception())
# endif /* compiler */
    {
      exception_policy_type   &policy =   *this;

      policy(m_failureMessage);
    }
  }

public:
  template <typename U>
  class_type &test(U const &u)
  {
    if( !m_bMatched &&
        m_value == u)
    {
      m_bMatched = true;
    }

    return *this;
  }

private:
  T const           &m_value;         // The variable to monitor
  char const *const m_failureMessage; // The failure message
  mutable bool      m_bMatched;       // Match flag

private:
  class_type &operator =(class_type const &);
};
It's not the most complex class ever written, to be sure, but it has a few subtleties worth looking into. Let's start with the member variables. m_value is a reference to the variable to be monitored. It's a reference to const since we're not going to be changing it. m_failureMessage is a const pointer to a C-style string, which will be used to report the exception. Since options_verifier is intended to only be used as a temporary—which is returned by verify_options()—there's no need to take a deep copy of the message. The last member, m_bMatched, is a (mutable) Boolean that is set when a matching option is found, via the test() method. With these members, we have all that we need: The methods and free functions combine to effect tests on the value, flagging any successful match, and throwing an exception based on the failure message if no matches are found. As I mentioned before, verify_options() is a creator function, with two overloads to support the provision of a different exception policy, invoking different constructors of the options_verifier class:
template <typename T>
options_verifier<T> verify_options(T const &value, char const *failureMessage)
{
  return options_verifier<T>(value, failureMessage);
}

template< typename T
        , typename XP
        >
options_verifier<T, XP> verify_options(T const &value, XP const &policy, char const *failureMessage)
{
  // Forward the policy so that the three-parameter constructor is invoked
  return options_verifier<T, XP>(value, policy, failureMessage);
}
The constructors all do pretty straightforward things: taking the value reference and the failure message, and setting up the match flag. Note that the copy constructor sets the matched flag of its source instance, thereby "passing the baton" of testing on between copies; this is necessary to ensure that only the instance tested against the given option values can fail.

Failure is effected in the destructor, where the only real complexity to the class comes in. If the match flag indicates that there have been no matches, then the exception policy function call operator is invoked to throw the exception. Since options_verifier inherits from the exception policy, in order to take advantage of the Empty Base Optimization [3], it is first cast to the exception policy type.

Since throwing exceptions from destructors is a big no-no in C++, we also test the result of the standard function uncaught_exception(), which indicates whether an exception is currently active. Of course, options_verifier is intended for the sole context of a temporary instance returned by the verify_options() function, so unless one deliberately abuses the class, there is no way that its destructor will be called as a result of a thrown exception. Nonetheless, it's good to check; note that uncaught_exception() is not supported by Metrowerks CodeWarrior, for which we must rely on good practice alone [4].

Match testing is carried out in the test() function, which simply does an equality comparison of m_value and the comparand. It is implemented as a member template function so that it can support heterogeneous comparison, such as comparing std::string with C-style strings, as in the motivating example shown above.

The last part of the puzzle is the overload for the comma operator, operator ,(), which takes an options_verifier instance and a reference to const value of arbitrary type. It simply calls test() on the verifier, and returns the reference to facilitate the option chaining:
template< typename T
        , typename XP
        , typename U
        >
inline options_verifier<T, XP> &operator ,(options_verifier<T, XP> &ov, U const &u)
{
  return ov.test(u);
}
All together, this gives us the sequence verify_options(), options_verifier(), test(), test(), . . . test(), ~options_verifier(), all wrapped up in a single statement. No nontemporary instances, no dangling references—inline, efficient, safe, and, last but not least, succinct.
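The whole sequence can be tried out with the condensed sketch below. It is an approximation written for current compilers, not the STLSoft code, and it departs from the listings in three labeled ways: the names (verifier_sketch, throwing_policy_sketch, verify_sketch, connectivity_is_valid) are hypothetical; test() and the comma operator work through references to const, because a temporary cannot bind to a non-const reference in standard C++; and the destructor is declared noexcept(false), because destructors are implicitly noexcept from C++11 onwards.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical policy for this sketch: report failure by throwing.
struct throwing_policy_sketch
{
  void operator ()(char const *message) const
  {
    throw std::runtime_error(message);
  }
};

template <typename T, typename XP = throwing_policy_sketch>
class verifier_sketch
  : private XP
{
public:
  verifier_sketch(T const &value, char const *failureMessage)
    : XP(), m_value(value), m_failureMessage(failureMessage), m_matched(false)
  {}
  verifier_sketch(verifier_sketch const &rhs)
    : XP(rhs), m_value(rhs.m_value), m_failureMessage(rhs.m_failureMessage), m_matched(rhs.m_matched)
  {
    rhs.m_matched = true; // Pass the baton: the source can no longer fail
  }
  ~verifier_sketch() noexcept(false) // Opt out of the implicit noexcept
  {
    if(!m_matched && !unwinding())
    {
      XP const &policy = *this;

      policy(m_failureMessage);
    }
  }

  template <typename U>
  verifier_sketch const &test(U const &u) const
  {
    if(!m_matched && m_value == u)
    {
      m_matched = true;
    }
    return *this;
  }

private:
  static bool unwinding()
  {
#if __cplusplus >= 201703L
    return std::uncaught_exceptions() > 0;
#else
    return std::uncaught_exception();
#endif
  }

  T const      &m_value;
  char const   *m_failureMessage;
  mutable bool m_matched;
};

template <typename T>
verifier_sketch<T> verify_sketch(T const &value, char const *failureMessage)
{
  return verifier_sketch<T>(value, failureMessage);
}

template <typename T, typename XP, typename U>
inline verifier_sketch<T, XP> const &operator ,(verifier_sketch<T, XP> const &v, U const &u)
{
  return v.test(u);
}

// The temporary is destroyed at the end of the full expression; if no
// option matched, its destructor throws and we report false.
bool connectivity_is_valid(std::string const &value)
{
  try
  {
    verify_sketch(value, "Invalid 'UpstreamConnectivity' value"), "Active", "Passive";
    return true;
  }
  catch(std::runtime_error const &)
  {
    return false;
  }
}

void verifier_example()
{
  assert(connectivity_is_valid("Active"));
  assert(!connectivity_is_valid("Sideways"));
}
```

Wrapping the statement in connectivity_is_valid() is only a convenience for experimenting; in application code the one-line statement would normally stand alone and let the exception propagate.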

A Fuller Syntax?

You may be looking at this and thinking, "Well, it's kind of nice and succinct, but the comma's a bit of a stretch." If so, I can understand your position. Perhaps a more polished approach would be to use logical operators, such that rather than looking like:
verify_options(UpstreamConnectivity, "Invalid value"), "Active", "Passive";
we'd actually see something like:
verify_options(UpstreamConnectivity, "Invalid value") == "Active" || "Passive";
The latter form certainly reads a bit more obviously, along the lines of "Verify that UpstreamConnectivity is equal to Active or Passive". There are two reasons I've not elected to follow this approach. First, and somewhat trivially, using operator || requires more space: at least one extra character, and maybe more, depending on your spacing conventions. Since the whole point of this component is succinct validation, we're kind of defeating ourselves a touch. Second, and more importantly, to support this syntax, we'd have to overload operator == and operator ||. That's simple in itself: Each can be overloaded in precisely the same manner as I've done with the comma operator. However, if that's done, then we can just as readily write other expressions with a mix of the operators that are anything but appropriate, such as:
verify_options(UpstreamConnectivity, "Invalid value") == "Active" == "Passive";
Indeed, if we were verifying integral types, we could even write:
verify_options(Port, "Invalid value") || 100 == 200;
Naturally, this is not good. Any kind of noninteger-emulating operator overloading is a slippery slope [3], but we've gone way past the first step here. We're careering wildly downhill towards the Land of the Obfuscated Programmer. As I see it, we have only three reasonable choices:
  1. Avoid operator overloading entirely. The syntax is unambiguous—the concatenation of test() calls shown earlier—but not succinct.
  2. Use the comma operator syntax. We've abused the comma operator's natural semantics—sequencing—for our own arch purposes, but it's succinct and is unlikely to cause problems (see below).
  3. Use an enhanced operator overloading to support only the "sensible" syntax of a single == and subsequent ||.
The latter choice would require that we overload operator == to take an options_verifier instance as its left parameter and return an instance of a different type, which would wrap the instance, say, options_verifier_comparison_ref. This type would then be used as the left-hand argument and the return type to operator ||. In this way, the "correct" syntax would be enforced by the compiler. I've coded this up, and you can find it in stlsoft/options_verifier.hpp in STLSoft 1.8.3 (beta 1) onwards [5] (look for options_verifier_comparison_ref), but it relies on some const_casting in the == operator.

I'm comfortable with the comma operator because it's succinct, there's only one operator involved, and it's quite hard to make it misbehave; one would have to use parentheses to precipitate built-in sequence semantics, as in:
verify_options(UpstreamConnectivity, "Invalid value"), ("Active", "Passive");
which only tests UpstreamConnectivity against "Passive". The only other way in which it can be broken is if one of the tested option values is of a type for which an overloaded comma operator is defined. Since the values are overwhelmingly likely to be of Value Types [3], this does not represent a serious problem, in my opinion. Of course, when operating slightly off the beaten track, there is much scope for equivocation, and you may choose to do things differently.
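For readers curious about the third choice, the sketch below shows how a wrapper type can restrict the syntax to a single == followed by any number of ||. It is not the STLSoft options_verifier_comparison_ref: verifier_eq, comparison_ref_sketch, verify_eq, and connectivity_ok are hypothetical names, the policy is hard-wired to throwing std::runtime_error, and a mutable flag with const methods is assumed in place of the const_cast mentioned above.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

template <typename T>
class verifier_eq
{
public:
  verifier_eq(T const &value, char const *failureMessage)
    : m_value(value), m_failureMessage(failureMessage), m_matched(false)
  {}
  verifier_eq(verifier_eq const &rhs)
    : m_value(rhs.m_value), m_failureMessage(rhs.m_failureMessage), m_matched(rhs.m_matched)
  {
    rhs.m_matched = true; // Only the final copy may report failure
  }
  ~verifier_eq() noexcept(false)
  {
    if(!m_matched && !unwinding())
    {
      throw std::runtime_error(m_failureMessage);
    }
  }

  template <typename U>
  void test(U const &u) const
  {
    if(!m_matched && m_value == u)
    {
      m_matched = true;
    }
  }

private:
  static bool unwinding()
  {
#if __cplusplus >= 201703L
    return std::uncaught_exceptions() > 0;
#else
    return std::uncaught_exception();
#endif
  }

  T const      &m_value;
  char const   *m_failureMessage;
  mutable bool m_matched;
};

// The wrapper returned by ==. Only || is defined for it, so neither a
// second == nor a leading || on the verifier itself will compile.
template <typename T>
class comparison_ref_sketch
{
public:
  explicit comparison_ref_sketch(verifier_eq<T> const &v)
    : m_verifier(v)
  {}

  template <typename U>
  void test(U const &u) const
  {
    m_verifier.test(u);
  }

private:
  verifier_eq<T> const &m_verifier;
};

template <typename T, typename U>
comparison_ref_sketch<T> operator ==(verifier_eq<T> const &v, U const &u)
{
  v.test(u);
  return comparison_ref_sketch<T>(v);
}

template <typename T, typename U>
comparison_ref_sketch<T> operator ||(comparison_ref_sketch<T> const &r, U const &u)
{
  r.test(u);
  return r;
}

template <typename T>
verifier_eq<T> verify_eq(T const &value, char const *failureMessage)
{
  return verifier_eq<T>(value, failureMessage);
}

bool connectivity_ok(std::string const &value)
{
  try
  {
    verify_eq(value, "Invalid value") == "Active" || "Passive";
    return true;
  }
  catch(std::runtime_error const &)
  {
    return false;
  }
}

void comparison_example()
{
  assert(connectivity_ok("Passive"));
  assert(!connectivity_ok("Sideways"));
}
```

With these overloads, an expression that chains two == operators, or applies || directly to the verifier, has no matching operator for the class types involved and so should be rejected at compile time. Note also that an overloaded || does not short-circuit, so every option is evaluated even after a match.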

Beware Alternative Tokens!

For grins, I'll share a little embarrassing experience I had when developing the options_verifier class before we finish. In order to cater to developers whose equipment may not support the full range of ASCII special characters used by C++, various alternative tokens are defined that act as lexical replacements for common operators [6]. For example, and_eq is parsed as &=, not as !, bitand as &, and so on. The problem I encountered was environmental. I prototyped the options_verifier using a compiler that does not recognize and translate the alternative tokens, and I was less than mentally adroit in failing to keep the alternative tokens in mind while developing. I'm sure you're ahead of me, and recognise immediately my mistake, which precipitated the following compilation errors when I started testing on other compilers:
GCC 3.2:
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:145: parse
   error before `||' token
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:153: semicolon
   missing after declaration of `stlsoft::options_verifier<T, XP>'
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp: In
   constructor `stlsoft::options_verifier<T, XP>::options_verifier(const T&,
   const char*)':
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:121: class `
   stlsoft::options_verifier<T, XP>' does not have any field named `m_value'
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:122: class `
   stlsoft::options_verifier<T, XP>' does not have any field named `
   m_failureMessage'
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:123: class `
   stlsoft::options_verifier<T, XP>' does not have any field named `m_bMatched'

CodeWarrior 8.3:
#      In: ..\..\..\..\stlsoft\options_verifier.hpp
#    From: ..\options_verifier_test.cpp
# -------------------------------------
#     145:      class_type &or(U const &u)
#   Error:                  ^^
#   illegal template declaration
### mwcc.exe Compiler:
#     146:      {
#   Error:      ^
#   declaration syntax error
### mwcc.exe Compiler:
#     147:          if( !m_bMatched &&
#   Error:          ^^
#   declaration syntax error
### mwcc.exe Compiler:
#     148:              m_value == u)
#   Error:              ^^^^^^^
#   declaration syntax error

After way too many minutes wasted puzzling and needlessly moving code around, the lights finally went on and I realized that I'd been using a reserved word for my method: What you see as test() in Listing 1 was formerly called or(). Ouch!
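A short sketch makes the equivalence plain. The function names (either, mask_low_nibble) are hypothetical, and the code assumes a compiler that recognizes the alternative tokens natively, as GCC and Clang do; a compiler like the one used for prototyping above may need a conformance switch or the <ciso646> header.

```cpp
#include <cassert>

// `or`, `and_eq`, and `bitand` are alternative spellings of `||`, `&=`,
// and `&`. The compiler sees the operator token wherever they appear.
bool either(bool a, bool b)
{
  return a or b;              // Identical to: a || b
}

unsigned mask_low_nibble(unsigned value)
{
  unsigned result = value;

  result and_eq 0x0Fu;        // Identical to: result &= 0x0Fu
  return result bitand 0xFFu; // Identical to: result & 0xFFu
}

// A member declared as
//
//   class_type &or(U const &u);
//
// is therefore read as `class_type &||(U const &u);`, which is why GCC
// complained of a parse error before the `||' token.

void alternative_tokens_example()
{
  assert(either(false, true));
  assert(mask_low_nibble(0xABu) == 0x0Bu);
}
```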

Acknowledgments

Thanks to Bjorn Karlsson, Garth Lancaster, John Torjo, and Walter Bright for their excellent criticisms and suggestions.

About the Author

Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books, one of which is not about C++. Matthew can be contacted via http://imperfectcplusplus.com/.

Notes & References

[1] Open-RJ (http://openrj.org/) is one of the exemplar projects for my "Positive Integration" column in CUJ. Note that the classes in the Open-RJ C++ and STL mappings are lightweight wrapper facades, and may be accessed by value as efficiently as by reference, as shown in the sample code.

[2] You may (like a couple of my noble reviewers) be perplexed by the typedef string_t in the openrj::stl namespace—the namespace within which the Open-RJ STL mapping resides. Since there are occasional practical reasons—coupling, performance, custom behavior, compilers that do not support the STL—to want to avoid the std::string type from your compiler vendor's standard library, I favor providing mechanisms (e.g. #defines) in the library for providing such types, while defaulting to the 'expected' type in the default case. Hence, Open-RJ's string_t (in the openrj::stl namespace) is std::string by default, but by using the typedef in our code we can be happily oblivious, or at least agnostic, should we have cause to customize it.

[3] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004. (See http://imperfectcplusplus.com/)

[4] Without the protection afforded by uncaught_exception, users of CodeWarrior have the additional restriction that options_verifier should not be used in catch clauses. Naturally, this is not exactly going to be a hardship, but it's worth noting for pedagogy, if nothing else.

[5] STLSoft is an open-source organization whose focus is the development of robust, lightweight, cross-platform STL-compatible software, and is located at http://www.stlsoft.org/. The options_verifier component will be available from version 1.8.3 onwards, which is due for release in early January 2005. It is also included in v1.8.3 beta 1, which is available now.

[6] The C++ Programming Language, Bjarne Stroustrup, Addison-Wesley, 2000. (http://www.awprofessional.com/title/0201700735)
