December 01, 2004
Flexible C++ #9: Succinct Options Validation with Expression Templates
Matthew Wilson
One of C++'s greatest strengths is its support for powerful and succinct expression. (It can, of course, be a weakness if abused, but we're going to be optimistic in this installment.) This support includes facilities such as function/method overloading, operator overloading, templates, and exceptions to name but a few.
This means we can write code such as the following snippet from a system on which I was recently working, which used the Open-RJ [1] library:
openrj::stl::database db(configFile, openrj::ORJ_FLAG_ELIDEBLANKRECORDS);
for(size_t i = 0; i < db.size(); ++i)
{
  openrj::stl::record record = db[i];
  openrj::stl::string_t Name = record["Name"];
  . . . // More element retrieval
  openrj::stl::string_t UpstreamConnectivity = record["UpstreamConnectivity"];
  openrj::stl::string_t UpstreamPort = record["UpstreamPort"];
  Channel *channel = new Channel(Name, . . ., UpstreamConnectivity, atoi(UpstreamPort.c_str()));
  . . .
}
What's actually happening is that the database fields belonging to the record instance are searched for the matching name, and then the value of that field is returned in an openrj::stl::string_t instance [2]. As you'd expect, if a field with the given name is not located in that record, an exception is thrown; in this case, it's std::out_of_range. By encapsulating the lookup, test, and value-return in the subscript method, we get to write succinct and clear client code. This is a big boon over many other languages in this respect.
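To make the lookup, test, and value-return concrete, here is a minimal sketch of a record type with a throwing subscript operator. The names simple_record, add(), and has_field() are hypothetical stand-ins for illustration only; this is not the Open-RJ API, merely one way such an operator might be written.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a record of named fields (not the Open-RJ class).
class simple_record
{
public:
    void add(std::string const &name, std::string const &value)
    {
        m_fields[name] = value;
    }

    // Lookup, test, and value-return in a single operation: a missing
    // field is reported by throwing std::out_of_range.
    std::string const &operator [](std::string const &name) const
    {
        std::map<std::string, std::string>::const_iterator it = m_fields.find(name);

        if(it == m_fields.end())
        {
            throw std::out_of_range("field not found: " + name);
        }
        return it->second;
    }

private:
    std::map<std::string, std::string> m_fields;
};

// Hypothetical helper showing how client code observes the exception.
bool has_field(simple_record const &record, char const *name)
{
    try
    {
        (void)record[name];
        return true;
    }
    catch(std::out_of_range const &)
    {
        return false;
    }
}
```

Client code that does not care about missing fields can simply write record["Name"] and let the exception propagate.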
There are two downsides to using libraries with such operator overloading, although neither is necessarily dissuasive to using them in the coding form shown above. First, one needs to be aware of the exception type thrown by the library when named fields are not found in records. This is a really big issue, and not one I intend to cover here in depth at this time, so I'll just say that there's a trade-off between the generality of exception types, and the specificity with which one would wish, or need, to deal with exceptions. If your library uses a generic exception, such as std::out_of_range, you need to catch exceptions closer to the throw site; otherwise, you may find yourself in a position where it is unclear from which subsystem you're catching the exception. Conversely, if your library uses a specific exception, you have to either leak out implementation knowledge to higher levels of the application, which introduces coupling and reduces encapsulation, or lose the specificity of your exception handling, and instead rely on the message accompanying all std::exception-derived exceptions, via the what() method. The specific problem for us here is that such exceptions, unlike, say, std::bad_alloc, are not best served at the outermost application level with a terse message of doom and immediate application termination. One would rather wish to capture them and perhaps couch their message in a wider, and more user-friendly, context.
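One way to handle a generic exception close to the throw site, and to couch its message in a wider context, might look like the following sketch. The names field_map, configuration_error, and required_field(), and the source-name parameter, are assumptions made for illustration; a std::map stands in for the record.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a record of named fields.
typedef std::map<std::string, std::string> field_map;

// Hypothetical application-level exception carrying a friendlier message.
class configuration_error
    : public std::runtime_error
{
public:
    explicit configuration_error(std::string const &message)
        : std::runtime_error(message)
    {}
};

// Catches the generic exception where its origin is still known, and
// rethrows it with the name of the source being processed.
std::string required_field(field_map const &record, std::string const &name, std::string const &source)
{
    try
    {
        return record.at(name); // std::map::at() throws std::out_of_range
    }
    catch(std::out_of_range const &)
    {
        throw configuration_error("'" + source + "': missing required field '" + name + "'");
    }
}
```

Higher levels then deal only with configuration_error, and need not know which library raised the original exception.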
The second downside is that you may wish for default values; i.e., use a field value if that field is present, and otherwise use a default value. Naturally, if you had to catch the "not found" exception in your client code, it would undermine the succinctness of the record's subscript operator:
openrj::stl::record record(*r);
openrj::stl::string_t Name = record["Name"];
. . . // More element retrieval
openrj::stl::string_t UpstreamConnectivity = record["UpstreamConnectivity"];
openrj::stl::string_t UpstreamPort = record["UpstreamPort"];
openrj::stl::string_t HighWatermark;
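One way to keep client code succinct while still supporting defaults is to move the try/catch into a helper. The sketch below assumes only that the record's subscript operator throws std::out_of_range for a missing field; field_or_default() and demo_record are hypothetical names, and the default of "1000" is purely illustrative.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper: Record is any type whose operator [] throws
// std::out_of_range when the named field is absent.
template <typename Record>
std::string field_or_default(Record const &record, std::string const &name, std::string const &defaultValue)
{
    try
    {
        return record[name];
    }
    catch(std::out_of_range const &)
    {
        return defaultValue;
    }
}

// Hypothetical record used only to exercise the helper.
struct demo_record
{
    std::map<std::string, std::string> fields;

    std::string const &operator [](std::string const &name) const
    {
        return fields.at(name); // throws std::out_of_range if absent
    }
};
```

With such a helper, retrieving an optional field stays a one-liner, for example field_or_default(record, "HighWatermark", "1000").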
Validating Value Options
In our application, the UpstreamConnectivity had to be one of two values: Active or Passive. Naturally, we could have had the field named UpstreamIsActive, and provided Boolean values, but that would have made migration to other connectivity models needlessly difficult. In any case, it's not hard for us to imagine a field with three or more valid values. Normally, one might validate using if or, in the case of integral / enum variables, switch statements. However, this will immediately break our nice succinct and readable code:
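As a rough sketch of what such if-based validation might look like for a single field (check_connectivity() and connectivity_ok() are hypothetical names, and the choice of std::invalid_argument is an assumption):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical validation routine written with a plain if statement.
// Every additional permitted value lengthens the condition.
void check_connectivity(std::string const &value)
{
    if( value != "Active" &&
        value != "Passive")
    {
        throw std::invalid_argument("Invalid 'UpstreamConnectivity' value");
    }
}

// Hypothetical wrapper that turns the exception into a Boolean result.
bool connectivity_ok(std::string const &value)
{
    try
    {
        check_connectivity(value);
        return true;
    }
    catch(std::invalid_argument const &)
    {
        return false;
    }
}
```

Repeating a block like this for each validated field quickly swamps the element-retrieval code around it.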
What we would like instead is to express the validation in a single statement:
openrj::stl::database db(configFile, openrj::ORJ_FLAG_ELIDEBLANKRECORDS);
for(size_t i = 0; i < db.size(); ++i)
{
  openrj::stl::record record = db[i];
  openrj::stl::string_t Name = record["Name"];
  . . . // More element retrieval
  openrj::stl::string_t UpstreamConnectivity = record["UpstreamConnectivity"];
  openrj::stl::string_t UpstreamPort = record["UpstreamPort"];
  stlsoft::verify_options(UpstreamConnectivity, "Invalid 'UpstreamConnectivity' value"), "Active", "Passive";
  . . .
}
That's pretty succinct and to the point, don't you agree? It reads: Verify that UpstreamConnectivity is equal to one of the options "Active" or "Passive". If not, report that it is not with the message "Invalid 'UpstreamConnectivity' value".
So how does it work? Well, first let me address any qualms you may have that it's using arch macro trickery: It's not. There is no involvement from the preprocessor at all. This can be illustrated by rewriting it in method form:
stlsoft::verify_options(UpstreamConnectivity, "Invalid 'UpstreamConnectivity' value").test("Active").test("Passive");
verify_options() is a creator function that returns an instance of stlsoft::options_verifier, which is defined as shown in Listing 1:
Listing 1
template< typename T
        , typename XP = option_verification_policy
        >
class options_verifier
  : private XP // Inherit, so we can utilise EBO
{
public:
  typedef T                       value_type;
  typedef XP                      exception_policy_type;
  typedef XP                      parent_class_type;
  typedef options_verifier<T, XP> class_type;
public:
  options_verifier(T const &value, char const *failureMessage)
    : parent_class_type()
    , m_value(value)
    , m_failureMessage(failureMessage)
    , m_bMatched(false)
  {}
  options_verifier(T const &value, exception_policy_type policy, char const *failureMessage)
    : parent_class_type(policy)
    , m_value(value)
    , m_failureMessage(failureMessage)
    , m_bMatched(false)
  {}
  options_verifier(class_type const &rhs)
    : parent_class_type(rhs)
    , m_value(rhs.m_value)
    , m_failureMessage(rhs.m_failureMessage)
    , m_bMatched(rhs.m_bMatched)
  {
    rhs.m_bMatched = true;
  }
  ~options_verifier()
  {
    // If we've not had a match, and we're not currently unwinding
    // from another exception, then we report the failure.
    if( !m_bMatched &&
# if defined(__MWERKS__)
        1)
# else /* ? compiler */
        !::std::uncaught_exception())
# endif /* compiler */
    {
      exception_policy_type &policy = *this;
      policy(m_failureMessage);
    }
  }
public:
  template <typename U>
  class_type &test(U const &u)
  {
    if( !m_bMatched &&
        m_value == u)
    {
      m_bMatched = true;
    }
    return *this;
  }
private:
  T const           &m_value;          // The variable to monitor
  char const *const m_failureMessage;  // The failure message
  mutable bool      m_bMatched;        // Match flag
private:
  class_type &operator =(class_type const &);
};
It's not the most complex class ever written, to be sure, but it has a few subtleties worth looking into. Let's start with the member variables. m_value is a reference to the variable to be monitored. It's a reference to const since we're not going to be changing it. m_failureMessage is a const pointer to a C-style string, which will be used to report the exception. Since options_verifier is intended to be used only as a temporary (the one returned by verify_options()), there's no need to take a deep copy of the message. The last member, m_bMatched, is a (mutable) Boolean that is set when a matching option is found, via the test() method. With these members, we have all that we need: The methods and free functions combine to effect tests on the value, flagging any successful match, and throwing an exception based on the failure message if no matches are found.
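Listing 1 inherits privately from its exception policy "so we can utilise EBO". As a small sketch of what that buys, compare holding an empty policy as a member with inheriting from it. The type names here are hypothetical, and the exact sizes are implementation-dependent, so the comments claim only the relative ordering seen on mainstream compilers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stateless policy: behaviour only, no data members.
struct empty_policy
{
    void operator ()(char const *) const
    {}
};

// Policy held as a member: even an empty object must occupy storage,
// so the enclosing type grows (typically with padding as well).
struct holds_policy
{
    empty_policy policy;
    int          value;
};

// Policy as a base class: the Empty Base Optimization lets the base
// subobject occupy no storage of its own.
struct inherits_policy
    : private empty_policy
{
    int value;
};

// Reports how many bytes inheriting saves over holding a member.
std::size_t bytes_saved()
{
    return sizeof(holds_policy) - sizeof(inherits_policy);
}
```

For a class that is created and destroyed within a single statement the saving is modest, but it costs nothing to obtain.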
As I mentioned before, verify_options() is a creator function, with two overloads to support the provision of a different exception policy, invoking different constructors of the options_verifier class:
template <typename T> options_verifier . . .
The constructors all do pretty straightforward things: taking the value reference, the failure message and setting up the match flag. Note that the copy constructor sets the matched flag of its source instance, thereby "passing the baton" of testing on between copies; this is necessary to ensure that only the instance tested against the given option values can fail.
Failure is effected in the destructor, where the only real complexity to the class comes in. If the match flag indicates that there have been no matches, then the exception policy function call operator is invoked to throw the exception. Since options_verifier inherits from the exception policy, in order to take advantage of the Empty Base Optimization [3], it is first cast to the exception policy type. Since throwing exceptions from destructors is a big no-no in C++, we also test the result of the standard function uncaught_exception(), which indicates whether an exception is currently active. Of course, options_verifier is intended for the sole context of a temporary instance returned by the verify_options() function, so unless one deliberately abuses the class, there is no way that its destructor will be called as a result of a thrown exception. Nonetheless, it's good to check; note that uncaught_exception() is not supported by Metrowerks CodeWarrior, for which we must rely on good practice alone [4].
Match testing is carried out in the test() function, which simply does an equality comparison of m_value and the comparand. It is implemented as a member template function so that it can support heterogeneous comparison, such as comparing std::string with C-style strings, as in the motivating example shown above.
The last part of the puzzle is the overload for the comma operator, operator ,(), which takes an options_verifier instance and a reference to const value of arbitrary type. It simply calls test() on the verifier, and returns the reference to facilitate the option chaining:
template< typename T
        , typename XP
        , typename U
        >
inline options_verifier<T, XP> &operator ,(options_verifier<T, XP> &ov, U const &u)
{
  return ov.test(u);
}
All together, this gives us the sequence verify_options(), options_verifier(), test(), test(), . . . test(), ~options_verifier(), all wrapped up in a single statement. No nontemporary instances, no dangling references: inline, efficient, safe, and, last but not least, succinct.
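To see the whole sequence run end to end, here is a condensed, self-contained sketch of the same technique. It is not the STLSoft implementation: the names throwing_policy, verifier, verify_one_of(), and is_valid_connectivity() are hypothetical, and it assumes a C++17 compiler. Three choices differ from Listing 1 and are mine: the destructor is declared noexcept(false), because destructors are implicitly noexcept since C++11; std::uncaught_exceptions() is used in place of the older uncaught_exception(); and the comma operator is a member, because a free function taking a non-const reference cannot bind to the temporary returned by the creator function on a conforming compiler.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical exception policy: reports failure by throwing.
struct throwing_policy
{
    void operator ()(char const *message) const
    {
        throw std::runtime_error(message);
    }
};

template <typename T, typename XP = throwing_policy>
class verifier
    : private XP
{
public:
    verifier(T const &value, char const *failureMessage)
        : m_value(value)
        , m_failureMessage(failureMessage)
        , m_matched(false)
    {}
    // Copying marks the source as matched, so only the copy can report.
    verifier(verifier const &rhs)
        : XP(rhs)
        , m_value(rhs.m_value)
        , m_failureMessage(rhs.m_failureMessage)
        , m_matched(rhs.m_matched)
    {
        rhs.m_matched = true;
    }
    // Report only if nothing matched and no other exception is in flight.
    ~verifier() noexcept(false)
    {
        if(!m_matched && 0 == std::uncaught_exceptions())
        {
            XP &policy = *this;
            policy(m_failureMessage);
        }
    }

    template <typename U>
    verifier &test(U const &u)
    {
        if(!m_matched && m_value == u)
        {
            m_matched = true;
        }
        return *this;
    }
    // Member form, so that it can be applied directly to a temporary.
    template <typename U>
    verifier &operator ,(U const &u)
    {
        return test(u);
    }

private:
    verifier &operator =(verifier const &) = delete;

    T const      &m_value;
    char const   *m_failureMessage;
    mutable bool m_matched;
};

// Creator function: deduces T so that client code need not name it.
template <typename T>
verifier<T> verify_one_of(T const &value, char const *failureMessage)
{
    return verifier<T>(value, failureMessage);
}

// The temporary is destroyed, and may throw, at the end of the statement.
bool is_valid_connectivity(std::string const &value)
{
    try
    {
        verify_one_of(value, "Invalid 'UpstreamConnectivity' value"), "Active", "Passive";
        return true;
    }
    catch(std::runtime_error const &)
    {
        return false;
    }
}
```

The try/catch in is_valid_connectivity() is there only to make the outcome observable; in application code the exception would normally propagate to wherever configuration errors are handled.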
A Fuller Syntax?
You may be looking at this and thinking, "Well, it's kind of nice and succinct, but the comma's a bit of a stretch." If so, I can understand your position. Perhaps a more polished approach would be to use logical operators, such that rather than looking like:
verify_options(UpstreamConnectivity, "Invalid value"), "Active", "Passive";
we'd actually see something like:
verify_options(UpstreamConnectivity, "Invalid value") == "Active" || "Passive";
The latter form certainly reads a bit more obviously, along the lines of "Verify that UpstreamConnectivity is equal to Active or Passive". There are two reasons I've not elected to follow this approach. First, and somewhat trivially, using operator || requires more space: at least one extra character, and maybe more, depending on your spacing conventions. Since the whole point of this component is succinct validation, we're kind of defeating ourselves a touch. Second, and more importantly, to support this syntax, we'd have to overload operator == and operator ||. That's simple in itself: Each can be overloaded in precisely the same manner as I've done with the comma operator. However, if that's done, then we can just as readily write other expressions with a mix of the operators that are anything but appropriate, such as:
verify_options(UpstreamConnectivity, "Invalid value") == "Active" == "Passive";
Indeed, if we were verifying integral types, we could even write:
verify_options(Port, "Invalid value") || 100 == 200;
Naturally, this is not good. Any kind of noninteger-emulating operator overloading is a slippery slope [3], but we've gone way past the first step here. We're careering wildly downhill towards the Land of the Obfuscated Programmer. As I see it, we have only three reasonable choices:
verify_options(UpstreamConnectivity, "Invalid value"), ("Active", "Passive");
which only tests UpstreamConnectivity against "Passive". The only other way in which it can be broken is if one of the tested option values is of a type for which an overloaded comma operator is defined. Since the values are overwhelmingly likely to be of Value Types [3], this does not represent a serious problem, in my opinion.
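The parenthesized form misbehaves because, inside the parentheses, the built-in comma operator applies: it evaluates and discards "Active" and yields "Passive". The following sketch makes this visible with a hypothetical recorder class that simply logs each operand its overloaded comma operator receives; it stands in for the verifier and is not part of STLSoft.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a verifier: records every operand that its
// overloaded comma operator is given.
class recorder
{
public:
    template <typename U>
    recorder &operator ,(U const &u)
    {
        m_seen.push_back(std::string(u));
        return *this;
    }
    std::vector<std::string> const &seen() const
    {
        return m_seen;
    }

private:
    std::vector<std::string> m_seen;
};

std::vector<std::string> chained()
{
    recorder r;

    r, "Active", "Passive";   // two calls to the overloaded operator
    return r.seen();
}

std::vector<std::string> parenthesized()
{
    recorder r;

    // The inner comma is the built-in one; many compilers warn that its
    // left operand has no effect.
    r, ("Active", "Passive");
    return r.seen();
}
```

Here chained() records both values, whereas parenthesized() records only "Passive".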
Of course, when operating slightly off the beaten track, there is much scope for equivocation, and you may choose to do things differently.
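The hazard of the logical-operator syntax discussed earlier in this section comes down to precedence: overloading an operator does not change how tightly it binds. In the sketch below, a hypothetical probe class overloads == and || for int operands and records what each receives.

```cpp
#include <cassert>
#include <vector>

// Hypothetical probe: records each right-hand operand it is given.
class probe
{
public:
    probe &operator ==(int value)
    {
        m_seen.push_back(value);
        return *this;
    }
    probe &operator ||(int value)
    {
        m_seen.push_back(value);
        return *this;
    }
    std::vector<int> const &seen() const
    {
        return m_seen;
    }

private:
    std::vector<int> m_seen;
};

std::vector<int> intended()
{
    probe p;

    p == 100 || 200;    // parsed as (p == 100) || 200
    return p.seen();
}

std::vector<int> misleading()
{
    probe p;

    // == binds more tightly than ||, so this is p || (100 == 200): the
    // built-in comparison yields false, and the probe sees only 0.
    p || 100 == 200;
    return p.seen();
}
```

The second expression compiles cleanly yet never tests the value against either 100 or 200.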
Beware Alternative Tokens!
For grins, I'll share a little embarrassing experience I had when developing the options_verifier class before we finish. In order to cater to developers whose equipment may not support the full range of ASCII special characters used by C++, various alternative tokens are defined that act as lexical replacements for common operators [6]. For example, and_eq is parsed as &=, not as !, bitand as &, and so on. The problem I encountered was environmental. I prototyped the options_verifier using a compiler that does not recognize and translate the alternative tokens, and I was less than mentally adroit in failing to keep the alternative tokens in mind while developing. I'm sure you're ahead of me, and recognise immediately my mistake, which precipitated the following compilation errors when I started testing on other compilers:
GCC 3.2:
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:145: parse
error before `||' token
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:153: semicolon
missing after declaration of `stlsoft::options_verifier<T, XP>'
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp: In
constructor `stlsoft::options_verifier<T, XP>::options_verifier(const T&,
const char*)':
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:121: class `
stlsoft::options_verifier<T, XP>' does not have any field named `m_value'
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:122: class `
stlsoft::options_verifier<T, XP>' does not have any field named `
m_failureMessage'
H:/STLSoft/Identities/STLSoft/stlsoft/stlsoft/options_verifier.hpp:123: class `
stlsoft::options_verifier<T, XP>' does not have any field named `m_bMatched'
CodeWarrior 8.3:
# In: ..\..\..\..\stlsoft\options_verifier.hpp
# From: ..\options_verifier_test.cpp
# -------------------------------------
# 145: class_type &or(U const &u)
# Error: ^^
# illegal template declaration
### mwcc.exe Compiler:
# 146: {
# Error: ^
# declaration syntax error
### mwcc.exe Compiler:
# 147: if( !m_bMatched &&
# Error: ^^
# declaration syntax error
### mwcc.exe Compiler:
# 148: m_value == u)
# Error: ^^^^^^^
# declaration syntax error
After way too many minutes wasted puzzling and needlessly moving code around, the lights finally went on and I realized that I'd been using a reserved word for my method: What you see as test() in Listing 1 was formerly called or(). Ouch!
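For readers who have not met them, here is a short sketch of the alternative tokens in use, assuming a compiler that treats them as tokens, as the standard requires. The function names are hypothetical. Because each word is replaced lexically by its operator, none of them can be used as an identifier.

```cpp
#include <cassert>

// Each alternative token behaves exactly like the operator it spells.
bool either(bool a, bool b)
{
    return a or b;      // a || b
}

bool both(bool a, bool b)
{
    return a and b;     // a && b
}

bool negated(bool a)
{
    return not a;       // !a
}

unsigned masked(unsigned value, unsigned mask)
{
    value and_eq mask;  // value &= mask
    return value;
}

unsigned common_bits(unsigned a, unsigned b)
{
    return a bitand b;  // a & b
}

// Consequently, a declaration such as
//     class_type &or(U const &u);
// reaches the parser as
//     class_type &||(U const &u);
// which is why GCC reports a parse error before the `||' token.
```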
Acknowledgments
Thanks to Bjorn Karlsson, Garth Lancaster, John Torjo, and Walter Bright for their excellent criticisms and suggestions.
About the Author
Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books, one of which is not about C++. Matthew can be contacted via http://imperfectcplusplus.com/.
Notes & References
[1] Open-RJ (http://openrj.org/) is one of the exemplar projects for my "Positive Integration" column in CUJ. Note that the classes in the Open-RJ C++ and STL mappings are lightweight wrapper facades, and may be accessed by value as efficiently as by reference, as shown in the sample code.
[2] You may (like a couple of my noble reviewers) be perplexed by the typedef string_t in the openrj::stl namespace, the namespace within which the Open-RJ STL mapping resides. Since there are occasional practical reasons (coupling, performance, custom behavior, compilers that do not support the STL) to want to avoid the std::string type from your compiler vendor's standard library, I favor providing mechanisms (e.g. #defines) in the library for providing such types, while defaulting to the 'expected' type in the default case. Hence, Open-RJ's string_t (in the openrj::stl namespace) is std::string by default, but by using the typedef in our code we can be happily oblivious, or at least agnostic, should we have cause to customize it.
[3] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004. (See http://imperfectcplusplus.com/)
[4] Without the protection afforded by uncaught_exception, users of CodeWarrior have the additional restriction that options_verifier should not be used in catch clauses. Naturally, this is not exactly going to be a hardship, but it's worth noting for pedagogy, if nothing else.
[5] STLSoft is an open-source organization whose focus is the development of robust, lightweight, cross-platform STL-compatible software, and is located at http://www.stlsoft.org/. The options_verifier component will be available from version 1.8.3 onwards, which is due for release in early January 2005. It is also included in v1.8.3 beta 1, which is available now.
[6] The C++ Programming Language, Bjarne Stroustrup, Addison-Wesley, 2000. (http://www.awprofessional.com/title/0201700735)