When I read about the Interpreter pattern in Design Patterns [1], it was like being hit in the face with a two-by-four. This is my code! I thought. In complete ignorance, I had used something akin to Interpreter to solve the same problem that the GoF [2] used as their motivational example: regular expressions. Interpreter (in the design pattern sense) is a wonderfully clean and powerful way to express grammar rules in code such that they can be interpreted, or executed, at run time. But this pattern is often passed over because of performance considerations. Even the GoF discourages its use for high-performance applications:
The Interpreter pattern works best when ... efficiency is not a critical concern. The most efficient interpreters are usually not implemented by interpreting parse trees directly but by first translating them into another form. For example, regular expressions are often transformed into state machines. ([1], p. 245)
It doesn't have to be this way. True, a straightforward, naive implementation of Interpreter has some serious performance issues. But some very simple tricks yield an implementation of Interpreter that rivals state machines for speed without giving up any of the pattern's benefits: clean, modular, and extensible design. I have used these techniques while implementing the GRETA Regular Expression Template Archive [3], a regular expression engine that, at the time of writing, outperforms Boost's regex [4] in many common usage scenarios [5]. Below I describe one of these tricks: using templates to inline virtual function calls.
A Problem of Interpretation
If you can express your problem as a mini-language, then the Interpreter pattern can give you some traction. Typically, your language translates directly into code by assigning each grammar rule in your language to a class. Sentences in your language parse into trees. To execute these sentences, you traverse these trees recursively. This is the modus operandi of the Interpreter design pattern. Virtual function calls get you from one node in the tree to the next. That's where the performance problems creep in, so that's where I'll start. Below is a specific example, taken from Design Patterns.
Here is a grammar rule from the regular expression language:
repetition ::= expression '*'
repetition is some expression followed by the character literal '*'. This rule lets you write Perl regular expressions like /a*/, which matches zero or more 'a's. A straightforward implementation of an interpreter for this rule might look like the code in Listing 1.
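Listing 1 itself is not reproduced here; a minimal sketch consistent with the surrounding description might look like the following. The class names come from the text, but the exact Interpret signature (here, taking the match position by reference and returning bool) is an assumption:

```cpp
// Sketch of Listing 1 (assumed signatures): the classic Interpreter
// structure. Interpret() tries to match at *sz and, on success,
// advances sz past the matched text.
struct RegularExpression
{
    virtual ~RegularExpression() {}
    virtual bool Interpret(const char *&sz) const = 0;
};

// Terminal expression: matches a single character literal.
struct LiteralExpression : RegularExpression
{
    explicit LiteralExpression(char ch) : ch_(ch) {}
    virtual bool Interpret(const char *&sz) const
    {
        if(*sz == ch_) { ++sz; return true; }
        return false;
    }
private:
    char ch_;
};

// Nonterminal expression: matches zero or more repetitions
// of its sub-expression.
struct RepetitionExpression : RegularExpression
{
    explicit RepetitionExpression(RegularExpression *repeat)
        : repeat_(repeat) {}
    virtual bool Interpret(const char *&sz) const
    {
        while(repeat_->Interpret(sz));
        return true; // zero or more: always succeeds
    }
private:
    RegularExpression *repeat_;
};
```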
With these classes, I can parse the regular expression /a*/ into a tree as follows:
// parse and interpret /a*/
RegularExpression *pRE = new RepetitionExpression(
    new LiteralExpression('a'));
const char *sz = "aaaaaabbbbbb";
pRE->Interpret(sz);
This is all very nice. It's clean and it works. If you follow the first rule of optimization, "Don't do it," then you are done. But where's the fun in that? You can do better. The big problem with this implementation is the following line in RepetitionExpression::Interpret:
while(repeat_->Interpret(sz));
repeat_ is a pointer to the polymorphic base RegularExpression, and Interpret is virtual, so this call uses the virtual function call mechanism. A virtual function call is slow: from repeat_, you fetch the pointer to the virtual function table, then you index into that table to get a pointer to the Interpret method, and only then can you call it. Whew! Worse, this is happening in a tight loop, so you do it over and over again. (Technically, C++ implementations are not required to use virtual function tables, but in reality v-tables are the only game in town.)
Here's the real kicker: all this work is totally unnecessary. For this particular tree (which is representing the regular expression /a*/), the object pointed to by repeat_ is never going to change. It will always point to a LiteralExpression and repeat_->Interpret will always call LiteralExpression::Interpret, but you blindly make virtual function calls every time. Now you see why Interpreter has a reputation for being slow: unnecessary virtual function calls. At first blush, it seems like this is a fatal flaw of the pattern. After all, the pattern is the tree, and it is the polymorphic base pointers that hold the tree together.
The problem is that when the time comes to actually Interpret the expression, the compiler doesn't know the exact types of the nodes in the tree, so it has no choice but to make the function calls virtual. Is there a better way? Well, you do know the exact types when you are building the tree (i.e., parsing the pattern). If there were some way to preserve that information and use it in Interpret, you'd be golden. In fact, there is.
Partial Evaluation
The elegance and simplicity of the Interpreter pattern is evident in RepetitionExpression. It is totally general. It can repeat any RegularExpression. But that generality comes at a price: unnecessary virtual function calls leading to slow execution times. Let's mess things up. Suppose I want to repeat literals really fast, but I don't care so much about repeating other expressions quickly. I could do it by writing an extra class that is specialized for repeating literals. See the RepeatLiteral class in Listing 2.
Now, when I am parsing a pattern and encounter a '*', I check to see if the '*' is operating on a literal. If so, I use RepeatLiteral; otherwise, I use the more general RepetitionExpression.
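Listing 2 is likewise not reproduced here; a sketch consistent with the description might look like this (signatures assumed; the base and literal classes are repeated so the fragment stands alone):

```cpp
// Sketch of Listing 2 (assumed signatures).
struct RegularExpression
{
    virtual ~RegularExpression() {}
    virtual bool Interpret(const char *&sz) const = 0;
};

struct LiteralExpression : RegularExpression
{
    explicit LiteralExpression(char ch) : ch_(ch) {}
    virtual bool Interpret(const char *&sz) const
    {
        if(*sz == ch_) { ++sz; return true; }
        return false;
    }
private:
    char ch_;
};

// Hand-written special case: repeats a literal. The explicitly
// qualified call below bypasses the v-table, so the compiler is
// free to inline LiteralExpression::Interpret.
struct RepeatLiteral : RegularExpression
{
    explicit RepeatLiteral(LiteralExpression *literal)
        : literal_(literal) {}
    virtual bool Interpret(const char *&sz) const
    {
        while(literal_->LiteralExpression::Interpret(sz));
        return true;
    }
private:
    LiteralExpression *literal_; // the exact type, not the base
};
```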
Wait a minute, what's going on here? What is this?
while(literal_->LiteralExpression::Interpret(sz));
This is an explicitly qualified call to a member function. By telling the compiler that I want to call the Interpret method in the LiteralExpression class, I am bypassing the virtual function mechanism. I'm effectively telling the compiler, "Look, I know I told you that Interpret is virtual. But for this call, forget all that and call LiteralExpression::Interpret directly." And since LiteralExpression::Interpret is defined in situ, this call can and probably will be inlined, even though it is marked virtual. (This is one of those rare times when a compiler is allowed to inline a virtual function.) I have just turned a slow virtual call into a super-fast inline function call.
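The effect of an explicitly qualified call is easy to demonstrate in isolation. The following toy classes (hypothetical; not part of the regex code) show that a qualified call ignores the dynamic type of the object, which is exactly what makes it resolvable at compile time:

```cpp
struct Base
{
    virtual ~Base() {}
    virtual int Id() const { return 1; }
};

struct Derived : Base
{
    virtual int Id() const { return 2; }
};

struct MoreDerived : Derived
{
    virtual int Id() const { return 3; }
};

// Ordinary call: dispatched through the v-table at run time.
int CallVirtual(const Derived *p)   { return p->Id(); }

// Qualified call: resolved at compile time, eligible for inlining.
int CallQualified(const Derived *p) { return p->Derived::Id(); }
```

Given a MoreDerived object, CallVirtual returns 3 because the dynamic type wins, while CallQualified returns 2: the qualified call never consults the v-table at all.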
This is nice but far from ideal. First, I had to define an extra class. More code is bad; less code is good. Second, the parsing code needs to know about this special case in order to create a RepeatLiteral when required. This increases code complexity and creates maintenance problems. Third, only LiteralExpressions are benefiting from this special treatment. All the other expression types are going to get jealous. And they're still going to be slow, too.
What Now?
What I have now is an ugly hack, a wart on my otherwise beautiful interpreter. But it is a hack that works, and it's fast. It works because it uses type information. Since the compiler knows the exact type of the object being repeated, it can do at compile time what could only be done at run time before: function call resolution. But my hack is not generic. By now, you've heard the terms "type information," "compile time," and "generic" in the same paragraph, and a solution should be screaming out at you. Templates! Let's use templates to make a generic Repeater that knows the exact type of the object it is repeating. Listing 3 shows my new implementation.
I have replaced the generic RepetitionExpression class with a template class called Repeater. Repeater takes a concrete expression type as a template parameter. Internally, it stores a pointer to an object of exactly that type. In the Interpret method, it uses that type information to turn what would have been a virtual call into an inline call. Repeater is just as generic as my original RepetitionExpression, except that the polymorphism is happening at compile time instead of at run time, which is exactly when I want it to happen.
You'll notice that I added an extra virtual method, RepeatMe(), to the RegularExpression base class. The code that parses regular expressions into trees will call RepeatMe() on a RegularExpression to create and initialize a Repeater object for it. This is some nice syntactic sugar that will make it easier to write the parser code. (Since only the parser code will call it, RepeatMe() would probably be private in a real implementation.)
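Listing 3 is not reproduced here either; a sketch consistent with this description might look like the following. Signatures are assumed, and having Repeater's own RepeatMe() return itself is a simplification for the sketch (valid because /x**/ matches the same strings as /x*/):

```cpp
// Sketch of Listing 3 (assumed signatures). Repeater knows the
// exact type of the expression it repeats, so the qualified call
// in Interpret can be inlined.
struct RegularExpression
{
    virtual ~RegularExpression() {}
    virtual bool Interpret(const char *&sz) const = 0;
    // Called by the parser to wrap this expression in a Repeater.
    virtual RegularExpression *RepeatMe() = 0;
};

template<typename Xpr>
struct Repeater : RegularExpression
{
    explicit Repeater(Xpr *repeat) : repeat_(repeat) {}
    virtual bool Interpret(const char *&sz) const
    {
        // Xpr is the exact type: no v-table lookup, inlinable.
        while(repeat_->Xpr::Interpret(sz));
        return true;
    }
    virtual RegularExpression *RepeatMe()
    {
        // Simplification: repeating a repetition is a no-op,
        // since /x**/ matches the same strings as /x*/.
        return this;
    }
private:
    Xpr *repeat_;
};

struct LiteralExpression : RegularExpression
{
    explicit LiteralExpression(char ch) : ch_(ch) {}
    virtual bool Interpret(const char *&sz) const
    {
        if(*sz == ch_) { ++sz; return true; }
        return false;
    }
    virtual RegularExpression *RepeatMe()
    {
        // The one-liner: wrap this object in a Repeater that
        // knows the exact type LiteralExpression.
        return new Repeater<LiteralExpression>(this);
    }
private:
    char ch_;
};
```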
With these classes, I can build a tree representing the pattern /a*/ as follows:
RegularExpression *pRE =
    (new LiteralExpression('a'))->RepeatMe();
const char *sz = "aaaaaabbbbbb";
pRE->Interpret(sz);
First, I create a new LiteralExpression. Then, I invoke RepeatMe() on it. This calls LiteralExpression::RepeatMe(), which creates a new object of type Repeater<LiteralExpression> and initializes it with the this pointer of the LiteralExpression object. This new Repeater is what gets Interpreted on the last line. The first call to Interpret is still virtual, and it gets you to Repeater<LiteralExpression>::Interpret via the usual virtual function call mechanism. After that, though, its smooth sailing. The Repeater will now execute code equivalent to:
while(repeat_->LiteralExpression::Interpret(sz));
These calls are all inline. In the above example, I am matching against the string "aaaaaabbbbbb", so I will call LiteralExpression::Interpret once for each of the six 'a's and once for the first 'b', which will return false. I went from seven virtual function calls to no function calls at all! (This is assuming, of course, that the compiler did its job and actually inlined the calls for me.) This is a very nice result.
Taking Stock
Looking over the new implementation, you should notice how much it looks like my original implementation. The inheritance hierarchy is unchanged, I didn't have to add very much code, and the Interpreter design pattern survived intact. In fact, the new code is just as modular, generic, and extensible as it was before, with the exception that I need to implement RepeatMe() for each expression type that can be repeated. That's a small price to pay, especially considering that RepeatMe() is always going to be a one-liner. I can even implement it with a macro. In exchange, you get a nice little performance boost, having eliminated a bunch of virtual function calls.
As I mentioned earlier, I use this trick in my own regular-expression code. It was very easy to integrate this change into my existing code base. After just a few hours of work, much of it mindless cut-and-paste, I was rewarded with a 30% performance improvement in pattern matching time. I fight and claw for every percent of performance, so 30% in one fell swoop is huge!
Is it all roses? Did I pay anything for this performance? Sure. I instantiated a bunch of templates, and that takes up space. It's our old friend, the space/time tradeoff. In this case, though, it is not much of a tradeoff. I am instantiating a bunch of Repeater classes, which are small, so my performance gains are worth it. In your application, though, things might be different. If your grammar is very large, or if you turn a large class into a template, then the code bloat could bite you. As always, your mileage may vary.
Summary
I started with a slow, naive implementation of the Interpreter design pattern and finished up with a lean, mean language-interpreting engine. And I did it all with very little work and the help of templates. I believe that the Interpreter pattern is worthy of consideration for even the most demanding applications. If you need proof, you can check out my regular expression code. It is online at <http://research.microsoft.com/projects/greta>. Enjoy!
Notes
[1] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides.
Design Patterns: Elements of Reusable Object-Oriented Software
(Addison-Wesley, 1994).
[2] GoF stands for the Gang of Four, authors of Design
Patterns [1].
[3] Get the source code for the GRETA Regular Expression Template
Archive at <http://research.microsoft.com/projects/greta>.
It is free for non-commercial use.
[4] <www.boost.org>.
[5] Find the results of a performance comparison between
Boost regex++ and GRETA at <http://research.microsoft.com/projects/greta/regex_perf.html>.
About the Author
Eric Niebler studied Computer Science at the University of Virginia. He spent several years in the Windows 2000 group at Microsoft before moving to a development position at Microsoft Research in the Natural Language Processing group. He is now a library developer in the Visual C++ group. His interests include data structures and algorithms; compiler, language, and library design; data serialization and persistence; and pattern matching. He can be contacted at [email protected].