October 23, 2006
Abstraction and performance -- a bit of history, part 1
Programmers like to talk about "the abstraction penalty" -- the cost in performance for programming in an abstract style. The word "the" here is misleading, because it suggests that there is only one such penalty, independent of circumstances.
I have already observed that this point of view is too simplistic. Here's an example to illustrate that observation.
About 20 years ago, I was part of an organization that was trying to get C programmers to start (or at least consider) using C++. To do so, I would write C++ code to make everyday programming tasks easier; then I would give presentations that explained the code and showed examples of how to use it.
We already had list and string classes available, so I hit on the idea of writing an associative-array class. Because C++ did not yet have templates, I resorted to the C preprocessor to adapt this class to the index and object types. I believe that this is the first associative-array class ever to be written in C++.
In this class, I chose two ideas that persist in the C++ library to this day: Using an order-based, rather than a hash-based algorithm, and choosing an algorithm that was order log(n) in the worst case. Because I valued ease of implementation and robustness over micro-performance (after all, I was entirely responsible for implementation and support), I strove to make the code as clear as possible.
The result was an associative-array library that really did deliver its order log(n) performance claim, but with a fairly high constant overhead. Much of that overhead came from going to the operating system's memory allocator for each associative array element, and storing various pointers (used for debugging purposes) that were not strictly necessary.
When I looked around for an example to illustrate this library in context, I found the notion of topological sorting. A topological sort is an algorithm that takes in pairs of ordering constraints (e.g. A must precede B, and so on) and produces a list of items that meets all the constraints at once (if such a list exists). A common use of this algorithm was to reorder the modules in a library so that no module relied on any module that preceded it in the library. This strategy would allow a library to be searched in a single pass.
There was a system command that did topological sorting. It was written in C, and was a little more than 200 lines long. If I remember correctly, the C++ version using my library was about 30 lines long. So I was very pleased with myself until someone asked me: "How does the C++ version perform?"
(to be continued...)
Posted by Andrew Koenig at 11:33 AM Permalink
|