Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Conversations:Making a Real Hash of Things


May 2002 C++ Experts Forum/Conversations


I was groggy when we landed, but free. After returning to Terran orbit and spending over a month in detention, we had been released when the tensions were defused. Control of the alien technology would be overseen by a newly formed body and protected by militia from all of the world's major factions.

We were waiting for the elevator. Jeannine grunted. "I feel so heavy. It's like acceleration that won't stop."

"We were used to it once; we'll acclimatize again." I sank back into the chair; it was too hard to stand for long. "How long is that blasted thing going to take?"

"I think they made the mistake of optimizing the elevator schedule to get the minimum average wait time." Jeannine sat too. "Which is too bad for us when we happen to hit the unbearable worst-case wait time, unfortunately."

"I'm just glad to be home."

There was quiet between us for a time, while we waited in the crowd. I was thinking of the trials of the past two years. I knew Jeannine was thinking about them too. "We sure made a hash of things," she said, finally, and I laughed tiredly. "Oh," she smiled, "that reminds you of a story, doesn't it? Well, go ahead. What is it, hon? Don't keep me in suspense."

And so I began to tell that story, just as I would continue to tell many others in the happy years to come, during our new life together...


It was the elevators and the word "hash" that made me think of that day, when I was at my first programming job, years ago.

"Oh, and d'ja hear what Bahb did?" Wendy was saying and chuckled with a merry gleam in her eye. "The new boss is big on standards. So Bahb decided that it was time to look good and be the fair-haired standards-conforming boy in his latest piece of code."

"But isn't that a good thing?" I asked, puzzled.

Wendy smiled and walked away. The words "Wait till you see his code..." wafted over her shoulder as she turned the corner into her cube. I shrugged mentally and went back to work.

It was later the next day that the first explosion hit. The sound seemed to come from the direction of Bob's cube, so I walked over to see what was going on. I found Bob glowering at his screen, muttering foul things under his breath. A forgotten latte was sitting, cold, at his elbow.

"Wassup?" I asked innocently. I was putting myself in a dangerous position, but I judged that the potential humor value was worth the risk.

Bob muttered something more, and I thought I caught the words "standards" and "stupid" and "slow" in the mix. "Williams hasn't got the faintest clue what it means to write production code," he grumbled on, more audibly now. "First he wants me to use standard stuff, now this."

I looked over Bob's shoulder and saw an email from the QA department. Skimming it quickly, I saw that it was complaining about the speed of the code Bob had checked in the day before. "What's their beef?" I asked.

"Speed," Bob said, scratching himself. "Speed. It seems the artsy-fartsy testing department doesn't care that my code works; they just claim it's not fast enough to meet spec."

"What's the spec?"

"Oh, constant-time lookup, on average. I could've told them that those stupid STL containers wouldn't hack it." In another window, I saw part of his code:

typedef map<string, Employee> SearchIndex;

// ...

SearchIndex::iterator result = ind.find( empname );

"How many map entries will there be?" I asked.

"Oh, lots," Bob waved vaguely.

Snap! The sudden sound of a book snapping shut made us both jump a bit. The Guru, who had once again appeared behind us silently, gave a stern look. She pointed at the cold latte, then at Bob's gut, and demanded: "How do you get so big eating food of this kind?" I suppressed a laugh, and Bob turned red and drew breath, but the Guru continued before Bob could respond: "This performance problem is not the fault of the blessed Standard Template Library. It is the fault of the programmer who uses the wrong container. Why did you use a map?"

"Because that's the blasted standard container for looking stuff up!" Bob groused irritably. "Why d'you think I've been pulling my hair out, sweetheart? Just shows those ivory-tower folks who designed the STL haven't got a clue what we need out here in the real world. This container with its internal trees and all always does logarithmic lookup, and that's just too slow for us."

"And, in this case, does it matter if sometimes a search may take a longer time, as long as the average case remains constant time?" the Guru asked sweetly. Something about her mien told me that she already knew the answer.

Bob saw it too. "No, we only care what the stress-test benchmarks show for millions of searches. You should know; you wrote the spec!"

"That," she agreed, "I did. The solution you require involves adding five characters to your code and using mostly-standard-conforming containers that are not, yet, themselves part of the C++ Standard."

"Huh?" Bob and I said together.

In answer, the Guru reached over and slightly modified one line of Bob's source code:

typedef <span style='color:blue'>hash_</span>map<string, Employee> SearchIndex;

"Now you have constant-time lookup, in the average case," she said. "In production code, when you want associative containers, you almost always want hash-based containers, and not tree-based ones. Note that the hash_map is not yet standard, but the one we have here is available to us on all the platforms we are using [1]. You must change the included header name, of course. You should also check whether the default hash function is good for your data and customize it if needs must. Test it, time it, and if it suffices check it in."

And before we could ask anything more, she had already reopened her tome and walked off. Thinking it was a good time to disappear, I too made my escape before Bob could recover.

Later that day, at the water cooler, I learned that Bob had made the suggested change, and that it indeed had solved the problem. The QA department was happy, our manager Pete Williams was happy, Bob was unhappy but he was that way most of the time anyway, and I thought I'd heard the last of it.

Not so.

It was two days later that the second explosion happened. I rushed over to Bob's cube and saw almost the identical scene: a cold latte, a heated Bob, and an email from QA. This time, I saw after a quick glance, QA was complaining about the speed of a new piece of code Bob had written the night before and checked into version control.

"Hey, Bob," I said innocently. "What's got their shorts in a knot now?"

"Speed," Bob said, scratching himself. "Speed. What do they want from me? My code works! I'm even using a hash_map just to suit the powers-that-be and their blessed coding requirements. Now those idiots in QA have the nerve to complain again that my code's not fast enough for spec."

"What's this code for?"

"Oh," Bob waved his hand listlessly, "I think they're using my routine in a real-time interrupt handler or something." I managed to catch a glimpse of his source:

typedef hash_map<Descriptor, Resource> Lookup;

// ...

Lookup::iterator result = res.find( d );

"How many map entries will there be?" I asked again.

"Oh, lots," Bob waved vaguely as before.

Snap! There she was again, and although I'd half-suspected the Guru would be nearby I was still surprised. So was Bob. " 'One size fits all,' is it?" she asked quietly. "Why are you using a hash_map here?"

Bob turned deep red. I backed away a few steps, sensing that today his fuse was short and burning quickly. "Because," he said carefully, "just a couple of days ago you told me that I should always use those nonstandard hash-based containers!"

The Guru shook her head sadly. "I said nothing of the sort." Bob nearly exploded, but she continued: "I said you almost always want hash-based containers. My acolyte?" she added, turning now to me.

The last thing in the world I wanted was to be roped into this situation. "Uh, yes?" I uhhhed as intelligently as I could.

A moment of silence passed between us, and another. Then: "Did you ever hear, my child," the Guru asked pensively, "of the man who drowned crossing a river that averaged only three feet deep?"

A ray of light gleamed in my mind. "Ah," I nodded, "and the man who waited fifteen minutes for an elevator that averaged only a one-minute wait. The usual containers are tree-based and they always offer logarithmic lookup, in both the average and worst cases. That makes hash-based containers faster on average — and if you pick a suitable hash function," I was quick to add and got the instant gratification of the Guru's nod and smile, "but their worst case might be, uh, worse, is that it?"

"Indeed. My child, never," she emphasized, "ever confuse the average and worst cases." The Guru reopened the tome she was carrying and found a different page. "Here," she pointed, and: "Hear. The prophet Musser." She looked over the edge of the book at me, then at Bob, then back down to the page, and read: "'Worst-case performance of hash table operations can be very bad — linear in the size, N, of the table — under some circumstances, but that of balanced binary trees is always O(log N).'" [2]

"Right," I pressed on, eager to show off my understanding over Bob's. "And two days ago all that mattered was the total time for millions of searches, so it didn't matter if a handful were slow, only the average case was important. But with a service routine..." I trailed off.

"...you have a hard upper limit on the amount of time that may be consumed," she confirmed, "and the worst case becomes the more important criterion to manage."

Bob just kept looking from one of us to the other and back, as though he were watching a tennis match. "Enough!" he bellowed, and got up so quickly his chair nearly tipped over, and stormed off in the direction of the local latte outlet.

"Indeed," the Guru continued, also turning to leave, "hashing offers the advantage of constant average time for insertion and searching, while tree-based lookup offers reliable performance. Choose which you need. As I said," she smiled back mischievously for a moment, "'in production code, when you want associative containers, you almost always want hash-based containers...'"


Our wedding day came a few months later, a quiet family event, a new start. Some political infighting continued in the world at large, as nations continued to jockey for position and advantage with the alien technology, but it didn't concern me any more. It didn't concern us. We were well shut of that.

One sunny afternoon not many weeks after the wedding, Jeannine sat in our new living room, thumbing through a book.

"What's that?" I asked, and she showed me the book and the passage she was reading. "Oh," I said, "of course. Well, will our road go ever on and on, do you think?"

"'Down from the door where it began,'" she agreed, smiling. "And I think we have an ending for the first volume of our book, too. 'And they lived happily until the end of their days,' see Volume Two for details."

I laughed, and we laughed, and it was true. And that is the end of the story of this part of our life, of how Jeannine and I met and went to Jove, and made wonderful discoveries and had terrible adventures, and safely returned. But our happy life went on, and you may be sure that in our later life together we would often continue to reminisce about the events of our youth, long before we'd ever met, and I would tell Jeannine about many more funny and touching and dangerous and memorable times I'd had, together with Wendy, and Bob, and the local Guru.

References

[1] Implementations of hash-based containers are typically called hash_set, hash_multiset, hash_map, and hash_multimap. You can get them in the box with several popular C++ compilers, including Comeau (which uses a modified version of the SGI STL or Dinkumware standard library), Metrowerks (which has implemented its own, originally based on Modena's but since completely redesigned and rewritten), and Microsoft (which uses the Dinkumware standard library). They're also included in several popular third-party standard library implementations that you can probably use with your existing compiler, including Dinkumware, SGI STL, and STLport (which is based on the SGI STL). Some implementation details vary; consult your library's documentation.

[2] D. Musser et al. STL Tutorial and Reference Guide, Second Edition (Addison-Wesley, 2001).

Jim Hyslop is a senior software designer at Leitch Technology International Inc. He can be reached at [email protected].

Herb Sutter (<www.gotw.ca>) is secretary of the ISO/ANSI C++ standards committee, author of the acclaimed books Exceptional C++ and More Exceptional C++, and one of the instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>). In addition to his independent writing and consulting, he is also C++ community liaison for Microsoft.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.