
Books in Brief

December 2000



Transaction Processing: Concepts and Techniques
Jim Gray and Andreas Reuter
1,070 pages
Morgan Kaufmann, 1992
$89.95
ISBN 1558601902
www.mkp.com

This is a book of both conceptual and detailed information about building transaction processing systems. The chapters are grouped into parts that include: The Basics of Transaction Processing; The Basics of Fault Tolerance; Transaction-Oriented Computing; Concurrency Control; Recovery; Transactional File System: A Sample Resource Manager; and System Surveys. Chapters include: Introduction; Basic Computer Science Terminology; Fault Tolerance; Transaction Models; Transaction Processing Monitors — An Overview; Transaction Processing Monitors; Isolation Concepts; Lock Implementation; Log Manager; Transaction Manager Concepts; Transaction Manager Structure; Advanced Transaction Manager Topics; File and Buffer Management; The Tuple-Oriented File System; Access Paths; and Survey of TP Systems.

There are a number of good reasons this book shouldn’t show up in this column. It will soon be ten years old, it’s not Windows-specific, and it costs close to one hundred dollars (which kept me from buying it for several years). The compelling reason to include it is that many WDJ readers will sooner or later be involved in the design of a website that needs to not lose data. The authors claim the book’s purpose is to “give you an understanding of how large, distributed, heterogeneous computer systems can be made to work reliably.” That’s exactly the understanding that anyone setting out to design a website needs most. Clearly, anyone can easily put together a website that serves up static pages to a few users and works most of the time. However, a quick surf around the Web shows that very few people can put up a website that handles complex and dynamic customer interactions, scales up to handle large numbers of clients, and stays operational 99.99 percent of the time. Without ever focusing on the Web, per se, this book covers topics that anyone designing a scalable, reliable website needs to understand.

Transaction processing is a pretty vague term, but what this book is really about is designing systems that tolerate failure. It’s tempting to think that by selecting a database product from a major vendor that offers transaction protection, you’ve automatically got a system that won’t fail. After all, your database vendor’s transaction protection software guarantees you that any given transaction will either completely finish, or be completely rolled back — even if a hard disk crashes or a machine dies in the middle of a transaction. It’s a seductive logic: why should I learn all about transaction processing when I can buy off-the-shelf software to handle those kinds of problems for me? Unfortunately, many of the websites running today that are screwing up data and crashing are using transaction protection systems that are working just fine.

For example, suppose I’m using your website to order a product. I type in my credit card and shipping information and press Enter. You’re using an Oracle or SQL Server back end with transaction protection, so if there’s a crash any time during my transaction, you know that you’re covered, right? Not so fast. If you’re doing real-time credit card verification, you’ll likely be talking to a third-party credit card processor at some point during my transaction. Suppose that the system crashes right after you ding my credit card for $50, but before you complete the transaction. When you bring the system back up, your fancy database system will detect that the transaction did not complete and will undo any changes it had made, leaving you with a clean database — and leaving me with a $50 bill (that resides on the credit card processor’s computer, which your database software knows nothing about) for a product I never got! The point here is that the off-the-shelf transaction processing services you can buy do not relieve you from having to think about and understand transaction processing issues if you want to build a truly reliable system. Even more important than building a reliable system, however, is knowing exactly what kinds of problems your system can’t tolerate and just how much damage can happen when one of those problems arises.
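The failure mode described above can be sketched in a few lines of Python. Everything here is invented for illustration (the `CardProcessor` and `OrderDB` classes are toys, not any vendor's API); the point is only that a local database's rollback cannot reach a side effect that lives on someone else's machine.

```python
# Hypothetical sketch: a local database transaction cannot undo a side
# effect on a third party's system. All names here are invented.

class CardProcessor:
    """Stands in for the third-party credit card processor."""
    def __init__(self):
        self.charges = []          # this record lives on *their* machine

    def charge(self, card, amount):
        self.charges.append((card, amount))

class OrderDB:
    """A toy database with all-or-nothing transactions."""
    def __init__(self):
        self.orders = []
        self._pending = None

    def begin(self):     self._pending = []
    def insert(self, o): self._pending.append(o)
    def commit(self):    self.orders += self._pending; self._pending = None
    def rollback(self):  self._pending = None   # crash recovery does this

processor, db = CardProcessor(), OrderDB()

db.begin()
db.insert({"item": "widget", "card": "4111...", "amount": 50})
processor.charge("4111...", 50)    # external side effect happens here
# --- crash before db.commit() ---
db.rollback()                      # recovery leaves a "clean" database

assert db.orders == []                            # store has no order...
assert processor.charges == [("4111...", 50)]     # ...but the card was billed
```

Real systems close this gap with techniques the book covers, such as logging an intent record before the external call and reconciling unresolved intents on recovery.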

Most Windows programmers tend to think about reliability in terms of bugs; the more bugs I find and eliminate, the more reliable this program is. But this book shows that people who design highly reliable systems take a broader view — they assume that there will be bugs and try to build systems that keep working despite them! Here’s an excerpt from the chapter on fault tolerance that is both educational and entertaining:

When a production computer system crashes due to software, computer users restart the system and expect it to work the next time; after all, they reason, it worked yesterday. By using transactions, a recent consistent system state is restored so that service can continue. The theory is that it was a Heisenbug that crashed the system. A Heisenbug is a transient software error (a soft software error) that only appears occasionally and is related to timing or overload. Heisenbugs are contrasted to Bohrbugs which, like the Bohr atom, are good, solid things with deterministic behavior.

Although this is preposterous, the test of a theory is whether it explains the facts — and the Heisenbug theory does explain many observations. For example, a careful study by Adams of all software faults of large IBM systems over a five-year period showed that most bugs were Heisenbugs. Adams also concluded from this study that customers should not rush to install bug fixes for benign bugs, as the expense and risk are unjustified.

If you think that designing software to survive unknown bugs is just an academic exercise, consider the Apache web server. This is likely the most widely used web server software in the world and, despite the fact that many people customize it by adding their own code to it, it is also extremely reliable. The source of this reliability is largely due to a central framework that isolates the work to be done into separate processes and kills off and restarts any process that appears to have problems. In fact, Apache kills off and restarts processes periodically even if they don’t exhibit problems — which deals with creeping bugs, such as memory leaks. I’ve written code for Apache that had flat-out, scribble-on-memory bugs in it, and watched the server just keep plugging away, while it neatly killed and restarted any processes that invoked my bugs. This is a different way of thinking about reliability and one that you need if you want to design a highly reliable system.
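The kill-and-restart policy described above can be sketched as a toy supervisor; this is not Apache's actual design or code, just an invented single-worker model of the idea: a crash takes down only one worker, and even healthy workers are recycled after a fixed number of requests.

```python
# Toy illustration (not Apache's code) of the supervision idea: crashed
# workers are replaced, and healthy workers are recycled periodically to
# flush creeping problems such as memory leaks.

class Worker:
    def __init__(self, wid):
        self.wid, self.served, self.alive = wid, 0, True

    def handle(self, request):
        if request.get("buggy"):        # a scribble-on-memory style bug...
            self.alive = False          # ...takes down only this worker
            raise RuntimeError(f"worker {self.wid} crashed")
        self.served += 1
        return f"ok:{request['path']}"

class Supervisor:
    MAX_REQUESTS = 100                  # recycle even healthy workers

    def __init__(self):
        self.next_id = 0
        self.worker = self._spawn()

    def _spawn(self):
        self.next_id += 1
        return Worker(self.next_id)

    def serve(self, request):
        if (not self.worker.alive) or self.worker.served >= self.MAX_REQUESTS:
            self.worker = self._spawn() # routine recycling
        try:
            return self.worker.handle(request)
        except RuntimeError:
            self.worker = self._spawn() # crash contained; service continues
            return "error: handled by restarting worker"

server = Supervisor()
assert server.serve({"path": "/index"}) == "ok:/index"
server.serve({"path": "/bad", "buggy": True})   # crashes one worker
assert server.serve({"path": "/index"}) == "ok:/index"  # still up
```

The real Apache prefork model applies the same two rules across a pool of processes rather than a single worker.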

From fault tolerance, to transactions, to messaging, to resource management, to databases, the authors have to cover a lot of ground in order to get their arms around the problem of building reliable systems. I usually dislike books that skip around to topics already covered by other books I own. In this case, however, the authors are hitting these topics from the unique perspective of reliability. For example, do I really need another book that tells me how to construct a B-Tree? Surprisingly, the answer is “yes,” because none of my other books discuss how to implement a B-Tree that supports transactions that can be rolled back later. In reading this book, I continually discovered either unfamiliar topics, or familiar topics made novel because of the perspective of reliability.
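To give a flavor of what "a structure that supports rollback" means, here is a minimal sketch of the undo-logging idea, applied to a plain dictionary rather than a B-Tree. The `TransactionalDict` class is invented here, not taken from the book, but the rule it demonstrates is the one the book develops at length: record the old value before writing the new one, and roll back by replaying the log in reverse.

```python
# Minimal sketch of undo logging: before changing a value, remember the
# old one; rollback replays the log in reverse order.

class TransactionalDict:
    _MISSING = object()                 # marks "key did not exist before"

    def __init__(self):
        self.data = {}
        self.undo_log = []              # (key, old_value) pairs

    def begin(self):
        self.undo_log.clear()

    def put(self, key, value):
        old = self.data.get(key, self._MISSING)
        self.undo_log.append((key, old))   # log BEFORE writing
        self.data[key] = value

    def commit(self):
        self.undo_log.clear()           # changes are now permanent

    def rollback(self):
        for key, old in reversed(self.undo_log):
            if old is self._MISSING:
                del self.data[key]      # key was newly created: remove it
            else:
                self.data[key] = old    # key existed: restore old value
        self.undo_log.clear()

db = TransactionalDict()
db.begin(); db.put("balance", 100); db.commit()
db.begin(); db.put("balance", 50); db.put("fee", 5)
db.rollback()                           # undo the half-finished transaction
assert db.data == {"balance": 100}
```

Doing the same thing inside a B-Tree is harder (page splits and merges must be undoable too), which is exactly why the book's treatment is worth reading.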

Despite the book’s size, you can profitably skim or dive into sections of particular interest to you. The book is encyclopedic in scope and goes from high-level conceptual information all the way down to particular algorithms and even modest amounts of code. The authors combine a knowledge of academic research with a relentlessly practical point of view — they constantly recommend the approaches and techniques which have proven more successful in the real world, while warning against those which have proven difficult to “get right.” Although not tied to this book in any way, there happens to be an open-source database system at www.sleepycat.com (The Berkeley Database) that demonstrates several concepts described here, in case you want to see the complete code for a real-life transaction protection library.

This book costs at least twice what a typical programming book does, but offers much more than twice the quality of such books. It belongs on the shelf of any software company’s library, and on the desk of anyone who is learning to design large, reliable information systems.

Extreme Programming Explained: Embrace Change
Kent Beck
224 pages
Addison-Wesley, 1999
$29.95
ISBN 0201616416
www.aw.com

The programming world has been between programming methodology fads for a while now. The last really big one was object-oriented programming, which was so huge that it collected multiple methodologies under its banner, and lasted so long that the recommendations of current practitioners barely resemble those made during its inception. This is a book about “Extreme Programming,” aka XP, a software development methodology that may be on its way to being the first big programming methodology fad of the new millennium. This is the first of a projected series of books about XP.

This book sets out to describe XP and its motivations well enough for you to decide whether or not your team should try extreme programming on a project. Chapters in the book include: Risk: The Basic Problem; A Development Episode; Four Variables; Cost of Change; Learning to Drive; Four Values; Basic Principles; Back to Basics; Management Strategy; Facilities Strategy; Splitting Business and Technical Responsibility; Planning Strategy; Development Strategy; Design Strategy; Testing Strategy; Adopting XP; Retrofitting XP; Lifecycle of an Ideal XP project; Roles for People; 20-80 Rule; What Makes XP Hard; When You Shouldn’t Try XP; and XP at Work. A related website is available at www.xprogramming.com.

After an overview, the author begins by modeling software development with four variables: cost, time, quality, and scope. XP declares that of these four, scope is the variable that you should seek to vary to satisfy the other three. The argument is made that, because business requirements always change over time, postponing implementing lesser features can represent a cost advantage. (You’re less likely to spend money developing a feature that later gets thrown away.)

The author next argues for creating only the very simplest design that will satisfy all of today’s requirements. If that sounds like he is arguing against generality, that’s exactly right. In fact, one of the anecdotal stories in the book is about discovering that a certain feature was not currently being used anywhere in the system and then stopping to take the time to strip that extra generality out! In the author’s words:

XP is making a bet. It is betting that it is better to do a simple thing today and pay a little more tomorrow to change it if it needs it, than to do a more complicated thing today that may never be used anyway.

To counter classic studies showing that the cost of adding features rises exponentially during a system’s lifetime, the author argues that this will not be the case with XP. He feels that the elements of simplistic design, thorough testing, and an attitude of constant design refinement result in a system where the costs of making changes trend toward some affordable constant over time, rather than becoming exorbitant.

XP is described as being rooted in values and principles. The four values are communication, simplicity, feedback (constant feedback from tests, other programmers, and customers), and courage. The principles are: rapid feedback (you learn faster), assume simplicity (learning to stop anticipating the future), incremental change (always get from point A to point B in small steps), embracing change (pick the path that solves today’s problem, postpone anything else), and quality work (never sacrifice quality).

Eventually, the book gets into exactly what it is that XP programmers do. It starts with planning, where the customer has to participate in prioritizing features and planning the product as a series of incremental releases, rather than one big bang. The customer also has to participate in refining and changing the project plan over time, since development will start long before the plan is highly detailed. Programmers help the customer prioritize by providing time estimates for specific features.

Actual coding begins right away, some of it purely exploratory to answer questions, some of it to begin constructing the minimal possible system that does anything useful. Programmers code in pairs and are expected to create tests for code before creating the code itself. (Testing is of primary importance in XP.) Coding is then the process of satisfying the test. Periodically, programmers detect that there is a simpler design than what has evolved via incremental coding, and they “refactor” — changing existing code to make the overall design simpler or cleaner. Instead of specializing or owning particular parts of a project, all the programmers are expected to be involved in all aspects of the project: planning, writing tests, coding, refactoring, and so on. The code is supposed to be integrated and tested as a whole with high frequency (preferably every few hours). A customer is expected to sit with the team. Overtime is expected to be the exception. As the author says, it’s not that these are brand new practices; it’s just a matter of taking certain well-known practices to an extreme.

In addition to explanations of all these practices, the book contains much detail about selling this strategy of development to programmers, customers, and management. One of the more surprising parts of the book is a chapter describing situations in which the author expects that XP won’t work or isn’t appropriate. Too often, authors selling a particular technique seem unaware that the world of software and programming problems is much larger than any one person’s personal experience.

In the preface, the author applies the adjective “scientific” to XP. Whenever I hear that word, I immediately look around for double-blind experiments that include a control group. However, if there is any scientific evidence whatsoever that XP produces better results than any other methodology, I did not find it in this book. Of course, that is no black mark against XP in particular — it turns out that there is very little real scientific evidence to support using any of the software methodologies of the past half century. Truly, everybody wants to get to heaven (have a software methodology), but nobody wants to die (perform controlled scientific studies). If you believe in the theory of memes, then you can see that there’s no incentive for software methodologies to encourage scientific study. Enthusiastically supportive anecdotal evidence is easy to obtain and tout, while controlled studies are expensive and most likely to produce ambiguous or only modestly encouraging results.

I’m a reformed True Believer in methodologies and currently view them as largely benign, no matter what form they take. As an analogy, I do not believe that palm reading works, but I do believe that some people who believe that they’re using a documented science of palmistry can do an amazing job of perceptively answering questions. Would it be outlandish to suggest that some people are very good at helping teams come together and produce better than average results, results that are not scientifically reproducible simply because the practitioner doesn’t accurately understand what is causing them? I also have a healthy respect for the Hawthorne effect, which means that the mere fact that someone is paying attention to the output of a group of workers can cause their productivity to improve — no matter what “methodology” (e.g., let’s see what happens if we paint all the walls pink this month) is used.

As software methodologies go, XP is somewhat refreshing. It is not obsessed with minutia, such as what geometric shape must represent which programming concept. It’s much more concerned with how programmers get their work done than with the quixotic search for a mechanical process that guarantees good software design. XP seems almost certainly benign in many ways — it’s hard to imagine a software project that would not benefit from having testing elevated to a primary project activity, or one that would suffer from increased communication between team members, or one that would not go more smoothly if the customer were more intimately involved. In the areas where XP may be malignant (e.g., decreasing documentation, software reuse, and generality of design), it may produce results no worse than approaches a given team is currently using on a given project. Also, XP gives everyone on your team a reason to stop feeling guilty about not actually understanding and using any of those more academic software methodologies, a dichotomy that Robert L. Glass has written eloquently about for years. Anything a team of programmers can agree on and feel good about is almost certainly beneficial for the project they’re working on.

But there’s something else that I think might be XP’s really important contribution to programming. If you filter out the people who have merely substituted XP for their previous religion and are mindlessly parroting its tenets and denigrating all expressions of disagreement, you’ll find there’s a group of people talking about programming. They’re talking about projects they’ve worked on, what seemed to work, what didn’t seem to work, and why they think that was. This ongoing, open-minded discussion about how programmers are really writing software and the results they’re getting is invaluable for the software community.

Code
Charles Petzold
393 pages
Microsoft Press, 1999
$22.39
ISBN 073560505X
mspress.microsoft.com/

Petzold is the author of probably the best general Windows programming book around, but this is a book about how computers work, starting from the very humblest beginnings, with switches. Chapters include: Codes and Combinations; Braille and Binary Codes; Anatomy of a Flashlight; Seeing Around Corners; Telegraphs and Relays; Our Ten Digits; Alternatives to Ten; Bit by Bit by Bit; Logic and Switches; Gates (Not Bill); A Binary Adding Machine; But What About Subtraction?; Feedback and Flip-Flops; Bytes and Hex; An Assemblage of Memory; Automation; From Abaci to Chips; Two Classic Microprocessors; ASCII and a Cast of Characters; Get on the Bus; The Operating System; Fixed Point, Floating Point; Languages High and Low; and The Graphical Revolution.

The book begins with a simple battery/switch/bulb circuit used to transmit Morse code. Combining historical and technical descriptions, the text brings the reader through increasing complexity all the way up to full-fledged computers (which are, after all, huge conglomerations of switches). The descriptions are truly excellent and highly approachable, and the historical progression provides a wonderfully detailed refutation of the growing percentage of the public that apparently believes that computers were handed to us by aliens, because we couldn’t possibly be smart enough to have invented them ourselves.
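In the spirit of Petzold's "Bit by Bit by Bit" and "A Binary Adding Machine" chapters, the same construction can be sketched in code (this sketch is mine, not the book's): everything below is built from a single primitive, the NAND gate, much as the book builds everything from relays and switches.

```python
# Building an adder from one primitive gate, bottom-up.

def NAND(a, b): return 1 - (a & b)

def NOT(a):     return NAND(a, a)
def AND(a, b):  return NOT(NAND(a, b))
def OR(a, b):   return NAND(NOT(a), NOT(b))
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))

def full_adder(a, b, carry_in):
    """Adds three one-bit inputs; returns (sum_bit, carry_out)."""
    s1 = XOR(a, b)
    sum_bit = XOR(s1, carry_in)
    carry_out = OR(AND(a, b), AND(s1, carry_in))
    return sum_bit, carry_out

def add_bits(x_bits, y_bits):
    """Ripple-carry adder: chains full adders, least significant bit first."""
    carry, result = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result + [carry]

# 3 + 5 = 8, with bits written least-significant first
assert add_bits([1, 1, 0], [1, 0, 1]) == [0, 0, 0, 1]
```

Chain enough of these and you have the arithmetic core of a computer, which is roughly the journey the book's middle chapters take.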

Of course, technical people will not find the book flawless. For example, my EE degree rolled over in its grave when I read this description of an NPN transistor:

A small voltage on the base can control a much larger voltage passing from the collector to the emitter.

Transistors are current devices, not voltage devices, and voltages do not “pass” from one point to another. But the nits are greatly outweighed by the fact that the book really can bring someone from knowing nothing about computers to the point of having a surprisingly solid and detailed understanding of how they work.

The book is marketed for the general public, which I suspect is not reading it in droves. It’s not that the book is too difficult for anyone to follow, it’s that the general public simply does not have the motivation to learn all these details. Why should they? There’s just no need to know how a computer actually works in order to be good at using the software that runs on it. But I do think there is a perfect audience out there for this book. When primitive computers first started appearing in the electronic hobbyist magazines I read as a child, I eagerly pored over every description. Unfortunately, all the descriptions were similar and bad. (Memory was always described as a wall of post office boxes — I could never see why anyone wanted to store numbers in a bunch of boxes.) I would have been overjoyed to have had this book to read back then. This is an ideal Christmas gift for that teenager (or precocious pre-teen) you know who’s expressing an interest in programming, as opposed to just using, computers.

Got an opinion about these or other programming books? Send them to [email protected]. To submit books for review, see the guidelines at www.wdj.com/vendor/.

