What This Means for Software: The Next Revolution
In the 1990s, we learned to grok objects. The revolution in mainstream software development from structured programming to object-oriented programming was the greatest such change in the past 20 years, and arguably in the past 30 years. There have been other changes, including the most recent (and genuinely interesting) nascence of web services, but nothing that most of us have seen during our careers has been as fundamental and as far reaching a change in the way we write software as the object revolution.
Until now. Starting today, the performance lunch isn't free any more. Sure, there will continue to be generally applicable performance gains that everyone can pick up, thanks mainly to cache size improvements. But if you want your application to benefit from the continued exponential throughput advances in new processors, it will need to be a well-written, concurrent (usually multithreaded) application. And that's easier said than done, because not all problems are inherently parallelizable and because concurrent programming is hard.
I can hear the howls of protest: "Concurrency? That's not news! People are already writing concurrent applications." That's true. Of a small fraction of developers.
Remember that people have been doing object-oriented programming since at least the days of Simula in the late 1960s. But OOP didn't become a revolution, and dominant in the mainstream, until the 1990s. Why then? The reason the revolution happened was primarily because our industry was driven by requirements to write larger and larger systems that solved larger and larger problems and exploited the greater and greater CPU and storage resources that were becoming available. OOP's strengths in abstraction and dependency management made it a necessity for achieving large-scale software development that is economical, reliable, and repeatable.
Similarly, we've been doing concurrent programming since those same dark ages, writing coroutines and monitors and similar jazzy stuff. And for the past decade or so, we've witnessed incrementally more and more programmers writing concurrent (multithreaded, multiprocess) systems. But an actual revolution marked by a major turning point toward concurrency has been slow to materialize. Today, the vast majority of applications are single-threaded, and for good reason.
By the way, on the matter of hype: People have always been quick to announce "the next software development revolution," usually about their own brand-new technology. Don't believe it. New technologies are often genuinely interesting and sometimes beneficial, but the biggest revolutions in the way we write software generally come from technologies that have already been around for some years and have already experienced gradual growth before they transition to explosive growth. This is necessary: You can only base a software development revolution on a technology that's mature enough to build on (including having solid vendor and tool support), and it generally takes any new software technology at least seven years before it's solid enough to be broadly usable without performance cliffs and other gotchas. As a result, true software development revolutions like OOP happen around technologies that have already been undergoing refinement for years, often decades. Even in Hollywood, most genuine "overnight successes" have really been performing for many years before their big break.
Concurrency is the next major revolution in how we write software. Different experts still have different opinions on whether it will be bigger than OO, but that kind of conversation is best left to pundits. For technologists, the interesting thing is that concurrency is of the same order as OOP both in the (expected) scale of the revolution and in the complexity and learning curve of the technology.
Benefits and Costs of Concurrency
There are two major reasons for which concurrency, especially multithreading, is already used in mainstream software. The first is to logically separate naturally independent control flows; for example, in a database replication server I designed, it was natural to put each replication session on its own thread because each session worked completely independently of any others that might be active (as long as they weren't working on the same database row). The second and less common reason to write concurrent code in the past has been for performance, either to scalably take advantage of multiple physical CPUs or to easily take advantage of latency in other parts of the application; in my database replication server, this factor applied as well, and the separate threads were able to scale well on multiple CPUs as our server handled more and more concurrent replication sessions with many other servers.
There are, however, real costs to concurrency.
Some of the obvious costs are actually relatively unimportant. For example, yes, locks can be expensive to acquire, but when used judiciously and properly, you gain much more from the concurrent execution than you lose on the synchronization, if you can find a sensible way to parallelize the operation and minimize or eliminate shared state.
Perhaps the second-greatest cost of concurrency is that not all applications are amenable to parallelization.
Probably the greatest cost of concurrency is that concurrency really is hard: The programming model, meaning the model in programmers' heads that they need to reason reliably about their program, is much harder than it is for sequential control flow.
Everybody who learns concurrency thinks he understands it, but ends up finding mysterious races he thought weren't possible and discovers that he didn't actually understand it after all. As developers learn to reason about concurrency, they find that usually those races can be caught by reasonable in-house testing, and they reach a new plateau of knowledge and comfort. What usually doesn't get caught in testing, however, except in shops that understand why and how to do real stress testing, are those latent concurrency bugs that surface only on true multiprocessor systems, where the threads aren't just being switched around on a single processor but really do execute truly simultaneously and thus expose new classes of errors. This is the next jolt for developers who thought that, surely now, they know how to write concurrent code: I've come across many teams whose application worked fine even under heavy and extended stress testing, and ran perfectly at many customer sites, until the day that a customer actually had a real multiprocessor machineand then deeply mysterious races and corruptions started to manifest intermittently. In the context of today's CPU landscape, then, redesigning your application to run multithreaded on a multicore machine is a little like learning to swim by jumping into the deep endgoing straight to the least forgiving, truly parallel environment that is most likely to expose the things you got wrong. Even when you have a team that can reliably write safe concurrent code, there are other pitfalls; for example, concurrent code that is completely safe but isn't any faster than it was on a single-core machine, typically because the threads aren't independent enough and share a dependency on a single resource that reserializes the program's execution. This stuff gets pretty subtle.
Just as it is a leap for a structured programmer to learn OOP ("what's an object?" "what's a virtual function?" "how should I use inheritance?" and beyond the "whats" and "hows," "why are the correct design practices actually correct?"), it's a leap of about the same magnitude for a sequential programmer to learn concurrency ("what's a race?" "what's a deadlock?" "how can it come up, and how do I avoid it?" "what constructs actually serialize the program that I thought was parallel?" and beyond the "whats" and "hows," "why are the correct design practices actually correct?"). The vast majority of programmers aren't there today, just as the vast majority of programmers 15 years ago didn't yet grok objects. But the concurrent programming model is learnable, particularly if we stick to lock-based programming, and once grokked, it isn't that much harder than OOP and hopefully can become just as natural. Just be ready and allow for the investment in training and time, for you and for your team.
(I deliberately limit the aforementioned discussion to lock-based concurrent programming models. There is also lock-free programming, supported most directly at the language level in Java 5 and in at least one popular C++ compiler. But concurrent lock-free programming is known to be very much harder for programmers to understand and reason about than even concurrent lock-based programming. Most of the time, only systems and library writers should have to understand lock-free programming, although virtually everybody should be able to take advantage of the lock-free systems and libraries those people produce.)
What it Means for Us
Okay, back to what it means for us.
- The clear primary consequence we've already covered is that applications will increasingly need to be concurrent if they want to fully exploit CPU throughput gains that have now started becoming available and will continue to materialize over the next several years. "Oh, performance doesn't matter so much, computers just keep getting faster" has always been a naïve statement to be viewed with suspicion, and for the near future, it will almost always be simply wrong.
- Now, not all applications (or more precisely, important operations of an application) are amenable to parallelization. Some problems, such as compilation, are almost ideally parallelizable. Others aren't; the usual counterexample here is that just because it takes one woman nine months to produce a baby doesn't imply that nine women could produce one baby in one month. You've probably come across that analogy before. But did you notice the problem with leaving the analogy at that? Here's the trick question to ask the next person who uses it on you: Can you conclude from this that the Human Baby Problem is inherently not amenable to parallelization? Usually, people relating this analogy err in quickly concluding that it demonstrates an inherently nonparallel problem, but that's actually not necessarily correct at all. It is indeed an inherently nonparallel problem if the goal is to produce one child. It is actually an ideally parallelizable problem if the goal is to produce many children! Knowing the real goals can make all the difference. This basic goal-oriented principle is something to keep in mind when considering whether and how to parallelize your software.
- Perhaps a less obvious consequence is that applications are likely to become increasingly CPU-bound. Of course, not every application operation will be CPU-bound, and even those that will be affected won't become CPU-bound overnight if they aren't already, but we seem to have reached the end of the "applications are increasingly I/O-bound or network-bound or database-bound" trend, because performance in those areas is still improving rapidly (gigabit Wi-Fi, anyone?) while traditional CPU performance-enhancing techniques have maxed out. Consider: We're stopping in the 3GHz range for now. Therefore, single-threaded programs are likely not going to get much faster any more for now except for benefits from further cache size growth (which is the main good news). Other gains are likely to be incremental and much smaller than we've been used to seeing in the past; for example, as chip designers find new ways to keep pipelines full and avoid stalls, which are areas where the low-hanging fruit has already been harvested. The demand for new application features is unlikely to abate, and even more so the demand to handle vastly growing quantities of application data is unlikely to stop accelerating. As we continue to demand that programs do more, they will increasingly often find that they run out of CPU to do it unless they can code for concurrency.
There are two ways to deal with this sea change toward concurrency. One is to redesign your applications for concurrency. The other is frugality, or writing code that is more efficient and less wasteful. This leads to the third interesting consequence:
- Efficiency and performance optimization will get morenot lessimportant. Those languages that already lend themselves to heavy optimization will find new life; those that don't will need to find ways to compete and become more efficient and optimizable. Expect long-term increased demand for performance-oriented languages and systems.
- Finally, programming languages and systems will increasingly be forced to deal well with concurrency. Java has included support for concurrency since its beginning, although mistakes were made that later had to be corrected over several releases to do concurrent programming more correctly and efficiently. C++ has long been used to write heavy-duty multithreaded systems well, but it has no standardized support for concurrency at all (the ISO C++ standard doesn't even mention threads, and does so intentionally), and so typically, the concurrency is of necessity accomplished by using nonportable platform-specific concurrency features and libraries. (It's also often incomplete; for example, static variables must be initialized only once, which typically requires that the compiler wrap them with a lock, but many C++ implementations do not generate the lock.) Finally, there are a few concurrency standards, including pthreads and OpenMP, and some of these support implicit as well as explicit parallelization. Having the compiler look at your single-threaded program and automatically figure out how to parallelize it implicitly is fine and dandy, but those automatic transformation tools are limited and don't yield nearly the gains of explicit concurrency control that you code yourself.
If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency. Now is also the time for you and your team to grok concurrent programming's requirements, pitfalls, styles, and idioms.
A few rare classes of applications are naturally parallelizable, but most aren't. Even when you know exactly where you're CPU-bound, you may well find it difficult to figure out how to parallelize those operations; all the more reason to start thinking about it now. Implicitly parallelizing compilers can help a little, but don't expect much; they can't do nearly as good a job of parallelizing your sequential program as you could do by turning it into an explicitly parallel and threaded version.
Thanks to continued cache growth and probably a few more incremental straight-line control flow optimizations, the free lunch will continue a little while longer; but starting today, the buffet will only be serving that one entrée and that one dessert. The filet mignon of throughput gains is still on the menu, but now it costs extra-extra development effort, extra code complexity, and extra testing effort. The good news is that for many classes of applications the extra effort will be worthwhile because it will let them fully exploit the continuing exponential gains in processor throughput.