FREE Subscription to Dr. Dobb’s Digest: Same Great Content, New Digital Edition
Site Archive (Complete)
High Performance Computing
Email
Print
Reprint

add to:
Del.icio.us
Digg
Google
Furl
Slashdot
Y! MyWeb
Blink
August 03, 2007
How Much Scalability Do You Have or Need?

"Orders" of Throughput

(Page 1 of 5)
Herb Sutter
How many cores (or hardware threads) can your code harness to get its answers faster?
Herb is a software architect at Microsoft and chair of the ISO C++ Standards committee. He can be contacted at www.gotw.ca.


In your application, how many independent pieces of work are ready to run at any given time? Put another way, how many cores (or hardware threads, or nodes) can your code harness to get its answers faster? And when should the answers to these questions not be "as many as possible"?

Table 1 summarizes three "orders" of throughput available in a given application or operation, using big-Oh notation to describe the amount of CPU-intensive work that will typically be ready to be actively executed at a given time. The rest of this article overviews these orders of scalability and summarizes when each one is appropriate.

Last month [2], we looked at three "pillars" of concurrency: Pillar 1, isolation by structuring work asynchronously and communicating through messages; Pillar 2, scalable throughput by exploiting parallelism in algorithms and data structures; and Pillar 3, dealing with mutable shared state. This month's topic delves into throughput-oriented techniques using techniques in Pillar 1 and Pillar 2.

Order O(1): Single-Core O(K): Fixed O(N): Scalable
Tagline One thing at a time Explicit threading Re-enable the free lunch
Summary Sequential applications, and bottlenecked parallel applications Explicitly express how much work can be done in parallel Express lots of latent concurrency in a way that can be efficiently mapped to N cores
Examples Multithreaded code convoyed on a global lock or message queue, occasional or intermittent background work Pipelining, hardwired division of tasks, regular or continuous background computation Tree traversal, quicksort, compilation
Applicability Single-core hardware, single-threaded OS, or nonCPU-bound app Hardware with fixed concurrency, or app whose CPU-bound parts have limited scalability Hardware with variable (esp. growing) concurrency, and app with CPU-bound parts that are scalably parallelizable
Examples Code targeting legacy hardware, small embedded systems, single-core game consoles; simple text processor Game targeting one multicore game console generation; code whose key operations are order-sensitive (e.g., can be pipelined but not fully parallelized) Mainstream desktop or server software with CPU-bound features and targeting commodity hardware or future upgradeable game consoles
Pillar (see Notes [2]), and today's mainstream tools Pillar 1: Threads, message queues, futures Pillar 2: Thread pools, futures, OpenMP

Table 1: Orders of throughput.

1 How Much Scalability? | 2 O(1): Sequential Code | 3 O(K): Explicitly Threaded Code | 4 Comparing O(1) and O(K) | 5 O(N): Scalable Throughput And the Free Lunch Next Page
RELATED ARTICLES
No Related Articles
TOP 5 ARTICLES
No Top Articles.



MICROSITES
FEATURED TOPIC

ADDITIONAL TOPICS

INFO-LINK