FREE Subscription to Dr. Dobb’s Digest: Same Great Content, New Digital Edition
Site Archive (Complete)
Email
Print
Reprint

add to:
Del.icio.us
Digg
Google
Furl
Slashdot
Y! MyWeb
Blink
December 29, 2008
Is the Free Lunch Really Over? Scalability in Many-core Systems: Part 1

Michael Wrinn
With the industry shift to multiple core systems, performance has to be realized through concurrency, with applications designed to scale as the number of cores increases.

During the "Serial Coding Era", that historic period in computer evolution when hardware platforms were built around one single-core processor, the more astute developers noticed that performance improvements to their work came more or less automatically, with release of the next generation processor. Blogger Joel Spolsky put it this way:

As a programmer, thanks to plummeting memory prices, and CPU speeds doubling every year, you had a choice. You could spend six months rewriting your inner loops in Assembler, or take six months off to play drums in a rock and roll band, and in either case, your program would run faster. Assembler programmers don't have groupies.

So, we don't care about performance or optimization much anymore.

While anachronistically posted (in autumn of 2007, when the Serial Coding Era was well and truly over), the message accurately describes choices once available; performance improvements were free -- just wait a bit for new hardware.

Herb Sutter, C++ and now concurrency guru for Dr. Dobb's, famously alerted us to the end of all that in his 2005 (!) article, The Free Lunch is Over. For many sound reasons, including semiconductor physics, chip design has moved to multiple cores; single-core performance gains will be substantially less dramatic than before. There are gains to be realized, but only with effort: software must now be designed and implemented for concurrency.

As Sutter and many others point out, concurrency is hard. Few developers are yet comfortable with it, traditional programming models ignore it, and the currently favored implementation mechanism -- threads -- is fraught with peril (race conditions). Among the challenges is scalability: how does the application perform as additional cores are added? We will examine aspects of scalability in this series of papers.

Predicting Scalability

The term scalability is nicely defined by blogger Werner Vogels:

A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added.

This concept is broadly applicable. It is a key concern for data centers, for example, in predicting the impact of adding more servers, or of implementing virtualization. We focus here more narrowly, on CPU cores as the resource under consideration, and confine the discussio n to threads as the implementation model.

We are reminded (Sutter, again) that "cores" are both an execution and a memory resource, thanks to local cache memory inherent in all current designs; we will address scaling considerations for both aspects of cores, in Chapter 3.

Things look grim (Amdahl was an optimist)

Scalability discussions nearly always start with "Amdahl's law", the clever-looking equation which states that serial code is not parallel code -- or rather, you cannot speed up serial code by adding more cores. A common expression of this is:

where Speedup, as a function of the number of processing units p, depends upon the values of the serial portion of code s, and the parallelizable portion (1-s). For example, consider a code which is 90% parallelizable (and thus s = 0.1), running on 8 cores (thus p = 8); the resulting Speedup is 4.7. This represents the maximum possible speedup realizable in this case. Figure 1 shows this result graphically, along with results for codes with larger serial components. Perfect scaling, where s = 0, is represented by the blue diagonal line; we are falling rather short of perfection, in these cases.

The results in Figure 1-1 show scaling predictions for modest systems, up to 8 cores. What happens in a many core system, or with a more favorable fraction of the code parallelizable? This is shown in Figure 1-2 (from Wikipedia), and the picture is not encouraging: even a code which is 95% perfectly parallelized will not speed up more than a factor of 20, regardless of the number of cores available.

Figure 1-1: Amdahl's Law curves for 8 cores, varying degrees of parallelism.
Figure 1-2: Amdahl's Law curves for 216 cores, varying degrees of parallelism.

Beyond these standard textbook considerations, the situation may be even worse: Amdahl's idealized expression ignores performance degradations from thread creation overhead, synchronization, and communication, so a "perfect" implementation will not reach even the meager scaling shown in these graphs. What hope, then, is there for parallel applications, especially for efficient execution on large numbers of cores?

Reevaluating Amdahl's Law (beware hidden assumptions)

The key flaw in Amdahl's Law was pointed out as early as 1988, by John Gustafson: it assumes either a fixed computational workload, or a fixed scaling of serial and parallel portions as the problem size is increased.

The alternative, often referred to as Scaled Speedup, looks instead at increasing the problem size for a given amount of time, and may be expressed as follows:

That is, the speedup for a given number of processors p is that number p, plus that part of the parallel code (1-p) spent on serial operations. There is a subtle difference, compared to Amdahl's law: this describes serial work when the application is run on p processors.

It is revealing to examine the scaling numbers for specific cases. Suppose an application is run on 64 cores, with a measured elapsed time of 220 seconds, with 11 seconds determined (through profiling, for example) to be dedicated to serial execution. Inserting these values into the Scaled Speedup formula gives:

That is, speedup is nearly a factor of 61, on 64 cores. If the same value for serial time, 0.05, is used in Amdahl's Law, that formula predicts a speedup of only 15 on 64 cores.

While the Scaled Speedup model makes the same idealizations of Amdahl's Law (no overhead, etc), it exposes a dramatically more optimistic view regarding scalability.

Conclusion

To this point, we have only begun to set the stage: Amdahl's Law does not tell the full story, and parallel scalability is possible, in many cases.

With a fully scalable application, a doubling in core number would result in a near-doubling, so the investment in scalability now will pay off in "free" performance improvements as the hardware core count increases. Then, the astute developers could simply take time offknow any rock bands needing a drummer?

In the next installment, we examine factors which inhibit scalability, and how to mitigate them.

TOP 5 ARTICLES
No Top Articles.
DR. DOBB'S CAREER CENTER
Ready to take that job and shove it? open | close
Search jobs on Dr. Dobb's TechCareers
Function:

Keyword(s):

State:  
  • Post Your Resume
  • Employers Area
  • News & Features
  • Blogs & Forums
  • Career Resources

    Browse By:
    Location | Employer | City
  • Most Recent Posts:
    MEDIA CENTER  more
    NetSeminar
    Reduce Costs, Risks and Resource Constraints with Web Application Security OnDemand
    Not surprisingly, Web application security & compliance continues to be a top priority for companies who rely on their Web site to transact business. But in these challenging economic times, managing costs while reducing risk becomes an added challenge. Join us for this 1-hour webcast and let us share with you the importance of Web application security and our Security as a Service. Date: Tuesday May 26, 2009 Time: 9 am PT/12 pm ET
    Modernize your Development by Moving Build and Code Quality Upstream
    Moderated by Jon Erickson, Editor-in-Chief of Dr. Dobb's, this interactive panel discussion brings industry experts Anders Wallgren, CTO of Electric Cloud and Gwyn Fisher, CTO of Klocwork together for a candid discussion of the cost savings, productivity and quality benefits that can be achieved by stabilizing builds and code quality as early in the development cycle as possible.

    The reality of today's development environment - geographically distributed teams, the use of Agile development practices, increasing application complexity, etc. - is straining the viability of the traditional coding, build and release process. To stay ahead of the curve, development teams are modernizing their approach to dealing with these issues, and as a result are achieving new levels of development productivity. Register for the webcast.
    Date: Wednesday, July 15, 2009
    Time: 11 am PT/2 pm ET
                                   
    CONTEST

    Challenge Winners Announced
    The results are in for Dr. Dobb's Challenge Deuce, and the winners are some of the most innovative Silverlight games out there. Play the winning games.
    INFO-LINK

    Resource Links:




    Techweb
    Informationweek Business Technology Network
    InformationweekInformationweek 500Informationweek 500 ConferenceInformationweek AnalyticsInformationweek Events
    Informationweek MagazineGlobal CIOIWK Government ITbMightyByte and SwitchDark Reading
    Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingPlug Into The CloudDr. DobbsContentinople
    space
    TechWeb Events Network
    InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0Mobile Business ExpoNoJitter
    Black HatGTECEnergy CampCloud ConnectGov 2.0 ExpoGov 2.0 Summit
    space
    Light Reading Communications Network
    Light ReadingLight Reading AsiaUnstrungCable Digital NewsInternet EvolutionPyramid Research
    Heavy ReadingLight Reading LiveLight Reading InsiderEthrnet ExpoTelco TVTower Technology Summit
    space
    Financial Technology Network
    Advanced TradingBank Systems and TechnologyInsurance and TechnologyWall Street and TechnologyAccelerating WallstreetBST SummitBuyside Trading SummitIT Summit
    space
    Microsoft Technology Network
    MSDNTechNetTotal IT ProTotal Dev ProNET Total Dev Pro CommunitySQL Total Dev Pro Community
    space