March 12, 2008
Super Linearity and the Bigger MachineSometimes you can have your scalability and boost it, tooHerb Sutter
Herb considers how to set superlinear speedups by harnessing more resources.
There are two main ways to achieve superlinear scalability, or to use P processors to compute an answer more than P times faster (see Figure 1):
Last month [1], we focused on the first point by illustrating parallel search and how it naturally achieves superlinear speedups when matches are not distributed evenly because some workers get "rich" subranges and will find a match faster, which benefits the whole search because we can stop as soon as any worker finds a match.
This month, we'll conclude examining the first point with a few more examples, and then consider how to achieve superlinear speedups by harnessing more resourcesquite literally, running on a bigger machine without any change in the hardware.
[Click image to view at full size]
Figure 1: Parallel scalability.
What Have We Learned?
As we saw last month, a parallel search often doesn't need to do anything special to exploit a nonuniform distribution of matches to find a match that much faster; a simple search algorithm will do, such as a linear search for arrays or a depth-first search for trees. Parallel search naturally exploits nonuniform distributions of matches:
Sequential search, on the other hand, cannot exploit nonuniformity at all without advance information about where the high-probability region are. Having the sequential algorithm visit nodes in a different or randomized order doesn't help by itself without additional information about where to start [2]. Even armed with a map of the high-probability match areas, the sequential algorithm would need to be made more complex to exploit the nonuniformity.
Finally, getting superlinear parallel algorithm performance typically relies on the ability to interrupt or cancel work. There are two aspects to this, one must-have and one good-to-have:
Other Sources of Superlinearity
Another way to achieve superlinear scalability is to exploit partial information. In some parallel algorithms, one worker can discover partial information that it can communicate to other workers to let those others do their work faster than they could do otherwise. For example, in a tree search, one worker might discover that "there are never matches under a node whose weight is over 100." Or, in computing solutions to a set of equations, one worker might discover that in any solution the value of variable x has to be between y-1 and y+4. Other workers can then use such partial information to run faster by excluding entire sections of their search spacesubtrees under heavy nodes, or regions, where x and y are too far apartwithout having to traverse those areas at all to inspect their contents for possible matches.
Still another potential path to superlinearity is to perform speculative execution. Let's say that you have two alternative algorithms available to compute a result, such as two search algorithms where one has better average-case performance and the other has better worst-case performance. Now, most of the time you're better off just picking one of the algorithms and using available hardware to run it in parallel. But sometimes you can do better by running both of them, and seeing which gets the answer first. The cost is that you're doing more work, but you can get the answer faster (including avoiding the bad worst case of one algorithm). Sometimes the extra work can be relatively "free," at least from your selfish point of view; for example, when the two operations are two database or web service queries against different servers, and one server might get the answer in a different way than the other because the servers may be experiencing different loads or may use different methods to compute the answer.
The bad news about all of the effects we've covered so far is that none of them are going to help you go superlinear with algorithms that have to visit every element in a range anyway, such as when you're calculating the sum of all nodes. The good news, however, is that there is another effect that can help: It's when using more threads lets you use a bigger machine, literally.
|
|
||||||||||||||||||||||||||||||
|
|
![]() |
||||
![]() |
|
|||
![]() |
||||
![]() |
||||
![]() |
|
|||
![]() |
|
|||