Since the first microprocessors came out of the ovens, performance has been characterized in terms of word size and operating frequency. But as processor speeds hit 3 GHz and beyond, the high operating power of the processors (typically approaching or exceeding 100 watts per chip) has forced CPU designers to rethink the standard approach of increasing the clock speed to improve performance. The solution: By dividing the processing task among two or more processors integrated on the same chip, designers can reduce the clock speed, and with it the power consumption, while still delivering higher performance. That realization has sent all the major processor suppliers on a stampede to create CPU chips with two, four or more cores.
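A rough worked example shows why the trade-off can pay off. It uses the standard first-order relation for CMOS switching power, in which power scales with switched capacitance, the square of the supply voltage and the clock frequency. The 20 percent reductions in voltage and frequency are illustrative assumptions, not figures from any supplier, and leakage power and imperfect parallel speedup are ignored.

```latex
% Dynamic (switching) power of CMOS logic:
% \alpha = activity factor, C = switched capacitance,
% V = supply voltage, f = clock frequency
P_{\text{dyn}} = \alpha\, C\, V^{2} f

% Assumed comparison: one core at (V, f) versus
% two identical cores, each at (0.8V, 0.8f)
\frac{P_{\text{dual}}}{P_{\text{single}}}
  = \frac{2\,\alpha C\,(0.8V)^{2}(0.8f)}{\alpha C\,V^{2} f}
  = 2 \times 0.8^{3} = 1.024

% Peak throughput ratio if the work divides perfectly across both cores
\frac{T_{\text{dual}}}{T_{\text{single}}} = \frac{2 \times 0.8f}{f} = 1.6
```

Under those assumptions, the dual-core part draws about 2 percent more power than the single core while offering up to 60 percent more peak throughput, provided the software can keep both cores busy.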
As process features shrink, it becomes a lot easier to integrate multiple cores and the associated resources, such as large caches, onto a single chip. But because of power constraints and resource integration limitations, multicore solutions in the PC and server segments have typically started as dual-core implementations. As designs become more power-efficient, more cores can be integrated. And as features shrink below 65 nanometers, larger caches and more system functions can be integrated as well.
Dual-core single-thread designs could transition to quad-core versions, or available quad-core single-thread designs could transition to two threads per core to achieve the next step up in performance, according to David Tuhy, general manager of the Desktop Products Division at Intel Corp. The addition of support for multiple threads is much less costly than the increase in logic needed to add four more cores to create an octal-core solution, Tuhy said. Indeed, at the quad-core level, two threads might be increased to four threads before integration levels are ready to deliver an octal-core solution with two threads per core.
Just providing the multithreaded silicon, however, won't be enough. Software tools are also needed to ensure that applications and operating systems can take advantage of the threaded architecture.
Intel, for instance, offers thread-checking and thread-profiling tools to help software developers ensure that their applications are properly threaded for optimum performance.
In addition to the basic multithreaded approach, a technique called transactional memory addresses many of the parallel-programming problems. In transactional systems, a group of memory operations is perceived to be atomic (either all of the operations are perceived to happen, as if at a single instant, or none are); consistent (memory moves from one valid state to another); and isolated (there are no concurrent conflicting accesses to the same set of memory locations by any other agent). Database transactions are also durable, but in the programming world, transactions do not encompass durability. In general, transactions allow multiple concurrent readers and enable modularity in the code, while automatically providing fine-grain locking. Those features make it easier to develop applications that can run across multiple CPUs.
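To make the idea concrete, here is a minimal software sketch of the transactional approach in Python. It is an illustration under stated assumptions, not any vendor's implementation: the names `TVar`, `Transaction`, `atomically`, `transfer` and `demo` are hypothetical, and a single commit lock stands in for the fine-grain locking a real system would provide. Each transaction reads without locking, buffers its writes, and commits only if nothing it read has changed; otherwise it retries.

```python
import threading

# Serializes commits only; transaction bodies run without holding any lock.
# (Simplifying assumption: real systems use finer-grained conflict detection.)
_commit_lock = threading.Lock()


class TVar:
    """A transactional variable holding a (version, value) pair."""

    def __init__(self, value):
        self.state = (0, value)


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> buffered new value, invisible until commit

    def read(self, tvar):
        if tvar in self.writes:          # read our own pending write
            return self.writes[tvar]
        version, value = tvar.state      # one attribute load: a consistent pair
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value


def atomically(body):
    """Run body(tx) repeatedly until it commits without a conflict."""
    while True:
        tx = Transaction()
        result = body(tx)
        with _commit_lock:
            # Validate: every variable we read must be unchanged since the read.
            if all(tvar.state[0] == ver for tvar, ver in tx.reads.items()):
                for tvar, new_value in tx.writes.items():
                    tvar.state = (tvar.state[0] + 1, new_value)
                return result
        # Conflict detected: buffered writes are discarded and the body reruns.


def transfer(src, dst, amount):
    """Move amount from src to dst as one all-or-nothing step."""
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)


def demo(threads=4, transfers=250):
    """Run concurrent transfers and return the final (a, b) balances."""
    a, b = TVar(1000), TVar(0)

    def worker():
        for _ in range(transfers):
            transfer(a, b, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return a.state[1], b.state[1]


if __name__ == "__main__":
    print(demo())
```

The calling code in `transfer` never names a lock, which is the modularity benefit described above: two transactions can be composed into a larger one without the programmer having to agree on a locking order. A transaction body may rerun, so in this sketch it must not have side effects other than `read` and `write`.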
The published road maps of the two key CPU suppliers in the PC and server markets, Intel and Advanced Micro Devices Inc., show that both companies plan to follow up their dual-core processor offerings quickly with quad-core versions. Intel has pulled in its projected introduction schedule; it now plans to sample its quad-core Clovertown processor for dual-processor server and high-end workstation applications before the year is out, said Justin Rattner, chief technology officer and senior fellow at the company. Also in the works are quad-core CPUs for the desktop and higher-end servers: the Kentsfield and Tigerton CPUs, respectively.
One of the biggest challenges going forward will be benchmarking the system performance of multicore devices, Rattner said. To that end, Intel is working with Princeton University, the University of Pittsburgh and Stanford University to define the RMS suite, a set of benchmark programs built around recognition, mining and synthesis workloads.
AMD's road map, meanwhile, calls for the company to release four-core processors in the first half of 2007. Unlike Intel, which keeps the DRAM interface on the north-bridge chip, AMD incorporates dual DDR2 memory interfaces on the CPU chip itself.