The forthcoming quad-core chips will leverage a highly modular design, said Phil Hester, senior vice president and chief technology officer at AMD. The approach will allow the company to craft multiple versions of a processor in very little time, thus reducing time-to-market, Hester said.
The quad-core chips will add features not offered in the current dual-core Opteron 64 processors. The forthcoming chips will include a shared Level 3 cache of at least 2 Mbytes, improved branch prediction and additions to the streaming single-instruction, multiple-data (SIMD) extensions. Bit-manipulation extensions will also be added to the instruction set. And the on-chip floating-point unit will be able to deliver up to four double-precision floating-point operations per cycle.
IBM Corp. and Sun Microsystems Inc. have also developed multicore solutions for server systems. And even Freescale Semiconductor Inc. has developed a dual-core version of its AltiVec PowerPC processor, targeting system platforms from Apple Computer and various embedded-computing applications.
Most of the processors integrate two CPU cores on a single chip. Extensions to the cores let them run multiple program threads, support virtualization technology (so that multiple operating systems can run simultaneously) or add other features that simplify the system implementation.
IBM's latest dual-core processor, the PPC970MP, unveiled this year at the International Solid-State Circuits Conference, is an extension of the PPC970 design the company introduced at ISSCC two years ago, said Brad McCredie, an IBM fellow and chief architect of the Power 6. The PPC970MP packs two 64-bit PPC970 cores that each contain a 64-kbyte Level 1 instruction cache, 32-kbyte Level 1 data cache and 1-Mbyte unified Level 2 cache.
Each core can dispatch up to five instructions per cycle and issue one instruction per cycle to each of its execution units. Each core contains two integer, two floating-point, two load/store and two SIMD execution units, as well as two additional units that execute control operations. Those resources let programmers get the most out of multithreading and eliminate some of the architectural constraints of previous-generation Power processors, said McCredie.
Although IBM had published research on a two-thread-per-core CPU as far back as 1998, for the most part companies are just starting to leverage the advantages of multithreading. The exception has been Sun Microsystems: Its Niagara T1, an eight-core processor for servers and server blades, executes four program threads per core, thus effectively looking like a 32-processor subsystem-on-chip. To feed the multiple processors, Sun incorporated four DDR2 memory interfaces, each 144 bits wide.
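The arithmetic behind the "32-processor" description can be sketched in a few lines of Python. The core count, threads per core, interface count and interface width come from the figures above; the variable and function names are illustrative only, and the total bus width is simply the product of the quoted numbers, not a figure from Sun.

```python
# Figures quoted for the Niagara T1; the names are illustrative only.
T1_CORES = 8
T1_THREADS_PER_CORE = 4
T1_MEMORY_INTERFACES = 4
T1_INTERFACE_WIDTH_BITS = 144


def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Number of thread contexts the chip presents to software."""
    return cores * threads_per_core


t1_threads = hardware_threads(T1_CORES, T1_THREADS_PER_CORE)

# Combined width of all memory interfaces, derived from the quoted figures.
t1_memory_width_bits = T1_MEMORY_INTERFACES * T1_INTERFACE_WIDTH_BITS

print(f"T1 hardware threads: {t1_threads}")
print(f"T1 combined memory interface width: {t1_memory_width_bits} bits")
```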
At the recent Hot Chips conference, Sun designers unveiled the next-generation version of the Niagara processor. The T2 sports some enhancements that allow it to deliver double the throughput of the T1, said Rick Hetherington, chief architect and distinguished engineer at Sun.
Chip designers at the company doubled the number of threads that each processor core can execute and increased the size of the 16-way associative on-chip L2 cache by 33 percent, to 4 Mbytes. The designers modified the internal pipeline of each CPU by creating two independent pipelines, each capable of executing four threads, for a total of eight threads per core, Hetherington said.
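As a rough check on those numbers, the short Python sketch below works through the per-core thread count and the L2 cache size implied by the 33 percent increase. The pipeline, thread and cache figures come from the article; the T2 core count is not given here, so the value of eight is an assumption carried over from the T1, and all names are illustrative.

```python
# Figures from the article; the names are illustrative only.
T1_THREADS_PER_CORE = 4
T2_PIPELINES_PER_CORE = 2
T2_THREADS_PER_PIPELINE = 4
T2_L2_MBYTES = 4.0
L2_GROWTH = 0.33  # the quoted 33 percent increase

# ASSUMPTION: the article does not state the T2 core count;
# eight is carried over from the T1 for illustration.
ASSUMED_T2_CORES = 8

t2_threads_per_core = T2_PIPELINES_PER_CORE * T2_THREADS_PER_PIPELINE
t2_total_threads = ASSUMED_T2_CORES * t2_threads_per_core

# Cache size before the increase, implied by the quoted growth figure.
implied_previous_l2_mbytes = T2_L2_MBYTES / (1 + L2_GROWTH)

print(f"T2 threads per core: {t2_threads_per_core}")
print(f"T2 total threads (assuming {ASSUMED_T2_CORES} cores): {t2_total_threads}")
print(f"Implied previous L2 size: about {implied_previous_l2_mbytes:.1f} Mbytes")
```

Two pipelines of four threads each give eight threads per core, which matches the stated doubling from the T1's four.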