There's an old engineering joke that goes: "Standards are great – everyone should have one!" In the not-so-distant past, this was pretty much the way things were when it came to floating-point calculations. Prior to 1985, all of the major computer manufacturers (CRAY, DEC, IBM, etc.) defined and implemented their own floating-point format, including the precision and rounding schemes to be used. Even worse, different machines from the same manufacturer might support different flavors of floating-point. And, in the absence of floating-point hardware, the various compilers implemented their own floating-point interpretations. The end result was that you could create a program in a language like C, compile and run it on the various platforms, and end up with different results on each machine. Not surprisingly, this was something of an annoyance for the end users.
To address this problem, the IEEE created a standard for floating-point called IEEE 754. This standard defines how binary floating-point values are to be represented and how mathematical operations are to be performed on these values. Released in 1985, this standard was quickly adopted by all of the major computer manufacturers and compiler developers. IEEE 754 was followed in 1987 by IEEE 854, which defines a standard for radix-independent floating-point arithmetic and therefore, of course, includes decimal.
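As a rough illustration of what "representation" means here, the sketch below splits a value into the three fields of the IEEE 754 single-precision format: 1 sign bit, 8 exponent bits stored with a bias of 127, and 23 fraction bits. Python is used purely for convenience; the helper names decode_binary32 and rebuild_normal are made up for this example and are not part of the standard or of any library.

```python
import struct

def decode_binary32(x):
    """Round x to IEEE 754 single precision and return
    (sign, biased_exponent, fraction) as integers."""
    # Pack as a big-endian 32-bit float, then reinterpret the same
    # four bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits, implicit leading 1
    return sign, exponent, fraction

def rebuild_normal(sign, exponent, fraction):
    """Rebuild the value of a normal number from its three fields
    (hypothetical helper; ignores zeros, subnormals, infinities, NaNs)."""
    return (-1.0) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

# 0.15625 is 1.25 x 2^-3, so the biased exponent is 127 - 3 = 124
fields = decode_binary32(0.15625)
print(fields)                   # (0, 124, 2097152)
print(rebuild_normal(*fields))  # 0.15625
```

Because every conforming machine uses the same field layout, the same bit pattern means the same number everywhere, which is exactly what the pre-1985 formats could not promise.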
More recently, a revision to IEEE 754 (known as IEEE 754r) has been under way and is currently in ballot. This defines a new decimal data type (and associated operations) that can be used for integer, fixed-point, and floating-point decimal arithmetic. The decimal-encoded formats and arithmetic described in the 754r draft have already been implemented in the IBM System z9 (mainframe) processor and will be shipped in the IBM Power6 processor, which is scheduled for mid-2007.
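The appeal of a decimal type is easiest to see with a small example. The sketch below uses Python's standard decimal module as a stand-in. This is a software implementation, not the 754r encodings or the IBM hardware mentioned above, so treat it only as an illustration of how decimal and binary arithmetic differ on everyday values such as 0.1.

```python
from decimal import Decimal

# Binary floating-point: 0.1 and 0.2 have no exact base-2 representation,
# so their sum is not exactly the double nearest to 0.3.
binary_sum = 0.1 + 0.2
binary_exact = (binary_sum == 0.3)

# Decimal arithmetic: the same values are represented exactly in base 10.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_exact = (decimal_sum == Decimal("0.3"))

print(binary_exact)   # False
print(decimal_exact)  # True
print(decimal_sum)    # 0.3
```

Note that the decimal values are built from strings; constructing Decimal(0.1) from a binary float would carry the binary rounding error along with it.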
An appropriate hardware platform
Several companies have interesting hardware platforms available. Many of these are based on the concept of a motherboard sporting two of AMD's Opteron processors linked by a high-speed, low-latency HyperTransport (HTX) bus. The idea is to remove one of the processors (each of which may be dual- or quad-core) and replace it with a pin-compatible FPGA card. The AMD processor is subsequently used to execute control-type tasks, while the FPGA module is used to perform algorithmically intensive data-processing and number-crunching tasks. Meanwhile, the HyperTransport bus is used to move massive amounts of data around the system with extreme speed.
Editor's Note: AMD's initiative to promote the openness of the HyperTransport Bus is known as Torrenza; more recently, Intel have announced a proposal, codenamed Geneseo, to enhance PCI Express technology and open up their front side bus (FSB) to facilitate the same type of implementation.
A good example of this type of approach is offered by XtremeData, who combine an AMD processor with an FPGA-based module using high-capacity, high-performance FPGAs from Altera (Fig 1). Another example is provided by DRC Computer Corporation, who do much the same thing but with FPGAs from Xilinx (Fig 2).
1. XtremeData's XD1000 FPGA Coprocessor Module for Socket 940.
2. DRC's in-socket Reconfigurable Processing Unit, the RPU110-L200.
As opposed to using the main processor bus, some solutions are plugged into a memory slot, while others are connected via an I/O slot.
Editor's Note: Many servers now include a HyperTransport (HTX) slot along with PCIe slot(s). The HTX slot attaches directly to the same bus that both processors use, so it can communicate with all the processors in the same way as the 'socket' solutions discussed above (except that you now get to keep both processors on the motherboard). There is no reason there couldn't be multiple HTX expansion slots, but today's systems typically offer only one such slot. Two examples of systems supporting two processors and one HTX slot are the IBM x3455 and the HP DL145 Server.
There are many other vendors with interesting solutions, such as Celoxica, who plug into the HTX slot discussed above (Fig 3); Nallatech, who have a wide variety of solutions (Fig 4); and SRC Computers, who plug into a memory slot (Fig 5).
3. Celoxica's RCHTX high-performance computing (HPC) board.
4. Nallatech's BenONE FPGA-based computing card.
5. The MAP reconfigurable processor from SRC.