Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Embedded Systems

Alternative computing solutions, from single cores to arrays of 'things'


Multiple Processors (Homogeneous)
Perhaps the most famous early example of using multiple processors was the INMOS transputer chip, which surfaced in the mid-1980s (the all lowercase "transputer" was the official written form). As a point of interest, the native programming language for the transputer was occam (again, the all lowercase "occam" was the official written form), which was named in honor of the 14th-century English philosopher and Franciscan friar William of Ockham, also spelled Occam (1286–1348, give or take a few years).

Each transputer chip contained a single processor that was designed to communicate with – and work in parallel with – other transputers. The idea was that users could hook as many transputer chips together on a circuit board as was necessary to satisfy the computational requirements of the target application. Many believed that the transputer was going to be the next great leap in computing, but creating programs that ran efficiently on this parallel architecture was non-trivial, and the transputer eventually faded away.

Although most non-engineers don't realize it, it is actually very common for systems to use multiple processors. Consider a home computer, for example; in addition to the main CPU, the keyboard will also have its own processor; each hard disk and optical (CD/DVD) drive will typically contain two or more processors, and so forth. Even a simple "USB Memory Stick" contains its own processor, which is used to make the contents of the stick appear to be a hard disk drive as far as the host computer's operating system is concerned.

However, the above examples are characterized by the fact that these multiple processors all have very focused, well-partitioned tasks that can be largely performed in isolation. It is much more complicated to have tightly-coupled homogeneous processors, such as the dual-core chips that are now available from AMD and Intel (the term "homogeneous" means that these processing elements are of the same kind). Another term that is applicable to this type of configuration is symmetric multiprocessing (SMP), which means that the view of the rest of the system – memory, input/output, operating system, etc. – is exactly the same (i.e., "symmetrical") for each processor.

When moving from a single processor/core to a dual-processor/core configuration, the system becomes noticeably more responsive, and users don't experience those annoying "hang-ups" and "stalls" that are the hallmark of a single-processor environment. And two processors are only the start; for example, Intel is already talking about a four-core microprocessor called "Clovertown," which is expected to appear on the market in early 2007.

Meanwhile, Sun Microsystems (www.sun.com) is already fielding an eight-core processor called the UltraSPARC T1. Formerly known as Niagara, this extreme-performance device is well-suited to highly-threaded commercial environments, such as thread-aware web servers, application servers, and database servers. Of particular interest is the fact that Sun is open sourcing this chip; the register transfer level (RTL) representation of this device was made available to the engineering community when the www.opensparc.net website went live on January 24th 2006.

And if you think an eight-core processor is impressive, you should check out the Vega processor chip from Azul Systems (www.azulsystems.com). The current implementation of this device boasts an array of twenty-four 64-bit CPU cores, and Azul have announced that a forty-eight core version will be made available in 2007.

Before we move on, we should also make mention of the Multicore Association (www.multicore-association.org), which is a new industry group focused on companies involved with multi-processor hardware, software, and system implementations.

Multiple Processors (Heterogeneous)
As opposed to using multiple identical cores, it may be preferable to use a mixture of dissimilar cores. For example, the main digital chip in even the most rudimentary cell phone will typically contain at least one CPU core (to manage the human-machine interface) coupled with at least one DSP core (to perform the baseband signal processing functions). Such solutions are referred to as being "heterogeneous," meaning "consisting of dissimilar elements or parts."

One example of this type of scenario is the Cell processor from IBM (www.ibm.com), which is a single chip containing a general-purpose CPU core tightly coupled with eight DSP cores [IBM actually call these DSP cores Synergistic Processor Elements (SPEs); these little scamps contain floating-point engines and other units, and they are predominantly used for graphics calculations]. Another example is a high-end cell phone, which may include two or more CPU cores and two or more DSP cores combined with large numbers of hardware accelerator blocks and peripheral functions.

Things are further complicated by the fact that the processing cores and other functional units may have their own individual memories along with shared memory structures; and everything may be connected together using multi-level buses and cross-point switches (some of the larger chips actually feature a Network-on-Chip (NoC), which the various processors and peripherals use to communicate with each other). One term which is commonly associated with this type of environment is asymmetric multiprocessing (AMP or ASMP), in which computational tasks (or threads) are strictly divided by type between processors.

Large Arrays of "Things"
One way to think of the hardware used to perform computations is in terms of its granularity. The finest level of granularity is provided by an application-specific integrated circuit (ASIC) or application-specific standard part (ASSP), in which algorithms can be hand-crafted in silicon at the level of individual logic gates. (An ASIC is a device that is custom-created for a particular application and is intended for use by only one – or very few – companies. By comparison, ASSPs are devices that are created using ASIC technologies, but that are intended to be sold as standard parts to anybody who wants to use them.)

Next, we have FPGAs with their lookup tables (LUTs). These are off-the-shelf chips that can contain the equivalent of tens of thousands to tens of millions of logic gates. FPGAs are designed in such a way that they can be configured (programmed) to perform some desired function or functions; the SRAM-based versions of these devices have the advantage that they can be reconfigured as required. [Structured ASICs may be considered to occupy a space somewhere between ASICs and FPGAs, especially in the case of devices from eASIC (www.easic.com), which combine custom routing with FPGA-like SRAM-based LUTs.]

Note that we might decide to include one or more hard processor cores on an ASIC or ASSP, in which case we would refer to this device as a System-on-Chip (SoC). Similarly, we might decide to include one or more hard and/or soft processor cores on an FPGA (which may also be viewed as an SoC by some folks). All of these cases would then be considered to be a hybrid solution involving a mixture of traditional processor core(s) and algorithms implemented in gates/LUTs/etc.

In recent years, a number of companies have started to offer more exotic architectures, each of which is applicable to a focused set of computational applications. If we consider these offerings in terms of granularity, then the first step above traditional FPGAs would be an architecture such as that provided by Elixent (www.elixent.com). This reconfigurable algorithm processing (RAP) architecture – which is targeted toward the efficient implementation of arithmetic/DSP functions – is based on an array of 4-bit arithmetic-logic units (ALUs) in a "sea" of programmable interconnect. These ALUs can be linked using fast carry chains so as to implement wider functions. In addition to forming part of a datapath, the output of one ALU may be used to select the instruction of another ALU. The programming model for these devices is to take the same register transfer level (RTL) representation used to create an ASIC or to configure (program) an FPGA and to use an appropriate synthesis engine to generate a corresponding configuration file.
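The idea of linking narrow ALUs through a carry chain to build a wider function can be illustrated in software. The following is only a minimal sketch: the function names alu4_add and chained_add are hypothetical, and the code models generic ripple-carry addition across 4-bit slices rather than the behavior of any actual Elixent device.

```python
def alu4_add(a, b, carry_in=0):
    """Add two 4-bit operands plus a carry-in; return (4-bit sum, carry-out)."""
    total = (a & 0xF) + (b & 0xF) + (carry_in & 1)
    return total & 0xF, total >> 4


def chained_add(a, b, slices=4):
    """Add two (4 * slices)-bit values by rippling the carry through 4-bit slices.

    Each loop iteration stands in for one 4-bit ALU; the carry variable
    plays the role of the carry chain that links neighboring ALUs.
    """
    result = 0
    carry = 0
    for i in range(slices):
        shift = 4 * i
        nibble_sum, carry = alu4_add(a >> shift, b >> shift, carry)
        result |= nibble_sum << shift
    return result, carry


if __name__ == "__main__":
    total, carry_out = chained_add(0x1234, 0x0FFF)
    print(hex(total), carry_out)  # prints "0x2233 0"
```

With the default of four slices, the chain behaves like a 16-bit adder; the final carry-out indicates overflow beyond 16 bits.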

Next, we have the field programmable object array (FPOA) architecture from MathStar (www.mathstar.com). An example FPOA device may contain around 400 silicon "objects" in the form of 16-bit ALUs (each with its own instruction cache and scratchpad memory), register files, and multiply accumulators (MACs) – along with internal RAM banks and external high-speed memory interfaces – all of which can communicate with each other through programmable interconnect fabric. Each object can be programmed individually and acts autonomously. All of the objects and the interconnect run at 1 GHz. In addition to general-purpose I/O (GPIO) pins, the FPOA boasts high-speed I/O that can transmit and receive 2 - 32 GB/s. The main programming model for these devices is to use a graphical interface that generates SystemC, and the target application area is compute-intensive DSP tasks such as edge detection and pattern recognition for robotic vision systems with high frame rates and high resolutions.

Another group of architectures may be classed as comprising one (or a small number) of general-purpose CPU cores coupled with an array of processing elements (PEs). Depending on the implementation, each of these PEs can contain multipliers, adders, ALUs, MACs, counters, synchronizers, memory, etc. Three good examples of this concept are IPFlex (www.ipflex.com) with an off-the-shelf device comprising two CPUs and hundreds of 32-bit PEs; ClearSpeed (www.clearspeed.com) with an off-the-shelf device comprising a general-purpose CPU coupled with an array of 32/64-bit PEs containing floating-point multipliers and suchlike targeted toward scientific and engineering calculations; and IMEC (www.imec.be) with a configurable core comprising a single very long instruction word (VLIW) CPU coupled with an array of 32/64 PEs each containing an ALU/MAC combo.

A good example of the next higher level of granularity is provided by picoChip (www.picochip.com), whose picoArray features several hundred 16-bit CPU and DSP cores connected by a sea of programmable interconnect. Each core has its own local memory (ranging from 1K to 64K depending on the core type). The programming model for a picoArray is an interesting mixture of styles. A VHDL block-level netlist is used to define the connectivity between each of the CPU and DSP cores (each block in the netlist maps onto a specific type of core); meanwhile, the actual function of each block is defined in C and/or assembly code.
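The split between "connectivity defined in one place, behavior defined in another" can be sketched in a few lines. This is a loose illustration only, not picoChip's toolflow: the block names, the NETLIST and BLOCKS tables, and run_pipeline are all hypothetical, and the blocks are run one after another here, whereas on a real array each block would run on its own core in parallel.

```python
# Per-block behavior (hypothetical stand-ins for the C/assembly code of each core).
def scale(x):
    return x * 2


def offset(x):
    return x + 1


def clamp(x):
    return max(0, min(x, 255))


BLOCKS = {"scale": scale, "offset": offset, "clamp": clamp}

# Connectivity (hypothetical stand-in for a block-level netlist):
# each entry wires one block's output to the next block's input.
NETLIST = [("scale", "offset"), ("offset", "clamp")]


def run_pipeline(sample, netlist, blocks, source="scale"):
    """Push one sample through the blocks in the order the netlist wires them."""
    wires = dict(netlist)  # source block -> destination block
    name = source
    value = sample
    while name is not None:
        value = blocks[name](value)
        name = wires.get(name)  # None once the last block has run
    return value


if __name__ == "__main__":
    print(run_pipeline(100, NETLIST, BLOCKS))  # prints 201
    print(run_pipeline(200, NETLIST, BLOCKS))  # prints 255
```

Rewiring the design means editing only NETLIST; changing what a block does means editing only its function, which mirrors the separation described above.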

Another good example of this level of granularity is provided by the multiprocessor DSP (MDSP) architecture from Cradle Technologies (www.cradle.com). Current incarnations of the MDSP offer up to 8 CPU cores and 16 DSP cores. Each of these 32-bit cores has its own local instruction and data memory. The latest programming model for these devices is to create a C program that is divided into multiple threads, and to tag each thread as being either a control thread (to be executed on a CPU) or a signal processing thread (to be executed on a DSP). A run-time dynamic scheduler is then used to assign threads to available resources on the device.
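The tag-then-schedule idea can be sketched as follows. This is a simplified model under stated assumptions, not Cradle's scheduler: the schedule function, the "control" and "signal" tag strings, and the core names are hypothetical, and the code hands out work in fixed rounds rather than reacting to threads finishing at run time.

```python
from collections import deque


def schedule(tasks, num_cpus, num_dsps):
    """Assign tagged tasks to cores, round by round.

    tasks: list of (name, tag) pairs, where tag is "control" or "signal".
    Returns a list of rounds; each round maps a core id to a task name.
    """
    queues = {"control": deque(), "signal": deque()}
    for name, tag in tasks:
        queues[tag].append(name)

    # Control threads may only run on CPUs, signal threads only on DSPs.
    cores = {
        "control": [f"cpu{i}" for i in range(num_cpus)],
        "signal": [f"dsp{i}" for i in range(num_dsps)],
    }
    for tag in queues:
        if queues[tag] and not cores[tag]:
            raise ValueError(f"no cores available for {tag} tasks")

    rounds = []
    while queues["control"] or queues["signal"]:
        assignment = {}
        for tag, core_ids in cores.items():
            for core in core_ids:
                if not queues[tag]:
                    break
                assignment[core] = queues[tag].popleft()
        rounds.append(assignment)
    return rounds


if __name__ == "__main__":
    work = [("ui", "control"), ("fft", "signal"),
            ("fir", "signal"), ("mix", "signal")]
    for number, assignment in enumerate(schedule(work, num_cpus=1, num_dsps=2)):
        print(number, assignment)
```

With one CPU and two DSPs, the three signal tasks need two rounds while the single control task completes in the first, which shows how the tags constrain where each thread may run.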

