The Mitosis System
I've just described how Intel's speculative threading system "Mitosis" works. The codename draws an analogy between speculative threading and cell division, in which a cell splits into parts that then work in parallel.
Mitosis relies on both hardware and software support. On the software side, the Mitosis compiler analyzes the program and locates the sections that can be executed efficiently in parallel. A key part of this analysis is identifying sections of code whose pre-computation slice has very low computation overhead. Conventional concerns such as workload balance must also be considered.
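To make the idea of a cheap pre-computation slice concrete, here is a minimal Python sketch. It assumes, for illustration only, that the slice has to produce just the starting value (live-in) a speculative thread needs. The names `Node`, `expensive_work`, `full_body`, and `p_slice` are hypothetical and are not part of the Mitosis compiler, which extracts slices from compiled code rather than having them written by hand.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def expensive_work(node: Node) -> int:
    # Stand-in for the costly per-node computation of a pointer-chasing loop.
    return sum(i * node.value for i in range(1000))


def full_body(node: Node, steps: int) -> Tuple[int, Node]:
    # Original sequential code: do the work for `steps` nodes, then return
    # the result and the node where the next chunk of work begins.
    total = 0
    for _ in range(steps):
        total += expensive_work(node)
        node = node.next
    return total, node


def p_slice(node: Node, steps: int) -> Node:
    # Hypothetical slice: keeps only the pointer chasing needed to compute
    # the next chunk's starting node and skips the expensive work.
    for _ in range(steps):
        node = node.next
    return node


# Build a 9-node list holding the values 1..9.
head = None
for v in range(9, 0, -1):
    head = Node(v, head)

predicted_start = p_slice(head, 4)    # cheap prediction for the spawned thread
_, actual_start = full_body(head, 4)  # what sequential execution reaches
prediction_ok = predicted_start is actual_start
```

In this sketch the slice is cheap because the live-in depends only on the `next` pointers. A section whose slice would have to repeat most of the original computation would be a poor candidate for speculation.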
On the hardware side, Mitosis is built on top of a multi-core and/or multithreaded processor. The main extension required is support for buffering and multiversioning in the memory hierarchy. Buffering keeps the speculative state until the thread is verified and can be committed. Multiversioning allows each variable to have a different value for each of the threads running in parallel. This matters because every thread executes a piece of code that was written with sequential semantics; once parallelized, those pieces run simultaneously on values that the sequential program would have produced at different points in time. The concurrent threads therefore share the same variables, but the values those variables hold may differ, because they represent the state of the program at different points of the sequential execution. Mitosis also relies on hardware support to track which memory locations each thread reads and which it writes. This allows data dependence misspeculations to be spotted quickly and the corresponding threads to be discarded.
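The interplay of buffering, multiversioning, and dependence checking can be sketched as a toy software model. This is not the Mitosis hardware: the class `SpeculativeMemory`, its methods, and the choice to check dependences by address when an earlier thread commits are all assumptions made for illustration.

```python
class SpeculativeMemory:
    """Toy model of speculative buffering and multiversioning (illustrative)."""

    def __init__(self, initial):
        self.committed = dict(initial)  # non-speculative program state
        self.buffers = {}    # thread id -> {addr: value}, buffered writes
        self.read_sets = {}  # thread id -> addrs read from committed state

    def spawn(self, tid):
        self.buffers[tid] = {}
        self.read_sets[tid] = set()

    def read(self, tid, addr):
        # A thread sees its own version first (multiversioning).
        if addr in self.buffers[tid]:
            return self.buffers[tid][addr]
        # Otherwise it reads committed state, and the read is recorded.
        self.read_sets[tid].add(addr)
        return self.committed[addr]

    def write(self, tid, addr, value):
        # Writes stay in the thread's buffer until commit (buffering).
        self.buffers[tid][addr] = value

    def squash(self, tid):
        # Discard all speculative state of a misspeculated thread.
        self.buffers.pop(tid, None)
        self.read_sets.pop(tid, None)

    def commit(self, tid):
        # Make the thread's writes visible, then squash every remaining
        # thread that already read one of the addresses just written.
        writes = self.buffers.pop(tid)
        self.read_sets.pop(tid)
        self.committed.update(writes)
        squashed = [t for t, rs in self.read_sets.items()
                    if rs.intersection(writes)]
        for t in squashed:
            self.squash(t)
        return squashed


mem = SpeculativeMemory({"x": 1, "y": 10})
mem.spawn(0)  # thread 0 is earlier in sequential order
mem.spawn(1)  # thread 1 is more speculative
mem.write(0, "x", 2)
v0 = mem.read(0, "x")       # thread 0 sees its own version of x
v1 = mem.read(1, "x")       # thread 1 still sees the older version
mem.write(1, "y", v1 + 10)  # result computed from a stale value of x
squashed = mem.commit(0)    # thread 1 read x too early, so it is discarded
```

While both threads are running, `x` has two versions at once: the value written by thread 0 and the older committed value seen by thread 1. Because thread 1's write to `y` was only buffered, discarding the thread leaves the committed state untouched.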
The Mitosis system has been designed to optimize the trade-off between software and hardware to exploit speculative thread-level parallelism.
Results
To illustrate the performance potential of the Mitosis compiler, we use a subset of the Olden benchmark suite. Olden benchmarks are pointer-intensive programs from which automatic parallelizing compilers can extract hardly any thread-level parallelism.
As you can see in Figure 3, the results obtained by the Mitosis compiler/architecture for this subset of the Olden benchmarks are impressive. Mitosis outperforms single-threaded execution by 2.2x. Compared with a big out-of-order core, the speedup is close to 2x. We can also see that the benefits of Mitosis do not come only from reducing memory latency: it outperforms an ideal system with perfect memory by about 60 percent. Overall, this work shows that significant amounts of thread-level parallelism can be exploited in irregular codes, with rather low overhead in terms of extra (wasted) activity.