Novell's Mono Brings SIMD Support to C#

Parallel programming is not just about multi-threading and multi-core. There's also a lot of power in the SIMD (Single Instruction Multiple Data) instruction set extensions available in most modern microprocessors from Intel and AMD.

Many applications written in C and C++ take advantage of these instruction sets to work on vectors and matrices. They are very useful for improving performance in algorithms that perform the same calculation on many blocks of data. Many C and C++ compilers can also optimize loops to use SIMD instructions, so they parallelize this kind of code automatically.

Until now, however, .NET and C# developers working with managed code didn’t have a simple way to take advantage of these powerful instruction sets in C# code. This scenario changes with the release of Mono 2.2 and the outstanding work done by Miguel de Icaza. Mono is an open source project, sponsored by Novell, that offers a multiplatform .NET development framework. It goes beyond this goal and, among other features, offers access to hardware-accelerated SIMD-based primitives. The key is the Mono.Simd namespace, which lets you take advantage of SIMD instruction sets in C#. It is a work in progress, so it only supports up to SSE3 and some SSE4. Even so, it is a great improvement over the previous lack of support in C#.

Most operations that update vectors and matrices can see a large performance improvement, and you don’t have to leave C#.

Mono.Simd offers support for the following hardware-accelerated packed types:

* Mono.Simd.Vector16b: 16 unsigned bytes
* Mono.Simd.Vector16sb: 16 signed bytes
* Mono.Simd.Vector2d: 2 doubles
* Mono.Simd.Vector2l: 2 signed 64-bit longs
* Mono.Simd.Vector2ul: 2 unsigned 64-bit longs
* Mono.Simd.Vector4f: 4 floats
* Mono.Simd.Vector4i: 4 signed 32-bit ints
* Mono.Simd.Vector4ui: 4 unsigned 32-bit ints
* Mono.Simd.Vector8s: 8 signed 16-bit shorts
* Mono.Simd.Vector8us: 8 unsigned 16-bit shorts
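
To make the idea of a packed type concrete, here is a minimal sketch that adds two arrays of Mono.Simd.Vector4f values, so each addition works on four floats at once. It assumes you compile against Mono.Simd.dll. The class name PackedAddDemo and the method AddPacked are hypothetical names chosen for this illustration, and the sketch has not been tuned or benchmarked.

```csharp
using System;
using Mono.Simd;

// Hypothetical demo class; not part of Mono.Simd.
public static class PackedAddDemo
{
    // Adds two arrays element by element. Each Vector4f addition
    // operates on four packed floats in a single operation.
    public static Vector4f[] AddPacked (Vector4f[] a, Vector4f[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException ("Arrays must have the same length.");

        var result = new Vector4f[a.Length];
        for (int i = 0; i < a.Length; i++)
            result[i] = a[i] + b[i];
        return result;
    }

    public static void Main ()
    {
        var a = new[] { new Vector4f (1f, 2f, 3f, 4f) };
        var b = new[] { new Vector4f (10f, 20f, 30f, 40f) };

        Vector4f sum = AddPacked (a, b)[0];

        // The four lanes are exposed as X, Y, Z and W.
        Console.WriteLine ("{0} {1} {2} {3}", sum.X, sum.Y, sum.Z, sum.W);
    }
}
```

The loop looks just like a scalar loop over floats. The difference is that the element type is a packed vector, so a runtime with SIMD support can map each addition to a single hardware instruction. On a runtime without that support, the same code should still run as ordinary managed code, only without the speedup.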

If you are interested in taking advantage of the SIMD support offered in Mono, take a look at the excellent slide show presented by Miguel de Icaza at PDC 2008.

You can find a further explanation of SIMD extensions in the article "I've Fallen In Love With the Vectoriser", written by Stephen Blair-Chappell a few weeks ago. It uses C/C++, but you can apply the same ideas in C# with the Mono.Simd namespace.

You can check the kind of SIMD support that your current CPU offers using the freeware CPU-Z. It’s an excellent utility for discovering which SIMD instruction sets a CPU supports.

If you work with vectors, matrices and C#, you’ll love the SIMD support that Mono offers.

