New Garbage Collectors Designed With Parallelism in Mind

On the one hand, Garbage Collectors simplify developers' lives; on the other, they can become the greatest enemy of a parallelized algorithm's performance. At last, Java 7 and .NET 4 are going to offer new Garbage Collectors that truly target multicore microprocessors with large memories.

I work with many programming languages, including unmanaged C++, C# and Java. One of the most exciting features of both C# and Java is their Garbage Collectors. Most developers tend to forget about releasing unused resources, and the usual recommendation is to let the Garbage Collectors do their work. It is a "Mother Nature will provide" approach.

One of the great advantages of designing an algorithm that is going to be programmed in unmanaged C++ is that the developer is responsible for releasing the resources at the right time. This is very important in complex algorithms running many concurrent tasks on multicore microprocessors. Parallelized algorithms usually require more memory than their serial versions, so choosing the right time to release resources is crucial to achieving the best possible performance. There is no Garbage Collector marking elements to be released in the next collection. You don't have to trust the Garbage Collector's fortune-telling capabilities. You know the algorithm and you know exactly what you want to do. You control all the variables.

However, when you work with C# or Java and you trust the Garbage Collectors' algorithms, your algorithm can be the next victim of their inaccuracies. As mentioned above, parallelized algorithms usually require more memory than their serial versions. Therefore, they put great pressure on the Garbage Collectors, which can introduce serious performance problems even in algorithms with outstanding designs.
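One rough way to see this pressure on the JVM is to run an allocation-heavy workload on several threads and compare the collectors' counters before and after. The following is only a minimal sketch: the class `GcPressureDemo`, its method names, and the workload sizes are hypothetical choices made for illustration. It relies on the standard `java.lang.management` API, and the numbers it prints depend entirely on the JVM, the heap settings, and the hardware.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class: not part of any library.
public class GcPressureDemo {

    // Sum of collection counts over every collector the JVM exposes.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the value is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time, in milliseconds.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the value is undefined
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Runs `threads` tasks; each allocates `arraysPerTask` short-lived arrays.
    // Returns a checksum so the work has an observable result.
    static long runAllocatingTasks(int threads, final int arraysPerTask) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<Future<Long>>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long checksum = 0;
                        for (int i = 0; i < arraysPerTask; i++) {
                            int[] scratch = new int[1024]; // becomes garbage right away
                            scratch[i % scratch.length] = i;
                            checksum += scratch[i % scratch.length];
                        }
                        return checksum;
                    }
                }));
            }
            long sum = 0;
            for (Future<Long> f : results) {
                sum += f.get();
            }
            return sum;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors(); // logical cores
        long countBefore = totalGcCount();
        long timeBefore = totalGcTimeMillis();
        runAllocatingTasks(threads, 200000);
        System.out.println("Threads: " + threads);
        System.out.println("Collections during run: " + (totalGcCount() - countBefore));
        System.out.println("GC time during run (ms): " + (totalGcTimeMillis() - timeBefore));
    }
}
```

Running it with one thread and then with one thread per logical core gives a first impression of how collection activity changes as the degree of parallelism grows. It is not a benchmark, only a way to make the effect visible.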

The great problem is that the algorithms used in the current versions of the Garbage Collectors were not optimized to run on microprocessors with a large number of cores. They were optimized for multiprocessor systems. However, a Core i7, for example, offers 8 logical cores in a single physical microprocessor. That is completely different from a system with 8 physical microprocessors. Garbage Collectors are really complex, and the hardware available nowadays is different from the hardware that was available a few years ago.

Luckily, .NET 4.0 and Java 7 will offer new Garbage Collectors that are truly optimized for multicore microprocessors. Both were designed to target the new micro-architectures, support high levels of concurrency, manage larger memory and reduce the latencies introduced during an application's execution. Of course, they have many differences, because the JVM (Java Virtual Machine) and .NET's CLR (Common Language Runtime) are very different. However, the Garbage Collectors are changing in similar directions.

This is great news for C# and Java developers thinking seriously about multicore programming.

.NET's new CLR 4 will offer a new Garbage Collector mode, Background GC, which reduces latency, among other improvements. You can watch the video of the presentation given by Joshua Goodman at the Lang.Net Symposium 2009. CLR 4 is available in .NET Framework 4.0 Beta 1 and Visual Studio 2010 Beta 1.

Java 7 will offer the new G1 Garbage Collector, also known as Garbage-First. G1 has been available as an early preview since Java 6 Update 14. There is an excellent white paper explaining its technical details, and you can also look at the slides of "The Garbage-First Garbage Collector", by Tony Printezis and Paul Ciciora.
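If you want to try the preview, it helps to confirm which collectors the JVM is actually running. The sketch below is an illustration only: the class name `ActiveCollectors` and its helper method are hypothetical, and it uses only the standard `java.lang.management` API. To the best of my knowledge, the Java 6 Update 14 preview is enabled with the HotSpot options `-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC`, but check the release notes for your JVM build before relying on that. The names that get printed are implementation-specific.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: not part of any library.
public class ActiveCollectors {

    // Names of the collectors reported by the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Logical processors visible to the JVM (8 on the Core i7 mentioned earlier).
        System.out.println("Logical processors: "
                + Runtime.getRuntime().availableProcessors());
        for (String name : collectorNames()) {
            System.out.println("Collector: " + name);
        }
    }
}
```

Running the class once with the default settings and once with the G1 options should print different collector names, which is a quick sanity check that the flags took effect.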
