Balder: A Silverlight 3 Managed 3D Engine Optimized for Multicore
Silverlight 3 doesn't offer native support for loading and rendering 3D models. However, Balder, an open source project, offers a very complete managed 3D engine for Silverlight 3. It achieves the necessary frame rate by taking advantage of Silverlight's threading capabilities.
This time, I'm going to share my experience with the great results achieved by parallelizing an existing algorithm. I had the opportunity to witness the evolution of a 3D engine: Balder.
In a previous post, I talked about designing Rich Internet Applications (RIAs) with parallelism in mind. Balder's creator, Einar Ingebrigtsen, implemented this idea in his managed 3D engine for Silverlight 3.
I've been working with Balder on many projects over the last year, and I've had the chance to talk with Einar and Petri Wilhelmsen (two gaming gurus) about many potential performance improvements. Balder doesn't use the GPU (Graphics Processing Unit) to render 3D models; it uses a software rendering process. Silverlight 3 imposes many restrictions, and you cannot access DirectX or OpenGL from a Silverlight application. Therefore, an efficient software rendering process was the only way to bring 3D models to life in a Silverlight viewport.
There are other open source 3D engines that offer some software rendering capabilities for Silverlight. However, one of Balder's key features is that its software rendering process is optimized to take advantage of multiple logical cores (hardware threads). As a result, it delivers noticeably higher frame rates when running on microprocessors with two or more logical cores. Silverlight 3 doesn't offer the new task-based programming model included in .NET 4, so you have to work with the classic .NET 3.5 threading model in order to take advantage of multiple cores.
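Without a task-based library, the first decision in a classic threading design is how to divide a loop's iterations among the available logical cores. The following is a minimal sketch in Python, used here only for illustration (Balder itself is written in C#). It splits an iteration range into one contiguous chunk per logical core; `os.cpu_count()` plays roughly the role of .NET's `Environment.ProcessorCount`, and the function name `partition_range` is hypothetical, not part of Balder.

```python
import os

def partition_range(total, workers=None):
    """Split range(total) into contiguous (start, stop) chunks.

    Chunk sizes differ by at most one, so no worker gets much more work
    than the others. By default, one chunk per logical core.
    """
    if workers is None:
        workers = os.cpu_count() or 1  # cpu_count() may return None
    # Never create more chunks than there are iterations.
    workers = max(1, min(workers, total)) if total > 0 else 1
    base, extra = divmod(total, workers)
    chunks = []
    start = 0
    for i in range(workers):
        # The first `extra` chunks each take one leftover iteration.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

# Example: 10 scanlines on 4 logical cores
print(partition_range(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Contiguous chunks keep each thread's memory accesses local; the trade-off is that uneven per-iteration cost (for example, a model covering only the top of the screen) can leave some cores idle.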
Balder's software rendering process had reasonable performance. However, it was not enough to reach 60 frames per second (FPS) with dozens of 3D models being rendered in real time. The software rendering process used a single-threaded model: old-fashioned, non-scalable sequential code. Rewriting a complete software rendering algorithm to take full advantage of multiple cores is indeed a very complex task.
However, there was an easy-to-detect hotspot: a block of sequential code, a loop, that could be broken down into many concurrent blocks of code. Balder's team optimized this loop using multiple threads, with wait handles coordinating the start and the end of many concurrent loops. Of course, this process required a new design. It wasn't just a copy and paste, and it wasn't a simple code refactoring; the new design had to take concurrency into account. Nevertheless, because the focus was on just one block of code, it was possible to get results much sooner than by rewriting the complete engine.
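The post doesn't show Balder's code, so the following is only a minimal Python sketch of the wait-handle pattern it describes, assuming a software rasterizer whose hot loop iterates over scanlines. `FrameWorkers` and `process_row` are hypothetical names. Python's `threading.Event` stands in for .NET wait handles such as `ManualResetEvent`, and waiting on every `done` event in turn is similar in effect to `WaitHandle.WaitAll` in full .NET. Note that CPython's global interpreter lock prevents CPU-bound Python threads from running truly in parallel, so this illustrates the coordination, not the speedup.

```python
import threading

class FrameWorkers:
    """Persistent worker threads coordinated by events (wait handles).

    Each frame, the main thread sets every worker's `start` event, then
    blocks until every worker has set its `done` event.
    """

    def __init__(self, num_workers, rows, process_row):
        self.num_workers = num_workers
        self.rows = rows
        self.process_row = process_row  # hypothetical per-scanline function
        self.start_events = [threading.Event() for _ in range(num_workers)]
        self.done_events = [threading.Event() for _ in range(num_workers)]
        self.running = True
        self.threads = [
            threading.Thread(target=self._worker, args=(k,), daemon=True)
            for k in range(num_workers)
        ]
        for t in self.threads:
            t.start()

    def _worker(self, k):
        while True:
            self.start_events[k].wait()  # block until a frame starts
            self.start_events[k].clear()
            if not self.running:
                break
            # Interleaved rows: worker k handles rows k, k+n, k+2n, ...
            for row in range(k, self.rows, self.num_workers):
                self.process_row(row)
            self.done_events[k].set()  # signal: this share is finished

    def render_frame(self):
        for done in self.done_events:
            done.clear()
        for start in self.start_events:
            start.set()  # release all workers
        for done in self.done_events:
            done.wait()  # wait for the end of every concurrent loop

    def shutdown(self):
        self.running = False
        for start in self.start_events:
            start.set()  # wake workers so they can exit
        for t in self.threads:
            t.join()

frame = [0] * 8

def shade(row):
    # Each row is written by exactly one worker, so no lock is needed.
    frame[row] = row * row

pool = FrameWorkers(3, len(frame), shade)
pool.render_frame()
pool.shutdown()
print(frame)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The workers are created once and reused for every frame, on the assumption that starting fresh threads 60 times per second would waste much of the time the parallel loop saves.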
The performance improvements were significant. The rendering process was much faster when more than two logical cores were available. Balder became a really serious 3D engine with a high-performance software rendering process. Its ability to use many logical cores gives it a great advantage over other 3D engines for Silverlight.
But wait, what about compatibility? There were many incompatibility problems with the new multithreaded version. It was a classic multithreading problem: some code in the engine was supposed to run on the main UI thread (the only thread allowed to make changes to the UI controls). After the optimization, some of this code wasn't running on the main UI thread anymore, so some applications didn't work as expected. Luckily, Einar solved the problem in just a few hours, and the fix arrived in a newer version. This is a very interesting situation. When you optimize an engine to take advantage of multiple cores using concurrent code, you have to keep compatibility with previous versions, because a lot of existing code was written to use the services the engine offers.
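In Silverlight, code running on a background thread typically marshals UI work back to the UI thread with `Dispatcher.BeginInvoke`. The post doesn't say how Einar fixed Balder, so here's only an illustration of the general pattern in Python: a hypothetical `UiDispatcher` that lets worker threads queue callbacks and runs them only when the UI thread pumps the queue. All names here are assumptions made for the sketch.

```python
import queue
import threading

class UiDispatcher:
    """Hypothetical stand-in for a UI dispatcher.

    Only the thread that created it (the "UI thread") may run queued
    callbacks; any thread may post work with begin_invoke().
    """

    def __init__(self):
        self.ui_thread = threading.current_thread()
        self.pending = queue.Queue()  # thread-safe FIFO of callbacks

    def check_access(self):
        return threading.current_thread() is self.ui_thread

    def begin_invoke(self, callback, *args):
        # Always queue (asynchronous), like BeginInvoke.
        self.pending.put((callback, args))

    def pump(self):
        """Run all queued callbacks; must be called on the UI thread."""
        if not self.check_access():
            raise RuntimeError("pump() must run on the UI thread")
        while True:
            try:
                callback, args = self.pending.get_nowait()
            except queue.Empty:
                return
            callback(*args)

dispatcher = UiDispatcher()
ui_updates = []  # pretend UI state: only the UI thread may touch it

def update_label(text):
    assert dispatcher.check_access(), "UI touched from a worker thread!"
    ui_updates.append(text)

def render_worker(n):
    # ... heavy rendering work would happen here ...
    dispatcher.begin_invoke(update_label, f"frame {n} done")

workers = [threading.Thread(target=render_worker, args=(n,)) for n in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
dispatcher.pump()  # the UI thread applies the queued updates
print(sorted(ui_updates))  # ['frame 0 done', 'frame 1 done', 'frame 2 done']
```

Because worker threads never call `update_label` directly, existing code that assumes it runs on the UI thread keeps working after the rendering loop goes parallel.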
Performance improvements in engines shouldn't destroy backward compatibility.
The parallel code used in Balder isn't perfect. There are many other opportunities to improve its scalability and performance. However, it's a good example of how to translate multicore power into application performance. I'm sure Balder's team is going to improve the engine in future versions and take even more advantage of multicore microprocessors.
What are you waiting for? It's time to go parallel. If you find an RIA running too slowly, remember that you can take advantage of multiple cores in many of the programming languages used to develop RIAs.
You can read more details about Balder's optimizations in this post written by Einar Ingebrigtsen: "Balder Silverlight 3 optimization - round 1"
You can compare the performance and the code of the different versions of this 3D engine by downloading them from its CodePlex website.
2009 Intel Software Webinar Series
November 17, 2009
Visual Effects for Animation - presented by DreamWorks Animation
Speaker: Ron Henderson
Ron Henderson manages the FX Tools group at DreamWorks Animation, where he is responsible for developing physical simulation and procedural modeling tools. These systems have been used for key visual effects in recent films such as Kung Fu Panda and Monsters vs. Aliens (March 2009).
Prior to joining DreamWorks in 2002, he was a senior scientist at Caltech with a joint appointment to the Applied Math and Aeronautics departments, where he worked on efficient techniques for the direct numerical simulation of fluid turbulence.
Abstract:
In this webinar, Ron Henderson will show examples of visual effects, from hair and feathers to smoke and fire, from a variety of DreamWorks Animation feature films. He will discuss in general terms the kinds of techniques used to achieve particular visual effects. Finally, Henderson will show a detailed breakdown of the dam-breaking scene from Madagascar: Escape 2 Africa, demonstrating how different elements of key frame animation, simulation, and rendering are combined in a real production shot.
December 1, 2009
A Quick and Easy Way to Parallelize a Legacy Codebase with Intel® Threading Building Blocks (TBBs)
Speaker: Bernard Laberge, Avid, Senior Principal Engineer
Bernard Laberge is a senior principal engineer in the video editors division at Avid. During his seven years with the company, he has been actively involved in replacing the legacy video processing engines used by Avid editors with a common hardware-abstracted, component-based video processing engine that currently runs on the CPU (with SIMD-optimized code), on the GPU, and on dedicated hardware.
Abstract:
Learn how to overcome the limitations of a thread-based scheduler, including the absence of support for recursive parallelism and the inefficient handling of unbalanced processing loads. Bernard Laberge explains how Avid handled the expensive refactoring of its thread-based scheduler into a task-based solution by choosing Intel® Threading Building Blocks (TBB). He explores how Avid was able to easily integrate Intel TBB into its video editor applications, which comprise more than 5 million lines of code.
December 15, 2009
How to Use Intel® Parallel Studio to Streamline Code Development in a Multicore Environment
Speaker: Matt Dunbar, Director for Performance Technology, SIMULIA
Matt Dunbar is the director for performance technology at SIMULIA. Since joining the company in 1993, he has worked on parallelization of the Abaqus suite of products, initially for shared memory architectures and more recently for distributed memory architectures. Dunbar has also been intimately involved in selecting both the hardware and software tools used in the development of the Abaqus product line.
Abstract:
Resolve elusive, costly multithreading errors quickly and efficiently with Intel® Parallel Studio. While many coding problems that lead to bugs in software applications are typically straightforward logic errors, errors in managing memory and in multithreading code can sometimes take weeks to months to diagnose and fix. Matt Dunbar explores how and why taking advantage of multicore processors through multithreaded code is critical for compute-intensive applications. While spotlighting his work on SIMULIA's Abaqus finite element solver, Dunbar addresses the need for multicore execution and shares his experiences using Intel Parallel Studio to streamline code development in a multicore environment.



