Visualizing Parallelism and Concurrency in Visual Studio 2010 Beta 2

Visual Studio 2010 Beta 2 includes many interesting improvements related to its multicore programming features. The parallelism and concurrency profiling tools allow developers to visualize the behavior of a multithreaded application on multicore microprocessors and collect resource contention data.

If you want to translate multicore power into application performance, you have to make sure that your concurrent software threads actually run in parallel on the available hardware threads. Visual Studio 2010 Beta 2 improves many of the profiling reports related to parallelism and concurrency.

The IDE uses the name Concurrency. However, I'd rather talk about both parallelism and concurrency. When you create a multithreaded application, using task-based programming or raw threads, you're creating concurrent code. Nonetheless, that doesn't mean that the concurrent code is going to run in parallel all the time. It depends on the decisions made by the operating system scheduler, the underlying hardware and the synchronization between threads, among other factors. Therefore, it is necessary to evaluate whether the programmed concurrency really takes advantage of the parallel hardware. Are the software threads running in parallel on the existing hardware threads? The new Concurrency profiling tools offered by Visual Studio 2010 Beta 2 provide the information needed to answer this question. Again, these tools allow you to visualize parallelism and concurrency, not just concurrency.

This option is available in the Premium and Ultimate editions of Visual Studio 2010 Beta 2. In addition, it requires Windows Vista, Windows 7, Windows Server 2008 or Windows Server 2008 R2.

Before you begin, you must run Visual Studio 2010 Beta 2 as Administrator. Then, open the multithreaded solution that you want to analyze and select Analyze, Launch Performance Wizard… from the main menu. I'm going to explain some of the results you get when you activate the Concurrency option (parallelism and concurrency, in my parallel programming vocabulary) together with Collect resource contention data and Visualize the behavior of a multithreaded application, as shown in the following picture:

Specifying the desired profiling method.

If you're working on a 64-bit operating system, you'll probably see a dialog box with this message: "To enable complete call stacks on x64 platforms, executive paging must be disabled. A reboot is then required. To make this change, click "Yes", save your work, and then reboot.", as shown in the following picture:

On 64-bit operating systems, the IDE will disable executive paging and force you to reboot.

You have to take into account that the application is going to take more time to run whilst being profiled. Once the application finishes or the profiling session is interrupted, Visual Studio will start analyzing the generated report.

One minor criticism: the IDE usually takes a long time to analyze the report. It doesn't take advantage of multicore to run this CPU-intensive process… I think that multicore programming analysis tools should themselves be optimized for multicore. However, remember that I'm talking about Beta 2. As a multicore developer, I expect multicore development environments to take full advantage of modern multicore microprocessors.

The first graph will show a concurrency visualization, displaying the wall clock time, as shown in the following picture:

Visualizing the behavior of a multithreaded application.

Then, you can click on CPU utilization and Visual Studio will display the average CPU utilization for the analyzed process on a graph, considering the available hardware threads (logical cores). In this case, the average CPU utilization was 86%, as shown in the following picture:

Visualizing the CPU utilization.

However, you have to be careful whilst analyzing this graph. As I explained in my previous post, "TMonitor: Understanding What Happens With Each Hardware Thread", some technologies like Enhanced Intel SpeedStep Technology and Intel Turbo Boost Technology affect the CPU utilization. In addition, a high CPU utilization percentage doesn't necessarily mean useful work: it could also be the result of huge synchronization overheads. Before profiling, remember to measure speedup and scalability by comparing the execution times obtained with different numbers of hardware threads (logical cores).
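To make the speedup and scalability measurement more concrete, here is a minimal Python sketch. It assumes that you have already timed the same workload while restricting it to 1, 2 and 4 hardware threads. The function name `speedup_table` and the timings are hypothetical and aren't taken from the profiled application; the sketch only shows the arithmetic.

```python
def speedup_table(times_by_threads):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run.

    times_by_threads maps a hardware-thread count to a measured wall-clock
    time in seconds; it must contain an entry for 1 thread (the baseline).
    """
    baseline = times_by_threads[1]
    table = {}
    for threads, seconds in sorted(times_by_threads.items()):
        speedup = baseline / seconds    # how many times faster than 1 thread
        efficiency = speedup / threads  # 1.0 would be perfect linear scaling
        table[threads] = (speedup, efficiency)
    return table


# Hypothetical timings in seconds, not measurements from this post.
measured = {1: 8.0, 2: 4.4, 4: 2.5}
for threads, (speedup, efficiency) in speedup_table(measured).items():
    print(f"{threads} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

If the efficiency drops sharply as you add hardware threads, a high CPU utilization figure in the profiler is more likely to reflect overhead than useful parallel work.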

Then, you can click on Threads and Visual Studio will display visual timelines for the disk activity, the main thread and all the worker threads. This is a very useful visualization because it helps you distinguish between execution time and synchronization time. Visual Studio uses different colors, as shown in the following Visible Timeline Profile:

Visual Studio uses different colors to fill the timelines and offers a very clear summary.

The following visualization shows the result of running an application that creates groups of worker threads to take advantage of four hardware threads (logical cores). It doesn't use the work-stealing queues offered by .NET Framework 4 Beta 2:

Visualizing timelines for each worker thread.

Because the application uses raw threads, the timelines make it very easy to see that it isn't reusing threads to run its tasks. To optimize the application, it is very important to reduce both the thread creation overhead and the amount of synchronization. The profiler shows where each of these costs appears.
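The difference between creating a raw thread per work item and reusing a few worker threads can be sketched in Python as follows. This is only an illustration of thread creation versus thread reuse. It isn't the profiled application, and it doesn't model the .NET work-stealing queues. The names `work`, `run_with_raw_threads` and `run_with_pool` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def work(item):
    """Stand-in for a real work item (hypothetical)."""
    return item * item


def run_with_raw_threads(items):
    """Create one new thread per work item; returns (results, threads_created)."""
    results = [None] * len(items)

    def runner(index, item):
        results[index] = work(item)

    threads = [threading.Thread(target=runner, args=(i, item))
               for i, item in enumerate(items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, len(threads)


def run_with_pool(items, workers=4):
    """Reuse at most `workers` threads; returns (results, threads_used)."""
    used = set()
    lock = threading.Lock()

    def runner(item):
        with lock:
            used.add(threading.current_thread().name)
        return work(item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(runner, items))
    return results, len(used)


items = list(range(20))
_, created = run_with_raw_threads(items)
_, reused = run_with_pool(items, workers=4)
print(f"raw threads created: {created}, pool threads used: {reused}")
```

In a timeline view, the first approach shows up as many short-lived threads, whereas the second one shows a small, stable set of worker threads.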

Finally, you can click on Cores and Visual Studio will display how each software thread was executed on each available hardware thread (logical core). In this case, the application ran on a quad-core CPU with 4 hardware threads (4 logical cores and 4 physical cores), as shown in the following picture:

Visualizing the software threads running on the available hardware threads (logical cores).

In addition, the profiler summarizes the cross-core context switches, the total context switches and the percentage of context switches that cross cores.
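The following Python sketch shows one way to arrive at these three figures from a list of scheduling events. It rests on my own assumptions: each event is a hypothetical `(thread_id, core_id)` pair recorded when a thread is switched in, the first time a thread is scheduled isn't counted as a switch, and a switch is cross-core when the thread last ran on a different core. The profiler's exact rules may differ.

```python
def context_switch_summary(events):
    """Summarize context switches from (thread_id, core_id) scheduling events.

    Returns (total_switches, cross_core_switches, cross_core_percent).
    Assumption: a thread's first appearance is not counted as a switch.
    """
    last_core = {}  # thread_id -> core the thread last ran on
    total = 0
    cross_core = 0
    for thread_id, core_id in events:
        if thread_id in last_core:
            total += 1
            if last_core[thread_id] != core_id:
                cross_core += 1  # the thread migrated to another core
        last_core[thread_id] = core_id
    percent = 100.0 * cross_core / total if total else 0.0
    return total, cross_core, percent


# Hypothetical trace: two threads scheduled on a four-core machine.
trace = [(1, 0), (2, 1), (1, 0), (1, 2), (2, 1), (2, 3)]
print(context_switch_summary(trace))
```

A high cross-core percentage suggests that threads migrate frequently between cores, which usually costs cache locality.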

These new visualization options are really useful for optimizing applications, and they help developers using Visual Studio to successfully translate multicore power into application performance. There are many additional options. This is just an introduction to the new views. I'll be adding real-life examples related to parallel programming and profiling using the new features found in Visual Studio 2010 Beta 2.
