Multicore-enabling the N-Queens Problem Using Cilk++

Tools
  • Print
  • Email

The N-Queens problem asks this question: Given an n-x-n chessboard and n queens, is it possible to place the n queens on the chessboard without any conflict? Is so, how many different ways can they be placed? An arrangement is said to be conflicting if at least two queens conflict with each other. Two queens conflict if they share the same row, column, or diagonal. For example, the following is a solution for eight queens:

An excellent overview of the problem is available here.

Algorithm

The classical solution to N-Queens problem is brute force enumeration with pruning. Brute force enumeration tries every possible arrangement of the n queens and checks if any of them satisfies the non-conflicting criteria. For example, there are n^2 locations on the board, so there are n^2 possible locations for the first queen, n^2-1 for the second, and so on. Thus we have choose (n^2, n) = (n^2)!/((n^2-n)! * n!) possible arrangements of queens. By enumerating all of these possibilities and checking each of them, all solutions can be found. For n=10, this number is approximately 1.73e13.

Because there could be only one queen per row in a non-conflicting arrangement, we can constrain the first queen to locations in the first row, the second queen to the second row, and so on. This will give n possible locations for each queen, resulting in n^n possible arrangements. For n=10, this is 1e10. Observe that there could be one queen per column as well; the i-th queen will have only at most n-i+1 valid locations, because the previous i-1 queens already took I-1 columns. The number of possibilities is then reduced to n!, or 3.6e6 for n=10.

The final point to reduce enumerations is to prevent queens from sharing a diagonal. While this is straightforward to code, it is hard to provide an actual estimate of the number of enumeration after using it. The following is a pseudo-code for this enumeration plus pruning algorithm:


queens=[];     //locations of already put queens
try (int i)    //put i-th queen
{
	if (i>n) 
	{
	    found_a_solution;
	    return;
	}
	for j=1..n,   //possible locations of i-th queen in i-th row
	{
	    if (putting i-th queen at location (i,j) 
             does not conflict with previously put queens)
	    {
		queens[i]=j;   //store location of i-th queen
		try(i+1);      //put (i+1)-th queen		
	    }	    
	}
}
  

The complexity of this algorithm is upper-bounded by n!.

Implementation

I implemented the above algorithm in C++. Storing the queens array variable is a bit tricky. There were three choices before me: An STL vector, an array, or pointers. There are definitely other ways this could be implemented, such as using an array for space and bookkeeping pointers manually, but here I only considered these three basic cases. They each have their pros and cons:

  • STL Vector
    • Pro: don't need to manually allocate array space
    • Con: more actual memory allocations performed

  • Array
    • Pro: easy to implement, need only one allocation in every recursion
    • Con: still redundant, the queens placed early are stored many times

  • Pointers, maintain a cactus stack
    • Pro: no redundancy in storage
    • Con: hard to implement, can result in many fine grained memory allocations

All three choices were implemented. Before multicore-enabling them, their serial versions were built first. After all, it's always good to have the serial version of a parallel implementation at hand. In the following, the black code is the serial version, and the modification to parallelize them with Cilk++ are highlighted in blue.

Parallelizing a piece of code with Cilk++ is easy. What I mainly needed to do was to insert cilk_spawn before every recursive call, and cilk_sync before I used the return value from such calls. Also I needed to remove any data races. The two main tools I used were Cilkscreen (a data race detector), and reducers (race-free global variables provided with Cilk++).

The following are the three implementation. (The STL version is illustrated below, and the other two implementations are available by clicking on the links.)

PAGE 1 | 2 | 3 | 4 | 5 Next Page »

Real World Parallelism Webinar Series
  • November 17, 2009
    Visual Effects for Animation - presented by DreamWorks Animation
    Speaker: Ron Henderson (Bio)

    Ron Henderson manages the FX Tools group at DreamWorks Animation, where he is responsible for developing physical simulation and procedural modeling tools. These systems have been used for key visual effects in recent films such as Kung Fu Panda and Monsters vs. Aliens (March 2009).

    Prior to joining DreamWorks in 2002 he was a senior scientist at Caltech with a joint appointment to the Applied Math and Aeronautics departments, where he worked on efficient techniques for the direct numerical simulation of fluid turbulence.

    Abstract:
    In this webinar, Ron Henderson will show examples of visual effects, from hair and feathers to smoke and fire, from a variety of DreamWorks Animation feature films. He will discuss in general terms the kinds of techniques used to achieve particular visual effects. Finally, Henderson will show a detailed breakdown of the dam-breaking scene from Madagascar: Escape 2 Africa, demonstrating how different elements of key frame animation, simulation, and rendering are combined in a real production shot.

  • December 1, 2009
    A Quick and Easy Way to Parallelize a Legacy Codebase with Intel® Threading Building Blocks (TBBs)
    Speaker: Bernard Laberge, Avid, Senior Principal Engineer (Bio)

    Bernard Laberge is a senior principal engineer in the video editors division at Avid. During his seven years with the company he has been actively involved in the replacement of the legacy video processing engines used by Avid editors with a common hardware-abstracted, component-based video processing engine currently running on the CPU with SIMD optimized code, GPU, and dedicated hardware.

    Abstract:
    Learn how to overcome the limitations of a thread-based scheduler, including dealing with the absence of recursive parallelism support and the inefficient handling of unbalanced processing load. Bernard Laberge addresses how Avid resolved the expensive refactoring of their thread-based scheduler into a task-based solution by choosing Intel® Threading Building Blocks (TBBs). He explores how Avid was able to easily integrate the Intel TBBs into their video editor applications and more than 5 million lines of code.

  • December 15, 2009
    How to Use Intel® Parallel Studio to Streamline Code Development in a Multicore Environment
    Speaker: Matt Dunbar, Director for Performance Technology, SIMULIA (Bio)

    Matt Dunbar is the director for performance technology at SIMULIA. Since joining the company in 1993, he has worked on parallelization of the Abaqus suite of products, initially for shared memory architectures and more recently for distributed memory architectures. Dunbar has also been intimately involved in selecting both the hardware and software tools used in the development of the Abaqus product line.

    Abstract:
    Resolve elusive, costly multithreading errors quickly and efficiently with Intel® Parallel Studio. While many coding problems that lead to bugs in software applications are typically straightforward logic errors, errors in managing memory and in multithreading code can sometimes take weeks to months to diagnose and fix. Matt Dunbar explores how and why taking advantage of multicore processors through multithreaded code is critical for compute-intensive applications. While spotlighting his work on SIMULIA's Abaqus finite element solver, Dunbar addresses the need for multicore execution and shares his experiences using Intel Parallel Studio to streamline code development in a multicore environment.