In our first installment of the Featured Algorithm series, we looked at parallel_for, a popular and easy way to start using Intel Threading Building Blocks (TBB) to gain performance on multi-core platforms. Next, we focus on the pipeline component. A pipeline functions like a factory assembly line, running a stream of inputs through a series of filters.
To get a sense of how real-world developers are using Intel Threading Building Blocks, we spoke with Richard Bowler, CTO of Aeshen. His PackRat application, written in C++, uses multithreading via a TBB pipeline to speed up file compression.
Q: Was it easy to add the pipeline class to your application?
A: It was very easy to add the required pipeline classes to the application. In fact, of all the TBB mechanisms I used over several sample applications, the pipeline mechanism was the easiest to implement as a neophyte with TBB. (Let me stress that all the TBB mechanisms I used worked very well and went in with a minimum of fuss, but pipeline was the easiest to implement of the bunch.)
Q: Did you make any mistakes?
A: Actually, the pipeline code I added worked the first time. This is surprising when you think about it. I remember when I was using TBB for doing some loop parallelization, it took a couple of tries before I got the kinks out. But the pipeline was easy to understand and went in without a hitch.
Q: What would be the most interesting use for this algorithm?
A: This is a great feature to use when you have to move large amounts of data through multi-step algorithms. Anytime you can separate an algorithm into discrete steps, you can break those steps up and parallelize the algorithm using pipeline. The setup is straightforward: you code the steps and define the intermediate data that gets passed from each step to the next; at the end, your results roll out. Good examples of algorithms that are ripe for pipelining are file compression by directory (which I did in my example) and performing a complex convolution on a video frame.
Q: What performance or productivity benefits did you gain?
A: I noticed real performance gains on my single-core computer set up for Hyper-Threading. Given that what you're doing underneath is spawning concurrent threads to divide up work, the gains grow significantly when you move to multi-core systems.
Q: How should a developer get started with pipeline?
A: Presuming you have an algorithm that fits the pipeline model, I'd say just dive in. There really isn't a huge learning curve. I went from zero to implemented in about four hours. It's a snap!
Here are excerpts from the Aeshen PackRat source code. The first sample illustrates use of the pipeline:
    // create the filter objects for the pipeline
    BlockCompress compressor;

    // create the pipeline and insert the filter
    tbb::pipeline pipeline;
    pipeline.add_filter(compressor);

    // run the pipeline
    // N is the maximum number of data pieces the pipeline may process at one time
    pipeline.run(N);

    // clear the pipeline before destruction
    pipeline.clear();
This is an example of a filter:
    class BlockCompress : public tbb::filter {
    public:
        BlockCompress(void) : tbb::filter(/*is serial step?*/ false) {}
        ~BlockCompress(void);

        // override the () operator, as required for use in a TBB pipeline
        void* operator() (void* item)
        {
            PRFileBlock* pBlock = (PRFileBlock*) item;
            char buffer[PR_BLOCK_READ_SIZE + 1024];
            pBlock->nCompressedSize = PR_BLOCK_READ_SIZE + 1024;
            int nCompressResult = BZ2_bzBuffToBuffCompress(
                buffer, &pBlock->nCompressedSize,
                pBlock->buffer, pBlock->nUncompressedSize,
                5, 0, 30);
            if (nCompressResult == BZ_MEM_ERROR)
                return NULL;
            ASSERT(nCompressResult == BZ_OK);
            memcpy_s(pBlock->buffer, PR_BLOCK_READ_SIZE + 1024,
                     buffer, pBlock->nCompressedSize);
            return pBlock;
        }
    };