FREE Subscription to Dr. Dobb’s Digest: Same Great Content, New Digital Edition
Site Archive (Complete)
C++
Email
Print
Reprint

add to:
Del.icio.us
Digg
Google
Furl
Slashdot
Y! MyWeb
Blink
October 29, 2008

Writing a Generalized Concurrent Queue

(Page 2 of 4)

A Two-Lock Multiproducer/Consumer Queue

The queue data structure itself is a singly linked list. To make the code simpler for the empty-queue boundary case, the list always contains a dummy node at the head, and so the first element logically in the queue is the one contained in the node after first. Figure 1 shows the layout of an empty queue, and Figure 2 shows a queue that holds some objects.

Each node holds the T object by pointer and adds padding:

template <typename T> struct LowLockQueue { private: struct Node { Node( T* val ) : value(val), next(nullptr) { } T* value; atomic<Node*> next; char pad[CACHE_LINE_SIZE - sizeof(T*)- sizeof(atomic<Node*>)]; };

Like any shared variable, the next pointer needs to be protected by a mutex or, as here, declared as an ordered atomic type (C++0x atomic<> or Java/.NET volatile). The padding here is to ensure that two Node objects won't occupy the same cache line; in particular, in a nonempty queue, having the first and last nodes in the same cache line would penalize performance by causing invisible contention between the producer and consumer. Note that the amount of padding shown here and later on errs on the side of being too conservative: Each Node will be allocated on the heap, and a heap allocator may add its own extra overhead to each allocated block for alignment and housekeeping information, which effectively adds extra padding. If so, and if we know how much that will be, we could reduce our internal padding proportionately to make a heap-allocated Node exactly fill one cache line.

Figure 1: Structure of an empty queue.

[Click image to view at full size]

Figure 2: A queue containing four T objectsempty queue.

Next, here are our queue control variables:

char pad0[CACHE_LINE_SIZE]; // for one consumer at a time Node* first; char pad1[CACHE_LINE_SIZE - sizeof(Node*)]; // shared among consumers atomic<bool> consumerLock; char pad2[CACHE_LINE_SIZE - sizeof(atomic<bool>)]; // for one producer at a time Node* last; char pad3[CACHE_LINE_SIZE - sizeof(Node*)]; // shared among producers atomic<bool> producerLock; char pad4[CACHE_LINE_SIZE - sizeof(atomic<bool>)];

Again, we add padding to make sure hat data used by different threads stay physically separate in memory and cache. Clearly, we want the consumer-end data and the producer-end data to be on separate cache lines; but even though only one producer and one consumer will be active at a time, we want to keep the locking variable separate so that waiting consumers spinning on consumerLock won't contend on the cache line that contains first which the active consumer is updating, and that waiting producers spinning on producerLock won't slow down the active producer who is updating last.

The constructor just sets up the initial empty state, and the destructor (in .NET or Java, this would be the disposer) just walks the internal list and tears it down:

public: LowLockQueue() { first = last = new Node( nullptr ); producerLock = consumerLock = false; } ~LowLockQueue() { while( first != nullptr ) { // release the list Node* tmp = first; first = tmp->next; delete tmp->value; // no-op if null delete tmp; } }

Previous Page | 1 Multiple Producers and Consumers | 2 A Two-Lock Multiproducer/ Consumer Queue | 3 Produce and Consume | 4 Fully Nonblocking Multiproducer/ Consumer Queues Next Page
TOP 5 ARTICLES
No Top Articles.



MICROSITES
FEATURED TOPIC

ADDITIONAL TOPICS

INFO-LINK