June 15, 2006

Lock-free Interprocess Communication

Performance Testing

Algorithm 2 was compared with Algorithm 3 and with a standard, lock-based version of the same algorithm. The test passes 10,000,000 integer values from one thread to another. The standard version performs all required locking whenever it accesses shared resources, so a lock is taken for every integer passed.
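To make the baseline concrete, here is a minimal sketch of what such a lock-based test might look like. It is not the code used to produce the measurements; the names LockedChannel, TransferResult, and run_locked_transfer are hypothetical, and the use of std::mutex with std::condition_variable is an assumption about how the "standard" locking could be done.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical lock-based channel: every push and every pop takes the mutex,
// so one lock is acquired per integer on each side.
class LockedChannel {
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(value);
        }
        ready_.notify_one();
    }

    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        int value = queue_.front();
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> queue_;
};

struct TransferResult {
    std::int64_t sum;  // sum of all values seen by the consumer
    double seconds;    // wall-clock time for the whole transfer
};

// Passes the integers 1..count from a producer thread to a consumer thread.
TransferResult run_locked_transfer(int count) {
    assert(count >= 0);
    LockedChannel channel;
    std::int64_t sum = 0;

    auto start = std::chrono::steady_clock::now();
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) channel.push(i);
    });
    std::thread consumer([&] {
        for (int i = 0; i < count; ++i) sum += channel.pop();
    });
    producer.join();
    consumer.join();
    auto stop = std::chrono::steady_clock::now();

    return {sum, std::chrono::duration<double>(stop - start).count()};
}
```

Calling run_locked_transfer(10000000) would correspond to the message count used in the test. The returned sum is only a sanity check that every value arrived, and the timing will differ from machine to machine.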

System | Standard Algorithm | Algorithm 2 (Light Pipe) | Algorithm 3 (Cache Line Optimized Light Pipe)
Dell notebook, 1.6 GHz | 11.5 sec | 0.18 sec | 0.22 sec
Dell server, 2 processors with hyper-threading technology (4 virtual processors) | 243 sec | 0.35 sec | 0.07 sec
Table 1: Performance comparison of the standard algorithm, Algorithm 2, and Algorithm 3.

I would not consider this comparison entirely fair for typical applications, because we rarely send information from one process to another in such small chunks. For some applications (such as routers or switches), however, this technique may bring some benefit. For example, consider a few specialized processors that perform pipelined data processing (Figure 6).

Figure 6: Pipelined data processing.

Data can then be passed between these processors using the proposed mechanism. This should improve overall performance under high load and will probably reduce latency under low load.
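The pipelined arrangement can be illustrated with a small sketch in which three threads stand in for the processors. Note that the queue below is a generic single-producer/single-consumer ring buffer built on std::atomic, used here as a stand-in; it is not the Light Pipe algorithm from this article. The names SpscRing, push_blocking, pop_blocking, and run_pipeline are hypothetical, as is the choice of work done in each stage.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Generic single-producer/single-consumer ring buffer (an assumption, not the
// article's Light Pipe). Exactly one thread may push and one other may pop.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "one slot is always left empty");
public:
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        slots_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read, written by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write, written by the producer
};

using Ring = SpscRing<int, 1024>;

// Spin until the operation succeeds, yielding so the sketch also works on one core.
inline void push_blocking(Ring& ring, int value) {
    while (!ring.try_push(value)) std::this_thread::yield();
}

inline int pop_blocking(Ring& ring) {
    int value = 0;
    while (!ring.try_pop(value)) std::this_thread::yield();
    return value;
}

// Three-stage pipeline: generate 1..count, double each value, sum the results.
std::int64_t run_pipeline(int count) {
    assert(count >= 0);
    Ring first_to_second, second_to_third;
    std::int64_t sum = 0;

    std::thread stage1([&] {
        for (int i = 1; i <= count; ++i) push_blocking(first_to_second, i);
    });
    std::thread stage2([&] {
        for (int i = 0; i < count; ++i)
            push_blocking(second_to_third, 2 * pop_blocking(first_to_second));
    });
    std::thread stage3([&] {
        for (int i = 0; i < count; ++i) sum += pop_blocking(second_to_third);
    });
    stage1.join();
    stage2.join();
    stage3.join();
    return sum;
}
```

Each queue has exactly one writer and one reader, which is what allows the stages to exchange data without taking a lock. A real pipeline would likely replace the yield-based spinning with whatever waiting strategy suits its load.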

Table 2 compares Algorithm 4 with a standard version of the same algorithm. The test in this case runs 8 threads. Each thread reads from the registry 1,024,000 times and updates the registry 1,000 times.
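As a rough illustration of this workload, the following sketch runs a lock-based registry under several threads. It is not the test code behind the measurements: LockedRegistry, RegistryResult, and run_registry_test are hypothetical names, the registry is assumed to be a simple string-to-integer map, and spreading the reads evenly between updates (one update followed by a batch of reads) is an assumption, since the text does not say how the operations are interleaved.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical lock-based registry: every read and every update takes the mutex.
class LockedRegistry {
public:
    void set(const std::string& key, int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_[key] = value;
    }

    // Returns true and fills `out` if the key exists.
    bool get(const std::string& key, int& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        if (it == entries_.end()) return false;
        out = it->second;
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, int> entries_;
};

struct RegistryResult {
    std::int64_t successful_reads;  // reads that found the key, across all threads
    double seconds;                 // wall-clock time for the whole run
};

// Each thread performs `rounds` updates, each followed by `reads_per_round` reads.
RegistryResult run_registry_test(int threads, int rounds, int reads_per_round) {
    assert(threads > 0 && rounds >= 0 && reads_per_round >= 0);
    const std::string key = "key";
    LockedRegistry registry;
    registry.set(key, 0);  // make sure every read finds the key
    std::vector<std::int64_t> hits(threads, 0);

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            std::int64_t local_hits = 0;
            int value = 0;
            for (int r = 0; r < rounds; ++r) {
                registry.set(key, r);  // one update per round
                for (int i = 0; i < reads_per_round; ++i)
                    if (registry.get(key, value)) ++local_hits;
            }
            hits[t] = local_hits;  // written once, read after join
        });
    }
    for (auto& worker : workers) worker.join();
    auto stop = std::chrono::steady_clock::now();

    std::int64_t total = 0;
    for (std::int64_t h : hits) total += h;
    return {total, std::chrono::duration<double>(stop - start).count()};
}
```

Under these assumptions, run_registry_test(8, 1000, 1024) matches the counts in the text: 8 threads, 1,000 updates each, and 1,000 x 1,024 = 1,024,000 reads each.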

System | Standard version of Algorithm 4 | Algorithm 4 (Lock-free Read-optimized Registry)
Dell notebook, 1.6 GHz | 3.4 sec | 3.5 sec
Dell server, 2 processors 3.2 GHz with hyper-threading technology (4 virtual processors) | 37 sec | 1.2 sec
Table 2: Performance comparison of standard and optimized versions of Algorithm 4.

The lock-free algorithm provides no benefit on a single-processor system (3.5 sec versus 3.4 sec), but on a multiprocessor system it outperformed the standard algorithm by a factor of about 30 (1.2 sec versus 37 sec). Since multiprocessor systems are becoming mainstream, the described technique or its variations may bring considerable benefits to applications that use it.

References:

  1. Chandler, Dean. Reduce False Sharing in .NET*, http://www.intel.com/cd/ids/developer/asmo-na/eng/193679.htm?page=1
  2. Alexandrescu, Andrei. Lock-free data structures, C/C++ Users Journal, October 2004