February 26, 2007
Programming and Optimizing C Code: Part 2
This second of a five-part series shows how to optimize DSP "kernels," i.e., inner loops. It also shows how to write fast floating-point and fractional code.
Alan Anderson, Analog Devices
[Editor's note: Part 1 introduces the basic principles of writing C code for DSP. Part 3 will explain how to access DSP features like hardware loops and circular addressing from portable C. It will be published Monday, March 5. For more programming tips, see the DSP programmer's guide.]
DSP Kernels
Today, the distinction between DSP kernels and control code is blurring. Newer application kernels require all sorts of operations, including operations that were previously seen only in control code. In addition, compilers are moving into areas that were previously hand-coded. Still, it is in DSP kernels written in C that performance needs must be considered most carefully.
Floating Point
As DSPs have gotten faster, it has become practical to simply leave less-critical code in floating point in order to reduce development costs, even when the target DSP lacks native floating-point instructions. This has led to a re-evaluation of the floating-point support functions that vendors provide with their C compilers. In the past these functions were an afterthought, provided only to ensure code portability. Today, they are often carefully handcrafted. In the quest for speed, these libraries may even omit some aspects of the IEEE 754 standard, such as standards-compliant processing of NaN values, which are mathematically useful but seldom critical for DSP applications. This is illustrated in Figure 1, which shows reference IEEE-compliant functions for ADI's Blackfin on the right. The left-hand side shows highly optimized, non-compliant functions. (These are sample figures that do not show the entire range of performance.)
Figure 1. IEEE-compliant (right) vs. non-compliant (left) floating-point libraries.
Also consider whether you really need 64-bit ("double") precision, which is the default precision for floating-point constants and math library functions in ANSI C. Many applications, for example those in the automotive and audio areas, only require 32-bit ("float") precision. Using the lower precision can double your speed, whether you use native floating-point instructions or software emulation. [Editor's note: For a great intro to floating-point arithmetic, see this tutorial.]
Fractional processing
Many DSPs support fractional (fixed-point) arithmetic in hardware, but standard C has no fractional data type. To solve this problem, you can evolve the language, either by creating your own dialect of C or by going through an international standards committee. The problem with creating your own dialect of C is that your code is no longer portable. The problem with going through standards committees is that it takes decades for the world to adopt a new coding standard.
Another approach is to enhance the semantic capability of the compiler in the hope that it will recognize that complex chunks of C correspond to fractional operations. This is challenging, but it can be done. We'll look at an example in the next section. We can also offer intrinsics (or built-in functions), which map directly to single machine instructions. This produces a clumsy but efficient programming style. We'll look at an example in the following text.
Given the drawbacks of all of these approaches, it is tempting to use C++ instead of C. C++ allows the programmer to define new types and overloaded operators, which may appear to be a natural way to express fractional arithmetic. However, the semantic gap between expression and intention is wider in C++ than it is in C, and this approach requires very careful coding and analysis. The C++ language is more powerful than C, but that means it does more for you automatically. For instance, C++ compilers may implicitly generate constructors and destructors that you did not expect. Also, C++ style involves more indirection, which can cause problems in a compiler's alias analysis. For example, C++ programs tend to produce more temporary variables and common subexpressions, which the compiler must then analyze. As another example, data tends to exist within structs or objects, rather than as stand-alone variables.