Despite the importance of power consumption and memory use, relatively little emphasis has been placed on optimizing power and memory for embedded applications. This paper will provide some guidelines on optimizing embedded applications for power.
Just as code size and speed impact cost, power consumption also affects cost. The more power an embedded application consumes, the larger the battery required to drive it. For a portable product, this means added expense, weight, and bulk. To reduce power, make the application run in as few cycles as possible, since each cycle consumes a measurable amount of energy. In this sense, performance and power optimization appear to be the same problem: consume the fewest cycles and both goals are met. The two strategies do share similar goals, but they have subtle differences, as will be shown shortly.
But the real power optimization gains come from how data is accessed before being processed by the embedded CPU. Most of the power consumed in an embedded application comes not from the CPU itself but from moving data from memory to the CPU. Each time the CPU accesses external memory, buses must be driven and other functional units powered on to deliver the data; this is where the majority of the power is consumed. If the programmer designs the application to minimize external memory accesses, move data into and out of the CPU efficiently, and use the cache well enough to prevent thrashing, overall power consumption drops significantly. Figure 16 shows the two main power contributors. The compute block contains the CPU and is where the algorithmic work is performed; the memory transfer block is where the memory subsystems are exercised by the application. The memory transfer block consumes the majority of the power in an embedded application.
Figure 16. The main power contributors for an embedded application are in the memory transfer functions, not in the compute block. (From PowerEscape)
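The memory-traffic argument above can be made concrete with a small sketch. The example below is illustrative, not from the original paper: the array dimensions and loop bodies are arbitrary, but the contrast is real. Both functions compute the same sum; the row-major version walks memory sequentially, so every cache line fetched from external memory is fully consumed, while the column-major version strides across rows and wastes most of each fetched line, generating far more external-memory activity for the same result.

```c
#include <stddef.h>

#define ROWS 64
#define COLS 64

/* Cache-unfriendly: strides down columns, touching a different cache
 * line on almost every access, which drives external-memory traffic
 * (and therefore power) up without doing any extra useful work. */
long sum_column_major(int m[ROWS][COLS])
{
    long sum = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += m[i][j];
    return sum;
}

/* Cache-friendly: walks memory sequentially, so each cache line
 * fetched from external memory is fully used before eviction. */
long sum_row_major(int m[ROWS][COLS])
{
    long sum = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += m[i][j];
    return sum;
}
```

On a processor with a single level of cache, a simple loop interchange like this can cut the number of external-memory line fills dramatically, which is exactly the memory-transfer power the figure identifies as dominant.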
LAST RESORT - ASSEMBLY LANGUAGE
Many times, the C code can be modified slightly to alleviate this situation, but it can take time and several iterations to get the optimal (or close to optimal) solution. The process of refining code in this manner is shown in Figure 17. The last resort is coding the algorithm in assembly language. Assembly language is harder to write, understand, and maintain. Tools have been developed that make it easier for assembly language programmers to write efficient code for superscalar and VLIW processors. Assembly language optimizers, for example, allow the programmer to write serial assembly language and then optimize it into software pipelined loops automatically.
Figure 17. Code optimization process
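Before falling back to assembly, it is often enough to give the compiler the information it needs to build a software-pipelined loop on its own. A common mechanism, sketched below under the assumption of a C99-capable compiler, is the `restrict` qualifier: by promising that the input arrays do not alias, the programmer frees the compiler to overlap loads, multiplies, and accumulates from different iterations.

```c
#include <stddef.h>

/* The `restrict` qualifiers promise the compiler that `a` and `b` do
 * not overlap. Without that promise, the compiler must assume a store
 * through one pointer could change data read through the other, which
 * blocks the iteration overlap a software-pipelined schedule needs. */
long dot_product(const short *restrict a, const short *restrict b, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (long)a[i] * b[i];   /* one multiply-accumulate per iteration */
    return acc;
}
```

Vendor compilers typically add their own hints on top of this (for example, pragmas that state a minimum trip count), so checking the compiler's software-pipelining feedback is usually faster than rewriting the loop in assembly.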
CONCLUSION
Real time programmers have always had to develop a library of tricks to make software run as fast as possible. As processors grow more complicated, this becomes a more difficult endeavor. For superscalar and VLIW processors, managing two separate pipelines and ensuring the highest degree of parallelism requires tool support. Optimizing compilers overcome many of the obstacles these powerful processors present, but even compilers have limitations. Real time programmers should not trust the compiler to perform all of the necessary optimizations for them; the compiler needs help. The main steps to follow are:
- Study the assembly language produced by the compiler. In many instances, subtle changes to the structure of the C code make a big difference in the assembly the compiler generates, and that can make the difference in the real time performance of the system.
- Use the DMA capabilities, especially for the data intensive number crunching applications common in DSP systems. The DMA can take a huge burden off the CPU and help manage data efficiently.
- Keep the pipelines full. The whole reason superscalar and VLIW processors were invented was to exploit parallelism. Look for areas of inefficiency in the assembly language and make modifications that allow both pipelines to run at full efficiency. This requires an understanding of what the compiler looks for in terms of pipelining opportunities, and an understanding of the application itself. Many times, simply rearranging an algorithm lets it run more efficiently on the processor.
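The DMA step above usually takes the form of double buffering: while the CPU works on one on-chip buffer, the DMA controller fills the other from external memory, so the transfer latency is hidden behind useful computation. The sketch below is illustrative only; `dma_copy` is a hypothetical stand-in (implemented here with `memcpy` so the example runs anywhere) for programming a real DMA controller, which would return immediately while the transfer proceeds in the background.

```c
#include <string.h>

#define BLOCK 8

/* Hypothetical stand-in for a DMA transfer: on a DSP this would program
 * the DMA controller and return at once; memcpy models the data movement. */
static void dma_copy(short *dst, const short *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}

/* Double buffering: the CPU processes one on-chip buffer while the DMA
 * fills the other from external memory, hiding transfer latency. */
long process_stream(const short *ext_data, size_t n_blocks)
{
    short buf[2][BLOCK];            /* two ping-pong buffers */
    long total = 0;

    dma_copy(buf[0], ext_data, BLOCK);   /* prime the first buffer */
    for (size_t blk = 0; blk < n_blocks; blk++) {
        const short *work = buf[blk & 1];
        if (blk + 1 < n_blocks)          /* kick off the next transfer early */
            dma_copy(buf[(blk + 1) & 1], ext_data + (blk + 1) * BLOCK, BLOCK);
        for (size_t i = 0; i < BLOCK; i++)  /* CPU work overlaps the "DMA" */
            total += work[i];            /* placeholder for real processing */
    }
    return total;
}
```

On real hardware the call that starts the next transfer would be asynchronous, and the loop would wait on a transfer-complete flag before swapping buffers; the ping-pong structure is the part that carries over.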