Winning the Passing Game

By Max Fomitchev, August 12, 2001

One of the most common approaches to improved performance is to optimize the most frequent operations. Even though the benefit resulting from optimizing a single operation is small, if that operation is called frequently, it can result in substantial improvement. Although the idea is simple, finding the right operations to optimize can be a challenge.

Optimize to the Max

One frequent operation that is commonly overlooked is the subroutine or class method call. Calls to subroutines or class methods may make up 50 percent or even as much as 80 percent of the source code. The call itself is not very interesting, but other operations happen almost every time a subroutine is called: the operations related to parameter passing.

In the good old days, when procedural languages reigned supreme, parameter passing was simple. All it took was to create a stack frame and to push values to the stack:

    push A
    push B
    call proc
    ...
; proc
    push ebp        ; stack frame
    mov ebp,esp     ; created here
    ; do something
    mov esp,ebp     ; stack frame
    pop ebp         ; destruction
    ret

This is still a very typical scenario for a C/C++ program because there are many routines that operate on "simple" (i.e., non-object) parameters. Even if the routine is defined with an __inline modifier and inlined in the program code, parameters are pushed onto the stack and a stack frame is created. The only thing that changes is that the call instruction is replaced by the subroutine code and the ret instruction is eliminated. Looking at the design of modern CPUs, it is easy to see that removing the call and ret instructions alone does not provide much of a performance improvement: in most cases, they are processed in effectively zero cycles thanks to successful branch prediction and instruction prefetching. Moreover, excessive inlining may blow up the size of the code and ultimately reduce performance due to the increased likelihood of instruction cache misses. Perhaps the only reason to use inlining is when the routines are extremely compact (just a few operations) and called very frequently. Good examples are CString methods in C++ and COMPLEX arithmetic.
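To illustrate the kind of routine the paragraph above has in mind, here is a minimal sketch of a compact complex-number type whose operations are only a few instructions long. The `Complex` struct and the `add`, `mul`, and `sum_of_squares` functions are hypothetical names chosen for this example (not the COMPLEX type mentioned above), and whether the compiler actually inlines them remains the compiler's decision.

```cpp
#include <cassert>

// Hypothetical minimal complex type; names are illustrative only.
struct Complex {
    double re;
    double im;
};

// A couple of arithmetic operations each: small enough that the cost of
// passing arguments and building a stack frame could rival the useful work.
inline Complex add(const Complex& a, const Complex& b) {
    return Complex{a.re + b.re, a.im + b.im};
}

inline Complex mul(const Complex& a, const Complex& b) {
    return Complex{a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re};
}

// Calls the tiny routines in a loop, which is where inlining them
// tends to pay off.
inline Complex sum_of_squares(const Complex* values, int count) {
    assert(count >= 0);  // this sketch assumes a non-negative element count
    Complex total{0.0, 0.0};
    for (int i = 0; i < count; ++i) {
        total = add(total, mul(values[i], values[i]));
    }
    return total;
}
```

Larger routines called from many places are the opposite case: copying their bodies into every call site grows the code without removing much work.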

Making the Most of Your Registers

There is another modifier that can be helpful: __fastcall asks the compiler to pass subroutine parameters in registers. In 32-bit Visual C++, the first two arguments that fit in 32 bits go in ECX and EDX, and any remaining arguments are pushed onto the stack as usual. This eliminates memory operations such as pushing parameters onto the stack and reading them back through the stack frame. Also, instructions that operate solely on registers execute faster in the internal CPU pipeline. However, in the x86 architecture, sometimes there are just not enough registers to accommodate all the values.
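A minimal sketch of how the modifier might be applied follows. Because __fastcall is a Microsoft-specific keyword, the sketch hides it behind a hypothetical `PASS_IN_REGISTERS` macro so the same source still compiles elsewhere; the macro and both function names are assumptions made for this example.

```cpp
#include <cassert>

// __fastcall is Microsoft-specific and only matters for 32-bit x86 builds;
// on any other compiler or target this hypothetical macro expands to nothing.
#if defined(_MSC_VER) && defined(_M_IX86)
#define PASS_IN_REGISTERS __fastcall
#else
#define PASS_IN_REGISTERS
#endif

// Two 32-bit arguments: under the Visual C++ x86 __fastcall convention
// these travel in ECX and EDX rather than on the stack.
int PASS_IN_REGISTERS scale_and_offset(int value, int factor) {
    return value * factor + 1;
}

// The third argument does not fit in the two available registers,
// so it is still passed on the stack.
int PASS_IN_REGISTERS clamp(int value, int low, int high) {
    assert(low <= high);  // this sketch assumes a well-formed range
    if (value < low) return low;
    if (value > high) return high;
    return value;
}
```

Callers need no changes, for example `scale_and_offset(3, 4)`; the convention is part of the function's declaration, so the declaration and the definition must agree on it.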

There is also the /Oy compiler option in Visual C++ (frame-pointer omission). It suppresses creation of the stack frame pointer, which saves a few instructions and frees the EBP register for general use. (The /Oy- form, with a trailing minus, does the opposite and keeps the frame pointer.) Though the advantage is small, it's still an advantage. Needless to say, in the scarce pool of x86 registers, an extra register may be a big asset.

Simple parameters are only a part of the problem. Most programs use objects heavily and pass them as parameters frequently. Where there are objects, one finds constructors, destructors, and quite often memory allocation. And did I mention local variables? Consider what happens in the following code sample:

    void foo(CString S)
    {
        CString S2;
        ...
    }
    ...
    CString S1;
    foo(S1);

First, the constructor for S1 is called. Then the copy constructor for S is called (which also allocates memory using the new operator). Then the constructor for S2 is called, and the subroutine does its work. On exit, the destructor for S2 is called (which releases allocated memory using the delete operator), followed by the destructor for S (which again releases allocated memory using the delete operator). What if there are more parameters? And what if they are complex objects with complex constructors, or destructors that, among other things, allocate and/or free memory? And what about all those local variables? It is clear that the overhead can be quite substantial. Is there a workaround? Of course: pass objects by reference and avoid, minimize, or consolidate local variables, or make them static. Given these guidelines, the foo() routine can be rewritten as:

    void foo(const CString& S)
    {
        static CString S2;
        ...
    }
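The difference between the two versions of foo() can be made visible by counting copy-constructor calls. The sketch below uses a hypothetical `TrackedString` class as a stand-in for CString (built on std::string so that it compiles without MFC); the class, its `copies` counter, and the helper functions are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a string class such as CString: it owns a
// heap-backed buffer and counts how often it is copy-constructed.
class TrackedString {
public:
    static int copies;  // number of copy-constructor calls so far

    explicit TrackedString(const std::string& text) : text_(text) {}
    TrackedString(const TrackedString& other) : text_(other.text_) { ++copies; }

    std::size_t length() const { return text_.size(); }

private:
    std::string text_;
};

int TrackedString::copies = 0;

// By value: the argument is copy-constructed on entry and destroyed on exit.
std::size_t length_by_value(TrackedString s) { return s.length(); }

// By const reference: no copy and no extra destructor call.
std::size_t length_by_reference(const TrackedString& s) { return s.length(); }

// Returns how many copies a single call caused for the chosen calling style.
int copies_caused_by(bool by_value) {
    TrackedString s1("passing game");
    const int before = TrackedString::copies;
    const std::size_t n = by_value ? length_by_value(s1)
                                   : length_by_reference(s1);
    assert(n == 12);  // both styles must compute the same result
    return TrackedString::copies - before;
}
```

Here `copies_caused_by(true)` reports one copy per call and `copies_caused_by(false)` reports none; with a real string class, each avoided copy is also an avoided allocation and deallocation.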

While there is nothing wrong with using static local objects (though you must remember to initialize or clear them explicitly every time the routine is called), local variable consolidation is now considered bad practice because it violates code separability. For instance, if you have two routines foo() and faa() that both rely on a local CString variable, it is possible to consolidate both local variables into one by defining a global CString, but the two routines then share state and can no longer be changed independently.
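The reminder about clearing a static local on every call can be sketched as follows. The `make_greeting` function is a hypothetical example, and std::string stands in for CString; the point is only that the buffer is constructed once and must be reset by hand on each entry.

```cpp
#include <cassert>
#include <string>

// Hypothetical routine that builds its result in a function-local static
// buffer. The buffer is constructed once, on the first call, and reused.
const std::string& make_greeting(const std::string& name) {
    assert(!name.empty());      // this sketch assumes a non-empty name

    static std::string buffer;  // constructed once, survives between calls
    buffer.clear();             // explicitly discard leftovers from the last call
    buffer += "Hello, ";
    buffer += name;
    return buffer;              // valid only until the next call
}
```

Without the `clear()` call, the second invocation would append to the text left over from the first. Typical std::string implementations keep the allocated capacity across `clear()`, which is where the saving over a fresh local object would come from.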

Also keep in mind that static or global variables are not thread safe. If several threads call the same function that uses a static variable, the value of the static variable will be indeterminate unless explicit synchronization (e.g., using locks or mutexes) is employed. Though global variables are out of favor and there are some risks, there is no reason why we should not consider using them when performance really matters (or rather, there is no reason why compilers should not attempt to consolidate local object-type variables automatically).
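One way the explicit synchronization mentioned above might look is sketched below, using the standard std::mutex. The `make_label` function is a hypothetical example, not code from the original column, and the locking itself adds a cost that has to be weighed against the allocation it saves.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical routine that reuses a static buffer and guards it with a mutex.
// It returns a copy, because a reference into the shared buffer could be
// overwritten by another thread as soon as the lock is released.
std::string make_label(int id) {
    assert(id >= 0);               // this sketch assumes non-negative ids

    static std::mutex guard;       // protects `buffer` below
    static std::string buffer;     // shared by every caller

    std::lock_guard<std::mutex> lock(guard);
    buffer.clear();
    buffer += "item-";
    buffer += std::to_string(id);
    return buffer;                 // copied out while the lock is still held
}
```

An alternative under the same assumptions is to declare the buffer `thread_local` instead of `static`, which gives each thread its own instance and removes the need for a lock.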

A Winning Strategy

To improve the performance of subroutine/method calls: pass parameters in registers (the __fastcall modifier in C++); pass objects by reference; and reduce the use of expensive local variables, either by consolidating them into globals or, to preserve code separability, by making them static.
