Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Winning the Passing Game


Optimize to the Max

One frequent operation that is commonly overlooked is the subroutine or class method call. Calls to subroutines or class methods may make up 50 percent, or even as much as 80 percent, of the source code. The call itself is not very interesting, but other operations happen almost every time a subroutine is called: the operations related to parameter passing.

In the good old days, when procedural languages reigned supreme, parameter passing was simple. All it took was to push the values onto the stack and create a stack frame:

push A
push B
call proc
...
; proc
push ebp ; stack frame
mov ebp,esp ; created here
; do something
mov esp,ebp ; stack frame
pop ebp ; destruction
ret

This is still a very typical scenario for a C/C++ program because many routines operate on "simple" (i.e., non-object) parameters. When a routine is declared with the __inline modifier and the compiler expands it in place, the call instruction is replaced by the subroutine code and the ret instruction is eliminated. In an unoptimized build the parameters may still be written to the stack and a stack frame may still be created, although an optimizing compiler usually removes that work as well. Looking at the design of modern CPUs, removing the call and ret instructions by itself rarely brings a measurable improvement: in most cases they cost next to nothing thanks to successful branch prediction and instruction prefetching. Excessive inlining, on the other hand, may blow up the size of the code and ultimately reduce performance because instruction cache misses become more likely. Inlining pays off mainly when the routines are extremely compact (just a few operations) and called very frequently. Good examples are CString methods in MFC and complex-number arithmetic.
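The kind of routine that is worth inlining can be sketched as follows. This is a minimal illustration, not code from any library: the Complex struct and the add, mul, and sum_of_squares helpers are hypothetical names standing in for the complex-number arithmetic mentioned above.

```cpp
#include <cassert>

// Hypothetical minimal complex type, used only for this illustration.
struct Complex {
    double re;
    double im;
};

// A few operations, called very often: a reasonable inlining candidate.
inline Complex add(const Complex& a, const Complex& b) {
    return Complex{a.re + b.re, a.im + b.im};
}

inline Complex mul(const Complex& a, const Complex& b) {
    return Complex{a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re};
}

// The loop calls both helpers once per element, so the per-call
// overhead would be paid 2 * count times if they were not inlined.
inline Complex sum_of_squares(const Complex* values, int count) {
    assert(count >= 0);
    Complex total{0.0, 0.0};
    for (int i = 0; i < count; ++i) {
        total = add(total, mul(values[i], values[i]));
    }
    return total;
}
```

Note that inline is only a request; whether the helpers are actually expanded in place depends on the compiler and its optimization settings.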

Making the Most of Your Registers

Another modifier can help: __fastcall asks the compiler to pass subroutine parameters in registers. This eliminates memory operations such as pushing parameters onto the stack and reading them back through the stack frame. Instructions that operate solely on registers also execute faster in the internal CPU pipeline. On the x86 architecture, however, there are often not enough registers to accommodate all the values: in 32-bit Visual C++, only the first two arguments that fit in a register travel in ECX and EDX, and the rest are still pushed onto the stack.
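A declaration using the modifier might look like the sketch below. Because __fastcall is a Microsoft-specific keyword that matters only for 32-bit x86 builds, the example hides it behind a macro; FAST_CALL, clamp_index, and scale_and_offset are hypothetical names chosen for this illustration.

```cpp
#include <cassert>

// Expand to __fastcall only where Visual C++ targets 32-bit x86;
// on other compilers and targets the macro expands to nothing.
#if defined(_MSC_VER) && defined(_M_IX86)
#define FAST_CALL __fastcall
#else
#define FAST_CALL
#endif

// Two int arguments: both can travel in registers under __fastcall.
int FAST_CALL clamp_index(int index, int size) {
    assert(size > 0);
    if (index < 0) return 0;
    if (index >= size) return size - 1;
    return index;
}

// Three int arguments: under 32-bit Visual C++ the third one
// is still passed on the stack.
int FAST_CALL scale_and_offset(int value, int scale, int offset) {
    return value * scale + offset;
}
```

Callers need no changes; the calling convention is part of the function's declaration, so the compiler emits the matching call sequence wherever the declaration is visible.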

Visual C++ also offers the /Oy compiler option (frame-pointer omission) for 32-bit x86 builds. It suppresses creation of the stack frame, which saves a few instructions and frees the EBP register for general use; /Oy- turns the optimization back off. Though the advantage is small, it is still an advantage. In the scarce pool of x86 registers, an extra register can be a big asset.

Simple parameters are only a part of the problem. Most programs use objects heavily and pass them as parameters frequently. Where there are objects, one finds constructors, destructors, and quite often memory allocation. And did I mention local variables? Consider what happens in the following code sample:

void foo(CString S)
{
    CString S2;
    ...
}
...
CString S1;
foo(S1);

First the constructor for S1 is called. Then the copy constructor for S is called (which also allocates memory using the new operator). Then the constructor for S2 is called, and the subroutine does its work. On return, the destructor for S2 is called (which releases allocated memory using the delete operator), followed by the destructor for S (which again releases allocated memory using the delete operator). What if there are more parameters? What if they are complex objects with complex constructors or destructors that, among other things, allocate or free memory? And what about all those local variables? Clearly the overhead can be quite substantial. Is there a workaround? Of course: pass objects by reference, and avoid, minimize, or consolidate local variables, or make them static. Given these guidelines, the foo() routine can be rewritten as:

void foo(const CString& S)
{
    static CString S2;
    ...
}
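The difference between the two signatures can be made visible by counting constructor and destructor calls. The sketch below is an illustration under stated assumptions: TrackedString is a hypothetical stand-in for CString built on std::string, and Counters, by_value, by_reference, and the two copies_for_* helpers are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Tallies of special member function calls.
struct Counters {
    int constructed = 0;
    int copied = 0;
    int destroyed = 0;
};

// Hypothetical stand-in for CString that reports into a Counters object.
class TrackedString {
public:
    explicit TrackedString(Counters& counters, const std::string& text = "")
        : counters_(&counters), text_(text) { ++counters_->constructed; }
    TrackedString(const TrackedString& other)
        : counters_(other.counters_), text_(other.text_) { ++counters_->copied; }
    ~TrackedString() { ++counters_->destroyed; }
    std::size_t length() const { return text_.size(); }
private:
    Counters* counters_;
    std::string text_;
};

// Pass by value: the argument is copy-constructed and later destroyed.
std::size_t by_value(TrackedString s) { return s.length(); }

// Pass by const reference: no copy and no extra destructor call.
std::size_t by_reference(const TrackedString& s) { return s.length(); }

// Number of copy constructions caused by one by-value call.
int copies_for_by_value() {
    Counters counters;
    {
        TrackedString s1(counters, "hello");
        by_value(s1);
    }
    // Every object that was created has also been destroyed.
    assert(counters.constructed + counters.copied == counters.destroyed);
    return counters.copied;
}

// Number of copy constructions caused by one by-reference call.
int copies_for_by_reference() {
    Counters counters;
    {
        TrackedString s1(counters, "hello");
        by_reference(s1);
    }
    assert(counters.constructed + counters.copied == counters.destroyed);
    return counters.copied;
}
```

In this sketch the by-value call costs one copy construction and one extra destruction per call, while the by-reference call costs neither.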

There is nothing wrong with using static local objects, though you must remember to initialize or clear them explicitly every time the routine is called, because they keep their contents between calls. Local variable consolidation, however, is now considered bad practice because it violates code separability. For instance, if two routines foo() and faa() both rely on a local CString variable, the two locals can be consolidated into one by defining a single global CString, which then couples the two routines to each other.
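The need to reset a static local on every call can be sketched like this. The example uses std::string rather than CString so that it stays self-contained, and join_with_comma is a hypothetical routine invented for the illustration.

```cpp
#include <cassert>
#include <string>

// Builds "a,b" in a buffer that is constructed once and then reused.
// The returned reference stays valid only until the next call.
const std::string& join_with_comma(const std::string& a, const std::string& b) {
    static std::string scratch;  // constructed on the first call only

    // The arguments must not alias the buffer, or clear() would wipe them.
    assert(&a != &scratch && &b != &scratch);

    scratch.clear();             // discard contents left by the previous call
    scratch += a;
    scratch += ',';
    scratch += b;
    return scratch;
}
```

Without the clear() call the second invocation would append to the first result. Reusing the buffer avoids a construction and destruction per call, and std::string typically keeps its allocated capacity across clear(), so later calls may avoid reallocating as well.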

Also keep in mind that static and global variables are not thread safe. If several threads call the same function that uses a static variable, the value of that variable will be indeterminate unless explicit synchronization (e.g., mutexes or other locks) is employed. Though global variables are out of favor and carry some risks, there is no reason not to consider them when performance really matters (or rather, there is no reason why compilers should not attempt to consolidate local object-type variables automatically).
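One possible way to add the synchronization is to guard the static object with a mutex, as in the sketch below. It assumes a C++11 or later compiler; make_label is a hypothetical routine, and the lock covers the whole time the shared buffer is in use, including the copy made for the return value.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Builds "<prefix>#<id>" in a shared static buffer.
// The mutex serializes callers so that only one thread at a time
// touches the buffer.
std::string make_label(const std::string& prefix, int id) {
    static std::mutex guard;     // protects scratch
    static std::string scratch;  // shared by every caller

    assert(id >= 0);

    std::lock_guard<std::mutex> lock(guard);
    scratch.clear();
    scratch += prefix;
    scratch += '#';
    scratch += std::to_string(id);
    return scratch;  // copied into the result before the lock is released
}
```

The lock is not free: under contention it may cost more than the construction it saves, so this trade-off is worth measuring before adopting it.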

A Winning Strategy

To improve the performance of subroutine and method calls: pass parameters in registers (the __fastcall modifier in Visual C++); pass objects by reference; and reduce the use of expensive local variables, either by consolidating them into globals or, to preserve code separability, by making them static.

