Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

To Inline or Not To Inline


May04: To Inline or Not to Inline

Richard lives and works in London, where he provides development, training, and consultancy services. He can be contacted at http://www.dodeca.co.uk/


Inline functions, supported by programming languages such as C++, can deliver dramatic performance improvements while preserving proper design semantics. However, the misuse of inlining can be detrimental to an application's performance. While certain classes of functions are obvious candidates for inlining, others are not. Moreover, inlining a given set of functions can affect performance in unexpected ways. Of course, profiling tools can be useful, but they don't always tell the whole story. Compounding this, profiling is a postcompilation step, which runs contrary to the idea of pushing as much of the development process into design time as is possible, instead of compile time and runtime. To address problems such as these, I present in this article a way of visualizing the interplay between the different factors involved in making inlining decisions.

The Problem

In terms of source code, an inline function is virtually the same as any other function. When compiled, however, the body of the function is inserted at every call site, as opposed to a jump to a single instance of the body that is held elsewhere in the executable. Given that this obviates the overhead associated with the normal call mechanism, execution of the function should be faster.

But as straightforward as it seems, this is not a mandate for inlining everything. Compilation dependencies aside, there are three factors that affect your decision to inline or "outline" a given function:

  • Compiled function size (assuming uniform size and machine cycles for each instruction).

  • Static (or spatial) frequency (the number of places the function is called from).

  • Dynamic (or temporal) frequency (the number of times the function is called in a typical run).

Within these criteria, small functions are clear candidates for inlining because the proportion of overhead to real execution costs is very large. Inlining these will always result in faster and smaller executables. Moreover, a smaller executable contributes to faster execution.

However—and this is the pivotal issue—the effect is reversed with large functions because the proportion of overhead to body execution costs tends to be very small. Given that inlining, in principle, replicates the function body at each call site, many call sites lead to larger executables, which significantly impairs performance regardless of the faster execution that results from the calling overhead.

Generally, therefore, the benefits of inlining larger functions will only be retained if they are called from only one location (that is, Singletons). However, this principle can be offset by how often a function is called in a typical run of the program—the more calls there are, the more the overhead-free benefit of inlining may be felt. Thus, runtime popularity may justify inlining a relatively large statically frequent function, and here lies the challenge: Can we construct a numerical model that better resolves the boundary between favorable and unfavorable inlining candidates?

Calculating Inlining Favorability

Even though a given function's performance metrics are highly platform dependent, you can quantify the three variables—compiled function size, static frequency, and dynamic frequency—as proportions. Moreover, you can try to determine whether a function's call overhead is paying its way or proving to be deadweight. Therefore, it follows that subtracting the cost of inlining from that of outlining yields the overall savings that results if the function is inlined—its inlining "favorability." A negative result means that the cost of inlining exceeds that of outlining, meaning that the function should remain out of line. Expressed symbolically:

Favorability=Outlining Cost-Inlining Cost

An out-of-line call incurs an overhead in space because storage is consumed by the calling overhead instructions. However, an outlined call also incurs a cost in time because the CPU is delayed as it executes the calling overhead. Given this, you can define the total cost of outlining as a combination of the temporal and spatial costs. However, since we are dealing with proportions, the arithmetically correct method is to first take the sum and then subtract the product; for example:

Outlining Cost =

Spatial Overhead+

Temporal Overhead-Spatial Cost×Temporal Cost

From this, the favorability of inlining a given function can be expressed formally as follows, where F, s, fs, and ft denote favorability, size, spatial frequency, and temporal frequency, respectively:

F=(1-s)fs+(1-s)ft-fsft(1-s)2-sfs

Visualizing the Space

Putting real numbers through this equation, you can map the results to a color scheme, where:

  • Red means that the function should not be inlined.
  • Yellow means that the merits/demerits of inlining a given function cancel each other out.

  • Green indicates that inlining is beneficial.

This lets you depict a three-dimensional "inlining-decision space," as in Figure 1.

What is immediately apparent (and as intuition suggests) about Figure 1 is that the space is skewed towards choosing not to inline. Mostly, it is colored red, orange, and yellow because it is a situation of two against one—plurality and size, against temporal popularity, which translates to the deleterious effects of inlining (executable bloat) against our desire to hurry the processor along.

In the best-case scenario (the lower front-right of the space), it is those CPU-popular little minnows that are called everywhere that make a real difference when inlined. Conversely, the functions that should remain forever out of line are very large, temporally unpopular, and spatially frequent (the top front-left of the space). Not that such behemoths should ever arise in any self-respecting developer's code; they indicate a poor understanding of design, coding, and compilation issues, one in which the issue of inlining is superfluous.

The Boundary

Given this distribution, the unfavorable part can be excised to leave just the area where inlining is beneficial. By adopting a linear scale for function size, then concentrating only on the 0.500 to 0.905 range, you can zoom in on the boundary and answer the original question. As Figure 2 demonstrates, for example, temporally popular Singletons have an upper limit (in this model) of 0.86 in terms of real costs to overhead costs; past that, it is not worth inlining them.

An Optimistic Nature

So far, I have not addressed the nature of the function in question. Semantically, one block of source code differs radically from another, and this can have a great bearing on the decision to inline. First of all, functions can range from being entirely compute bound to entirely I/O bound, the latter being poor candidates for inlining. Second, different functions can be considerably more or less amenable to compiler optimizations. This is because, while inlining is an optimization in itself (when used correctly), the fact that it renders the function body visible from within the caller makes additional, powerful optimizations available. This can have a great bearing on your decision to inline because, viewed in this light, a previously unfavorable candidate can suddenly become favorable.

Given this, you can extend the aforementioned equation to yield the following, where c and o denote "compute binding" and "optimizability," respectively:

F=(1-s)fs+(1-s)ft-fsft(1-s)2-sfs(1-(c+o-co))

However, you run into difficulties here: First, it is not possible to (usefully) render a five-dimensional space in two dimensions, in a manner akin to my treatment of the original equation (although you could use a workaround). Second, determining a function's true potential for optimization is challenging and depends on the most intimate knowledge of your compiler, as well as a deep understanding of the language. Moreover, if an inline function contains other inline calls (and so on, recursively), then calculating its favorability metric is redundant because the code that is generated could be very distant from the code that you originally wrote.

Show Me the Numbers

Bear in mind that the C++ inline keyword is only a hint to the compiler; therefore, this discussion is in part academic, aside from the problem of incorporating compute binding and optimizability. In addition to this, seasoned developers may opine that they understand the various trade-offs intuitively.

Here lies an important lesson, however. This project began life as an ad hoc handout for a group of developers who attended a high-performance C++ training course. As a somewhat rough-and-ready illustration of inlining criteria, the decision space presented was, in part, an educated guess; Figure 3 shows this. Comparing Figure 3 with Figure 1, which was generated numerically, is revealing. Specifically, I overestimated the degree to which a high temporal frequency compensates for a high spatial frequency with larger functions. Clearly, the numbers do not support my sunny optimism and the upshot is that, while intuition gets us most of the way, to be absolutely sure, you must go to hard data.

For further explanation/exploration of issues pertaining to inlining and performance-oriented C++, Efficient C++: Performance Programming Techniques by Dov Bulka and David Mayhew (Addison-Wesley, 2000; ISBN 0201379503) is an excellent starting point. However, note that while the authors discuss static call frequency as a criterion, the inlining-decision matrix they present only plots function size against temporal frequency and is not founded on a numerical model.

DDJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.