The New C: Statements and Loops

Several years ago, Jill, a friend of mine, was interviewing an applicant for a job with her company. Jill's employer had several openings for programmers working in different specialties: compilers, operating systems, user interfaces. Jill, wishing to match the applicant with the proper job opening, asked, "What type of programming do you like to do?"

The job applicant paused for a second to consider the question carefully. He then answered, "Loops. I like to write loops."

I have never quite made up my mind if the job candidate was being overly specific or overly general. On one hand, I have never heard, "Whoa, we need a loop here. Better call in a specialist." On the other hand, loops are useful in almost all forms of programming, even compilers, where recursive algorithms give loops a run for their money.

This column is dedicated to everyone who has ever written a loop and realized that a lot of work was going to happen in it. Which brings us to this month's subject, the changes to statements made in the 1999 revision of the C Standard. Although C99 did not create any new types of statements, it did introduce some new rules regarding the existing statements that increase their flexibility and perhaps allow you to avoid bugs. Ultimately, the statements most affected are loops.

Mixing Declarations and Statements

As I have written before [1], C99 no longer requires all of the declarations in a block to appear before all of the statements in the block. There are two properties important to understanding declarations that appear after statements.

The first property is that statements cannot use identifiers before they are declared for the simple reason that the names are not yet accessible. This rule should not be surprising since it has always been true that declarations cannot use identifiers declared in later declarations. Listing 1 shows invalid references to identifiers declared later in the block.
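Listing 1 is not reproduced in this text. As a rough stand-in, the sketch below shows the kind of reference the rule forbids. The function name scaled_total and its arithmetic are invented for illustration, and the invalid lines are left as comments so that the rest compiles as C99.

```c
/* Hypothetical example; the names are not from the original listing. */
int scaled_total(int n)
{
    if (n < 0)                 /* a statement before any declaration */
        n = -n;

    /* total = n;              invalid here: total is not yet declared */
    /* int half = total / 2;   invalid too: a declaration cannot look ahead */

    int total = n * 2;         /* total is visible from this point onward */
    int half = total / 2;      /* fine: total was declared on the line above */
    return total + half;
}
```

Uncommenting either of the two invalid lines should make a conforming compiler reject the function, because total is not in scope at that point.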

Some readers might think that struct and union tags are an exception to this rule. However, merely mentioning an identifier after the struct or union keyword declares that identifier as a struct or union tag, even if you do not provide a brace-enclosed list of declarations for the members of the new struct or union type. The rules for tags are somewhat involved (see 6.7.2.3 of [2]); they have been the same since the early days of C and carry forward into C++ (although C++ rules out a few contexts; for example, a cast may not declare a new type as a side effect). Tags do not violate the no-reference-before-declaration rule, and I will not discuss them further.

The second property necessary to understanding declarations following statements is that if an object in a block with automatic storage duration (declared without the static or extern keywords) is initialized, the initialization happens at run time when the declaration is reached. In other words, the initialization functions like an assignment statement, and every time the declaration is executed, the object will receive the specified value.

Thus, there is no difference between this pair of statements in a block:

int x;
x = f();
and this statement:

int x = f();
In both cases, x will be set to the return value of calling f every time the statements are executed.
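One way to see that the initializer runs every time the declaration is reached is to put the declaration inside a loop. The following is a small sketch under assumed names: next_value and sum_of_initial_values are hypothetical helpers, not anything from the column.

```c
/* Hypothetical helper: returns 1, 2, 3, ... on successive calls. */
static int next_value(void)
{
    static int counter = 0;    /* static storage: initialized once, not on every call */
    return ++counter;
}

/* Adds up the values that x receives over the given number of passes. */
int sum_of_initial_values(int passes)
{
    int pass;
    int total = 0;
    for (pass = 0; pass < passes; ++pass) {
        int x = next_value();  /* automatic storage: runs each time this line is reached */
        total += x;
    }
    return total;
}
```

If the initializer ran only once, every pass would add the same value. Because it behaves like an assignment, each pass sees a fresh result from next_value.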

Why Mix?

This leads to the reason why you should take advantage of being able to intermix declarations and executable statements. If you do not declare a variable until you can give it its first value, then you can never reference it while it is uninitialized. A simple change in programming style can eliminate an entire class of hard-to-find run-time bugs.

Consider Listing 2. There could be hundreds of statements between the declaration of sum and the first assignment to sum. The compiler will be no help at all if a programmer "cleaning up" the function manages to move the printf that references sum to before sum gets a value. In a large function, and sometimes even in a small function, it is very easy to lose track of the region of program text that sets the value of a variable.
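Listing 2 is not reproduced in this text. The sketch below is only a guess at its general shape, with an invented function name (report_sum) and an assumed five-element array parameter. It is meant to show the gap between the declaration of sum and its first assignment.

```c
#include <stdio.h>

/* Hypothetical reconstruction; not the original listing. */
int report_sum(const int a[5])
{
    int i;
    int sum;                   /* declared here, but not yet given a value */

    /* ... imagine hundreds of unrelated statements here; a printf of
       sum moved into this region would read an indeterminate value ... */

    sum = 0;                   /* first assignment, far from the declaration */
    for (i = 0; i < 5; ++i)
        sum += a[i];

    printf("sum = %d\n", sum); /* safe only because it follows the loop */
    return sum;
}
```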

In contrast, what if the loop computing sum had been written as:

  int sum = 0;
  for (i = 0; i < 5; ++i)
    sum += a[i];
There would then be no vast region of program text in which you could reference sum before it had a correct value. The compiler would reject any reference to sum that appears before its declaration.

Differences between C and C++

C99, C++, and Java all permit intermixing declarations and statements within a block, so it is a new programming style with which you should become familiar. However, the difference between how C and C++ accomplish this can lead to two differences in what is valid code.

The grammar for C++ just makes a declaration another type of statement. Thus, wherever you can have a statement, you can have a declaration. However, the grammar for C99 says that a compound statement (brace-enclosed block) is a sequence of block items, and block items are either statements or declarations.

At first glance, this appears to accomplish the same thing, since you can now put statements and declarations in a block in any order. But, while C and C++ both agree you can put a goto label on a statement, C++ considers declarations to be statements, and C99 does not. So the following is valid C++:

// C++ only
loop: int x = 0;
but not valid C99. You can write the equivalent in C99:

//valid C99 and C++
loop: ;
int x = 0;
since empty statements (a single semicolon) are valid both in C and C++. (However, I hope you really do not care about the ins and outs of goto labels.)
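For completeness, here is a small sketch of the empty-statement workaround inside a whole function. The function count_to and its label name are made up for illustration. The backward goto re-executes the declaration, so step is initialized again on every pass.

```c
/* Hypothetical example: counts upward by one until n reaches limit. */
int count_to(int limit)
{
    int n = 0;
again: ;                       /* the label is attached to an empty statement */
    int step = 1;              /* the declaration follows the labeled statement */
    n += step;
    if (n < limit)
        goto again;            /* jump back; step is initialized again */
    return n;
}
```

Because the test comes after the increment, the body runs at least once, so count_to(0) returns 1 rather than 0.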

The second difference arises in contexts where the grammar permits only a single statement, such as the body of a loop. C++ permits a declaration there, but C99 does not. For example:

// C++, not C99
for (i = 0; i < 5; ++i)
    int x;
That code probably looks pretty alien to old C programmers. So alien that there might be a moment of panic wondering what the code does. Do you end up with five variables named x?

Implicit Blocks

The meaning of the above loop becomes clear once you know that C++ (and now C99) defines the statement that is the body of a loop as being a block even if it is not enclosed in braces. That block is entered and exited upon every pass of a loop. Thus, the above loop is exactly like:

for (i = 0; i < 5; ++i) {
    int x;
}
In other words, create and destroy a variable named x five times. Most compilers will eliminate all code for such a loop. Many will even complain that a variable was declared but never referenced.

You might ask: if C99 does not permit a declaration in such a context, why does it borrow the C++ rule that the body of a for, while, or do-while loop is a block? The primary motivation is that it gives well-defined semantics to compound literals [3]. Compound literals are a new form of structured constant that allows you to create an unnamed object by "casting" a brace-enclosed initializer to the right type. In Listing 3, the function diagonal draws a diagonal line of the indicated length by calling drawpixel. The function drawpixel takes an argument that is a pointer to a point. The call to drawpixel in diagonal creates an unnamed object of type struct POINT using the compound-literal syntax and passes the address of that unnamed object to drawpixel. The lifetime of that unnamed object is limited to the implicit block that is the body of the loop.
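Listing 3 is not reproduced in this text. The sketch below follows the description above, keeping the names diagonal, drawpixel, and struct POINT. The member names x and y, the printing body of drawpixel, and the two bookkeeping variables are assumptions added so that the example is complete.

```c
#include <stdio.h>

struct POINT {
    int x;                     /* member names are assumed */
    int y;
};

/* Bookkeeping added only so the sketch can be checked. */
static int pixels_drawn = 0;
static struct POINT last_pixel = { -1, -1 };

/* Hypothetical stand-in for a real drawing routine. */
void drawpixel(const struct POINT *p)
{
    printf("pixel at (%d, %d)\n", p->x, p->y);
    last_pixel = *p;           /* copy the value; do not keep the pointer */
    ++pixels_drawn;
}

void diagonal(int length)
{
    int i;
    for (i = 0; i < length; ++i)
        drawpixel(&(struct POINT){ i, i });  /* unnamed object lives for one pass of the loop body */
}
```

Note that drawpixel copies the point rather than saving the pointer. Once the pass through the loop body ends, the unnamed object no longer exists, so a saved pointer would dangle.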

C99 and C++ not only make the bodies of for, while, and do-while loops implicit blocks, they also make the then and else clauses of if statements and the body of a switch statement implicit blocks.

Not only are the various bodies of loops and switch statements and the then and else clauses of if statements implicit blocks, but the entire statement itself is another implicit block containing those blocks. Thus:

for (/*...*/; /*...*/; /*...*/)
  /*stmt */
means exactly the same as:

{
  for (/*...*/; /*...*/; /*...*/) {
    /*stmt */
  }
}
likewise for the if, switch, while, and do-while statements. In case you are worried, entering and exiting a block, even one that reserves storage, takes little or no time. Except when variable length arrays [4, 5, 6, 7] are used, most compilers generate code to allocate stack space only once upon entering a function. (The amount of space allocated is the minimum amount necessary to handle the maximum requirements of any of the blocks in the function.)

Again, compound literals provide part of the motivation for making these entire statements implicit, local blocks. However, there is an additional, more obvious reason that applies only to the for statement. C99 adopted the feature from C++ and Java where the first item in the parenthesized list following the for keyword (the "initializer" clause) can be either a declaration or an expression. Let's rewrite that loop from Listing 2:

int sum = 0;
for (int i = 0; i < 5; ++i)
    sum += a[i];
Now not only has the declaration of sum been moved to the first point that it is needed, but the declaration of i has been moved to the first point it is needed. The scope of i is just the loop itself. It cannot be referenced before the loop or after. Since they are separate scopes, all of the loops in an enclosing block can have their own index variable named i. (C++ programmers beware: some older C++ compilers do not consider the loop itself to be a block, and any index variable you declare will persist to the end of the explicit block enclosing the for loop.)
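As a small illustration of separate loop scopes, the sketch below uses two loops in the same block, each with its own i. The function sum_minus_min is a made-up example and assumes the array has at least one element.

```c
/* Hypothetical example: returns the sum of a[0..n-1] minus its smallest
   element. Assumes n >= 1. */
int sum_minus_min(const int a[], int n)
{
    int sum = 0;
    for (int i = 0; i < n; ++i)    /* this i exists only inside this loop */
        sum += a[i];

    int min = a[0];
    for (int i = 1; i < n; ++i)    /* a separate i, in a separate scope */
        if (a[i] < min)
            min = a[i];

    /* i is not in scope here; referring to it would not compile */
    return sum - min;
}
```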

Note that C99 did not pick up the C++ feature that allows a declaration as the controlling condition of a while, if, or switch statement. (The condition of a do-while statement must be an expression in both languages.) The most common use of declarations in those contexts is an idiom involving the C++-only feature of run-time type identification. C programmers would likely never find declarations useful in those contexts.

Conclusion

We have come full circle. C99 adopted a feature from C++ and Java that permits intermixing declarations and code in order to allow safer programming. While there are some minor differences between C99 and C++ regarding this, they are in obscure and not very useful dark corners. Additional implicit blocks were introduced into the language to support the new contexts for declarations and expressions that have the side effect of creating unnamed objects. This ultimately opened a new place to allow declarations, the initializer clause of a for statement. The ultimate goal is to support a style of programming that eliminates uninitialized variable bugs.

References

[1] Randy Meyers. "The New C: Declarations and Initializations," C/C++ Users Journal, April 2001.

[2] ANSI/ISO/IEC 9899:1999, Programming Languages -- C. 1999. Available in Adobe PDF format for $18 from <www.techstreet.com/ncitsgate.html>.

[3] Randy Meyers. "The New C: Compound Literals," C/C++ Users Journal, June 2001.

[4] Randy Meyers. "The New C: Why Variable Length Arrays," C/C++ Users Journal, October 2001.

[5] Randy Meyers. "The New C: Variable Length Arrays, Part 2," C/C++ Users Journal, December 2001.

[6] Randy Meyers. "The New C: Variable Length Arrays, Part 3: Pointers and Parameters," C/C++ Users Journal, January 2002.

[7] Randy Meyers. "The New C: Variable Length Arrays, Part 4: VLA typedefs and Flexible Array Members," C/C++ Users Journal, March 2002.

About the Author

Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at [email protected].

