Site Archive (Complete)
C++ Blog: Why compiler writers sometimes like undefined behavior
C++
void main(void)

Calls, Returns and In-Between.

by Kevin Carlson
SELECTIVE IGNORANCE

Finding the Signal in the Noise

by Andrew Koenig
October 06, 2006

Why compiler writers sometimes like undefined behavior

Undefined behavior in a programming language makes life harder for programmers than most people realize, because a program that evokes undefined behavior has an error that no amount of testing is assured of revealing.

If you do something that's formally undefined, but happens to do what you want, how can you ever find the problem short of happening to notice it?

Because undefined behavior is so problematic, one is tempted to wonder why any programming langauge ever fails to define behavior. Here is one example from real life.

The C and C++ language definitions both say that the result of a signed integer overflow is undefined. Just about every implementation treats integer overflow as modular arithmetic. In other words, if the result doesn't fit in a word, the implementation quietly throws away the extra high-order bits. Why not standardize that behavior, thus allowing programmers to count on it?

Here is one example of why not. Once upon a time, I knew someone who was working on a C compiler for a processor embedded in a large telephone switch. Like many such processors, this one had a "condition code" -- two bits that were automatically set based on the result of each computation. Two bits can represent four states; in this case, the states were positive, negative, zero, and overflow.

Consider the following C statement:

if ((x -= y) == 0) { /* ... */ }

Assuming that x and y are integer variables, what code should the compiler generate for this statement?

The obvious strategy is to subtract y from x and then generate an instruction to test the condition code. If the condition code indicates that the result was zero, it executes the code in the braces; otherwise it skips that code.

This strategy behaves anomalously if the values of x and y are such that the result is zero with an overflow. In that case, the value of x is zero but the condition code indicates overflow, not zero.

The question, then, is whether the compiler should be permitted to generate code that behaves this way. If you take the view that the effect of an overflow is undefined, then the answer is yes: Once your program has overflowed, the compiler can do as it pleases; and in the absence of overflow, the code is correct. If, on the other hand, you think that irrespective of what happened earlier, your program is asking about the value of x, then your compiler had better deal correctly with the case where x is zero, even if it got that way after an overflow.

The compiler writer pointed out that the language specification was clear, and that changing the compiler for the convenience of the user who had encountered the problem would make execution slower for all users. The reason was that in order to make the generated code work "correctly" after an overflow, it would have been necessary to generate an extra instruction to test the value of x after storing it. The processor would have to execute that instruction even when it wasn't needed.

When you see a language specification that says "in this case, the behavior is undefined," it is a reasonable bet that the language author(s) wanted to allow extra latitude for implementors in problematic cases such as this one. In effect, they're weakening the abstractions that the language provides as a trade for easier or more efficient implementation.

Posted by Andrew Koenig at 12:22 PM  Permalink




 
INFO-LINK


Related Sites: DotNetJunkies, SD Expo, SqlJunkies