
The New C: The 20th Anniversary of the C Standards Committee

We interrupt this series of articles about the "new C" to commemorate the C committee's 20th anniversary. The C committee started with a proposal from Jim Brodie (then, and now, at Motorola). He was writing a C compiler for a small Motorola chip and noticed that there was no standard for C. His boss suggested that he propose creating a committee. The new committee, known then as X3J11, held its organizational meeting in June 1983, at the Washington, D.C., offices of Accredited Standards Committee X3 (now known as INCITS, see <http://www.incits.org/>). The first meeting of X3J11 to address technical matters was held in San Francisco, September 1983. Those who attended have described the unusual heat wave that hit San Francisco that September. Since summers are usually cool there, the hotel had no air-conditioning. The attendees had to leave the meeting-room windows open. Then they had to shout above the noise of jackhammers working on the massive renovation of cablecar tracks. Nonetheless, a productive and harmonious meeting launched the C standards process. Three subcommittees were created: Environment, chaired by Ralph Russel Ryan (then, and now, at Microsoft); Language, chaired by Larry Rossler (then at AT&T Bell Laboratories, now at Hewlett-Packard); and Library, chaired by P. J. Plauger (then at Whitesmiths, Ltd., now at Dinkumware, Ltd.).

There were four five-day meetings of X3J11 each year. In today's world of email, the Internet, and laptops, the 1980s committee process appears quaint. Massive mailings of paper, eight times a year, kept members in touch. Committee officers brought cardboard boxes full of committee records to each meeting. Nonetheless, the schedule was considered adequate. The major decisions were made within three years, by 1986. Further meetings ratified and tidied the work in 1987 and 1988. An unexpected tempest-in-a-teapot challenge delayed the X3J11 standard until 1989.

The 1989 standard for "ANSI C" won a Productivity Award at the subsequent Software Development conference. Byte Magazine reported a measurable convergence among C compilers: their 1989 review of C compilers found that all compilers could execute all of Byte's test programs, without any need for all the little source-code tweaks that had characterized their previous compiler reviews.

By 1989 an international standards process for C had also been initiated, known as ISO/IEC JTC1/SC22/WG14 (see <http://std.dkuug.dk/JTC1/SC22/WG14/>). Subsequently, the name "X3J11" was shortened to "J11" (see <http://www.x3.org/incits/tc_home/j11.htm>). Since the early 1990s, WG14 and J11 have been meeting "co-located" (i.e., at the same time and place). The twentieth-anniversary meeting of J11 will be held October 14-17, 2003, in Kona, Hawaii, hosted by Plum Hall.

And twenty years after the first meetings, C is still one of the most widely-used programming languages.

To mark the occasion, we have opened up a small "time capsule": the C benchmarks from a May 1988 CUJ article: "Simple Benchmarks for C Compilers" [1].

The textbook "Efficient C" [2] described how useful it can be if the programmer has a ballpark idea of how much CPU time it takes to execute the "average operator" in a language like C. Then, a programmer can make reasonable estimates of the CPU time that a particular algorithm will take. The book showed simple benchmarks to determine these numbers. They are protected against overly-aggressive compiler optimizations and reflect empirically-observed operator frequencies in C programs. (Note: These benchmarks do produce integer overflows during computation. If your machine traps overflow, you will need to modify them to use unsigned integer types.)

The inner loop has exactly 1000 operations and is executed major times, where major is the benchmark's repetition count. Let T be the total execution time in microseconds. The average time per operator is then T microseconds divided by 1000 × major operations, which works out to T/major nanoseconds. (The 1988 version of these benchmarks measured loop time in milliseconds, to produce average times in microseconds.)

Moore's Law says that, for equal cost, capacity doubles every 18 months, or approximately ten times capacity every five years. So after fifteen years, Moore's law predicts that if the "average C operator" took a few microseconds in 1988, then it would take a few nanoseconds in 2003. It's remarkable how accurately that prediction is realized in these results.

For our purposes, it is actually helpful that these benchmarks have not become very widely-used since they were published in CUJ fifteen years ago. Managers routinely channel resources into making compilers perform well on popular benchmarks. If compilers started "recognizing" these benchmarks, one would have to introduce variations that would be less recognizable or change the command-line input used. (The current benchmarks must always be run with a one-character first argument, the digit "1".)

Some early benchmarks (such as the "sieve" program) are prone to total elimination; a very aggressive compiler can determine the output at compile time. A more systematic approach was shown in "Dhrystone" [3].

These "average-operator" benchmarks are very small (as was Dhrystone); the resulting code is likely to reside in one page of cache memory. Thus the "average operator" as measured here will execute more quickly than the average operator in a large application. This is one reason why serious commercial benchmarking suites (such as "Spec" at <http://www.specbench.org/>) use realistic large programs.

Based upon Dhrystone's survey of the frequency of operators in real C programs, the inner loop of our benchmark emphasizes assignment, then addition, then the various other operators. Within the inner loop, 40% of the operators are assignments. Of the other operators, the most frequent are plus and minus. The sequence of operations is chosen to ensure that an aggressive optimizer cannot find any useless code sections; each result depends functionally upon previous results.
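Such a dependency chain might look like the following sketch (an assumed shape, not the published benchmark): every intermediate result is consumed by a later expression, so a dead-code eliminator cannot delete any statement.

```c
/* Hypothetical operator mix: assignments dominate, plus and minus are
   the most frequent non-assignment operators, and each result feeds a
   later expression so no statement is useless. */
unsigned mix(unsigned i1, unsigned i2, unsigned i3)
{
    unsigned r1, r2, r3;
    r1 = i1 + i2;         /* assignment and addition   */
    r2 = r1 - i3;         /* depends on r1             */
    r3 = r2 * i1;         /* depends on r2             */
    r1 = r3 % 7u;         /* remainder, reuses r3      */
    r2 = r1 << 1;         /* shift, depends on r1      */
    return r1 + r2 + r3;  /* every result stays live   */
}
```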

To benchmark the average function call-and-return time, we cause 1000 call-and-returns to be executed. If your compiler detects and optimizes useless code, put the lowest-level function (f3), and the variable named dummy, in a separate source file.
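A sketch of the call chain follows; the names f3 and dummy come from the article, while f1, f2, and the function bodies are assumptions. Everything is shown in one file so the sketch is self-contained, but against an aggressive optimizer f3 and dummy belong in a separate translation unit:

```c
/* In a real build, put these two in a separate source file so the
   compiler cannot inline the chain away: */
int dummy;
void f3(void) { dummy++; }

/* Benchmark side: */
void f2(void) { f3(); }
void f1(void) { f2(); }

void thousand_calls(void)
{
    int i;
    /* Each f1() costs three call-and-returns (f1, f2, f3), so 334
       iterations give roughly 1000 call-and-returns. */
    for (i = 0; i < 334; ++i)
        f1();
}
```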

Floating-point operands are not allowed for the shift, remainder, and bitwise operators, and the subscript operator does not really exercise the floating-point instructions. Therefore, to benchmark the double operators, we had to replace the inner loop body with a slightly different version, which still gives us a representative mix of typical operations.
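The floating-point variant might look like this sketch (again an assumed shape, not the published source): multiply and divide stand in for the integer-only operators, and the dependency chain is preserved.

```c
/* Hypothetical double-operator mix: shift, remainder, and bitwise
   operators are replaced by operations valid for double, while each
   result still depends on earlier ones. */
double dmix(double d1, double d2, double d3)
{
    double r1 = d1 + d2;       /* addition                      */
    double r2 = r1 * d3;       /* multiply instead of shift     */
    double r3 = r2 - d1;       /* subtraction                   */
    double r4 = r3 / d2;       /* divide instead of remainder   */
    return r1 + r2 + r3 + r4;  /* chain keeps every result live */
}
```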

Plum Hall customers have been reporting results of running this benchmark for a number of years, and some of these results are in Table 2. Since in many cases these results were obtained by people who own particular machines rather than by the manufacturers of those machines, the results might not show the maximum performance that could be obtained by careful selection of compilation options.

In keeping with the retrospective theme of this article, we decided to run the benchmark in-house to get a rough idea of how computers and compilers from the early days of Standard C compare to computers and compilers from today. Since even Randy does not keep old computers forever, we used a reasonable stand-in for a mid-'80s computer: a palmtop computer manufactured in the early '90s that is architecturally equivalent to an IBM XT, although faster. It has a Chips and Technologies 8680 (compatible with the Intel 8086), running at either 7 MHz or 14 MHz. Depending upon the clock speed, it is either 4.9 or 10.5 times the speed of an XT. It has one megabyte of RAM and runs Microsoft MS/DOS V5.0.

The typical modern personal computer is represented by an AMD Athlon XP 2200+ (actual clock speed 1800 MHz) with 512 megabytes of memory. It runs Microsoft Windows XP and Red Hat Linux (both kept current with patches).

For the old compiler, we used Borland's Turbo C V2.01, a C compiler from 1989 that supported many ANSI C features, and targeted MS/DOS. (Turbo C may be downloaded for free from community.borland.com/museum.) When we compiled the benchmark, we optimized for speed, enabled register allocation, and aligned data. Since the palmtop did not have a hardware floating point (typical of many PCs in the early to mid 80s), we had to enable software floating point emulation. We also compiled the benchmark with hardware floating point support for testing with the modern PC.

For the new compilers, we used the Microsoft Visual Studio 2002 C compiler and the Red Hat Linux gcc compiler version 2.96-RH. We compiled the benchmark under Visual C both as a "release" (optimized) build and a "debug" (non-optimized) build. On Red Hat gcc, we compiled with different levels of optimization enabled.

The results of this benchmarking appear in Table 1. There are several interesting points.

The modern PC is up to 850 times faster for register int operations, over 2,600 times faster for auto long operations, close to 600 times faster when calling a function, and over 230,000 times faster for floating point.

It is especially interesting to contrast the results of the modern compilers against the old compiler when running on new hardware. When you compare the Turbo C hardware floating-point results to the optimized Visual C results, there is not much difference for the register int and auto short categories; gcc, however, performs 26% faster for auto short. Both Visual C and gcc show 65% faster performance for auto long. This probably results from the modern compilers using the 32-bit instruction set rather than the 16-bit instruction set used by the old Turbo C. Likewise, the 40% to 85% increase in performance for auto double in the modern compilers probably reflects the use of new instructions as much as more effort spent in fundamental optimizations of floating point. Given the surprisingly good showing of the old Turbo C optimizer compared to the modern optimizers, this probably confirms that the benchmark met its goal of being safe from very sophisticated optimizations.

Although the benchmark is designed to prevent aggressive optimizations, it still shows the importance of basic optimizations on a program's performance. If you compare the Visual C debug versus release results, and the gcc unoptimized versus -O1 results, you see about a factor of two speed increase. Often, a little register allocation goes a long way.

The gcc optimized results also serve as a cautionary tale. Just because a compiler supports different levels of optimization does not mean that your program gets faster the more optimization you enable. The gcc auto double results are actually a factor of two slower at the higher optimization levels (-O2, -O3). Other compilers sometimes show the same paradoxical result that higher "optimization" levels can slow down a program. This might be caused by plain bugs in the optimizer, or by the fact that some optimizations only work well in certain situations and the compiler uses an imperfect heuristic when deciding whether to perform the optimization. The lesson here is that if the performance of your program is critical, you must study its performance at different optimization levels and with different compiler options. When computer manufacturers benchmark hardware for sales literature, it is common for them to have very talented engineers study the effects of almost every compiler option on the benchmark programs.

The 35,000-times performance increase for auto double between Turbo C on the palmtop and Turbo C on the modern PC is due not only to the modern machine being much faster, but also to the modern machine having hardware floating-point instructions. Comparing the Turbo C benchmark compiled with and without hardware floating point does not effectively show the effect of hardware floating point, since the software floating-point emulation routines will use the floating-point hardware if they find it. Note, however, that the modern compilers are 3.5 times (gcc) to 6.8 times (Visual C) faster at floating point using today's floating-point hardware instructions than the old Turbo C.

The full benchmark sources are available on the CUJ website (<www.cuj.com/code/>); also available there are copies of some of the executables as compiled by various compilers.

The next article in the series will continue our discussion of C99 and the subsequent Technical Reports.

References

[1] Thomas Plum. "Simple Benchmarks for C Compilers," C Users Journal, May 1988.

[2] Thomas Plum and Jim Brodie. Efficient C. (Plum Hall, 1985).

[3] R. Weicker. "Dhrystone: A Synthetic Systems Programming Benchmark," Communications of the ACM, volume 27, pages 1013-1030, 1984.

About the Authors

Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at [email protected].

Dr. Thomas Plum has authored four books on C, and co-authored Efficient C (with Jim Brodie) and C++ Programming Guidelines (with Daniel Saks). He has been an officer of the United States and International C and C++ standards committees. His company Plum Hall Inc. provides test suites for C, C++, Java, and C#. His address is [email protected].

