Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Thoughts on Language Design


January, 2006: Thoughts on Language Design

Guy is a Sun Fellow, Distinguished Engineer, and Principal Investigator at Sun Microsystems Laboratories. Along with James Gosling and Bill Joy, he wrote the original specification for Java. Guy is also the recipient of the 2005 Dr. Dobb's Journal Excellence in Programming Award. He can be contacted at [email protected].


It was back in Fall of 1971 that I began to write my first Lisp interpreter, for a 16-bit minicomputer, the IBM 1130. I based my design on documents from MIT about its Lisp system for the PDP-10, but I fiddled a bit with the language, partly because the 1132 printer on my IBM 1130 didn't have the same character set as on the Teletypes at MIT. In particular, the printer didn't have double quotes, and single quotes are used for something else in Lisp, so in the end I had to use commas as my string delimiters. (Fortunately, Lisp syntax didn't use commas for anything else at the time.) This was my first experience with serious language design and implementation. It was also my first encounter with the need for technical compromises. The commas did work—but they looked really silly.

A similar sort of compromise was made back in the 1950s. Why do we use an asterisk for a multiplication sign? This is not the normal practice of mathematicians; it's not what we teach in elementary schools. I observe that when numerical programming languages were first designed, existing punched-card equipment had been designed for business purposes—International Business Machines, right? Keypunch machines had plus signs, minus signs, and slashes, but no multiplication signs. The asterisk was pressed into service and it survives to this day, not only in Fortran, but in C and Python and Perl and nearly every other programming language now in widespread use.

Good programming-language design requires judgment and compromise. It requires the passion to innovate, but also a healthy respect for tradition, and the good sense to make trade-offs when resources are limited—and resources are always limited. The character set is one such resource.

Language design is as much the art of what to leave out as what to put in. In 1968, Edsger Dijkstra wrote a letter to the editor of the Communications of the ACM. It was headlined "Go To Statement Considered Harmful." It's now considered a classic. I'd like to read you just the first two sentences:

For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of go to statements in the programs they produce. More recently I discovered why the use of the go to statement has such disastrous effects, and I became convinced that the go to statement should be abolished from all "higher level" programming languages (i.e., everything except, perhaps, plain machine code).
Communications of the ACM, Vol. 11, No. 3, March 1968, pp. 147-148; see also http://www.acm.org/classics/oct95/.

Dijkstra went on to defend this position and to acknowledge that he was not the first to make such observations. But it was this letter that sparked the big controversy in computer science at that time. An entire session was devoted to the topic at the ACM National Conference in 1972. This was organized as a debate; Martin Hopkins of IBM presented the case for the goto, and Bill Wulf of Carnegie-Mellon University presented the case against. (See Hopkins, "A Case for the GOTO," and Wulf, "A Case against the GOTO," both in ACM SIGPLAN Notices, Vol. 7, No. 11, November 1972.)

This was the same year that Dahl, Dijkstra, and Hoare published an elegant little book called Structured Programming (Academic Press, ISBN 0122005503). It was also just before I became a student at Harvard College. I was 17, and I had just gotten a summer job working on MacLisp at MIT. (So, you see, my previous year's experience implementing Lisp for the 1130 really paid off!) The conference was in Boston, so I was able to attend. (I was already a student member of the ACM, thanks to the encouragement of a kindly judge at a science fair.) At the end of the Goto Debate, there was time for audience comments and questions. I can't resist quoting some of my remarks from the transcript of the discussion because they represent my earliest published thoughts on language design:

I'd like to point out that some consideration should be given to how the style of the language can affect not only the readability of the program after it is written but the conceptualization of the program itself as it's being written. The facilities which a language provides can grossly affect the style in which a program is written. For example, I once had to implement Conway's "Life" game on a computer, which involves a large number of parallel matrix operations. Doing it in Fortran and doing it in APL will impose two very different styles on the implementation because APL allows you to think in terms of parallel operations, whereas Fortran doesn't.
(We should remember that this was well before Fortran 90. Modern versions of Fortran do support parallel array operations.)
I also wanted to point out that earlier, the statement was made that if you leave goto in a given language, you will at least have the choice, whereas if you leave it out, you can get in trouble. I'd like to point out that whether or not this choice is made, we should also give consideration to the choice of putting in features, which allow you the choice of the goto. In other words, put in while-do, the case, and so forth, so you will have a choice in both directions.
ACM SIGPLAN Notices, Vol. 7, No. 11, November 1972, pp. 80-81.

I still believe much of what I said then, when I was just a twerp: Language design can shape a programmer's thinking, but should also present the programmer with a legitimate range of choices.

It's not enough simply to say, "Don't use goto." You also need to say what to use instead. That slim little book on Structured Programming had quite a number of ideas on how best to organize programs and data structures, but "Structured Programming" came to be a buzz-phrase that meant merely "programming without goto statements." Even more specifically, it meant organizing the control flow within a procedure using three specific tools: sequencing, conditionals, and loops. To put it even more briefly: begin-end, if-then-else, and while-do.

This intellectual stance did help a lot of programmers to better organize their code. But structured programming in this narrow sense is not above criticism. It is a theory only of how to organize code in the small, within a single procedure. A number of people pointed out that if your language has procedure calls and block structure, and your optimizing compiler is sufficiently talented, then goto is merely a special case of a procedure call—not only theoretically, but pragmatically, and if a procedure call without arguments isn't as fast as a goto, then your optimizing compiler is leaving a lot of performance on the table.

Five years later, I published a paper at the 1977 ACM National Conference; its (rather pretentious) title was "Debunking the 'Expensive Procedure Call' Myth; or, Procedure Call Implementations Considered Harmful; or, LAMBDA: The Ultimate GOTO." This was based on the work I did for my masters degree at MIT, writing a compiler for the Scheme programming language in which all loops were turned into patterns of function calls. The function calls were then compiled down to branch instructions. The theory worked. So if you allow procedure calls, you can't really forbid the use of goto. But you can at least design languages so as to encourage good style.

But times change. Hardware technology changes. Our needs change. Computers linked together in networks are now the norm, not the exception. Computers with multiple processors are becoming more and more common. We need programming languages that support the use of multiple threads of control. Ada tackled this in the 1980s. The Java programming language has made the use of multithreading and remote procedure calls widespread across the Internet. In such a context, the organizing principles of structured programming may not always be appropriate. I'm not talking about avoiding goto. (Java has no goto statement.) I'm talking about sequencing, if-then-else, and loops.

Sequencing implies a single thread of control, ordering actions in time. In the future, we may need languages that better support multiple threads of control and deal with the consequences of unordered actions.

If-then-else makes a binary choice, which is fine when there are really exactly two cases to consider; but often there are more than two possibilities, in which case if-then-else imposes a kind of "Boolean bottleneck" that requires complex case analysis to be reduced to a tree of binary decisions. This decision tree then imposes an arbitrary time ordering of decisions that may not be relevant to the description of the computation.

Loops connote sequential execution of successive iterations, and furthermore rely on side effects in the loop body to make progress, making it difficult for a compiler to recognize potential parallelism.

I'm not saying we should throw structured programming out the window. I am saying that the trade-offs have changed and are likely to keep changing.

What might a language look like in which parallelism is the default? How about data-parallel languages, in which you operate, at least conceptually, on all the elements of an array at the same time? These go back to APL in the 1960s, and there was a revival of interest in the 1980s when data-parallel computer architectures were in vogue. But they were not entirely satisfactory. I'm talking about a more general sort of language in which there are control structures, but designed for parallelism, rather than the sequential mindset of conventional structured programming. What if do loops and for loops were normally parallel, and you had to use a special declaration or keyword to indicate sequential execution? That might change your mindset a little bit.

What if a numerical programming language actually looked like the stuff that mathematicians and physicists use on their blackboards?—or whiteboards?—I'm showing my age. That was an area of active exploration in the 1960s. But as terminals became standardized on the ASCII character set with 24×80 screens, that exploration died out. Bitmapped displays have made the use of extended character sets and notations possible again, but very little work has been done in this area, aside from specialized packages such as MATLAB and Mathematica. Network browsers and the Java programming language and others have recently made support for the Unicode character set widely available. Can Unicode make programming languages easier to read and write? If so, are constraints needed on its use to avoid chaos? These are the kinds of questions I am looking at now, and I'm hoping that other researchers will, too.

I've been working on language design, implementation, and documentation for a third of a century. It's been fulfilling and fun. In the future, I will try to do better.

DDJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.