October 16, 2007
The principle of least surprise can be surprising
It's hard to study software design for long without coming across the principle of least surprise, sometimes called the principle of least astonishment. Both terms refer to the idea that a system should cause as little surprise as possible for someone who doesn't already know how it behaves.
This idea is a good one most of the time. However, sometimes there is a good and simple reason for behavior that is surprising at first glance. In such cases, following the principle of least surprise may introduce extra complexity into the system and make its behavior more surprising in the long run.
I'm going to give you an example in APL. I think it's a language that most readers don't know, so they won't have preconceptions about how things ought to work.
One of APL's most fundamental ideas is that of an array. In fact, every value in APL is an array. An array can have any number of dimensions. Indeed, in APL programs, it is not uncommon to create three- and four-dimensional arrays as temporary parts of larger computations, and then compress them back down to one- or two-dimensional arrays.
What kind of array should represent an ordinary scalar (i.e. a number that isn't really an array), such as 42? There art two choices: Perhaps it should be a one-dimensional array, with a single element, or perhaps it should be an array with no dimensions at all.
At first glance, the idea of an array with zero dimensions may seem surprising. However, treating a scalar as a zero-element array makes a lot of sense. First of all, if an array can have any number of dimensions, it would surely be surprising if "any number" did not include zero. So the language should admit zero-dimensional arrays; if they are not equivalent to scalars, then we have to explain what the difference is between scalars and zero-dimensional arrays.
In this sense, then defining a scalar as a zero-dimensional array follows the principle of least surprise.
However, in my experience with APL, the fact that scalars are zero-dimensional arrays causes one of the first serious surprises that most people have when they learn APL.
Consider a program that computes the average of a vector. I don't have the APL character set in this blog, so I will express the average of v as sum(v)/size(v). APL has its own characters for sum, size, and division, but I don't need them to make my point.
In a language with multi-dimensional arrays, it is not immediately obvious how to define sum and size. In fact, APL does so by saying that the sum of an array is an array with one less dimension. For example, the sum of a matrix is a vector, each element of which is the sum of one row of the matrix. It has to make an exception for the sum of a scalar, which has no dimensions to begin with, so it defines the sum as being the same as the scalar itself.
Similarly, the size of an array is a vector, with one element for each dimension of the array.
So what is the size of a scalar?Here is where it gets interesting. We said earlier that a scalar is a zero-dimensional array--so the size of a scalar should be a zero-element vector. Given our definition of average, what do you suppose is the average of a scalar?
Well, we know that the average is sum(v)/size(v). sum(v) is the same as v, so our average is the result of dividing our scalar by an empty vector. We just need to figure out what that is.
Again, the principle of least surprise comes into play. Suppose v is a vector and we write v+1. Surely the result of that addition should be to add 1 to each element of v. (It cannot be concatenation because APL uses a comma for concatenation; binary + always means addition). So applying an arithmetic operation to a scalar and a vector applies that operation to the scalar and each element of the vector.
But this rule means that dividing a scalar by an empty vector yields an empty vector--which means in turn that the average of a scalar is an empty vector! Just about everyone who sets out to learn APL is surprised by this behavior; people expect the average of a scalar to be the scalar itself, not an empty vector.
So here we have a case where a surprising result comes from three applications of the principle of least surprise:
1) It should be possible to have an array with no dimensions, and a zero-dimensional array should be the same thing as a scalar.
2) The size of an array should be a vector with one element for each dimension of the array.
3) When you do arithmetic on a scalar and a vector, the result has the same size as the vector.
Choosing the least surprising behavior in these three contexts cuses the surprising behavior that the average of a scalar is an empty vector. Does this mean that there is something wrong with our choices?
I don't think so. Rather, I think this exmaple shows that it is not always possible to have ideas that are unsurprising both individually and in combination. When such surprising combinations occur, it can be tempting to remove the surprise by tweaking individual ideas--but doing so is often harder than it looks. The trouble is that because the surprise doesn't come from any single design decision, changing one decision might introduce another surprise somewhere else--and that one might be even more surprising and harder to see.
Posted by Andrew Koenig at 10:39 AM Permalink
|