Site Archive (Complete)
C++
SELECTIVE IGNORANCE

Finding the Signal in the Noise

by Andrew Koenig

November 2007


November 21, 2007

Divide and botch


A well known software design principle is "Divide and conquer"--the notion that by dividing a large problem into smaller problems will yield a solution where the overall problem might be too hard to solve. This strategy is often combined with recursion. For example, one widely used sorting algorithm is to divide the data to be sorted into two approximately equal pieces, sort each piece, and finally merge the two sorted pieces.

However, sometimes the division doesn't have quite the intended effect. Each individual piece of the problem might be solved correctly, but combining the solutions yields a surprise.

For example, once upon a time my checkbook was stolen. I asked my bank what I should do, and they advised me to close the account and open a new one. If I told them the numbers of the checks that I had written, they would ensure that only those checks would clear; any other checks written on the old account would go to the police.

So I closed my account and opened a new one, and never encountered any attempts at fradulent check-writing. I did, however, get a nasty surprise when my next payday rolled around: The automatic deposits that had been reaching my old checking acccount were now vanishing into thin air.

The reason was obvious in retrospect: Whoever designed the banking system realized that it was important to deal with withdrawals from closed accounts, but forgot to do anything about deposits into closed accounts. Fortunately, I was able to tell the people at the bank the exact date and amount of the missing deposit, because they had to go through the deposit information by hand to find the transaction to move.

I haven't seen a good term for such design mishaps, so I've taken to calling them "divide and botch." Perhaps some readers can suggest a better term.

Posted by Andrew Koenig at 10:00 AM  Permalink |


November 03, 2007

Least surprise in elections


We have seen that it's not easy to collect several players into teams that have well-defined strength. Such surprises come up even when we're not dealing with teams. Indeed, they happen even in seemingly straightforward situations such as elections.

Suppose, for example, we have two candidates running for office, and every voter gets to vote for a single candidate. The winning candidate will obviously be the one who received the most votes, and it hard to argue that this state of affairs is anything but fair.

Unfortunately, adding even one more candidate makes the situation much more complicated. Suppose, for example, that we have candidates A1, A2, and B, so named because A1's position is almost identical to A2's position (and, of course, both A1 and A2 are very different from B). Suppose that 40% of the voters support B and 60% would be happy with either A1 or A2.

What would you expect the election results to be? In the absence of any reason to choose A1 or A2, a reasonable result would be 30% for A1, 30% for A2, and 60% for B. B wins by a landslide even though either of the other candidates would be preferred by more than half the voters.

One solution to this problem is for A1 and A2 to get together and decide that they don't both need to be running. Instead, one of them can withdraw in favor of the other, on the assumption that doing so will prevent B from being elected. Of course, in real life, both A1 and A2 are likely to say "You first," so this suggestion works better in theory than in practice.

Another possible solution is for a third party to suggest to voters that if they are thinking of voting for A2, they would do better to vote for A1 instead. That way, they get a candidate who is almost as good from their viewpoint, and they avoid electing B. If A1 and A2 are truly indistinbuishable, this is a fine idea. But what happens if A1 and A2 are subtly but significantly different? Now a voter who supports A2 will be under pressure to vote insincerely--that is, to vote for someone other than the favorite candidate in order to maximize the desirability of the election resuts.

There are many voting schemes intended to avoid surprises such as this one. To my knowledge, none of them work perfectly in all situations. That is, they all cause surprises of one kind or another in some circumstances.

If we can't avoid surprises in a situation as simple as voting, what makes us think we can avoid them in programming-language design?

Posted by Andrew Koenig at 02:48 PM  Permalink |



Least surprise in tournaments


Now that we've seen how trying to combine plausible-sounding rules can surprise APL programmers, let's look at a more mundane example: a tournament to find out which of several teams is the strongest. We shall learn that it is hard to avoid surprising results, even in simple cases.

As an example, suppose we have three teams, each with three players. We shall imaginatively call them Team 1, Team 2, and Team 3. Suppose further that each player's strength is reliable--in other words, if player A ever beats player B, player A will always beat player B. Moreover, we will assume that players' strengths are transitive--in other words, if player A beats player B and player B beats player C, then player A will also beat player C. Finally, let's assume that no two players are exactly the same strength.

Under such circumstances, it should be easy to see that there is a unique strongest player, because our assumptions imply that when players compete against each other, the results define a total ordering of the players' strengths. No surprises so far.

Now assume that we have three teams of three players each, and we want to determine which team is strongest. The first problem is to define what it means for one team to be stronger than another.

Suppose we define the strength of a team by a round-robin tournament. Every player in one team plays every player in the other; then the team that won the most games is considered the stronger one. On the surface, this rule seems fair. If each team has three players, then there will be nine games; and because ties are impossible (because we assumed that no two players were exactly the same strength), one team will always win more games than the other. So we now have a way of determining which of our three teams is the strongest: Play each team against the other two and see what happens.

Let's make one more supposition: The relative strengths of Team 1's members are 8, 1, and 6; Team 2's members have strengths of 3, 5, and 7; and Team 3's members have strengths of 4, 9, and 2. Now let's play our tournament and see what happens.

Team 1's first player beats everyone on Team 2, and the second player loses to everyone on Team 2. The third player, with strength 6, beats two of Team 2's members and loses to the third. So Team 1 wins, 5 to 4.

Team 2's first player loses to two of Team 3's players; each of Team 2's other players wins two and loses one. So Team 2 beats Team 3, again by 5 to 4.

Finally, when Team 1 plays Team 3, Team 1's first player wins two games, the second player loses all three, and the third player wins two. So Team 3 beats Team 1 by 5 to 4.

Look what has happened! We started with players of well-defined relative strength. We put them together into teams and tried to determine the teams' relative strength by the most straightforward method possible. The result was that our teams' overall strength was not transitive. If that's not a surprise, I don't know what is.

The moral of the story is that just because we know how to compare one object with another, we do not necessarily know how to compare one collection of objects with another. We shall see some more examples of this phenomenon in the next few posts.

Posted by Andrew Koenig at 02:26 PM  Permalink |



December 2007
Sun Mon Tue Wed Thu Fri Sat
            1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31          


BLOGROLL
 
INFO-LINK


Related Sites: DotNetJunkies, SD Expo, SqlJunkies