Of course, that figuring out can often be automated. As an obvious example, every standard-library container class keeps track of its own memory and frees it as needed. Nevertheless, it is hard to resist the belief that automatically freeing memory would make life a lot simpler for C++ programmers.
One of these disadvantages was peculiar to switching systems, and probably only to systems of that era: The software had to be designed to be able to work around hardware failure, and memory failure in particular. In this system, every data structure had a corresponding audit routine, the purpose of which was to inspect the data structure, detect any internal inconsistencies, and correct those inconsistencies from the remaining data. These engineers viewed the garbage collector as a program that manipulated one more data structure; and they felt that they would have to augment the memory-management system so that it would be robust in the face of memory failures.
This requirement led to a second problem: In order to reengineer the garbage collector in this way, they would need access to its source code and specifications. Moreover, every time those specifications changed, they would have to rewrite their own version to match those specifications.
The third problem was the possibility that allocating memory might trigger a garbage collection, which in turn might cause the switch to stop for a while to do the garbage collection. Such delays were usually acceptable--as they would simply cause a telephone call to take a little longer to connect or disconnect--but sometimes they were unacceptable. To forestall delays where they could not be tolerated, they talked about preallocating list cells and using them as needed.
In other words, these engineers started out by lauding the advantages of garbage collection, and then proceeded to rewrite the automatic memory-management system to make it manage memory manually. They just didn't have the control they needed for their application.
Of course, telephone switching systems are unusual applications, because they have to be reliable even in the face of hardware failures. Surely there is no need to rewrite the memory-management subsystem for ordinary applications, right?
I'll leave you to ponder that. Meanwhile, comments are welcome.
With experience comes a more sophisticated viewpoint: The documentation is the system designer's contract with the user, and one cannot change that contract unilaterally. The developer's job is to produce software that conforms to that contract; if it fails to do so, it is the software, not the contract, that must change.
A simple example should serve to reinforce this second viewpoint: If a program adds 2 and 2 and gets 5, the way to fix it is not to change the documentation to say that 2 + 2 = 5.
Further reflection should reveal a still more sophisticated view: Sometimes documentation will contain mistakes, just as software contains mistakes. So when there is a discrepancy, the wisest course of action is to figure out whether it is the program or its documentation that is in error, and then to fix the one that is broken.
If only reality were that simple. Once upon a time I was working on a project in which we found a discrepancy between code and documentation: A library routine took two parameters, and the documentation described those parameters in the opposite order from the code. If you used this library in the way that the documentation described, your program would surely fail.
This situation gave us a choice of changing the code or the documentation. The argument that eventually won was the one about the documentation being a contract with the user: Users would probably have copies on their shelves of the documentation we had already published, and those copies would persist long after the code had been superseded. So we corrected the next release of the library to make it match the documentation.
Of course, this decision was a disaster, because it effectively rendered that library routine useless. The problem was that it became impossible to write code that would work with both versions of the library, which meant that every piece of code that used that library routine now had to be written in two versions: one for the old library and one for the new one. Still worse: There was no reliable way of testing which library was in use, so programmers had to select the right version of the code manually.
This situation came about because of several factors.
First, it was impossible to use the program in the way the documentation said it should work. Thus, anyone who wound up using the program successfully did so by violating the documentation.
Second, "fixing" the code changed its previously useful (albeit undocumented) behavior in an incompatible way. So the programmers who had discovered how to circumvent the documentation error found that their programs stopped working.
Finally, there was no way to unify the old and new behavior. That is, there was no way to write a single program that would work with both versions of the library.
In hindsight, I think that perhaps the best way to resolve this problem would have been to change the documentation and find a way to warn programmers that the original documentation was incorrect. After all, there was no way to follow the original documentation and obtain a working program.
This approach is certainly appealing. In particular, having automated tests that capture as much as possible of a program's desired behavior is an excellent idea. But what if a program has to have a characteristic that you don't know how to test?
Although that point of view is appealing, I can think of several counterexamples.
The first is that even a simple program may have so many possible inputs that there is no way to test them all, and it might not even be possible to think of a set of representative samples. As an example, consider a program that does a floating-point multiplication. Suppose you are trying to verify that the program conforms to the relevant IEEE floating-point standard. There isn't enough time in the world for you to test every possible pair of input values, so you have to select them somehow. But the moment you do so, you run into the possibility that a bug may lurk in one of the possibilities you didn't select.
The second counterexample might come from a requirement that a program not leak memory. It's hard to express such a requirement in terms of specific numbers, because the requirement really governs asymptotic behavior. In such circumstances, it is typically a judgment call as to whether the requirement has been met in a particular case.
The third counterexample is related: It is often important that a program be robust against incorrect input. In particular, programs, such as web applications, that take input from the public have to be robust against malicious attempts to manipulate them with inappropriate input. In such cases, the cost of forgetting to test for a particular hazard may be the loss of the entire system of which the program is a part. Worse, that hazard might be part of a library component that there is no easy way to test directly.
What can we do in such circumstances? The best approach I can think of is to try to verify the program's behavior independently of testing it. Here I am using the word "verify" to mean reasoning about the program with an eye toward showing that certain kinds of failures are impossible. If, for example, you can locate every place in a program where memory is allocated and prove that the memory is always freed, then you have just increased your confidence that the system as a whole doesn't leak memory. One way of proving such behavior might be to show that memory is allocated only in constructors and is always freed in the corresponding destructors. In that case, the program can only leak memory if it leaks objects, and it may well be much easier to prove that that cannot happen.
I don't want to minimize the importance of testing. Indeed, I have some interesting examples that show just how important testing can be. But even when a program has passed all its tests, I think it's a mistake to assume that it's incapable of improvement.
Sorry for the interruption.
Hello! You have been chosen to participate in
an important survey from <name deleted>, a
respected 3rd party research company.
This is for RESEARCH PURPOSES ONLY.
We are not selling anything. Your answers are grouped anonymously.
followed by "yes" and "no" buttons.
The requested URL /surv/369132/ai_start.php was not found on this server.
I can't imagine the process by which the people who run such a major website could have allowed this to happen.
On the surface, these two ideas may seem equivalent. However, a closer look reveals that Dijkstra has introduced a new level of abstraction into the discussion, and this abstraction profoundly affects how we program.
This abstraction concerns the language in which we program--the rules of the game, as it were. If the purpose of a program is to instruct a machine, then the actual behavior of the machine defines the nature of our instructions. If, instead, the purpose of the machine is to execute our programs, then there is the possibility that the machine might fail to comply with the rules that we use for writing our programs, and in that case we are justified in saying that the machine is broken and the manufacturer should fix it.
Around the time that Dijkstra wrote his book, something happened that underscored his point--even though I did not realize the connection until this morning. I had a friend who worked in the data center of a large media company, and part of her job was to decide whether to accept or reject the first model of a brand-new mainframe. She and her colleagues ran all kinds of tests on this machine, and eventually uncovered a problem: When a subroutine-call instruction happened to be located at the very end of a memory page, the return address would not be the address of the first byte of the next page, as one would expect--instead, it would be the first byte of the current page.
As you would expect, this problem caused all kinds of strange failures. Moreover, those failures were quite rare, as they happened only in these very specific circumstances.
My friend had no trouble dealing with this situation: She rejected the machine and told the manufacturer to bring it back when it worked. Part of what gave her the authority to do this was that the manufacturer had written a document that described how the machine was expected to behave. In effect, this document was the rule book that programmers were supposed to use to program this machine, and when the machine didn't follow the rules, it was the machine, not the rules, that was at fault.
This discussion may seem obvious. However, even to this day it is surprising how often programmers have to change their programs to work around undocumented behavioral bugs in the systems on which they depend.
However, sometimes the division doesn't have quite the intended effect. Each individual piece of the problem might be solved correctly, but combining the solutions yields a surprise.
For example, once upon a time my checkbook was stolen. I asked my bank what I should do, and they advised me to close the account and open a new one. If I told them the numbers of the checks that I had written, they would ensure that only those checks would clear; any other checks written on the old account would go to the police.
So I closed my account and opened a new one, and never encountered any attempts at fraudulent check-writing. I did, however, get a nasty surprise when my next payday rolled around: The automatic deposits that had been reaching my old checking account were now vanishing into thin air.
The reason was obvious in retrospect: Whoever designed the banking system realized that it was important to deal with withdrawals from closed accounts, but forgot to do anything about deposits into closed accounts. Fortunately, I was able to tell the people at the bank the exact date and amount of the missing deposit, because they had to go through the deposit information by hand to find the transaction to move.
I haven't seen a good term for such design mishaps, so I've taken to calling them "divide and botch." Perhaps some readers can suggest a better term.
Unfortunately, adding even one more candidate makes the situation much more complicated. Suppose, for example, that we have candidates A1, A2, and B, so named because A1's position is almost identical to A2's position (and, of course, both A1 and A2 are very different from B). Suppose that 40% of the voters support B and 60% would be happy with either A1 or A2.
What would you expect the election results to be? In the absence of any reason to choose A1 or A2, a reasonable result would be 30% for A1, 30% for A2, and 40% for B. B wins by a comfortable margin even though either of the other candidates would be preferred by more than half the voters.
One solution to this problem is for A1 and A2 to get together and decide that they don't both need to be running. Instead, one of them can withdraw in favor of the other, on the assumption that doing so will prevent B from being elected. Of course, in real life, both A1 and A2 are likely to say "You first," so this suggestion works better in theory than in practice.
Another possible solution is for a third party to suggest to voters that if they are thinking of voting for A2, they would do better to vote for A1 instead. That way, they get a candidate who is almost as good from their viewpoint, and they avoid electing B. If A1 and A2 are truly indistinguishable, this is a fine idea. But what happens if A1 and A2 are subtly but significantly different? Now a voter who supports A2 will be under pressure to vote insincerely--that is, to vote for someone other than the favorite candidate in order to maximize the desirability of the election results.
There are many voting schemes intended to avoid surprises such as this one. To my knowledge, none of them work perfectly in all situations. That is, they all cause surprises of one kind or another in some circumstances.
If we can't avoid surprises in a situation as simple as voting, what makes us think we can avoid them in programming-language design?
Under such circumstances, it should be easy to see that there is a unique strongest player, because our assumptions imply that when players compete against each other, the results define a total ordering of the players' strengths. No surprises so far.
Now assume that we have three teams of three players each, and we want to determine which team is strongest. The first problem is to define what it means for one team to be stronger than another.
Suppose we define the strength of a team by a round-robin tournament. Every player in one team plays every player in the other; then the team that won the most games is considered the stronger one. On the surface, this rule seems fair. If each team has three players, then there will be nine games; and because ties are impossible (because we assumed that no two players were exactly the same strength), one team will always win more games than the other. So we now have a way of determining which of our three teams is the strongest: Play each team against the other two and see what happens.
Let's make one more supposition: The relative strengths of Team 1's members are 8, 1, and 6; Team 2's members have strengths of 3, 5, and 7; and Team 3's members have strengths of 4, 9, and 2. Now let's play our tournament and see what happens.
Team 1's first player beats everyone on Team 2, and the second player loses to everyone on Team 2. The third player, with strength 6, beats two of Team 2's members and loses to the third. So Team 1 wins, 5 to 4.
Team 2's first player loses to two of Team 3's players; each of Team 2's other players wins two and loses one. So Team 2 beats Team 3, again by 5 to 4.
Finally, when Team 1 plays Team 3, Team 1's first player wins two games, the second player loses all three, and the third player wins two. So Team 3 beats Team 1 by 5 to 4.
Look what has happened! We started with players of well-defined relative strength. We put them together into teams and tried to determine the teams' relative strength by the most straightforward method possible. The result was that our teams' overall strength was not transitive. If that's not a surprise, I don't know what is.
The moral of the story is that just because we know how to compare one object with another, we do not necessarily know how to compare one collection of objects with another. We shall see some more examples of this phenomenon in the next few posts.
This idea is a good one most of the time. However, sometimes there is a good and simple reason for behavior that is surprising at first glance. In such cases, following the principle of least surprise may introduce extra complexity into the system and make its behavior more surprising in the long run.
I'm going to give you an example in APL. I think it's a language that most readers don't know, so they won't have preconceptions about how things ought to work.
One of APL's most fundamental ideas is that of an array. In fact, every value in APL is an array. An array can have any number of dimensions. Indeed, in APL programs, it is not uncommon to create three- and four-dimensional arrays as temporary parts of larger computations, and then compress them back down to one- or two-dimensional arrays.
What kind of array should represent an ordinary scalar (i.e. a number that isn't really an array), such as 42? There are two choices: Perhaps it should be a one-dimensional array with a single element, or perhaps it should be an array with no dimensions at all.
At first glance, the idea of an array with zero dimensions may seem surprising. However, treating a scalar as a zero-dimensional array makes a lot of sense. First of all, if an array can have any number of dimensions, it would surely be surprising if "any number" did not include zero. So the language should admit zero-dimensional arrays; if they are not equivalent to scalars, then we have to explain what the difference is between scalars and zero-dimensional arrays.
In this sense, then, defining a scalar as a zero-dimensional array follows the principle of least surprise.
However, in my experience with APL, the fact that scalars are zero-dimensional arrays causes one of the first serious surprises that most people have when they learn APL.
Consider a program that computes the average of a vector. I don't have the APL character set in this blog, so I will express the average of v as sum(v)/size(v). APL has its own characters for sum, size, and division, but I don't need them to make my point.
In a language with multi-dimensional arrays, it is not immediately obvious how to define sum and size. In fact, APL does so by saying that the sum of an array is an array with one less dimension. For example, the sum of a matrix is a vector, each element of which is the sum of one row of the matrix. It has to make an exception for the sum of a scalar, which has no dimensions to begin with, so it defines the sum as being the same as the scalar itself.
Similarly, the size of an array is a vector, with one element for each dimension of the array.
So what is the size of a scalar?
Here is where it gets interesting. We said earlier that a scalar is a zero-dimensional array--so the size of a scalar should be a zero-element vector. Given our definition of average, what do you suppose is the average of a scalar?
Well, we know that the average is sum(v)/size(v). sum(v) is the same as v, so our average is the result of dividing our scalar by an empty vector. We just need to figure out what that is.
Again, the principle of least surprise comes into play. Suppose v is a vector and we write v+1. Surely the result of that addition should be to add 1 to each element of v. (It cannot be concatenation because APL uses a comma for concatenation; binary + always means addition). So applying an arithmetic operation to a scalar and a vector applies that operation to the scalar and each element of the vector.
But this rule means that dividing a scalar by an empty vector yields an empty vector--which means in turn that the average of a scalar is an empty vector! Just about everyone who sets out to learn APL is surprised by this behavior; people expect the average of a scalar to be the scalar itself, not an empty vector.
So here we have a case where a surprising result comes from three applications of the principle of least surprise:
1) It should be possible to have an array with no dimensions, and a zero-dimensional array should be the same thing as a scalar.
2) The size of an array should be a vector with one element for each dimension of the array.
3) When you do arithmetic on a scalar and a vector, the result has the same size as the vector.
Choosing the least surprising behavior in these three contexts causes the surprising behavior that the average of a scalar is an empty vector. Does this mean that there is something wrong with our choices?
I don't think so. Rather, I think this example shows that it is not always possible to have ideas that are unsurprising both individually and in combination. When such surprising combinations occur, it can be tempting to remove the surprise by tweaking individual ideas--but doing so is often harder than it looks. The trouble is that because the surprise doesn't come from any single design decision, changing one decision might introduce another surprise somewhere else--and that one might be even more surprising and harder to see.
if (x == 0)
and instead wrote
if (x = 0)
the compiler would surely be providing a useful service by telling us that we had probably done something we didn't intend.
However, there is a dark side to compiler warnings that takes a while to appreciate. We can begin to see its shadows by asking what we should do when we compile someone else's program and see a dozen (or a hundred) warning messages.
The obvious answer is to locate the source of the warnings and change the program so that they no longer appear. After all, every warning represents a part of the program that might be hiding a serious bug, and if the program produces too many warnings, we have no way of knowing how many of those bugs might be waiting to bite us.
In effect, a conscientious programmer will treat compiler warning messages similarly to compiler error messages: as indicating problems that must be fixed before proceeding further. Indeed, it is not uncommon for commercial programming shops to decree that warning messages are unacceptable: All code must compile warning-free.
That's a Good Thing too, isn't it? After all, isn't it reasonable to insist that programmers refrain from writing programs that even a compiler can see are hazardous? Who could argue with such an aim?
Indeed, it is hard to argue with an aim expressed as this one is. However, if we take a step back, we shall see that the problem is not quite that simple. For if an organization insists on programs compiling warning-free, then the programmers are no longer writing their programs in the programming language that their manuals define. Instead, they are programming in the intersection of the language in the manual and the language that the compiler doesn't warn about. Suddenly, the language definition changes from one that is written down to one that must be inferred from the compiler's behavior.
The situation gets worse when we have to deal with more than one compiler--as we all do. Remember that each new release of a compiler is effectively a new compiler from the user's viewpoint. Suppose, for example, that a compiler developer wakes up one day and realizes that a class with a destructor and no constructor is usually asking for trouble. After suitable discussion and review, a new release of the compiler comes out that gives warning messages for such classes.
Now shift focus to a developer using that compiler. The developer wrote this code:
class Thing {
public:
virtual ~Thing() { }
};
The old compiler accepted it with nary a peep; the new compiler warns that there is a destructor but no constructor.
In a sense, that behavior is a compiler bug: If you treat warnings as errors, you've just lost the ability to have an empty virtual destructor in an abstract base class. In another sense, of course, one might argue that it is never a bug to issue a warning as long as it is warning about something that is actually happening.
This example shows why I think warning messages are a decidedly mixed blessing. They have certainly saved my hide, but equally certainly they have made it hard for me to do things I thought were entirely reasonable.
I'd love to see reader comments about their experiences with warning messages.
I connected this disk to one of the Firewire ports on my machine, because I want to use the USB port for low-latency purposes such as sending audio signals to an external sound card, and I fear that a lot of USB disk traffic might interfere. So far so good.
I have a second disk that I use for backups. Two of them, actually, with the same manufacturer, make, and model. To keep these disks straight, I'll call them the "backup disks" and the other one the "auxiliary disk." As I said, the same company made both backup disks, and it's a different company from the auxiliary disk.
Here's the weird part: When I connect one of the backup disks to the machine, the auxiliary disk immediately reports a permanent I/O error. Something about being unable to do a delayed write to the Master File Table, which sounds pretty serious. So the auxiliary disk must be broken, right? After all, it's the one reporting the error.
Not so fast! First, I can make the problem occur whenever I want: Just connect the backup disk to the machine and the auxiliary disk reports failure! If I don't connect the backup disk, the auxiliary disk hums along just fine.
But what is really weird is that if I connect the other backup disk to the machine instead, there is no problem. In other words, I have two disks of the same make and model, one of them causes my auxiliary disk to report errors, and the other one doesn't. Which disk would you think is the real culprit now? Right, one of the backup disks.
So I went to the backup disk manufacturer's website to report the problem. I went through their registration procedure thence to their "report a problem" page, and finally I attached a screen shot showing the message.
When I submitted the trouble report, their website crashed.
After a decent interval, I decided to test the waters by submitting a trouble report about their trouble-reporting system. If that crashed, it would waste less time than sending in the whole original disk report a second time. Fortunately, it went right through.
So I submitted the original trouble report again. Wouldn't you know it--their website crashed again! On the theory that what crashes it is trying to attach a screen shot, I tried one more time without the screen shot. Worked like a charm.
So now I don't know what to think. Is it so rare for a customer to report a problem and include a screen shot? Does the manufacturer just not care? Is something else dark and obscure happening that I don't know about?
No matter how you look at it, some bugs just shouldn't happen.
I've lost track of how many times someone has come to me complaining about what turns out to be correct behavior. The first example of such behavior that comes to mind is expecting (1.0/3)*3 to be equal to 1.0; but there are plenty of other such examples.
If a program's behavior doesn't match your expectations, it just might be your expectations that are incorrect.
Most beginners' first reaction when they make such discoveries is to think that there is something wrong with the compiler. After being disabused of that notion, they will reluctantly go looking for the problem.
However, my experience is that many programmers, especially beginners, make a critical mistake when they go on their bug hunts.
For example, before you ran your program, did you compile it? Are you sure? Are you sure that you compiled every piece of source code that you're using? With the same compiler? Are you sure that you saved the code before you compiled it?
I've lost track of how many times I've had a program crash on me, then removed all object and executable files, then recompiled everything from scratch, and had it work perfectly.
Oh yes, one other thing. Many operating systems have a notion of a search path, which is a list of places where the system should look for a program when you run it. Sometimes, the system is set up in such a way that when you try to execute a program, you are unwittingly getting a program with the same name from somewhere else. For example, on many Unix systems, typing a.out might execute a file named a.out wherever in the search path it happens to be, whereas typing ./a.out will insist on executing a.out in the current directory.
Whatever facilities your operating system happens to have, you will save yourself a lot of time if you remember the first rule: Before you go to fix it, be sure you're fixing the right thing.
Is the third time the charm?
The response: "We checked your machine's configuration; it has a 305-watt power supply, which is adequate for this graphics card." I said that the label on the power supply said it was 250 watts, but they insisted that their records showed that it was 305 watts.
I did not ask how they managed to get from 305 watts to 350 watts. Instead, I reasoned that the graphics card's power-supply requirements were not absolute. After all, how could the graphics card's manufacturer know what other devices were on the machine? The manufacturer's technical-support people had assured me that this card would work with my existing power supply; if I tried it and it did not work, it would be their problem. Bad reasoning based on wishful thinking, I know, but so be it. I was worried that if I tried to return the card after tech support had said it would work, they would charge me a restocking fee, or--still worse--refuse the return altogether.
So I tried the new graphics card. Somewhat to my surprise, it appeared to work, though the speed of my computer's cooling fan reminded me that the power supply was now working hard. Nevertheless, I thought I'd contact tech support again to ask about that gap between 305 and 350 watts, and to see if there was another power supply available.
The second tech-support rep's story flatly contradicted the first: (1) This graphics card requires a 350-watt power supply, not 305; (2) 305 watts is the largest power supply that could be installed in this machine, and therefore (3) I had to send back the graphics card that I had just installed and buy another one with smaller capacity.
We've already seen lesson one: Making incorrect recommendations is worse than making no recommendations at all. We can now add to that lesson eight: Once you've made an incorrect recommendation, correcting it later doesn't help much because there's no reason to believe the correction.
Anyway, I asked for a recommendation for an appropriate graphics card. Silly me. But then I had an idea: I phoned the manufacturer of the graphics card, described the situation to them, and asked for their recommendation. They recommended the same model. Now, at last, I had independent corroboration.
So I ordered the new card, hoping that my machine wouldn't fry itself before it arrived. This time, I got lucky: When I installed the new card, it worked just fine, and the machine was significantly quieter too.
Now, let's look at what would have happened had the computer manufacturer refrained from advising me about graphics cards. I would probably have rummaged around the web and come up with the same card I eventually bought. The chances of doing so would be even greater if the computer manufacturer had a web page saying something like "Here are the different types of graphical interfaces," "Here is how to find out what will fit on your machine," and other such general advice.
I would probably have been slightly annoyed that they didn't know enough about their own products to make specific recommendations, but that annoyance would be nothing compared with how I ultimately wound up feeling. In effect, by trying to be helpful, they wasted my time and their own, and damaged their reputation.
Now let's see how these lessons might apply to other aspects of system design.
First and foremost is the idea of thinking about systems from the user's viewpoint rather than from the designer's viewpoint. I am fond of saying that when a company's representative explains why the company can't help me by describing the company's organization chart, the company is in trouble. When I have a problem with a company's products, and I call their customer-service number, I expect to be connected with someone who can help me solve my problem--not with someone who listens at length to my problem and then tells me that I need to contact someone in another department.
The second important idea is that incorrect information is worse than no information at all. This notion usually extends to other aspects of system design as well. For example, it is usually better for a program to refuse to produce output at all than it is for the program to produce incorrect output without complaint. Moreover, once a program has produced incorrect output, correcting it after the fact may not make things any better.
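To make the point about refusing to produce output concrete, here is a minimal C++ sketch. The function name parse_watts and its error messages are my own invention for illustration; nothing like it appears in the story. It reads a wattage figure from text and throws an exception rather than returning a guess when the text is not a clean non-negative integer.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption for illustration): convert text such
// as "350" into a wattage. It refuses to answer, by throwing, whenever the
// text is not entirely a non-negative integer.
int parse_watts(const std::string& text) {
    std::size_t used = 0;   // number of characters std::stoi consumed
    int value = 0;
    try {
        value = std::stoi(text, &used);
    } catch (const std::exception&) {
        // std::stoi found no digits, or the value did not fit in an int.
        throw std::invalid_argument("not a usable number: " + text);
    }
    if (used != text.size()) {
        // std::stoi stopped early; accepting the prefix would be a guess.
        throw std::invalid_argument("trailing characters in: " + text);
    }
    if (value < 0) {
        throw std::invalid_argument("negative wattage: " + text);
    }
    return value;
}
```

Calling parse_watts("350") returns 350. Calling parse_watts("3O5"), with a letter O in place of the zero, throws std::invalid_argument. A more lenient version that called std::stoi alone would quietly return 3 for that input, which is exactly the kind of incorrect output without complaint that the paragraph above warns against.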
The third idea, and perhaps the most important one, is that complexity is an ever-present problem. Think about all of the details involved in these transactions, and all the specialized knowledge that was necessary to arrange for them in the first place. Such complexity seems to have been creeping for decades into every aspect of every computer system I encounter. It is so pervasive that it has no chance at all of going away--the best we can hope for in practice is to find useful ways of managing it.
Managing complexity is, of course, another huge topic, so let's save that one for later.
"So how do I arrange for you to take this one back and send me the right one?"
"Oh, I can't help you there; I just answer technical questions. But I'll connect you to customer service."
Lesson three (lessons one and two are in Part 1): When someone explains to you why the company's organization chart means that someone else has to help you, there is trouble ahead.
Lesson four: When you're designing a system that deals with customers, think about how the system looks from the customer's viewpoint, not just from the company's.
So it was off to customer service, and another fair (or unfair) amount of time on hold. When I eventually got to the front of the line, I had to explain my problem again, after which the customer-service rep arranged for me to return the inappropriate graphics card.
Lesson five: Design your system so that when your customer tells you something, the system remembers it and doesn't make the customer repeat the information.
Lesson six: If one of your people has to hand a customer off to someone else, put the customer at the head of the line.
So far so good: I had arranged to send back the inappropriate graphics card. Now the real question:
"How do I arrange to purchase the correct card for my machine?"
"Oh, I can't help you there; I'll have to transfer you to sales."
See lessons three and four, above.
Sales, of course, answered the phone much more quickly than customer service or returns. However, I had to tell the whole story again (see lesson five). The sales rep was happy to sell me a new graphics card. Not only that, but she double-checked whether that card would work with my machine.
It turned out, she said, that it wouldn't work either. The problem was that the card had a higher-capacity processor than my old one, and therefore required more power than my machine's power supply could deliver. Fortunately, we could solve the problem if I bought and installed a new power supply. Moreover, because I would be buying the power supply at the same time as the graphics card, she would offer me a discount on the combination, making the bottom line not much more expensive than the graphics card I was returning.
I checked the box for the graphics card I was returning. Sure enough, it said that it required a 350-watt power supply, and my machine had less than that. So she wasn't just trying to extract more money from me. Instead, the tech-support guy had failed to notice the power-supply requirement. See lesson one again.
So I shrugged my shoulders, took out my credit card, and ordered the new graphics card and power supply.
A few days later, they showed up. Being a cautious sort of fellow when it comes to hardware, I decided I would install the new power supply first, then verify that the machine worked the same way with the new power supply as it did with the old one, and only then would I install the new graphics card.
So I shut down my machine (having backed up all of its files the day before), took out the old power supply, unpacked the new power supply--and found that it didn't fit.
I mean that it physically didn't fit: The power cord from the wall plugs into a socket on the power supply, and the new power supply's socket was in a different place than the old power supply's socket. Which meant that if I were to install it in my machine, I would have to cut a hole in the metal panel on the back of the machine to allow the power cord to plug into it.
Lesson seven: When a customer calls you to replace something that doesn't work, and you sell him a replacement, you had really better make sure that the replacement works. Otherwise your customer is going to be very, very angry.
To be continued...