Architecture Blog: Distributed Computing Fallacies Explained: "Latency is Zero"
IF YOU BUILD IT

... Will they Come?

by Arnon Rotem-Gal-Oz
May 08, 2006

The second fallacy of Distributed Computing is the assumption that "Latency is Zero". Latency is how much time it takes for data to move from one place to another (as opposed to bandwidth, which is how much data we can transfer during that time). Latency can be relatively good on a LAN, but it deteriorates quickly when you move to WAN or Internet scenarios.
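To make the latency/bandwidth distinction concrete, here is a minimal Python sketch of a common first-order model (an assumption for illustration, not something the post defines): delivery time is the latency plus the time needed to push the bytes through the available bandwidth. The link profiles are hypothetical numbers, with identical bandwidth so that only latency differs.

```python
def transfer_time_ms(size_bytes: int, latency_ms: float, bandwidth_mbps: float) -> float:
    """First-order estimate: latency plus the time to push size_bytes through the link."""
    bits = size_bytes * 8
    serialization_ms = bits / (bandwidth_mbps * 1_000_000) * 1000
    return latency_ms + serialization_ms


# Hypothetical link profiles: (one-way latency in ms, bandwidth in Mbit/s).
# Bandwidth is the same on purpose so that only latency differs.
PROFILES = {"LAN": (0.2, 1000.0), "WAN": (40.0, 1000.0)}

for name, (latency, bandwidth) in PROFILES.items():
    small = transfer_time_ms(1_000, latency, bandwidth)       # a 1 KB message
    large = transfer_time_ms(10_000_000, latency, bandwidth)  # a 10 MB payload
    print(f"{name}: 1 KB -> {small:.3f} ms, 10 MB -> {large:.1f} ms")
```

With these assumed numbers, the 1 KB message costs about 0.2 ms on the LAN profile but about 40 ms on the WAN profile; extra bandwidth would barely change either figure, because the latency term dominates small messages.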

Latency is more problematic than bandwidth, partly because it improves far more slowly. Here's a quote from a post by Ingo Rammer on latency vs. bandwidth that illustrates this:

"But I think that it’s really interesting to see that the end-to-end bandwidth increased by 1468 times within the last 11 years while the latency (the time a single ping takes) has only been improved tenfold. If this wouldn’t be enough, there is even a natural cap on latency. The minimum round-trip time between two points of this earth is determined by the maximum speed of information transmission: the speed of light. At roughly 300,000 kilometers per second (3.6 * 10E12 teraangstrom per fortnight), it will always take at least 30 milliseconds to send a ping from Europe to the US and back, even if the processing would be done in real time."
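The speed-of-light bound in the quote is simple arithmetic: a round trip covers the distance twice. The sketch below assumes a hypothetical distance of 4,500 km, which is what the quote's 30 ms works out to at 300,000 km/s, and also applies the common rule of thumb that light in optical fiber travels at roughly two-thirds of its vacuum speed. The last lines check the joke unit, which suggests the quote's "10E12" is meant as 10^12.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3               # rule of thumb: light in fiber is about 2/3 as fast


def min_rtt_ms(distance_km: float, speed_km_s: float = SPEED_OF_LIGHT_KM_S) -> float:
    """Lower bound on a round trip: the signal has to cover the distance twice."""
    return 2 * distance_km / speed_km_s * 1000


DISTANCE_KM = 4_500  # hypothetical distance implied by the quote's 30 ms figure
print(f"vacuum bound: {min_rtt_ms(DISTANCE_KM):.1f} ms")
print(f"fiber estimate: {min_rtt_ms(DISTANCE_KM, SPEED_OF_LIGHT_KM_S * FIBER_FACTOR):.1f} ms")

# Checking the joke unit: one teraangstrom is 1e12 * 1e-10 m = 100 m.
FORTNIGHT_S = 14 * 24 * 3600
teraangstrom_per_fortnight = 300_000_000 * FORTNIGHT_S / 100  # c taken as 3e8 m/s
print(f"{teraangstrom_per_fortnight:.2e} teraangstrom per fortnight")  # 3.63e+12
```

Under these assumptions the vacuum bound is about 30 ms and the fiber estimate about 45 ms, before any routing, queuing, or processing delay is added.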

You may think all is okay if you only deploy your application on LANs. However, even on a LAN with Gigabit Ethernet you should still bear in mind that network latency is much higher (typically by several orders of magnitude) than the latency of accessing local memory. If you assume latency is zero, you can easily be tempted to treat a call over the wire as if it were almost a local call. This is one of the problems with approaches like distributed objects that provide "network transparency": they lure you into making many fine-grained calls to objects that are actually remote and relatively expensive to call.
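The cost of "network transparency" shows up in the number of round trips, not in how the calling code looks. The sketch below is a hypothetical simulation (the classes and the 0.5 ms round-trip figure are illustrative assumptions, not a real remoting API): a fine-grained proxy pays one round trip per getter, while a coarse-grained call that returns the whole record pays a single round trip.

```python
from dataclasses import dataclass


@dataclass
class SimulatedLink:
    """Hypothetical network link that charges a fixed round-trip time per call."""
    rtt_ms: float
    round_trips: int = 0

    def call(self, result):
        # Every remote invocation costs one round trip, however little data it returns.
        self.round_trips += 1
        return result

    @property
    def elapsed_ms(self) -> float:
        return self.round_trips * self.rtt_ms


class FineGrainedCustomerProxy:
    """Looks like a local object, but every getter crosses the network."""

    def __init__(self, link: SimulatedLink, record: dict):
        self._link, self._record = link, record

    def name(self):
        return self._link.call(self._record["name"])

    def email(self):
        return self._link.call(self._record["email"])

    def address(self):
        return self._link.call(self._record["address"])


class CoarseGrainedCustomerService:
    """Returns the whole record in a single call."""

    def __init__(self, link: SimulatedLink, record: dict):
        self._link, self._record = link, record

    def get_customer(self) -> dict:
        return self._link.call(dict(self._record))


record = {"name": "Ada", "email": "ada@example.com", "address": "1 Example Street"}

chatty = SimulatedLink(rtt_ms=0.5)
proxy = FineGrainedCustomerProxy(chatty, record)
fields = (proxy.name(), proxy.email(), proxy.address())

chunky = SimulatedLink(rtt_ms=0.5)
customer = CoarseGrainedCustomerService(chunky, record).get_customer()

print(chatty.round_trips, chatty.elapsed_ms)  # 3 1.5
print(chunky.round_trips, chunky.elapsed_ms)  # 1 0.5
```

Both clients read the same data, but the fine-grained one needs three round trips to do it; in a real system the gap grows with every property a screen or report touches.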

Taking latency into consideration means you should strive to make as few calls as possible and, assuming you have enough bandwidth (which we'll talk about next time), move as much data as you can in each of these calls. There is a nice example here illustrating the latency problem and what was done to solve it in Windows Explorer.
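Making fewer calls usually means batching. The hypothetical helpers below split a list of IDs into chunks (one request per chunk) and estimate the total time as round trips times latency plus total payload over bandwidth; all parameter values are assumptions chosen for illustration.

```python
import math
from typing import Iterator, List, Sequence


def batches(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` items (one request per chunk)."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def estimated_time_ms(n_items: int, batch_size: int, rtt_ms: float,
                      item_bytes: int, bandwidth_mbps: float) -> float:
    """Each request pays one round trip; the total payload pays for bandwidth either way."""
    round_trips = math.ceil(n_items / batch_size)
    payload_ms = n_items * item_bytes * 8 / (bandwidth_mbps * 1_000_000) * 1000
    return round_trips * rtt_ms + payload_ms


ids = list(range(1_000))
requests = list(batches(ids, 100))
print(f"{len(requests)} requests instead of {len(ids)}")

# Hypothetical WAN: 50 ms round trip, 10 Mbit/s, 500-byte items.
for size in (1, 10, 100, 1_000):
    total = estimated_time_ms(len(ids), size, rtt_ms=50.0, item_bytes=500, bandwidth_mbps=10.0)
    print(f"batch size {size:>5}: ~{total:,.0f} ms")
```

With these assumed numbers the estimate drops from about 50 s with one item per request to under 1 s with 100 items per request. Going all the way to 1,000 items saves comparatively little, because the 400 ms payload term now dominates; that is where the bandwidth caveat comes in.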

Another example is AJAX. The AJAX approach lets you use the dead time users spend digesting data to retrieve more data; however, you still need to consider latency. Let's say you are working on a shiny new AJAX front end. Everything looks just fine in your testing environment, and it also shines in your staging environment, passing the load tests with flying colors. The application can still fail miserably in the production environment if you fail to test for latency problems: retrieving data in the background is good, but if you can't do it fast enough, the application will still stagger and become unresponsive… (You can read more on AJAX and latency here.)
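The sketch below is a simulation rather than real AJAX code: it uses Python's asyncio to stand in for the browser's background requests, and the latency and "think time" values are hypothetical. The next page is prefetched while the user reads the current one, so the user only waits when latency exceeds the time spent digesting the data.

```python
import asyncio
import time


async def fetch_page(page: int, latency_s: float) -> str:
    """Hypothetical background request; the network round trip is simulated with a sleep."""
    await asyncio.sleep(latency_s)
    return f"page {page}"


async def browse(pages: int, think_time_s: float, latency_s: float) -> float:
    """Show pages one after another, prefetching the next page while the user reads.

    Returns the total time the user spent waiting on the network.
    """
    waited = 0.0
    pending = asyncio.create_task(fetch_page(0, latency_s))
    for page in range(pages):
        start = time.perf_counter()
        await pending  # blocks only if the prefetch has not finished yet
        waited += time.perf_counter() - start
        if page + 1 < pages:
            # Start fetching the next page in the background...
            pending = asyncio.create_task(fetch_page(page + 1, latency_s))
        await asyncio.sleep(think_time_s)  # ...while the user digests the current one
    return waited


fast = asyncio.run(browse(pages=5, think_time_s=0.05, latency_s=0.01))
slow = asyncio.run(browse(pages=5, think_time_s=0.05, latency_s=0.2))
print(f"low latency:  waited {fast:.2f} s")
print(f"high latency: waited {slow:.2f} s")
```

With these assumed values the low-latency run waits only for the first page, while in the high-latency run every prefetch is still in flight when the user finishes reading, so the interface stalls on each page (roughly 0.8 s of waiting in total).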

You can (and should) use tools like Shunra Virtual Enterprise, Opnet Modeler, and many others to simulate network conditions and understand how the system behaves, thus avoiding failures in the production system.
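A dedicated network emulator such as the tools named above models far more than a fixed delay (bandwidth limits, packet loss, realistic jitter). As a much cheaper first step, assuming you can wrap the remote-call boundary in your own code, the hypothetical decorator below adds an artificial delay with optional jitter to each call, so chatty code paths become visible in ordinary tests. The names `with_simulated_latency` and `load_orders` are illustrative, not from any library.

```python
import functools
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_simulated_latency(delay_s: float, jitter_s: float = 0.0,
                           rng: Optional[random.Random] = None):
    """Decorator that delays each call by delay_s +/- jitter_s seconds (never below zero)."""
    rng = rng or random.Random()

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            time.sleep(max(0.0, delay_s + rng.uniform(-jitter_s, jitter_s)))
            return func(*args, **kwargs)
        return wrapper
    return decorator


@with_simulated_latency(delay_s=0.08, jitter_s=0.02, rng=random.Random(42))
def load_orders(customer_id: int) -> list:
    """Hypothetical data-access call standing in for a remote service."""
    return [f"order-{customer_id}-{i}" for i in range(3)]


start = time.perf_counter()
for customer in range(5):
    load_orders(customer)
elapsed = time.perf_counter() - start
print(f"5 calls took {elapsed:.2f} s")
```

Seeding the random generator, as in the example, keeps the injected jitter reproducible between test runs.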


Posted by Arnon Rotem-Gal-Oz at 08:22 AM