Load Balancing Your Web Site
Practical Approaches for Distributing HTTP Traffic
By Ralf S. Engelschall
When it comes to handling lots of visitors, high-volume sites like Yahoo!, Netscape, and Microsoft have learned that the actual quality of service a Web server provides to end users typically depends on two parameters -- network-transfer speed and server-response time. Network-transfer speed is mainly a matter of your Internet-link bandwidth, while server-response time depends upon resources: fast CPU (especially for CGI programs), lots of RAM (especially for parallel-running HTTP daemon processes), and good I/O performance (especially for disk and network traffic).
What do you do when these resources are exhausted and your Web server is struggling against heavy traffic? You could install more RAM on existing machines, or perhaps replace the CPU with a faster one. You could also use faster or dedicated SCSI controllers and disks with shorter access times (perhaps a RAID system with a huge cache). Software could be tuned as well; you could adjust operating-system parameters and Web-server software to achieve better performance.
Or you can address the problem with an alternative approach: Improve performance by increasing the number of Web servers. This involves an attempt to distribute the traffic onto a cluster of back-end Web servers. Aside from the technical hurdles, this is an interesting approach, because the back-end servers don't need to be large-scale machines -- medium-scale hardware works just fine.
Assume there are N back-end servers available, named wwwX.foo.dom (where X is between 1 and N), and you want to use the cluster approach to solve the resource problem. The goal then is to balance the traffic (addressed to www.foo.dom) onto these available servers so that the technical distribution is totally transparent to the end user. Your Web-site visitors can still use canonical URLs of the http://www.foo.dom/bar/quux/ form to reach the Web cluster, and are not directly confronted with the fact that their requests are being served by more than one machine. They never see the underlying distribution. This is important both for backward compatibility and to avoid problems (for instance, bookmarking pages, or a back-end server crash). The new Web cluster should behave identically to the old single-machine approach.
The DNS Approach
The first solution is based on the Domain Name System (DNS); see Figure 1. Here we exploit the fact that the first step a browser has to take to retrieve the URL www.foo.dom/bar/quux/ is to resolve the corresponding IP address for www.foo.dom. This is accomplished with a passive resolver library that calls a nearby DNS server, which then actively iterates over the distributed DNS server hierarchy on the Internet until it reaches your DNS server, which finally gives the IP address. Instead of giving a static address for www.foo.dom, the DNS server gives the address of one of the back-end Web servers. Which one depends on the scheme you want to use for balancing the traffic and the technical possibilities available for this decision.
The state of the art in DNS-server implementation is still the Berkeley Internet Name Daemon (BIND), which is currently developed and maintained by the Internet Software Consortium (ISC).
The key to this solution is the fact that BIND provides a nifty, but little-known, feature called "round robin." Round robin lets you select and provide a particular IP address from a pool of addresses when a DNS request arrives. With each request, the selection pointer into this pool is advanced, and when it passes the last element, it starts again with the first one. It is configured by making www.foo.dom an alias that is mapped to wwwX.foo.dom by using multiple CNAME (canonical name) resource records, as shown in Example 1.
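To make the rotating selection pointer concrete, here is a minimal Python sketch of the round-robin idea. It is an illustration only, not BIND's implementation; the class name RoundRobinPool and the three-server pool are assumptions made for the example.

```python
class RoundRobinPool:
    """Hand out addresses from a fixed pool in rotating order."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("pool must contain at least one address")
        self._addresses = list(addresses)
        self._pointer = 0  # index of the next address to hand out

    def next_address(self):
        address = self._addresses[self._pointer]
        # Advance the pointer; wrap around to the first element after the last.
        self._pointer = (self._pointer + 1) % len(self._addresses)
        return address


# Hypothetical pool of N = 3 back-ends following the wwwX.foo.dom naming scheme.
pool = RoundRobinPool(["www1.foo.dom", "www2.foo.dom", "www3.foo.dom"])
answers = [pool.next_address() for _ in range(5)]
```

Five consecutive lookups walk through the pool once and then start over with the first server.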
This sounds perfect in theory, because the traffic is distributed equally onto the Web cluster. But in practice, we have to fight with the fact that DNS servers cache the resolved data at any point in the DNS hierarchy both to decrease the resolver traffic and to speed up resolving. This caching is controlled by a time-to-live (TTL) value that is appended to each piece of information by our DNS server. It resides in the "start of authority" (SOA) resource record of the BIND zone file where the CNAME resource records reside.
Now we have a dilemma: When we set this TTL value too high, we decrease the DNS traffic on our Internet link, but we let the other DNS servers cache our information too long, which leads to bad HTTP traffic distribution over our Web cluster. On the other hand, when we set this TTL value too low, we increase our DNS traffic and the request time for the visitor dramatically, because the other DNS servers expire our information faster, so they have to resolve it more often. But we then have a better balancing of HTTP traffic. The decision for the best TTL value is thus dependent on the level of balancing that we want and how many intermittent delays we think the visitor will accept before deciding that the Web-cluster approach has reduced the quality of service. In practice, a TTL of one hour has been shown to be adequate.
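The trade-off can be illustrated with a small simulation. The following Python sketch is a toy model under stated assumptions: a single caching DNS server sits between the visitors and our round-robin server, one lookup arrives every 60 seconds for two hours, and there are four back-ends. All class and function names are hypothetical.

```python
class RoundRobinAuthority:
    """Our own DNS server: rotates through the back-end names."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._pointer = 0

    def resolve(self):
        answer = self._backends[self._pointer]
        self._pointer = (self._pointer + 1) % len(self._backends)
        return answer


class CachingResolver:
    """A foreign DNS server that reuses an answer until its TTL expires."""

    def __init__(self, authority, ttl_seconds):
        self._authority = authority
        self._ttl = ttl_seconds
        self._cached = None
        self._expires_at = 0
        self.upstream_queries = 0  # DNS traffic that reaches our server

    def resolve(self, now):
        if self._cached is None or now >= self._expires_at:
            self._cached = self._authority.resolve()
            self._expires_at = now + self._ttl
            self.upstream_queries += 1
        return self._cached


def simulate(ttl_seconds, duration_seconds=7200, interval_seconds=60):
    """Return (lookups answered per back-end, queries sent to our server)."""
    authority = RoundRobinAuthority([f"www{i}.foo.dom" for i in range(1, 5)])
    resolver = CachingResolver(authority, ttl_seconds)
    hits = {}
    for now in range(0, duration_seconds, interval_seconds):
        backend = resolver.resolve(now)
        hits[backend] = hits.get(backend, 0) + 1
    return hits, resolver.upstream_queries


long_ttl_hits, long_ttl_queries = simulate(ttl_seconds=3600)
short_ttl_hits, short_ttl_queries = simulate(ttl_seconds=60)
```

In this model, the one-hour TTL costs only two queries to our DNS server but sends all lookups to just two of the four back-ends, while the one-minute TTL spreads the lookups evenly at the price of one query per lookup.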
One problem remains: When we change the SOA resource record in the zone file for foo.dom in order to achieve the effect for www.foo.dom, we also change the TTL of all other entries in this zone file. For instance, ftp.foo.dom is also assigned the decreased TTL, which increases the DNS traffic unnecessarily. To overcome this problem, we have to use another trick: We move the round-robin entry for www.foo.dom into a separate zone file that only gets the decreased TTL. For the configuration, we use a round-robin subdomain, rr.foo.dom, to achieve the effect. See Listing One, Listing Two, and Listing Three for the final BIND configuration.
The Reverse Proxy Approach
The DNS-based approach is simple and elegant, but has some drawbacks. The caching of DNS data and the simple round-robin decision scheme of BIND restrict its usefulness. For instance, when one of the back-end servers crashes, www.foo.dom is unavailable to every visitor who was directed to that back-end server, at least for the TTL we used. Even hitting the Reload button in the browser won't work, because once a particular back-end server is resolved, it remains the contact point for that particular visitor until the address information expires. The round-robin scheme also treats all back-end servers equally. For example, back-end servers can't be selected depending on the requested URL. Perhaps we only want to run CPU-intensive jobs (CGI programs, for instance) on a subset of the back-end servers to avoid slow static-data serving.
The solution is a "reverse proxy," an HTTP proxy server that operates in the opposite direction of the commonly known one, hence the name. Usually, an HTTP proxy server is used near or in front of the browsers to bundle requests (when using a firewall) and to reduce bandwidth waste by performing data caching. Browsers call their proxy with the absolute URL http://www.foo.dom/bar/quux/ and the proxy either forwards this request to parent proxies or requests the relative URL /bar/quux/ from www.foo.dom. In other words, the proxy either forwards absolute URLs or translates them to relative URLs. In contrast to this, a reverse proxy masquerades as the final www.foo.dom server, and translates the relative URL back to an absolute URL addressed to one of its back-end servers.
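The two URL translations can be sketched in a few lines of Python. This is only an illustration of the direction of each translation; the function names are hypothetical, and the back-end www3.foo.dom is picked arbitrarily for the example.

```python
from urllib.parse import urlsplit


def forward_proxy_request(absolute_url):
    """Forward proxy: absolute URL -> (origin host, relative URL)."""
    parts = urlsplit(absolute_url)
    relative = parts.path or "/"
    if parts.query:
        relative += "?" + parts.query
    return parts.netloc, relative


def reverse_proxy_request(relative_url, backend_host):
    """Reverse proxy: relative URL -> absolute URL on a back-end server."""
    if not relative_url.startswith("/"):
        raise ValueError("expected a relative URL starting with '/'")
    return f"http://{backend_host}{relative_url}"


# A browser asks its proxy for the absolute URL ...
host, relative = forward_proxy_request("http://www.foo.dom/bar/quux/")
# ... and the reverse proxy re-addresses the relative URL to a back-end.
backend_url = reverse_proxy_request(relative, "www3.foo.dom")
```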
Figure 2 shows how a reverse proxy resides side-by-side with the back-end servers and, from the point of view of browsers and other proxies, appears to be the final Web server. But instead of serving the request itself, it determines a proper back-end server on-the-fly, turns the request over to it, and then forwards the response. No DNS tricks are needed here; www.foo.dom now actually resolves to the IP address of the reverse proxy in the DNS. For security and/or speed considerations, the back-end servers can even be placed on a separate subnet that stays behind the reverse proxy (see Figure 3). This way, you separate the communication traffic between the reverse proxy and its back-end servers and even avoid N officially assigned IP addresses and DNS entries for the back-end servers. Additionally, you can place the back-end servers behind your company's firewall. A reverse proxy is a very elegant solution that provides maximum flexibility for your network topology.
Once this network/machine topology has been established, numerous benefits emerge. First, we have a single point of access -- the reverse proxy. This leads to simplified traffic logging and monitoring of the Web site, although we are now using a Web cluster instead of a single server. Secondly, we now have complete control over the back-end delegation scheme, because it's done locally in the reverse proxy for each request and not cached somewhere on the Internet. Because the delegation scheme is performed locally, changes are activated immediately. For instance, when one of the back-end servers crashes, we just change the delegation configuration of the reverse proxy, so the crashed back-end no longer leads to errors for the visitors. After it is repaired we can reactivate it as simply as we deactivated it.
One problem remains: Which hardware, software, and configuration can be used to implement the reverse proxy? The choices are many. There are some dedicated proxy-software packages (for example, Squid Internet Object Cache, Netscape's or Microsoft's Proxy Server, and Sun's Netra Proxy Cache Server) and hardware-based solutions (such as Cisco Systems' LocalDirector and Coyote Point Systems' Equalizer) that can be used as reverse proxies. As part of the Apache team, I've designed a pragmatic and cheap, but nevertheless high-performance and flexible, all-in-one solution.
Because we have to rewrite a mass of relative URL requests to absolute URL requests to the back-end servers, we need a scalable server with a powerful URL-rewriting engine and an HTTP-proxy engine. Apache already provides these with its preforking process model and its mod_rewrite and mod_proxy modules. So, the idea is to strip down the full-featured Apache modules and configure them according to our required functionality.
Functionality was the problem here because mod_rewrite lacked the ability to perform random selection. After thinking about the reverse-proxy functionality, we also noticed that mod_proxy lacked the ability to divert HTTP responses back to itself. Because high performance is a major requirement for a reverse proxy, the only alternative would be Squid, the most popular dedicated proxy. But it's not easier to use this program as a reverse proxy. So we decided to stay on the Apache track and enhance it to create a full-featured reverse proxy. At the time of this writing, the patches are being considered for inclusion in the official Apache sources for version 1.3b6, but currently only Apache 1.3b5 is available. So we first had to create the Apache binary out of the original source code and the patches.
Listing Four shows a script that automatically builds the binary. (See "Online" for corresponding patches and sample configuration files.)
After running this script, we receive a binary named apache-rproxy, which is a heavily stripped-down Apache with the missing functionality added. So now we can start configuring it as our reverse proxy. We assume that we have a pool of exactly six back-end Web servers, named www1.foo.dom through www6.foo.dom. www5.foo.dom and www6.foo.dom should be dedicated to running CPU-intensive jobs, while www1.foo.dom through www4.foo.dom should mainly serve the static data. Inside these two subsets of back-end servers, the traffic should be balanced.
Let's start with the configuration of our available back-end servers. We create a file named apache-rproxy.conf-servers in Listing Five. Under the key static, we list the servers that serve the static data, and under the key dynamic, we list the ones dedicated to dynamic data.
Additionally, we need the actual Apache configuration file that controls the apache-rproxy binary (see Listing Six). This file first sets up the runtime parameters, then configures the auxiliary files that Apache uses, including a custom log file that shows only the request delegation. Next, we add more directives to make Apache quiet on startup and to avoid runtime side effects. Then we activate the online status monitor for our proxy through the URL www.foo.dom/rproxy-status. The actual reverse-proxy configuration follows. First we turn on the URL rewriting engine without logging. Then we activate the apache-rproxy.conf-servers file by defining a rewriting map called servers that has random subvalue post-processing enabled (the rnd feature, which was added by our patch). We then have to make sure that the status monitor is handled locally, rather than by the back-end servers, and that no one on the Internet can exploit us by using our reverse proxy as a standard proxy.
The delegation scheme is now ready to be implemented. First, we delegate all URLs for CGI programs and SSI pages to the servers under the key dynamic, that is, to either www5.foo.dom or www6.foo.dom. The one used is chosen randomly by mod_rewrite. All other URLs are then delegated to the servers under the key static. The delegation is activated by passing the URL through the Apache proxy module mod_proxy, while setting the environment variable SERVER to provide the logging module with complete information to write the delegation log file. Then, we make sure that no other URLs survive. Next, the configuration activates mod_proxy as a plain proxy without caching. We then configure mod_proxy as a reverse proxy by using the second feature we've patched into our Apache program: We force our reverse proxy to rewrite all URLs in the HTTP Location headers that the back-end servers send on HTTP redirects, so that they point back to the reverse proxy itself. We do this because either the back ends are not directly accessible, or we want all traffic to flow through our reverse proxy and avoid bypassed traffic.
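The decision logic of this delegation scheme can be modeled in Python as follows. This is a sketch of the logic only, not the Apache configuration from Listing Six. It assumes that CGI programs and SSI pages are recognizable by the suffixes .cgi and .shtml, and the function names delegate and divert_location are hypothetical.

```python
import random
import re

# The two server subsets described in the text.
POOLS = {
    "static": ["www1.foo.dom", "www2.foo.dom", "www3.foo.dom", "www4.foo.dom"],
    "dynamic": ["www5.foo.dom", "www6.foo.dom"],
}
# Assumption: CGI and SSI URLs end in .cgi or .shtml (optionally with a query).
DYNAMIC_URL = re.compile(r"\.(cgi|shtml)(\?.*)?$")
STATUS_PATH = "/rproxy-status"
PUBLIC_HOST = "www.foo.dom"


def delegate(request_target, rng=random):
    """Return ("local", path), ("forbidden", None), or ("proxy", absolute_url)."""
    if request_target == STATUS_PATH:
        # The status monitor is served by the reverse proxy itself.
        return ("local", request_target)
    if not request_target.startswith("/"):
        # An absolute URL means someone is trying to use us as a standard proxy.
        return ("forbidden", None)
    key = "dynamic" if DYNAMIC_URL.search(request_target) else "static"
    backend = rng.choice(POOLS[key])  # random selection inside the subset
    return ("proxy", f"http://{backend}{request_target}")


def divert_location(location):
    """Point a back-end redirect (Location header) back at the reverse proxy."""
    for servers in POOLS.values():
        for backend in servers:
            prefix = f"http://{backend}/"
            if location.startswith(prefix):
                return f"http://{PUBLIC_HOST}/" + location[len(prefix):]
    return location  # redirects to foreign sites are left untouched
```

For example, delegate("/cgi-bin/search.cgi") yields a URL on www5.foo.dom or www6.foo.dom, while divert_location("http://www2.foo.dom/bar/") yields http://www.foo.dom/bar/.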
Server Processes
Finally, we must calculate the number of servers (NOS) and the amount of RAM in MB (MBR) that we need for our reverse proxy. To calculate these values, we need three input parameters: the maximum number of HTTP requests per minute (RPM) we expect, the average number of seconds an HTTP request needs to be completely served (SPR = seconds per request), and the maximum number of MB an apache-rproxy process needs to operate under the operating system (SPS = server process size). The formulas in Example 2 assume that because of lingering socket closes, 20 percent of the servers are not always available. These formulas also assume that we conservatively want to use only 70 percent of the available memory for our reverse proxy, and that we have only 16-MB chunks of RAM available.
For instance, when we run our reverse proxy under FreeBSD (see text box entitled "Performance Tuning Apache Under FreeBSD"), we see with the commands ps or top that each server process requires between 700 KB and 900 KB of RAM. So we have SPS = 0.9 MB. Because we expect approximately 1000 requests per minute, we use RPM = 1000. HTTP benchmarks, or just a rough estimate, show that the processing time is between 0.5 and 4 seconds per request, so we use SPR = 2 seconds. Using these formulas, we see that we need NOS = ceil(1000 * 2 * (1/60) * (100/80)) = 42 servers to start, and that we have to make sure that our machine has at least MBR = ceil((0.9 * 42 * (100/70)) / 16) * 16 = 64 MB of total RAM installed.
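The calculation above is easy to repeat for other traffic figures with a few lines of Python. The sketch below follows the formulas as they can be read off the worked example; the function and parameter names are assumptions made for this illustration.

```python
import math


def number_of_servers(rpm, spr, availability=0.80):
    """NOS: server processes needed for rpm requests/minute at spr seconds each.

    availability is the fraction of servers not tied up by lingering closes.
    """
    return math.ceil(rpm * spr / 60 / availability)


def megabytes_of_ram(nos, sps, usable_fraction=0.70, chunk_mb=16):
    """MBR: total RAM in MB, rounded up to whole chunks of chunk_mb."""
    return math.ceil(sps * nos / usable_fraction / chunk_mb) * chunk_mb


# The values from the FreeBSD example: RPM = 1000, SPR = 2, SPS = 0.9.
nos = number_of_servers(rpm=1000, spr=2)
mbr = megabytes_of_ram(nos, sps=0.9)
print(nos, mbr)  # prints "42 64"
```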
In Summary
We've looked at two methods for distributing the load across a number of low-cost servers. The DNS method is fairly standard, but does have its problems, including caching of DNS data and the potential that an individual machine in the cluster may fail. The reverse-proxy method improves reliability and control by allowing the administrator to designate subclusters for specific tasks, but we could go even further. For instance, we could write a script on the reverse proxy that periodically polls the back-end machines through rsh or ssh, and then adjusts the apache-rproxy.conf-servers table accordingly, restarting the reverse proxy through a kill -USR1 to reread the configuration file and replace the server processes with new ones. And finally, wouldn't it be nice if we could do some sort of server balancing based not on load, but on the actual bandwidth? This may soon be possible, as I am currently working on Apache::SiteSwitch, a mod_perl-based module that will allow for a distributed set of servers (each one has its own particular Internet link) to negotiate for the optimum world-wide path between client and server. This will be especially interesting for Web sites with international mirrors.
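The core of such a polling script might look like the following Python sketch. It makes several assumptions: instead of rsh or ssh, it simply tries to open a TCP connection to each back-end's HTTP port, and the names is_alive and live_pools are hypothetical. Writing the result back to apache-rproxy.conf-servers and signalling the proxy are left out here.

```python
import socket


def is_alive(host, port=80, timeout=2.0):
    """Assumed health check: can a TCP connection to the HTTP port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def live_pools(pools, probe=is_alive):
    """Return a copy of pools that lists only servers the probe reports alive."""
    return {
        key: [server for server in servers if probe(server)]
        for key, servers in pools.items()
    }


# Usage with a stand-in probe, pretending that www2.foo.dom has crashed.
pools = {
    "static": ["www1.foo.dom", "www2.foo.dom", "www3.foo.dom", "www4.foo.dom"],
    "dynamic": ["www5.foo.dom", "www6.foo.dom"],
}
crashed = {"www2.foo.dom"}
remaining = live_pools(pools, probe=lambda host: host not in crashed)
```

Passing the probe as a parameter keeps the filtering logic testable without network access and makes it easy to swap in a different check later.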
(Get the source code for this article here.)
Ralf is a computer science student at the Technische Universität München (TUM), Germany, and a member of the Apache Group and FreeBSD developer teams. He can be reached at rse@engelschall.com.