Client Side Load Balancing for the Web

Steve Lewontin
Elizabeth Martin
Open Group Research Institute
11 Cambridge Center,
Cambridge MA, 02142
stevel@opengroup.org
emartin@opengroup.org

Abstract

Mirroring is widely used to improve the availability of documents on the Web. Mirroring should be transparent to users, but Web software does not incorporate mechanisms to choose among replicated sites. Instead, users must choose without knowing which site will give the best performance. The usability of replicated sites is greatly enhanced by incorporating load balancing logic into clients. Traditional server oriented load balancing mechanisms are not suited for this purpose. We have designed a simple algorithm that allows clients to choose sites based on past experience. This algorithm is combined with a mechanism to automatically discover replicated sites and present all replicas via a single URL, thus providing complete transparency to users. Our implementation of this mechanism uses a directory service, though we also propose mechanisms that can work without directory services. The client side load balancing mechanism we have developed improves usability by increasing performance, hiding replication and site selection from users, and providing transparent fail-over when replicated sites go down.

Introduction

Mirroring is widely used to improve the availability of documents from heavily loaded Web sites, but Web browsers have no means to select intelligently among mirrored sites. Instead, lists of mirrored sites, sometimes organized by geography, are made visible for users to select from. In effect, the user's best guess about which site to access is the only load-balancing algorithm available among mirrored sites. Unfortunately, users often have no way of knowing which site will actually yield the best performance. Moreover, making mirrored site lists visible on Web pages sharply conflicts with the Web's underlying user model of links as abstract connections that hide the physical locations and storage of documents.

Ideally, the Web should support document replication transparently to users: browsers, proxies, and other Web software should incorporate the intelligence to choose among replicated sites, and the fact that documents are replicated should be hidden from users by single links that can access multiple replicas. Such a system increases the usability of the Web by hiding the complexity of load balancing from users, improving the availability of high-demand documents, and better supporting the user model of hyperlinks. We have implemented such a system as part of our work on the Secure Enterprise Web. In this paper, we describe the design of our system, discuss its usability and performance, and show how it can be incorporated into a variety of Web environments.

The Uses of Replication

Mirroring is one example of replication, a widely used technique for increasing the scalability, availability, and performance of distributed systems. Much replication work deals with the problem of maintaining consistency among replicated sites, but this is generally not an issue for Web site mirroring, since updates are infrequent and a high level of consistency is not required. Typically, what is replicated are static documents or software for download, and there is no serious penalty if some sites are temporarily not up to date. Much work has also been done on load balancing among replicated servers, but the focus has been on small, tightly coupled clusters. On the Web, such load balancing is typically achieved by DNS round-robin or by a hardware-based mechanism that distributes requests among several nodes at a single site. However, little work has been done on load balancing among multiple replica sites [1].

One reason mirroring is so widely used on the Web is that internet latency is a major factor in download times, and local clusters cannot reduce it. By scattering mirror sites across the topology of the internet, ad hoc load balancing across internet bandwidth is achieved [2]. However, the load balancing techniques used for closely coupled clusters are not much help for widely separated replica sites. The most common technique is simply to round-robin among servers in a cluster. This model assumes that there is some single point of contact for the site--typically a router--that can switch requests among servers. Clearly, this makes no sense for the mirroring case, where the goal is to avoid the bottleneck of a single point of contact or failure.

One of the basic assumptions of our model is that with mirroring the locus of server selection needs to be moved from the server site to the browser. This radically changes the view of load balancing. In the cluster case, the cluster site wants to distribute load across its servers. In the case of a user who wants to access a document on a mirrored site, on the other hand, the distribution of load over replica servers is not directly interesting. What the user wants is the lowest latency path to a document, and the factors that make this up include not just server load, but network latency as well.

What this means is that load balancing algorithms that make sense for clusters may make no sense for a client trying to select among mirrored sites. For example, while round-robin is a reasonable mechanism to distribute load across multiple nodes at a cluster site, it would clearly be counterproductive for a client since it would mean randomly switching from low latency to high latency sites, a sure way to guarantee poor performance. Similarly, mechanisms have been developed to actively measure server load in a cluster and direct requests to the least loaded server. But when network latency between clients and servers is high relative to the time period for significant variation in server load, this feedback mechanism fails because servers will receive document requests based on out of date load data.

To deal with these problems, we designed a client side load balancing mechanism based on very different principles. The basic constraints we set for ourselves were that site selection should be completely transparent to users, that the client should not actively probe servers or the network to measure load or latency, and that the mechanism should add as little extra network traffic as possible.

Our solution consists of a simple client-side load balancing algorithm combined with a mechanism for hiding replica selection from the user.

A Simple Client Side Algorithm

The algorithm we implemented can be summarized by saying that the client uses past experience to make a best guess about which site is likely to provide good performance. The rationale is that the overall performance of a download is a compound of many factors affecting network latency and server load, but the client can neither measure nor predict these separate factors with any accuracy. Furthermore, attempts to actively measure these factors may be counterproductive. For example, pinging sites to determine network latency expends bandwidth and time to measure a highly ephemeral quantity. Instead of actively attempting to measure and predict performance, we accept that the client must make a guess based on limited data, and assume that past performance is a reasonable predictor of future performance.

Our model is a simple one: a decision about which server to try should be based on an aggregate of past results, extending over a reasonable period of time, but the choice needs to take into account the possibility that a server that usually gives good results may be having a bad day. The basic algorithm is as follows: the client obtains a list of replica site locations (more about how this is done in practice later). Such a list, which we call a binding vector, represents the underlying abstraction of a single link to replicated sites. In parallel to each binding vector, the client maintains a load vector in which it stores performance information for each site listed in the binding vector [3]. As the client makes requests to sites, it updates a running average of performance (measured as total bytes sent and received divided by total time) for each site, based on the last n downloads from that site. The value of n is chosen so that a reasonable number of past downloads are considered while very old data is eventually eliminated. The load vector is saved between browser sessions so that long term performance history is accumulated.
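
To make the bookkeeping concrete, the following fragment sketches one way the load vector and its running average might be maintained. It is a minimal illustration in Python, not the actual implementation: the names (SiteStats, LoadVector, HISTORY_LEN) and the JSON persistence are assumptions made for the sketch.

    import json
    from collections import deque

    HISTORY_LEN = 10   # "n": how many recent downloads contribute to the average

    class SiteStats:
        """Running average of throughput (bytes transferred / elapsed time)."""
        def __init__(self):
            self.samples = deque(maxlen=HISTORY_LEN)  # the last n throughputs
            self.slow = False                         # transient flag, never persisted

        def record(self, nbytes, seconds):
            self.samples.append(nbytes / max(seconds, 1e-6))

        def average(self):
            return sum(self.samples) / len(self.samples) if self.samples else None

    class LoadVector:
        """Per-site performance history kept in parallel with a binding vector."""
        def __init__(self, binding_vector):
            self.stats = {site: SiteStats() for site in binding_vector}

        def record(self, site, nbytes, seconds):
            self.stats[site].record(nbytes, seconds)

        def save(self, path):
            # Only the averages survive between sessions; slow flags do not.
            with open(path, "w") as f:
                json.dump({s: list(st.samples) for s, st in self.stats.items()}, f)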

When deciding which site to use, the client selects the one with the best historical average, but with a number of qualifications. First, we compare the performance of the last request with a switch-over threshold. If performance is not up to par, the site is flagged as slow. Slow-flagged sites are temporarily eliminated from the site selection decision, so that a recent slow request will cause the client to switch over to the historically next best server. Untried sites (which thus have no historical data) are only tried when all other sites are either untried or marked as slow. After a time-out period, or if there are no suitable sites remaining, retries of slow-flagged sites are permitted. While historical performance data is saved in persistent storage, slow flags are not, so that initial selection during a client session is made on the basis of historical data only.

When making a document request based on a new binding vector, the client simply makes a guess about which site to contact. In practice, the client may have some heuristic for choosing the first site--for example, an ordered list of domain names--but the model does not assume that the client has any real knowledge about the best site to choose that would make its guess any better than random. In fact, what we implement is a simple random selection from the binding vector. After some sites have been tried, the algorithm implies that other sites will not be tried as long as there are known sites whose performance is above the switch-over threshold. Some sites may never be tried although they may potentially offer better performance than any of the currently active sites. This is a trade-off that our model accepts since the goal is to get acceptably good performance rather than the theoretical best. The algorithm can be provoked to test more sites by raising the switch-over threshold, but the threshold must not be raised too high or the client wastefully tries too many sites.
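
The selection rule itself can be sketched as follows, reusing the SiteStats and LoadVector fragment above. The threshold value and helper names are illustrative; 200K bytes/second is simply the setting used in the test reported later (see note [7]).

    import random

    SWITCH_THRESHOLD = 200_000   # bytes/second; see note [7] on tuning this value

    def choose_site(binding_vector, load_vector):
        tried   = [s for s in binding_vector
                   if load_vector.stats[s].average() is not None]
        untried = [s for s in binding_vector if s not in tried]
        usable  = [s for s in tried if not load_vector.stats[s].slow]

        if usable:
            # Best historical average among sites not currently flagged slow.
            return max(usable, key=lambda s: load_vector.stats[s].average())
        if untried:
            # No acceptable known site: take a guess at an untried one.
            return random.choice(untried)
        # Every known site is flagged slow: permit a retry of the historically best.
        return max(tried, key=lambda s: load_vector.stats[s].average())

    def record_result(load_vector, site, nbytes, seconds):
        load_vector.record(site, nbytes, seconds)
        # A single request below the switch-over threshold flags the site as slow,
        # so the next selection falls to the historically next best server.
        load_vector.stats[site].slow = (nbytes / max(seconds, 1e-6)) < SWITCH_THRESHOLD

A time-out that clears slow flags is omitted from the sketch for brevity; as described above, slow flags are also discarded between sessions because only the historical averages are persisted.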

We have also considered more sophisticated load averaging algorithms than a simple running average, but we are not convinced that there is much to be gained in this direction. Ultimately, the client is simply generating a guess about which server is likely to give good performance based on very limited data. The ideal behavior is to find an acceptable site and stick with it as long as it remains acceptable: what might be called the "pretty good performance" principle.

Transparent Replication

The other major component of client side load balancing is a mechanism to hide selection among replicated sites from the user. The idea is to fully support the user model of Web links as connections to abstract document locations that hide such details as network addresses, storage, and replication. In describing the transparency mechanism here, we discuss some of the details of our implementation, but only as a guide to issues that we believe will arise in any implementation. We originally implemented client side load balancing as part of the Secure Enterprise Web, a package that includes security, directory, and replication services for enterprise Webs. The package is now licensed as part of a number of commercial products, but we designed the load balancing mechanism with the express intention that the same principles could be applied independently of other elements of the package.

We built the load balancing logic into a small Web proxy since this is the easiest way to make it available to any browser (via standard proxy HTTP). The load-balancing logic could also be implemented using one of the plug-in APIs or some other extension technique supported by browser makers. However, a proxy has the important advantage that it can be shared among browsers at a site, allowing several browsers to pool the historical knowledge they gain about site performance. The nature of the algorithm is that it improves as more requests are made and sites are tested. (In the degenerate case where a replicated site is accessed only once, the algorithm is simply a random guess.) Having several clients at a site share load data greatly improves the chance that they will benefit from load balancing.

The basic model of the load balancing module (whether implemented as a proxy or by some other means) is to examine the URLs requested by clients to discover whether they point to replicated sites. When they do, the module examines its cache to see if there is a corresponding binding and load vector pair. If there is, the module applies the load balancing logic to determine where to forward the client's request. If not, the module first constructs and caches the vectors and then applies the load balancing logic. A simple fail-over mechanism is also layered on top of the load balancing logic. When a request fails because a server is down, the server is marked as slow, and the load balancer retries with another server.
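
Put together, the request path of such a module might look like the sketch below, which reuses choose_site, record_result, and LoadVector from the earlier fragments. The helpers replica_name_of (sketched later), resolve_replicas, fetch_direct, and fetch_from are hypothetical stand-ins for URL recognition, the directory lookup, and the HTTP forwarding code.

    import time

    def handle_request(url, cache):
        key = replica_name_of(url)              # None if the URL is not replicated
        if key is None:
            return fetch_direct(url)            # ordinary, non-replicated request

        if key not in cache:
            binding = resolve_replicas(key)     # e.g. a directory service query
            cache[key] = (binding, LoadVector(binding))
        binding, load = cache[key]

        for _ in range(len(binding)):           # at most one attempt per replica
            site = choose_site(binding, load)
            try:
                start = time.monotonic()
                body = fetch_from(site, url)    # hypothetical HTTP forwarding
                record_result(load, site, len(body), time.monotonic() - start)
                return body
            except ConnectionError:
                # Fail-over: a down server is flagged slow and another is tried.
                load.stats[site].slow = True
        raise ConnectionError("all replicas appear to be down")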

Since a major goal of our model is to have single URLs that point to multiple sites, our design requires some mechanism to deploy such URLs. Our approach is to use a directory service to map a URL into multiple server locations. This can be done transparently to the browser by having the proxy recognize a URL that can be mapped and, if necessary, make an inquiry to the directory service for the appropriate mappings. The problem with this approach for the Web as a whole is that directory services with the necessary functionality are not yet widely deployed. However, browsers and servers that support LDAP are now becoming available, so this dilemma may soon be resolved. In the meantime, we propose a set of transitional mechanisms that can be used in the absence of widely deployed directory services.

Using a Directory Service

The directory service we currently employ is the DCE Cell Directory Service since this is available as part of our enterprise Web package. However, the specifics of this directory service are unimportant since other standard directory services, such as those supported by LDAP servers, provide the same functionality. A directory service stores entries containing data members in a hierarchical name space. The directory service supports a variety of operations to add entries to the name space, to modify the data stored in entries, to retrieve the contents of a named entry, etc.

Our model for directory service usage is that all replicas for a given site export their addresses to the directory service so that they can be accessed under a single name [4]. A URL to replicated sites contains this name. When the load balancing module detects such a URL, it extracts the name and queries the directory service for all of the server addresses stored under that name. The information received back from the query is used to populate a binding vector stored in the load balancer's cache.
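
In code, populating a binding vector from the directory might look like the following sketch. The directory object and its lookup_group and read_address calls are hypothetical stand-ins; in our implementation the equivalent queries go to the DCE Cell Directory Service, and an LDAP server could play the same role.

    def load_binding_vector(directory, group_name):
        """Return the server addresses exported under a single directory name."""
        # The group entry names one member entry per replica; each member entry
        # holds the address information for a single server (see note [4]).
        members = directory.lookup_group(group_name)          # hypothetical call
        return [directory.read_address(entry) for entry in members]

    # Example: all replicas exported under "/.:/www/ri" become one binding vector.
    # binding = load_binding_vector(directory, "/.:/www/ri")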

A couple of subtleties need to be pointed out here. One is that the load balancer needs an efficient mechanism to recognize URLs that can be resolved by a directory service. Once LDAP URLs are available, this will not be an issue, but with HTTP URLs some cue must be embedded in the URL to tell the load balancer to search its cache or try the name service. Our load balancer implementation recognizes a DCE Cell Directory Service name (which always begins with "/..." or "/.:"), but other mechanisms are possible [5].

For example, suppose a set of replicated sites export their addresses to the directory service under the name "/.:/www/ri ". A typical replica URL in our implementation would then look something like this [6]:


http://localhost/.:/www/ri/public_html/specs/index.html

When the load balancer recognizes the directory service syntax (a path beginning with "/.:"), it passes that portion of the URL to the directory service.
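
The recognition step itself is simple; a minimal sketch, assuming the DCE Cell Directory Service prefixes mentioned above:

    from urllib.parse import urlparse

    def replica_name_of(url):
        """Return the directory-service portion of a replica URL, or None."""
        path = urlparse(url).path
        if path.startswith("/.:") or path.startswith("/..."):
            return path        # handed to the directory service for resolution
        return None

    # replica_name_of("http://localhost/.:/www/ri/public_html/specs/index.html")
    # returns "/.:/www/ri/public_html/specs/index.html"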

It is very important to note that the portion of the above URL passed to the directory service ("/.:/www/ri/public_html/specs/index.html") contains both a name known to the name service ("/.:/www/ri") and a document name known only to the replicated servers ("/public_html/specs/index.html"). However, the load balancer has no means to parse the name into these two pieces: only the directory service knows how to do that. To resolve this problem, it is very useful for the directory service to support a partial name resolution mechanism. Such a mechanism allows a query to be made for a name that is longer than any name known to the directory service. In response to such a request, the directory service returns the resolved and unresolved portions of the name to the client. In the absence of such a service, the URL needs to contain some kind of syntactic cue about where the two portions of the name join, for example by treating the document name as a query string.
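
A rough sketch of how partial name resolution fits in, assuming a hypothetical resolve_partial call that returns the resolved and unresolved portions of the name along with the addresses stored under the resolved part:

    def split_replica_name(directory, name):
        # resolved   -> e.g. "/.:/www/ri"                    (known to the directory)
        # unresolved -> e.g. "/public_html/specs/index.html" (known only to the servers)
        resolved, unresolved, addresses = directory.resolve_partial(name)
        return addresses, unresolved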

One final point is that the load balancing service must not transfer the work of load balancing from Web servers to the directory service. Directory services do support distributed replication so that some of the cost of load balancing queries can be borne by the directory service without substituting one bottleneck for another. Nevertheless, one of our design constraints is to avoid extra network traffic. Therefore, the load balancer cache is designed to make requests to the directory service as infrequently as possible. One of the beauties of using a directory service for replication is that sites can be added to and removed from the replica list as servers come up and go down. The caching algorithm causes the load balancer to try to reload bindings from the directory service if all servers are marked as down and also provides for configurable aging of the cache. This allows the load balancer to take advantage of new servers if needed without requiring too frequent access to the directory service.
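
As a rough illustration of this caching policy, the earlier request-flow sketch could consult the directory only when its cached entry has aged past a configurable limit or every server in it is flagged as down. The constants and structure here are assumptions for the sketch, not the actual parameters of our implementation.

    import time

    CACHE_MAX_AGE = 3600   # seconds; configurable aging of cached bindings

    def get_bindings(cache, directory, key):
        entry = cache.get(key)
        stale = entry is None or time.monotonic() - entry["fetched"] > CACHE_MAX_AGE
        all_down = entry is not None and all(
            st.slow for st in entry["load"].stats.values())
        if stale or all_down:
            binding = load_binding_vector(directory, key)   # directory query
            entry = {"binding": binding, "load": LoadVector(binding),
                     "fetched": time.monotonic()}
            cache[key] = entry
        return entry["binding"], entry["load"]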

Other Approaches

Directory services will probably become widely available on the Web at some point in the not too distant future, but for now it seems useful to provide transitional mechanisms to support client side load balancing using current facilities. We are currently experimenting with several approaches which use the same basic technique: the load balancer filters documents as they are downloaded and extracts vectors of replicated server addresses from them on the fly. In each case, the downloaded document itself contains the information that maps one URL to many addresses so that no directory service is required.

One simple method is to have the load balancing proxy filter documents as they are downloaded and map what appear to be lists of mirror sites into a single URL, which is inserted into the downloaded document. This method has the advantage that it requires no cooperation from servers and can be used with current mirror sites. The method is not highly reliable, however, since there is no guaranteed way to identify lists of mirror sites in the running text of an HTML document. Also, modifying a downloaded document to insert new URLs is a questionable practice, since it changes the appearance of the document. The alternative approaches involve adding either HTTP extension headers or HTML comment fields that describe how to map a single URL to a set of replica URLs. These mechanisms are more reliable and do not require the proxy to modify documents. However, they can only be implemented with the cooperation of servers to insert the required headers, or of document authors to insert the required HTML tags.
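
For example, the header-based variant might be handled roughly as follows. The header name used here is purely illustrative; no standard header is implied.

    def binding_from_headers(headers):
        """Extract a binding vector from a hypothetical replica-list header."""
        raw = headers.get("X-Replica-Locations")    # e.g. "http://a/, http://b/"
        if raw is None:
            return None
        return [loc.strip() for loc in raw.split(",") if loc.strip()]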

While these approaches work, we are reluctant to propose them as standard mechanisms for replication because we believe that they still violate the correct data model by inserting replication information into documents, even if this is done on the server side and is invisible to users. Replication data is by nature dynamic, changing as servers come up and go down, and it is not an attribute of document content but of document storage and location. Embedding replication data in document text makes documents non-portable, and adding replication data to headers means that stale data will be held in proxy and browser caches. We believe that a replicated, lightweight directory service like LDAP is ideally suited as a medium for conveying dynamic replication information.

Load Balancing Behavior

The goal of our load balancing work is to produce a system where the user clicks on a single URL and gets transparent access to a replica site if the document pointed to is replicated. Part of this behavior is embodied in the load balancing algorithm, which attempts to assume the burden of choosing a replica site with reasonable intelligence. Thus, one obvious test of the load balancing system is to measure its performance compared with, for example, a purely random choice of sites. The following table shows the results of a test in which WebStone drives load-balancing proxies to retrieve documents from two replicated servers. Server A runs on a heavily loaded machine and server B on a lightly loaded machine, in order to simulate the behavior a client might see when downloading documents from servers with varying loads and network latencies:

Configuration                            Average Throughput (bits per sec)
Server A up, Server B down                                         96,621
Server B up, Server A down                                      1,014,387
Servers A and B up, load balancing                                998,020
Servers A and B up, random selection                              643,134

As the table shows, use of the load balancer gives throughput nearly as good as the best server and roughly 50% better than a purely random selection between servers.

One of the things that our tests consistently show is the sensitivity of load balancing behavior to the threshold setting. If the value is set too low, the load balancer may become stuck on a poorly performing server, so average performance may be no better than a random guess. If the value is set too high, the load balancer wastes time trying too many sites. In our current implementation the threshold level is set by hand, but we have begun to investigate mechanisms to set the threshold automatically to the optimum level [7].

Despite the positive results, it is important to point out that this kind of performance testing does not really measure some of the important benefits of the load balancing mechanism. Even if the algorithm is replaced by a purely random choice, the user still gets the benefit of transparency and fail-over. If servers are taken down and brought up during the test, the load balancer transparently finds the best currently available server, so that server failures are visible to the user only as a possibly longer latency. An important component of the benefits of the load balancer is thus subjective. The Web appears to be easier to use, availability is higher, and the user sees a more consistent model of the Web as an abstract space of documents connected by hyperlinks.

Notes

  1. For a description of cluster load balancing for the Web, see Eric Dean Katz, Michelle Butler, and Robert McGrath, A Scalable HTTP Server: The NCSA Prototype, NCSA, University of Illinois, 1994. For discussion of server selection among distributed sites, see, for example, Robert Carter and Mark Crovella, Dynamic Server Selection Using Bandwidth Probing in Wide-Area Networks. Computer Science Department, Boston University, BU-CS-96-007, March 18, 1996; Michael Garland et al. Implementing Distributed Server Groups for the World Wide Web, School of Computer Science, Carnegie Mellon University, 25 January 1995, CMU-CS-95-114.
  2. It is worth noting that caching is often a highly efficient replication mechanism and that a hierarchy of caching proxies has often been proposed as a general solution to load balancing for the internet. (See for example, Chankhunthod, A. et al., A Hierarchical Internet Object Cache, Proceedings of USENIX 1996 Annual Technical Conference, pp. 153-163.) This may in fact offer a more scalable solution than mirroring in the long run, but the infrastructure is currently not in place. One of the reasons that mirroring is so widely used is that it can be applied on a site-by-site basis without requiring any new internet supported infrastructure.
  3. The fact that load and binding vectors are separate objects is an implementation detail. However, this permits us to abstract the protocol and addressing information used for connections from the load data. In practice, we have implemented the load balancing mechanism as a library that knows nothing about specific protocols and addresses so that we can use the same mechanism with multiple protocols.
  4. The fact that server addresses are accessible by the same name does not imply that they are all stored in the same entry. In our implementation we make use of group entries that hold the names of a set of entries, each of which contains address information for a single server. The directory service supports queries through group entries.
  5. For example, the DNS or IP address portion of the URL is no longer needed to identify the server that will handle the request (since the directory service resolves this). Instead, this can be used to hold the address of a directory service.
  6. The "localhost" DNS name in the sample URL is merely a placeholder, since URLs require something in this field. In fact, when our load balancer implementation sees a DCE Cell Directory Service name, it ignores the DNS name. As pointed out in note [5], another implementation would be to use this field to hold the address of a directory server.
  7. In the test shown, the threshold value is set at 200K bytes/second. Setting the value low enough (roughly 10K bytes/second) causes the load balancer to stick with server A in roughly half the tests, giving performance no better than random selection on average.



