HAWA: A Client-side Approach to High-Availability Web Access

Yi-Min Wang, AT&T Labs, Research, ymwang@research.att.com
P. Emerald Chung, Bell Laboratories, Lucent Technologies, emerald@bell-labs.com
Chih-Mei Lin, AT&T Labs, Research, cmlin@research.att.com
Yennun Huang, Bell Laboratories, Lucent Technologies, yen@bell-labs.com

Abstract

In this paper, we describe a client-side applet-based approach, named HAWA, to improving the availability and quality of Web accesses. HAWA allows users to bookmark any HTTP request, organize bookmarked requests into groups of equivalent services, and invoke the services through the group names. With the built-in mechanisms for automatic retry and parallel accesses, HAWA can mask access failures, provide fast responses, and present multiple responses in a customizable fashion. Several examples are used to demonstrate the practical usefulness of this approach. An implementation using applet-based filtering is described.

1. Introduction

The World Wide Web has become a primary source of information for our daily lives. We store in our browser bookmark files those URLs that we frequently access for information such as stock quotes, news, weather forecasts, product prices, etc. With the explosive growth in the popularity of the Web, however, we may not be able to obtain the information we want within a reasonable response time, either because the Internet is congested or because the Web servers are overloaded. It is therefore important to investigate the availability issues of Web accesses. Consider the scenario where you eagerly want to see the current prices of the stocks that you own, but the quote server is simply not responding. What can be done to improve such a situation?

From the client side, we can view the entire Web as a slow and unreliable information server with heavy redundancy. There are at least three things we can do to improve the quality of Web accesses. First, since server unavailability is sometimes a transient problem, automatic retries can relieve users of the frustration of having to repeatedly submit the same request and see the same error messages. Second, for any kind of popular information, it is almost guaranteed that there will be multiple Web sites providing that information. Automatically retrying another equivalent site when one site is not responding can greatly improve availability. Better yet, the response time can also be improved by issuing parallel accesses to multiple equivalent sites at the same time, and presenting to the user the first reply that comes back. Third, not every service unavailability can be automatically detected. For example, it is not uncommon for a Web server to reply with a normal HTML page containing arbitrary error messages. In this case, the best solution is to present to the user all the responses from multiple parallel accesses, and let the user decide which response to use. As demonstrated later, presenting side-by-side the responses from multiple equivalent sites also has many other advantages.

In this paper, we describe the design and implementation of HAWA (High-Availability Web Access), a client-side approach to providing high availability and ease of access. HAWA allows a user to organize URLs that provide similar information into a group. Once a group is specified, the user can then access the information using the group name, and select the capabilities of automatic retry or parallel accesses.

2. The Design of HAWA

HAWA consists of a registration applet, an access applet, and a few auxiliary HTML files. The target application environment is one in which internet service providers bundle HAWA with the browser software that is sent to customers and installed on the customers' machines.

2.1. Registration

The first step in using HAWA is to access the registration page which contains the registration applet. This page allows a user to create groups, add URLs to groups, and specify retry parameters. In addition, it supports the following three enhanced bookmark functions:

POST request registration: Traditional bookmarks usually allow the user to save only the URLs. That is not sufficient for bookmarking an HTTP POST request, which also contains message content. Figure 1 shows an example POST request in which the last line is the message content.

POST http://qs.secapl.com/cgi-bin/qs HTTP/1.0
Referer: http://www.secapl.com/cgi-bin/qs
Proxy-Connection: Keep-Alive
User-Agent: Mozilla/3.0 (X11; I; IRIX 5.3 IP19)
Host: qs.secapl.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Content-type: application/x-www-form-urlencoded
Content-length: 13

tick=TLC+AMAT

Figure 1: A typical HTTP POST request.

To provide a convenient way for the users to register POST requests, HAWA implements a simple applet-based request recording mechanism which captures the entire request message (as shown in Figure 1) as part of the registered information. More details are described in the Implementation section.
Variable substitution: The variable substitution feature allows the user to, for example, change the last line of Figure 1 to tick=HAWA_INPUT1. The user can then invoke the registered request with arbitrary stock symbols, instead of a fixed set of symbols.
Auto-scrolling: The auto-scrolling feature is particularly important for multi-frame displays. Instead of requiring the user to manually scroll each frame to the part of the page that he is interested in, HAWA allows the user to specify a scroll-tag for each registered request. This tag is used as a keyword at access time to filter the response HTML page to enable auto-scrolling.
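
As an illustration of variable substitution, the following minimal Java sketch fills a recorded request such as Figure 1 with user input at access time. It is not the HAWA source: the class and method names are hypothetical, the numbered placeholders beyond HAWA_INPUT1 are an assumption, and the sketch assumes that the recorded message separates headers from content with a CRLF CRLF sequence and that the supplied values are already form-encoded. Because the message content changes length, the Content-length header is recomputed.

```java
import java.nio.charset.StandardCharsets;

final class RequestTemplate {
    /**
     * Replaces HAWA_INPUT1, HAWA_INPUT2, ... in the message content with the
     * given values and rewrites the Content-length header to match.
     */
    static String instantiate(String recorded, String... inputs) {
        int split = recorded.indexOf("\r\n\r\n");
        if (split < 0) {
            throw new IllegalArgumentException("no blank line after headers");
        }
        String headers = recorded.substring(0, split);
        String body = recorded.substring(split + 4);
        // Substitute the highest index first so that HAWA_INPUT1 never
        // clobbers the prefix of HAWA_INPUT10 and above.
        for (int i = inputs.length; i >= 1; i--) {
            body = body.replace("HAWA_INPUT" + i, inputs[i - 1]);
        }
        int length = body.getBytes(StandardCharsets.ISO_8859_1).length;
        headers = headers.replaceAll("(?im)^Content-length:.*$",
                                     "Content-length: " + length);
        return headers + "\r\n\r\n" + body;
    }
}
```

Calling RequestTemplate.instantiate(recorded, "TLC+AMAT") on a request whose last line is tick=HAWA_INPUT1 yields the message content of Figure 1 with a Content-length of 13.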

2.2. Access

After the registration, the user can go to the access page to invoke HAWA-enabled services. The access page consists of two frames: an access frame and a data frame. The access frame contains the access applet which displays the existing groups, the URLs in the selected group, and the available access modes, as shown in the top frame in Figure 2. After the user clicks on the HAWA Access button, the access applet is responsible for sending out requests according to the specified access mode, and finally displaying the response in the data frame below the access frame. (In Figure 2, the data frame consists of the three lower subframes.) HAWA provides four basic access modes to address the issues discussed in the Introduction.

Same-site retry: When a client submits a URL request, it may not receive a response for a number of reasons including network congestion, server overload, server failures, etc. Since many of these problems are transient in nature, resubmitting the same request at a different time can often bypass the problem. Even when the same problem persists, retrying the same site can still help because the same URL request may in fact be sent to a different server machine. For example, in the popular round-robin DNS scheme, the same host name may be translated into different IP addresses in a round-robin fashion [7] (provided that the local DNS does not cache the mapping for a long time). In the network address translation approach [4], the same IP address can be translated into the IP address of any of the server machines in a cluster. Even when the request is re-sent to exactly the same IP address of a failed server, a backup server may have taken over that IP address and therefore can process that request.
Since manual retry is often a very unpleasant experience, HAWA provides an automatic retry mechanism. For each URL, the user specifies three retry parameters: timeout, retry period, and maximum number of retries. When the Same-site retry access mode is selected, HAWA performs periodic retry based on these parameters until either a response comes back or the maximum is reached.
Sequential retry: The previous option is often used when the user has a strong preference for the primary site in a group and the unavailability of that site is mostly transient. If that is not the case, then trying a different site after an access failure may be a better choice. When the user selects the Sequential retry access mode, HAWA sets up a timer for each request according to the user-supplied timeout parameter. If the connection attempt fails or the timer expires, HAWA immediately sends another request to the next site in the group. This process repeats until either a request succeeds or the list is exhausted. Since different sites are rarely down at the same time, Sequential retry can often greatly improve service availability.
Parallel-any: When network traffic load is not a great concern or when the user needs a response as soon as possible, the Parallel-any mode can be used to improve response time as well as availability. When this mode is selected, HAWA makes multiple simultaneous connection attempts to all of the URLs in the group. When one of the connections is established, all the other connection attempts are aborted, and the response from the successful connection is displayed to the user. Although sending multiple connection requests incurs additional overhead, studies have shown that dynamic server selection in general outperforms static selection [6].

Figure 2: Parallel-all access mode for obtaining high-confidence forecast information.
Parallel-all: This access mode performs parallel accesses to all URLs belonging to the same group, and displays all the responses to the user. The original motivation was to use information redundancy to verify correctness and tolerate information errors. A common sentence seen on the Web is "We cannot guarantee the accuracy of this information." Since all of the sites in a group provide basically the same information, the user can compare the multiple responses to detect if there is any discrepancy. For example, the user may want to see the winning lottery numbers from multiple sites before he discards his tickets. Other types of information such as weather forecasts simply cannot be guaranteed to be correct. If a user needs weather information for arranging a party, it would be very useful for him to see the weather forecasts from multiple sites at the same time as shown in Figure 2, and either take a majority vote or plan for the worst case. Figure 2 also shows the two display options for the Parallel-all mode: Parallel-all(f) displays responses in a multi-frame page, and Parallel-all(w) opens a new browser window for each response. Since we introduced the Parallel-all access mode, the users have discovered many new applications. Three of them are described below.
- Since multiple equivalent sites often update their information at different times, Parallel-all allows the users to obtain the most up-to-date information. For example, when given stock quotes from multiple sites, a user can determine which quotes are most up-to-date by comparing the volumes. This can also be applied to other types of information such as sports scores, election results, news headlines, etc.
- The two retry modes and the Parallel-any mode do not work when service unavailability cannot be detected by a low-level mechanism. For example, a stock quote server may (and did) reply with an otherwise normal HTML page containing messages that apologize for not updating the data because its data feed is down. Since it is most likely not as heavily loaded as the other sites, its useless response often comes back first, thereby defeating the purpose of Parallel-any. As another example, a proxy server may send back a default page with error messages when it fails to connect to the site requested by the user. For these scenarios where service unavailability can best be identified by a human being, the Parallel-all mode is often the best choice to obtain correct information.
- Figure 3 shows an example of using HAWA to compare product prices when doing on-line book shopping. Although it can be difficult to extend the same approach to general shopping, the Parallel-all mode is very useful for shopping for products with unique identifiers such as ISBN numbers. All three sites in Figure 3 use the POST method for ISBN search. So the user needs to invoke the POST request registration function to create the ISBN Pricing group. Since repeatedly shopping for the same book is not very useful, the variable substitution function is used to allow a new ISBN each time. Finally, to make it easy to compare prices, all three pages are auto-scrolled to the pricing information as shown. It has been observed that each of these three on-line bookstores offers the lowest prices for different books. So a page like Figure 3 is very useful for finding the best bargain.

Figure 3: Parallel-all access mode for on-line shopping.
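
The control flow of the two retry modes described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the HAWA implementation: the Fetcher interface is a hypothetical stand-in for a single timed HTTP access that throws an IOException on a connection failure or timeout, and all names are invented for the example.

```java
import java.io.IOException;
import java.util.List;

final class RetryModes {
    /** One access attempt; expected to throw IOException on failure or timeout. */
    interface Fetcher {
        String fetch(String url, int timeoutMillis) throws IOException;
    }

    /** Same-site retry: periodically retry one URL until success or the maximum. */
    static String sameSiteRetry(Fetcher fetcher, String url, int timeoutMillis,
                                long retryPeriodMillis, int maxRetries)
            throws IOException, InterruptedException {
        IOException last = new IOException("no attempt made");
        // One initial attempt plus at most maxRetries retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                Thread.sleep(retryPeriodMillis);  // wait one retry period
            }
            try {
                return fetcher.fetch(url, timeoutMillis);
            } catch (IOException e) {
                last = e;  // remember the failure and try again
            }
        }
        throw last;
    }

    /** Sequential retry: on failure, move on to the next site in the group. */
    static String sequentialRetry(Fetcher fetcher, List<String> group,
                                  int timeoutMillis) throws IOException {
        IOException last = new IOException("empty group");
        for (String url : group) {
            try {
                return fetcher.fetch(url, timeoutMillis);
            } catch (IOException e) {
                last = e;  // this site failed; fall through to the next one
            }
        }
        throw last;  // the list is exhausted
    }
}
```

Keeping the access itself behind an interface separates the retry policy from the network code, so the same loops apply to both GET and recorded POST requests.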

3. Implementation

3.1. Applet-based request/response filtering

A basic mechanism used in the implementation of HAWA is to intercept and filter HTTP requests and responses. For example, a fragment tag needs to be inserted into a response page to enable auto-scrolling, and any outgoing POST request may need to be captured for registration. A natural way to perform request/response filtering is to insert a proxy server that sits between the client and the server and intercepts every HTTP request and response. An earlier version of HAWA was built using such a proxy-based implementation. It was later migrated to the current applet-based implementation for the following two reasons. First, HAWA is not intended for improving the availability of arbitrary HTTP requests. It is therefore desirable to activate HAWA only when the user requests HAWA-enabled services, without getting in the way of the user's other browsing activities. The applet-based implementation allows HAWA to be activated only when the registration or access page is being accessed. Second, since the target users are customers of commercial internet access providers, it is much easier for them to go to the HAWA pages to access the service than to start a separate proxy server on their PCs.

Figure 4 shows the architecture for applet-based filtering used in HAWA. In the access page, the access applet starts a thread which opens a server socket (at port number 8282, for example) that listens for requests coming from the browser. Another thread sends out requests and filters responses according to the user selections. When the final response is ready, the applet invokes the showDocument() call with the URL argument http://localhost:8282/. The effect is that the browser will send a request to the server socket, and the applet then supplies the final response through that socket to be displayed in the data frame. If security is a concern, the URL can also contain a one-time password generated by the applet. When the server socket receives a connection request, it checks the client IP address and verifies the password to make sure that it is indeed the containing browser making the request.
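
The server-socket side of this exchange can be sketched in plain Java as follows. The sketch is illustrative only and makes several assumptions that are not from HAWA: the class name, the pw query parameter that carries the one-time password, and the choice to bind only to the loopback address are all hypothetical. Passing port 0 lets the operating system pick a free port; passing 8282 would match the example above.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

final class LocalResponseServer implements Closeable {
    private final ServerSocket server;
    private final String token;  // one-time password (hypothetical format)

    LocalResponseServer(int port) throws IOException {
        server = new ServerSocket(port, 4, InetAddress.getLoopbackAddress());
        token = Long.toHexString(new SecureRandom().nextLong());
    }

    int port() {
        return server.getLocalPort();
    }

    /** The URL that would be handed to showDocument(). */
    String url() {
        return "http://localhost:" + port() + "/?pw=" + token;
    }

    /** Serves one request; returns true if the page was delivered. */
    boolean serveOnce(String html) throws IOException {
        try (Socket s = server.accept()) {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    s.getInputStream(), StandardCharsets.ISO_8859_1));
            String requestLine = in.readLine();
            // Drain the remaining request headers up to the blank line.
            String header = requestLine;
            while (header != null && !header.isEmpty()) {
                header = in.readLine();
            }
            // Accept only a local client that presents the password.
            boolean ok = s.getInetAddress().isLoopbackAddress()
                    && requestLine != null
                    && requestLine.startsWith("GET /?pw=" + token + " ");
            byte[] body = (ok ? html : "Forbidden")
                    .getBytes(StandardCharsets.ISO_8859_1);
            String head = (ok ? "HTTP/1.0 200 OK" : "HTTP/1.0 403 Forbidden")
                    + "\r\nContent-Type: text/html\r\nContent-Length: "
                    + body.length + "\r\n\r\n";
            OutputStream out = s.getOutputStream();
            out.write(head.getBytes(StandardCharsets.ISO_8859_1));
            out.write(body);
            out.flush();
            return ok;
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }
}
```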

Figure 4: Applet-based filtering. (The numbers in parentheses indicate the order of events.)

To register a POST request, the user first types the URL of the site providing the POST form into a text field inside the registration frame. The registration applet calls showDocument() to ask the browser to fetch the URL and display it in the data frame. In addition, the applet opens a server socket at port number 8383 and gets ready to act as a temporary HTTP proxy to intercept outgoing requests. After the user fills out the form in the data frame, he changes the browser proxy setting by pointing the HTTP proxy to localhost:8383. When he hits the Submit button, the browser sends the entire request message to the proxy socket where the applet receives it and displays it in a text field in the registration frame. The user can then click the Register button to save the request message together with the retry parameters in a group. If the user would like to use the variable substitution feature, he can edit the content of the request message before clicking the button. If desired, the same process can also be used to register GET requests.
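
The recording step amounts to reading one complete request message from the proxy socket. The following Java sketch shows one way to do this, assuming the browser sends CRLF-terminated headers and a Content-length header for the message content, as in Figure 1. The class and method names are hypothetical and the code is not taken from HAWA.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RequestRecorder {
    /** Reads one HTTP request (headers plus Content-length bytes) as text. */
    static String record(InputStream in) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int b;
        int matched = 0;  // how many bytes of CR LF CR LF have been seen
        while (matched < 4 && (b = in.read()) != -1) {
            head.write(b);
            boolean expectCr = (matched % 2 == 0);
            if (b == (expectCr ? '\r' : '\n')) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
        String headers = new String(head.toByteArray(),
                                    StandardCharsets.ISO_8859_1);
        // A request without Content-length (e.g. GET) has no message content.
        Matcher m = Pattern.compile("(?im)^Content-length:\\s*(\\d+)")
                           .matcher(headers);
        int length = m.find() ? Integer.parseInt(m.group(1)) : 0;
        byte[] body = in.readNBytes(length);
        return headers + new String(body, StandardCharsets.ISO_8859_1);
    }
}
```

In the setting described above, the input stream would come from the connection accepted on the temporary proxy socket, and the returned text is what would be shown to the user for editing before registration.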

3.2. Implementation of access modes

When one of the retry modes is selected, the access applet tries to open a socket connection to the requested site. It also starts a timer thread based on the user-specified timeout value. If the connection request results in an I/O exception or if the timer expires, another connection request is automatically initiated. When the applet successfully receives a response, it performs necessary filtering and supplies the response to the browser through the server socket. The Parallel-any access mode follows a similar procedure except that multiple threads are created, each starting a connection attempt to one of the equivalent sites. Upon the first successful connection, all the other threads are destroyed.
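
A present-day Java sketch of the Parallel-any idea is given below. It swaps in the standard ExecutorService.invokeAny call in place of hand-managed threads, and it differs from the description above in one respect: it returns the first successful response rather than committing to the first successful connection. The Fetcher interface and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ParallelAny {
    /** One access attempt; any exception counts as a failed site. */
    interface Fetcher {
        String fetch(String url) throws Exception;
    }

    /**
     * Starts one task per URL in the (non-empty) group and returns the first
     * response that completes successfully.
     */
    static String fetchFirst(List<String> group, Fetcher fetcher,
                             long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService pool = Executors.newFixedThreadPool(group.size());
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : group) {
                tasks.add(() -> fetcher.fetch(url));
            }
            // invokeAny returns one successful result and cancels the rest.
            return pool.invokeAny(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();  // interrupt any attempt still running
        }
    }
}
```

If every site fails, invokeAny throws an ExecutionException; if none answers within the timeout, it throws a TimeoutException.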

When the Parallel-all(f) option is selected, the applet first supplies the browser with a multi-frame HTML page with the number of frames equal to the number of URLs in the selected group. If auto-scrolling is not specified for a URL, the corresponding frame tag contains that URL, which is then directly fetched by the browser without filtering. If auto-scrolling is specified for a URL, the frame tag contains the file name of an empty page and defines the frame name. The applet fetches the URL, parses the response to find the user-specified keyword, and inserts an HTML fragment tag. It then issues a showDocument() call to ask the browser to overwrite the empty frame with the filtered response and scroll to the fragment tag. The Parallel-all(w) mode is implemented by calling showDocument() with undefined frame names.
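
The two page-construction steps for Parallel-all(f) can be illustrated with the following Java sketch. It is a simplified illustration, not the HAWA filter: the class name, the hawaN frame names, and the column layout are assumptions, and the keyword search is a plain text match that does not check whether the keyword falls inside an HTML tag.

```java
import java.util.List;

final class ParallelAllPage {
    /** Inserts a named anchor just before the first occurrence of the keyword. */
    static String addScrollAnchor(String html, String keyword, String anchorName) {
        int at = html.indexOf(keyword);
        if (at < 0) {
            return html;  // keyword not found: leave the page unchanged
        }
        return html.substring(0, at)
                + "<a name=\"" + anchorName + "\"></a>"
                + html.substring(at);
    }

    /** Builds a multi-frame page with one equally sized frame per source. */
    static String frameset(List<String> frameSources) {
        StringBuilder sb = new StringBuilder("<html><frameset cols=\"");
        for (int i = 0; i < frameSources.size(); i++) {
            sb.append(i == 0 ? "*" : ",*");
        }
        sb.append("\">\n");
        for (int i = 0; i < frameSources.size(); i++) {
            sb.append("<frame name=\"hawa").append(i)
              .append("\" src=\"").append(frameSources.get(i)).append("\">\n");
        }
        return sb.append("</frameset></html>\n").toString();
    }
}
```

A frame source is either the registered URL itself (no auto-scrolling) or the name of an empty page that is later overwritten with the filtered response, displayed at a URL ending in the fragment identifier, for example #hawa.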

4. Related Work

Several techniques have been proposed to provide transparent fault tolerance and load balancing by using more than one machine to serve the same URL [1,4,7,9]. This is similar to the idea of Parallel-any. One difference is that these techniques are developed from the page providers' point of view, and the emphasis is on providing information from any of several hosts serving exactly the same content. In contrast, HAWA is developed from the users' point of view, and the emphasis is on obtaining information from any of several hosts serving approximately equivalent contents as defined by each individual user. Moreover, most of those techniques use server-side approaches, while HAWA is a client-side approach. The Smart Client scheme [9] has a flavor similar to HAWA's Parallel-any in that it also provides a Java applet at the client side to dynamically select one of several equivalent sites. However, it is still a service-centric approach: the service provider provides the applet, the name resolver within the applet uses a service-specific mechanism, and the load and fault tolerance information is also service-specific. In contrast, HAWA is a user-centric approach: the applets that HAWA provides are not tied to any specific services, and the group information is all defined by the user.

The Internet Engineering Task Force (IETF) has been working on the issue of location-transparency for many years. The Uniform Resource Name (URN) scheme has been introduced to provide a location-independent naming mechanism [10]. Again, the URN architecture mainly allows a service provider to change the mapping between URLs and the resource names. HAWA's approach can be viewed as providing a personalized URN scheme.

A study by Crovella and Carter [6] showed that the access latency to a given site is not strongly correlated to the physical location or to the number of hops between the client and the server. Given a list of similar services, dynamic selection based on polling in general outperforms static selection. This study confirms that using the Parallel-any access mode on a group of URLs in general provides a better response time than always accessing a particular URL.

Web sites that provide one-stop shopping have a flavor similar to HAWA's Parallel-all. For example, the Computer ESP Web site provides one-stop shopping for computer software and equipment [5]. It is implemented by indexing several on-line stores and providing a single search engine to the end users. The search engine is set up as a cgi-bin program at their Web site. Compared to HAWA, this kind of domain-specific cgi-bin-based approach to Parallel-all can usually provide a better integration of the responses from multiple sources. However, it is domain-specific and not generally applicable. Also, the availability of the Web site that provides such a service greatly affects the availability of information.

The paper by Ladd et al. [8] described MHTML as an extension to HTML for defining Multi-Head/Multi-Tail (MHMT) links. A multi-headed link points at multiple nodes all of which are opened in separate browser windows when the link is followed. This is similar to the Parallel-all(w) option. Compared to HAWA, the MHTML scheme provides Parallel-all for general HTTP links, instead of specific bookmarked groups. However, it requires an enhanced browser to understand the MHTML extension in order to provide this general functionality.

Finally, proxy-based services for providing value-added content transformations have been quite popular and successful [2,3]. As discussed earlier, HAWA was migrated from a proxy-based implementation to the current applet-based implementation because its focus is on providing value-added filtering only for user-specified URLs, and it needs to be tightly integrated with the browsers for the target application environment.

5. Conclusions

We have defined four access modes for obtaining information from groups of URLs that provide similar services. The Same-site retry mode is useful for masking transient access failures. The Sequential retry mode allows the users to specify a personalized list of backup sites for each type of information. The Parallel-any mode provides a single-service-name image for highly available and fast Web accesses. The Parallel-all mode presents multiple responses in a customizable way to enable the best interpretation of all available information. In addition, we have identified several enhanced bookmark functions such as POST request registration, variable substitution, and auto-scrolling, that further provide flexibility and convenience. We have implemented these functionalities in HAWA using an applet-based filtering architecture to provide ease of use through tight integration with the browsers.

References

1: E. Anderson, D. Patterson, and E. Brewer. The Magicrouter, an Application of Fast Packet Interposing, USENIX Symposium on Operating Systems Design and Implementation (OSDI), 1996.
2: C. L. Brooks, M. Mazer, S. Meeks, and J. Miller. Application-specific Proxy Servers as HTTP Stream Transducers, Fourth International World Wide Web Conference, 1995.
3: C. L. Brooks, Wide Area Information Browsing Assistance Final Technical Report.
4: Cisco Local Director, http://www.cisco.com/warp/public/751/lodir/index.html.
5: Computer ESP Search site.
6: M. E. Crovella and R. L. Carter, Dynamic Server Selection in the Internet, In Proceedings Third IEEE Workshop on the Architecture and Implementation of High Performance Communication Subsystems, Aug 1995.
7: T. Kwan, R. McGrath, and D. Reed. NCSA's World Wide Web Server: Design and Performance, IEEE Computer, pp. 68-74, Nov. 1995.
8: B. Ladd, M. Capps, P. Stotts and R. Furuta, Multi-Head Multi-Tail Mosaic: Adding Parallel Automata Semantics to the Web, Fourth International World Wide Web Conference, 1995.
9: C. Yoshikawa et al., Using Smart Clients to Build Scalable Services, USENIX '97.
10: C. Weider and P. Deutsch. A Vision of an Integrated Information Service.
