A Light-weight, Temporary File System for Large-scale Web Servers

Jun Wang
Department of Computer Science and Engineering
University of Nebraska Lincoln
115 Ferguson Hall, Lincoln,NE 68588-0115
wang@cse.unl.edu
Hailong Cai
Department of Computer Science and Engineering
University of Nebraska Lincoln
115 Ferguson Hall, Lincoln,NE 68588-0115
hcai@cse.unl.edu
Yiming Hu
Department of Electrical & Computer Engineering and Computer Science
University of Cincinnati
Cincinnati, OH 45221-0030
yhu@ececs.uc.edu

ABSTRACT

Several recent studies have pointed out that file I/Os can be a major performance bottleneck for some large Web servers. Large I/O buffer caches often do not work effectively for large servers. This paper presents a novel, light-weight, temporary file system called TFS that can effectively improve I/O performance for large servers with minimal cost. TFS is a much more cost-effective scheme than the full caching policy for large servers. It is a user-level application that manages files on a raw disk and works in conjunction with a regular file system as an I/O accelerator. Since the entire system runs in the user space, it is easy and inexpensive to implement and maintain. It also has good portability. TFS uses a novel disk storage subsystem called Cluster-structured Storage System (CSS) to manage files. CSS uses only large disk reads and writes and does no have garbage collection problems. Comprehensive trace-driven simulation experiments show that, TFS achieves up to 160% better system throughput and reduces at up to 77% I/O latency per URL operation than that in a traditional Unix Fast File System in large Web servers.

1. INTRODUCTION

Various recent studies [1] have shown that disk I/O is one of the important performance bottlenecks for large Web servers. One problem with full caching (cache all Web files in very large RAM) is that the solution would be very expensive at high throughput levels. This is because that the file system size often increases linearly with the server throughput.

Another important change in large-scale Web server traffic is that, client browser caches and proxy servers filter out most access locality at server side. It makes the Least Recently Used (LRU) cache management policy less effective. Greedy-Dual algorithm developed by Cao et al. is a better algorithm to consider locality, size and network latency/cost [2]. But it does not consider how to make I/O read intensive workload (an important character in largescale Web servers) faster.

Other recent techniques such as content distributed network (CDN) or reverse proxy server aim to improve Web server performance by dynamic load balancing, but are not much helpful for I/O problems.

For the above-stated reasons, we argue that, rather than caching a huge file data set in RAM , we should pay more attention to the efficiencies of secondary storage management. We aim to design a cost-effective, high performance I/O subsystem with only a modest I/O buffer cache (such as 256 MB) to replace a full caching scheme (which requires 4 GB of RAM or more) for largeWeb servers. The I/O performance of this new I/O subsystem would reach that of full caching strategy.

Currently most Web servers are designed to run on top of a general purpose file system. Such a design has several inherent performance disadvantages: they cannot efficiently handle small files, leading to very poor performance; there are very few techniques studied to improve the I/O read performance since read requests dominate the file system traffic in Web server workloads; current solutions do not have effective prefetching functions that can be very helpful in Web server workloads with good spatial locality; the meta-data overhead also poses a serious problem for regular file systems.

In order to solve the above-mentioned problems, we may also develop a new file system that is optimized for Web servers to replace the file system of the host OS. Since it is too expensive to implement and maintain a new file system in the kernel, we propose to let Web servers to use raw disks to manage their own data and meta-data and by-pass the regular file systems (RFS). For reasons of compatibility, simplicity and easy implementation, we develop a Temporary File System (TFS) for large Web servers as an accelerator. TFS works in front of a RFS to service repeated file accesses while RFS still handles issues like file creations, invalidations, security and etc. When a file is read for the first time, the system sends another file copy from the RFS to the TFS (resides on a raw disk partition). All later accesses to this file will be handed over to TFS until the file is modified or deleted. The entire process is completely transparent to users. The advantages of such a design are simple but more efficient implementations and good portability.

2. THE DESIGN OF TFS

The system consists of the following major components. They are discussed in details in the rest of this section.

  1. A Cluster-structured Storage System (CSS)
    This is a novel, high performance disk storage system. CSS divides its disk space into large, fix-sized clusters, which are the basic units for disk reads and writes. The cluster size is normally 64 KB or 128 KB. Such large I/Os maximize the use of disk bandwidth and improve the CPU utilization. Read strategy in CSS is very different from those in other file systems such as LFS or FFS. It adopts an affiliated large read policy and eliminates additional meta data I/O overhead by storing meta data together with data on the disk. Because clusters are the basic I/O unit in CSS, when reading a file, the entire cluster that contains that file is read, in a single large disk I/O, into the memory. Many small files are prefetched and will be accessed soon for the spatial locality.
  2. In-memory File Lookup Table
    TFS uses this hash table to retrieve files stored on CSS.With the in-memory file lookup table, all TFS meta-data searches and updates take place only in RAM. No disk directory searches or inode accesses are needed here
  3. A Disk Cluster Table
    TFS uses this table to maintain the cluster status on CSS. A cluster is a large disk block that mainly contains many small files. The concept of clusters will be discussed shortly.
  4. TFS buffer cache
    This is the TFS application buffer cache that stores active files. All files in the buffer are organized in one doublelinked LRU list.
  5. A Group Buffer
    This buffer is a logical region of TFS buffer cache and located in the bottom region of the buffer cache (the Least Recently Used region). It occupies one-tenth space of TFS buffer cache from the LRU side. TFS uses this buffer to group associated files into clusters and save to the CSS.
  6. A Locality-based Grouping Algorithm
    TFS uses this algorithm to group affiliated files based on their locality.

Figure 1 shows the illustration of the system architecture.

3. EXPERIMENTALMETHODOLOGYAND RESULTS

3.1 Experimental Methodology

We used HTTP trace-driven simulation to evaluate the TFS performance. The Unix FFS mounted on the asynchronous mode is chosen as the baseline system. The FFS simulator is built on top of the Greg Ganger's DiskSim. We chose a Quantum Atlas 10K 9.1 GB as the basic model (the largest model provided by DiskSim). It has 40 MB/sec date transfer rate and 4.7 ms seek time. The TFS system simulator consists of a TFS simulator and a RFS (FFS) simulator. To ensure fairness, the I/O buffer cache size of the baseline system (FFS) is the sum of the I/O buffer cache size of RFS and the TFS buffer cache size.

We obtained a very recentWeb trace, UCBSep01, from the HTTP server of the Computer Science Department of University of California at Berkeley. We also obtained two other small traces,World-Cup98 and ClarkNet, from the Internet Achieve and expanded by linear insertions.

3.2 Results

We define URL latency equaling to the sum of URL I/O latency and CPU processing time (including idle time). Figure 2 compares the average disk latency per URL request, calculated by dividing the total disk I/O latency by the total request number. We can clearly see that TFS achieves dramatical performance improvements over FFS -async, reducing the I/O latency per URL request by 18-65%.

Figure 2: Average Disk I/O Latency per URL Request between TFS and FFS

4. RELATED WORK AND CONCLUSIONS

TFS shares some insights with its sister project called UCFS [3], a user-level customized file system to boost I/O performance for Web proxy servers. The main differences are TFS is optimized for Web servers which are almost read-only, while UCFS is for proxy servers, which also see heavy write traffic. Kant and Won [1] examined the characteristics of SPECWeb99 benchmark with respect to its file-caching properties and suggested to address the efficiencies of file cache management, I/O management, and I/O subsystem components instead of attempting to cache huge portions of the file-set.

This paper presents a novel light-weight, custom temporary file system accelerator called TFS for large Web servers. Extensive simulation confirms that TFS can drastically improve large Web server performance.

5. REFERENCES

  1. K. Kant and Y. Won., "Performance impact of uncached file accesses in specweb99," in Proceedings of 2nd IEEE Workshop on workload characterization,, (Austin TX), Oct 1999.
  2. P. Cao and S. Irani, "Cost-aware WWW proxy caching algorithms," in Proceedings of the USENIX Symposium on Internet Technologies and Systems (ITS-97), (Berkeley), pp. 193-206, USENIX Association, Dec. 8-11 1997.
  3. J. Wang, R. Min, Y. Zhu, and Y. Hu, "UCFS - a user-space, high performance, customized file system for web proxy servers," IEEE transactions on computers,vol. 51, pp. 1056-1073, September 2002.