

Analysis of Web Data

In this section we study the information longevity characteristics of real web pages. After describing our data sets in Section 3.1, we measure the (lack of) correlation between information longevity and change frequency in Section 3.2. Then, in Section 3.3, we propose and validate a generative model for dynamic web content. Lastly, in Section 3.4, we measure the potential performance gain from using a longevity-aware revisitation policy.




Chris Olston 2008-02-15