

Information Longevity Distribution

Figure 3: Change frequency versus information longevity.

Figure 3 plots change frequency versus information longevity for the high-quality data set. Each point denotes a page. Change frequency is the number of snapshots that differ from the preceding snapshot, normalized to $[0,1]$. Information longevity is the average lifetime of the fragments (shingles) on the page, where a shingle's lifetime is the number of contiguous snapshots in which it occurs. Shingles present in the initial or final snapshot are excluded, since their lifetimes cannot be determined.
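The two per-page measures can be sketched as follows. This is a minimal illustration, not the code used to produce Figure 3. It assumes that each snapshot is represented as a set of shingle strings, that two snapshots "differ" when their shingle sets differ, that normalization divides by the number of consecutive snapshot pairs, and that each maximal contiguous run of a shingle counts as one lifetime. The function names `change_frequency` and `information_longevity` are hypothetical.

```python
from typing import List, Optional, Set


def change_frequency(snapshots: List[Set[str]]) -> float:
    """Fraction of snapshots that differ from the preceding snapshot.

    Assumption: dividing by the number of consecutive pairs is one
    plausible way to normalize the count to [0, 1].
    """
    if len(snapshots) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(snapshots, snapshots[1:]) if prev != cur)
    return changes / (len(snapshots) - 1)


def information_longevity(snapshots: List[Set[str]]) -> Optional[float]:
    """Average shingle lifetime, measured in contiguous snapshots.

    Shingles present in the initial or final snapshot are skipped,
    because their true lifetimes cannot be observed.
    Returns None when no shingle has a measurable lifetime.
    """
    if len(snapshots) < 3:
        return None
    excluded = snapshots[0] | snapshots[-1]
    candidates = set().union(*snapshots) - excluded
    lifetimes = []
    for shingle in candidates:
        run = 0
        for snap in snapshots:
            if shingle in snap:
                run += 1
            elif run:
                # Assumption: each maximal contiguous run is one lifetime.
                lifetimes.append(run)
                run = 0
        # No trailing run is possible: the shingle is absent from the final snapshot.
    return sum(lifetimes) / len(lifetimes) if lifetimes else None


# Illustrative data only: five snapshots of one hypothetical page.
page = [{"a", "b"}, {"a", "c"}, {"a", "c"}, {"a", "d"}, {"a"}]
freq = change_frequency(page)
longevity = information_longevity(page)
```

In this toy page, shingle "a" never changes and is excluded because it appears in the first and last snapshots, so only the short-lived shingles "c" and "d" contribute to the longevity estimate.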

The substantial spread of the points in Figure 3 indicates that information longevity is not strongly correlated with change frequency (the correlation coefficient is $0.67$). In other words, knowing the change frequency of a page does not allow us to accurately predict the longevity of its content. This observation, combined with the fact that information longevity is a major determinant of an effective page revisitation strategy (Section 1), motivates the study of information longevity on the web.
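A correlation coefficient over per-page measurements could be computed along the following lines. The text does not state which coefficient was used; this sketch assumes the Pearson coefficient, and the helper name `pearson` and the sample values are purely illustrative rather than taken from the data set.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Made-up values, one entry per hypothetical page.
change_freqs = [0.1, 0.4, 0.5, 0.9, 1.0]
longevities = [1.0, 6.0, 2.0, 8.0, 3.0]
r = pearson(change_freqs, longevities)
```

The coefficient measures only linear association across pages, so a moderate value can coexist with a wide scatter at any given change frequency.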


Chris Olston 2008-02-15