In this paper we characterize the longevity of information found on the web, using both empirical measurements and a generative model consistent with those measurements. We then develop new recrawl scheduling policies that take longevity into account. Experiments over real web data show that our policies achieve better freshness at lower cost than previous approaches.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval
General Terms: Algorithms, Experimentation, Measurement, Theory