

Online Revisitation Policies

We now turn our attention to online page revisitation policies. An online policy is not given any a priori information about page change behavior, and must expend refreshes in order to learn how pages behave. There is little previous work on this topic.

In this section we present two online revisitation policies. We begin in Section 4.1 by introducing a data structure common to both policies, called a change profile. We then present the two policies in Sections 4.2 and 4.3. Both are based on our underlying theory of optimal refreshing (Section 2.2) and are governed by a utility threshold parameter $T$; we discuss how to choose $T$ in Section 4.4. Lastly, in Section 4.5 we give a method for bounding the risk of overfitting to past observations.




Chris Olston 2008-02-15