

Theoretical Framework

The scenario we consider is as follows. A crawler has acquired the content associated with a set of pages $\mathcal{P}$. Each page $P \in \mathcal{P}$ may undergo autonomous updates after the crawler's first visit, causing the web-resident copy of $P$ to drift from the crawler's copy. A page revisitation policy governs the schedule on which the crawler refreshes its local copy of $P$: the crawler revisits the web-resident copy and reacquires its content, bringing the local copy back in sync with the web copy, if only temporarily.
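The scenario above can be made concrete with a small simulation. The following is only an illustrative sketch under simplifying assumptions that are not part of the framework itself: time advances in discrete steps, each copy of a page is summarized by a version counter, and the revisitation policy is a fixed revisit interval. The names `Page`, `refresh`, and `simulate` are hypothetical, and the count of stale steps is a stand-in quantity for illustration, not one of the metrics defined in this paper.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A page P with a web-resident copy and the crawler's local copy.

    Version counters are an assumption made for illustration; the
    framework only requires that the two copies can drift apart.
    """
    web_version: int = 0    # state of the web-resident copy
    local_version: int = 0  # state of the crawler's local copy

    def update(self) -> None:
        # Autonomous update on the web; the crawler does not observe it.
        self.web_version += 1

    def is_stale(self) -> bool:
        return self.local_version != self.web_version


def refresh(page: Page) -> None:
    """Revisit the web copy and reacquire its content."""
    page.local_version = page.web_version


def simulate(update_times, revisit_interval, horizon):
    """Count the time steps at whose end the local copy is out of sync.

    update_times: steps at which the web copy changes (assumed known
    only to the simulation, never to the crawler).
    revisit_interval: the crawler revisits every this-many steps
    (a deliberately naive, fixed-interval revisitation policy).
    """
    page = Page()
    updates = set(update_times)
    stale_steps = 0
    for t in range(1, horizon + 1):
        if t in updates:
            page.update()
        if t % revisit_interval == 0:
            refresh(page)
        if page.is_stale():
            stale_steps += 1
    return stale_steps


if __name__ == "__main__":
    for interval in (1, 3, 6):
        print(interval, simulate([2, 5], interval, horizon=6))
```

With updates at steps 2 and 5 over a horizon of 6 steps, revisiting every step keeps the copies in sync throughout, whereas longer intervals leave the local copy out of sync for more steps. Each refresh restores synchronization only until the next autonomous update.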




Chris Olston 2008-02-15