We specialize our model of optimal recrawling from Section 2.2 to the case where the divergence metric of interest is holistic staleness.
Following the model of [4], suppose each page $P$ experiences updates according to a Poisson process with rate parameter $\lambda_P$, where each update is substantial enough to cause the crawled version to become stale. Consider a particular page $P$ that was most recently refreshed at time $0$. The probability that $P$ has undergone at least one update during the interval $[0, t]$ is $1 - e^{-\lambda_P t}$. Hence the expected divergence is given by $D_P(t) = 1 - e^{-\lambda_P t}$. The expected utility of refreshing $P$ at time $t$, according to Equation 4, is found to be:

$$U_P(t) = \frac{1 - e^{-\lambda_P t}}{\lambda_P} - t\, e^{-\lambda_P t}$$
The policy of refreshing a page whenever its expected utility satisfies $U_P(t) \geq T$ (for some constant $T$) results in schedules identical to those of the optimal policy derived in [4] for the same holistic staleness objective.
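
The threshold policy can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes that the expected divergence under the Poisson model is $1 - e^{-\lambda t}$ and that the utility of Equation 4 is the area between the divergence level at time $t$ and the divergence curve over $[0, t]$, which gives the closed form $(1 - e^{-\lambda t})/\lambda - t e^{-\lambda t}$. The function names `expected_divergence`, `utility`, and `refresh_time` are hypothetical and chosen for illustration only.

```python
import math


def expected_divergence(rate, t):
    """Probability of at least one Poisson update in [0, t] (assumed staleness model)."""
    return -math.expm1(-rate * t)  # equals 1 - exp(-rate * t), computed stably


def utility(rate, t):
    """Assumed closed form of t * D(t) - integral_0^t D(x) dx for D(t) = 1 - exp(-rate * t)."""
    return expected_divergence(rate, t) / rate - t * math.exp(-rate * t)


def refresh_time(rate, threshold, iterations=100):
    """Earliest t with utility(rate, t) >= threshold, or None if it is never reached.

    The utility is non-decreasing in t (its derivative is rate * t * exp(-rate * t))
    and approaches 1 / rate from below, so bisection on t is sufficient.
    """
    if threshold <= 0.0:
        return 0.0
    if threshold >= 1.0 / rate:
        return None  # utility stays below 1 / rate, so the threshold is never met
    lo, hi = 0.0, 1.0
    while utility(rate, hi) < threshold:  # grow the bracket until it contains the crossing
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if utility(rate, mid) >= threshold:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    T = 0.3  # hypothetical threshold constant
    for lam in (0.5, 1.0, 2.0, 4.0):  # hypothetical update rates
        print(lam, refresh_time(lam, T))
```

Under this closed form the utility of a page is bounded above by $1/\lambda$, so a page whose update rate is at least $1/T$ never reaches the threshold $T$ and is never scheduled for refresh in this sketch.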