Overall, crawling resources must be divided between discovering and retrieving new content on the one hand and refreshing old content on the other [7]. Hence there is an intrinsic tradeoff between freshness and coverage. In view of this tradeoff, the following overall crawling strategy seems appropriate: whenever refreshing old content would boost freshness significantly, do so; dedicate all remaining resources to acquiring new content.
From Section 2.2 we know that basing refresh decisions on a fixed utility threshold, with utility measured according to Equation 4, is optimal in terms of freshness achieved per unit cost. We leave the utility threshold as a parameter to be set by a human administrator, who can judge the relative importance of freshness and coverage in the relevant business context. The threshold is set properly iff (1) it would be preferable to receive a freshness boost whose magnitude (in units of divergence time) equals the threshold rather than download a new page, and (2) it would be preferable to download a new page than to receive any smaller freshness boost.
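A minimal Python sketch of this threshold-based strategy, under stated assumptions: every download costs one unit, and each known page carries a precomputed refresh utility (the hypothetical `utility` field) standing in for the Equation 4 value, which is not reproduced here. `KnownPage` and `plan_crawl_cycle` are illustrative names, not part of the original system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KnownPage:
    url: str
    # Hypothetical: precomputed refresh utility (stand-in for Equation 4),
    # expressed in units of divergence time.
    utility: float


def plan_crawl_cycle(
    known: List[KnownPage], new_urls: List[str], budget: int, threshold: float
) -> Tuple[List[str], List[str]]:
    """Split `budget` unit-cost downloads between refreshes and new pages.

    Assumption: all downloads cost the same. Pages whose utility meets the
    threshold are refreshed (highest utility first); whatever budget is left
    goes to acquiring new content.
    """
    candidates = sorted(
        (p for p in known if p.utility >= threshold),
        key=lambda p: p.utility,
        reverse=True,
    )
    refreshes = [p.url for p in candidates[:budget]]
    remaining = budget - len(refreshes)
    discoveries = new_urls[:remaining]
    return refreshes, discoveries


known = [KnownPage("a", 5.0), KnownPage("b", 0.5), KnownPage("c", 2.0)]
refresh, fetch_new = plan_crawl_cycle(known, ["x", "y", "z"], budget=4, threshold=1.0)
print(refresh, fetch_new)  # ['a', 'c'] ['x', 'y']
```

Raising the threshold shifts budget toward coverage, and lowering it shifts budget toward freshness, which is the tradeoff the administrator is asked to judge.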
In a parallel crawler, the value of the utility threshold may be broadcast to all nodes at the outset of crawling (and during occasional global tuning). Subsequently, all refresh scheduling decisions are local, because they depend only on the threshold and on a given page's change profiles.
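The locality of these decisions can be illustrated with a small simulation. This sketch assumes hash-based URL partitioning across nodes (via `zlib.crc32`) and per-page utilities already computed from each node's own change profiles; `CrawlNode`, `owner`, and `NUM_NODES` are hypothetical names. Each node applies the broadcast threshold to its own pages only, and the union of the local decisions matches what a single global scheduler would choose.

```python
import zlib
from typing import Dict, List, Optional

NUM_NODES = 3  # hypothetical cluster size


def owner(url: str, num_nodes: int) -> int:
    # Assumed stable hash partitioning; any deterministic scheme would do.
    return zlib.crc32(url.encode("utf-8")) % num_nodes


class CrawlNode:
    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.threshold: Optional[float] = None
        # Hypothetical per-page utilities derived from local change profiles.
        self.utilities: Dict[str, float] = {}

    def receive_threshold(self, threshold: float) -> None:
        # The only global input: broadcast once (or on occasional retuning).
        self.threshold = threshold

    def pages_to_refresh(self) -> List[str]:
        # Purely local decision: broadcast threshold + this node's own pages.
        if self.threshold is None:
            raise RuntimeError("threshold has not been broadcast yet")
        return sorted(u for u, util in self.utilities.items() if util >= self.threshold)


utilities = {"a.com/1": 4.2, "b.com/2": 0.3, "c.com/3": 1.7, "d.com/4": 2.5}
nodes = [CrawlNode(i) for i in range(NUM_NODES)]
for url, util in utilities.items():
    nodes[owner(url, NUM_NODES)].utilities[url] = util

threshold = 1.5
for node in nodes:
    node.receive_threshold(threshold)

local = sorted(u for n in nodes for u in n.pages_to_refresh())
global_view = sorted(u for u, util in utilities.items() if util >= threshold)
assert local == global_view
print(local)  # ['a.com/1', 'c.com/3', 'd.com/4']
```

No node consults another node's pages when deciding what to refresh; only the threshold value crosses node boundaries.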