To validate our generative model, we analyzed the real fragment lifetime distribution of pages from the high-quality data set. We focused on a set of pages that have the same average change frequency . We assigned an estimated value to every non-static fragment, based on the number of page update events the fragment ``survives'' (i.e., remains on the page). For each page we found the most common value among its fragments.
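The lifetime estimate and the per-page dominant value described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for the data set: it assumes each page is available as a chronological list of snapshots, each a set of fragment identifiers (a hypothetical representation), that a fragment does not reappear once removed, and that a fragment's value is the number of snapshot-to-snapshot transitions it survives. The function names are illustrative.

```python
from collections import Counter

def survived_updates(snapshots):
    """Return {fragment: number of update events it survives}.

    snapshots: chronological list of sets of fragment ids (assumed format).
    Static fragments (present in every snapshot) are skipped.
    """
    first_seen, last_seen = {}, {}
    for i, frags in enumerate(snapshots):
        for f in frags:
            first_seen.setdefault(f, i)  # index of first appearance
            last_seen[f] = i             # index of latest appearance
    static = set.intersection(*snapshots) if snapshots else set()
    # Assumption: a fragment seen in snapshots i..j survives j - i updates.
    return {f: last_seen[f] - first_seen[f]
            for f in first_seen if f not in static}

def dominant_value(lifetimes):
    """Most common lifetime among a page's fragments (None if there are none)."""
    if not lifetimes:
        return None
    return Counter(lifetimes.values()).most_common(1)[0][0]

# Toy page: "s" is static, the other fragments come and go.
page = [{"a", "s"}, {"a", "b", "s"}, {"b", "c", "s"}, {"c", "s"}, {"d", "s"}]
lifetimes = survived_updates(page)
print(dominant_value(lifetimes))
```

Pages can then be grouped by the value returned from `dominant_value`, which is the grouping step the comparison against the model relies on.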
We obtained three groups of pages: those whose dominant value is (churn behavior), those dominated by (short scroll behavior), and those dominated by scroll behavior with some (the third category was not large enough to subdivide while still having enough data for smooth lifetime distributions). Figure 5 plots the fragment lifetime distributions for the non-static fragments of the three groups of pages, along with corresponding lifetime distributions obtained from our generative model. Each instantiation of the model reflects a distribution of values that matches the distribution occurring in the data.
The actual lifetime curves closely match the ones predicted by the model. One exception is that the curve for the actual data diverges somewhat from the model at the end (top-right corner of Figure 5). This discrepancy is an artifact of the relatively short duration of our data set: fragments present in the initial or final snapshot of a page were excluded from our analysis because their full lifetimes cannot be determined. Consequently, the data is slightly biased against long-lifetime fragments.
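The exclusion rule and the resulting bias can be illustrated with a short sketch. It rests on the same assumptions as before (snapshots as chronological sets of hypothetical fragment identifiers, no reappearing fragments, lifetime counted in survived update events) and is meant only to show why long lifetimes are under-represented, not to reproduce the analysis.

```python
def complete_lifetimes(snapshots):
    """Lifetimes of fragments whose whole life falls inside the window.

    Fragments present in the first or last snapshot are dropped, since
    their true start or end lies outside the observed data.
    """
    if len(snapshots) < 3:
        return {}  # every fragment touches the first or last snapshot
    censored = snapshots[0] | snapshots[-1]
    first_seen, last_seen = {}, {}
    for i, frags in enumerate(snapshots):
        for f in frags:
            first_seen.setdefault(f, i)
            last_seen[f] = i
    return {f: last_seen[f] - first_seen[f]
            for f in first_seen if f not in censored}

def fully_observable_starts(k, n):
    """Number of start positions at which a fragment surviving k updates
    is fully observed in a window of n snapshots.

    It must first appear at index >= 1 and last appear at index <= n - 2.
    """
    return max(0, n - 2 - k)

page = [{"a", "s"}, {"a", "b", "s"}, {"b", "c", "s"}, {"c", "s"}, {"d", "s"}]
kept = complete_lifetimes(page)
```

With only five snapshots, `fully_observable_starts` shrinks from three positions for a fragment that survives no updates to zero for one that survives three, so longer-lived fragments are retained less often. This is the mechanism behind the divergence at the tail of the empirical curves.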
Unlike the right-most curve of Figure 4, none of the curves in Figure 5 exhibits a flat component at the beginning. The reason is that each of our page groups contains at least some content that churns rapidly.