
Temporal Properties

Figure 2: Workload autocorrelation coefficient

Figure 3: Workload periodicity

Figure 4: Comparison of a bursty interval and its preceding interval

Job Size Stationarity: The job size distribution is defined as the distribution of stream requests over video durations during a measurement interval. We use the histogram intersection distance [4] to measure the change between two job size distributions, and calculate the pairwise histogram intersection distance between every two adjacent data points in the measurement. Figure 1 shows the CDFs of the histogram intersection distance at three time scales. At the 30-minute and one-hour scales, the histogram distance is very small most of the time; for example, $90\%$ of the time it is no more than 0.15. From day to day, however, the request size distributions differ clearly. This indicates that short-term dynamic provisioning only needs to focus on request arrival rate dynamics, while capacity planning on a daily or longer basis has to take into account both arrival rate and job size dynamics.
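As an illustration of the distance computation, the following minimal sketch assumes the common definition of histogram intersection distance (one minus the sum of bin-wise minima of two normalized histograms); the exact form used in [4] may differ. The function names, the bin edges, and the sample durations are hypothetical and chosen only for the example.

```python
from bisect import bisect_right

def histogram(durations, bin_edges):
    """Normalized histogram of video durations over fixed bin edges.

    Bins are half-open ranges [edge_i, edge_{i+1}); values outside the
    covered range are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for d in durations:
        i = bisect_right(bin_edges, d) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def intersection_distance(h1, h2):
    """Assumed definition: 1 - sum of bin-wise minima.

    Gives 0 for identical normalized histograms and 1 for disjoint ones.
    """
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))

def adjacent_distances(intervals, bin_edges):
    """Distance between each pair of adjacent measurement intervals."""
    hists = [histogram(iv, bin_edges) for iv in intervals]
    return [intersection_distance(a, b) for a, b in zip(hists, hists[1:])]

# Hypothetical durations (in seconds) requested in three adjacent intervals.
intervals = [[10, 100], [20, 200], [100, 500]]
distances = adjacent_distances(intervals, bin_edges=[0, 60, 300, 600])
```

Sorting the resulting distances and plotting their empirical CDF, once per time scale, would give curves of the kind described for Figure 1.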

Arrival Rate Predictability: We calculate the autocorrelation coefficient of the arrival rates at a time scale of 30 minutes; Figure 2 shows that the workload is highly correlated in the short term. We also use Fourier analysis to detect possible periodicity in the workload dynamics after removing a few workload spikes. As shown in Figure 3, the maximum value in the figure indicates that the period is one day. Given this strong periodic component, well-known statistical prediction approaches can be applied to further improve the accuracy of capacity planning.
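The two analyses can be sketched as follows. This is an illustrative example rather than the procedure used for the measurements: it uses the standard sample autocorrelation coefficient and a naive discrete Fourier transform, runs on a synthetic series (30-minute samples with a one-day cycle) instead of the measured arrival rates, and omits the spike-removal step. All names are hypothetical.

```python
import cmath
import math

def autocorrelation(x, lag):
    """Sample autocorrelation coefficient of x at the given lag.

    Assumes x is not constant (otherwise the variance term is zero).
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

def dominant_period(x):
    """Period, in samples, of the strongest nonzero-frequency DFT component."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    best_k, best_mag = None, -1.0
    # Naive O(n^2) DFT; frequencies 1..n//2 suffice for a real-valued signal.
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k

# Synthetic arrival rates: 48 half-hour samples per day over 7 days.
SAMPLES_PER_DAY = 48
rates = [100 + 50 * math.sin(2 * math.pi * t / SAMPLES_PER_DAY)
         for t in range(SAMPLES_PER_DAY * 7)]

short_term = autocorrelation(rates, lag=1)        # close to 1 for this series
period_in_days = dominant_period(rates) / SAMPLES_PER_DAY
```

For real traces, a library FFT would be the practical choice over the quadratic loop above; the naive form is used here only to keep the sketch dependency-free.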

Burstiness: While we cannot predict unexpected spikes in the workload, it is necessary to understand the nature of the burstiness and find an efficient way to handle a bursty event once it happens. Figure 4 compares the request (popularity) distribution during one spike interval with that in the preceding interval. The workload during the spike can be viewed as two parts: a base workload similar to the workload in the preceding normal period, and an extra workload due to several very popular files.
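One possible way to make this two-part view concrete is sketched below. The splitting rule is an assumption made for illustration, not a method stated in the report: per file, the base part is the demand already seen in the preceding interval, and the extra part is whatever exceeds it. The function name and the sample file identifiers are hypothetical.

```python
from collections import Counter

def split_burst(prev_requests, spike_requests, top_n=3):
    """Split a spike interval into a base part and an extra part.

    prev_requests / spike_requests: iterables of requested file identifiers.
    Returns (base, top_extra), where base maps file -> request count matched
    by the preceding interval, and top_extra lists the top_n files ranked by
    requests in excess of the preceding interval.
    """
    prev = Counter(prev_requests)
    spike = Counter(spike_requests)
    # Base part: per-file demand already present in the preceding interval.
    base = {f: min(c, prev[f]) for f, c in spike.items() if prev[f] > 0}
    # Extra part: per-file demand beyond the preceding interval's level.
    extra = Counter({f: c - prev[f] for f, c in spike.items() if c > prev[f]})
    return base, extra.most_common(top_n)

# Hypothetical request logs: file "x" becomes very popular during the spike.
prev_log = ["a", "a", "b", "c"]
spike_log = ["a", "a", "b", "c", "x", "x", "x", "x", "x"]
base, top_extra = split_burst(prev_log, spike_log)
```

In this toy input the base part reproduces the preceding interval, and the extra part is concentrated on a single file, mirroring the pattern described for Figure 4.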


Hui Zhang 2008-02-28