If you are monitoring your web services with Wavefront, Prometheus, Grafana, etc., then you are already familiar with time series. In this post I am sharing what I have learned about time series forecasting: starting with some basics of time series, then using a library from Facebook called Prophet to forecast, and finally some thoughts on how this could work in real-world scenarios.
To start, here is sample hourly request data for a hypothetical service covering 4 weeks (28 days * 24 hours = 672 data points). Each row represents the average number of calls per second to your server in that hour. I will use this in all the examples.
Stationarity and Autocorrelation
Since our goal is to forecast the future, it helps to know the conditions under which we can get more accurate predictions. Two properties matter here: stationarity and autocorrelation. A stationary time series with higher autocorrelation will yield a more accurate forecast.
In a stationary time series, the mean and variance are generally constant over time (reference). Put another way:
"a stationary time series is a series whose statistical properties are independent of the point in time at which they are observed"
Intuitively this should make it easy to understand why stationarity helps produce more accurate forecasts. There are statistical tests you can apply (beyond my scope), so here is a simple visualization instead. In general the size of each candle is similar, so this could be considered a stationary time series, although there are some outliers like day 12 (reference).
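A rough numeric version of the same eyeball check is to compare the mean and standard deviation of each day. The sketch below uses a synthetic series (an assumption for illustration, not the sample data from this post) and plain NumPy. It is not a statistical test, only a quick sanity check.

```python
import numpy as np

# Synthetic stand-in for 28 days of hourly data (NOT the post's sample data):
# a daily sine cycle around a constant level, plus a little noise.
rng = np.random.default_rng(0)
hours = np.arange(28 * 24)
series = 500 + 100 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 10, hours.size)

# One row per day, one column per hour of that day.
by_day = series.reshape(28, 24)
daily_mean = by_day.mean(axis=1)
daily_std = by_day.std(axis=1)

# If the per-day mean and spread barely move, the series looks roughly
# stationary at the daily level.
mean_drift = (daily_mean.max() - daily_mean.min()) / daily_mean.mean()
std_drift = (daily_std.max() - daily_std.min()) / daily_std.mean()
print(f"relative drift of daily mean: {mean_drift:.3f}")
print(f"relative drift of daily std:  {std_drift:.3f}")
```

Small drift values suggest the series is roughly stationary; a day that stands out (like an outlier candle) would show up as a jump in either number.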
The autocorrelation function (ACF) is the degree of similarity between the time series and a lagged copy of itself; it represents how similar the current value is to past values. The ACF uses all correlations between current and past values, which include both direct and indirect effects. There is also a partial autocorrelation function (PACF) that represents how similar the current value is to a past value after eliminating all the indirect effects (reference). This specifically applies to autoregressive (AR) models, which we will use later. There are also moving average (MA) models, which are not discussed here.
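To make the ACF concrete, here is a minimal sketch that computes the sample autocorrelation at a few lags with NumPy. The series is synthetic (an assumption for illustration) and the helper name `autocorr` is my own. It covers only the ACF; the PACF needs more machinery, and libraries such as statsmodels provide both.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    # Covariance between the series and its lagged copy, normalised by the variance.
    return float(np.sum(centered[lag:] * centered[:-lag]) / np.sum(centered ** 2))

# Synthetic hourly series with a 24-hour cycle (illustrative only).
rng = np.random.default_rng(1)
hours = np.arange(28 * 24)
series = 500 + 100 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 10, hours.size)

for lag in (1, 12, 24):
    print(f"lag {lag:>2}: {autocorr(series, lag):+.2f}")
```

With a daily cycle, the value at lag 24 should be strongly positive (the same hour yesterday looks like this hour today) and the value at lag 12 strongly negative (day versus night).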
Time Series Components
In learning about forecasting there is some useful terminology to be aware of, in particular how a time series is “decomposed” into the following components:
- Trend: over a period of time (x-axis, left to right), are the values (y-axis) going up, going down, or staying the same? For example, there may be ups and downs, but generally over the last 20 years the total stock index has had an upward trend.
- Seasonality: is there some repeating pattern? For example, summer vacations, Thanksgiving and Christmas bring an increased number of people traveling. The “seasons” could also be within a day; for example, a retail ecommerce site may have more traffic during the day and less at night.
- Holidays: seasonality may be impacted by certain special circumstances, like a pandemic, and these are known as holidays. Again, for a retail web service there will be a spike during Black Friday week that interrupts the usual seasonality. Additionally, we can treat a new feature launch as this kind of effect.
- Residual: this is the “noise” and any other unexplained component of a time series.
There are 2 ways to decompose a time series into these components: Additive and Multiplicative.
In the additive case the components are added (reference video and intuition). Here is our example data broken up this way.
This is roughly represented as:
y(t) = Trend(t) + Seasonality(t) + Residual(t)
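As a minimal sketch of what an additive decomposition does under the hood, the code below applies the classical moving-average approach with NumPy. The series is synthetic and the 24-hour period is an assumption; real decomposition libraries handle edges and robustness more carefully.

```python
import numpy as np

period = 24  # assumed season length: one day of hourly points

# Synthetic series built additively (illustrative only): trend + daily cycle + noise.
rng = np.random.default_rng(2)
hours = np.arange(28 * period)
true_trend = 500 + 0.2 * hours
true_season = 100 * np.sin(2 * np.pi * hours / period)
series = true_trend + true_season + rng.normal(0, 5, hours.size)

# Trend: centered moving average spanning exactly one period (2x24 MA weights).
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
trend = np.convolve(series, weights, mode="valid")  # covers hours[12:-12]
half = period // 2
inner = slice(half, hours.size - half)

# Seasonality: average the detrended values for each hour of the day.
detrended = series[inner] - trend
hour_of_day = hours[inner] % period
profile = np.array([detrended[hour_of_day == h].mean() for h in range(period)])
profile -= profile.mean()  # additive seasonal effects are centered on zero
seasonal = profile[hour_of_day]

# Residual: whatever trend and seasonality do not explain.
residual = series[inner] - trend - seasonal
print("seasonal range:", profile.min().round(1), "to", profile.max().round(1))
print("residual std:", residual.std().round(2))
```

By construction the three pieces add back up to the original values, which is exactly the additive formula above.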
In the multiplicative case the components are multiplied (reference video and intuition). Here is our example data broken up this way.
This is roughly represented as:
y(t) = Trend(t) * Seasonality(t) * Residual(t)
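One way to build intuition for the multiplicative form is that taking logarithms turns the product into a sum, so it behaves like an additive model on log(y). The sketch below checks this numerically on a synthetic series (all numbers are assumptions for illustration) and shows that the absolute daily swing grows with the level of the series.

```python
import numpy as np

period = 24
hours = np.arange(28 * period)

# Synthetic multiplicative series (illustrative only): the daily swing scales with the trend.
trend = 500 + 0.5 * hours
seasonality = 1 + 0.2 * np.sin(2 * np.pi * hours / period)  # multiplier around 1.0
rng = np.random.default_rng(3)
residual = np.exp(rng.normal(0, 0.01, hours.size))          # multiplier close to 1.0
series = trend * seasonality * residual

# Taking logs turns the product into a sum, so additive tools apply to log(y).
log_sum = np.log(trend) + np.log(seasonality) + np.log(residual)
print(np.allclose(np.log(series), log_sum))

# The absolute daily swing grows with the level of the series.
by_day = series.reshape(28, period)
swing = by_day.max(axis=1) - by_day.min(axis=1)
print("swing on first day:", round(swing[0]), "| swing on last day:", round(swing[-1]))
```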
Additive vs Multiplicative
Both decompositions above look similar, but note the y-axis for seasonality. For additive this is a number approximately between -50 and +200, while for multiplicative it is a multiplier approximately between 0.9 and 1.4. In our example it is unclear which one fits, and this seems to be an ‘it depends’ situation with no objective answer. In general, if the seasonal variation does not follow the trend, then it is additive. If the seasonal variation increases or decreases along with the trend, then it is multiplicative (reference).
In the case of web-services traffic I will try both, but multiplicative is likely the right choice. This is because seasonal variations generally follow the trend: as overall traffic grows, the gap between peak and off-peak hours tends to grow with it.
Forecasting with Facebook Prophet
Prophet is a simple library and great for beginners, as it has good defaults and a predict style similar to sklearn. It is as simple as this (see link to full notebook below):
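from prophet import Prophet  # older releases: from fbprophet import Prophet
fb = Prophet()
fb.fit(df)  # df (assumed name): training DataFrame with columns ds (timestamp) and y (value)
future_dates = fb.make_future_dataframe(periods=7*24, freq='H')  # 7 more days, hourly
forecast = fb.predict(future_dates)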
The default is additive, and I tried both additive and multiplicative. The forecast above contains all the predictions with their upper and lower bounds, including those of the components. In my case the two modes appear identical, though as discussed above multiplicative is probably the better approach.
The black dots are actual values while the blue line is from the model.
Forecast vs Observed
Since I have future observed data as well, it is easy to compare! Here are plots comparing yhat and its upper/lower bounds with the observed values over the future 7-day period. The first 2 days look very closely aligned and within the error bounds, and given the very small training data this is very impressive!
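Beyond eyeballing the plots, the comparison can be summarised with a few numbers. The sketch below is my own illustration: the helper name `forecast_report` and the tiny arrays are made up, while `yhat`, `yhat_lower` and `yhat_upper` mirror the column names in Prophet's forecast output. It assumes the observed values and the forecast rows are already aligned on the same timestamps.

```python
import numpy as np

def forecast_report(observed, yhat, yhat_lower, yhat_upper):
    """Summarise how well a forecast matched what was later observed."""
    observed, yhat = np.asarray(observed, float), np.asarray(yhat, float)
    inside = (observed >= np.asarray(yhat_lower)) & (observed <= np.asarray(yhat_upper))
    return {
        "mae": float(np.mean(np.abs(observed - yhat))),                          # mean absolute error
        "mape_pct": float(np.mean(np.abs((observed - yhat) / observed)) * 100),  # mean absolute % error
        "coverage_pct": float(inside.mean() * 100),                              # % of points inside the bounds
    }

# Tiny made-up numbers, only to show the call shape.
observed = [100, 120, 150, 130]
yhat = [110, 115, 140, 135]
report = forecast_report(observed, yhat,
                         yhat_lower=[95, 100, 125, 120],
                         yhat_upper=[125, 130, 148, 150])
print(report)
```

Computing these per day would show whether accuracy degrades further into the forecast horizon, which is what the plots hint at after the first 2 days.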
There are lots of knobs to control, and while the Prophet documentation describes them, you need to understand a lot of the details and internals before you can play with them.
All the code and sample data for the above is on GitHub at this location.
Exampleland is good, but applying this to real problems is not so easy. I do not have production experience yet, so I am sharing some issues I am confronting as I try to apply this to some real problems. Gathering, cleaning, and understanding the data are the biggest hurdles before you even get to any modeling or inference!
When collecting metrics, each metric can have N arbitrary key-value labels (see Prometheus example), so when considering such time series do we need to extract series based on some label combinations, apply additional regressors, or potentially both? And would the resulting forecasts then need to be recombined in some way?
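To make the first option concrete, here is a small sketch of extracting one series per label combination by summing over the labels that are dropped. The label names, values and the helper `extract_series` are all hypothetical, and summing is an assumption that fits rate-like metrics; it is one possible approach, not an answer to the questions above.

```python
from collections import defaultdict

# Hypothetical samples: (labels, timestamp, value). Label names are made up.
samples = [
    ({"service": "cart", "region": "us", "status": "200"}, 0, 40.0),
    ({"service": "cart", "region": "eu", "status": "200"}, 0, 25.0),
    ({"service": "cart", "region": "us", "status": "500"}, 0, 1.0),
    ({"service": "cart", "region": "us", "status": "200"}, 1, 42.0),
    ({"service": "cart", "region": "eu", "status": "200"}, 1, 27.0),
]

def extract_series(samples, keep_labels):
    """Sum values per timestamp for each combination of the kept labels."""
    series = defaultdict(lambda: defaultdict(float))
    for labels, ts, value in samples:
        key = tuple(labels[name] for name in keep_labels)
        series[key][ts] += value
    return {key: dict(sorted(points.items())) for key, points in series.items()}

# One series per (service, region); the status label is summed away.
by_region = extract_series(samples, keep_labels=("service", "region"))
for key, points in by_region.items():
    print(key, points)
```

Each resulting series could then be forecast on its own, which leads straight to the recombination and scaling questions.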
Another problem is scaling. If the time series is specific to one customer (for example, forecasting usage for a single customer), then the trained model cannot be applied to all customers, and each must be trained individually? Components such as seasonality may be shared across customers, so maybe that part is applicable to all?
Finally, building and operating data pipelines is something I have no knowledge about, so there are lots of new things to learn!
If these topics interest you then reach out to me, and I will appreciate any feedback. If you would like to work on such problems you will generally find open roles as well! Please refer to LinkedIn.