Time series forecasting and AutoML with AWS and GCP
Following up on my recent posts on time series forecasting with Prophet, SARIMA and Greykite, this fourth related post looks at Amazon Forecast from AWS and Vertex AI forecasting from GCP. I use the same dataset as before to forecast the API request rate for a hypothetical service.
In the Amazon Forecast console the process is as expected: create a dataset, provide the data, train a model and run inference.
When creating a dataset it was interesting to note the choice of domains; web traffic sounds like the correct one in my case. Picking it forces certain field names, whereas with the custom domain you have to define them yourself anyway. Also, go to S3 first and upload your data (this cannot be done within the flow). Note that AWS is very picky about the timestamp format. Pandas is very forgiving, so until now I had not even noticed how the date string was formatted. It is simple enough to reformat for my small example. For real-world data this can be an issue, though most likely some pre-processing would be required regardless and this can be one more step.
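The reformatting step can be sketched as below, using only the standard library. This is a minimal sketch under a few assumptions: the source strings are ISO-8601 style (my actual input format is not shown here), Amazon Forecast wants `yyyy-MM-dd HH:mm:ss` for sub-daily data, and the web traffic domain expects the columns `item_id`, `timestamp` and `value`. The function names and the `"api"` item id are made up for illustration.

```python
from datetime import datetime

# Assumed target layout for sub-daily data: yyyy-MM-dd HH:mm:ss
FORECAST_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_forecast_timestamp(raw: str) -> str:
    """Convert an ISO-8601 style string to 'yyyy-MM-dd HH:mm:ss'."""
    # Older Python versions of fromisoformat() reject a trailing 'Z'.
    cleaned = raw.strip().replace("Z", "+00:00")
    parsed = datetime.fromisoformat(cleaned)
    # Drop the offset but keep the wall-clock time (no timezone conversion).
    return parsed.replace(tzinfo=None).strftime(FORECAST_FORMAT)


def to_forecast_rows(rows, item_id="api"):
    """Yield (item_id, timestamp, value) tuples from (raw_timestamp, value) pairs."""
    for raw_ts, value in rows:
        yield (item_id, to_forecast_timestamp(raw_ts), float(value))
```

For example, `to_forecast_timestamp("2021-10-01T05:00:00Z")` gives `"2021-10-01 05:00:00"`. The tuples can then be written out with `csv.writer` before uploading to S3.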
For training I picked the defaults.
It gives an estimate of approx. 2 hours, so check back later. Once complete, the winning algorithm and its hyperparameters are provided. For me it happens to be NPTS (see AWS list).
Finally, for the last step, create the forecast. It is unclear to me why create forecast does not take a time horizon as input, but once that is done you can do a “lookup” and provide the range. Setting it to the same subsequent 7 day period, it produces the p10, p50 and p90. I will use these as yhat lower, yhat and yhat upper respectively in the subsequent comparison, as done in previous examples.
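I did the lookup in the console, but for reference the same query should be possible through the `forecastquery` client in boto3. The sketch below only builds the request for a 7 day window. The ARN, the item id and the hourly step are placeholders I made up, and I have not run the actual call against my forecast. As far as I understand, the horizon itself is fixed when the predictor is trained, which may be why create forecast does not ask for it.

```python
from datetime import datetime, timedelta


def build_lookup_request(forecast_arn, last_observed, item_id, days=7):
    """Build keyword arguments for a lookup covering the next `days` days."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    # Assumption: an hourly series, so the first forecast point is one hour
    # after the last observed timestamp.
    start = last_observed + timedelta(hours=1)
    end = last_observed + timedelta(days=days)
    return {
        "ForecastArn": forecast_arn,
        "StartDate": start.strftime(fmt),
        "EndDate": end.strftime(fmt),
        "Filters": {"item_id": item_id},
    }


def lookup(request):
    """Run the query; needs boto3 and AWS credentials (not exercised here)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("forecastquery")
    return client.query_forecast(**request)
```

Keeping the request builder separate from the network call makes the date arithmetic easy to check before spending anything on AWS.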
An export to CSV would have been great! I did not want to mess around with the APIs so for now I just grabbed the JSON from my browser console that renders the above charts and converted it to a CSV.
//all the values
//similar to above
//similar to above
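The JSON to CSV conversion could look roughly like the sketch below. It assumes the payload has the same shape as the query API response, with a `Forecast.Predictions` object holding one list of `Timestamp`/`Value` points per quantile; the JSON in the browser console may be nested differently. The output column names (`ds`, `yhat_lower`, `yhat`, `yhat_upper`) are my choice, picked to line up with the earlier posts.

```python
import csv
import io
import json

# Map each quantile to the column used in the comparison.
COLUMN_FOR_QUANTILE = {"p10": "yhat_lower", "p50": "yhat", "p90": "yhat_upper"}


def predictions_to_csv(payload: str) -> str:
    """Turn a quantile forecast JSON string into CSV text, one row per timestamp."""
    predictions = json.loads(payload)["Forecast"]["Predictions"]

    # Group the three quantile values by timestamp.
    rows = {}
    for quantile, column in COLUMN_FOR_QUANTILE.items():
        for point in predictions[quantile]:
            rows.setdefault(point["Timestamp"], {})[column] = point["Value"]

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["ds", "yhat_lower", "yhat", "yhat_upper"],
        lineterminator="\n",
    )
    writer.writeheader()
    for timestamp in sorted(rows):  # ISO-style strings sort chronologically
        writer.writerow({"ds": timestamp, **rows[timestamp]})
    return out.getvalue()
```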
The outcome is not bad though it seems to be missing the seasonal element in p50. Maybe a different forecast lookup is needed?
The next day you will see some cost added to your bill. Surprisingly, this simple exercise came to ~95 hours of training, well above the 10 hour free tier, on top of various smaller costs for other aspects. So it is certainly not something to use while learning.
Teardown is simple: just delete your dataset group and don’t forget the S3 cleanup.
On GCP, things have been renamed under Vertex AI since I last looked, but the process is the same: create a dataset, provide the data, train a model and run inference.
One change I needed to make was to provide a time series id, so potentially the same file can contain multiple series and each would be trained independently.
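Adding the id is a one-off change to the CSV. A minimal sketch, assuming a single series in the file; the column name `series_id` and the value `"api"` are placeholders, and you then pick that column as the series identifier when setting up training.

```python
import csv
import io


def add_series_id(csv_text, series_id="api", column="series_id"):
    """Prepend a constant series identifier column to every row of a CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = [column] + list(reader.fieldnames)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Same id on every row because the file holds a single series.
        writer.writerow({column: series_id, **row})
    return out.getvalue()
```

With several series in one file, the id would come from the data instead of being a constant.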
I picked 1 hour for the compute budget and, surprisingly, it took a while: approx. 1 hr 15 mins. The model details link to some logs, but it is not easy to understand what it did. These metrics are available, though (more on understanding them in another post).
The next step is to create the predictions. It’s a bit cumbersome, but you need to generate the forecast file and pre-fill the timestamps first.
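Pre-filling the timestamps can be scripted. The sketch below generates the future rows for a 7 day horizon with the target left empty. The hourly step and the column names `series_id`, `ds` and `y` are assumptions for illustration. My understanding is that the prediction input also needs the historical rows alongside these future rows, so treat this as only the part that fills in the horizon.

```python
from datetime import datetime, timedelta


def future_rows(last_observed, periods=7 * 24, step=timedelta(hours=1), series_id="api"):
    """Build rows for the forecast horizon with an empty target value."""
    rows = []
    for i in range(1, periods + 1):
        timestamp = last_observed + i * step
        rows.append(
            {
                "series_id": series_id,
                "ds": timestamp.strftime("%Y-%m-%d %H:%M:%S"),
                "y": "",  # left blank for the model to predict
            }
        )
    return rows
```

For example, `future_rows(datetime(2021, 10, 7, 23))` returns 168 rows, starting at `2021-10-08 00:00:00`. The rows can be appended to the training CSV with `csv.DictWriter`.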
The result is a predicted value added to your file. The results are disappointing and also there is not much to learn.
The GCP cost breakdown does not show yet, maybe it will in a few days, but given the $$ cost it is too expensive for learning. For cleanup I had to delete the dataset, training pipeline and model. I am not sure if I missed something or if there is a simpler way. Also, don’t forget the files in the Cloud Storage bucket.
Link to my notebook and files used. In the next post I will discuss various metrics and evaluate each method.
If these topics interest you then reach out to me, and I will appreciate any feedback. If you would like to work on such problems you will generally find open roles as well! Please refer to LinkedIn.