Time series forecasting and Auto ML with AWS and GCP

Photo by Patrick Tomasso on Unsplash

See other stories in this series

Following up on some recent time series forecasting with Prophet, SARIMA and Greykite, this fourth related topic looks into Amazon Forecast from AWS and Vertex AI forecasting from GCP. I use the same dataset as before to forecast the API request rate for a hypothetical service.


Using the AWS console, the process is as expected: create a dataset, provide the data, train a model and run inference.

In creating a dataset it was interesting to note the domains; web traffic sounds like the correct one in my case. Picking it enforces certain field names, whereas with custom you have to define them again anyway. Also, go to S3 first and upload your data (this cannot be done in the flow). Note that AWS is very picky about the format. Pandas is very forgiving, so until now I had not even noticed the date string format, but it is simple enough to reformat for my small example. For real-world data this could be an issue, though most likely some pre-processing would be required regardless, and this can be one step of it.

Format error
Reformat as per AWS in pandas
Create a dataset group with a domain
Dataset details
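The reformatting step above is a one-liner in pandas. A minimal sketch, assuming hypothetical column names and an input file with ISO-style timestamps that AWS rejects:

```python
import pandas as pd

# Hypothetical example: pandas happily parses many date formats,
# but AWS Forecast wants exactly "yyyy-MM-dd HH:mm:ss"
df = pd.DataFrame({
    "timestamp": ["2021-05-06T19:00:00", "2021-05-06T20:00:00"],
    "value": [338.9, 341.2],
})

# parse whatever is there, then re-emit in the format AWS accepts
df["timestamp"] = pd.to_datetime(df["timestamp"]).dt.strftime("%Y-%m-%d %H:%M:%S")

# the schema is defined at import time, so no header row in the CSV
df.to_csv("target_time_series.csv", index=False, header=False)
```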

For training I picked the defaults.

Model training

It gives an estimate of approximately 2 hours, so check back later. Once complete, the winning algorithm and its hyperparameters are provided. For me it happened to be NPTS (see the AWS algorithms list).

{
    "context_length": "1100",
    "exp_kernel_weights": "0.01",
    "kernel_type": "exponential",
    "prediction_length": "168",
    "use_default_time_features": "true",
    "use_seasonal_model": "true"
}

Finally, the last step: create a forecast. It is unclear to me why create forecast does not take a time horizon as input, but once it is done you can do a “lookup” and provide the range. Querying the same subsequent 7-day period produces the p10, p50 and p90 values. I will use these as the yhat upper and lower bounds in a later comparison, as done in the previous examples.

AWS forecasts

An export to CSV would have been great! I did not want to mess around with the APIs, so for now I just grabbed the JSON that renders the above charts from my browser console and converted it to a CSV.

{
  "Forecast": {
    "Predictions": {
      "p10": [
        {"Timestamp": "2021-05-06T19:00:00", "Value": 338.935302734375},
        // ...all the values
      ],
      "p50": [
        // similar to above
      ],
      "p90": [
        // similar to above
      ]
    }
  }
}
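Flattening that JSON into a CSV is a few lines of pandas. A sketch, assuming the structure shown above (file and column names are made up for illustration):

```python
import json
import pandas as pd

# Hypothetical: JSON saved from the browser console, shaped like the snippet above
raw = json.loads("""{
  "Forecast": {"Predictions": {
    "p10": [{"Timestamp": "2021-05-06T19:00:00", "Value": 338.9}],
    "p50": [{"Timestamp": "2021-05-06T19:00:00", "Value": 352.1}],
    "p90": [{"Timestamp": "2021-05-06T19:00:00", "Value": 365.4}]
  }}
}""")

preds = raw["Forecast"]["Predictions"]
# one small frame per quantile, joined on the timestamp index
frames = [
    pd.DataFrame(rows).rename(columns={"Value": q}).set_index("Timestamp")
    for q, rows in preds.items()
]
df = pd.concat(frames, axis=1)
df.to_csv("aws_forecast.csv")
```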

The outcome is not bad, though it seems to be missing the seasonal element in p50. Maybe a different forecast lookup is needed?

Observed vs AWS forecast values
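Since the p10 and p90 quantiles act as the lower and upper bands, one simple sanity check is how often the observed values actually fall inside them. A sketch with made-up numbers, not the real forecast data:

```python
import pandas as pd

# Hypothetical values: observed series alongside the p10/p90 quantile band
df = pd.DataFrame({
    "observed": [350.0, 400.0, 500.0],
    "p10":      [330.0, 360.0, 390.0],
    "p90":      [370.0, 430.0, 460.0],
})

# fraction of observations inside the band (a well-calibrated p10-p90
# band should cover roughly 80% of observations)
coverage = df["observed"].between(df["p10"], df["p90"]).mean()
print(f"p10-p90 coverage: {coverage:.0%}")
```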

The next day you will see some cost added to your bill. Surprisingly, this simple exercise used ~95 hours of training, well above the 10-hour free tier, plus various other charges for other aspects. So it is certainly not something to use while learning.

Tear down is simple: just delete your dataset group, and don’t forget the S3 cleanup.


Things have been renamed under Vertex AI since I last looked, but it is the same process: create a dataset, provide the data, train a model and run inference.

One change I needed to make was to provide a time series id, so the same file can potentially contain multiple series and each would be trained independently.

Create dataset for forecasting
Mark columns and see data statistics
Define details and forecast horizon
Finalize training options
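To make the time series id idea concrete, here is a minimal sketch of one file holding two independent series; the column and id names are hypothetical, not what Vertex AI mandates:

```python
import pandas as pd

# Hypothetical: two services' request counts in one file,
# distinguished by a time series id column
df = pd.DataFrame({
    "ts_id":     ["svc_a", "svc_a", "svc_b", "svc_b"],
    "timestamp": ["2021-05-06T19:00:00Z", "2021-05-06T20:00:00Z"] * 2,
    "requests":  [338, 341, 12, 15],
})

# each id would be trained as its own independent series
for ts_id, series in df.groupby("ts_id"):
    print(ts_id, series["requests"].tolist())
```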

I picked 1 hour for the compute budget, and surprisingly it took a while: approximately 1 hour 15 minutes. Under model details it links to some logs, but it is not easy to understand what it did. These metrics are available, though (more on understanding them in another post).

More accuracy metrics

The next step is to create the predictions. It is a bit cumbersome: you need to generate the forecast input file and pre-fill the timestamps first.
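Generating that pre-filled input file is straightforward in pandas. A sketch assuming an hourly series, a 7-day horizon and hypothetical column names; the empty target column is left for the service to fill in:

```python
import pandas as pd

# Hypothetical sketch: future rows with timestamps pre-filled for the
# 7-day hourly horizon to be predicted
horizon = pd.date_range("2021-05-06 19:00:00", periods=168, freq="h")

future = pd.DataFrame({
    "ts_id": "svc_a",                                     # the time series id
    "timestamp": horizon.strftime("%Y-%m-%dT%H:%M:%SZ"),  # pre-filled timestamps
    "requests": "",                                       # target left blank
})
future.to_csv("predict_input.csv", index=False)
```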

The result is a predicted value added to your file. The results are disappointing, and there is not much to learn from the process either.

Observed vs GCP forecast values

The GCP cost breakdown does not show up yet, maybe in a few days, but given the dollar cost it is too expensive for learning. For cleanup I had to delete the dataset, the training pipeline and the model; I am not sure if I missed a simpler way. Also, don’t forget the files in the cloud storage bucket.

Link to my notebook and files used. In the next post I will discuss various metrics and evaluate each method.

See other stories in this series

If these topics interest you then reach out to me; I will appreciate any feedback. If you would like to work on such problems, you will generally find open roles as well! Please refer to LinkedIn.

Rahul Agarwal