Modern Time Series Forecasting on AWS

Overview

This workshop demonstrates how to use AWS services to implement time series forecasting. It covers the following examples and AWS services:

  1. Amazon SageMaker Canvas
  2. Amazon SageMaker Autopilot API
  3. Amazon SageMaker DeepAR
  4. Chronos
  5. AutoGluon

Additional notebooks cover forecasting with GluonTS, a custom algorithm on SageMaker, and Amazon QuickSight.

The workshop is available in the AWS Workshop Catalog. You can run this workshop at an AWS-led event or in your own AWS account.

How to use this workshop

To use this workshop, you need an Amazon SageMaker domain. All workshop content is in Jupyter notebooks running on Amazon SageMaker. To get started, follow the instructions in the Getting started section. To clean up resources, follow the instructions in the Clean-up section. You can execute the notebooks in any order, and you don't need to switch between the notebooks and the workshop web page.

Required resources

Ignore this section if you're using an AWS-provided account as a part of an AWS-led workshop.

To run the notebooks and complete the workshop labs, you need access to the following resources in your AWS account. You can check the quotas for all of these resources in the Service Quotas console.
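If you prefer to check quotas programmatically, the following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names `filter_quotas` and `fetch_sagemaker_quotas` are hypothetical, and the quota names in the usage comment are illustrative; verify the exact names in the Service Quotas console.

```python
from typing import Iterable, List, Optional, Tuple


def filter_quotas(quotas: Iterable[dict], keywords: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (QuotaName, Value) pairs whose name contains every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [
        (q["QuotaName"], q["Value"])
        for q in quotas
        if all(k in q["QuotaName"].lower() for k in wanted)
    ]


def fetch_sagemaker_quotas(region_name: Optional[str] = None) -> List[dict]:
    """Fetch the applied SageMaker quotas. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so filter_quotas works without boto3

    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    quotas: List[dict] = []
    for page in paginator.paginate(ServiceCode="sagemaker"):
        quotas.extend(page["Quotas"])
    return quotas


# Example usage (needs credentials, so it is left commented out):
# for name, value in filter_quotas(fetch_sagemaker_quotas(), ["ml.c5.4xlarge", "training"]):
#     print(name, value)
```

Filtering by keywords rather than by quota code keeps the sketch independent of quota identifiers, which are not listed in this README.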

Studio JupyterLab app
The minimum required instance type is ml.m5.2xlarge. We recommend using ml.m5.4xlarge to run all notebooks. If you have access to GPU instances such as ml.g5.4xlarge or ml.g6.4xlarge, use these instances to run the notebooks.

To experiment with the full dataset of 370 time series in Lab 5 (AutoGluon), you need a GPU instance for the notebook: ml.g5.4xlarge/ml.g6.4xlarge or ml.g5.8xlarge/ml.g6.8xlarge.

Number of concurrent AutoML Jobs
To follow the optimal flow of the workshop, you need to run at least three AutoML jobs in parallel. We recommend a quota of six or more concurrent jobs.

Training jobs
To run a training job for the DeepAR algorithm, you need an ml.c5.4xlarge compute instance.

SageMaker real-time inference endpoints
DeepAR, Chronos, and AutoGluon notebooks deploy SageMaker real-time inference endpoints to test models. You need access to the following compute instances for endpoint use:

Workshop flow

The notebooks from Lab 1 to Lab 5 are self-sufficient. You can run them in any order. If you're unfamiliar with time series forecasting, we recommend starting with the Lab 1 notebook and continuing from there. Alternatively, you can run only the notebooks that interest you, such as lab4_chronos or lab5_autogluon.

The model training in Labs 1, 2, and 3 takes 15-40 minutes, depending on the algorithm. You don't need to wait for the training to complete before moving on to the next notebook. You can come back to the previous notebook once the training is done.

Executing all five notebooks will take 2-3 hours. If you're new to time series forecasting, Jupyter notebooks, or Python, it may take longer.

Workshop costs

The notebooks in this workshop create cost-generating resources in your account. Always delete the SageMaker inference endpoints you created, log out of Canvas, and stop JupyterLab spaces when you aren't using them.
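Endpoint clean-up can also be scripted. The sketch below is one possible approach, not the workshop's own clean-up code: it assumes a boto3 SageMaker client and uses a hypothetical helper `delete_endpoints` that matches endpoints by a substring of their name. It defaults to a dry run so nothing is deleted until you opt in.

```python
from typing import List


def delete_endpoints(sm_client, name_contains: str, dry_run: bool = True) -> List[str]:
    """List endpoints whose name contains `name_contains` and delete them unless dry_run is set.

    Returns the names of the matched endpoints.
    """
    paginator = sm_client.get_paginator("list_endpoints")
    matched: List[str] = []
    for page in paginator.paginate(NameContains=name_contains):
        for endpoint in page["Endpoints"]:
            name = endpoint["EndpointName"]
            if not dry_run:
                sm_client.delete_endpoint(EndpointName=name)
            matched.append(name)
    return matched


# Example usage (needs boto3 and credentials, so it is left commented out).
# "chronos" is an illustrative substring; use the names your notebooks created.
# import boto3
# print(delete_endpoints(boto3.client("sagemaker"), "chronos", dry_run=True))
```

Note that `delete_endpoint` removes only the endpoint. The endpoint configuration and the model object remain and can be removed separately with `delete_endpoint_config` and `delete_model`.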

If you run all notebooks with all sections, including the optional sections, and train three models using Standard builds in Canvas, the estimated cost is approximately 90-100 USD.

Please note that your actual costs may vary depending on the duration of the workshop, the number of inference endpoints created, and the time the endpoints remain in service.

To optimize costs, follow these recommendations:

  1. Run only Quick builds in Canvas to minimize costs. Note that in this case you cannot download model performance JSON files
  2. Use only a sample from the full dataset to train models and run all experiments. Each notebook contains code to create a small dataset with a sample from the time series
  3. Promptly delete SageMaker inference endpoints after use
  4. Use an ml.m5.xlarge instance for the JupyterLab app to balance performance and cost
  5. Limit Chronos experiments to one endpoint and a sample of the time series in the notebook lab4_chronos
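As an illustration of recommendation 2, the following sketch draws a reproducible random subset of time series identifiers. It is not the sampling code from the notebooks. The helper `sample_series_ids` is hypothetical, and the `MT_001` to `MT_370` identifiers are an assumption about how the 370 series in the dataset are named.

```python
import random
from typing import Iterable, List


def sample_series_ids(series_ids: Iterable[str], n: int, seed: int = 42) -> List[str]:
    """Return a sorted, reproducible random sample of at most n series identifiers."""
    ids = list(series_ids)
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return sorted(rng.sample(ids, k=min(n, len(ids))))


# Assumed naming of the 370 series; adjust to the column names in your data.
all_ids = [f"MT_{i:03d}" for i in range(1, 371)]
subset = sample_series_ids(all_ids, n=10)
print(len(subset), "series selected")
```

Fixing the seed means every notebook that uses the same call trains on the same subset, which keeps model metrics comparable across labs.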

Getting started

If you'd like to create a new domain, you have two options:

  1. Use the provided AWS CloudFormation template, which creates a SageMaker domain and a user profile and adds the IAM roles required to run the provided notebooks. This is the recommended approach
  2. Follow the onboarding instructions in the Developer Guide and create a new domain and a user profile via the AWS Console

Datasets

All examples and notebooks in this workshop use the same real-world dataset. This makes it possible to compare performance and model metrics across the different approaches.

The workshop uses the electricity dataset from the UCI Machine Learning Repository (University of California, Irvine):

Trindade, Artur. (2015). ElectricityLoadDiagrams20112014. UCI Machine Learning Repository. https://doi.org/10.24432/C58C86.
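The sketch below shows one way to parse data in the layout this dataset is commonly distributed in. The layout is an assumption to verify against your downloaded file: semicolon-separated values, a leading timestamp column, one column per series, and a decimal comma. The helper `load_series` and the two-row sample are hypothetical and are not taken from the workshop notebooks.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List, Tuple


def load_series(text: str) -> Tuple[List[datetime], Dict[str, List[float]]]:
    """Parse semicolon-separated rows with decimal commas into timestamps and per-series values."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    series_ids = header[1:]  # the first column holds the timestamp
    timestamps: List[datetime] = []
    series: Dict[str, List[float]] = {sid: [] for sid in series_ids}
    for record in reader:
        timestamps.append(datetime.strptime(record[0], "%Y-%m-%d %H:%M:%S"))
        for sid, raw in zip(series_ids, record[1:]):
            series[sid].append(float(raw.replace(",", ".")))  # decimal comma to decimal point
    return timestamps, series


# Tiny made-up sample in the assumed layout (not real measurements).
sample = (
    '"";"MT_001";"MT_002"\n'
    '"2011-01-01 00:15:00";1,5;0\n'
    '"2011-01-01 00:30:00";2,5;4\n'
)
timestamps, series = load_series(sample)
print(timestamps[0], series["MT_001"])
```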

Example 1: Amazon SageMaker Canvas

Open the lab 1 notebook and follow the instructions.

Additional SageMaker Canvas links:

Example 2: Amazon SageMaker Autopilot API

Open the lab 2 notebook and follow the instructions.

Note: The previous Autopilot UX in Studio Classic was merged into Canvas as of re:Invent 2023. All AutoML functionality has now moved to Canvas.

Additional SageMaker Autopilot API links:

Example 3: Amazon SageMaker DeepAR

Open the lab 3 notebook and follow the instructions.
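For orientation before opening the notebook: the SageMaker DeepAR algorithm accepts training data as JSON Lines, where each line is one time series with a `start` timestamp and a `target` array. The following is a minimal sketch of that serialization. The helper `to_deepar_jsonl` and the sample values are hypothetical, and the notebook's own data preparation may differ.

```python
import json
from typing import Iterable, Sequence, Tuple


def to_deepar_jsonl(series: Iterable[Tuple[str, Sequence[float]]]) -> str:
    """Serialize (start, target) pairs as JSON Lines, one time series per line."""
    lines = [
        json.dumps({"start": start, "target": list(target)})
        for start, target in series
    ]
    return "\n".join(lines)


# Made-up values for illustration only.
payload = to_deepar_jsonl(
    [
        ("2014-01-01 00:00:00", [12.5, 13.0, 11.75]),
        ("2014-01-01 00:00:00", [0.0, 0.5, 0.25]),
    ]
)
print(payload)
```

Optional fields such as `cat` and `dynamic_feat` are omitted here to keep the example small.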

Additional DeepAR links:

Example 4: Chronos

Open the lab 4 notebook and follow the instructions.

Links to more Chronos content:

Example 5: AutoGluon

Open the lab 5 notebook and follow the instructions.

Links to AutoGluon content:

Additional examples

The additional notebooks in the folder notebooks/additional cover more approaches you can use for time series forecasting. These notebooks demonstrate:

  1. GluonTS
  2. Custom algorithms on SageMaker
  3. Amazon QuickSight forecast

Example 1A: GluonTS

Navigate to the additional folder inside the notebooks folder. Open the lab 1A notebook and follow the instructions.

The notebook additional/lab1a_gluonts also contains an end-to-end example of productionizing a time series forecasting workflow. The lab demonstrates how to create a reproducible SageMaker pipeline with data processing, model training, model evaluation, model registration in the model registry, and model deployment to a SageMaker endpoint. The notebook uses the GluonTS implementation of the Temporal Fusion Transformer and the SageMaker Python SDK PyTorch framework together with SageMaker built-in Deep Learning Containers (DLC).

Links to GluonTS content:

Example 2A: Amazon SageMaker custom algorithm

This example is under development.

Refer to the following resources to see how you can run custom algorithms on SageMaker:

Example 3A: Amazon QuickSight forecast

Amazon QuickSight has ML features that surface hidden insights and trends in your data. One of these features is the ML-powered forecast. The built-in ML forecast uses the Random Cut Forest (RCF) algorithm to detect seasonality and trends, exclude outliers, and impute missing values. For more details on how QuickSight uses RCF to generate forecasts, see the developer guide.

You can customize multiple settings on the Forecast properties pane, such as number of forecast periods, prediction interval, seasonality, and forecast boundaries.

For more details, refer to the Developer Guide topic Forecasting and creating what-if scenarios with Amazon QuickSight.

Besides a graphical forecast, you can also add a forecast as a narrative in an insight widget. To learn more, see Creating autonarratives with Amazon QuickSight.

Additional resources for Amazon QuickSight forecasting:

Results and comparison

Open the lab 6 notebook and follow the instructions.
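When comparing models, it helps to compute the same accuracy metrics for every forecast. The sketch below implements two common ones, weighted absolute percentage error (WAPE) and root mean squared error (RMSE). Which metrics the lab 6 notebook actually reports is not stated in this README, so treat this selection as an assumption. The numbers are made up.

```python
import math
from typing import Sequence


def _check_lengths(actual: Sequence[float], forecast: Sequence[float]) -> None:
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")


def wape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error: sum of |error| divided by sum of |actual|."""
    _check_lengths(actual, forecast)
    denominator = sum(abs(a) for a in actual)
    if denominator == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / denominator


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    _check_lengths(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


actual = [10.0, 20.0, 30.0]
forecast = [12.0, 18.0, 33.0]
print(f"WAPE={wape(actual, forecast):.4f} RMSE={rmse(actual, forecast):.4f}")
```

WAPE is scale-independent, so it can be compared across series with different load levels, while RMSE stays in the units of the data and penalizes large errors more heavily.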

Additional resources about time series forecast accuracy evaluation

Clean up

To avoid unnecessary costs, you must remove all project-provisioned and generated resources from your AWS account.

Shut down SageMaker resources

You must complete this section before deleting the SageMaker domain or the CloudFormation stack.

Complete the following activities to shut down your Amazon SageMaker resources:

Remove the SageMaker domain

You don't need to complete this section if you run an AWS instructor-led workshop in an AWS-provisioned account.

If you used the AWS Console to provision a Studio domain for this workshop, and don't need the domain, you can delete the domain by following the instructions in the Developer Guide.

If you provisioned a Studio domain with the provided CloudFormation template, you can delete the CloudFormation stack in the AWS console.

If you provisioned a new VPC for the domain, go to the VPC console and delete the provisioned VPC.

Resources

Algorithms

Books and whitepapers

Blog posts

Workshops and notebooks

QR codes and links

This GitHub repository

Link: https://github.com/aws-samples/modern-time-series-forecasting-on-aws
Short link: https://bit.ly/47hnKH6

AWS workshop

Link: https://catalog.workshops.aws/modern-time-series-forecasting-on-aws/en-US
Short link: https://bit.ly/4dBQ0G8

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0