Machine Learning Pipelines with AWS SageMaker
This DAG shows an example implementation of machine learning model orchestration using Airflow and AWS SageMaker.
Run this DAG
1. Install the Astronomer CLI (skip this step if you already have the CLI):
2. Download the repository:
3. Navigate to where the repository was cloned and start the DAG:
airflow-sagemaker-tutorial
This repo contains an Astronomer project with multiple example DAGs showing how to use Airflow for ML orchestration with AWS SageMaker. A guide discussing the DAGs and concepts in depth will be published shortly.
Tutorial Overview
This tutorial has two example DAGs showing how to accomplish the following ML use cases:
- sagemaker-run-model: gets inferences on a dataset from an existing SageMaker model by running a batch transform job, then saves the results to Redshift.
- sagemaker-pipeline: orchestrates an end-to-end ML workflow, including obtaining and pre-processing the data, training a model, saving the model from the training artifact, and testing the model with a batch transform job.
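Both DAGs end in a batch transform job. As a rough illustration of what such a job needs, the sketch below builds a request dictionary in the shape of SageMaker's CreateTransformJob API. The function name, the default instance type, and the CSV content settings are assumptions made for this example and are not taken from the repo's DAGs.

```python
def build_transform_config(job_name, model_name, input_s3_uri, output_s3_uri,
                           instance_type="ml.m5.large", instance_count=1):
    """Build a request dict shaped like SageMaker's CreateTransformJob input.

    All argument values are placeholders chosen by the caller; the CSV
    content type and line splitting are assumptions for this sketch.
    """
    return {
        # Transform job names must be unique within an account and region.
        "TransformJobName": job_name,
        # Name of an existing SageMaker model to run inference with.
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",  # assumed input format
            "SplitType": "Line",        # one record per line
        },
        # Inference results are written under this S3 prefix.
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


if __name__ == "__main__":
    config = build_transform_config(
        job_name="example-transform-job",      # hypothetical name
        model_name="example-model",            # hypothetical name
        input_s3_uri="s3://example-bucket/input/",
        output_s3_uri="s3://example-bucket/output/",
    )
    print(config["TransformOutput"]["S3OutputPath"])
```

A dictionary like this could be passed to the boto3 SageMaker client's create_transform_job call or to the config argument of the Amazon provider's transform operator; the DAGs in this repo may structure their configuration differently.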
Getting Started
The easiest way to run these example DAGs is to use the Astronomer CLI to get an Airflow instance up and running locally:
- Install the Astronomer CLI
- Clone this repo somewhere locally and navigate to it in your terminal
- Initialize an Astronomer project by running: astro dev init
- Start Airflow locally by running: astro dev start
- Navigate to localhost:8080 in your browser and you should see the tutorial DAGs there
- Add the following Airflow Variables:
  - s3_bucket: S3 bucket used with the SageMaker instance
  - role: Role ARN used to execute SageMaker jobs
- Add Airflow connections with the following IDs:
  - aws-sagemaker: connection type of AWS
  - redshift_default: connection type of Redshift
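The Variables and connections above can be added in the Airflow UI. As an alternative, the sketch below assembles equivalent Airflow CLI commands, run through astro dev run so they execute inside the local Airflow containers. The helper name and the placeholder bucket and role values are hypothetical. The connection type strings aws and redshift are assumptions based on the Amazon provider's connection types and may differ in older provider versions. Credentials and host details are left out and still need to be supplied.

```python
import shlex


def airflow_setup_commands(s3_bucket, role_arn, prefix=("astro", "dev", "run")):
    """Return CLI argument lists that create the tutorial's Variables and connections.

    The prefix routes each Airflow CLI command through the Astro CLI; pass
    ("airflow",) instead to target an Airflow installation directly.
    """
    prefix = list(prefix)
    return [
        prefix + ["variables", "set", "s3_bucket", s3_bucket],
        prefix + ["variables", "set", "role", role_arn],
        # Connection types below are assumptions; credentials are omitted.
        prefix + ["connections", "add", "aws-sagemaker", "--conn-type", "aws"],
        prefix + ["connections", "add", "redshift_default", "--conn-type", "redshift"],
    ]


if __name__ == "__main__":
    commands = airflow_setup_commands(
        s3_bucket="example-bucket",                                   # placeholder
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMaker",   # placeholder
    )
    for command in commands:
        # Print shell-safe command lines to review before running them.
        print(shlex.join(command))
```

Printing the commands rather than executing them keeps the sketch side-effect free; each list could also be handed to subprocess.run once the values have been checked.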
Pre-Requisites
- Set up AWS Redshift and make sure it's accessible from your local Airflow
- Create a table in Redshift named results. You can use the SQL query in include/helper/create_results_table.sql to do so.
- Follow the SageMaker notebooks tutorial to create the model used in the sagemaker-run-model.py DAG, or run the provided notebook from include/helper/sagemaker-guide.ipynb in SageMaker.