SageMakerProcessingOperator

Amazon

Use Amazon SageMaker Processing to analyze data and evaluate machine learning models on Amazon SageMaker. With Processing, you can use a simplified, managed experience on SageMaker to run your data processing workloads, such as feature engineering, data validation, model evaluation, and model interpretation.

Last Updated: Feb. 27, 2023

Access Instructions

Install the Amazon provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
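
As a rough illustration of these two steps, the sketch below assumes the provider package is installed as apache-airflow-providers-amazon and that the operator is imported from airflow.providers.amazon.aws.operators.sagemaker. The helper names (build_processing_config, make_processing_task), the task_id, the job name, the image URI, the role ARN, and the instance settings are all hypothetical placeholders, not values from this page. The config keys follow the request shape of SageMaker.Client.create_processing_job(); check that reference for the full set of fields your job needs.

```python
from typing import Any, Dict


def build_processing_config(job_name: str, image_uri: str, role_arn: str) -> Dict[str, Any]:
    """Build a minimal request dict in the shape used by create_processing_job.

    Hypothetical helper: instance type, volume size, and runtime limit are
    placeholder values chosen only for illustration.
    """
    return {
        "ProcessingJobName": job_name,
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 30,
            }
        },
        "AppSpecification": {"ImageUri": image_uri},
        "RoleArn": role_arn,
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def make_processing_task(config: Dict[str, Any]):
    """Instantiate the operator with the parameters described on this page."""
    # Imported lazily so the config helper works without Airflow installed;
    # in a real DAG file this import would normally sit at the top of the module.
    from airflow.providers.amazon.aws.operators.sagemaker import (
        SageMakerProcessingOperator,
    )

    return SageMakerProcessingOperator(
        task_id="run_processing_job",      # placeholder task name
        config=config,
        aws_conn_id="aws_default",
        wait_for_completion=True,          # block until the job finishes
        print_log=True,                    # stream CloudWatch logs while waiting
        check_interval=30,                 # seconds between status checks
        max_ingestion_time=None,           # None means no timeout
        action_if_job_exists="timestamp",
    )


# Placeholder account ID, image, and role; replace with your own values.
EXAMPLE_CONFIG = build_processing_config(
    job_name="example-processing-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
```

Inside a DAG definition, calling make_processing_task(EXAMPLE_CONFIG) would create the task. Keeping the config in a separate helper makes it easy to inspect or unit-test the request without an Airflow environment.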

Parameters

config (required): The configuration necessary to start a processing job (templated). For details of the configuration parameter, see SageMaker.Client.create_processing_job().
aws_conn_id: The AWS connection ID to use.
wait_for_completion: Whether the operator should wait for the processing job to complete.
print_log: Whether the operator should print the CloudWatch log during processing.
check_interval: If wait_for_completion is set to True, the time interval, in seconds, at which the operator checks the status of the processing job.
max_ingestion_time: If wait_for_completion is set to True, the operation fails if the processing job doesn’t finish within max_ingestion_time seconds. If you set this parameter to None, the operation does not time out.
action_if_job_exists: Behaviour if the job name already exists. Possible options are “timestamp” (default), “increment” (deprecated) and “fail”.

Documentation

See also

For more information on how to use this operator, take a look at the guide: Create an Amazon SageMaker processing job
