The Astro Python SDK is a Python SDK for rapid development of extract, transform, and load workflows in Apache Airflow. It lets you express your workflows as a set of data dependencies without having to worry about ordering and tasks. The Astro Python SDK is maintained by Astronomer.
Prerequisites
- Apache Airflow >= 2.1.0.
Install
The Astro Python SDK is available on PyPI. Use the standard Python installation tools.
To install a cloud-agnostic version of the SDK, run:
pip install astro-sdk-python
You can also install dependencies for using the SDK with popular cloud providers:
pip install astro-sdk-python[amazon,google,snowflake,postgres]
Quickstart
Ensure that your Airflow environment is set up correctly by running the following commands:
export AIRFLOW_HOME=`pwd`
airflow db init
Note: AIRFLOW__CORE__ENABLE_XCOM_PICKLING no longer needs to be enabled from astro-sdk-python release 1.2 and above.
- For Airflow version < 2.5 and astro-sdk-python release < 1.3: users can either use the custom XCom backend AstroCustomXcomBackend with XCom pickling disabled, or enable XCom pickling.
- For Airflow version >= 2.5 and astro-sdk-python release >= 1.3.3: users can either use Airflow's XCom backend with XCom pickling disabled, or enable XCom pickling.
The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing); however, it means that non-Python programs may not be able to reconstruct pickled Python objects.
Read more: enable_xcom_pickling and pickle.
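The remark about pointer sharing can be illustrated with the standard library alone. The following is a minimal sketch, not part of the SDK; the variable names and the sample payload are made up for illustration. It shows that pickle restores two references to the same object as one shared object, while a JSON round trip produces two independent copies.

```python
import json
import pickle

# Two keys point at the same list object (pointer sharing).
shared = ["Toy Story 3 (2010)", 8.3]
payload = {"first": shared, "second": shared}

# Round-trip the same payload through both serializers.
via_pickle = pickle.loads(pickle.dumps(payload))
via_json = json.loads(json.dumps(payload))

# pickle remembers that both keys referenced one object; JSON cannot express that.
pickle_keeps_sharing = via_pickle["first"] is via_pickle["second"]
json_keeps_sharing = via_json["first"] is via_json["second"]

print(pickle_keeps_sharing)  # True
print(json_keeps_sharing)    # False
```

The values survive both round trips; only the identity relationship between the two entries is lost with JSON.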
Create a SQLite database for the example to run with:
# The sqlite_default connection has a different host on macOS vs. Linux
export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`
sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
Copy the following workflow into a file named calculate_popular_movies.py and add it to the dags directory of your Airflow project:
Alternatively, you can download calculate_popular_movies.py:
curl -O https://raw.githubusercontent.com/astronomer/astro-sdk/main/example_dags/calculate_popular_movies.py
Run the example DAG:
airflow dags test calculate_popular_movies `date -Iseconds`
Check the result of your DAG by running:
sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
You should see the following output:
$ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
Toy Story 3 (2010)|8.3
Inside Out (2015)|8.2
How to Train Your Dragon (2010)|8.1
Zootopia (2016)|8.1
How to Train Your Dragon 2 (2014)|7.9
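The same check can also be made from Python with the standard sqlite3 module. This is a minimal sketch rather than anything the SDK provides: the helper name fetch_top_animation is hypothetical, and the demo builds a throwaway database (with assumed column names title and rating) from the rows shown above so that it runs without Airflow.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def fetch_top_animation(db_path):
    """Return every row of the top_animation table as 'col|col' strings."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute("SELECT * FROM top_animation").fetchall()
    return ["|".join(str(value) for value in row) for row in rows]


# Demo: a temporary database filled with the rows from the expected output.
# The column names are assumptions made only for this demo table.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_db = str(Path(tmp_dir) / "demo.db")
    with closing(sqlite3.connect(demo_db)) as conn:
        conn.execute("CREATE TABLE top_animation (title TEXT, rating REAL)")
        conn.executemany(
            "INSERT INTO top_animation VALUES (?, ?)",
            [
                ("Toy Story 3 (2010)", 8.3),
                ("Inside Out (2015)", 8.2),
                ("How to Train Your Dragon (2010)", 8.1),
                ("Zootopia (2016)", 8.1),
                ("How to Train Your Dragon 2 (2014)", 7.9),
            ],
        )
        conn.commit()
    lines = fetch_top_animation(demo_db)

for line in lines:
    print(line)
```

Against the real quickstart database, the helper would be called with the path stored in the SQL_TABLE_NAME environment variable, for example fetch_top_animation(os.environ["SQL_TABLE_NAME"]).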
Supported technologies
FileLocation |
---|
local |
http |
https |
gs |
gdrive |
s3 |
wasb |
wasbs |
azure |
sftp |
ftp |
FileType |
---|
csv |
json |
ndjson |
parquet |
xls |
xlsx |
Database |
---|
postgres |
sqlite |
delta |
bigquery |
snowflake |
redshift |
mssql |
duckdb |
mysql |
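As a rough illustration of how the FileLocation and FileType values above could relate to a file path, here is a minimal sketch. It assumes that the location corresponds to the URL scheme of the path (with "local" for paths without a scheme) and that the file type corresponds to the file extension; the source does not state this, the classify helper is hypothetical, and the SDK's own detection logic may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Values copied from the tables above.
FILE_LOCATIONS = {
    "local", "http", "https", "gs", "gdrive", "s3",
    "wasb", "wasbs", "azure", "sftp", "ftp",
}
FILE_TYPES = {"csv", "json", "ndjson", "parquet", "xls", "xlsx"}


def classify(path):
    """Return (location, file_type) for a path, or raise ValueError.

    Assumption: location == URL scheme, file type == file extension.
    """
    parsed = urlparse(path)
    location = parsed.scheme or "local"
    file_type = PurePosixPath(parsed.path).suffix.lstrip(".").lower()
    if location not in FILE_LOCATIONS:
        raise ValueError(f"unsupported file location: {location!r}")
    if file_type not in FILE_TYPES:
        raise ValueError(f"unsupported file type: {file_type!r}")
    return location, file_type


print(classify("s3://my-bucket/data/movies.csv"))
print(classify("/tmp/ratings.parquet"))
```

The bucket name and paths are placeholders. A path such as "gs://bucket/notes.txt" would raise ValueError in this sketch because txt is not among the listed file types.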
Available operations
The following are some key functions available in the SDK:
- load_file: Load a given file into a SQL table
- transform: Apply a SQL select statement to a source table and save the result to a destination table
- drop_table: Drop a SQL table
- run_raw_sql: Run any SQL statement without handling its output
- append: Insert rows from the source SQL table into the destination SQL table, if there are no conflicts
- merge: Insert rows from the source SQL table into the destination SQL table, depending on conflicts:
  - ignore: Do not add rows that already exist
  - update: Replace existing rows with new ones
- export_file: Export SQL table rows into a destination file
- dataframe: Export a given SQL table into an in-memory pandas DataFrame
For a full list of available operators, see the SDK reference documentation.
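To make the semantics of transform and of the two merge conflict strategies concrete, the sketch below reproduces them with plain SQLite statements from the Python standard library. It does not use the SDK's API; the table names, columns, and sample rows are invented for illustration, and the SDK may implement these operations differently for each database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (id INTEGER PRIMARY KEY, title TEXT, rating REAL);
    CREATE TABLE target (id INTEGER PRIMARY KEY, title TEXT, rating REAL);
    INSERT INTO source VALUES (1, 'A', 8.0), (2, 'B', 6.5), (3, 'C', 9.1);
    INSERT INTO target VALUES (1, 'A (old)', 7.0);
    """
)

# transform: apply a SELECT to a source table and save the result to a new table.
conn.execute(
    "CREATE TABLE top_rated AS "
    "SELECT title, rating FROM source WHERE rating >= 8.0"
)
top_rated = conn.execute(
    "SELECT title, rating FROM top_rated ORDER BY rating DESC"
).fetchall()

# merge with "ignore": rows whose key already exists in target are skipped.
conn.execute("INSERT OR IGNORE INTO target SELECT id, title, rating FROM source")
after_ignore = conn.execute("SELECT id, title FROM target ORDER BY id").fetchall()

# merge with "update": rows whose key already exists are replaced by the new ones.
conn.execute("INSERT OR REPLACE INTO target SELECT id, title, rating FROM source")
after_update = conn.execute("SELECT id, title FROM target ORDER BY id").fetchall()

# drop_table: remove the intermediate table.
conn.execute("DROP TABLE top_rated")
conn.close()

print(top_rated)     # [('C', 9.1), ('A', 8.0)]
print(after_ignore)  # [(1, 'A (old)'), (2, 'B'), (3, 'C')]
print(after_update)  # [(1, 'A'), (2, 'B'), (3, 'C')]
```

With "ignore", the existing row with id 1 keeps its old title; with "update", it is overwritten by the row from the source table.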
Documentation
The documentation is a work in progress. We aim to follow the Diátaxis system:
- Getting Started Tutorial: A hands-on introduction to the Astro Python SDK
- How-to guides: Simple step-by-step user guides to accomplish specific tasks
- Reference guide: Commands, modules, classes and methods
- Explanation: Clarification and discussion of key decisions when designing the project
Changelog
The Astro Python SDK follows semantic versioning for releases. Check the changelog for the latest changes.
Release management
To learn more about our release philosophy and steps, see Managing Releases.
Contribution guidelines
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
Read the Contribution Guidelines for a detailed overview of how to contribute.
Contributors and maintainers should abide by the Contributor Code of Conduct.