Google Analytics to Snowflake

The google_analytics_to_snowflake DAG transfers data from Google Analytics (GA) to Google Cloud Storage (GCS), and then loads the data to Snowflake staging tables. From staging, data is loaded to Snowflake transform tables.

Tags: ETL/ELT, Data Processing, Storage, Databases, Data Management & Governance


Last Updated: Jan. 14, 2022


Google Analytics Example DAGs

This repository contains example code for DAGs interacting with Google Analytics, and the hooks and operators needed to power the DAGs.

The custom GAToGCSOperator is used to load data from Google Analytics to GCS, and the code can be found in the include/operators/ directory. The corresponding hook can be found under include/hooks/.
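As a very rough illustration, a task built on this operator might look like the sketch below. Everything in it is an assumption made for the example: the module path include.operators.ga_to_gcs and every parameter name (gcp_conn_id, view_id, since, until, dimensions, metrics, gcs_bucket, gcs_object) are hypothetical; the operator's real signature is defined in include/operators/.

```python
# Hypothetical sketch only -- the import path and parameter names are guesses;
# check include/operators/ for the operator's actual signature.
from datetime import datetime

from airflow import DAG
from include.operators.ga_to_gcs import GAToGCSOperator  # assumed module path

with DAG(
    dag_id="ga_to_gcs_example",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    export_ga_report = GAToGCSOperator(
        task_id="export_ga_report",
        gcp_conn_id="google_cloud_default",   # assumed connection parameter
        view_id="123456789",                  # GA view (profile) to query
        since="7daysAgo",                     # report window start
        until="today",                        # report window end
        dimensions=["ga:date", "ga:source"],
        metrics=["ga:sessions", "ga:users"],
        gcs_bucket="my-ga-export-bucket",     # destination bucket
        gcs_object="ga/example_report.json",  # destination object path
    )
```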

Requirements

Running the DAGs in this repo requires the Astronomer CLI and Docker installed locally. Python requirements are listed in the requirements.txt file.

Accounts and Airflow connections for both Google Cloud and Snowflake are needed to run the DAGs.

For Google Cloud, under Admin -> Connections in the Airflow UI, add a new connection with the Conn ID google_cloud_default and the connection type Google Cloud. A GCP key associated with a service account (using OAuth) that has access to Google Cloud Storage is needed; for more information on generating a key, follow the instructions in this guide. The key can be added either as a path in the Keyfile Path field, or by copying and pasting the JSON directly into the Keyfile JSON field. For the Keyfile Path, a relative path is allowed; if you are using Astronomer, the recommended location is under the include/ directory, since Docker mounts all files and directories under it. Make sure the file name is included in the path. Finally, add the project ID to the Project ID field. The scope https://www.googleapis.com/auth/analytics.readonly is also required.
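If you prefer the command line to the Airflow UI, the connection can also be created with the Airflow CLI, roughly as sketched below. The key path and project ID are placeholders, and the names of the extra fields can differ between versions of the Google provider package.

```bash
# Sketch only: key path and project ID are placeholders, and the
# extra__google_cloud_platform__* field names may vary by provider version.
airflow connections add 'google_cloud_default' \
    --conn-type 'google_cloud_platform' \
    --conn-extra '{
        "extra__google_cloud_platform__key_path": "/usr/local/airflow/include/gcp_key.json",
        "extra__google_cloud_platform__project": "my-gcp-project",
        "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/analytics.readonly"
    }'
```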

For Snowflake, under Admin -> Connections in the Airflow UI, add a new connection with the Conn ID snowflake_default and the connection type Snowflake. The Host field should be the full URL that you use to log into Snowflake, for example https://[account].[region].snowflakecomputing.com. Fill out the Login, Password, Schema, Account, Database, Region, Role, and Warehouse fields with your information.
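As with the Google Cloud connection, this one can also be created from the CLI; the sketch below uses placeholder values throughout, and the extra field names may vary by Snowflake provider version.

```bash
# Sketch only: every value is a placeholder, and the extra__snowflake__* field
# names may vary by provider version.
airflow connections add 'snowflake_default' \
    --conn-type 'snowflake' \
    --conn-host 'https://my_account.us-east-1.snowflakecomputing.com' \
    --conn-login 'my_user' \
    --conn-password 'my_password' \
    --conn-schema 'my_schema' \
    --conn-extra '{
        "extra__snowflake__account": "my_account",
        "extra__snowflake__region": "us-east-1",
        "extra__snowflake__database": "my_database",
        "extra__snowflake__role": "my_role",
        "extra__snowflake__warehouse": "my_warehouse"
    }'
```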

Getting Started

The easiest way to run these example DAGs is to use the Astronomer CLI to get an Airflow instance up and running locally (a condensed command sketch follows the numbered steps):

  1. Install the Astronomer CLI.
  2. Clone this repo locally and navigate into it.
  3. Start Airflow locally by running astro dev start.
  4. Create all necessary connections and variables; see the Requirements section above for the specifics of each DAG.
  5. Navigate to localhost:8080 in your browser and you should see the example DAGs there.
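Condensed into commands, the steps above look roughly like the sketch below. The install command assumes macOS with Homebrew and may differ on other platforms, and <repository-url> stands in for this repository's clone URL.

```bash
# Rough quick-start sketch; adjust the install step for your platform and
# substitute this repository's actual clone URL.
brew install astro              # install the Astronomer (Astro) CLI
git clone <repository-url>      # clone this repository
cd <repository-directory>       # navigate into the project
astro dev start                 # build the image and start Airflow locally
# The Airflow UI is then available at http://localhost:8080
```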