Apache Airflow Provider - Toloka


An Apache Airflow provider for Toloka, a data collection, markup, and aggregation platform.

Last Published: Sep. 6, 2022

Airflow Toloka Provider

This library allows you to run Toloka crowdsourcing processes in Apache Airflow, a widely used workflow management system.

Here you can find a collection of ready-made Airflow tasks for the most frequently used actions in Toloka-Kit.

Getting started

$ pip install airflow-provider-toloka

A good way to start is to follow the example in this repo.


TolokaHook gets the Toloka OAuth token from an Airflow Connection and creates a TolokaClient with it. To obtain the TolokaClient, call the hook's get_conn() method.

To set up the Airflow Connection, create it in the Airflow Connections UI with the following parameters:

  • Conn ID: toloka_default
  • Conn Type: Toloka
  • Token: enter your OAuth token for Toloka. You can learn more about how to get it here.
  • Environment: enter production or sandbox

Tasks use the toloka_default connection ID by default. If needed, you can create additional Airflow Connections and pass their IDs to a task through its toloka_conn_id argument.
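
Besides the UI, Airflow can read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> (conn ID upper-cased), and from Airflow 2.3 on the value may be a JSON document. The sketch below builds such a variable for toloka_default or for any additional connection. It assumes the connection type id is `toloka` and that the hook reads the token and environment from the connection's extra field under the keys `extra__toloka__token` and `extra__toloka__environment`. These names are guesses based on Airflow's convention for custom connection fields, so check them against the provider's hook before relying on them. The helper name `toloka_connection_env` is hypothetical.

```python
import json
from typing import Tuple


def toloka_connection_env(conn_id: str, token: str, environment: str) -> Tuple[str, str]:
    """Build an (env var name, JSON value) pair that defines a Toloka Airflow Connection.

    Hypothetical helper. ASSUMPTION: the provider's conn type is "toloka" and the
    hook reads the token and environment from these extra keys; verify them
    against the provider source before use.
    """
    if environment not in ("production", "sandbox"):
        raise ValueError("environment must be 'production' or 'sandbox'")

    # Airflow looks up connections in variables named AIRFLOW_CONN_<CONN_ID>.
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps(
        {
            "conn_type": "toloka",  # ASSUMPTION: conn type id used by the provider
            "extra": {
                "extra__toloka__token": token,  # ASSUMPTION: extra key name
                "extra__toloka__environment": environment,  # ASSUMPTION: extra key name
            },
        }
    )
    return name, value
```

For example, toloka_connection_env("toloka_sandbox", token, "sandbox") returns the name AIRFLOW_CONN_TOLOKA_SANDBOX together with a JSON value. Export that variable in the environment of the scheduler and workers, then pass toloka_conn_id="toloka_sandbox" to a task. Keeping the token out of DAG files and in the environment (or a secrets backend) avoids committing credentials.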

Tasks and Sensors

Several tasks and sensors make it easy to interact with Toloka from Airflow DAGs. They cover actions such as creating a project and a pool, adding tasks, and getting assignments. If you need something beyond the implemented ones, you can write your own task or operator on top of TolokaHook: get a Toloka client with toloka_client = TolokaHook().get_conn() and use the full power of Toloka-Kit.
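
As a sketch of the custom-task route, the helper below counts a pool's assignments by status using Toloka-Kit's TolokaClient.get_assignments(pool_id=...). The function name `count_assignments_by_status` is hypothetical and not part of the provider. It takes the client as a parameter instead of creating the hook itself, so it can be unit-tested with a stub object and reused from any task.

```python
from collections import Counter
from typing import Any, Dict


def count_assignments_by_status(toloka_client: Any, pool_id: str) -> Dict[str, int]:
    """Count the assignments of one pool, grouped by status (hypothetical helper).

    `toloka_client` is meant to be the TolokaClient returned by
    TolokaHook().get_conn(); any object with a compatible
    get_assignments(pool_id=...) method works, which keeps this testable.
    """
    counts: Counter = Counter()
    for assignment in toloka_client.get_assignments(pool_id=pool_id):
        status = assignment.status
        # Toloka-Kit statuses are enum members; use their string value,
        # falling back to the raw status for plain strings (e.g. in tests).
        counts[getattr(status, "value", status)] += 1
    return dict(counts)
```

Inside a DAG, wrap the call in an Airflow task and pass TolokaHook().get_conn() as the client; the returned dictionary is small and JSON-serializable, so it can be passed on to downstream tasks through XCom.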

Pull requests with such additions are welcome.

Useful Links

Questions and bug reports


© YANDEX LLC, 2022. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.