ConsumeFromTopicOperator

Apache Kafka

ConsumeFromTopicOperator An operator that consumes from Kafka a topic(s) and processing the messages.

Access Instructions

Install the Apache Kafka provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

topicsSequence[str]A list of topics or regex patterns the consumer should subsrcribe to.
apply_functionUnion[Callable[…, Any], str]The function that should be applied to fetched one at a time. name of dag file executing the function and the function name delimited by a .
apply_function_batchUnion[Callable[…, Any], str]The function that should be applied to a batch of messages fetched. Can not be used with apply_function. Intended for transactional workloads where an expensive task might be called before or after operations on the messages are taken.
apply_function_argsOptional[Sequence[Any]], optionalAdditional arguments that should be applied to the callable, defaults to None
apply_function_kwargsOptional[Dict[Any, Any]], optionalAdditional key word arguments that should be applied to the callable , defaults to None
kafka_conn_idOptional[str], optionalThe airflow connection id to fetch kafka brokers from, defaults to None
consumer_configOptional[Dict[Any, Any]], optionalthe config dictionary for the kafka client (additional information available on the confluent-python-kafka documentation), defaults to None, defaults to None
commit_cadenceOptional[str], optionalWhen consumers should commit offsets (“never”, “end_of_batch”,”end_of_operator”), defaults to “end_of_operator”; if end_of_operator, the commit() is called based on the max_messsages arg. Commits are made after the operator has processed the apply_function method for the maximum messages in the operator. if end_of_batch, the commit() is called based on the max_batch_size arg. Commits are made after each batch has processed by the apply_function method for all messages in the batch. if never, close() is called without calling the commit() method.
max_messagesOptional[int], optionalThe maximum total number of messages an operator should read from Kafka, defaults to None
max_batch_sizeint, optionalThe maximum number of messages a consumer should read when polling, defaults to 1000
poll_timeoutOptional[float], optionalHow long the Kafka consumer should wait before determining no more messages are available, defaults to 60

Documentation

ConsumeFromTopicOperator An operator that consumes from Kafka a topic(s) and processing the messages.

The operator creates a Kafka consumer that reads a batch of messages from the cluster and processes them using the user supplied callable function. The consumer will continue to read in batches until it reaches the end of the log or reads a maximum number of messages is reached.

Was this page helpful?