FromXLSXQueryOperator

XLSX

Execute an SQL query on an XLSX/XLS file and export the result to a Parquet, CSV, JSON, or JSON Lines file


Last Updated: Mar. 3, 2022

Access Instructions

Install the XLSX provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

source (str): Source filename (XLSX or XLS, templated)
target (str): Target filename (templated)
worksheet (str or int): Worksheet title or number (zero-based, templated)
skip_rows (int): Number of input lines to skip (default: 0, templated)
types (str or dict of string key/value pairs): Force Parquet column types (dict or list of column=type pairs; types such as 'str', 'int64', 'double', 'datetime64[ns]')
file_format (str): Output file format (parquet, csv, json, jsonl)
csv_delimiter (str): CSV delimiter (default: ',')
csv_header (str): Convert CSV output header case ('lower', 'upper', 'skip')
query (str): SQL query (templated)
table_name (str): Table name (default: 'xls', templated)
use_first_row_as_header (bool): If true, use the first row as column names; otherwise use A, B, C, … as column names
nullable_int (bool): Nullable integer data type support
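
As a rough illustration of how these parameters fit together, the sketch below collects them into a keyword-argument dictionary. The file paths, the task_id, and the query are hypothetical values chosen for the example, and the import path shown in the comment is an assumption about the provider's module layout that should be checked against the installed package.

```python
# Hypothetical keyword arguments for FromXLSXQueryOperator.
# Parameter names come from the list above; every value is illustrative only.
operator_kwargs = {
    "task_id": "xlsx_to_csv",                 # standard Airflow task identifier
    "source": "/data/input/{{ ds }}.xlsx",    # templated source path (hypothetical)
    "target": "/data/output/{{ ds }}.csv",    # templated target path (hypothetical)
    "worksheet": 0,                           # first worksheet (zero-based)
    "skip_rows": 0,
    "file_format": "csv",
    "csv_delimiter": ",",
    "csv_header": "lower",
    "table_name": "xls",                      # name the query uses for the sheet
    "use_first_row_as_header": True,
    # Hypothetical query; column names depend on the worksheet header row.
    "query": "SELECT name, SUM(qty) AS total FROM xls GROUP BY name",
}

# Inside a DAG definition this would be passed to the operator, for example:
#   from xlsx_provider.operators.from_xlsx_query_operator import (
#       FromXLSXQueryOperator,  # assumed import path, verify before use
#   )
#   task = FromXLSXQueryOperator(dag=dag, **operator_kwargs)
```

The table referenced in the query has to match table_name, which is why both are set explicitly here even though 'xls' is the default.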

Documentation

Execute an SQL query on an XLSX/XLS file and export the result to a Parquet, CSV, JSON, or JSON Lines file

This operator loads an XLSX or XLS file into an in-memory SQLite database, executes a query on the database, and stores the result in a Parquet, CSV, JSON, or JSON Lines (one line per record) file. The output column names and types are determined by the SQL query output.
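
The load, query, and export flow described above can be sketched with the Python standard library alone. This is a minimal illustration, not the operator's actual implementation: the helper name query_rows_to_csv is hypothetical, the worksheet is replaced by an in-memory list of tuples instead of being read from an XLSX file, and only CSV output is shown.

```python
import csv
import io
import sqlite3


def query_rows_to_csv(rows, query, table_name="xls",
                      use_first_row_as_header=True, delimiter=","):
    """Load rows into an in-memory SQLite table, run the query, return CSV text.

    Hypothetical helper for illustration; it stands in for the worksheet
    loading step with plain Python tuples.
    """
    rows = list(rows)
    if use_first_row_as_header:
        columns, data = [str(c) for c in rows[0]], rows[1:]
    else:
        # Spreadsheet-style names A, B, C, ...; this sketch covers A..Z only.
        columns = [chr(ord("A") + i) for i in range(len(rows[0]))]
        data = rows

    conn = sqlite3.connect(":memory:")
    try:
        # Quote identifiers so header cells with spaces still work.
        col_defs = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
        conn.execute('CREATE TABLE "{}" ({})'.format(table_name, col_defs))
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table_name, placeholders), data
        )

        cursor = conn.execute(query)
        # Output column names come from the query result, not from the sheet.
        header = [d[0] for d in cursor.description]

        out = io.StringIO()
        writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(cursor)
        return out.getvalue()
    finally:
        conn.close()


# Made-up sample data standing in for a worksheet.
sample_rows = [
    ("name", "qty"),
    ("apple", 3),
    ("pear", 5),
    ("apple", 2),
]
result = query_rows_to_csv(
    sample_rows,
    "SELECT name, SUM(qty) AS total FROM xls GROUP BY name ORDER BY name",
)
print(result)
```

Note how the output header (name, total) follows the query's column aliases rather than the worksheet's header row, which mirrors the statement that output column names are determined by the SQL query output.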
