FromXLSXQueryOperator
Execute an SQL query on an XLSX/XLS file and export the result into a Parquet, CSV, JSON, or JSON Lines file
Access Instructions
Install the XLSX provider package into your Airflow environment.
Import the module into your DAG file and instantiate it with your desired params.
Parameters
source (str): Source filename (XLSX or XLS, templated)
target (str): Target filename (templated)
worksheet (str or int): Worksheet title or number (zero-based, templated)
skip_rows (int): Number of input lines to skip (default: 0, templated)
types (str or dict of str key/value pairs): Force Parquet column types (dict or list of column='type' entries, with types such as 'str', 'int64', 'double', 'datetime64[ns]')
file_format (str): Output file format (parquet, csv, json, jsonl)
csv_delimiter (str): CSV delimiter (default: ',')
csv_header (str): Convert CSV output header case ('lower', 'upper', 'skip')
query (str): SQL query (templated)
table_name: Table name (default: 'xls', templated)
use_first_row_as_header (bool): If true, use the first row as column names; otherwise use A, B, C, … as column names
nullable_int (bool): Nullable integer data type support
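As a rough illustration of how these parameters fit together, the sketch below collects a set of arguments and passes them to the operator. The file names, the `region` and `amount` columns, the task id, and the `build_task` helper are hypothetical, and the import path is an assumption based on the provider's package layout; check it against the version you have installed.

```python
# Hypothetical arguments; only the parameter names come from the list above.
QUERY_ARGS = {
    "source": "{{ var.value.tmp_path }}/sales.xlsx",   # templated input file
    "target": "{{ var.value.tmp_path }}/sales.csv",    # templated output file
    "worksheet": 0,                     # zero-based worksheet number
    "skip_rows": 0,
    "use_first_row_as_header": True,    # column names come from the first row
    "file_format": "csv",
    "csv_delimiter": ",",
    "csv_header": "lower",
    # 'xls' is the default table name exposed to the query
    "query": "SELECT region, SUM(amount) AS total FROM xls GROUP BY region",
}


def build_task(dag=None):
    """Create the task; requires Airflow and the XLSX provider to be installed."""
    # Assumed import path, not confirmed by this page.
    from xlsx_provider.operators.from_xlsx_query_operator import (
        FromXLSXQueryOperator,
    )

    return FromXLSXQueryOperator(
        task_id="xlsx_query_to_csv", dag=dag, **QUERY_ARGS
    )
```

Keeping the arguments in a plain dictionary makes it easy to review them separately from the DAG wiring; passing them inline works just as well.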
Documentation
Execute an SQL query on an XLSX/XLS file and export the result into a Parquet, CSV, JSON, or JSON Lines file
This operator loads an XLSX or XLS file into an in-memory SQLite database, executes a query on the database, and stores the result in a Parquet, CSV, JSON, or JSON Lines (one record per line) file. The output column names and types are determined by the SQL query output.
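The load, query, and export flow described above can be sketched with the Python standard library alone. This is not the operator's implementation: the rows are hypothetical stand-ins for worksheet content that has already been read, and `query_rows` and `to_csv` are illustrative helpers, not part of the provider.

```python
import csv
import io
import sqlite3

# Hypothetical worksheet rows; the first row holds the column names.
rows = [
    ("region", "amount"),
    ("north", 10),
    ("south", 25),
    ("north", 5),
]


def query_rows(rows, query, table_name="xls"):
    """Load rows into an in-memory SQLite table and run a query on it."""
    header, *data = rows
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})', data
        )
        cursor = conn.execute(query)
        # Output column names are taken from the query result itself.
        names = [description[0] for description in cursor.description]
        return names, cursor.fetchall()
    finally:
        conn.close()


def to_csv(names, records, delimiter=","):
    """Serialize the query result as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(records)
    return buffer.getvalue()


names, records = query_rows(
    rows,
    "SELECT region, SUM(amount) AS total FROM xls GROUP BY region ORDER BY region",
)
csv_text = to_csv(names, records)
```

Because the header is built from the cursor description, aliases in the query (such as `total` here) become the output column names, which mirrors the behavior stated above.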