Apache Druid Operators

Prerequisite

To use DruidOperator, you must configure a Druid Connection first.
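One common way to define that connection is an AIRFLOW_CONN_* environment variable exported where the scheduler and workers run (a connection created through the Airflow UI or CLI works equally well). The snippet below is a minimal, illustrative sketch only: it assumes the operator's default connection id druid_ingest_default and an overlord reachable at druid-overlord.example.com:8081; consult the Druid connection documentation for the exact fields.

import os

# Illustrative only: host, port, and endpoint are assumptions. Point the
# connection at your Druid overlord; the "endpoint" extra is the path the
# indexing task is posted to.
os.environ["AIRFLOW_CONN_DRUID_INGEST_DEFAULT"] = (
    "http://druid-overlord.example.com:8081/?endpoint=druid%2Findexer%2Fv1%2Ftask"
)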

DruidOperator

To submit a task directly to Druid, you need to provide the file path to the Druid index specification, json_index_file, and the connection id of the Druid overlord which accepts index jobs, druid_ingest_conn_id, configured in Airflow Connections. In addition, you can provide the ingestion type, ingestion_type, to determine whether the job is a batch ingestion or a SQL-based ingestion.
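For example, a SQL-based ingestion task could be submitted as in the sketch below. The task id and spec file name are illustrative assumptions, and the IngestionType enum is assumed to be available in recent versions of the Druid provider.

from airflow.providers.apache.druid.hooks.druid import IngestionType
from airflow.providers.apache.druid.operators.druid import DruidOperator

submit_sql_based_job = DruidOperator(
    task_id="druid_msq_submit_job",                # illustrative task id
    json_index_file="sql_based_ingestion.json",    # path to your ingestion spec
    druid_ingest_conn_id="druid_ingest_default",   # connection to the Druid overlord
    ingestion_type=IngestionType.MSQ,              # SQL-based ingestion; default is BATCH
)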

An example of the Druid ingestion specification content is also shown below.

For parameter definitions, take a look at DruidOperator.

Using the operator

tests/system/apache/druid/example_druid_dag.py[source]

from airflow.providers.apache.druid.operators.druid import DruidOperator

submit_job = DruidOperator(task_id="spark_submit_job", json_index_file="json_index.json")
# Example content of json_index.json:
JSON_INDEX_STR = """
    {
        "type": "index_hadoop",
        "datasource": "datasource_prd",
        "spec": {
            "dataSchema": {
                "granularitySpec": {
                    "intervals": ["2021-09-01/2021-09-02"]
                }
            }
        }
    }
"""

Reference

For more information, please refer to the Apache Druid Ingestion spec reference.
