Apache Druid Operators¶
Prerequisite¶
To use DruidOperator, you must configure a Druid Connection first.
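A Druid Connection can be defined like any other Airflow connection, for example through an environment variable. The following is a minimal sketch, not the only way to do it: the host and port are hypothetical, it assumes the connection ID druid_ingest_default (the operator's default), and it sets the overlord's task endpoint via the endpoint extra:

import os

# Airflow reads connections from environment variables named AIRFLOW_CONN_<CONN_ID>.
# The overlord host and port below are hypothetical placeholders; the "endpoint"
# extra (URL-encoded query parameter) is the path used when posting index jobs.
os.environ["AIRFLOW_CONN_DRUID_INGEST_DEFAULT"] = (
    "http://druid-overlord.example.com:8081?endpoint=druid%2Findexer%2Fv1%2Ftask"
)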
DruidOperator¶
To submit a task directly to Druid, you need to provide the filepath to the Druid index specification json_index_file, and the connection ID of the Druid overlord druid_ingest_conn_id, which accepts index jobs, in Airflow Connections. In addition, you can provide the ingestion type ingestion_type to determine whether the job is batch ingestion or SQL-based ingestion; a sketch follows this paragraph. Example content of a Druid ingestion specification is shown below. For parameter definitions, take a look at DruidOperator.
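For instance, SQL-based ingestion can be selected with the IngestionType enum. This is a minimal sketch, assuming a provider version that ships IngestionType in the Druid hook module; the task_id and the sql_based_ingestion.json spec file are hypothetical:

from airflow.providers.apache.druid.hooks.druid import IngestionType
from airflow.providers.apache.druid.operators.druid import DruidOperator

# Submit an MSQ (SQL-based) ingestion job instead of the default batch ingestion.
# "sql_based_ingestion.json" is a hypothetical spec file name.
submit_sql_job = DruidOperator(
    task_id="druid_sql_submit_job",
    json_index_file="sql_based_ingestion.json",
    ingestion_type=IngestionType.MSQ,
)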
Using the operator¶
from airflow.providers.apache.druid.operators.druid import DruidOperator

submit_job = DruidOperator(task_id="druid_submit_job", json_index_file="json_index.json")

# Example content of json_index.json:
JSON_INDEX_STR = """
{
    "type": "index_hadoop",
    "datasource": "datasource_prd",
    "spec": {
        "dataSchema": {
            "granularitySpec": {
                "intervals": ["2021-09-01/2021-09-02"]
            }
        }
    }
}
"""
Reference¶
For more information, please refer to the Apache Druid Ingestion spec reference.