airflow.example_dags.example_datasets
Example DAGs demonstrating the behavior of the Datasets feature in Airflow, including conditional and dataset expression-based scheduling.
Notes on usage:
Turn on all the DAGs.
dataset_produces_1 is scheduled to run daily. Once it completes, it triggers several DAGs due to its dataset being updated. dataset_consumes_1 is triggered immediately, as it depends solely on the dataset produced by dataset_produces_1. consume_1_or_2_with_dataset_expressions will also be triggered, as its condition of either dataset_produces_1 or dataset_produces_2 being updated is satisfied with dataset_produces_1.
dataset_consumes_1_and_2 will not be triggered after dataset_produces_1 runs because it requires the dataset from dataset_produces_2, which has no schedule and must be manually triggered.
After manually triggering dataset_produces_2, several DAGs will be affected. dataset_consumes_1_and_2 should run because both its dataset dependencies are now met. consume_1_and_2_with_dataset_expressions will be triggered, as it requires both dataset_produces_1 and dataset_produces_2 datasets to be updated. consume_1_or_2_with_dataset_expressions will be triggered again, since it’s conditionally set to run when either dataset is updated.
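The sequence described so far can be traced with a small pure-Python model. This is only a sketch of the triggering rules as stated in these notes, not Airflow code: the `Consumer` class, the `publish` helper, and the labels `ds1` and `ds2` (standing for the datasets written by dataset_produces_1 and dataset_produces_2) are hypothetical names introduced for illustration. The model assumes that a consumer forgets its pending updates once it has run.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    """Toy stand-in for a dataset-scheduled DAG (not an Airflow class)."""

    name: str
    needs: frozenset  # labels of the datasets this consumer depends on
    mode: str = "all"  # "all": every dataset updated, "any": at least one
    pending: set = field(default_factory=set)
    runs: int = 0

    def notify(self, dataset: str) -> bool:
        """Record an update; run and reset if the condition is now met."""
        if dataset not in self.needs:
            return False
        self.pending.add(dataset)
        if self.mode == "all":
            ready = self.pending >= self.needs
        else:
            ready = bool(self.pending)
        if ready:
            self.runs += 1
            self.pending.clear()  # assumption: updates are consumed by a run
        return ready


consumers = [
    Consumer("dataset_consumes_1", frozenset({"ds1"})),
    Consumer("dataset_consumes_1_and_2", frozenset({"ds1", "ds2"})),
    Consumer("consume_1_and_2_with_dataset_expressions", frozenset({"ds1", "ds2"})),
    Consumer("consume_1_or_2_with_dataset_expressions", frozenset({"ds1", "ds2"}), mode="any"),
]


def publish(dataset: str) -> list:
    """Deliver one dataset update and return the consumers it triggered."""
    return [c.name for c in consumers if c.notify(dataset)]


after_produces_1 = publish("ds1")  # scheduled daily run of dataset_produces_1
after_produces_2 = publish("ds2")  # manual trigger of dataset_produces_2
```

In this model the first update triggers dataset_consumes_1 and the "or" consumer, while the second update triggers both "and" consumers and the "or" consumer a second time, which matches the walkthrough in these notes.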
consume_1_or_both_2_and_3_with_dataset_expressions demonstrates more complex dataset dependency logic. This DAG triggers if dataset_produces_1 is updated or if both dataset_produces_2 and dag3_dataset are updated. It shows how updates from multiple datasets can be combined with logical expressions for advanced scheduling.
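The "1 or (2 and 3)" condition can be illustrated with a minimal expression tree that overloads `|` and `&`. This is a toy evaluator written for this note, not Airflow's implementation: `Condition`, `Ds`, `AnyOf`, `AllOf`, and the `satisfied` method are hypothetical names, and the string labels stand in for the real datasets.

```python
class Condition:
    """Base class: `|` builds an any-of node, `&` builds an all-of node."""

    def __or__(self, other):
        return AnyOf(self, other)

    def __and__(self, other):
        return AllOf(self, other)


class Ds(Condition):
    """Leaf node: satisfied when its label is among the updated datasets."""

    def __init__(self, name):
        self.name = name

    def satisfied(self, updated):
        return self.name in updated


class AnyOf(Condition):
    """Satisfied when at least one sub-condition is satisfied."""

    def __init__(self, *parts):
        self.parts = parts

    def satisfied(self, updated):
        return any(part.satisfied(updated) for part in self.parts)


class AllOf(Condition):
    """Satisfied only when every sub-condition is satisfied."""

    def __init__(self, *parts):
        self.parts = parts

    def satisfied(self, updated):
        return all(part.satisfied(updated) for part in self.parts)


ds1, ds2, ds3 = Ds("ds1"), Ds("ds2"), Ds("ds3")

# Same shape as the condition described above: 1 OR (2 AND 3).
condition = ds1 | (ds2 & ds3)

only_1 = condition.satisfied({"ds1"})
only_2 = condition.satisfied({"ds2"})
both_2_and_3 = condition.satisfied({"ds2", "ds3"})
```

Here `only_1` and `both_2_and_3` evaluate to `True`, while `only_2` is `False` because the "and" branch is incomplete and the first dataset has not been updated.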
conditional_dataset_and_time_based_timetable combines time-based scheduling with dataset dependencies. This DAG runs either when both the dataset_produces_1 and dataset_produces_2 datasets have been updated or according to a specific cron schedule, showing that dataset and time-based triggers can be mixed in a single schedule.
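A DAG definition along these lines might look like the fragment below. It is a sketch under stated assumptions rather than a copy of the example module: it assumes an Airflow 2.x release that ships `DatasetOrTimeSchedule` (2.9 or later), and the dataset URIs, the `dag_id`, the cron expression, and the task are placeholders chosen for illustration.

```python
from datetime import datetime, timezone

from airflow.datasets import Dataset
from airflow.models.dag import DAG
from airflow.operators.bash import BashOperator
from airflow.timetables.datasets import DatasetOrTimeSchedule
from airflow.timetables.trigger import CronTriggerTimetable

# Placeholder URIs (assumptions): use the URIs your producer tasks write to.
dag1_dataset = Dataset("s3://example-bucket/output_1.txt")
dag2_dataset = Dataset("s3://example-bucket/output_2.txt")

with DAG(
    dag_id="my_dataset_or_time_dag",  # hypothetical name
    start_date=datetime(2021, 1, 1, tzinfo=timezone.utc),
    catchup=False,
    schedule=DatasetOrTimeSchedule(
        # Time-based part: 01:00 UTC every Wednesday (example value).
        timetable=CronTriggerTimetable("0 1 * * 3", timezone="UTC"),
        # Dataset part: both datasets must have been updated.
        datasets=(dag1_dataset & dag2_dataset),
    ),
) as dag:
    BashOperator(task_id="report", bash_command="echo 'triggered'")
```

The `datasets` argument accepts a dataset expression, so the `&` and `|` combinations described in these notes can be reused there.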
The DAGs dataset_consumes_1_never_scheduled and dataset_consumes_unknown_never_scheduled will not run automatically as they depend on datasets that do not get updated or are not produced by any scheduled tasks.