Integration

AWS: Amazon Web Services

GCP: Google Cloud Platform

Airflow has extensive support for the Google Cloud Platform. Note, however, that most of the Hooks and Operators live in the contrib package (airflow.contrib). They therefore have beta status and may introduce breaking changes between minor releases.

BigQuery

BigQuery Operators

BigQueryCheckOperator
BigQueryValueCheckOperator
BigQueryIntervalCheckOperator
BigQueryOperator
BigQueryToBigQueryOperator
BigQueryToCloudStorageOperator
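
The three check operators differ only in how they judge the query result. The sketch below reproduces, in plain Python, the pass/fail rules as we understand them from Airflow's generic CheckOperator and ValueCheckOperator, which the BigQuery variants extend. It is an illustration under that assumption, not the operators' own code: it does not query BigQuery, and `first_row` is a hypothetical stand-in for the first row the SQL returns.

```python
"""Plain-Python sketch of the pass/fail rules behind the BigQuery check operators.

Assumption: mirrors the documented behaviour of Airflow's CheckOperator and
ValueCheckOperator. No BigQuery call is made; `first_row` stands in for the
first row returned by the check query.
"""
from typing import Optional, Sequence


def check_passes(first_row: Optional[Sequence]) -> bool:
    """BigQueryCheckOperator-style rule: every value in the first row must be truthy."""
    if not first_row:
        # An empty (or missing) result is treated as a failed check.
        return False
    return all(bool(value) for value in first_row)


def value_check_passes(first_row: Sequence[float], pass_value: float,
                       tolerance: Optional[float] = None) -> bool:
    """BigQueryValueCheckOperator-style rule for numeric results.

    Without a tolerance every value must equal `pass_value`; with a tolerance
    each value must lie within pass_value * (1 - tolerance) .. pass_value * (1 + tolerance).
    """
    if tolerance is None:
        return all(value == pass_value for value in first_row)
    low = pass_value * (1 - tolerance)
    high = pass_value * (1 + tolerance)
    return all(low <= value <= high for value in first_row)


print(check_passes([42, 'ok']))                                      # all values truthy
print(check_passes([42, 0]))                                         # 0 is falsy, so the check fails
print(value_check_passes([95.0], pass_value=100.0, tolerance=0.1))   # inside 90..110
print(value_check_passes([80.0], pass_value=100.0, tolerance=0.1))   # outside 90..110
```

The tolerance is relative to `pass_value`, so a tolerance of 0.1 accepts values within 10% on either side.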

BigQueryHook

Cloud Dataflow

DataFlow Operators

DataFlowJavaOperator
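from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2016, 8, 1),
    'email': ['alex@vanboxel.be'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=30),
    # Options shared by every Dataflow task in this DAG; a task's own
    # `options` are merged on top of these.
    'dataflow_default_options': {
        'project': 'my-gcp-project',
        'zone': 'us-central1-f',
        'stagingLocation': 'gs://bucket/tmp/dataflow/staging/',
    }
}

dag = DAG('test-dag', default_args=default_args)

task = DataFlowJavaOperator(
    gcp_conn_id='gcp_default',
    task_id='normalize-cal',
    # The jar location is built from the Airflow Variable `gcp_dataflow_base`.
    jar='{{var.value.gcp_dataflow_base}}pipeline-ingress-cal-normalize-1.0.jar',
    options={
        'autoscalingAlgorithm': 'BASIC',
        'maxNumWorkers': '50',
        'start': '{{ds}}',  # execution date, rendered by Jinja at run time
        'partitionType': 'DAY'
    },
    dag=dag)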
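
In the example above, the project, zone and staging location come from `dataflow_default_options`, while the autoscaling and partition settings are task-specific. The sketch below shows how the two dictionaries combine, assuming (as we understand the contrib operator) that it copies the defaults, updates them with the task's `options` so task values win on conflicts, and passes each entry to the pipeline jar as a `--key=value` argument. The helper names `merge_dataflow_options` and `to_pipeline_args` are hypothetical and not part of Airflow, and the overriding `zone` value is invented for illustration.

```python
"""Sketch of how task-level Dataflow options combine with the DAG-level defaults.

Assumption: DataFlowJavaOperator copies `dataflow_default_options`, updates the
copy with the task's `options` and renders each entry as --key=value.
The helpers below are hypothetical, not Airflow APIs.
"""
import copy


def merge_dataflow_options(default_options, task_options):
    """Return a new dict in which task options override the defaults key by key."""
    merged = copy.copy(default_options)  # leave the shared defaults untouched
    merged.update(task_options)
    return merged


def to_pipeline_args(options):
    """Render options as --key=value strings, sorted only to make the output stable."""
    return ['--{}={}'.format(key, value) for key, value in sorted(options.items())]


defaults = {
    'project': 'my-gcp-project',
    'zone': 'us-central1-f',
    'stagingLocation': 'gs://bucket/tmp/dataflow/staging/',
}
# Hypothetical task options; 'zone' overrides the DAG-level default.
task_options = {'autoscalingAlgorithm': 'BASIC', 'maxNumWorkers': '50', 'zone': 'europe-west1-d'}

merged = merge_dataflow_options(defaults, task_options)
for arg in to_pipeline_args(merged):
    print(arg)
```

Because the defaults are copied before the update, one task overriding `zone` does not leak into other tasks that share the same `default_args`.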

DataFlowHook

Cloud Dataproc

DataProc Operators

DataProcPigOperator
DataProcHiveOperator
DataProcSparkSqlOperator
DataProcSparkOperator
DataProcHadoopOperator
DataProcPySparkOperator
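
Each DataProc operator submits one job of a given type (Pig, Hive, Spark SQL, Spark, Hadoop or PySpark) to an existing cluster. As an illustration, the sketch below builds the request body of the Cloud Dataproc v1 `jobs.submit` REST call for a PySpark job. This shows the shape of the API payload under our assumptions, not DataProcPySparkOperator's internal code; the helper `build_pyspark_job`, the cluster name and the gs:// paths are hypothetical.

```python
"""Sketch of a Cloud Dataproc v1 jobs.submit request body for a PySpark job.

Assumption: illustrates the REST payload shape only; it is not the operator's
own code. The helper, cluster name and gs:// paths are hypothetical.
"""
import json


def build_pyspark_job(cluster_name, main_python_file_uri, args=None, python_file_uris=None):
    """Return a jobs.submit body that runs one PySpark driver on an existing cluster."""
    pyspark_job = {'mainPythonFileUri': main_python_file_uri}
    if args:
        # Command-line arguments passed to the driver script.
        pyspark_job['args'] = list(args)
    if python_file_uris:
        # Extra .py/.zip files shipped alongside the driver.
        pyspark_job['pythonFileUris'] = list(python_file_uris)
    return {
        'job': {
            'placement': {'clusterName': cluster_name},  # which cluster runs the job
            'pysparkJob': pyspark_job,                   # job type and its settings
        }
    }


body = build_pyspark_job(
    cluster_name='cluster-1',
    main_python_file_uri='gs://my-bucket/jobs/wordcount.py',
    args=['--date', '2016-08-01'],
)
print(json.dumps(body, indent=2, sort_keys=True))
```

The other job types follow the same pattern, replacing `pysparkJob` with the corresponding field (for example `hiveJob` or `pigJob`).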

Cloud Datastore

Datastore Operators

Cloud Storage

Storage Operators

GoogleCloudStorageDownloadOperator
GoogleCloudStorageToBigQueryOperator
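
GoogleCloudStorageToBigQueryOperator loads files from a Cloud Storage bucket into a BigQuery table. The sketch below builds a BigQuery v2 load-job configuration of the kind such a transfer submits, assuming a destination written as `(<project>.)<dataset>.<table>` that falls back to a default project. The helpers `parse_table` and `build_load_config`, and the bucket, dataset and table names, are hypothetical; this is not the operator's own code.

```python
"""Sketch of a BigQuery v2 load-job configuration for files in Cloud Storage.

Assumption: a GCS-to-BigQuery transfer submits a load job of roughly this shape.
The helpers and all bucket/dataset/table names are hypothetical.
"""


def parse_table(destination, default_project):
    """Split '(<project>.)<dataset>.<table>' into a BigQuery table reference.

    Note: domain-scoped project IDs such as 'example.com:proj' contain a dot
    and are not handled by this simple split.
    """
    parts = destination.split('.')
    if len(parts) == 2:
        parts = [default_project] + parts
    if len(parts) != 3:
        raise ValueError('expected (<project>.)<dataset>.<table>, got %r' % destination)
    project_id, dataset_id, table_id = parts
    return {'projectId': project_id, 'datasetId': dataset_id, 'tableId': table_id}


def build_load_config(bucket, source_objects, destination, default_project,
                      source_format='CSV', skip_leading_rows=0,
                      write_disposition='WRITE_EMPTY'):
    """Return a jobs.insert body that loads gs://<bucket>/<object> files into a table."""
    return {
        'configuration': {
            'load': {
                'sourceUris': ['gs://{}/{}'.format(bucket, obj) for obj in source_objects],
                'destinationTable': parse_table(destination, default_project),
                'sourceFormat': source_format,
                'skipLeadingRows': skip_leading_rows,     # e.g. 1 to skip a CSV header
                'writeDisposition': write_disposition,   # WRITE_EMPTY fails if the table has data
            }
        }
    }


config = build_load_config('my-bucket', ['exports/2016-08-01/part-0.csv'],
                           'analytics.events', default_project='my-gcp-project',
                           skip_leading_rows=1)
print(config['configuration']['load']['destinationTable'])
```

Keeping the destination as a single dotted string mirrors how table names are usually written in DAG files, while the load job itself needs the three-part reference.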

GoogleCloudStorageHook