Azure Synapse Operators

Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated options—at scale. Azure Synapse brings these worlds together with a unified experience to ingest, explore, prepare, transform, manage and serve data for immediate BI and machine learning needs.

AzureSynapseRunSparkBatchOperator

Use the AzureSynapseRunSparkBatchOperator to execute a Spark application within Synapse Analytics. By default, the operator periodically checks the status of the submitted Spark job and waits for it to terminate with a “Succeeded” status.
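The status checking described above can be pictured as a simple polling loop. The following is a minimal sketch of that idea, not the operator's actual implementation: the function name wait_for_termination, the check_interval and timeout parameters, and the state names other than “Succeeded” are assumptions made for illustration.

```python
import time
from typing import Callable

# Hypothetical terminal states for illustration; only "Succeeded" is named in the text.
TERMINAL_STATES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_termination(
    get_status: Callable[[], str],
    check_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    waited = 0.0
    status = get_status()
    while status not in TERMINAL_STATES:
        if waited >= timeout:
            raise TimeoutError(f"Job still in state {status!r} after {waited} seconds")
        sleep(check_interval)
        waited += check_interval
        status = get_status()
    return status


def ensure_succeeded(final_status: str) -> None:
    """Raise if the job ended in any terminal state other than 'Succeeded'."""
    if final_status != "Succeeded":
        raise RuntimeError(f"Spark job ended with status {final_status!r}")
```

In this sketch, get_status stands in for whatever call reports the job state; passing a no-op sleep makes the loop easy to exercise without waiting.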

Below is an example of using this operator to execute a Spark application on Azure Synapse.

tests/system/microsoft/azure/example_azure_synapse.py

run_spark_job = AzureSynapseRunSparkBatchOperator(
    task_id="run_spark_job",
    spark_pool="provsparkpool",
    payload=SPARK_JOB_PAYLOAD,  # type: ignore
)
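The example above references SPARK_JOB_PAYLOAD without showing it. A minimal sketch of what such a payload might look like follows, assuming the keys follow the Synapse Spark batch job options (name, file, args, conf, executor and driver sizing). The STORAGE_ROOT helper, the storage placeholders, the file names, and the sizing values are all hypothetical and should be replaced with your own.

```python
# Hypothetical ADLS Gen2 location; substitute your own container and storage account.
STORAGE_ROOT = "abfss://<container>@<storage-account>.dfs.core.windows.net"

SPARK_JOB_PAYLOAD = {
    "name": "SparkJob",  # display name of the batch job
    "file": f"{STORAGE_ROOT}/wordcount.py",  # main application file (hypothetical)
    "args": [  # arguments passed to the application (hypothetical)
        f"{STORAGE_ROOT}/input.txt",
        f"{STORAGE_ROOT}/results/",
    ],
    "jars": [],
    "pyFiles": [],
    "files": [],
    "conf": {"spark.dynamicAllocation.enabled": "false"},
    "numExecutors": 2,
    "executorCores": 4,
    "executorMemory": "28g",
    "driverCores": 4,
    "driverMemory": "28g",
}
```

The executor and driver sizes typically need to match a node size available in the target Spark pool, so treat the numbers here as placeholders rather than recommendations.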

AzureSynapseRunPipelineOperator

Use the AzureSynapseRunPipelineOperator (airflow.providers.microsoft.azure.operators.synapse.AzureSynapseRunPipelineOperator) to execute a pipeline within Synapse Analytics.

tests/system/microsoft/azure/example_synapse_run_pipeline.py

run_pipeline1 = AzureSynapseRunPipelineOperator(
    task_id="run_pipeline1",
    azure_synapse_conn_id="azure_synapse_connection",
    pipeline_name="Pipeline 1",
    azure_synapse_workspace_dev_endpoint="azure_synapse_workspace_dev_endpoint",
)
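In the example above, the value passed as azure_synapse_workspace_dev_endpoint is a placeholder string. As a rough sketch, assuming a workspace in the public Azure cloud, the development endpoint usually has the form https://<workspace-name>.dev.azuresynapse.net. The helper synapse_dev_endpoint and the workspace name "my-workspace" below are hypothetical and only illustrate how the keyword arguments might be assembled.

```python
def synapse_dev_endpoint(workspace_name: str) -> str:
    """Build a development endpoint URL from a workspace name (public Azure cloud assumed)."""
    name = workspace_name.strip()
    if not name:
        raise ValueError("workspace_name must not be empty")
    return f"https://{name}.dev.azuresynapse.net"


# Keyword arguments matching the operator call shown above; "my-workspace" is hypothetical.
run_pipeline_kwargs = {
    "task_id": "run_pipeline1",
    "azure_synapse_conn_id": "azure_synapse_connection",
    "pipeline_name": "Pipeline 1",
    "azure_synapse_workspace_dev_endpoint": synapse_dev_endpoint("my-workspace"),
}
```

These keyword arguments could then be unpacked into the operator, for example AzureSynapseRunPipelineOperator(**run_pipeline_kwargs), inside a DAG definition.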

Reference

For further information, please refer to the Microsoft documentation:
