apache-airflow-providers-apache-beam

Changelog

5.9.1

Misc

  • Standard provider python operator (#42081)

5.9.0

Features

  • Add early job_id xcom_push for google provider Beam Pipeline operators (#42982)

5.8.1

Bug Fixes

  • Bugfix/dataflow job location passing (#41887)

5.8.0

Note

This release of the provider is only available for Airflow 2.8+ as explained in the Apache Airflow providers support policy.

Misc

  • Bump minimum Airflow version in providers to Airflow 2.8.0 (#41396)

5.7.2

Bug Fixes

  • Fix BeamRunJavaPipelineOperator fails without job_name set (#40645)

5.7.1

Bug Fixes

  • Fix deferrable mode for BeamRunJavaPipelineOperator (#39371)

Misc

  • Faster 'airflow_version' imports (#39552)

  • Simplify 'airflow_version' imports (#39497)

5.7.0

Note

This release of the provider is only available for Airflow 2.7+ as explained in the Apache Airflow providers support policy.

Bug Fixes

  • Bugfix to correct GCSHook being called even when not required with BeamRunPythonPipelineOperator (#38716)

Misc

  • Bump minimum Airflow version in providers to Airflow 2.7.0 (#39240)

5.6.3

Bug Fixes

  • fix: skip apache beam pipeline options if value is set to false (#38496)

  • Fix side-effect of default options in Beam Operators (#37916)

  • Avoid to use subprocess in asyncio loop (#38292)

  • Avoid change attributes into the constructor in Apache Beam operators (#37934)
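The first fix in this list concerns how pipeline options are turned into command-line flags. The sketch below illustrates the behaviour the entry describes, under the assumption that boolean options act as switches and list values repeat the flag. `options_to_args` is a hypothetical helper written for this example, not the provider's actual code.

```python
from __future__ import annotations

from typing import Any


def options_to_args(options: dict[str, Any]) -> list[str]:
    """Turn a pipeline-options dict into command-line flags (illustrative only)."""
    args: list[str] = []
    for name, value in options.items():
        if value is False:
            # An option explicitly set to False is skipped, not emitted.
            continue
        if value is True or value is None:
            # Assumption: True (or no value) means a bare switch.
            args.append(f"--{name}")
        elif isinstance(value, list):
            # Assumption: list values repeat the flag once per item.
            args.extend(f"--{name}={item}" for item in value)
        else:
            args.append(f"--{name}={value}")
    return args


# Option names and values here are made up for the example.
print(options_to_args({"streaming": False, "use_public_ips": True, "region": "us-central1"}))
```

With this sketch, `streaming` does not appear in the resulting argument list at all, instead of being passed as `--streaming=False`.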

5.6.2

Misc

  • Add Python 3.12 exclusions in providers/pyproject.toml (#37404)

5.6.1

Misc

  • feat: Switch all class, functions, methods deprecations to decorators (#36876)

5.6.0

Misc

  • Get rid of pyarrow-hotfix for CVE-2023-47248 (#36697)

5.5.0

Features

  • Add ability to run streaming Job for BeamRunPythonPipelineOperator in non deferrable mode (#36108)

  • Implement deferrable mode for BeamRunJavaPipelineOperator (#36122)

5.4.0

Note

This release of the provider is only available for Airflow 2.6+ as explained in the Apache Airflow providers support policy.

Misc

  • Bump minimum Airflow version in providers to Airflow 2.6.0 (#36017)

5.3.0

Note

This release of the provider is only available for Airflow 2.5+ as explained in the Apache Airflow providers support policy.

Misc

  • Bump min airflow version of providers (#34728)

  • Use 'airflow.exceptions.AirflowException' in providers (#34511)

5.2.3

Misc

  • Replace sequence concatenation by unpacking in Airflow providers (#33933)

  • Improve modules import in Airflow providers by moving some of them into a type-checking block (#33754)

5.2.2

Bug Fixes

  • Fix wrong OR condition when evaluating beam version < 2.39.0 (#33308)

Misc

  • Refactor: Simplify code in Apache/Alibaba providers (#33227)

5.2.1

Misc

  • Allow downloading requirements file from GCS in 'BeamRunPythonPipelineOperator' (#31645)

5.2.0

Features

  • Add deferrable mode to 'BeamRunPythonPipelineOperator' (#31471)

5.1.1

Note

This release dropped support for Python 3.7.

Misc

  • Add note about dropping Python 3.7 for providers (#32015)

5.1.0

Note

This release of the provider is only available for Airflow 2.4+ as explained in the Apache Airflow providers support policy.

Misc

  • Bump minimum Airflow version in providers (#30917)

  • Update SDKs for google provider package (#30067)

5.0.0

Breaking changes

Warning

In this version of the provider, the deprecated delegate_to parameter of the GCS and Dataflow hooks is removed from all Beam operators. Use the impersonation_chain parameter to achieve impersonation instead.

  • remove delegate_to from GCP operators and hooks (#30748)
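One way to catch leftover delegate_to arguments before upgrading is to validate operator keyword arguments in your own DAG code. The following is a minimal sketch, not part of the provider: `check_beam_operator_kwargs` is a hypothetical helper, the service account is made up, and where impersonation_chain is actually passed depends on the operator you use.

```python
from __future__ import annotations

from typing import Any


def check_beam_operator_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Reject the removed delegate_to argument (hypothetical helper)."""
    if "delegate_to" in kwargs:
        raise ValueError(
            "delegate_to was removed in provider 5.0.0; "
            "use impersonation_chain instead"
        )
    return kwargs


# Hypothetical service account, for illustration only.
kwargs = check_beam_operator_kwargs(
    {"impersonation_chain": "beam-runner@example-project.iam.gserviceaccount.com"}
)
print(kwargs)
```

Failing early like this is a design choice: an explicit error at DAG parse time is easier to trace than an unexpected-keyword error raised deep inside operator construction.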

4.3.0

Features

  • Get rid of state in Apache Beam provider hook (#29503)

4.2.0

Features

  • Add support for running a Beam Go pipeline with an executable binary (#28764)

Misc

  • Deprecate 'delegate_to' param in GCP operators and update docs (#29088)

4.1.1

Bug Fixes

  • Ensure Beam Go file downloaded from GCS still exists when referenced (#28664)

4.1.0

Note

This release of the provider is only available for Airflow 2.3+ as explained in the Apache Airflow providers support policy.

Misc

  • Move min airflow version to 2.3.0 for all providers (#27196)

Features

  • Add backward compatibility with old versions of Apache Beam (#27263)

4.0.0

Breaking changes

Note

This release of the provider is only available for Airflow 2.2+ as explained in the Apache Airflow providers support policy.

Features

  • Added missing project_id to the wait_for_job (#24020)

  • Support impersonation service account parameter for Dataflow runner (#23961)

Misc

  • chore: Refactoring and Cleaning Apache Providers (#24219)

3.4.0

Features

  • Support serviceAccount attr for dataflow in the Apache beam

3.3.0

Features

  • Add recipe for BeamRunGoPipelineOperator (#22296)

Bug Fixes

  • Fix mistakenly added install_requires for all providers (#22382)

3.2.1

Misc

  • Add Trove classifiers in PyPI (Framework :: Apache Airflow :: Provider)

3.2.0

Features

  • Add support for BeamGoPipelineOperator (#20386)

Misc

  • Support for Python 3.10

3.1.0

Features

  • Use google cloud credentials when executing beam command in subprocess (#18992)

3.0.1

Misc

  • Optimise connection importing for Airflow 2.2.0

3.0.0

Breaking changes

  • Auto-apply apply_default decorator (#15667)

Warning

Due to the removal of the apply_default decorator, this version of the provider requires Airflow 2.1.0+. If your Airflow version is < 2.1.0 and you want to install this provider version, first upgrade Airflow to at least version 2.1.0. Otherwise, your Airflow package version will be upgraded automatically and you will have to run airflow db upgrade manually to complete the migration.
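A quick pre-install check can tell you whether an environment already meets the minimum. The sketch below compares plain dotted numeric versions only; `parse_version` and `meets_minimum` are hypothetical helpers, and pre-release suffixes such as "2.1.0rc1" are deliberately not handled. For real projects, `packaging.version.Version` is the more robust choice.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as '2.1.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = "2.1.0") -> bool:
    """Return True if the installed version is at least the minimum."""
    # Tuples compare element by element, so (2, 10, 0) > (2, 1, 0).
    return parse_version(installed) >= parse_version(minimum)


for candidate in ("2.0.2", "2.1.0", "2.10.0"):
    print(candidate, meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.10.0" would sort before "2.2.0".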

2.0.0

Breaking changes

Integration with the google provider

In version 2.0.0 of the provider, we changed how it integrates with the google provider. Previous versions of the two providers conflicted when installed together using pip > 20.2.4. pip 20.2.4 and below did not detect the conflict, but it was still present: the version of the Google BigQuery Python client did not match on both sides. As a result, when both the apache.beam and google providers were installed, some features of the BigQuery operators might not work properly. This was caused by the apache-beam client not yet supporting the new Google Python clients when the apache-beam[gcp] extra was used. The apache-beam[gcp] extra is used by the Dataflow operators, and while they might work with the newer version of the Google BigQuery Python client, this is not guaranteed.

This version adds an apache.beam extra to the google provider and, symmetrically, a google extra to the apache.beam provider. Neither provider installs these extras by default, but you can request them when installing the providers. Without them, some functionality of the Dataflow operators might not be available.

Unfortunately, the only complete solution to the problem is for Apache Beam to migrate to the new (>=2.0.0) Google Python clients.

This is the extra for the google provider:

extras_require = {
    # ...
    "apache.beam": ["apache-airflow-providers-apache-beam", "apache-beam[gcp]"],
    # ...
}

And likewise this is the extra for the apache.beam provider:

extras_require = {"google": ["apache-airflow-providers-google", "apache-beam[gcp]"]}

With pip version <= 20.2.4, you can still install the extras and go back to the previous behaviour:

pip install apache-airflow-providers-google[apache.beam]

or

pip install apache-airflow-providers-apache-beam[google]

But be aware that some BigQuery operator functionality might not be available in this case.
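To see which of the packages involved are actually present in an environment, you can query installed distribution metadata. This is a minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper, and the check only reports what is installed, not whether the versions are compatible with each other.

```python
from __future__ import annotations

from importlib import metadata


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the extras shown above.
for name in (
    "apache-beam",
    "apache-airflow-providers-google",
    "apache-airflow-providers-apache-beam",
):
    print(name, installed_version(name) or "not installed")
```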

1.0.1

Bug Fixes

  • Improve Apache Beam operators - refactor operator - common Dataflow logic (#14094)

  • Corrections in docs and tools after releasing provider RCs (#14082)

  • Remove WARNINGs from BeamHook (#14554)

1.0.0

Initial version of the provider.
