airflow.providers.amazon.aws.triggers.glue

Module Contents

Classes

GlueJobCompleteTrigger

Watches a Glue job run and triggers when it finishes.

GlueCatalogPartitionTrigger

Asynchronously waits for a partition to show up in AWS Glue Catalog.

GlueDataQualityRuleSetEvaluationRunCompleteTrigger

Trigger when an AWS Glue data quality evaluation run completes.

GlueDataQualityRuleRecommendationRunCompleteTrigger

Trigger when an AWS Glue data quality recommendation run completes.

class airflow.providers.amazon.aws.triggers.glue.GlueJobCompleteTrigger(job_name, run_id, verbose, aws_conn_id, job_poll_interval)[source]

Bases: airflow.triggers.base.BaseTrigger

Watches a Glue job run and triggers when it finishes.

Parameters
  • job_name (str) – glue job name

  • run_id (str) – the ID of the specific run to watch for that job

  • verbose (bool) – whether to print the Glue job’s logs in the Airflow task logs

  • aws_conn_id (str | None) – The Airflow connection used for AWS credentials.

  • job_poll_interval (int | float) – polling interval (in seconds) used to check the status of the job run

serialize()[source]

Return the information needed to reconstruct this Trigger.

Returns

Tuple of (class path, keyword arguments needed to re-instantiate).

Return type

tuple[str, dict[str, Any]]
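
As an illustration of that shape, the snippet below constructs the trigger and unpacks its serialized form. The exact keys in the keyword-argument dict are an assumption based on the constructor signature above, so treat this as a sketch rather than a guaranteed payload.

    from airflow.providers.amazon.aws.triggers.glue import GlueJobCompleteTrigger

    trigger = GlueJobCompleteTrigger(
        job_name="my-glue-job",
        run_id="jr_0123456789abcdef",
        verbose=False,
        aws_conn_id="aws_default",
        job_poll_interval=10,
    )

    classpath, kwargs = trigger.serialize()
    # classpath -> "airflow.providers.amazon.aws.triggers.glue.GlueJobCompleteTrigger"
    # kwargs    -> the keyword arguments needed to rebuild the trigger, e.g.
    #              {"job_name": "my-glue-job", "run_id": "jr_0123456789abcdef", ...}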

async run()[source]

Run the trigger in an asynchronous context.

The trigger should yield an Event whenever it wants to fire off an event, and return None if it is finished. Single-event triggers should thus yield and then immediately return.

If it yields, it is likely that it will be resumed very quickly, but it may not be (e.g. if the workload is being moved to another triggerer process, or a multi-event trigger was being used for a single-event task defer).

In either case, Trigger classes should assume they will be persisted, and then rely on cleanup() being called when they are no longer needed.
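
In practice this run() loop is consumed by the Airflow triggerer rather than called directly: a deferrable operator hands the trigger over with defer() and resumes in a callback once the trigger fires. The sketch below shows that pattern; the operator class, the callback name, and the event payload keys are illustrative assumptions, not part of this module.

    from airflow.models.baseoperator import BaseOperator
    from airflow.providers.amazon.aws.triggers.glue import GlueJobCompleteTrigger


    class WaitForGlueJobRun(BaseOperator):
        """Hypothetical deferrable operator that waits for an existing Glue job run."""

        def __init__(self, *, job_name: str, run_id: str, **kwargs):
            super().__init__(**kwargs)
            self.job_name = job_name
            self.run_id = run_id

        def execute(self, context):
            # Free the worker slot and let the triggerer watch the job run.
            self.defer(
                trigger=GlueJobCompleteTrigger(
                    job_name=self.job_name,
                    run_id=self.run_id,
                    verbose=False,
                    aws_conn_id="aws_default",
                    job_poll_interval=10,
                ),
                method_name="execute_complete",
            )

        def execute_complete(self, context, event=None):
            # The payload shape ("status", etc.) is assumed here; verify it in your environment.
            if not event or event.get("status") != "success":
                raise RuntimeError(f"Glue job run did not succeed: {event}")
            return self.run_id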

class airflow.providers.amazon.aws.triggers.glue.GlueCatalogPartitionTrigger(database_name, table_name, expression='', waiter_delay=60, aws_conn_id='aws_default', region_name=None, verify=None, botocore_config=None)[source]

Bases: airflow.triggers.base.BaseTrigger

Asynchronously waits for a partition to show up in AWS Glue Catalog.

Parameters
  • database_name (str) – The name of the catalog database where the partitions reside.

  • table_name (str) – The name of the table to wait for; supports dot notation (my_database.my_table)

  • expression (str) – The partition clause to wait for. This is passed as-is to the AWS Glue Catalog API’s get_partitions function, and supports SQL-like notation such as ds='2015-01-01' AND type='value' and comparison operators such as "ds>=2015-01-01". See https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-partitions.html#aws-glue-api-catalog-partitions-GetPartitions

  • aws_conn_id (str | None) – ID of the Airflow connection where credentials and extra configuration are stored

  • region_name (str | None) – Optional AWS region name (example: us-east-1). Uses the region from the connection if not specified.

  • waiter_delay (int) – Number of seconds to wait between two checks. Default is 60 seconds.

serialize()[source]

Return the information needed to reconstruct this Trigger.

Returns

Tuple of (class path, keyword arguments needed to re-instantiate).

Return type

tuple[str, dict[str, Any]]

hook()[source]
async poke(client)[source]
async run()[source]

Run the trigger in an asynchronous context.

The trigger should yield an Event whenever it wants to fire off an event, and return None if it is finished. Single-event triggers should thus yield and then immediately return.

If it yields, it is likely that it will be resumed very quickly, but it may not be (e.g. if the workload is being moved to another triggerer process, or a multi-event trigger was being used for a single-event task defer).

In either case, Trigger classes should assume they will be persisted, and then rely on cleanup() being called when they are no longer needed.
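
The same run() contract can be exercised outside the triggerer for experimentation: run() is an async generator that yields a single TriggerEvent once the partition shows up, then returns. The snippet below is a minimal sketch, assuming AWS credentials are resolvable for the given aws_conn_id and that the event carries a dict payload.

    import asyncio

    from airflow.providers.amazon.aws.triggers.glue import GlueCatalogPartitionTrigger


    async def wait_for_partition():
        trigger = GlueCatalogPartitionTrigger(
            database_name="analytics",
            table_name="events",
            expression="ds='2015-01-01' AND type='value'",
            waiter_delay=30,
            aws_conn_id="aws_default",
            region_name="us-east-1",
        )
        # run() yields one TriggerEvent when the partition exists, then returns.
        async for event in trigger.run():
            return event.payload


    # payload = asyncio.run(wait_for_partition())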

class airflow.providers.amazon.aws.triggers.glue.GlueDataQualityRuleSetEvaluationRunCompleteTrigger(evaluation_run_id, waiter_delay=60, waiter_max_attempts=75, aws_conn_id='aws_default')[source]

Bases: airflow.providers.amazon.aws.triggers.base.AwsBaseWaiterTrigger

Trigger when an AWS Glue data quality evaluation run completes.

Parameters
  • evaluation_run_id (str) – The AWS Glue data quality ruleset evaluation run identifier.

  • waiter_delay (int) – The amount of time in seconds to wait between attempts. (default: 60)

  • waiter_max_attempts (int) – The maximum number of attempts to be made. (default: 75)

  • aws_conn_id (str | None) – The Airflow connection used for AWS credentials.

hook()[source]

Override in subclasses to return the right hook.
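
AwsBaseWaiterTrigger leaves hook() abstract, and this trigger implements it to return the Glue data quality hook. A hypothetical subclass would follow the same shape; the class below is purely illustrative and omits the waiter wiring normally done in __init__.

    from airflow.providers.amazon.aws.hooks.glue import GlueDataQualityHook
    from airflow.providers.amazon.aws.triggers.base import AwsBaseWaiterTrigger


    class MyGlueDataQualityTrigger(AwsBaseWaiterTrigger):
        """Illustrative subclass: only the hook() override is shown."""

        def hook(self):
            # Return the hook the base waiter logic should use.
            return GlueDataQualityHook(aws_conn_id=self.aws_conn_id)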

class airflow.providers.amazon.aws.triggers.glue.GlueDataQualityRuleRecommendationRunCompleteTrigger(recommendation_run_id, waiter_delay=60, waiter_max_attempts=75, aws_conn_id='aws_default')[source]

Bases: airflow.providers.amazon.aws.triggers.base.AwsBaseWaiterTrigger

Trigger when an AWS Glue data quality recommendation run completes.

Parameters
  • recommendation_run_id (str) – The AWS Glue data quality rule recommendation run identifier.

  • waiter_delay (int) – The amount of time in seconds to wait between attempts. (default: 60)

  • waiter_max_attempts (int) – The maximum number of attempts to be made. (default: 75)

  • aws_conn_id (str | None) – The Airflow connection used for AWS credentials.

hook()[source]

Override in subclasses to return the right hook.
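
With the defaults above, the trigger polls every 60 seconds for at most 75 attempts, i.e. a wait budget of roughly 75 minutes. Both knobs can be tuned when constructing the trigger; the run identifier below is a placeholder, and the defer() call shown in the comment would live inside a deferrable operator's execute() method.

    from airflow.providers.amazon.aws.triggers.glue import (
        GlueDataQualityRuleRecommendationRunCompleteTrigger,
    )

    # Poll every 30 seconds, for at most 120 attempts: a wait budget of about 60 minutes.
    trigger = GlueDataQualityRuleRecommendationRunCompleteTrigger(
        recommendation_run_id="my-recommendation-run-id",
        waiter_delay=30,
        waiter_max_attempts=120,
        aws_conn_id="aws_default",
    )

    # Inside a deferrable operator's execute() this would be handed to the triggerer:
    #   self.defer(trigger=trigger, method_name="execute_complete")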
