airflow.providers.amazon.aws.operators.glue_crawler¶

Classes¶

GlueCrawlerOperator

Creates, updates and triggers an AWS Glue Crawler.

Module Contents¶

class airflow.providers.amazon.aws.operators.glue_crawler.GlueCrawlerOperator(config, poll_interval=5, wait_for_completion=True, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)[source]¶

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.glue_crawler.GlueCrawlerHook]

Creates, updates and triggers an AWS Glue Crawler.

AWS Glue Crawler is a serverless service that manages a catalog of metadata tables that contain the inferred schema, format and data types of data stores within the AWS cloud.

See also

For more information on how to use this operator, take a look at the guide: Create an AWS Glue crawler

Parameters:

config – Configurations for the AWS Glue crawler
poll_interval (int) – Time (in seconds) to wait between two consecutive calls to check crawler status
wait_for_completion (bool) – Whether to wait for crawl execution completion. (default: True)
deferrable (bool) – If True, the operator will wait asynchronously for the crawl to complete. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False)
aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty then the default boto3 behaviour is used. If running Airflow in a distributed manner and aws_conn_id is None or empty, then default boto3 configuration would be used (and must be maintained on each worker node).
region_name – AWS region_name. If not specified then the default boto3 behaviour is used.
verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html

aws_hook_class[source]¶

template_fields: collections.abc.Sequence[str][source]¶

ui_color = '#ededed'[source]¶

poll_interval = 5[source]¶

wait_for_completion = True[source]¶

deferrable = True[source]¶

config[source]¶

execute(context)[source]¶

Execute AWS Glue Crawler from Airflow.

Returns:: the name of the current glue crawler.
Return type:: str

execute_complete(context, event=None)[source]¶