airflow.providers.amazon.aws.operators.glue_databrew

Module Contents

Classes

GlueDataBrewStartJobOperator

Start an AWS Glue DataBrew job.

class airflow.providers.amazon.aws.operators.glue_databrew.GlueDataBrewStartJobOperator(job_name, wait_for_completion=True, delay=None, waiter_delay=30, waiter_max_attempts=60, deferrable=conf.getboolean('operators', 'default_deferrable', fallback=False), **kwargs)

Bases: airflow.providers.amazon.aws.operators.base_aws.AwsBaseOperator[airflow.providers.amazon.aws.hooks.glue_databrew.GlueDataBrewHook]

Start an AWS Glue DataBrew job.

AWS Glue DataBrew is a visual data preparation tool that makes it easier for data analysts and data scientists to clean and normalize data to prepare it for analytics and machine learning (ML).

See also

For more information on how to use this operator, see the guide: Start an AWS Glue DataBrew job.

Parameters
  • job_name (str) – Name of the DataBrew job to start. Job names are unique per AWS account.

  • wait_for_completion (bool) – Whether to wait for job run completion. (default: True)

  • deferrable (bool) – If True, the operator will wait asynchronously for the job to complete. This implies waiting for completion. This mode requires aiobotocore module to be installed. (default: False)

  • waiter_delay (int) – Time in seconds to wait between status checks. (default: 30)

  • waiter_max_attempts (int) – Maximum number of attempts to check for job completion. (default: 60)

  • aws_conn_id – The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. When Airflow runs in a distributed manner and aws_conn_id is None or empty, the default boto3 configuration is used and must be maintained on each worker node.

  • region_name – AWS region name. If not specified, the default boto3 behaviour is used.

  • verify – Whether or not to verify SSL certificates. See: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html

  • botocore_config – Configuration dictionary (key-values) for botocore client. See: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
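The parameters above can be put together as in the following minimal sketch. The task_id, job name and region are hypothetical placeholders, and the helper names (START_JOB_KWARGS, max_wait_seconds, build_start_job_task) are introduced here only for illustration. The operator import is deferred into a function so that the snippet loads even where the Amazon provider is not installed. The polling budget is only an approximation: it assumes the waiter sleeps waiter_delay seconds between each of waiter_max_attempts checks.

```python
# Keyword arguments for GlueDataBrewStartJobOperator.
# task_id, job_name and region_name are hypothetical placeholders;
# the remaining values are the defaults documented above.
START_JOB_KWARGS = {
    "task_id": "start_databrew_job",
    "job_name": "example-databrew-job",
    "wait_for_completion": True,
    "deferrable": False,
    "waiter_delay": 30,
    "waiter_max_attempts": 60,
    "aws_conn_id": "aws_default",
    "region_name": "us-east-1",
}


def max_wait_seconds(waiter_delay: int, waiter_max_attempts: int) -> int:
    """Approximate polling budget before the waiter stops checking."""
    return waiter_delay * waiter_max_attempts


def build_start_job_task():
    """Create the operator (requires the Airflow Amazon provider)."""
    # Imported lazily so this module can be loaded without Airflow.
    from airflow.providers.amazon.aws.operators.glue_databrew import (
        GlueDataBrewStartJobOperator,
    )

    return GlueDataBrewStartJobOperator(**START_JOB_KWARGS)


budget = max_wait_seconds(
    START_JOB_KWARGS["waiter_delay"], START_JOB_KWARGS["waiter_max_attempts"]
)
print(budget)  # 1800 seconds, i.e. about 30 minutes with the defaults
```

For jobs that may run longer than this budget, raising waiter_max_attempts or waiter_delay is the obvious adjustment.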

Returns

Dictionary with the key run_id, whose value is the run ID of the started job.
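A downstream step can read the run ID from the returned dictionary, as in the sketch below. The helper extract_run_id and the sample run ID are hypothetical. In a DAG, the dictionary would typically be obtained from XCom (for example via ti.xcom_pull with the task ID of the start task), assuming the default behaviour of pushing the operator's return value is left enabled.

```python
from typing import Any, Mapping


def extract_run_id(result: Mapping[str, Any]) -> str:
    """Return the run ID from the operator's result dictionary."""
    if "run_id" not in result:
        raise KeyError("expected a 'run_id' key in the operator result")
    return str(result["run_id"])


# Hypothetical value standing in for what the operator returns.
example_result = {"run_id": "db_0123456789abcdef"}
print(extract_run_id(example_result))  # db_0123456789abcdef
```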

aws_hook_class
template_fields: Sequence[str]
execute(context)

This is the main method to override when creating an operator.

The context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more details.

execute_complete(context, event=None)
