airflow.providers.amazon.aws.hooks.datasync

Interact with AWS DataSync, using the AWS boto3 library.

Module Contents

Classes

DataSyncHook

Interact with AWS DataSync.

class airflow.providers.amazon.aws.hooks.datasync.DataSyncHook(wait_interval_seconds=30, *args, **kwargs)[source]

Bases: airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook

Interact with AWS DataSync.

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.

Parameters

wait_interval_seconds (int) – Time to wait between two consecutive calls to check TaskExecution status. Defaults to 30 seconds.

Raises

ValueError – If wait_interval_seconds is not between 0 and 15*60 seconds.

TASK_EXECUTION_INTERMEDIATE_STATES = ['INITIALIZING', 'QUEUED', 'LAUNCHING', 'PREPARING', 'TRANSFERRING', 'VERIFYING'][source]
TASK_EXECUTION_FAILURE_STATES = ['ERROR'][source]
TASK_EXECUTION_SUCCESS_STATES = ['SUCCESS'][source]
create_location(location_uri, **create_location_kwargs)[source]

Creates a new location.

Parameters
  • location_uri (str) – Location URI used to determine the location type (S3, SMB, NFS, EFS).

  • create_location_kwargs – Passed to boto.create_location_xyz(). See AWS boto3 datasync documentation.

Return str

LocationArn of the created Location.

Raises

AirflowException – If location type (prefix from location_uri) is invalid.

Return type

str

get_location_arns(location_uri, case_sensitive=False, ignore_trailing_slash=True)[source]

Return all LocationArns which match a LocationUri.

Parameters
  • location_uri (str) – Location URI to search for, eg s3://mybucket/mypath

  • case_sensitive (bool) – Do a case sensitive search for location URI.

  • ignore_trailing_slash (bool) – Ignore / at the end of URI when matching.

Returns

List of LocationArns.

Raises

AirflowBadRequest – if location_uri is empty

Return type

list[str]

create_task(source_location_arn, destination_location_arn, **create_task_kwargs)[source]

Create a Task between the specified source and destination LocationArns.

Parameters
  • source_location_arn (str) – Source LocationArn. Must exist already.

  • destination_location_arn (str) – Destination LocationArn. Must exist already.

  • create_task_kwargs – Passed to boto.create_task(). See AWS boto3 datasync documentation.

Returns

TaskArn of the created Task

Return type

str

update_task(task_arn, **update_task_kwargs)[source]

Update a Task.

Parameters
  • task_arn (str) – The TaskArn to update.

  • update_task_kwargs – Passed to boto.update_task(), See AWS boto3 datasync documentation.

delete_task(task_arn)[source]

Delete a Task.

Parameters

task_arn (str) – The TaskArn to delete.

get_task_arns_for_location_arns(source_location_arns, destination_location_arns)[source]

Return list of TaskArns for which use any one of the specified source LocationArns and any one of the specified destination LocationArns.

Parameters
  • source_location_arns (list) – List of source LocationArns.

  • destination_location_arns (list) – List of destination LocationArns.

Returns

list

Raises

AirflowBadRequest – if source_location_arns or destination_location_arns are empty.

Return type

list

start_task_execution(task_arn, **kwargs)[source]

Start a TaskExecution for the specified task_arn. Each task can have at most one TaskExecution.

Parameters
  • task_arn (str) – TaskArn

  • kwargs – kwargs sent to boto3.start_task_execution()

Returns

TaskExecutionArn

Raises
  • ClientError – If a TaskExecution is already busy running for this task_arn.

  • AirflowBadRequest – If task_arn is empty.

Return type

str

cancel_task_execution(task_execution_arn)[source]

Cancel a TaskExecution for the specified task_execution_arn.

Parameters

task_execution_arn (str) – TaskExecutionArn.

Raises

AirflowBadRequest – If task_execution_arn is empty.

get_task_description(task_arn)[source]

Get description for the specified task_arn.

Parameters

task_arn (str) – TaskArn

Returns

AWS metadata about a task.

Raises

AirflowBadRequest – If task_arn is empty.

Return type

dict

describe_task_execution(task_execution_arn)[source]

Get description for the specified task_execution_arn.

Parameters

task_execution_arn (str) – TaskExecutionArn

Returns

AWS metadata about a task execution.

Raises

AirflowBadRequest – If task_execution_arn is empty.

Return type

dict

get_current_task_execution_arn(task_arn)[source]

Get current TaskExecutionArn (if one exists) for the specified task_arn.

Parameters

task_arn (str) – TaskArn

Returns

CurrentTaskExecutionArn for this task_arn or None.

Raises

AirflowBadRequest – if task_arn is empty.

Return type

str | None

wait_for_task_execution(task_execution_arn, max_iterations=60)[source]

Wait for Task Execution status to be complete (SUCCESS/ERROR). The task_execution_arn must exist, or a boto3 ClientError will be raised.

Parameters
  • task_execution_arn (str) – TaskExecutionArn

  • max_iterations (int) – Maximum number of iterations before timing out.

Returns

Result of task execution.

Raises
  • AirflowTaskTimeout – If maximum iterations is exceeded.

  • AirflowBadRequest – If task_execution_arn is empty.

Return type

bool

Was this entry helpful?