airflow.providers.microsoft.azure.hooks.wasb

This module contains integration with Azure Blob Storage.

It communicates via the Windows Azure Storage Blob (WASB) protocol. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (the storage account name) and a password (the account key), or a login and a SAS token in the extra field (see the connection wasb_default for an example).

Module Contents

Classes

WasbHook

Interact with Azure Blob Storage through the wasb:// protocol.

WasbAsyncHook

An async hook that connects to Azure WASB to perform operations.

Attributes

AsyncCredentials

airflow.providers.microsoft.azure.hooks.wasb.AsyncCredentials[source]
class airflow.providers.microsoft.azure.hooks.wasb.WasbHook(wasb_conn_id=default_conn_name, public_read=False)[source]

Bases: airflow.hooks.base.BaseHook

Interact with Azure Blob Storage through the wasb:// protocol.

The following parameters must be stored in the Airflow database: account_name and account_key.

Additional options passed in the 'extra' field of the connection will be passed to the BlockBlobService() constructor. For example, authenticate using a SAS token by adding {"sas_token": "YOUR_TOKEN"}.

If no authentication configuration is provided, DefaultAzureCredential will be used (applicable when using Azure compute infrastructure).

Parameters
  • wasb_conn_id (str) – Reference to the wasb connection.

  • public_read (bool) – Whether anonymous public read access should be used. Defaults to False.
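The two credential layouts described above can be sketched as JSON connection definitions. The helper name build_wasb_connection and the placeholder values are hypothetical, and the JSON form assumes an Airflow version that accepts JSON-serialized connections (2.3 or later).

```python
import json


def build_wasb_connection(account_name, account_key=None, sas_token=None):
    """Return a JSON string describing a wasb connection (hypothetical helper)."""
    conn = {"conn_type": "wasb", "login": account_name}
    if account_key is not None:
        # Shared-key auth: the storage account key goes in the password field.
        conn["password"] = account_key
    if sas_token is not None:
        # SAS auth: the token goes in the connection's extra field.
        conn["extra"] = {"sas_token": sas_token}
    return json.dumps(conn)


key_conn = build_wasb_connection("mystorageaccount", account_key="YOUR_KEY")
sas_conn = build_wasb_connection("mystorageaccount", sas_token="YOUR_TOKEN")
```

Either string could be exported as the AIRFLOW_CONN_WASB_DEFAULT environment variable so that a hook created with the default connection ID picks it up.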

conn_name_attr = 'wasb_conn_id'[source]
default_conn_name = 'wasb_default'[source]
conn_type = 'wasb'[source]
hook_name = 'Azure Blob Storage'[source]
classmethod get_connection_form_widgets()[source]

Return connection widgets to add to connection form.

classmethod get_ui_field_behaviour()[source]

Return custom field behaviour.

blob_service_client()[source]

Return the BlobServiceClient object (cached).

get_conn()[source]

Return the BlobServiceClient object.

check_for_blob(container_name, blob_name, **kwargs)[source]

Check if a blob exists on Azure Blob Storage.

Parameters
  • container_name (str) – Name of the container.

  • blob_name (str) – Name of the blob.

  • kwargs – Optional keyword arguments that BlobClient.get_blob_properties takes.

Returns

True if the blob exists, False otherwise.

Return type

bool
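A minimal sketch of using the boolean result to validate a batch of inputs. The helper find_missing_blobs is illustrative and not part of the provider; hook is assumed to be a WasbHook or any object with a compatible check_for_blob method.

```python
def find_missing_blobs(hook, container_name, blob_names):
    """Return the blob names that do not exist in the container."""
    return [
        name
        for name in blob_names
        # check_for_blob returns True if the blob exists, False otherwise.
        if not hook.check_for_blob(container_name=container_name, blob_name=name)
    ]
```

With a configured connection, pass WasbHook(wasb_conn_id="wasb_default") as hook.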

check_for_prefix(container_name, prefix, **kwargs)[source]

Check if a prefix exists on Azure Blob Storage.

Parameters
  • container_name (str) – Name of the container.

  • prefix (str) – Prefix of the blob.

  • kwargs – Optional keyword arguments that ContainerClient.walk_blobs takes

Returns

True if blobs matching the prefix exist, False otherwise.

Return type

bool
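One way to use the prefix check is to pick the first of several candidate prefixes that holds data. The helper first_prefix_with_data is a hypothetical illustration; hook is assumed to be a WasbHook or a compatible object.

```python
def first_prefix_with_data(hook, container_name, prefixes):
    """Return the first prefix that matches at least one blob, or None."""
    for prefix in prefixes:
        # check_for_prefix returns True if any blob name starts with the prefix.
        if hook.check_for_prefix(container_name=container_name, prefix=prefix):
            return prefix
    return None
```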

get_blobs_list(container_name, prefix=None, include=None, delimiter='/', **kwargs)[source]

List blobs in a given container.

Parameters
  • container_name (str) – The name of the container

  • prefix (str | None) – Filters the results to return only blobs whose names begin with the specified prefix.

  • include (list[str] | None) – Specifies one or more additional datasets to include in the response. Options include: snapshots, metadata, uncommittedblobs, copy, deleted.

  • delimiter (str) – Filters objects based on the delimiter (e.g. '.csv').
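The following sketch lists a single level of a container and separates blob names from virtual-directory prefixes. It assumes that, with delimiter='/', virtual directories are returned as names ending in the delimiter; verify this against your provider version. The helper names are hypothetical.

```python
def split_listing(names, delimiter="/"):
    """Separate virtual-directory prefixes from blob names in one listing level."""
    dirs = [n for n in names if n.endswith(delimiter)]
    files = [n for n in names if not n.endswith(delimiter)]
    return dirs, files


def list_level(hook, container_name, prefix=None):
    """List one hierarchy level under `prefix` as (directories, files)."""
    names = hook.get_blobs_list(
        container_name=container_name, prefix=prefix, delimiter="/"
    )
    return split_listing(names)
```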

get_blobs_list_recursive(container_name, prefix=None, include=None, endswith='', **kwargs)[source]

List blobs in a given container.

Parameters
  • container_name (str) – The name of the container

  • prefix (str | None) – Filters the results to return only blobs whose names begin with the specified prefix.

  • include (list[str] | None) – Specifies one or more additional datasets to include in the response. Options include: snapshots, metadata, uncommittedblobs, copy, deleted.

  • endswith (str) – Filters the results to return only blobs whose names end with the specified suffix (e.g. '.csv').
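A short sketch of a recursive listing restricted by file extension, using the endswith argument from the signature above. The helper list_csv_blobs is hypothetical; hook is assumed to be a WasbHook or a compatible object.

```python
def list_csv_blobs(hook, container_name, prefix=None):
    """Return the sorted names of all blobs under `prefix` ending in '.csv'."""
    names = hook.get_blobs_list_recursive(
        container_name=container_name, prefix=prefix, endswith=".csv"
    )
    return sorted(names)
```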

load_file(file_path, container_name, blob_name, create_container=False, **kwargs)[source]

Upload a file to Azure Blob Storage.

Parameters
  • file_path (str) – Path to the file to load.

  • container_name (str) – Name of the container.

  • blob_name (str) – Name of the blob.

  • create_container (bool) – Attempt to create the target container prior to uploading the blob. This is useful if the target container may not exist yet. Defaults to False.

  • kwargs – Optional keyword arguments that BlobClient.upload_blob() takes.
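As an illustration, the sketch below uploads every file in a local directory tree, reusing relative paths as blob names. The helper upload_directory and its blob_prefix argument are hypothetical; overwrite=True is an upload_blob() option forwarded through kwargs.

```python
from pathlib import Path


def upload_directory(hook, local_dir, container_name, blob_prefix=""):
    """Upload every file under `local_dir`, keeping relative paths as blob names."""
    root = Path(local_dir)
    uploaded = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        blob_name = blob_prefix + path.relative_to(root).as_posix()
        hook.load_file(
            file_path=str(path),
            container_name=container_name,
            blob_name=blob_name,
            create_container=True,  # create the container if it is missing
            overwrite=True,  # forwarded through **kwargs to upload_blob()
        )
        uploaded.append(blob_name)
    return uploaded
```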

load_string(string_data, container_name, blob_name, create_container=False, **kwargs)[source]

Upload a string to Azure Blob Storage.

Parameters
  • string_data (str) – String to load.

  • container_name (str) – Name of the container.

  • blob_name (str) – Name of the blob.

  • create_container (bool) – Attempt to create the target container prior to uploading the blob. This is useful if the target container may not exist yet. Defaults to False.

  • kwargs – Optional keyword arguments that BlobClient.upload_blob() takes.
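A minimal sketch of writing a small JSON document as a text blob. The helper save_json is hypothetical; hook is assumed to be a WasbHook or a compatible object.

```python
import json


def save_json(hook, container_name, blob_name, payload):
    """Serialize `payload` as JSON and upload it as a text blob."""
    text = json.dumps(payload, sort_keys=True)
    hook.load_string(
        string_data=text,
        container_name=container_name,
        blob_name=blob_name,
        create_container=True,  # create the container if it is missing
    )
    return text
```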

get_file(file_path, container_name, blob_name, **kwargs)[source]

Download a file from Azure Blob Storage.

Parameters
  • file_path (str) – Path to the file to download.

  • container_name (str) – Name of the container.

  • blob_name (str) – Name of the blob.

  • kwargs – Optional keyword arguments that BlobClient.download_blob() takes.

read_file(container_name, blob_name, **kwargs)[source]

Read a file from Azure Blob Storage and return as a string.

Parameters
  • container_name (str) – Name of the container.

  • blob_name (str) – Name of the blob.

  • kwargs – Optional keyword arguments that BlobClient.download_blob takes.
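Because read_file returns the blob content as a string, it pairs naturally with a text parser. The helper read_json_blob below is a hypothetical sketch that assumes the blob holds valid JSON.

```python
import json


def read_json_blob(hook, container_name, blob_name):
    """Read a blob as text and parse it as JSON."""
    text = hook.read_file(container_name=container_name, blob_name=blob_name)
    return json.loads(text)
```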

upload(container_name, blob_name, data, blob_type='BlockBlob', length=None, create_container=False, **kwargs)[source]

Create a new blob from a data source with automatic chunking.

Parameters
  • container_name (str) – The name of the container to upload data to

  • blob_name (str) – The name of the blob to upload. The blob need not already exist in the container

  • data (Any) – The blob data to upload

  • blob_type (str) – The type of the blob. This can be either BlockBlob, PageBlob or AppendBlob. The default value is BlockBlob.

  • length (int | None) – Number of bytes to read from the stream. This is optional, but should be supplied for optimal performance.

  • create_container (bool) – Attempt to create the target container prior to uploading the blob. This is useful if the target container may not exist yet. Defaults to False.
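The sketch below uploads an in-memory payload and passes its length explicitly, as the length parameter recommends. The helper upload_bytes is hypothetical; hook is assumed to be a WasbHook or a compatible object.

```python
import io


def upload_bytes(hook, container_name, blob_name, payload):
    """Upload an in-memory bytes payload as a block blob; return its size."""
    stream = io.BytesIO(payload)
    hook.upload(
        container_name=container_name,
        blob_name=blob_name,
        data=stream,
        blob_type="BlockBlob",
        length=len(payload),  # known up front, so supply it
        create_container=True,
    )
    return len(payload)
```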

download(container_name, blob_name, offset=None, length=None, **kwargs)[source]

Download a blob as a StorageStreamDownloader.

Parameters
  • container_name (str) – The name of the container containing the blob

  • blob_name (str) – The name of the blob to download

  • offset (int | None) – Start of byte range to use for downloading a section of the blob. Must be set if length is provided.

  • length (int | None) – Number of bytes to read from the stream.
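A minimal sketch of a ranged read, for example to inspect a file header without fetching the whole blob. The helper read_blob_range is hypothetical; it relies on the readall() method of the returned StorageStreamDownloader.

```python
def read_blob_range(hook, container_name, blob_name, offset, length):
    """Return `length` bytes of a blob starting at byte `offset`."""
    downloader = hook.download(
        container_name=container_name,
        blob_name=blob_name,
        offset=offset,  # must be set whenever length is provided
        length=length,
    )
    # readall() returns the downloaded content of the requested range.
    return downloader.readall()
```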

create_container(container_name)[source]

Create a container object if it does not already exist.

Parameters

container_name (str) – The name of the container to create

delete_container(container_name)[source]

Delete a container object.

Parameters

container_name (str) – The name of the container

delete_blobs(container_name, *blobs, **kwargs)[source]

Mark the specified blobs or snapshots for deletion.

Parameters
  • container_name (str) – The name of the container containing the blobs

  • blobs – The blobs to delete. This can be a single blob, or multiple values can be supplied, where each value is either the name of the blob (str) or BlobProperties.

delete_file(container_name, blob_name, is_prefix=False, ignore_if_missing=False, delimiter='', **kwargs)[source]

Delete a file, or all blobs matching a prefix, from Azure Blob Storage.

Parameters
  • container_name (str) – Name of the container.

  • blob_name (str) – Name of the blob.

  • is_prefix (bool) – If blob_name is a prefix, delete all matching files.

  • ignore_if_missing (bool) – If True, return success even if the blob does not exist.

  • kwargs – Optional keyword arguments that ContainerClient.delete_blobs() takes.
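As an illustration of prefix deletion, the sketch below removes everything under a prefix and tolerates the case where nothing matches. The helper purge_prefix is hypothetical; hook is assumed to be a WasbHook or a compatible object.

```python
def purge_prefix(hook, container_name, prefix):
    """Delete every blob whose name starts with `prefix`."""
    hook.delete_file(
        container_name=container_name,
        blob_name=prefix,
        is_prefix=True,  # treat blob_name as a prefix
        ignore_if_missing=True,  # succeed even if no blob matches
    )
```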

test_connection()[source]

Test Azure Blob Storage connection.

class airflow.providers.microsoft.azure.hooks.wasb.WasbAsyncHook(wasb_conn_id='wasb_default', public_read=False)[source]

Bases: WasbHook

An async hook that connects to Azure WASB to perform operations.

Parameters
  • wasb_conn_id (str) – Reference to the wasb connection.

  • public_read (bool) – Whether anonymous public read access should be used. Defaults to False.

async get_async_conn()[source]

Return the Async BlobServiceClient object.

async check_for_blob_async(container_name, blob_name, **kwargs)[source]

Check if a blob exists on Azure Blob Storage.

Parameters
  • container_name (str) – name of the container

  • blob_name (str) – name of the blob

  • kwargs (Any) – optional keyword arguments for BlobClient.get_blob_properties
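The asynchronous check can be polled from a coroutine, for example inside a trigger. The sketch below is a hypothetical helper, not the provider's own trigger logic; poke_interval and timeout are illustrative parameter names.

```python
import asyncio
import time


async def wait_for_blob(hook, container_name, blob_name, poke_interval=5.0, timeout=60.0):
    """Poll until the blob exists; return False if `timeout` seconds elapse first."""
    deadline = time.monotonic() + timeout
    while True:
        exists = await hook.check_for_blob_async(
            container_name=container_name, blob_name=blob_name
        )
        if exists:
            return True
        if time.monotonic() >= deadline:
            return False
        # Yield to the event loop between checks.
        await asyncio.sleep(poke_interval)
```

With a configured connection, pass a WasbAsyncHook instance as hook and await the coroutine from an event loop.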

async get_blobs_list_async(container_name, prefix=None, include=None, delimiter='/', **kwargs)[source]

List blobs in a given container.

Parameters
  • container_name (str) – the name of the container

  • prefix (str | None) – filters the results to return only blobs whose names begin with the specified prefix.

  • include (list[str] | None) – specifies one or more additional datasets to include in the response. Options include: snapshots, metadata, uncommittedblobs, copy, deleted.

  • delimiter (str) – filters objects based on the delimiter (e.g. '.csv')

async check_for_prefix_async(container_name, prefix, **kwargs)[source]

Check if a prefix exists on Azure Blob Storage.

Parameters
  • container_name (str) – Name of the container.

  • prefix (str) – Prefix of the blob.

  • kwargs (Any) – Optional keyword arguments for ContainerClient.walk_blobs
