airflow.providers.microsoft.azure.hooks.wasb
This module contains integration with Azure Blob Storage.
It communicates via the Windows Azure Storage Blob protocol. Make sure that an Airflow connection of type wasb exists. Authorization can be done by supplying a login (= storage account name) and password (= key), or a login and a SAS token in the extra field (see connection wasb_default for an example).
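As an illustration of the two authorization styles described above, the following sketch builds connection definitions as JSON. It assumes an Airflow version that accepts JSON-serialized connections in AIRFLOW_CONN_<CONN_ID> environment variables. The helper name build_wasb_conn and all credential values are placeholders, not part of this module.

```python
import json
import os


def build_wasb_conn(account_name, account_key=None, sas_token=None):
    """Return a JSON definition for a wasb connection (hypothetical helper)."""
    conn = {"conn_type": "wasb", "login": account_name}
    if account_key is not None:
        # Option 1: login (storage account name) plus password (account key).
        conn["password"] = account_key
    if sas_token is not None:
        # Option 2: login plus a SAS token stored in the extra field.
        conn["extra"] = {"sas_token": sas_token}
    return json.dumps(conn)


# Placeholder values; real credentials belong in a secrets backend.
key_based = build_wasb_conn("mystorageaccount", account_key="YOUR_KEY")
sas_based = build_wasb_conn("mystorageaccount", sas_token="YOUR_TOKEN")

# Assumption: the Airflow version in use parses JSON in AIRFLOW_CONN_* variables.
os.environ["AIRFLOW_CONN_WASB_DEFAULT"] = sas_based
```

Either definition could equally be entered through the Airflow UI or CLI; the environment variable is used here only because it keeps the sketch self-contained.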
Module Contents
Classes
WasbHook | Interact with Azure Blob Storage through the wasb:// protocol.
WasbAsyncHook | An async hook that connects to Azure WASB to perform operations.
Attributes
- class airflow.providers.microsoft.azure.hooks.wasb.WasbHook(wasb_conn_id=default_conn_name, public_read=False)
Bases: airflow.hooks.base.BaseHook
Interact with Azure Blob Storage through the wasb:// protocol.
The account_name and account_key parameters must be stored in the Airflow database as part of the connection.
Additional options passed in the 'extra' field of the connection will be passed to the BlockBlobService() constructor. For example, authenticate using a SAS token by adding {"sas_token": "YOUR_TOKEN"}.
If no authentication configuration is provided, DefaultAzureCredential will be used (applicable when using Azure compute infrastructure).
- Parameters
wasb_conn_id (str) – Reference to the wasb connection.
public_read (bool) – Whether an anonymous public read access should be used. Defaults to False.
- classmethod get_connection_form_widgets()
Return connection widgets to add to connection form.
- check_for_blob(container_name, blob_name, **kwargs)
Check if a blob exists on Azure Blob Storage.
- check_for_prefix(container_name, prefix, **kwargs)
Check if a prefix exists on Azure Blob Storage.
- get_blobs_list(container_name, prefix=None, include=None, delimiter='/', **kwargs)
List blobs in a given container.
- Parameters
container_name (str) – The name of the container
prefix (str | None) – Filters the results to return only blobs whose names begin with the specified prefix.
include (list[str] | None) – Specifies one or more additional datasets to include in the response. Options include: snapshots, metadata, uncommittedblobs, copy, deleted.
delimiter (str) – Filters objects based on the delimiter (e.g. '.csv').
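To show how the prefix filter can be combined with client-side filtering, here is a minimal sketch. It assumes get_blobs_list returns blob names as plain strings. The helper list_blobs_with_suffix and the container and prefix names are hypothetical.

```python
def list_blobs_with_suffix(hook, container_name, prefix=None, suffix=".csv"):
    """Return sorted names under `prefix` that end with `suffix` (hypothetical helper)."""
    # Assumption: get_blobs_list returns blob names as plain strings.
    names = hook.get_blobs_list(container_name=container_name, prefix=prefix)
    return sorted(name for name in names if name.endswith(suffix))


def main():
    # Imported lazily so the helper above can be exercised without Airflow installed.
    from airflow.providers.microsoft.azure.hooks.wasb import WasbHook

    hook = WasbHook(wasb_conn_id="wasb_default")
    # "my-container" and "raw/" are placeholder names.
    for name in list_blobs_with_suffix(hook, "my-container", prefix="raw/"):
        print(name)
```

The helper takes the hook as an argument, so it can be tested with a stand-in object that only implements get_blobs_list.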
- get_blobs_list_recursive(container_name, prefix=None, include=None, endswith='', **kwargs)
List blobs in a given container.
- Parameters
container_name (str) – The name of the container
prefix (str | None) – Filters the results to return only blobs whose names begin with the specified prefix.
include (list[str] | None) – Specifies one or more additional datasets to include in the response. Options include: snapshots, metadata, uncommittedblobs, copy, deleted.
endswith (str) – Filters objects whose names end with the given suffix (e.g. '.csv').
- load_file(file_path, container_name, blob_name, create_container=False, **kwargs)
Upload a file to Azure Blob Storage.
- Parameters
file_path (str) – Path to the file to load.
container_name (str) – Name of the container.
blob_name (str) – Name of the blob.
create_container (bool) – Attempt to create the target container prior to uploading the blob. This is useful if the target container may not exist yet. Defaults to False.
kwargs – Optional keyword arguments that BlobClient.upload_blob() takes.
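A common pattern is to upload a file only when the target blob is not there yet. The sketch below combines check_for_blob and load_file for that purpose. The helper name upload_if_missing is hypothetical, and the check-then-upload sequence is not atomic, so it is only a convenience, not a lock.

```python
def upload_if_missing(hook, file_path, container_name, blob_name):
    """Upload a local file unless the blob already exists; return True if uploaded."""
    if hook.check_for_blob(container_name, blob_name):
        return False
    # create_container=True asks the hook to try creating the container first.
    hook.load_file(file_path, container_name, blob_name, create_container=True)
    return True
```

With a real hook this might be called as upload_if_missing(WasbHook(wasb_conn_id="wasb_default"), "/tmp/report.csv", "my-container", "reports/report.csv"), where the path and names are placeholders.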
- load_string(string_data, container_name, blob_name, create_container=False, **kwargs)
Upload a string to Azure Blob Storage.
- Parameters
string_data (str) – String to load.
container_name (str) – Name of the container.
blob_name (str) – Name of the blob.
create_container (bool) – Attempt to create the target container prior to uploading the blob. This is useful if the target container may not exist yet. Defaults to False.
kwargs – Optional keyword arguments that BlobClient.upload_blob() takes.
- get_file(file_path, container_name, blob_name, **kwargs)
Download a file from Azure Blob Storage.
- read_file(container_name, blob_name, **kwargs)
Read a file from Azure Blob Storage and return as a string.
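Since load_string uploads a string and read_file returns one, the two can be paired to store small structured payloads. The following sketch round-trips a JSON document. The helper names write_json_blob and read_json_blob are hypothetical.

```python
import json


def write_json_blob(hook, container_name, blob_name, payload):
    """Serialize `payload` to JSON and upload it as a blob (hypothetical helper)."""
    hook.load_string(json.dumps(payload), container_name, blob_name, create_container=True)


def read_json_blob(hook, container_name, blob_name):
    """Download a blob and parse it as JSON (hypothetical helper)."""
    # read_file is documented above as returning the blob content as a string.
    return json.loads(hook.read_file(container_name, blob_name))
```

This suits small blobs only, because the whole content is held in memory as a string.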
- upload(container_name, blob_name, data, blob_type='BlockBlob', length=None, create_container=False, **kwargs)
Create a new blob from a data source with automatic chunking.
- Parameters
container_name (str) – The name of the container to upload data to.
blob_name (str) – The name of the blob to upload. The blob need not already exist in the container.
data (Any) – The blob data to upload.
blob_type (str) – The type of the blob. This can be either BlockBlob, PageBlob or AppendBlob. The default value is BlockBlob.
length (int | None) – Number of bytes to read from the stream. This is optional, but should be supplied for optimal performance.
create_container (bool) – Attempt to create the target container prior to uploading the blob. This is useful if the target container may not exist yet. Defaults to False.
- download(container_name, blob_name, offset=None, length=None, **kwargs)
Download a blob to the StorageStreamDownloader.
- Parameters
- create_container(container_name)
Create a container if it does not already exist.
- Parameters
container_name (str) – The name of the container to create
- delete_container(container_name)
Delete a container object.
- Parameters
container_name (str) – The name of the container
- delete_blobs(container_name, *blobs, **kwargs)
Mark the specified blobs or snapshots for deletion.
- Parameters
container_name (str) – The name of the container containing the blobs
blobs – The blobs to delete. This can be a single blob, or multiple values can be supplied, where each value is either the name of the blob (str) or BlobProperties.
- copy_blobs(source_container_name, source_blob_name, destination_container_name, destination_blob_name)
Copy the specified blobs from one blob prefix to another.
- Parameters
source_container_name (str) – The name of the source container containing the blobs.
source_blob_name (str) – The full source blob path without the container name.
destination_container_name (str) – The name of the destination container where the blobs will be copied to.
destination_blob_name (str) – The full destination blob path without the container name.
- delete_file(container_name, blob_name, is_prefix=False, ignore_if_missing=False, delimiter='', **kwargs)
Delete a file, or all blobs matching a prefix, from Azure Blob Storage.
- Parameters
container_name (str) – Name of the container.
blob_name (str) – Name of the blob.
is_prefix (bool) – If blob_name is a prefix, delete all matching files.
ignore_if_missing (bool) – If True, return success even if the blob does not exist.
kwargs – Optional keyword arguments that ContainerClient.delete_blobs() takes.
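The is_prefix and ignore_if_missing flags are typically used together for cleanup tasks. A minimal sketch follows. The helper purge_prefix is hypothetical, and its guard against an empty prefix is an added safety assumption, not behavior of the hook itself.

```python
def purge_prefix(hook, container_name, prefix):
    """Delete every blob whose name starts with `prefix` (hypothetical helper)."""
    if not prefix:
        # Safety assumption: an empty prefix could match the whole container.
        raise ValueError("refusing to delete with an empty prefix")
    hook.delete_file(
        container_name=container_name,
        blob_name=prefix,
        is_prefix=True,          # treat blob_name as a prefix
        ignore_if_missing=True,  # do not fail when nothing matches
    )
```

For example, purge_prefix(hook, "my-container", "tmp/run-1/") would remove the temporary outputs of one run, with the container and prefix being placeholder names.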
- class airflow.providers.microsoft.azure.hooks.wasb.WasbAsyncHook(wasb_conn_id='wasb_default', public_read=False)
Bases: WasbHook
An async hook that connects to Azure WASB to perform operations.
- Parameters
wasb_conn_id (str) – Reference to the wasb connection.
public_read (bool) – Whether an anonymous public read access should be used. Defaults to False.
- async check_for_blob_async(container_name, blob_name, **kwargs)
Check if a blob exists on Azure Blob Storage.
- async get_blobs_list_async(container_name, prefix=None, include=None, delimiter='/', **kwargs)
List blobs in a given container.
- Parameters
container_name (str) – The name of the container.
prefix (str | None) – Filters the results to return only blobs whose names begin with the specified prefix.
include (list[str] | None) – Specifies one or more additional datasets to include in the response. Options include: snapshots, metadata, uncommittedblobs, copy, deleted.
delimiter (str) – Filters objects based on the delimiter (e.g. '.csv').
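The async methods of WasbAsyncHook are meant to be awaited from a coroutine. As a rough sketch, the helper below polls check_for_blob_async until a blob appears. The name wait_for_blob and its poke_interval and max_checks parameters are hypothetical, and connection setup and cleanup for a real hook are left out.

```python
import asyncio


async def wait_for_blob(hook, container_name, blob_name, poke_interval=5.0, max_checks=12):
    """Poll until the blob exists; return True if found within max_checks attempts."""
    for attempt in range(max_checks):
        if await hook.check_for_blob_async(container_name, blob_name):
            return True
        if attempt < max_checks - 1:
            # Yield to the event loop between checks instead of blocking.
            await asyncio.sleep(poke_interval)
    return False
```

Bounding the number of checks keeps the coroutine from waiting forever when the blob never arrives.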