airflow.providers.databricks.operators.databricks_repos

This module contains Databricks operators.

Module Contents

Classes

DatabricksReposCreateOperator

Creates, and optionally checks out, a Databricks Repo using the POST api/2.0/repos API endpoint.

DatabricksReposUpdateOperator

Updates the specified repository to a given branch or tag using the PATCH api/2.0/repos API endpoint.

DatabricksReposDeleteOperator

Deletes the specified repository using the DELETE api/2.0/repos API endpoint.

class airflow.providers.databricks.operators.databricks_repos.DatabricksReposCreateOperator(*, git_url, git_provider=None, branch=None, tag=None, repo_path=None, ignore_existing_repo=False, databricks_conn_id='databricks_default', databricks_retry_limit=3, databricks_retry_delay=1, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates, and optionally checks out, a Databricks Repo using the POST api/2.0/repos API endpoint.

Parameters
  • git_url (str) – Required HTTPS URL of a Git repository

  • git_provider (str | None) – Optional name of the Git provider. Must be provided if it cannot be guessed from the URL.

  • repo_path (str | None) – Optional path for the repository. Must be in the format /Repos/{folder}/{repo-name}. If not specified, the repository is created in the user’s directory.

  • branch (str | None) – Optional name of the branch to check out.

  • tag (str | None) – Optional name of the tag to check out.

  • ignore_existing_repo (bool) – Do not raise an exception if a repository with the given path already exists.

  • databricks_conn_id (str) – Reference to the Databricks connection. By default and in the common case this will be databricks_default. To use token-based authentication, provide the key token in the connection’s extra field, create the key host there as well, and leave the connection’s host field empty. (templated)

  • databricks_retry_limit (int) – Number of times to retry if the Databricks backend is unreachable. The value must be greater than or equal to 1.

  • databricks_retry_delay (int) – Number of seconds to wait between retries (may be a floating-point number).
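
The repo_path format and the provider guess described above can be illustrated with a small standalone sketch. It does not call the operator. The regular expression, the host-to-provider table, and the helper names is_valid_repo_path and guess_git_provider are assumptions made for illustration; they are not the module's own __repos_path_regexp__ or __detect_repo_provider__.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed pattern for /Repos/{folder}/{repo-name}; illustrative only.
REPO_PATH_RE = re.compile(r"^/Repos/[^/]+/[^/]+/?$")

# Assumed host-to-provider table; the provider names are placeholders.
KNOWN_HOSTS = {
    "github.com": "gitHub",
    "gitlab.com": "gitLab",
    "bitbucket.org": "bitbucketCloud",
}


def is_valid_repo_path(path: str) -> bool:
    """Return True if path looks like /Repos/{folder}/{repo-name}."""
    return REPO_PATH_RE.match(path) is not None


def guess_git_provider(git_url: str) -> Optional[str]:
    """Guess a provider from the URL host, or return None if the host is unknown."""
    host = (urlsplit(git_url).hostname or "").lower()
    return KNOWN_HOSTS.get(host)
```

When a guess like this returns None, for example for a self-hosted Git server, git_provider has to be passed explicitly.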

template_fields: Sequence[str] = ('repo_path', 'tag', 'branch', 'databricks_conn_id')[source]
__git_providers__[source]
__aws_code_commit_regexp__[source]
__repos_path_regexp__[source]
static __detect_repo_provider__(url)[source]
execute(context)[source]

Create a Databricks Repo.

Parameters

context (airflow.utils.context.Context) – context

Returns

Repo ID

class airflow.providers.databricks.operators.databricks_repos.DatabricksReposUpdateOperator(*, branch=None, tag=None, repo_id=None, repo_path=None, databricks_conn_id='databricks_default', databricks_retry_limit=3, databricks_retry_delay=1, **kwargs)[source]

Bases: airflow.models.BaseOperator

Updates the specified repository to a given branch or tag using the PATCH api/2.0/repos API endpoint.

See: https://docs.databricks.com/dev-tools/api/latest/repos.html#operation/update-repo

Parameters
  • branch (str | None) – optional name of branch to update to. Should be specified if tag is omitted

  • tag (str | None) – optional name of tag to update to. Should be specified if branch is omitted

  • repo_id (str | None) – optional ID of existing repository. Should be specified if repo_path is omitted

  • repo_path (str | None) – optional path of existing repository. Should be specified if repo_id is omitted

  • databricks_conn_id (str) – Reference to the Databricks connection. By default and in the common case this will be databricks_default. To use token-based authentication, provide the key token in the connection’s extra field, create the key host there as well, and leave the connection’s host field empty. (templated)

  • databricks_retry_limit (int) – Number of times to retry if the Databricks backend is unreachable. The value must be greater than or equal to 1.

  • databricks_retry_delay (int) – Number of seconds to wait between retries (may be a floating-point number).
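
The parameter descriptions above pair branch with tag and repo_id with repo_path, each to be given when the other is omitted. One way to read this is "exactly one of each pair". The sketch below checks that reading in plain Python; both the reading and the helper name exactly_one are assumptions, not the operator's own validation code.

```python
from typing import Optional


def exactly_one(**named_values: Optional[str]) -> str:
    """Return the name of the single argument that is not None.

    Raises ValueError when none, or more than one, of the arguments is set.
    """
    provided = [name for name, value in named_values.items() if value is not None]
    if len(provided) != 1:
        names = ", ".join(named_values)
        raise ValueError(f"Specify exactly one of: {names}")
    return provided[0]


# Decide what to check out and how the repository is addressed.
target = exactly_one(branch="releases", tag=None)
repo_ref = exactly_one(repo_id=None, repo_path="/Repos/user@example.com/demo-repo")
```

Running such a check before building the task surfaces a misconfigured pair when the DAG file is parsed rather than at run time.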

template_fields: Sequence[str] = ('repo_path', 'tag', 'branch', 'databricks_conn_id')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.

class airflow.providers.databricks.operators.databricks_repos.DatabricksReposDeleteOperator(*, repo_id=None, repo_path=None, databricks_conn_id='databricks_default', databricks_retry_limit=3, databricks_retry_delay=1, **kwargs)[source]

Bases: airflow.models.BaseOperator

Deletes the specified repository using the DELETE api/2.0/repos API endpoint.

See: https://docs.databricks.com/dev-tools/api/latest/repos.html#operation/delete-repo

Parameters
  • repo_id (str | None) – optional ID of existing repository. Should be specified if repo_path is omitted

  • repo_path (str | None) – optional path of existing repository. Should be specified if repo_id is omitted

  • databricks_conn_id (str) – Reference to the Databricks connection. By default and in the common case this will be databricks_default. To use token-based authentication, provide the key token in the connection’s extra field, create the key host there as well, and leave the connection’s host field empty. (templated)

  • databricks_retry_limit (int) – Number of times to retry if the Databricks backend is unreachable. The value must be greater than or equal to 1.

  • databricks_retry_delay (int) – Number of seconds to wait between retries (may be a floating-point number).
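
How databricks_retry_limit and databricks_retry_delay interact can be sketched with a generic retry loop. This is not the provider's implementation: the helper name call_with_retries, the use of ConnectionError to stand for an unreachable backend, and the reading of the limit as the total number of attempts are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_limit: int = 3,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func up to retry_limit times, pausing retry_delay seconds between tries."""
    if retry_limit < 1:
        raise ValueError("retry_limit must be greater than or equal to 1")
    for attempt in range(1, retry_limit + 1):
        try:
            return func()
        except ConnectionError:
            # ConnectionError stands in for "backend unreachable" (an assumption).
            if attempt == retry_limit:
                raise
            sleep(retry_delay)
    raise AssertionError("unreachable")
```

The sleep argument is injectable so that the loop can be exercised without real waiting.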

template_fields: Sequence[str] = ('repo_path', 'databricks_conn_id')[source]
execute(context)[source]

Derive when creating an operator.

Context is the same dictionary used when rendering Jinja templates.

Refer to get_template_context for more context.
