airflow.providers.databricks.hooks.databricks¶
Databricks hook.
This hook enables submitting and running jobs on the Databricks platform. Internally, the operators talk to the api/2.1/jobs/run-now endpoint (https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsRunNow) or the api/2.1/jobs/runs/submit endpoint.
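As a minimal sketch of what the hook ultimately sends to the runs/submit endpoint, the payload below builds a one-task run specification. All field values (run name, notebook path, cluster sizing) are hypothetical; the hook call itself needs a configured Airflow connection, so it is shown commented out.

```python
# Hypothetical runs/submit payload; values are illustrative only.
submit_payload = {
    "run_name": "example-run",
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Shared/example"},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
        }
    ],
}

# With a configured Databricks connection this would be submitted as:
# from airflow.providers.databricks.hooks.databricks import DatabricksHook
# hook = DatabricksHook()
# run_id = hook.submit_run(submit_payload)
```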
Module Contents¶
Classes¶
- RunLifeCycleState: Enum for the run life cycle state concept of Databricks runs.
- RunState: Utility class for the run state concept of Databricks runs.
- ClusterState: Utility class for the cluster state concept of Databricks clusters.
- DatabricksHook: Interact with Databricks.
Attributes¶
- airflow.providers.databricks.hooks.databricks.GET_CLUSTER_ENDPOINT = ('GET', 'api/2.0/clusters/get')[source]¶
- airflow.providers.databricks.hooks.databricks.RESTART_CLUSTER_ENDPOINT = ('POST', 'api/2.0/clusters/restart')[source]¶
- airflow.providers.databricks.hooks.databricks.START_CLUSTER_ENDPOINT = ('POST', 'api/2.0/clusters/start')[source]¶
- airflow.providers.databricks.hooks.databricks.TERMINATE_CLUSTER_ENDPOINT = ('POST', 'api/2.0/clusters/delete')[source]¶
- airflow.providers.databricks.hooks.databricks.CREATE_ENDPOINT = ('POST', 'api/2.1/jobs/create')[source]¶
- airflow.providers.databricks.hooks.databricks.RESET_ENDPOINT = ('POST', 'api/2.1/jobs/reset')[source]¶
- airflow.providers.databricks.hooks.databricks.UPDATE_ENDPOINT = ('POST', 'api/2.1/jobs/update')[source]¶
- airflow.providers.databricks.hooks.databricks.RUN_NOW_ENDPOINT = ('POST', 'api/2.1/jobs/run-now')[source]¶
- airflow.providers.databricks.hooks.databricks.SUBMIT_RUN_ENDPOINT = ('POST', 'api/2.1/jobs/runs/submit')[source]¶
- airflow.providers.databricks.hooks.databricks.GET_RUN_ENDPOINT = ('GET', 'api/2.1/jobs/runs/get')[source]¶
- airflow.providers.databricks.hooks.databricks.CANCEL_RUN_ENDPOINT = ('POST', 'api/2.1/jobs/runs/cancel')[source]¶
- airflow.providers.databricks.hooks.databricks.DELETE_RUN_ENDPOINT = ('POST', 'api/2.1/jobs/runs/delete')[source]¶
- airflow.providers.databricks.hooks.databricks.REPAIR_RUN_ENDPOINT = ('POST', 'api/2.1/jobs/runs/repair')[source]¶
- airflow.providers.databricks.hooks.databricks.OUTPUT_RUNS_JOB_ENDPOINT = ('GET', 'api/2.1/jobs/runs/get-output')[source]¶
- airflow.providers.databricks.hooks.databricks.CANCEL_ALL_RUNS_ENDPOINT = ('POST', 'api/2.1/jobs/runs/cancel-all')[source]¶
- airflow.providers.databricks.hooks.databricks.INSTALL_LIBS_ENDPOINT = ('POST', 'api/2.0/libraries/install')[source]¶
- airflow.providers.databricks.hooks.databricks.UNINSTALL_LIBS_ENDPOINT = ('POST', 'api/2.0/libraries/uninstall')[source]¶
- airflow.providers.databricks.hooks.databricks.LIST_JOBS_ENDPOINT = ('GET', 'api/2.1/jobs/list')[source]¶
- airflow.providers.databricks.hooks.databricks.LIST_PIPELINES_ENDPOINT = ('GET', 'api/2.0/pipelines')[source]¶
- airflow.providers.databricks.hooks.databricks.WORKSPACE_GET_STATUS_ENDPOINT = ('GET', 'api/2.0/workspace/get-status')[source]¶
- airflow.providers.databricks.hooks.databricks.SPARK_VERSIONS_ENDPOINT = ('GET', 'api/2.0/clusters/spark-versions')[source]¶
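Each endpoint constant above pairs an HTTP method with a versioned API path; the hook's internal request helper unpacks the tuple when issuing the call. A small sketch of that shape, reusing two of the constants' values verbatim:

```python
# (HTTP method, relative path) pairs, copied from the constants above.
RUN_NOW_ENDPOINT = ("POST", "api/2.1/jobs/run-now")
GET_RUN_ENDPOINT = ("GET", "api/2.1/jobs/runs/get")

# The request machinery unpacks the pair before building the full URL.
method, path = RUN_NOW_ENDPOINT
```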
- class airflow.providers.databricks.hooks.databricks.RunLifeCycleState[source]¶
Bases: enum.Enum
Enum for the run life cycle state concept of Databricks runs.
See more information at: https://docs.databricks.com/api/azure/workspace/jobs/listruns#runs-state-life_cycle_state
- class airflow.providers.databricks.hooks.databricks.RunState(life_cycle_state, result_state='', state_message='', *args, **kwargs)[source]¶
Utility class for the run state concept of Databricks runs.
- class airflow.providers.databricks.hooks.databricks.ClusterState(state='', state_message='', *args, **kwargs)[source]¶
Utility class for the cluster state concept of Databricks clusters.
- class airflow.providers.databricks.hooks.databricks.DatabricksHook(databricks_conn_id=BaseDatabricksHook.default_conn_name, timeout_seconds=180, retry_limit=3, retry_delay=1.0, retry_args=None, caller='DatabricksHook')[source]¶
Bases: airflow.providers.databricks.hooks.databricks_base.BaseDatabricksHook
Interact with Databricks.
- Parameters
databricks_conn_id (str) – Reference to the Databricks connection.
timeout_seconds (int) – The amount of time in seconds the requests library will wait before timing out.
retry_limit (int) – The number of times to retry the connection in case of service outages.
retry_delay (float) – The number of seconds to wait between retries (may be a floating-point number).
retry_args (dict[Any, Any] | None) – An optional dictionary with arguments passed to the tenacity.Retrying class.
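A minimal sketch of the constructor arguments, assuming the provider's default connection id "databricks_default". The dict below only collects keyword arguments; the commented-out call is what would construct the hook (retry_args, when supplied, is forwarded to tenacity.Retrying).

```python
# Illustrative constructor arguments; values are hypothetical.
hook_kwargs = {
    "databricks_conn_id": "databricks_default",
    "timeout_seconds": 120,
    "retry_limit": 5,
    "retry_delay": 2.0,
}

# Requires Airflow and a configured connection:
# from airflow.providers.databricks.hooks.databricks import DatabricksHook
# hook = DatabricksHook(**hook_kwargs)
```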
- reset_job(job_id, json)[source]¶
Call the api/2.1/jobs/reset endpoint.
- Parameters
json (dict) – The data used in the new_settings field of the request to the reset endpoint.
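Since reset replaces the job's settings wholesale, the new_settings payload must carry the complete desired job specification, not a partial update. A sketch with hypothetical field values; the hook call is commented out because it needs a live connection.

```python
# Hypothetical job id and full replacement settings for jobs/reset.
job_id = 1234
new_settings = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Shared/etl"},
            "existing_cluster_id": "1234-567890-abcde123",
        }
    ],
}

# hook.reset_job(job_id, new_settings)
```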
- list_jobs(limit=25, expand_tasks=False, job_name=None, page_token=None, include_user_names=False)[source]¶
List the jobs in the Databricks Job Service.
- Returns
A list of jobs.
- Return type
list[dict[str, Any]]
- find_job_id_by_name(job_name)[source]¶
Find a job id by its name; if there are multiple jobs with the same name, an AirflowException is raised.
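The name lookup raises when the name is ambiguous. A minimal sketch of that contract against a stand-in job list, with ValueError substituted for AirflowException so the sketch runs without Airflow installed:

```python
def find_job_id_by_name(jobs, job_name):
    # Collect all jobs whose settings name matches; ambiguity is an error.
    matches = [j["job_id"] for j in jobs if j["settings"]["name"] == job_name]
    if len(matches) > 1:
        # The real hook raises AirflowException here.
        raise ValueError(f"There are more than one job with name {job_name}.")
    return matches[0] if matches else None

jobs = [
    {"job_id": 1, "settings": {"name": "etl"}},
    {"job_id": 2, "settings": {"name": "report"}},
]
```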
- list_pipelines(batch_size=25, pipeline_name=None, notebook_path=None)[source]¶
List the pipelines in Databricks Delta Live Tables.
- Returns
A list of pipelines.
- Return type
list[dict[str, Any]]
- find_pipeline_id_by_name(pipeline_name)[source]¶
Find a pipeline id by its name; if there are multiple pipelines with the same name, an AirflowException is raised.
- get_run_state(run_id)[source]¶
Retrieve the run state of the run.
Please note that any Airflow tasks that call the get_run_state method will fail unless you have enabled XCom pickling. This can be done with the following environment variable: AIRFLOW__CORE__ENABLE_XCOM_PICKLING
If you do not want to enable XCom pickling, use the get_run_state_str method to get a string describing the state, or get_run_state_lifecycle, get_run_state_result, or get_run_state_message to get individual components of the run state.
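A typical use of the run state is polling until the run reaches a terminal lifecycle state. The sketch below stubs the state source with a fixed sequence instead of a live hook; the terminal-state set follows the Jobs API lifecycle documentation, and a real loop would sleep between polls.

```python
# Lifecycle states after which a run will not change again.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def wait_for_run(get_state):
    # Poll the supplied callable until it reports a terminal state.
    # A production loop would time.sleep() between iterations.
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state

# Stubbed state sequence standing in for hook.get_run_state(run_id).life_cycle_state.
states = iter(["PENDING", "RUNNING", "TERMINATED"])
final = wait_for_run(lambda: next(states))
```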
- cancel_all_runs(job_id)[source]¶
Cancel all active runs of a job asynchronously.
- Parameters
job_id (int) – The canonical identifier of the job to cancel all runs of.
- get_cluster_state(cluster_id)[source]¶
Retrieve the current state of the cluster.
- Parameters
cluster_id (str) – id of the cluster
- Returns
state of the cluster
- Return type
ClusterState
- async a_get_cluster_state(cluster_id)[source]¶
Async version of get_cluster_state.
- Parameters
cluster_id (str) – id of the cluster
- Returns
state of the cluster
- Return type
ClusterState
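A common pattern is to gate a start or restart on the cluster's current state. The predicate below runs standalone; the state strings follow the Clusters API, and the hook calls (with a hypothetical cluster id) are shown commented since they need a live connection.

```python
def needs_start(state):
    # A cluster that is gone, or on its way down, has to be started again.
    return state in {"TERMINATED", "TERMINATING"}

# state = hook.get_cluster_state("1234-567890-abcde123").state
# if needs_start(state):
#     hook.start_cluster({"cluster_id": "1234-567890-abcde123"})
```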
- restart_cluster(json)[source]¶
Restart the cluster.
- Parameters
json (dict) – json dictionary containing cluster specification.
- start_cluster(json)[source]¶
Start the cluster.
- Parameters
json (dict) – json dictionary containing cluster specification.
- terminate_cluster(json)[source]¶
Terminate the cluster.
- Parameters
json (dict) – json dictionary containing cluster specification.
- install(json)[source]¶
Install libraries on the cluster.
Utility function to call the api/2.0/libraries/install endpoint.
- Parameters
json (dict) – json dictionary containing cluster_id and an array of libraries.
- uninstall(json)[source]¶
Uninstall libraries on the cluster.
Utility function to call the api/2.0/libraries/uninstall endpoint.
- Parameters
json (dict) – json dictionary containing cluster_id and an array of libraries.
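A sketch of the libraries/install payload shape: one cluster id plus an array of library specifications, each keyed by its type. Cluster id and package names are hypothetical, and the hook call is commented out.

```python
# Hypothetical cluster id and libraries for libraries/install.
install_payload = {
    "cluster_id": "1234-567890-abcde123",
    "libraries": [
        {"pypi": {"package": "simplejson==3.18.0"}},
        {"jar": "dbfs:/mnt/libs/app.jar"},
    ],
}

# hook.install(install_payload)
# ...and later, the same shape for hook.uninstall(install_payload)
```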
- delete_repo(repo_id)[source]¶
Delete the given Databricks Repo.
- Parameters
repo_id (str) – ID of the Databricks Repo
- Returns