airflow.providers.apache.livy.hooks.livy

This module contains the Apache Livy hook.

Classes

BatchState

Batch session states.

LivyHook

Hook for Apache Livy through the REST API.

LivyAsyncHook

Hook for Apache Livy through the REST API asynchronously.

Functions

sanitize_endpoint_prefix(endpoint_prefix)

Ensure that the endpoint prefix is prefixed with a slash.

Module Contents

class airflow.providers.apache.livy.hooks.livy.BatchState[source]

Bases: enum.Enum

Batch session states.

NOT_STARTED = 'not_started'[source]
STARTING = 'starting'[source]
RUNNING = 'running'[source]
IDLE = 'idle'[source]
BUSY = 'busy'[source]
SHUTTING_DOWN = 'shutting_down'[source]
ERROR = 'error'[source]
DEAD = 'dead'[source]
KILLED = 'killed'[source]
SUCCESS = 'success'[source]
airflow.providers.apache.livy.hooks.livy.sanitize_endpoint_prefix(endpoint_prefix)[source]

Ensure that the endpoint prefix is prefixed with a slash.

class airflow.providers.apache.livy.hooks.livy.LivyHook(livy_conn_id=default_conn_name, extra_options=None, extra_headers=None, auth_type=None, endpoint_prefix=None)[source]

Bases: airflow.providers.http.hooks.http.HttpHook

Hook for Apache Livy through the REST API.

Parameters:
  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • extra_options (dict[str, Any] | None) – A dictionary of options passed to Livy.

  • extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.

  • auth_type (Any | None) – The auth type for the service.

See also

For more details refer to the Apache Livy API reference: https://livy.apache.org/docs/latest/rest-api.html

TERMINAL_STATES[source]
conn_name_attr = 'livy_conn_id'[source]
default_conn_name = 'livy_default'[source]
conn_type = 'livy'[source]
hook_name = 'Apache Livy'[source]
default_headers[source]
method = 'POST'[source]
http_conn_id = 'livy_default'[source]
extra_headers[source]
extra_options[source]
endpoint_prefix = ''[source]
run_method(endpoint, method='GET', data=None, headers=None, retry_args=None)[source]

Wrap HttpHook; allows to change method on the same HttpHook.

Parameters:
  • method (str) – http method

  • endpoint (str) – endpoint

  • data (Any | None) – request payload

  • headers (dict[str, Any] | None) – headers

  • retry_args (dict[str, Any] | None) – Arguments which define the retry behaviour. See Tenacity documentation at https://github.com/jd/tenacity

Returns:

http response

Return type:

Any

post_batch(*args, **kwargs)[source]

Perform request to submit batch.

Returns:

batch session id

Return type:

int

get_batch(session_id)[source]

Fetch info about the specified batch.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

response body

Return type:

dict

get_batch_state(session_id, retry_args=None)[source]

Fetch the state of the specified batch.

Parameters:
Returns:

batch state

Return type:

BatchState

delete_batch(session_id)[source]

Delete the specified batch.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

response body

Return type:

dict

get_batch_logs(session_id, log_start_position, log_batch_size)[source]

Get the session logs for a specified batch.

Parameters:
  • session_id (int | str) – identifier of the batch sessions

  • log_start_position – Position from where to pull the logs

  • log_batch_size – Number of lines to pull in one batch

Returns:

response body

Return type:

dict

dump_batch_logs(session_id)[source]

Dump the session logs for a specified batch.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

response body

Return type:

None

static build_post_batch_body(file, args=None, class_name=None, jars=None, py_files=None, files=None, archives=None, name=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, queue=None, proxy_user=None, conf=None)[source]

Build the post batch request body.

See also

For more information about the format refer to https://livy.apache.org/docs/latest/rest-api.html

Parameters:
  • file (str) – Path of the file containing the application to execute (required).

  • proxy_user (str | None) – User to impersonate when running the job.

  • class_name (str | None) – Application Java/Spark main class string.

  • args (collections.abc.Sequence[str | int | float] | None) – Command line arguments for the application s.

  • jars (list[str] | None) – jars to be used in this sessions.

  • py_files (list[str] | None) – Python files to be used in this session.

  • files (list[str] | None) – files to be used in this session.

  • driver_memory (str | None) – Amount of memory to use for the driver process string.

  • driver_cores (int | str | None) – Number of cores to use for the driver process int.

  • executor_memory (str | None) – Amount of memory to use per executor process string.

  • executor_cores (int | None) – Number of cores to use for each executor int.

  • num_executors (int | str | None) – Number of executors to launch for this session int.

  • archives (list[str] | None) – Archives to be used in this session.

  • queue (str | None) – The name of the YARN queue to which submitted string.

  • name (str | None) – The name of this session string.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

Returns:

request body

Return type:

dict

class airflow.providers.apache.livy.hooks.livy.LivyAsyncHook(livy_conn_id=default_conn_name, extra_options=None, extra_headers=None, endpoint_prefix=None)[source]

Bases: airflow.providers.http.hooks.http.HttpAsyncHook

Hook for Apache Livy through the REST API asynchronously.

Parameters:
  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • extra_options (dict[str, Any] | None) – A dictionary of options passed to Livy.

  • extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.

See also

For more details refer to the Apache Livy API reference: https://livy.apache.org/docs/latest/rest-api.html

TERMINAL_STATES[source]
conn_name_attr = 'livy_conn_id'[source]
default_conn_name = 'livy_default'[source]
conn_type = 'livy'[source]
hook_name = 'Apache Livy'[source]
method = 'POST'[source]
http_conn_id = 'livy_default'[source]
extra_headers[source]
extra_options[source]
endpoint_prefix = ''[source]
async run_method(endpoint, method='GET', data=None, headers=None)[source]

Wrap HttpAsyncHook; allows to change method on the same HttpAsyncHook.

Parameters:
  • method (str) – http method

  • endpoint (str) – endpoint

  • data (Any | None) – request payload

  • headers (dict[str, Any] | None) – headers

Returns:

http response

Return type:

Any

async get_batch_state(session_id)[source]

Fetch the state of the specified batch asynchronously.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

batch state

Return type:

Any

async get_batch_logs(session_id, log_start_position, log_batch_size)[source]

Get the session logs for a specified batch asynchronously.

Parameters:
  • session_id (int | str) – identifier of the batch sessions

  • log_start_position (int) – Position from where to pull the logs

  • log_batch_size (int) – Number of lines to pull in one batch

Returns:

response body

Return type:

Any

async dump_batch_logs(session_id)[source]

Dump the session logs for a specified batch asynchronously.

Parameters:

session_id (int | str) – identifier of the batch sessions

Returns:

response body

Return type:

Any

static build_post_batch_body(file, args=None, class_name=None, jars=None, py_files=None, files=None, archives=None, name=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, queue=None, proxy_user=None, conf=None)[source]

Build the post batch request body.

Parameters:
  • file (str) – Path of the file containing the application to execute (required).

  • proxy_user (str | None) – User to impersonate when running the job.

  • class_name (str | None) – Application Java/Spark main class string.

  • args (collections.abc.Sequence[str | int | float] | None) – Command line arguments for the application s.

  • jars (list[str] | None) – jars to be used in this sessions.

  • py_files (list[str] | None) – Python files to be used in this session.

  • files (list[str] | None) – files to be used in this session.

  • driver_memory (str | None) – Amount of memory to use for the driver process string.

  • driver_cores (int | str | None) – Number of cores to use for the driver process int.

  • executor_memory (str | None) – Amount of memory to use per executor process string.

  • executor_cores (int | None) – Number of cores to use for each executor int.

  • num_executors (int | str | None) – Number of executors to launch for this session int.

  • archives (list[str] | None) – Archives to be used in this session.

  • queue (str | None) – The name of the YARN queue to which submitted string.

  • name (str | None) – The name of this session string.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

Returns:

request body

Return type:

dict[str, Any]

Was this entry helpful?