airflow.providers.apache.livy.hooks.livy

This module contains the Apache Livy hook.

Module Contents

Classes

BatchState

Batch session states.

LivyHook

Hook for Apache Livy through the REST API.

LivyAsyncHook

Hook for Apache Livy through the REST API asynchronously.

class airflow.providers.apache.livy.hooks.livy.BatchState[source]

Bases: enum.Enum

Batch session states.

NOT_STARTED = 'not_started'[source]
STARTING = 'starting'[source]
RUNNING = 'running'[source]
IDLE = 'idle'[source]
BUSY = 'busy'[source]
SHUTTING_DOWN = 'shutting_down'[source]
ERROR = 'error'[source]
DEAD = 'dead'[source]
KILLED = 'killed'[source]
SUCCESS = 'success'[source]
class airflow.providers.apache.livy.hooks.livy.LivyHook(livy_conn_id=default_conn_name, extra_options=None, extra_headers=None, auth_type=None)[source]

Bases: airflow.providers.http.hooks.http.HttpHook, airflow.utils.log.logging_mixin.LoggingMixin

Hook for Apache Livy through the REST API.

Parameters
  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • extra_options (dict[str, Any] | None) – A dictionary of options passed to Livy.

  • extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.

  • auth_type (Any | None) – The auth type for the service.

See also

For more details refer to the Apache Livy API reference: https://livy.apache.org/docs/latest/rest-api.html

TERMINAL_STATES[source]
conn_name_attr = 'livy_conn_id'[source]
default_conn_name = 'livy_default'[source]
conn_type = 'livy'[source]
hook_name = 'Apache Livy'[source]
get_conn(headers=None)[source]

Return http session for use with requests.

Parameters

headers (dict[str, Any] | None) – additional headers to be passed through as a dictionary

Returns

requests session

Return type

Any

run_method(endpoint, method='GET', data=None, headers=None, retry_args=None)[source]

Wrap HttpHook; allows to change method on the same HttpHook.

Parameters
  • method (str) – http method

  • endpoint (str) – endpoint

  • data (Any | None) – request payload

  • headers (dict[str, Any] | None) – headers

  • retry_args (dict[str, Any] | None) – Arguments which define the retry behaviour. See Tenacity documentation at https://github.com/jd/tenacity

Returns

http response

Return type

Any

post_batch(*args, **kwargs)[source]

Perform request to submit batch.

Returns

batch session id

Return type

int

get_batch(session_id)[source]

Fetch info about the specified batch.

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

response body

Return type

dict

get_batch_state(session_id, retry_args=None)[source]

Fetch the state of the specified batch.

Parameters
Returns

batch state

Return type

BatchState

delete_batch(session_id)[source]

Delete the specified batch.

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

response body

Return type

dict

get_batch_logs(session_id, log_start_position, log_batch_size)[source]

Get the session logs for a specified batch.

Parameters
  • session_id (int | str) – identifier of the batch sessions

  • log_start_position – Position from where to pull the logs

  • log_batch_size – Number of lines to pull in one batch

Returns

response body

Return type

dict

dump_batch_logs(session_id)[source]

Dump the session logs for a specified batch.

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

response body

Return type

None

static build_post_batch_body(file, args=None, class_name=None, jars=None, py_files=None, files=None, archives=None, name=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, queue=None, proxy_user=None, conf=None)[source]

Build the post batch request body.

See also

For more information about the format refer to https://livy.apache.org/docs/latest/rest-api.html

Parameters
  • file (str) – Path of the file containing the application to execute (required).

  • proxy_user (str | None) – User to impersonate when running the job.

  • class_name (str | None) – Application Java/Spark main class string.

  • args (Sequence[str | int | float] | None) – Command line arguments for the application s.

  • jars (list[str] | None) – jars to be used in this sessions.

  • py_files (list[str] | None) – Python files to be used in this session.

  • files (list[str] | None) – files to be used in this session.

  • driver_memory (str | None) – Amount of memory to use for the driver process string.

  • driver_cores (int | str | None) – Number of cores to use for the driver process int.

  • executor_memory (str | None) – Amount of memory to use per executor process string.

  • executor_cores (int | None) – Number of cores to use for each executor int.

  • num_executors (int | str | None) – Number of executors to launch for this session int.

  • archives (list[str] | None) – Archives to be used in this session.

  • queue (str | None) – The name of the YARN queue to which submitted string.

  • name (str | None) – The name of this session string.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

Returns

request body

Return type

dict

class airflow.providers.apache.livy.hooks.livy.LivyAsyncHook(livy_conn_id=default_conn_name, extra_options=None, extra_headers=None)[source]

Bases: airflow.providers.http.hooks.http.HttpAsyncHook, airflow.utils.log.logging_mixin.LoggingMixin

Hook for Apache Livy through the REST API asynchronously.

Parameters
  • livy_conn_id (str) – reference to a pre-defined Livy Connection.

  • extra_options (dict[str, Any] | None) – A dictionary of options passed to Livy.

  • extra_headers (dict[str, Any] | None) – A dictionary of headers passed to the HTTP request to livy.

See also

For more details refer to the Apache Livy API reference: https://livy.apache.org/docs/latest/rest-api.html

TERMINAL_STATES[source]
conn_name_attr = 'livy_conn_id'[source]
default_conn_name = 'livy_default'[source]
conn_type = 'livy'[source]
hook_name = 'Apache Livy'[source]
async run_method(endpoint, method='GET', data=None, headers=None)[source]

Wrap HttpAsyncHook; allows to change method on the same HttpAsyncHook.

Parameters
  • method (str) – http method

  • endpoint (str) – endpoint

  • data (Any | None) – request payload

  • headers (dict[str, Any] | None) – headers

Returns

http response

Return type

Any

async get_batch_state(session_id)[source]

Fetch the state of the specified batch asynchronously.

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

batch state

Return type

Any

async get_batch_logs(session_id, log_start_position, log_batch_size)[source]

Get the session logs for a specified batch asynchronously.

Parameters
  • session_id (int | str) – identifier of the batch sessions

  • log_start_position (int) – Position from where to pull the logs

  • log_batch_size (int) – Number of lines to pull in one batch

Returns

response body

Return type

Any

async dump_batch_logs(session_id)[source]

Dump the session logs for a specified batch asynchronously.

Parameters

session_id (int | str) – identifier of the batch sessions

Returns

response body

Return type

Any

static build_post_batch_body(file, args=None, class_name=None, jars=None, py_files=None, files=None, archives=None, name=None, driver_memory=None, driver_cores=None, executor_memory=None, executor_cores=None, num_executors=None, queue=None, proxy_user=None, conf=None)[source]

Build the post batch request body.

Parameters
  • file (str) – Path of the file containing the application to execute (required).

  • proxy_user (str | None) – User to impersonate when running the job.

  • class_name (str | None) – Application Java/Spark main class string.

  • args (Sequence[str | int | float] | None) – Command line arguments for the application s.

  • jars (list[str] | None) – jars to be used in this sessions.

  • py_files (list[str] | None) – Python files to be used in this session.

  • files (list[str] | None) – files to be used in this session.

  • driver_memory (str | None) – Amount of memory to use for the driver process string.

  • driver_cores (int | str | None) – Number of cores to use for the driver process int.

  • executor_memory (str | None) – Amount of memory to use per executor process string.

  • executor_cores (int | None) – Number of cores to use for each executor int.

  • num_executors (int | str | None) – Number of executors to launch for this session int.

  • archives (list[str] | None) – Archives to be used in this session.

  • queue (str | None) – The name of the YARN queue to which submitted string.

  • name (str | None) – The name of this session string.

  • conf (dict[Any, Any] | None) – Spark configuration properties.

Returns

request body

Return type

dict[str, Any]

Was this entry helpful?