airflow.providers.cncf.kubernetes.operators.spark_kubernetes

Module Contents

Classes

SparkKubernetesOperator

Creates sparkApplication object in kubernetes cluster.

class airflow.providers.cncf.kubernetes.operators.spark_kubernetes.SparkKubernetesOperator(*, application_file, namespace=None, kubernetes_conn_id='kubernetes_default', api_group='sparkoperator.k8s.io', api_version='v1beta2', in_cluster=None, cluster_context=None, config_file=None, watch=False, **kwargs)[source]

Bases: airflow.models.BaseOperator

Creates sparkApplication object in kubernetes cluster.

See also

For more detail about Spark Application Object have a look at the reference: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/v1beta2-1.1.0-2.4.5/docs/api-docs.md#sparkapplication

Parameters
  • application_file (str) – Defines Kubernetes ‘custom_resource_definition’ of ‘sparkApplication’ as either a path to a ‘.yaml’ file, ‘.json’ file, YAML string or JSON string.

  • namespace (str | None) – kubernetes namespace to put sparkApplication

  • kubernetes_conn_id (str) – The kubernetes connection id for the to Kubernetes cluster.

  • api_group (str) – kubernetes api group of sparkApplication

  • api_version (str) – kubernetes api version of sparkApplication

  • watch (bool) – whether to watch the job status and logs or not

template_fields: Sequence[str] = ('application_file', 'namespace')[source]
template_ext: Sequence[str] = ('.yaml', '.yml', '.json')[source]
ui_color = '#f4a460'[source]
execute(context)[source]

This is the main method to derive when creating an operator.

Context is the same dictionary used as when rendering jinja templates.

Refer to get_template_context for more context.

on_kill()[source]

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.

Was this entry helpful?