airflow.providers.cncf.kubernetes.operators.spark_kubernetes
¶
Module Contents¶
Classes¶
Creates sparkApplication object in kubernetes cluster. |
- class airflow.providers.cncf.kubernetes.operators.spark_kubernetes.SparkKubernetesOperator(*, image=None, code_path=None, namespace='default', name='default', application_file=None, template_spec=None, get_logs=True, do_xcom_push=False, success_run_history_limit=1, startup_timeout_seconds=600, log_events_on_failure=False, reattach_on_restart=True, delete_on_termination=True, kubernetes_conn_id='kubernetes_default', **kwargs)[source]¶
Bases:
airflow.providers.cncf.kubernetes.operators.pod.KubernetesPodOperator
Creates sparkApplication object in kubernetes cluster.
See also
For more detail about Spark Application Object have a look at the reference: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/v1beta2-1.3.3-3.1.1/docs/api-docs.md#sparkapplication
- Parameters
application_file (str | None) – filepath to kubernetes custom_resource_definition of sparkApplication
kubernetes_conn_id (str) – the connection to Kubernetes cluster
image (str | None) – Docker image you wish to launch. Defaults to hub.docker.com,
code_path (str | None) – path to the spark code in image,
namespace (str) – kubernetes namespace to put sparkApplication
cluster_context – context of the cluster
application_file – yaml file if passed
get_logs (bool) – get the stdout of the container as logs of the tasks.
do_xcom_push (bool) – If True, the content of the file /airflow/xcom/return.json in the container will also be pushed to an XCom when the container completes.
success_run_history_limit (int) – Number of past successful runs of the application to keep.
delete_on_termination (bool) – What to do when the pod reaches its final state, or the execution is interrupted. If True (default), delete the pod; if False, leave the pod.
startup_timeout_seconds – timeout in seconds to startup the pod.
log_events_on_failure (bool) – Log the pod’s events if a failure occurs
reattach_on_restart (bool) – if the scheduler dies while the pod is running, reattach and monitor
- static create_labels_for_pod(context=None, include_try_number=True)[source]¶
Generate labels for the pod to track the pod in case of Operator crash.
- execute(context)[source]¶
Based on the deferrable parameter runs the pod asynchronously or synchronously.
- on_kill()[source]¶
Override this method to clean up subprocesses when a task instance gets killed.
Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.