Apache Livy Operators¶
Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It enables easy submission of Spark jobs or snippets of Spark code, synchronous or asynchronous result retrieval, as well as Spark Context management, all via a simple REST interface or an RPC client library.
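To make the REST interface concrete, here is a minimal sketch of a raw Livy batch submission that the operator described below wraps; the server URL, file path, and polling interval are illustrative assumptions, not values from this page.

import time

import requests

LIVY_URL = "http://localhost:8998"  # assumption: a Livy server reachable at this address

# Submit a Spark application as a Livy batch.
resp = requests.post(
    f"{LIVY_URL}/batches",
    json={
        "file": "/spark-examples.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "numExecutors": 1,
        "conf": {"spark.shuffle.compress": "false"},
    },
)
batch_id = resp.json()["id"]

# Poll the batch state until the job reaches a terminal state.
while True:
    state = requests.get(f"{LIVY_URL}/batches/{batch_id}/state").json()["state"]
    if state in ("success", "dead", "killed"):
        break
    time.sleep(10)
print(f"Batch {batch_id} finished with state: {state}")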
LivyOperator¶
This operator wraps the Apache Livy batch REST API, allowing you to submit a Spark application to the underlying cluster.
livy_java_task = LivyOperator(
    task_id="pi_java_task",
    file="/spark-examples.jar",
    num_executors=1,
    conf={
        "spark.shuffle.compress": "false",
    },
    class_name="org.apache.spark.examples.SparkPi",
)
livy_python_task = LivyOperator(task_id="pi_python_task", file="/pi.py", polling_interval=60)
livy_java_task >> livy_python_task
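The snippet above is an excerpt from a larger example DAG. For context, a minimal sketch of the surrounding DAG one of these tasks could live in follows; the DAG id, schedule, and start date are illustrative, and the operator uses the default livy_default connection unless livy_conn_id is set.

import pendulum

from airflow import DAG
from airflow.providers.apache.livy.operators.livy import LivyOperator

with DAG(
    dag_id="example_livy_operator",  # illustrative DAG id
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    # Uses the "livy_default" connection unless livy_conn_id is passed explicitly.
    livy_python_task = LivyOperator(task_id="pi_python_task", file="/pi.py", polling_interval=60)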
You can also run this operator in deferrable mode by setting the parameter deferrable
to True.
This leads to more efficient utilization of Airflow workers, because polling for job status happens
asynchronously on the triggerer. Note that this requires a triggerer to be running in your Airflow deployment.
livy_java_task_deferrable = LivyOperator(
    task_id="livy_java_task_deferrable",
    file="/spark-examples.jar",
    num_executors=1,
    conf={
        "spark.shuffle.compress": "false",
    },
    class_name="org.apache.spark.examples.SparkPi",
    deferrable=True,
)
livy_python_task_deferrable = LivyOperator(
    task_id="livy_python_task_deferrable", file="/pi.py", polling_interval=60, deferrable=True
)
livy_java_task_deferrable >> livy_python_task_deferrable
Reference¶
For further information, see the Apache Livy documentation.