airflow.providers.apache.hive.operators.hive

Module Contents

Classes

HiveOperator

Executes hql code or hive script in a specific Hive database.

class airflow.providers.apache.hive.operators.hive.HiveOperator(*, hql, hive_cli_conn_id='hive_cli_default', schema='default', hiveconfs=None, hiveconf_jinja_translate=False, script_begin_tag=None, mapred_queue=None, mapred_queue_priority=None, mapred_job_name=None, hive_cli_params='', auth=None, proxy_user=None, **kwargs)

Bases: airflow.models.BaseOperator

Executes hql code or hive script in a specific Hive database.

Parameters
  • hql (str) – the HQL to be executed. You may also pass a path, relative to the DAG file, to a (templated) Hive script. (templated)

  • hive_cli_conn_id (str) – Reference to the Hive CLI connection id. (templated)

  • schema (str) – the Hive database in which the HQL is executed. (templated)

  • hiveconfs (dict[Any, Any] | None) – if defined, these key-value pairs are passed to Hive as -hiveconf "key"="value". (templated)

  • hiveconf_jinja_translate (bool) – when True, hiveconf-type templating (${var} and ${hiveconf:var}) is translated into Jinja-type templating ({{ var }}). You may want to use this together with the DAG(user_defined_macros=myargs) parameter; see the DAG object documentation for details.

  • script_begin_tag (str | None) – if defined, the operator removes the part of the script before the first occurrence of script_begin_tag.

  • mapred_queue (str | None) – queue used by the Hadoop CapacityScheduler. (templated)

  • mapred_queue_priority (str | None) – priority within the CapacityScheduler queue. Possible settings are VERY_HIGH, HIGH, NORMAL, LOW and VERY_LOW.

  • mapred_job_name (str | None) – name that appears in the JobTracker, which can make monitoring easier. (templated)

  • hive_cli_params (str) – parameters passed to the Hive CLI.

  • auth (str | None) – optional authentication option passed for the Hive connection

  • proxy_user (str | None) – Run HQL code as this user.
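
The parameters that shape the Hive command line (hiveconfs, mapred_queue, mapred_queue_priority, mapred_job_name and hive_cli_params) can be pictured with the minimal sketch below. It is not the provider's HiveCliHook code: the function build_hive_args is hypothetical, and it assumes that every setting becomes its own -hiveconf key=value pair, that hive_cli_params is placed before those pairs, and that the MapReduce options map to the Hadoop properties mapreduce.job.queuename, mapreduce.job.priority and mapred.job.name.

```python
import shlex

# Priority values listed in the mapred_queue_priority documentation.
VALID_PRIORITIES = {"VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW"}


def build_hive_args(
    hiveconfs=None,
    mapred_queue=None,
    mapred_queue_priority=None,
    mapred_job_name=None,
    hive_cli_params="",
):
    """Hypothetical helper: assemble Hive CLI arguments from operator-style options."""
    args = ["hive"]
    # Extra CLI parameters arrive as one string, so split them shell-style.
    args.extend(shlex.split(hive_cli_params))

    conf = dict(hiveconfs or {})
    # Assumed Hadoop property names for the MapReduce-related options.
    if mapred_queue:
        conf["mapreduce.job.queuename"] = mapred_queue
    if mapred_queue_priority:
        if mapred_queue_priority not in VALID_PRIORITIES:
            raise ValueError(f"invalid priority: {mapred_queue_priority!r}")
        conf["mapreduce.job.priority"] = mapred_queue_priority
    if mapred_job_name:
        conf["mapred.job.name"] = mapred_job_name

    # Each key-value pair becomes a separate "-hiveconf", "key=value" argument pair.
    for key, value in conf.items():
        args.extend(["-hiveconf", f"{key}={value}"])
    return args


print(build_hive_args(hiveconfs={"x": "1"}, mapred_queue="etl"))
# ['hive', '-hiveconf', 'x=1', '-hiveconf', 'mapreduce.job.queuename=etl']
```

Because the arguments are built as a list rather than as one shell string, the "key"="value" quoting shown in the hiveconfs description is not needed here.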

template_fields: Sequence[str] = ('hql', 'schema', 'hive_cli_conn_id', 'mapred_queue', 'hiveconfs', 'mapred_job_name',...
template_ext: Sequence[str] = ('.hql', '.sql')
template_fields_renderers
ui_color = '#f0e4ec'
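
Because template_ext contains .hql and .sql, an hql value ending in one of those suffixes is treated as a path to a template file rather than as inline HQL; this is what makes the relative script path mentioned for hql work. The sketch below isolates that resolution step. The function resolve_template_field and its search_path argument are hypothetical; in Airflow the file is loaded through the DAG's Jinja environment instead.

```python
import tempfile
from pathlib import Path

# Suffixes copied from HiveOperator.template_ext.
TEMPLATE_EXT = (".hql", ".sql")


def resolve_template_field(value, search_path: Path):
    """Hypothetical helper: return file content if value names a template file."""
    if isinstance(value, str) and value.endswith(TEMPLATE_EXT):
        # Interpret the path relative to the directory holding the DAG file.
        return (search_path / value).read_text()
    # Anything else is treated as inline HQL and returned unchanged.
    return value


with tempfile.TemporaryDirectory() as dag_dir:
    Path(dag_dir, "queries").mkdir()
    Path(dag_dir, "queries", "daily.hql").write_text("SELECT COUNT(*) FROM events;")
    print(resolve_template_field("queries/daily.hql", Path(dag_dir)))
    # SELECT COUNT(*) FROM events;
    print(resolve_template_field("SHOW TABLES;", Path(dag_dir)))
    # SHOW TABLES;
```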
hook()

Get the Hive CLI hook.

prepare_template()

Run after the templated fields have been replaced by the contents of their template files.

Override this method if the operator needs to alter that content before the template is rendered.
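
The hiveconf_jinja_translate and script_begin_tag options both rewrite the raw script text before Jinja renders it, which is the kind of change this method is meant for. The sketch below shows those two rewrites under stated assumptions: the function prepare_hql is hypothetical, variable names are assumed to contain only letters, digits and underscores, and removing the tag itself (not just the text before it) is an assumption; the provider's exact pattern may differ.

```python
import re

# Matches ${var} and ${hiveconf:var}; the "name" group captures the variable name.
# Assumption: names consist of letters, digits and underscores only.
HIVECONF_VAR = re.compile(r"\$\{(?:hiveconf:)?(?P<name>[A-Za-z0-9_]+)\}")


def prepare_hql(hql, hiveconf_jinja_translate=False, script_begin_tag=None):
    """Hypothetical helper mirroring the two documented script rewrites."""
    if hiveconf_jinja_translate:
        # Both ${var} and ${hiveconf:var} become the Jinja expression {{ var }}.
        hql = HIVECONF_VAR.sub(r"{{ \g<name> }}", hql)
    if script_begin_tag and script_begin_tag in hql:
        # Drop everything up to and including the first occurrence of the tag.
        hql = hql.split(script_begin_tag, 1)[1]
    return hql


script = "-- setup notes\n-- BEGIN\nSELECT * FROM logs WHERE ds = '${ds}';"
print(prepare_hql(script, hiveconf_jinja_translate=True, script_begin_tag="-- BEGIN\n"))
# SELECT * FROM logs WHERE ds = '{{ ds }}';
```

Splitting with maxsplit=1 keeps any later occurrences of the tag intact, so only the text before the first one is discarded.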

execute(context)

The main method to override when creating an operator.

Context is the same dictionary that is used when rendering Jinja templates.

Refer to get_template_context for more context.

dry_run()

Perform a dry run of the operator: only render the template fields.

on_kill()

Override this method to clean up subprocesses when a task instance gets killed.

Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up, or it will leave ghost processes behind.
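
The clean-up requirement can be illustrated with a general pattern: keep a handle to every child process the operator starts and terminate it in on_kill. The SubprocessRunner class below is a hypothetical, standard-library-only sketch of that pattern, not HiveOperator's own implementation.

```python
import subprocess
import sys


class SubprocessRunner:
    """Hypothetical operator-like class that owns one child process."""

    def __init__(self, cmd):
        self.cmd = cmd
        self.proc = None

    def start(self):
        # Keep the Popen handle so on_kill can find the child later.
        self.proc = subprocess.Popen(self.cmd)
        return self.proc

    def on_kill(self, timeout=5.0):
        if self.proc is None or self.proc.poll() is not None:
            return  # nothing is running, so there is nothing to clean up
        self.proc.terminate()  # SIGTERM on POSIX: ask the child to exit first
        try:
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.proc.kill()  # escalate if the child ignores the request
            self.proc.wait()


runner = SubprocessRunner([sys.executable, "-c", "import time; time.sleep(60)"])
runner.start()
runner.on_kill()
print(runner.proc.returncode is not None)  # True
```

Waiting after terminate() reaps the child, so it does not linger as a ghost (zombie) process.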

clear_airflow_vars()

Reset Airflow environment variables to prevent existing ones from affecting the operator's behavior.
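
A minimal sketch of the idea, assuming the variables to reset are the AIRFLOW_CTX_* context variables that Airflow exports to task environments. Both the exact list of names and the choice to set each one to an empty string (rather than delete it) are assumptions, and clear_airflow_vars here is a hypothetical stand-in, not the operator method.

```python
import os

# Assumed names of Airflow context variables; the real list may differ by version.
AIRFLOW_CTX_VARS = (
    "AIRFLOW_CTX_DAG_ID",
    "AIRFLOW_CTX_TASK_ID",
    "AIRFLOW_CTX_EXECUTION_DATE",
    "AIRFLOW_CTX_DAG_RUN_ID",
)


def clear_airflow_vars(environ=os.environ):
    """Hypothetical stand-in: blank out inherited Airflow context variables."""
    # Setting each to "" keeps the key present but carries no stale value
    # into child processes such as the Hive CLI.
    environ.update({name: "" for name in AIRFLOW_CTX_VARS})


os.environ["AIRFLOW_CTX_DAG_ID"] = "previous_dag"
clear_airflow_vars()
print(repr(os.environ["AIRFLOW_CTX_DAG_ID"]))  # ''
```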
