Kerberos

Airflow has initial support for Kerberos. This means that Airflow can renew Kerberos tickets for itself and store them in the ticket cache. Hooks and DAGs can then use these tickets to authenticate against kerberized services.

Limitations

Please note that at this time not all hooks have been adjusted to make use of this functionality. Kerberos is also not integrated into the web interface, so for now you will have to rely on network-level security to make sure your service remains secure.

Celery integration has not been tried and tested yet. However, if you generate a keytab for every host and launch a ticket renewer next to every worker, it will most likely work.

Enabling Kerberos

Airflow

To enable Kerberos you will need to generate a (service) keytab.

# in the kadmin.local or kadmin shell, create the airflow principal
kadmin:  addprinc -randkey airflow/fully.qualified.domain.name@YOUR-REALM.COM

# Create the airflow keytab file that will contain the airflow principal
kadmin:  xst -norandkey -k airflow.keytab airflow/fully.qualified.domain.name

Now store this file in a location where the airflow user can read it (chmod 600), and then add the following to your airflow.cfg:

[core]
security = kerberos

[kerberos]
keytab = /etc/airflow/airflow.keytab
reinit_frequency = 3600
principal = airflow

If you are running Airflow in a Docker container based environment, you can set the following environment variables in the Dockerfile instead of modifying airflow.cfg:

ENV AIRFLOW__CORE__SECURITY kerberos
ENV AIRFLOW__KERBEROS__KEYTAB /etc/airflow/airflow.keytab
ENV AIRFLOW__KERBEROS__INCLUDE_IP False
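
These variables work because Airflow treats any environment variable of the form AIRFLOW__{SECTION}__{KEY} as an override for the corresponding airflow.cfg entry. As an illustrative check (assuming Airflow is installed in the container), you can verify the effective values from Python:

import os

# Mirror the Dockerfile ENV lines; AIRFLOW__{SECTION}__{KEY} overrides airflow.cfg
os.environ["AIRFLOW__CORE__SECURITY"] = "kerberos"
os.environ["AIRFLOW__KERBEROS__KEYTAB"] = "/etc/airflow/airflow.keytab"
os.environ["AIRFLOW__KERBEROS__INCLUDE_IP"] = "False"

from airflow.configuration import conf

print(conf.get("core", "security"))               # -> kerberos
print(conf.get("kerberos", "keytab"))             # -> /etc/airflow/airflow.keytab
print(conf.getboolean("kerberos", "include_ip"))  # -> False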

If you need more granular control over your Kerberos ticket, the following options are available, shown here with their default values:

[kerberos]
# Location of your ccache file once kinit has been performed
ccache = /tmp/airflow_krb5_ccache
# principal gets augmented with fqdn
principal = airflow
reinit_frequency = 3600
kinit_path = kinit
keytab = airflow.keytab

# Whether the Kerberos ticket should be requested as forwardable
forwardable = True

# Whether to include the local IP address in the Kerberos ticket.
# Removing it is particularly useful if you run Airflow inside a VM
# that is NATted behind the host system's IP.
include_ip = True
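
To make the relationship between these options and the underlying kinit invocation concrete, here is a rough Python sketch of the kind of command the ticket renewer issues. This is illustrative only, not Airflow's actual implementation; the flag mapping (-f/-F for forwardable, -a/-A for address inclusion) is an assumption based on MIT Kerberos kinit semantics:

import socket
import subprocess

# Values mirroring the [kerberos] section above
kinit_path = "kinit"
keytab = "airflow.keytab"
ccache = "/tmp/airflow_krb5_ccache"
principal = "airflow/" + socket.getfqdn()  # principal augmented with the fqdn
forwardable = True
include_ip = True

cmd = [
    kinit_path,
    "-f" if forwardable else "-F",  # request a (non-)forwardable ticket
    "-a" if include_ip else "-A",   # include the local address, or not
    "-k",                           # authenticate with a keytab, not a password
    "-t", keytab,                   # which keytab to use
    "-c", ccache,                   # where to store the credential cache
    principal,
]
subprocess.run(cmd, check=True)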

Keep in mind that Kerberos tickets are generated via kinit and will use your local krb5.conf by default.

Launch the ticket renewer with:

# run ticket renewer
airflow kerberos

To support more advanced deployment models, airflow kerberos can be run in standard or one-time fashion. You select one-time mode by passing the --one-time flag.

a) standard: The airflow kerberos command runs endlessly. The ticket renewer process checks the ticket every few seconds and refreshes it if it has expired.
b) one-time: airflow kerberos runs once and exits. In case of failure, the main task won't spin up.

The default mode is standard.

Example usages:

For standard mode:

airflow kerberos

For one-time mode:

airflow kerberos --one-time
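
One-time mode is handy when a short-lived job should only start if a valid ticket could actually be obtained. A minimal sketch of that pattern (the main task command below is a hypothetical placeholder):

import subprocess
import sys

# Obtain a ticket exactly once; a non-zero exit code means no valid ticket
result = subprocess.run(["airflow", "kerberos", "--one-time"])
if result.returncode != 0:
    sys.exit("Could not obtain a Kerberos ticket; not starting the main task")

# Hypothetical main task; replace with your actual command
subprocess.run(["my-main-task"], check=True)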

Hadoop

If you want to use impersonation, this needs to be enabled in the core-site.xml of your Hadoop configuration:

<property>
  <name>hadoop.proxyuser.airflow.groups</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.airflow.users</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.airflow.hosts</name>
  <value>*</value>
</property>

Of course, if you need to tighten your security, replace the asterisks with something more appropriate.

Using Kerberos authentication

The Hive hook has been updated to take advantage of Kerberos authentication. To allow your DAGs to use it, simply update the connection details with, for example:

{ "use_beeline": true, "principal": "hive/_HOST@EXAMPLE.COM"}

Adjust the principal to your settings. The _HOST part will be replaced by the fully qualified domain name of the server.
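
The substitution itself is essentially a hostname lookup; a minimal sketch of the idea (the hook's actual implementation may differ):

import socket

# _HOST in the principal is swapped for the fully qualified domain name
principal_template = "hive/_HOST@EXAMPLE.COM"
principal = principal_template.replace("_HOST", socket.getfqdn())
print(principal)  # e.g. hive/fully.qualified.domain.name@EXAMPLE.COM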

You can specify whether you would like to use the DAG owner as the user for the connection, or the user specified in the login section of the connection. For the login user, specify the following as extra:

{ "use_beeline": true, "principal": "hive/_HOST@EXAMPLE.COM", "proxy_user": "login"}

For the DAG owner, use:

{ "use_beeline": true, "principal": "hive/_HOST@EXAMPLE.COM", "proxy_user": "owner"}

and in your DAG, when initializing the HiveOperator, specify:

run_as_owner=True
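
Putting this together, here is a minimal sketch of a DAG that runs a Hive query as the DAG owner; the DAG id, connection id, and query are placeholders to adapt to your environment:

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator

with DAG(
    dag_id="kerberized_hive_example",  # placeholder
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    hive_task = HiveOperator(
        task_id="run_query",
        hive_cli_conn_id="hive_cli_default",  # connection carrying the kerberos extras above
        hql="SELECT 1",                       # placeholder query
        run_as_owner=True,                    # proxy as the DAG owner
    )

Depending on your Airflow version, the operator's import path and the schedule argument may differ (older releases use schedule_interval and airflow.operators.hive_operator).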

To use Kerberos authentication, you must install Airflow with the kerberos extras group:

pip install 'apache-airflow[kerberos]'

You can read about some production aspects of Kerberos deployment at Kerberos-authenticated workers.
