:mod:`airflow.hooks.webhdfs_hook` ================================= .. py:module:: airflow.hooks.webhdfs_hook Module Contents --------------- .. data:: _kerberos_security_mode .. data:: log .. py:exception:: AirflowWebHDFSHookException Bases::class:`airflow.exceptions.AirflowException` .. py:class:: WebHDFSHook(webhdfs_conn_id='webhdfs_default', proxy_user=None) Bases::class:`airflow.hooks.base_hook.BaseHook` Interact with HDFS. This class is a wrapper around the hdfscli library. .. method:: get_conn(self) Returns a hdfscli InsecureClient object. .. method:: check_for_path(self, hdfs_path) Check for the existence of a path in HDFS by querying FileStatus. .. method:: load_file(self, source, destination, overwrite=True, parallelism=1, **kwargs) Uploads a file to HDFS :param source: Local path to file or folder. If a folder, all the files inside of it will be uploaded (note that this implies that folders empty of files will not be created remotely). :type source: str :param destination: PTarget HDFS path. If it already exists and is a directory, files will be uploaded inside. :type destination: str :param overwrite: Overwrite any existing file or directory. :type overwrite: bool :param parallelism: Number of threads to use for parallelization. A value of `0` (or negative) uses as many threads as there are files. :type parallelism: int :param \*\*kwargs: Keyword arguments forwarded to :meth:`upload`.