DatabricksCopyIntoOperator

Use the DatabricksCopyIntoOperator to import data into a Databricks table with the COPY INTO SQL command.

Using the Operator

The operator loads data from a specified location into a table using a configured endpoint. The only required parameters are:

  • table_name - string with the table name

  • file_location - string with the URI of the data to load

  • file_format - string specifying the file format of the data to load. Supported formats are CSV, JSON, AVRO, ORC, PARQUET, TEXT, and BINARYFILE.

  • One of sql_endpoint_name (the name of the Databricks SQL endpoint to use) or http_path (the HTTP path of a Databricks SQL endpoint or Databricks cluster).

Other parameters are optional; see the class documentation for the full list. A minimal invocation that uses http_path instead of sql_endpoint_name is sketched below.
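
This is a minimal sketch of addressing a SQL warehouse or cluster by its HTTP path rather than by endpoint name. The connection ID, HTTP path, table name, and file location are illustrative placeholders, not values from the example suite:

    from airflow.providers.databricks.operators.databricks_sql import DatabricksCopyIntoOperator

    # Load Parquet files into a table, addressing the endpoint by its HTTP path.
    # The connection ID, HTTP path, and S3 URI below are placeholder values.
    import_parquet = DatabricksCopyIntoOperator(
        task_id="import_parquet",
        databricks_conn_id="databricks_default",
        http_path="/sql/1.0/warehouses/your-warehouse-id",
        table_name="my_table",
        file_format="PARQUET",
        file_location="s3://my-bucket/my-data/parquet",
    )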

Examples

Importing CSV data

An example usage of the DatabricksCopyIntoOperator to import CSV data into a table is as follows:

tests/system/databricks/example_databricks_sql.py[source]

    # Example of importing data using COPY_INTO SQL command
    import_csv = DatabricksCopyIntoOperator(
        task_id="import_csv",
        databricks_conn_id=connection_id,
        sql_endpoint_name=sql_endpoint_name,
        table_name="my_table",
        file_format="CSV",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/csv",
        format_options={"header": "true"},
        force_copy=True,
    )
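
Importing JSON data

The same pattern applies to the other supported formats. The sketch below imports JSON files, using the optional pattern parameter to restrict the load to matching file names; the table name, file location, and pattern are illustrative placeholders:

    # Sketch: import JSON data with COPY INTO, limiting the load to files
    # whose names match a glob pattern. Table name, location, and pattern
    # are placeholder values.
    import_json = DatabricksCopyIntoOperator(
        task_id="import_json",
        databricks_conn_id=connection_id,
        sql_endpoint_name=sql_endpoint_name,
        table_name="my_json_table",
        file_format="JSON",
        file_location="abfss://container@account.dfs.core.windows.net/my-data/json",
        pattern="*.json",
    )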
