airflow.providers.common.sql.hooks.sql

Module Contents

Classes

ConnectorProtocol

Database connection protocol.

DbApiHook

Abstract base class for sql hooks.

Functions

return_single_query_results(sql, return_last, ...)

Determine when results of single query only should be returned.

fetch_all_handler(cursor)

Return results for DbApiHook.run().

fetch_one_handler(cursor)

Return first result for DbApiHook.run().

Attributes

T

SQL_PLACEHOLDERS

airflow.providers.common.sql.hooks.sql.T[source]
airflow.providers.common.sql.hooks.sql.SQL_PLACEHOLDERS[source]
airflow.providers.common.sql.hooks.sql.return_single_query_results(sql, return_last, split_statements)[source]

Determine when results of single query only should be returned.

For compatibility reasons, the behaviour of the DBAPIHook is somewhat confusing. In some cases, when multiple queries are run, the return value will be an iterable (list) of results – one for each query. However, in other cases, when single query is run, the return value will be just the result of that single query without wrapping the results in a list.

The cases when single query results are returned without wrapping them in a list are as follows:

  1. sql is string and return_last is True (regardless what split_statements value is)

  2. sql is string and split_statements is False

In all other cases, the results are wrapped in a list, even if there is only one statement to process. In particular, the return value will be a list of query results in the following circumstances:

  1. when sql is an iterable of string statements (regardless what return_last value is)

  2. when sql is string, split_statements is True and return_last is False

Parameters
  • sql (str | Iterable[str]) – sql to run (either string or list of strings)

  • return_last (bool) – whether last statement output should only be returned

  • split_statements (bool) – whether to split string statements.

Returns

True if the hook should return single query results

airflow.providers.common.sql.hooks.sql.fetch_all_handler(cursor)[source]

Return results for DbApiHook.run().

airflow.providers.common.sql.hooks.sql.fetch_one_handler(cursor)[source]

Return first result for DbApiHook.run().

class airflow.providers.common.sql.hooks.sql.ConnectorProtocol[source]

Bases: Protocol

Database connection protocol.

connect(host, port, username, schema)[source]

Connect to a database.

Parameters
  • host (str) – The database host to connect to.

  • port (int) – The database port to connect to.

  • username (str) – The database username used for the authentication.

  • schema (str) – The database schema to connect to.

Returns

the authorized connection object.

Return type

Any

class airflow.providers.common.sql.hooks.sql.DbApiHook(*args, schema=None, log_sql=True, **kwargs)[source]

Bases: airflow.hooks.base.BaseHook

Abstract base class for sql hooks.

When subclassing, maintainers can override the _make_common_data_structure method: This method transforms the result of the handler method (typically cursor.fetchall()) into objects common across all Hooks derived from this class (tuples). Most of the time, the underlying SQL library already returns tuples from its cursor, and the _make_common_data_structure method can be ignored.

Parameters
  • schema (str | None) – Optional DB schema that overrides the schema specified in the connection. Make sure that if you change the schema parameter value in the constructor of the derived Hook, such change should be done before calling the DBApiHook.__init__().

  • log_sql (bool) – Whether to log SQL query when it’s executed. Defaults to True.

property sqlalchemy_url: sqlalchemy.engine.URL[source]

Return a Sqlalchemy.engine.URL object from the connection.

Needs to be implemented in the provider subclass to return the sqlalchemy.engine.URL object.

Returns

the extracted sqlalchemy.engine.URL object.

Return type

sqlalchemy.engine.URL

property inspector: sqlalchemy.engine.Inspector[source]
property last_description: Sequence[Sequence] | None[source]
conn_name_attr: str[source]
default_conn_name = 'default_conn_id'[source]
supports_autocommit = False[source]
supports_executemany = False[source]
connector: ConnectorProtocol | None[source]
get_conn_id()[source]
placeholder()[source]
get_conn()[source]

Return a connection object.

get_uri()[source]

Extract the URI from the connection.

Returns

the extracted uri.

Return type

str

get_sqlalchemy_engine(engine_kwargs=None)[source]

Get an sqlalchemy_engine object.

Parameters

engine_kwargs – Kwargs used in create_engine().

Returns

the created engine.

get_pandas_df(sql, parameters=None, **kwargs)[source]

Execute the sql and returns a pandas dataframe.

Parameters
  • sql – the sql statement to be executed (str) or a list of sql statements to execute

  • parameters (list | tuple | Mapping[str, Any] | None) – The parameters to render the SQL query with.

  • kwargs – (optional) passed into pandas.io.sql.read_sql method

get_pandas_df_by_chunks(sql, parameters=None, *, chunksize, **kwargs)[source]

Execute the sql and return a generator.

Parameters
  • sql – the sql statement to be executed (str) or a list of sql statements to execute

  • parameters (list | tuple | Mapping[str, Any] | None) – The parameters to render the SQL query with

  • chunksize (int) – number of rows to include in each chunk

  • kwargs – (optional) passed into pandas.io.sql.read_sql method

get_records(sql, parameters=None)[source]

Execute the sql and return a set of records.

Parameters
  • sql (str | list[str]) – the sql statement to be executed (str) or a list of sql statements to execute

  • parameters (Iterable | Mapping[str, Any] | None) – The parameters to render the SQL query with.

get_first(sql, parameters=None)[source]

Execute the sql and return the first resulting row.

Parameters
  • sql (str | list[str]) – the sql statement to be executed (str) or a list of sql statements to execute

  • parameters (Iterable | Mapping[str, Any] | None) – The parameters to render the SQL query with.

static strip_sql_string(sql)[source]
static split_sql_string(sql)[source]

Split string into multiple SQL expressions.

Parameters

sql (str) – SQL string potentially consisting of multiple expressions

Returns

list of individual expressions

Return type

list[str]

run(sql: str | Iterable[str], autocommit: bool = ..., parameters: Iterable | Mapping[str, Any] | None = ..., handler: None = ..., split_statements: bool = ..., return_last: bool = ...) None[source]
run(sql: str | Iterable[str], autocommit: bool = ..., parameters: Iterable | Mapping[str, Any] | None = ..., handler: Callable[[Any], T] = ..., split_statements: bool = ..., return_last: bool = ...) tuple | list[tuple] | list[list[tuple] | tuple] | None

Run a command or a list of commands.

Pass a list of SQL statements to the sql parameter to get them to execute sequentially.

The method will return either single query results (typically list of rows) or list of those results where each element in the list are results of one of the queries (typically list of list of rows :D)

For compatibility reasons, the behaviour of the DBAPIHook is somewhat confusing. In some cases, when multiple queries are run, the return value will be an iterable (list) of results – one for each query. However, in other cases, when single query is run, the return value will be the result of that single query without wrapping the results in a list.

The cases when single query results are returned without wrapping them in a list are as follows:

  1. sql is string and return_last is True (regardless what split_statements value is)

  2. sql is string and split_statements is False

In all other cases, the results are wrapped in a list, even if there is only one statement to process. In particular, the return value will be a list of query results in the following circumstances:

  1. when sql is an iterable of string statements (regardless what return_last value is)

  2. when sql is string, split_statements is True and return_last is False

After run is called, you may access the following properties on the hook object:

  • descriptions: an array of cursor descriptions. If return_last is True, this will be a one-element array containing the cursor description for the last statement. Otherwise, it will contain the cursor description for each statement executed.

  • last_description: the description for the last statement executed

Note that query result will ONLY be actually returned when a handler is provided; if handler is None, this method will return None.

Handler is a way to process the rows from cursor (Iterator) into a value that is suitable to be returned to XCom and generally fit in memory.

You can use pre-defined handles (fetch_all_handler, fetch_one_handler) or implement your own handler.

Parameters
  • sql – the sql statement to be executed (str) or a list of sql statements to execute

  • autocommit – What to set the connection’s autocommit setting to before executing the query.

  • parameters – The parameters to render the SQL query with.

  • handler – The result handler which is called with the result of each statement.

  • split_statements – Whether to split a single SQL string into statements and run separately

  • return_last – Whether to return result for only last statement or for all after split

Returns

if handler provided, returns query results (may be list of results depending on params)

set_autocommit(conn, autocommit)[source]

Set the autocommit flag on the connection.

get_autocommit(conn)[source]

Get autocommit setting for the provided connection.

Parameters

conn – Connection to get autocommit setting from.

Returns

connection autocommit setting. True if autocommit is set to True on the connection. False if it is either not set, set to False, or the connection does not support auto-commit.

Return type

bool

get_cursor()[source]

Return a cursor.

insert_rows(table, rows, target_fields=None, commit_every=1000, replace=False, *, executemany=False, autocommit=False, **kwargs)[source]

Insert a collection of tuples into a table.

Rows are inserted in chunks, each chunk (of size commit_every) is done in a new transaction.

Parameters
  • table – Name of the target table

  • rows – The rows to insert into the table

  • target_fields – The names of the columns to fill in the table

  • commit_every – The maximum number of rows to insert in one transaction. Set to 0 to insert all rows in one transaction.

  • replace – Whether to replace instead of insert

  • executemany – If True, all rows are inserted at once in chunks defined by the commit_every parameter. This only works if all rows have same number of column names, but leads to better performance.

  • autocommit – What to set the connection’s autocommit setting to before executing the query.

abstract bulk_dump(table, tmp_file)[source]

Dump a database table into a tab-delimited file.

Parameters
  • table – The name of the source table

  • tmp_file – The path of the target file

abstract bulk_load(table, tmp_file)[source]

Load a tab-delimited file into a database table.

Parameters
  • table – The name of the target table

  • tmp_file – The path of the file to load into the table

test_connection()[source]

Tests the connection using db-specific query.

get_openlineage_database_info(connection)[source]

Return database specific information needed to generate and parse lineage metadata.

This includes information helpful for constructing information schema query and creating correct namespace.

Parameters

connection – Airflow connection to reduce calls of get_connection method

get_openlineage_database_dialect(connection)[source]

Return database dialect used for SQL parsing.

For a list of supported dialects check: https://openlineage.io/docs/development/sql#sql-dialects

get_openlineage_default_schema()[source]

Return default schema specific to database.

get_openlineage_database_specific_lineage(task_instance)[source]

Return additional database specific lineage, e.g. query execution information.

This method is called only on completion of the task.

Parameters

task_instance – this may be used to retrieve additional information that is collected during runtime of the task

static get_openlineage_authority_part(connection, default_port=None)[source]

Get authority part from Airflow Connection.

The authority represents the hostname and port of the connection and conforms OpenLineage naming convention for a number of databases (e.g. MySQL, Postgres, Trino).

Parameters

default_port (int | None) – (optional) used if no port parsed from connection URI

Was this entry helpful?