How to create your own provider¶
Custom provider packages¶
You can develop and release your own providers. Your custom operators, hooks, sensors, and transfer operators can be packaged together in a standard Airflow provider package and installed using the same mechanisms. Moreover, they can also use the same mechanisms to extend the Airflow Core with auth backends, custom connections, logging, secret backends and extra operator links as described in the previous chapter.
As mentioned in the Providers documentation, custom providers can extend Airflow core - they can add extra links to operators as well as custom connections. You can build your own providers and install them as packages if you would like to use this mechanism for your own, custom providers.
How to create a provider¶
Adding a provider to Airflow is just a matter of building a Python package and adding the right meta-data to the package. We are using the standard Python mechanism to define entry points. Your package needs to define the appropriate entry point apache_airflow_provider, which has to point to a callable implemented by your package that returns a dictionary containing the list of discoverable capabilities of your package. The dictionary has to follow the json-schema specification.
Most of the schema provides extension points for the documentation (which you might also want to use for your own purposes), but the important fields from the extensibility point of view are these:
Displaying package information in CLI/API:
- package-name - Name of the package for the provider.
- name - Human-friendly name of the provider.
- description - Additional description of the provider.
- version - List of versions of the package (in reverse-chronological order). The first version in the list is the current package version. It is taken from the version of the package installed, not from the provider_info information.
Exposing customized functionality to Airflow's core:
- extra-links - this field should contain the list of all the operator class names that add extra link capability. See Define an operator extra link for a description of how to add extra link capability to your operators.
- connection-types - this field should contain the list of all the connection types together with hook class names implementing those custom connection types (providing custom extra fields and custom field behaviour). This field is available as of Airflow 2.2.0 and it replaces the deprecated hook-class-names. See Managing Connections for more details.
- secrets-backends - this field should contain the list of all the secrets backend class names that the provider provides. See Secrets Backend for a description of how to add one.
- task-decorators - this field should contain the list of dictionaries of name/path where the decorators are available. See Creating Custom @task Decorators for a description of how to add custom decorators.
- logging - this field should contain the list of all the logging handler class names that the provider provides. See Logging for Tasks for a description of the logging handlers.
- auth-backends - this field should contain the authentication backend module names for API/UI. See API for a description of the auth backends.
- notifications - this field should contain the notification classes. See Creating a notifier for a description of the notifications.
- executors - this field should contain the executor class names. See Executor for a description of the executors.
- config - this field should contain a dictionary that conforms to airflow/config_templates/config.yml.schema.json with configuration contributed by the provider. See Setting Configuration Options for details about setting configuration.
- filesystems - this field should contain the list of all the filesystem module names. See Object Storage for a description of the filesystems.
- dataset-uris - this field should contain the list of the URI schemes together with class names implementing normalization functions. See Data-aware scheduling for a description of the dataset URIs.
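As a minimal sketch of how several of these fields fit together, here is a hypothetical provider_info dictionary; the myprovider package and all class names in it are placeholders, not modules from the source:
def get_provider_info():
    return {
        "package-name": "my-airflow-provider",
        "name": "My Provider",
        "description": "Operators and hooks for My Service.",
        "extra-links": ["myprovider.links.MyServiceLink"],
        "connection-types": [
            {
                "connection-type": "my_service",
                "hook-class-name": "myprovider.hooks.MyServiceHook",
            }
        ],
        "task-decorators": [
            {"name": "my_service", "path": "myprovider.decorators.my_service_task"}
        ],
        "secrets-backends": ["myprovider.secrets.MyServiceSecretsBackend"],
        "logging": ["myprovider.log.MyServiceTaskHandler"],
    }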
Note
Deprecated values
hook-class-names (deprecated) - this field should contain the list of all hook class names that provide custom connection types with custom extra fields and field behaviour. The hook-class-names array is deprecated as of Airflow 2.2.0 (for optimization reasons) and will be removed in Airflow 3. If your providers are targeting Airflow 2.2.0+ you do not have to include the hook-class-names array; if you want to also target earlier versions of Airflow 2, you should include both the hook-class-names and connection-types arrays. See Managing Connections for more details.
When your providers are installed, you can query the installed providers and their capabilities with the airflow providers command. This way you can verify whether your providers are properly recognized and whether they define the extensions properly. See Command Line Interface and Environment Variables Reference for details of available CLI sub-commands.
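You can also inspect the registered providers programmatically. A minimal sketch, assuming the ProvidersManager API of Airflow 2.x (its providers and hooks properties); run it in an environment where Airflow and your package are installed:
from airflow.providers_manager import ProvidersManager

manager = ProvidersManager()

# providers is keyed by provider package name; each value carries the version
# and the raw provider_info dictionary that your entry point returned.
for package_name, provider in manager.providers.items():
    print(package_name, provider.version)

# hooks is keyed by the connection types contributed by installed providers.
print(sorted(manager.hooks))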
When you write your own provider, consider following the Naming conventions for provider packages.
Special considerations¶
Optional provider features¶
New in version 2.3.0: This feature is available in Airflow 2.3+.
Some providers might provide optional features, which are only available when some packages or libraries are installed. Such features will typically result in ImportErrors; however, those import errors should be silently ignored rather than polluting the logs of Airflow with false warnings. False warnings are a very bad pattern, as they tend to turn into blind spots, so avoiding false warnings is encouraged.
However, until Airflow 2.3, Airflow had no mechanism to selectively ignore "known" ImportErrors. So Airflow 2.1 and 2.2 silently ignored all ImportErrors coming from providers, which actually led to ignoring even important import errors - without giving Airflow users a clue that something is missing in the provider dependencies.
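A minimal sketch of guarding an optional feature, assuming Airflow 2.3+ (where airflow.exceptions.AirflowOptionalProviderFeatureException is available) and a hypothetical optional dependency some_optional_library:
from airflow.exceptions import AirflowOptionalProviderFeatureException

try:
    import some_optional_library  # hypothetical extra dependency, may be missing
except ImportError as e:
    # Re-raising as the optional-feature exception lets Airflow 2.3+ skip this
    # feature quietly instead of emitting a false warning.
    raise AirflowOptionalProviderFeatureException(e)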
Using Providers with dynamic task mapping¶
Airflow 2.3 added Dynamic Task Mapping, and with it the possibility of assigning a unique key to each task instance. This means that when such a dynamically mapped task wants to retrieve a value from XCom (for example when an extra link should be calculated), it should always check whether the ti_key value passed is not None and only then retrieve the XCom value using XCom.get_value. This keeps backwards compatibility with earlier versions of Airflow.
Typical code to access an XCom value in providers that want to keep backwards compatibility should look similar to this (note the if ti_key is not None:
condition).
def get_link(
    self,
    operator: BaseOperator,
    dttm: datetime | None = None,
    ti_key: "TaskInstanceKey | None" = None,
):
    if ti_key is not None:
        # Airflow 2.3+: ti_key uniquely identifies the (mapped) task instance.
        job_ids = XCom.get_value(key="job_id", ti_key=ti_key)
    else:
        # Older Airflow: fall back to an execution_date based lookup.
        assert dttm is not None
        job_ids = XCom.get_one(
            key="job_id",
            dag_id=operator.dag.dag_id,
            task_id=operator.task_id,
            execution_date=dttm,
        )
    if not job_ids:
        return None
    if len(job_ids) < self.index:
        return None
    job_id = job_ids[self.index]
    return BIGQUERY_JOB_DETAILS_LINK_FMT.format(job_id=job_id)
FAQ for custom providers¶
When I write my own provider, do I need to do anything special to make it available to others?
You do not need to do anything special besides creating the apache_airflow_provider entry point returning properly formatted meta-data - a dictionary with extra-links and connection-types fields (and the deprecated hook-class-names field if you are also targeting versions of Airflow before 2.2.0).
Anyone who runs Airflow in an environment that has your Python package installed will be able to use the package as a provider package.
Should I name my provider specifically or should it be created in the airflow.providers package?
We have quite a number (>80) of providers managed by the community and we are going to maintain them together with Apache Airflow. All those providers have a well-defined structure, follow the naming conventions we defined, and they are all in the airflow.providers package. If your intention is to contribute your provider, then you should follow those conventions and make a PR to Apache Airflow to contribute it. But you are free to use any package name as long as there are no conflicts with other names, so preferably choose a package name that is in your "domain".
What do I need to do to turn a package into a provider?
You need to do the following to turn an existing Python package into a provider (see below for examples):
- Add the apache_airflow_provider entry point in the pyproject.toml file - this tells Airflow where to get the required provider metadata.
- Create the function that you refer to in the first step as part of your package: this function returns a dictionary that contains all the meta-data about your provider package.
- If you want Airflow to link to the documentation of your provider on the providers page, make sure to add "project-url/documentation" metadata to your package. This will also add a link to your documentation on PyPI (see the sketch after this list).
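As a minimal sketch of that documentation metadata, assuming your package is built from a pyproject.toml file (the URL is a placeholder, not a real location):
[project.urls]
documentation = "https://example.com/my-airflow-provider/docs"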
Note that the dictionary should be compliant with the airflow/provider_info.schema.json JSON-schema specification. The community-managed providers have more fields there that are used to build documentation, but the requirement for runtime information only contains several fields which are defined in the schema:
airflow/provider_info.schema.json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"package-name": {
"description": "Package name available under which the package is available in the PyPI repository.",
"type": "string"
},
"name": {
"description": "Provider name",
"type": "string"
},
"description": {
"description": "Information about the package in RST format",
"type": "string"
},
"hook-class-names": {
"type": "array",
"description": "Hook class names that provide connection types to core (deprecated by connection-types)",
"items": {
"type": "string"
},
"deprecated": {
"description": "The hook-class-names property has been deprecated in favour of connection-types which is more performant version allowing to only import individual Hooks rather than all hooks at once",
"deprecatedVersion": "2.2.0"
}
},
"filesystems": {
"type": "array",
"description": "Filesystem module names",
"items": {
"type": "string"
}
},
"transfers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"how-to-guide": {
"description": "Path to how-to-guide for the transfer. The path must start with '/docs/'",
"type": "string"
},
"source-integration-name": {
"type": "string",
"description": "Integration name. It must have a matching item in the 'integration' section of any provider."
},
"target-integration-name": {
"type": "string",
"description": "Target integration name. It must have a matching item in the 'integration' section of any provider."
},
"python-module": {
"type": "string",
"description": "List of python modules containing the transfers."
}
},
"additionalProperties": false,
"required": [
"source-integration-name",
"target-integration-name",
"python-module"
]
}
},
"triggers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"integration-name": {
"type": "string",
"description": "Integration name. It must have a matching item in the 'integration' section of any provider."
},
"python-modules": {
"description": "List of Python modules containing the triggers.",
"type": "array",
"items": {
"type": "string"
}
}
},
"additionalProperties": false,
"required": [
"integration-name",
"python-modules"
]
}
},
"connection-types": {
"type": "array",
"description": "Map of connection types mapped to hook class names.",
"items": {
"type": "object",
"properties": {
"connection-type": {
"description": "Type of connection defined by the provider",
"type": "string"
},
"hook-class-name": {
"description": "Hook class name that implements the connection type",
"type": "string"
}
},
"required": [
"connection-type",
"hook-class-name"
]
}
},
"extra-links": {
"type": "array",
"description": "Operator class names that provide extra link functionality",
"items": {
"type": "string"
}
},
"secrets-backends": {
"type": "array",
"description": "Secrets Backend class names",
"items": {
"type": "string"
}
},
"logging": {
"type": "array",
"description": "Logging Task Handlers class names",
"items": {
"type": "string"
}
},
"auth-backends": {
"type": "array",
"description": "API Auth Backend module names",
"items": {
"type": "string"
}
},
"auth-managers": {
"type": "array",
"description": "Auth managers module names",
"items": {
"type": "string"
}
},
"notifications": {
"type": "array",
"description": "Notification class names",
"items": {
"type": "string"
}
},
"executors": {
"type": "array",
"description": "Executor class names",
"items": {
"type": "string"
}
},
"config": {
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"description": {
"type": [
"string",
"null"
]
},
"options": {
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/option"
}
},
"renamed": {
"type": "object",
"properties": {
"previous_name": {
"type": "string"
},
"version": {
"type": "string"
}
}
}
},
"required": [
"description",
"options"
],
"additionalProperties": false
}
},
"task-decorators": {
"type": "array",
"description": "Apply custom decorators to the TaskFlow API. Can be accessed by users via '@task.<name>'",
"items": {
"name": {
"type": "string"
},
"path": {
"type": "string"
}
}
}
},
"definitions": {
"option": {
"type": "object",
"properties": {
"description": {
"type": [
"string",
"null"
]
},
"version_added": {
"type": [
"string",
"null"
]
},
"type": {
"type": "string",
"enum": [
"string",
"boolean",
"integer",
"float"
]
},
"example": {
"type": [
"string",
"null",
"number"
]
},
"default": {
"type": [
"string",
"null",
"number"
]
},
"sensitive": {
"type": "boolean",
"description": "When true, this option is sensitive and can be specified using AIRFLOW__{section}___{name}__SECRET or AIRFLOW__{section}___{name}_CMD environment variables. See: airflow.configuration.AirflowConfigParser.sensitive_config_values"
}
},
"required": [
"description",
"version_added",
"type",
"example",
"default"
],
"additional_properties": false
}
},
"required": [
"name",
"description"
]
}
Example pyproject.toml:
[project.entry-points."apache_airflow_provider"]
provider_info = "airflow.providers.myproviderpackage.get_provider_info:get_provider_info"
Example myproviderpackage/get_provider_info.py:
def get_provider_info():
return {
"package-name": "my-package-name",
"name": "name",
"description": "a description",
"hook-class-names": [
"myproviderpackage.hooks.source.SourceHook",
],
}
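If you only target Airflow 2.2.0+, you can expose connection types via the connection-types array instead of the deprecated hook-class-names array (keep both if you also target earlier Airflow 2 versions). A minimal sketch reusing the hypothetical SourceHook from the example above; the "source" connection type is a placeholder:
def get_provider_info():
    return {
        "package-name": "my-package-name",
        "name": "name",
        "description": "a description",
        # Preferred on Airflow 2.2.0+; add hook-class-names as well if you
        # also target earlier versions of Airflow 2.
        "connection-types": [
            {
                "connection-type": "source",
                "hook-class-name": "myproviderpackage.hooks.source.SourceHook",
            }
        ],
    }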
Is there a convention for a connection id and type?
Very good question. Glad that you asked. We usually follow the convention <NAME>_default for the connection id and just <NAME> for the connection type. A few examples:
- google_cloud_default id and google_cloud_platform type
- aws_default id and aws type
You should follow this convention. It is important to use unique names for the connection type, so it should be unique for your provider. If two providers try to add a connection with the same type, only one of them will succeed.
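As a hedged sketch of where these names typically live in a hook, following the class-attribute convention used by community hooks (conn_name_attr, default_conn_name, conn_type, hook_name); the my_service values and the MyServiceHook class are placeholders:
from airflow.hooks.base import BaseHook


class MyServiceHook(BaseHook):
    """Hypothetical hook illustrating the connection naming convention only."""

    # Conventional attributes; conn_type should match the connection-type you
    # declare in your provider metadata.
    conn_name_attr = "my_service_conn_id"
    default_conn_name = "my_service_default"
    conn_type = "my_service"
    hook_name = "My Service"

    def __init__(self, my_service_conn_id: str = default_conn_name):
        super().__init__()
        self.my_service_conn_id = my_service_conn_id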
Can I contribute my own provider to Apache Airflow?
The answer depends on the provider. We have a policy for that in the PROVIDERS.rst developer documentation.
Can I advertise my own provider to Apache Airflow users and share it with others as package in PyPI?
Absolutely! We have an Ecosystem area on our website where we share non-community managed extensions and work for Airflow. Feel free to make a PR to add to the page, and we will evaluate and merge it when we see that such a provider can be useful for the community of Airflow users.
Can I charge for the use of my provider?
This is something that is outside of our control and domain. As an Apache project, we are commercial-friendly and there are many businesses built around Apache Airflow and many other Apache projects. As a community, we provide all the software for free and this will never change. What 3rd-party developers are doing is not under the control of the Apache Airflow community.