airflow.providers.mongo.hooks.mongo
¶
Hook for Mongo DB.
Module Contents¶
Classes¶
PyMongo wrapper to interact with MongoDB. |
- class airflow.providers.mongo.hooks.mongo.MongoHook(mongo_conn_id=default_conn_name, *args, **kwargs)[source]¶
Bases:
airflow.hooks.base.BaseHook
PyMongo wrapper to interact with MongoDB.
Mongo Connection Documentation https://docs.mongodb.com/manual/reference/connection-string/index.html You can specify connection string options in extra field of your connection https://docs.mongodb.com/manual/reference/connection-string/index.html#connection-string-options
If you want use DNS seedlist, set srv to True.
- ex.
{“srv”: true, “replicaSet”: “test”, “ssl”: true, “connectTimeoutMS”: 30000}
For enabling SSL, the “ssl”: true option can be used within the connection string options, under extra. In scenarios where SSL is enabled, allow_insecure option is not included by default in the connection unless specified. This is so that we ensure a secure medium while handling connections to MongoDB.
The allow_insecure only makes sense in ssl context and is configurable and can be used in one of the following scenarios:
HTTP (ssl = False) Here, ssl is disabled and using allow_insecure doesn’t make sense. Example connection extra: {“ssl”: false}
HTTPS, but insecure (ssl = True, allow_insecure = True) Here, ssl is enabled, and the connection allows insecure connections. Example connection extra: {“ssl”: true, “allow_insecure”: true}
HTTPS, but secure (ssl = True, allow_insecure = False - default when SSL enabled): Here, ssl is enabled, and the connection does not allow insecure connections (default behavior when SSL is enabled). Example connection extra: {“ssl”: true} or {“ssl”: true, “allow_insecure”: false}
Note: tls is an alias to ssl and can be used in place of ssl. Example: {“ssl”: false} or {“tls”: false}.
- Parameters
mongo_conn_id (str) – The Mongo connection id to use when connecting to MongoDB.
- classmethod get_connection_form_widgets()[source]¶
Return connection widgets to add to connection form.
- __exit__(exc_type, exc_val, exc_tb)[source]¶
Close mongo connection when exiting the context manager.
- get_collection(mongo_collection, mongo_db=None)[source]¶
Fetch a mongo collection object for querying.
Uses connection schema as DB unless specified.
- aggregate(mongo_collection, aggregate_query, mongo_db=None, **kwargs)[source]¶
Run an aggregation pipeline and returns the results.
https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.aggregate https://pymongo.readthedocs.io/en/stable/examples/aggregation.html
- find(mongo_collection: str, query: dict, find_one: typing_extensions.Literal[False], mongo_db: str | None = None, projection: list | dict | None = None, **kwargs) pymongo.cursor.Cursor [source]¶
- find(mongo_collection: str, query: dict, find_one: typing_extensions.Literal[True], mongo_db: str | None = None, projection: list | dict | None = None, **kwargs) Any | None
Run a mongo find query and returns the results.
- insert_one(mongo_collection, doc, mongo_db=None, **kwargs)[source]¶
Insert a single document into a mongo collection.
- insert_many(mongo_collection, docs, mongo_db=None, **kwargs)[source]¶
Insert many docs into a mongo collection.
- update_one(mongo_collection, filter_doc, update_doc, mongo_db=None, **kwargs)[source]¶
Update a single document in a mongo collection.
- Parameters
mongo_collection (str) – The name of the collection to update.
filter_doc (dict) – A query that matches the documents to update.
update_doc (dict) – The modifications to apply.
mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.
- update_many(mongo_collection, filter_doc, update_doc, mongo_db=None, **kwargs)[source]¶
Update one or more documents in a mongo collection.
- Parameters
mongo_collection (str) – The name of the collection to update.
filter_doc (dict) – A query that matches the documents to update.
update_doc (dict) – The modifications to apply.
mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.
- replace_one(mongo_collection, doc, filter_doc=None, mongo_db=None, **kwargs)[source]¶
Replace a single document in a mongo collection.
Note
If no
filter_doc
is given, it is assumed that the replacement document contain the_id
field which is then used as filters.- Parameters
mongo_collection (str) – The name of the collection to update.
doc (dict) – The new document.
filter_doc (dict | None) – A query that matches the documents to replace. Can be omitted; then the _id field from doc will be used.
mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.
- replace_many(mongo_collection, docs, filter_docs=None, mongo_db=None, upsert=False, collation=None, **kwargs)[source]¶
Replace many documents in a mongo collection.
Uses bulk_write with multiple ReplaceOne operations https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.bulk_write
Note
If no
filter_docs``are given, it is assumed that all replacement documents contain the ``_id
field which are then used as filters.- Parameters
mongo_collection (str) – The name of the collection to update.
filter_docs (list[dict] | None) – A list of queries that match the documents to replace. Can be omitted; then the _id fields from airflow.docs will be used.
mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.
upsert (bool) – If
True
, perform an insert if no documents match the filters for the replace operation.collation (pymongo.collation.Collation | None) – An instance of
Collation
. This option is only supported on MongoDB 3.4 and above.
- delete_one(mongo_collection, filter_doc, mongo_db=None, **kwargs)[source]¶
Delete a single document in a mongo collection.
- delete_many(mongo_collection, filter_doc, mongo_db=None, **kwargs)[source]¶
Delete one or more documents in a mongo collection.
- distinct(mongo_collection, distinct_key, filter_doc=None, mongo_db=None, **kwargs)[source]¶
Return a list of distinct values for the given key across a collection.
- Parameters
mongo_collection (str) – The name of the collection to perform distinct on.
distinct_key (str) – The field to return distinct values from.
filter_doc (dict | None) – A query that matches the documents get distinct values from. Can be omitted; then will cover the entire collection.
mongo_db (str | None) – The name of the database to use. Can be omitted; then the database from the connection string is used.