airflow.providers.weaviate.hooks.weaviate
Module Contents
Classes
WeaviateHook – Interact with a Weaviate database to store vectors. This hook uses the ‘conn_id’.
Functions
Attributes
- class airflow.providers.weaviate.hooks.weaviate.WeaviateHook(conn_id=default_conn_name, retry_status_codes=None, *args, **kwargs)
Bases: airflow.hooks.base.BaseHook
Interact with a Weaviate database to store vectors. This hook uses the ‘conn_id’.
- Parameters
conn_id (str) – The connection ID to use when connecting to Weaviate. See the Weaviate connection how-to guide for details.
- classmethod get_connection_form_widgets()
Return connection widgets to add to connection form.
- create_schema(schema_json)
Create a new schema.
Instead of adding classes one by one, you can upload a full schema in JSON format at once.
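As a rough illustration of the kind of payload create_schema() works with, the sketch below builds a full schema document in the Weaviate v3 format (a dict with a "classes" list). The class name Article and its properties are hypothetical, and the hook call is left as a comment because it needs a live Weaviate connection.

```python
import json

# Hypothetical schema: one class whose vectors are supplied by the caller
# ("vectorizer": "none") and two text properties.
schema = {
    "classes": [
        {
            "class": "Article",
            "vectorizer": "none",
            "properties": [
                {"name": "title", "dataType": ["text"]},
                {"name": "body", "dataType": ["text"]},
            ],
        }
    ]
}

# Serialize once so the full schema can be reviewed or stored as a JSON file.
schema_json = json.dumps(schema, indent=2)
print(schema_json)

# With a configured Weaviate connection, the whole schema could then be
# uploaded in one call, e.g.:
#   from airflow.providers.weaviate.hooks.weaviate import WeaviateHook
#   WeaviateHook().create_schema(schema)
```

Uploading the schema as one document keeps all class definitions reviewable in a single place instead of spread across many per-class calls.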
- get_schema(class_name=None)
Get the schema from Weaviate.
- Parameters
class_name (str | None) – The class for which to return the schema. If not provided, the whole schema is returned; otherwise only the schema of this class is returned. Defaults to None.
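The class_name rule can be pictured with the following sketch, which applies it to an in-memory schema dict. select_schema is a hypothetical helper, not part of the hook, and raising KeyError for an unknown class is an assumption of this sketch (the real client reports the error in its own way).

```python
from typing import Any, Optional


def select_schema(schema: dict[str, Any], class_name: Optional[str] = None) -> dict[str, Any]:
    """Return the whole schema, or only the entry for class_name (hypothetical helper)."""
    if class_name is None:
        return schema  # no filter: whole schema
    for cls in schema.get("classes", []):
        if cls["class"] == class_name:
            return cls  # only this class's definition
    # Assumption: unknown classes are reported as a KeyError in this sketch.
    raise KeyError(f"Class {class_name!r} not found in schema")


schema = {
    "classes": [
        {"class": "Article", "properties": []},
        {"class": "Author", "properties": []},
    ]
}
print(select_schema(schema, "Author"))  # {'class': 'Author', 'properties': []}
```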
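- delete_classes(class_names, if_error='stop')
Delete all or specific classes if class_names are provided.
- Parameters
class_names – Names of the classes to delete.
if_error – Action to take when deleting a class fails: ‘stop’ or ‘continue’. Defaults to ‘stop’.
- Returns
If if_error=‘continue’, the list of classes that could not be deleted; if if_error=‘stop’, None.
- Return type
list[str] | None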
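The if_error contract can be sketched as a loop over class names. Here delete_class is a stand-in callable (fake_delete is a hypothetical backend that fails for one name), and this sketch assumes ‘stop’ means re-raising the first error; check the hook's source for its exact behavior.

```python
from typing import Callable, Optional


def delete_classes(
    class_names: list[str],
    delete_class: Callable[[str], None],
    if_error: str = "stop",
) -> Optional[list[str]]:
    """Sketch of stop/continue error handling when deleting several classes."""
    failed: list[str] = []
    for name in class_names:
        try:
            delete_class(name)
        except Exception:
            if if_error == "stop":
                raise  # abort on the first failure
            failed.append(name)  # 'continue': record the failure and move on
    return failed if if_error == "continue" else None


def fake_delete(name: str) -> None:
    # Hypothetical backend: pretend the "Locked" class cannot be deleted.
    if name == "Locked":
        raise RuntimeError(f"cannot delete {name}")


print(delete_classes(["Article", "Locked", "Author"], fake_delete, if_error="continue"))
# ['Locked']
```

With ‘continue’, the caller gets every failure at once and can retry just those classes.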
- delete_all_schema()
Remove the entire schema from the Weaviate instance and all data associated with it.
- create_or_replace_classes(schema_json, existing='ignore')
Create or replace classes in the schema of the Weaviate database.
- Parameters
schema_json (dict[str, Any] | str) – JSON containing the schema, in the format {“class_name”: “class_dict”}. See also: example of class_dict.
existing (ExitingSchemaOptions) – How to handle classes that already exist: ‘replace’, ‘fail’, or ‘ignore’.
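One way to picture the existing option is a planning step that sorts the requested classes (keyed by class name, as in schema_json) into those to create, replace, or skip. plan_classes is a hypothetical helper, and raising ValueError for ‘fail’ is an assumption of this sketch.

```python
from typing import Any


def plan_classes(
    requested: dict[str, dict[str, Any]],
    existing_names: set[str],
    existing: str = "ignore",
) -> dict[str, list[str]]:
    """Decide what to do with each requested class (hypothetical helper)."""
    if existing not in ("replace", "fail", "ignore"):
        raise ValueError(f"unknown option: {existing!r}")
    plan: dict[str, list[str]] = {"create": [], "replace": [], "skip": []}
    for name in requested:
        if name not in existing_names:
            plan["create"].append(name)  # new class: always created
        elif existing == "replace":
            plan["replace"].append(name)  # delete, then recreate
        elif existing == "ignore":
            plan["skip"].append(name)  # leave the current class untouched
        else:  # existing == "fail"
            raise ValueError(f"class {name!r} already exists")
    return plan


requested = {"Article": {"class": "Article"}, "Author": {"class": "Author"}}
print(plan_classes(requested, existing_names={"Article"}, existing="ignore"))
# {'create': ['Author'], 'replace': [], 'skip': ['Article']}
```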
- check_subset_of_schema(classes_objects)
Check whether classes_objects is a subset of the existing schema.
Note
The Weaviate client’s contains() does not handle class property mismatches: to compare class A with class B, they must have exactly the same properties. If class A has fewer properties than class B, contains() returns False.
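The limitation in the note can be worked around by comparing properties by name rather than as an ordered list. The sketch below is one such subset check over plain dicts; the helper names and sample classes are hypothetical, and it is not presented as the hook's own implementation.

```python
from typing import Any


def _props_by_name(cls: dict[str, Any]) -> dict[str, Any]:
    """Copy a class dict with its properties keyed by name, so order no longer matters."""
    out = dict(cls)
    out["properties"] = {p["name"]: p for p in cls.get("properties", [])}
    return out


def _is_partial_match(candidate: Any, existing: Any) -> bool:
    """True if every key/value in candidate also appears in existing (recursively)."""
    if isinstance(candidate, dict) and isinstance(existing, dict):
        return all(
            key in existing and _is_partial_match(value, existing[key])
            for key, value in candidate.items()
        )
    return candidate == existing


def is_subset_of_schema(classes: list[dict[str, Any]], schema_classes: list[dict[str, Any]]) -> bool:
    existing = {c["class"]: _props_by_name(c) for c in schema_classes}
    return all(
        c["class"] in existing and _is_partial_match(_props_by_name(c), existing[c["class"]])
        for c in classes
    )


schema_classes = [{"class": "Article", "properties": [
    {"name": "title", "dataType": ["text"]},
    {"name": "body", "dataType": ["text"]},
]}]
# Fewer properties than the stored class, which contains() would reject.
requested = [{"class": "Article", "properties": [{"name": "title", "dataType": ["text"]}]}]
print(is_subset_of_schema(requested, schema_classes))  # True
```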
- batch_data(class_name, data, batch_config_params=None, vector_col='Vector', uuid_col='id', retry_attempts_per_object=5, tenant=None)
Add multiple objects or object references to Weaviate at once.
- Parameters
class_name (str) – The name of the class the objects belong to.
data (list[dict[str, Any]] | pandas.DataFrame | None) – List or DataFrame of objects to add.
batch_config_params (dict[str, Any] | None) – Dict of batch configuration options. See also: batch_config_params options.
vector_col (str) – Name of the column containing the vector.
uuid_col (str) – Name of the column containing the UUID.
retry_attempts_per_object (int) – Number of attempts per object in case of failure before giving up.
tenant (str | None) – The tenant to which the objects will be added.
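To show what vector_col and uuid_col refer to, here is a pure-Python sketch that splits each row into properties, vector, and UUID, treating every other key as a property. split_row is a hypothetical helper over a list of dicts (no pandas, no Weaviate client); the hook's own handling, for example of rows without a vector, may differ.

```python
from typing import Any, Optional


def split_row(
    row: dict[str, Any], vector_col: str = "Vector", uuid_col: str = "id"
) -> tuple[dict[str, Any], Optional[list[float]], Optional[str]]:
    """Separate properties, vector and UUID for one object (hypothetical helper)."""
    properties = {k: v for k, v in row.items() if k not in (vector_col, uuid_col)}
    # Missing vector or UUID columns come back as None.
    return properties, row.get(vector_col), row.get(uuid_col)


rows = [
    {"id": "8a6d3c4e-0000-4000-8000-000000000001", "title": "Hello", "Vector": [0.1, 0.2]},
    {"title": "No vector or id"},
]
for row in rows:
    print(split_row(row))
```

With a live connection, the same list could be passed directly, along the lines of hook.batch_data(class_name="Article", data=rows), where "Article" is a placeholder class name.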
- query_with_vector(embeddings, class_name, *properties, certainty=0.7, limit=1)
Query the Weaviate database with a near-vector search.
This method performs a vector search with a Get query, passing the query vector itself to Weaviate through with_near_vector. This is needed to query a Weaviate class that uses a custom, external vectorizer: Weaviate does not vectorize the query but uses the supplied embeddings directly as the basis for the vector search.
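To make certainty and limit concrete, here is a brute-force, in-memory stand-in for a near-vector search. It assumes cosine distance, for which Weaviate defines certainty as 1 - distance/2, that is (1 + cosine similarity)/2. It is not the hook's code, and the objects and vectors are made up.

```python
import math
from typing import Any


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def near_vector(
    query: list[float], objects: list[dict[str, Any]], certainty: float = 0.7, limit: int = 1
) -> list[str]:
    """Return titles of the closest objects whose certainty reaches the threshold."""
    scored: list[tuple[float, str]] = []
    for obj in objects:
        # For cosine distance d = 1 - cos_sim, certainty is 1 - d / 2.
        obj_certainty = 1 - (1 - cosine_similarity(query, obj["vector"])) / 2
        if obj_certainty >= certainty:
            scored.append((obj_certainty, obj["title"]))
    scored.sort(reverse=True)  # most similar first
    return [title for _, title in scored[:limit]]


objects = [
    {"title": "cats", "vector": [1.0, 0.0]},
    {"title": "dogs", "vector": [0.8, 0.6]},
    {"title": "tax law", "vector": [-1.0, 0.0]},
]
print(near_vector([1.0, 0.0], objects, certainty=0.7, limit=2))  # ['cats', 'dogs']
```

Against a real instance, the equivalent call would be along the lines of hook.query_with_vector(embeddings, "Article", "title", certainty=0.7, limit=2), with the class and property names as placeholders.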
- query_without_vector(search_text, class_name, *properties, limit=1)
Query using near text.
This method performs a vector search with a Get query, passing search_text to Weaviate through the nearText operator. Weaviate then converts the text into a vector through the class's vectorizer inference API (for example, OpenAI) and uses that vector as the basis for the vector search.
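For orientation, the sketch below assembles the kind of GraphQL Get query that a nearText search corresponds to, which makes it visible that only text, not a vector, is sent. near_text_query is a hypothetical builder (the Weaviate Python client builds this query itself), and the class and property names are placeholders.

```python
import json


def near_text_query(search_text: str, class_name: str, *properties: str, limit: int = 1) -> str:
    """Build a GraphQL Get query with a nearText operator (illustrative only)."""
    concepts = json.dumps([search_text])  # a JSON list of strings is valid GraphQL list syntax
    fields = " ".join(properties)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: {concepts}}}, limit: {limit}) "
        f"{{ {fields} }} "
        "} }"
    )


print(near_text_query("airflow sensors", "Article", "title", "body", limit=3))
# { Get { Article(nearText: {concepts: ["airflow sensors"]}, limit: 3) { title body } } }
```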
- get_or_create_object(data_object=None, class_name=None, vector=None, consistency_level=None, tenant=None, **kwargs)
Get an object, or create it if it does not exist.
Returns the object if it already exists.
- Parameters
data_object (dict | str | None) – Object to be added. If it is a str, it should be either a URL or a file path. Required to create a new object.
class_name (str | None) – Class name associated with the given object. Required to create a new object.
vector (Sequence | None) – Vector associated with the given object. Only used when creating the object.
consistency_level (weaviate.data.replication.ConsistencyLevel | None) – Consistency level to use. Applies to both create and get operations.
tenant (str | None) – Tenant to use. Applies to both create and get operations.
kwargs – Additional parameters passed to weaviate_client.data_object.create() and weaviate_client.data_object.get().
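The get-or-create pattern can be sketched against an in-memory dict keyed by UUID. get_or_create and the store are hypothetical stand-ins for the Weaviate calls; the check that data_object and class_name are present mirrors the parameter descriptions above.

```python
import uuid
from typing import Any, Optional


def get_or_create(
    store: dict[str, dict[str, Any]],
    obj_uuid: Optional[str],
    data_object: Optional[dict[str, Any]] = None,
    class_name: Optional[str] = None,
) -> dict[str, Any]:
    """Return the stored object if present, otherwise create it (in-memory stand-in)."""
    if obj_uuid is not None and obj_uuid in store:
        return store[obj_uuid]  # already exists: returned unchanged
    if data_object is None or class_name is None:
        raise ValueError("data_object and class_name are required to create a new object")
    new_uuid = obj_uuid or str(uuid.uuid4())
    store[new_uuid] = {"id": new_uuid, "class": class_name, "properties": data_object}
    return store[new_uuid]


store: dict[str, dict[str, Any]] = {}
first = get_or_create(store, "obj-1", {"title": "Hello"}, "Article")
second = get_or_create(store, "obj-1", {"title": "Ignored"}, "Article")
print(second["properties"])  # {'title': 'Hello'}
```

The second call does not overwrite anything, which is what makes the pattern safe to rerun in an idempotent Airflow task.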
- get_object(**kwargs)
Get one or more objects from Weaviate.
- Parameters
kwargs – Parameters passed to weaviate_client.data_object.get() or weaviate_client.data_object.get_by_id().
- get_all_objects(after=None, as_dataframe=False, **kwargs)
Get all objects from Weaviate.
If after is provided, it is used as the starting point for the listing. If as_dataframe is True, the objects are returned as a pandas DataFrame.
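Listing everything with an after cursor amounts to fetching pages until one comes back empty, each time passing the ID of the last object seen. In the sketch below, fetch_page and fake_fetch are hypothetical stand-ins for the client call, and the page size of 2 is arbitrary.

```python
from typing import Any, Callable, Optional

Page = list[dict[str, Any]]


def iter_all_objects(
    fetch_page: Callable[[Optional[str]], Page], after: Optional[str] = None
) -> list[dict[str, Any]]:
    """Collect every object by following an 'after' cursor (hypothetical fetch_page)."""
    objects: list[dict[str, Any]] = []
    while True:
        page = fetch_page(after)
        if not page:
            return objects  # an empty page means the listing is exhausted
        objects.extend(page)
        after = page[-1]["id"]  # next page starts after the last ID seen


ids = [f"id-{i}" for i in range(5)]


def fake_fetch(after: Optional[str], limit: int = 2) -> Page:
    start = ids.index(after) + 1 if after else 0
    return [{"id": i} for i in ids[start:start + limit]]


print([o["id"] for o in iter_all_objects(fake_fetch)])
# ['id-0', 'id-1', 'id-2', 'id-3', 'id-4']
```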
- delete_object(uuid, **kwargs)
Delete an object from Weaviate.
- Parameters
uuid (weaviate.types.UUID | str) – UUID of the object to delete.
kwargs – Optional parameters passed to weaviate_client.data_object.delete().
- update_object(data_object, class_name, uuid, **kwargs)
Update an object in Weaviate.
- Parameters
data_object (dict | str) – The fields to update. Fields not specified in ‘data_object’ remain unchanged, and fields set to None are not changed. If it is a str, it should be either a URL or a file path.
class_name (str) – Class name associated with the given object.
uuid (weaviate.types.UUID | str) – UUID of the object to update.
kwargs – Optional parameters passed to weaviate_client.data_object.update().
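The update semantics described above (unspecified fields kept, None values ignored) behave like a partial merge. apply_update is a hypothetical in-memory illustration of that rule, not a call into Weaviate.

```python
from typing import Any


def apply_update(current: dict[str, Any], data_object: dict[str, Any]) -> dict[str, Any]:
    """Merge semantics described for update_object (in-memory sketch)."""
    updated = dict(current)
    for key, value in data_object.items():
        if value is not None:  # None values leave the stored field unchanged
            updated[key] = value
    return updated


current = {"title": "Old", "body": "Text", "year": 2020}
print(apply_update(current, {"title": "New", "year": None}))
# {'title': 'New', 'body': 'Text', 'year': 2020}
```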
- replace_object(data_object, class_name, uuid, **kwargs)
Replace an object in Weaviate.
- Parameters
data_object (dict | str) – The new state of the object. Fields not specified in ‘data_object’ are set to None. If it is a str, it should be either a URL or a file path.
class_name (str) – Class name associated with the given object.
uuid (weaviate.types.UUID | str) – UUID of the object to replace.
kwargs – Optional parameters passed to weaviate_client.data_object.replace().
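By contrast, replacing sets every field that is missing from data_object to None. apply_replace is a hypothetical in-memory illustration of that rule.

```python
from typing import Any


def apply_replace(current: dict[str, Any], data_object: dict[str, Any]) -> dict[str, Any]:
    """Replace semantics described for replace_object: unspecified fields become None."""
    replaced: dict[str, Any] = {key: None for key in current}
    replaced.update(data_object)  # the new object defines the full state
    return replaced


current = {"title": "Old", "body": "Text", "year": 2020}
print(apply_replace(current, {"title": "New"}))
# {'title': 'New', 'body': None, 'year': None}
```

Choosing between the two methods therefore comes down to whether the caller holds the complete object (replace) or only the changed fields (update).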
- object_exists(uuid, **kwargs)
Check whether an object exists in Weaviate.
- Parameters
uuid (str | weaviate.types.UUID) – The UUID of the object that may or may not exist within Weaviate.
kwargs – Optional parameters to be passed to weaviate_client.data_object.exists()
- create_or_replace_document_objects(data, class_name, document_column, existing='skip', uuid_column=None, vector_column='Vector', batch_config_params=None, tenant=None, verbose=False)
Create or replace objects belonging to documents.
In real-world scenarios, information sources such as Airflow docs, Stack Overflow posts, or issues are considered ‘documents’ here. It’s crucial to keep the database objects in sync with these sources: if a document changes, this function aims to reflect those changes in the database.
Note
This function assumes responsibility for identifying changes in documents, dropping relevant database objects, and recreating them based on updated information. It’s crucial to handle this process with care, ensuring backups and validation are in place to prevent data loss or inconsistencies.
Provides multiple strategies for handling existing objects:
- replace: Replace the existing objects with new objects. This requires identifying the objects that belong to a document, which by default is done using the document_column field.
- skip: Skip the existing objects and add only the missing objects of a document.
- error: Raise an error if an object belonging to an existing document is about to be created.
- Parameters
data (pandas.DataFrame | list[dict[str, Any]] | list[pandas.DataFrame]) – A single pandas DataFrame, a list of DataFrames, or a list of dicts to be ingested.
class_name (str) – Name of the class in the Weaviate schema where data is to be ingested.
existing (str) – Strategy for handling existing data: ‘skip’, ‘replace’, or ‘error’. Default is ‘skip’.
document_column (str) – Column in the DataFrame that identifies the source document.
uuid_column (str | None) – Column with pre-generated UUIDs. If not provided, UUIDs will be generated.
vector_column (str) – Column with embedding vectors for pre-embedded data.
batch_config_params (dict | None) – Additional parameters for Weaviate batch configuration.
tenant (str | None) – The tenant to which the object will be added.
verbose (bool) – Flag to enable verbose output during the ingestion process.
- Returns
List of UUIDs that failed to be created.
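The three strategies for existing can be sketched as an in-memory planning step that returns which stored objects to delete and which rows to insert. plan_document_objects, the column names doc and uuid, and the sample data are hypothetical; the sketch does not reproduce how the hook detects changed documents.

```python
from typing import Any


def plan_document_objects(
    rows: list[dict[str, Any]],
    existing_objects: dict[str, str],
    document_column: str,
    uuid_column: str,
    existing: str = "skip",
) -> tuple[list[str], list[dict[str, Any]]]:
    """Return (UUIDs to delete, rows to insert) for one 'existing' strategy."""
    if existing not in ("skip", "replace", "error"):
        raise ValueError(f"invalid value for existing: {existing!r}")
    incoming_docs = {row[document_column] for row in rows}
    existing_docs = set(existing_objects.values())
    if existing == "error":
        clash = incoming_docs & existing_docs
        if clash:
            raise ValueError(f"documents already exist: {sorted(clash)}")
        return [], rows
    if existing == "replace":
        # Drop every stored object of an incoming document, then insert all rows again.
        to_delete = sorted(u for u, doc in existing_objects.items() if doc in incoming_docs)
        return to_delete, rows
    # "skip": keep stored objects and insert only rows whose UUID is not stored yet.
    return [], [row for row in rows if row[uuid_column] not in existing_objects]


existing_objects = {"u1": "doc_a.md", "u2": "doc_a.md", "u3": "doc_b.md"}  # UUID -> document
rows = [
    {"uuid": "u1", "doc": "doc_a.md", "text": "intro"},
    {"uuid": "u4", "doc": "doc_a.md", "text": "new section"},
]
print(plan_document_objects(rows, existing_objects, "doc", "uuid", existing="replace")[0])
# ['u1', 'u2']
```

Deleting by document rather than by UUID is what lets ‘replace’ remove chunks that no longer exist in the updated source.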