airflow.providers.weaviate.hooks.weaviate

Module Contents

Classes

WeaviateHook

Interact with a Weaviate database to store vectors. This hook uses the connection given by 'conn_id'.

Attributes

ExitingSchemaOptions

HTTP_RETRY_STATUS_CODE

REQUESTS_EXCEPTIONS_TYPES

airflow.providers.weaviate.hooks.weaviate.ExitingSchemaOptions[source]
airflow.providers.weaviate.hooks.weaviate.HTTP_RETRY_STATUS_CODE = [429, 500, 503, 504][source]
airflow.providers.weaviate.hooks.weaviate.REQUESTS_EXCEPTIONS_TYPES = ()[source]
airflow.providers.weaviate.hooks.weaviate.check_http_error_is_retryable(exc)[source]
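
check_http_error_is_retryable has no description on this page. Judging only by its name and the HTTP_RETRY_STATUS_CODE list above, it presumably reports whether an exception carries an HTTP status worth retrying. The stand-alone sketch below illustrates that idea; it is not the provider's implementation, and FakeResponse, FakeHTTPError and is_retryable are hypothetical names introduced for illustration.

```python
# Illustrative sketch only: this is NOT the provider's implementation.
RETRY_STATUS_CODES = [429, 500, 503, 504]  # mirrors HTTP_RETRY_STATUS_CODE above


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status_code):
        self.status_code = status_code


class FakeHTTPError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a response."""

    def __init__(self, response=None):
        super().__init__("HTTP error")
        self.response = response


def is_retryable(exc, retry_codes=RETRY_STATUS_CODES):
    """Return True when exc carries a response whose status is in retry_codes."""
    response = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    # Exceptions without a response (or without a status) are never retried.
    return status in retry_codes
```

A rate-limit response (429) would be retried under this sketch, while a 404 or a non-HTTP exception would not.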
class airflow.providers.weaviate.hooks.weaviate.WeaviateHook(conn_id=default_conn_name, retry_status_codes=None, *args, **kwargs)[source]

Bases: airflow.hooks.base.BaseHook

Interact with a Weaviate database to store vectors. This hook uses the connection given by ‘conn_id’.

Parameters

conn_id (str) – The connection id to use when connecting to Weaviate.

conn_name_attr = 'conn_id'[source]
default_conn_name = 'weaviate_default'[source]
conn_type = 'weaviate'[source]
hook_name = 'Weaviate'[source]
classmethod get_connection_form_widgets()[source]

Return connection widgets to add to connection form.

classmethod get_ui_field_behaviour()[source]

Return custom field behaviour.

get_conn()[source]

Return connection for the hook.

conn()[source]

Return a Weaviate client.

get_client()[source]

Return a Weaviate client.

test_connection()[source]
create_class(class_json)[source]

Create a new class.

create_schema(schema_json)[source]

Create a new Schema.

Instead of adding classes one by one, you can upload a full schema in JSON format at once.

Parameters

schema_json (dict[str, Any] | str) – Schema as a Python dict, the path to a JSON file, or the URL of a JSON file.
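
As a rough illustration of the dict form of schema_json, the sketch below builds a small schema and passes it to create_schema. The class name, properties and the helper names build_schema and upload_schema are assumptions made for this example; the hook argument is any object exposing create_schema(schema_json), such as a WeaviateHook.

```python
def build_schema():
    """Return a small Weaviate-style schema as a Python dict."""
    return {
        "classes": [
            {
                "class": "Article",  # hypothetical class name
                "description": "Example class for illustration",
                "vectorizer": "none",  # assumes vectors are supplied by the caller
                "properties": [
                    {"name": "title", "dataType": ["text"]},
                    {"name": "body", "dataType": ["text"]},
                ],
            }
        ]
    }


def upload_schema(hook, schema=None):
    """Upload the schema through an object exposing create_schema(schema_json)."""
    schema = build_schema() if schema is None else schema
    hook.create_schema(schema)
    # Return the class names that were sent, which is handy for logging.
    return [cls["class"] for cls in schema["classes"]]
```

Inside a task this might be called as upload_schema(WeaviateHook(conn_id="weaviate_default")), assuming that connection is configured.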

get_schema(class_name=None)[source]

Get the schema from Weaviate.

Parameters

class_name (str | None) – The class for which to return the schema. If not provided, the whole schema is returned; otherwise only the schema of this class is returned. Defaults to None.

delete_classes(class_names, if_error='stop')[source]

Delete all or specific classes if class_names are provided.

Parameters
  • class_names (list[str] | str) – List of class names to delete.

  • if_error (str) – Action to take if an error occurs while deleting a class. Possible options are ‘stop’ and ‘continue’.

Returns

If if_error=‘continue’, returns the list of classes that could not be deleted. If if_error=‘stop’, returns None.

Return type

list[str] | None
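
The return behaviour described above can be wrapped so that callers always receive a list. This is a minimal sketch; delete_classes_best_effort is a hypothetical helper name, and hook is any object exposing delete_classes with the signature shown above.

```python
def delete_classes_best_effort(hook, class_names):
    """Delete classes, continuing past failures, and return the names that failed."""
    failed = hook.delete_classes(class_names, if_error="continue")
    # With if_error="continue" the documented return value is a list of failures;
    # normalise None to an empty list so callers can always iterate over it.
    return list(failed or [])
```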

delete_all_schema()[source]

Remove the entire schema from the Weaviate instance and all data associated with it.

update_config(class_name, config)[source]

Update a schema configuration for a specific class.

create_or_replace_classes(schema_json, existing='ignore')[source]

Create or replace the classes in schema of Weaviate database.

Parameters
  • schema_json (dict[str, Any] | str) – JSON containing the schema, in the format {“class_name”: “class_dict”}. See also: example of class_dict.

  • existing (ExitingSchemaOptions) – How to handle classes that already exist. Possible options are ‘replace’, ‘fail’, and ‘ignore’.
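
The {“class_name”: “class_dict”} format keys each class definition by its name. A small sketch of building that mapping from a list of class definitions follows; the class names and the helper to_class_mapping are assumptions made for illustration.

```python
def to_class_mapping(class_dicts):
    """Key each class definition by its "class" name."""
    mapping = {}
    for class_dict in class_dicts:
        name = class_dict["class"]
        if name in mapping:
            raise ValueError(f"duplicate class name: {name}")
        mapping[name] = class_dict
    return mapping


# Hypothetical class definitions used only for this example.
classes = [
    {"class": "Article", "properties": [{"name": "title", "dataType": ["text"]}]},
    {"class": "Author", "properties": [{"name": "name", "dataType": ["text"]}]},
]
schema_json = to_class_mapping(classes)

# With a configured hook, the mapping could then be passed on, for example:
# hook.create_or_replace_classes(schema_json, existing="ignore")
```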

check_subset_of_schema(classes_objects)[source]

Check if classes_objects is a subset of the existing schema.

Note: the Weaviate client’s contains() does not handle mismatched class properties. To compare Class A with Class B, both must have exactly the same properties. If Class A has fewer properties than Class B, contains() returns False.

See also

contains.

batch_data(class_name, data, batch_config_params=None, vector_col='Vector', uuid_col='id', retry_attempts_per_object=5, tenant=None)[source]

Add multiple objects or object references to Weaviate at once.

Parameters
  • class_name (str) – The name of the class that the objects belong to.

  • data (list[dict[str, Any]] | pandas.DataFrame | None) – List or DataFrame of objects to add.

  • batch_config_params (dict[str, Any] | None) – Dict of batch configuration options. See also: batch_config_params options.

  • vector_col (str) – Name of the column containing the vector.

  • uuid_col (str) – Name of the column containing the UUID.

  • retry_attempts_per_object (int) – Number of times to retry on failure before giving up.

  • tenant (str | None) – The tenant to which the object will be added.
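
To make the vector_col and uuid_col parameters concrete, the sketch below builds a list of dicts whose keys match the defaults (‘Vector’ and ‘id’) and hands it to batch_data. The "text" property and the helper names build_records and load_records are assumptions; deriving the UUID from the text is one possible choice, not something the hook requires.

```python
import uuid


def build_records(texts, vectors, vector_col="Vector", uuid_col="id"):
    """Pair each text with its vector and a deterministic UUID."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    records = []
    for text, vector in zip(texts, vectors):
        records.append(
            {
                "text": text,  # hypothetical property name
                vector_col: list(vector),
                # UUID derived from the text, so re-runs produce the same id.
                uuid_col: str(uuid.uuid5(uuid.NAMESPACE_URL, text)),
            }
        )
    return records


def load_records(hook, class_name, records):
    """Send the records through an object exposing batch_data(...)."""
    return hook.batch_data(
        class_name=class_name,
        data=records,
        vector_col="Vector",
        uuid_col="id",
        retry_attempts_per_object=5,
    )
```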

query_with_vector(embeddings, class_name, *properties, certainty=0.7, limit=1)[source]

Query the Weaviate database with near vectors.

This method runs a vector search using a Get query. It uses with_near_vector to pass the query vector itself to Weaviate, which is needed to query a Weaviate class that relies on a custom, external vectorizer. Weaviate then uses the supplied vector as the basis for the vector search.

query_without_vector(search_text, class_name, *properties, limit=1)[source]

Query using near text.

This method runs a vector search using a Get query. It uses the nearText operator to pass the query search_text to Weaviate. Weaviate then converts this text into a vector through the inference API (OpenAI in this particular example) and uses that vector as the basis for the vector search.
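
The two query methods differ only in what is sent: a ready-made vector or raw text. The sketch below dispatches to one or the other; search is a hypothetical helper, and hook is any object exposing query_with_vector and query_without_vector with the signatures shown above. The shape of the returned result is not specified on this page, so it is passed through unchanged.

```python
def search(hook, class_name, properties, *, text=None, vector=None, limit=3):
    """Dispatch to query_without_vector (text) or query_with_vector (vector)."""
    if (text is None) == (vector is None):
        raise ValueError("pass exactly one of text or vector")
    if vector is not None:
        # Properties are passed positionally, matching *properties in the signature.
        return hook.query_with_vector(vector, class_name, *properties, limit=limit)
    return hook.query_without_vector(text, class_name, *properties, limit=limit)
```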

create_object(data_object, class_name, **kwargs)[source]

Create a new object.

Parameters
  • data_object (dict | str) – Object to be added. If type is str it should be either a URL or a file.

  • class_name (str) – Class name associated with the object given.

  • kwargs – Additional parameters to be passed to weaviate_client.data_object.create()

get_or_create_object(data_object=None, class_name=None, vector=None, consistency_level=None, tenant=None, **kwargs)[source]

Get an object, or create it if it does not exist.

Returns the existing object if it already exists.

Parameters
  • data_object (dict | str | None) – Object to be added. If type is str it should be either a URL or a file. This is required to create a new object.

  • class_name (str | None) – Class name associated with the object given. This is required to create a new object.

  • vector (Sequence | None) – Vector associated with the object given. This argument is only used when creating an object.

  • consistency_level (weaviate.data.replication.ConsistencyLevel | None) – Consistency level to be used. Applies to both create and get operations.

  • tenant (str | None) – Tenant to be used. Applies to both create and get operations.

  • kwargs – Additional parameters to be passed to weaviate_client.data_object.create() and weaviate_client.data_object.get()

get_object(**kwargs)[source]

Get objects or an object from Weaviate.

Parameters

kwargs – parameters to be passed to weaviate_client.data_object.get() or weaviate_client.data_object.get_by_id()

get_all_objects(after=None, as_dataframe=False, **kwargs)[source]

Get all objects from Weaviate.

If after is provided, it is used as the starting point for the listing.

Parameters
  • after (str | weaviate.types.UUID | None) – uuid of the object to start listing from

  • as_dataframe (bool) – if True, returns a pandas dataframe

  • kwargs – parameters to be passed to weaviate_client.data_object.get()
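
The after parameter acts as a cursor. The stand-alone sketch below shows the general idea of cursor-based listing against an in-memory list; it does not reproduce the hook's internals. list_all, fake_fetch_page and OBJECTS are hypothetical names, and fetch_page is assumed to return the objects that follow the given id.

```python
def list_all(fetch_page, after=None):
    """Collect every object by repeatedly asking for the page after the last seen id.

    fetch_page is a hypothetical callable: fetch_page(after) -> list of dicts with "id".
    """
    collected = []
    while True:
        page = fetch_page(after)
        if not page:
            break
        collected.extend(page)
        after = page[-1]["id"]  # the next page starts after the last object seen
    return collected


# In-memory stand-in for a remote listing endpoint.
OBJECTS = [{"id": f"obj-{i}"} for i in range(5)]


def fake_fetch_page(after, page_size=2):
    """Return up to page_size objects that come after the given id."""
    start = 0
    if after is not None:
        ids = [obj["id"] for obj in OBJECTS]
        start = ids.index(after) + 1
    return OBJECTS[start:start + page_size]
```

Calling list_all(fake_fetch_page, after="obj-2") skips everything up to and including obj-2, which mirrors how a starting uuid narrows the listing.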

delete_object(uuid, **kwargs)[source]

Delete an object from Weaviate.

Parameters
  • uuid (weaviate.types.UUID | str) – uuid of the object to be deleted

  • kwargs – Optional parameters to be passed to weaviate_client.data_object.delete()

update_object(data_object, class_name, uuid, **kwargs)[source]

Update an object in Weaviate.

Parameters
  • data_object (dict | str) – The object states the fields that should be updated. Fields not specified in the ‘data_object’ remain unchanged. Fields that are None will not be changed. If type is str it should be either a URL or a file.

  • class_name (str) – Class name associated with the object given.

  • uuid (weaviate.types.UUID | str) – uuid of the object to be updated

  • kwargs – Optional parameters to be passed to weaviate_client.data_object.update()

replace_object(data_object, class_name, uuid, **kwargs)[source]

Replace an object in Weaviate.

Parameters
  • data_object (dict | str) – The object states the fields that should be updated. Fields not specified in the ‘data_object’ will be set to None. If type is str it should be either a URL or a file.

  • class_name (str) – Class name associated with the object given.

  • uuid (weaviate.types.UUID | str) – uuid of the object to be replaced

  • kwargs – Optional parameters to be passed to weaviate_client.data_object.replace()
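
The difference between update_object and replace_object lies in how unspecified fields are treated. The pure-Python sketch below models the two behaviours on plain dicts, following the parameter descriptions above; apply_update and apply_replace are hypothetical helpers and do not call Weaviate.

```python
def apply_update(stored, data_object):
    """Merge: fields absent from data_object, or set to None, keep their stored value."""
    merged = dict(stored)
    for key, value in data_object.items():
        if value is not None:
            merged[key] = value
    return merged


def apply_replace(stored, data_object):
    """Replace: fields absent from data_object are reset to None."""
    all_keys = set(stored) | set(data_object)
    return {key: data_object.get(key) for key in all_keys}
```

Given a stored object with "title" and "body", an update carrying only a new "title" leaves "body" intact, whereas a replace with the same payload sets "body" to None.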

validate_object(data_object, class_name, **kwargs)[source]

Validate an object in Weaviate.

Parameters
  • data_object (dict | str) – The object to be validated. If type is str it should be either a URL or a file.

  • class_name (str) – Class name associated with the object given.

  • kwargs – Optional parameters to be passed to weaviate_client.data_object.validate()

object_exists(uuid, **kwargs)[source]

Check if an object exists in Weaviate.

Parameters
  • uuid (str | weaviate.types.UUID) – The UUID of the object that may or may not exist within Weaviate.

  • kwargs – Optional parameters to be passed to weaviate_client.data_object.exists()

create_or_replace_document_objects(data, class_name, document_column, existing='skip', uuid_column=None, vector_column='Vector', batch_config_params=None, tenant=None, verbose=False)[source]

Create or replace objects belonging to documents.

In real-world scenarios, information sources like Airflow docs, Stack Overflow, or other issues are considered ‘documents’ here. It’s crucial to keep the database objects in sync with these sources. If any changes occur in these documents, this function aims to reflect those changes in the database.

Note

This function assumes responsibility for identifying changes in documents, dropping relevant database objects, and recreating them based on updated information. It’s crucial to handle this process with care, ensuring backups and validation are in place to prevent data loss or inconsistencies.

Provides users with multiple ways of dealing with existing values:
  • replace: replace the existing objects with new objects. This option requires identifying the objects that belong to a document, which by default is done using the document_column field.

  • skip: skip the existing objects and add only the missing objects of a document.

  • error: raise an error on an attempt to create an object that belongs to an existing document.

Parameters
  • data (pandas.DataFrame | list[dict[str, Any]] | list[pandas.DataFrame]) – A single pandas DataFrame, a list of dicts, or a list of DataFrames to be ingested.

  • class_name (str) – Name of the class in Weaviate schema where data is to be ingested.

  • existing (str) – Strategy for handling existing data: ‘skip’, ‘replace’, or ‘error’. Default is ‘skip’.

  • document_column (str) – Column in the DataFrame that identifies the source document.

  • uuid_column (str | None) – Column with pre-generated UUIDs. If not provided, UUIDs will be generated.

  • vector_column (str) – Column with embedding vectors for pre-embedded data.

  • batch_config_params (dict | None) – Additional parameters for Weaviate batch configuration.

  • tenant (str | None) – The tenant to which the object will be added.

  • verbose (bool) – Flag to enable verbose output during the ingestion process.

Returns

List of UUIDs that failed to be created.
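
As a sketch of how document-oriented data might be prepared for this method, the example below splits each document into chunks and tags every chunk with its source in the document column. The column names "doc_url" and "content", the chunking scheme, and the helpers chunk_documents and sync_documents are assumptions made for illustration.

```python
def chunk_documents(documents, chunk_size=40, document_column="doc_url"):
    """Split each document's text into fixed-size chunks tagged with their source."""
    rows = []
    for doc_id, text in documents.items():
        for start in range(0, len(text), chunk_size):
            rows.append(
                {
                    document_column: doc_id,  # identifies the source document
                    "content": text[start:start + chunk_size],
                }
            )
    return rows


def sync_documents(hook, rows, class_name, document_column="doc_url"):
    """Replace the objects of changed documents; returns whatever the hook returns."""
    return hook.create_or_replace_document_objects(
        data=rows,
        class_name=class_name,
        document_column=document_column,
        existing="replace",
    )
```

Because every row carries its source identifier, the ‘replace’ strategy has what it needs to find the objects belonging to a given document.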
