airflow.providers.pinecone.hooks.pinecone

Hook for Pinecone.

Module Contents

Classes

PineconeHook

Interact with Pinecone. This hook uses the Pinecone conn_id.

class airflow.providers.pinecone.hooks.pinecone.PineconeHook(conn_id=default_conn_name)

Bases: airflow.hooks.base.BaseHook

Interact with Pinecone. This hook uses the Pinecone conn_id.

Parameters

conn_id (str) – The connection ID to use when connecting to Pinecone. Optional; defaults to pinecone_default.

conn_name_attr = 'conn_id'
default_conn_name = 'pinecone_default'
conn_type = 'pinecone'
hook_name = 'Pinecone'
classmethod get_connection_form_widgets()

Return connection widgets to add to connection form.

classmethod get_ui_field_behaviour()

Return custom field behaviour.

get_conn()

Return connection for the hook.

test_connection()
static list_indexes()

Retrieve a list of all indexes in your project.

static upsert(index_name, vectors, namespace='', batch_size=None, show_progress=True, **kwargs)

Write vectors into a namespace.

If a new value is upserted for an existing vector id, it will overwrite the previous value.

To upsert in parallel, follow the Pinecone documentation on sending upserts in parallel.

Parameters
  • index_name (str) – The name of the index to upsert into.

  • vectors (list[Any]) – A list of vectors to upsert.

  • namespace (str) – The namespace to write to. If not specified, the default namespace (“”) is used.

  • batch_size (int | None) – The number of vectors to upsert in each batch.

  • show_progress (bool) – Whether to show a progress bar using tqdm. Applied only if batch_size is provided.
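The overwrite-on-same-id and batch_size behaviour described above can be illustrated with a toy in-memory model. This is a minimal sketch, not Pinecone's implementation and not a call into the hook: the names iter_batches, toy_upsert and the (id, values[, metadata]) tuple shape are hypothetical assumptions chosen for illustration.

```python
from __future__ import annotations

from typing import Any, Iterator

# Hypothetical vector shape for this sketch: (id, values) or (id, values, metadata).
Vector = tuple[Any, ...]


def iter_batches(vectors: list[Vector], batch_size: int | None) -> Iterator[list[Vector]]:
    """Yield vectors in chunks of batch_size, or as one chunk when batch_size is None."""
    if batch_size is None:
        yield vectors
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(vectors), batch_size):
        yield vectors[start : start + batch_size]


def toy_upsert(
    store: dict[str, dict[Any, Vector]],
    vectors: list[Vector],
    namespace: str = "",
    batch_size: int | None = None,
) -> int:
    """Write vectors into store[namespace]; a repeated id replaces the earlier value."""
    ns = store.setdefault(namespace, {})
    written = 0
    for batch in iter_batches(vectors, batch_size):
        for vec in batch:
            ns[vec[0]] = vec  # same id -> previous value is overwritten
            written += 1
    return written


store: dict[str, dict[Any, Vector]] = {}
toy_upsert(store, [("a", [0.1, 0.2]), ("b", [0.3, 0.4])], batch_size=1)
toy_upsert(store, [("a", [0.9, 0.9])])  # overwrites "a" in the default namespace ""
```

After the second call the default namespace still holds two vectors, and "a" carries the newer values, matching the overwrite rule stated for upsert.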

static create_index(index_name, dimension, index_type='approximated', metric='cosine', replicas=1, shards=1, pods=1, pod_type='p1', index_config=None, metadata_config=None, source_collection='', timeout=None)

Create a new index.

Parameters
  • index_name (str) – The name of the index to create.

  • dimension (int) – The dimension of the vectors that will be inserted into the index.

  • index_type (str | None) – The type of index, one of {“approximated”, “exact”}. Defaults to “approximated”.

  • metric (str | None) – The distance metric used by the vector index, one of {“cosine”, “dotproduct”, “euclidean”}. Defaults to “cosine”.

  • replicas (int | None) – The number of replicas. Defaults to 1.

  • shards (int | None) – The number of shards per index. Defaults to 1.

  • pods (int | None) – The total number of pods used by the index, where pods = shards * replicas. Defaults to 1.

  • pod_type (str | None) – The pod type to use for the index, either p1 or s1. Defaults to p1.

  • index_config (dict[str, str] | None) – Advanced configuration options for the index.

  • metadata_config (dict[str, str] | None) – Configuration for the metadata index.

  • source_collection (str | None) – The name of the collection to create the index from.

  • timeout (int | None) – How long to wait for the index to become ready.
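The sizing arguments are related by pods = shards * replicas. The sketch below collects them in a hypothetical IndexSpec dataclass (not part of the hook), derives pods from that relation, and checks values only against the choices listed in the parameter descriptions; the Pinecone service itself may accept values beyond these lists.

```python
from dataclasses import dataclass

# Allowed values as listed in the create_index parameter descriptions.
INDEX_TYPES = {"approximated", "exact"}
METRICS = {"cosine", "dotproduct", "euclidean"}
POD_TYPES = {"p1", "s1"}


@dataclass(frozen=True)
class IndexSpec:
    """Hypothetical container for create_index's sizing arguments."""

    dimension: int
    index_type: str = "approximated"
    metric: str = "cosine"
    replicas: int = 1
    shards: int = 1
    pod_type: str = "p1"

    @property
    def pods(self) -> int:
        # Total pods, following the documented relation pods = shards * replicas.
        return self.shards * self.replicas

    def validate(self) -> None:
        """Raise ValueError if a field falls outside the documented choices."""
        if self.dimension <= 0:
            raise ValueError("dimension must be positive")
        if self.index_type not in INDEX_TYPES:
            raise ValueError(f"unknown index_type: {self.index_type!r}")
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric!r}")
        if self.pod_type not in POD_TYPES:
            raise ValueError(f"unknown pod_type: {self.pod_type!r}")
        if self.replicas < 1 or self.shards < 1:
            raise ValueError("replicas and shards must be at least 1")

    def to_kwargs(self) -> dict:
        """Keyword arguments named after create_index's documented parameters."""
        return {
            "dimension": self.dimension,
            "index_type": self.index_type,
            "metric": self.metric,
            "replicas": self.replicas,
            "shards": self.shards,
            "pods": self.pods,
            "pod_type": self.pod_type,
        }
```

For example, IndexSpec(dimension=8, replicas=2, shards=3) yields pods == 6. Because create_index is documented as a static method, the result could then be passed as PineconeHook.create_index(index_name="my-index", **spec.to_kwargs()), where "my-index" is a placeholder; that call is not exercised here.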

static describe_index(index_name)

Retrieve information about a specific index.

Parameters

index_name (str) – The name of the index to describe.

static delete_index(index_name, timeout=None)

Delete a specific index.

Parameters
  • index_name (str) – The name of the index.

  • timeout (int | None) – How long to wait for the index to be deleted.

static configure_index(index_name, replicas=None, pod_type='')

Change the current configuration of the index.

Parameters
  • index_name (str) – The name of the index to configure.

  • replicas (int | None) – The new number of replicas.

  • pod_type (str | None) – The new pod type for the index.

static create_collection(collection_name, index_name)

Create a new collection from a specified index.

Parameters
  • collection_name (str) – The name of the collection to create.

  • index_name (str) – The name of the source index.

static delete_collection(collection_name)

Delete a specific collection.

Parameters

collection_name (str) – The name of the collection to delete.

static describe_collection(collection_name)

Retrieve information about a specific collection.

Parameters

collection_name (str) – The name of the collection to describe.

static list_collections()

Retrieve a list of all collections in the current project.

static query_vector(index_name, vector, query_id=None, top_k=10, namespace=None, query_filter=None, include_values=None, include_metadata=None, sparse_vector=None)

Search a namespace using a query vector.

Retrieve the IDs of the most similar items in a namespace, along with their similarity scores. API reference: https://docs.pinecone.io/reference/query

Parameters
  • index_name (str) – The name of the index to query.

  • vector (list[Any]) – The query vector.

  • query_id (str | None) – The unique ID of the vector to be used as a query vector.

  • top_k (int) – The number of results to return.

  • namespace (str | None) – The namespace to query. If not specified, the default namespace is used.

  • query_filter (dict[str, str | float | int | bool | list[Any] | dict[Any, Any]] | None) – The filter to apply. See https://www.pinecone.io/docs/metadata-filtering/

  • include_values (bool | None) – Whether to include the vector values in the result.

  • include_metadata (bool | None) – Whether to include metadata in the response in addition to the IDs.

  • sparse_vector (pinecone.core.client.model.sparse_values.SparseValues | dict[str, list[float] | list[int]] | None) – Sparse values of the query vector. Expected to be either a SparseValues object or a dict of the form {‘indices’: List[int], ‘values’: List[float]}, where both lists have the same length.
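To make top_k and "similarity scores" concrete, the sketch below scores every vector in an in-memory namespace with cosine similarity and keeps the best top_k. It is a brute-force stand-in, not how Pinecone searches on the server, and brute_force_query, cosine and make_sparse_vector are hypothetical helpers. make_sparse_vector builds the dict form of sparse_vector described above and enforces the equal-length requirement.

```python
import math
from typing import Any


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_query(
    namespace: dict[str, list[float]],
    vector: list[float],
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Return up to top_k (id, score) pairs, highest cosine similarity first."""
    scored = [(vid, cosine(vector, values)) for vid, values in namespace.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def make_sparse_vector(indices: list[int], values: list[float]) -> dict[str, Any]:
    """Build the {'indices': [...], 'values': [...]} dict form of sparse_vector."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return {"indices": list(indices), "values": list(values)}
```

Querying {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]} with [1.0, 0.0] and top_k=2 returns "a" (score 1.0) followed by "c" (about 0.707); "b" is dropped because its score is lowest.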

upsert_data_async(index_name, data, async_req=False, pool_threads=None)

Upsert (insert or update) data into the Pinecone index.

Parameters
  • index_name (str) – Name of the index.

  • data (list[tuple[Any]]) – List of tuples to be upserted. Each tuple is of form (id, vector, metadata). Metadata is optional.

  • async_req (bool) – If True, upsert operations will be asynchronous.

  • pool_threads (int | None) – Number of threads for parallel upserting. If async_req is True, this must be provided.
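The idea behind async_req and pool_threads, sending several batches of (id, vector, metadata) tuples concurrently, can be sketched with the standard library's ThreadPoolExecutor. This is an analogue, not the hook's implementation: send_batch is a hypothetical callable standing in for the real network call, and the default batch_size of 100 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Row = tuple[Any, ...]  # (id, vector) or (id, vector, metadata); metadata is optional


def chunk(rows: list[Row], size: int) -> list[list[Row]]:
    """Split rows into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [rows[i : i + size] for i in range(0, len(rows), size)]


def parallel_upsert(
    rows: list[Row],
    send_batch: Callable[[list[Row]], int],
    batch_size: int = 100,
    pool_threads: int = 4,
) -> int:
    """Send batches concurrently and return the total count reported by send_batch."""
    if pool_threads < 1:
        raise ValueError("pool_threads must be at least 1")
    for row in rows:
        if len(row) not in (2, 3):
            raise ValueError(f"expected (id, vector[, metadata]), got {row!r}")
    with ThreadPoolExecutor(max_workers=pool_threads) as pool:
        futures = [pool.submit(send_batch, batch) for batch in chunk(rows, batch_size)]
        # result() blocks until each batch finishes and re-raises worker exceptions.
        return sum(f.result() for f in futures)
```

Submitting every batch before collecting results lets up to pool_threads batches be in flight at once, which mirrors why pool_threads must be supplied when async_req is True.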

static describe_index_stats(index_name, stats_filter=None, **kwargs)

Describe the index statistics.

Return statistics about the index’s contents, for example the vector count per namespace and the number of dimensions. API reference: https://docs.pinecone.io/reference/describe_index_stats_post
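The sketch below pulls the two statistics mentioned above, dimension and per-namespace vector counts, out of a stats result. It assumes the result has already been converted to a plain dict with the keys dimension, namespaces, vector_count and total_vector_count; that shape is an assumption about the Pinecone response, not something this page documents. The example dict is hand-written sample data, not real output, and summarize_stats is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def summarize_stats(stats: dict[str, Any]) -> tuple[int | None, dict[str, int]]:
    """Return (dimension, {namespace: vector_count}) from an assumed stats dict shape."""
    per_namespace = {
        name: int(info.get("vector_count", 0))
        for name, info in stats.get("namespaces", {}).items()
    }
    return stats.get("dimension"), per_namespace


# Hand-written sample data in the assumed shape; "" is the default namespace.
example = {
    "dimension": 4,
    "namespaces": {"": {"vector_count": 3}, "docs": {"vector_count": 5}},
    "total_vector_count": 8,
}
dimension, counts = summarize_stats(example)
```

With this sample, the per-namespace counts sum to total_vector_count, which is a quick consistency check when the real response has the same shape.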

Parameters
