diff --git a/docs/quick-tour/interacting-with-the-index.ipynb b/docs/quick-tour/interacting-with-the-index.ipynb index 3a98cab1..5a1e0cb5 100644 --- a/docs/quick-tour/interacting-with-the-index.ipynb +++ b/docs/quick-tour/interacting-with-the-index.ipynb @@ -2,6 +2,7 @@ "cells": [ { "cell_type": "markdown", + "id": "0", "metadata": { "editable": true, "id": "a3e6b1da", @@ -12,11 +13,11 @@ }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/quick-tour/interacting-with-the-index.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/quick-tour/interacting-with-the-index.ipynb)" - ], - "id": "a3e6b1da" + ] }, { "cell_type": "markdown", + "id": "1", "metadata": { "id": "forbidden-sunglasses", "papermill": { @@ -40,11 +41,11 @@ "* `query`: query the index and retrieve the top-k nearest neighbors based on dot-product, cosine-similarity, Euclidean distance, and more.\n", "* `fetch`: fetch vectors stored in the index by id.\n", "* `describe_index_stats`: get statistics about the index." - ], - "id": "forbidden-sunglasses" + ] }, { "cell_type": "markdown", + "id": "2", "metadata": { "id": "quiet-signal", "papermill": { @@ -58,11 +59,11 @@ }, "source": [ "## Prerequisites" - ], - "id": "quiet-signal" + ] }, { "cell_type": "markdown", + "id": "3", "metadata": { "id": "beautiful-paper", "papermill": { @@ -76,11 +77,12 @@ }, "source": [ "Install dependencies." 
- ], - "id": "beautiful-paper" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "4", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -100,15 +102,20 @@ }, "tags": [] }, - "source": [ - "!pip install -qU pandas==2.2.3 pinecone==8.0.0" - ], - "execution_count": 1, "outputs": [], - "id": "complex-diversity" + "source": [ + "!pip install -qU pandas==2.2.3 pinecone==9.0.0\n", + "\n", + "import os\n", + "from getpass import getpass\n", + "\n", + "import pandas as pd\n", + "from pinecone import AwsRegion, CloudProvider, Metric, Pinecone, ServerlessSpec" + ] }, { "cell_type": "markdown", + "id": "5", "metadata": { "editable": true, "slideshow": { @@ -120,11 +127,12 @@ "## Creating an Index\n", "\n", "We begin by instantiating the Pinecone client. To do this we need a [free API key](https://app.pinecone.io)." - ], - "id": "4b7eca35" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "6", "metadata": { "editable": true, "slideshow": { @@ -132,33 +140,18 @@ }, "tags": [] }, + "outputs": [], "source": [ - "import os\n", - "from getpass import getpass\n", - "\n", - "from pinecone import Pinecone\n", - "\n", "# Get API key at app.pinecone.io\n", "api_key = os.environ.get(\"PINECONE_API_KEY\") or getpass(\"Enter your Pinecone API key: \")\n", "\n", "# Instantiate the client\n", "pc = Pinecone(api_key=api_key)" - ], - "execution_count": 2, - "outputs": [ - { - "output_type": "stream", - "text": [ - "/opt/conda/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ], - "name": "stderr" - } - ], - "id": "296b4b28" + ] }, { "cell_type": "markdown", + "id": "7", "metadata": { "editable": true, "slideshow": { @@ -177,11 +170,12 @@ "- `spec` holds a specification which tells Pinecone how you would like to deploy our index. 
You can find a list of all [available providers and regions here](https://docs.pinecone.io/guides/index-data/create-an-index#cloud-regions).\n", "\n", "There are more configurations available, but this minimal set will get us started." - ], - "id": "e5ded34b-58b6-46b1-9c04-b62e380b80a2" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "8", "metadata": { "editable": true, "id": "MjzMwddcyHM2", @@ -192,15 +186,15 @@ "parameters" ] }, + "outputs": [], "source": [ "index_name = \"interacting-with-the-index\"" - ], - "execution_count": 3, - "outputs": [], - "id": "MjzMwddcyHM2" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "9", "metadata": { "editable": true, "slideshow": { @@ -208,17 +202,17 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Delete the demo index if it already exists\n", "if pc.has_index(name=index_name):\n", - " pc.delete_index(index_name)" - ], - "execution_count": 4, - "outputs": [], - "id": "07826c0c" + " pc.delete_index(name=index_name)" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "10", "metadata": { "editable": true, "id": "progressive-blues", @@ -234,100 +228,60 @@ }, "tags": [] }, + "outputs": [], "source": [ - "from pinecone import AwsRegion, CloudProvider, Metric, ServerlessSpec\n", - "\n", "pc.create_index(\n", " name=index_name,\n", " dimension=2,\n", " metric=Metric.EUCLIDEAN,\n", " spec=ServerlessSpec(cloud=CloudProvider.AWS, region=AwsRegion.US_EAST_1),\n", ")" - ], - "execution_count": 5, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "{\n", - " \"name\": \"interacting-with-the-index\",\n", - " \"metric\": \"euclidean\",\n", - " \"host\": \"interacting-with-the-index-dojoi3u.svc.aped-4627-b74a.pinecone.io\",\n", - " \"spec\": {\n", - " \"serverless\": {\n", - " \"cloud\": \"aws\",\n", - " \"region\": \"us-east-1\"\n", - " }\n", - " },\n", - " \"status\": {\n", - " \"ready\": true,\n", - " \"state\": \"Ready\"\n", - " },\n", - " 
\"vector_type\": \"dense\",\n", - " \"dimension\": 2,\n", - " \"deletion_protection\": \"disabled\",\n", - " \"tags\": null\n", - "}" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "progressive-blues" + ] }, { "cell_type": "markdown", + "id": "11", "metadata": {}, "source": [ "The index configuration is returned by the create command, but we can look it up again at any time by calling the `describe_index` method." - ], - "id": "0143b2e9-ef6c-4308-8c6d-bc04d0372888" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "12", "metadata": {}, + "outputs": [], "source": [ "index_config = pc.describe_index(name=index_name)\n", "\n", "print(f\"The index host is {index_config.host}\")" - ], - "execution_count": 6, - "outputs": [ - { - "output_type": "stream", - "text": [ - "The index host is interacting-with-the-index-dojoi3u.svc.aped-4627-b74a.pinecone.io\n" - ], - "name": "stdout" - } - ], - "id": "b10c2aba-aead-4230-b1e9-3b4a61aeb659" + ] }, { "cell_type": "markdown", + "id": "13", "metadata": {}, "source": [ "# Using the index\n", "\n", "Data operations such as `upsert` and `query` are sent directly to the index host instead of `api.pinecone.io`, so we use a different client object for these operations. By using the `pc.Index()` helper method to construct this index client object, it will automatically inherit your API Key and any other configurations from the parent `Pinecone` instance."
- ], - "id": "e524b3fc-a289-4608-97e6-58283c7ca31f" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "14", "metadata": {}, + "outputs": [], "source": [ "# Instantiate an index client\n", "index = pc.Index(host=index_config.host)" - ], - "execution_count": 9, - "outputs": [], - "id": "d686f6a8-5536-4890-a0ca-653a3b62e666" + ] }, { "cell_type": "markdown", + "id": "15", "metadata": { "id": "billion-imperial", "papermill": { @@ -343,11 +297,12 @@ "### Insert vectors\n", "\n", "In a real use case, the vectors we insert would represent embeddings of our data. But for this simple demo, we will make up some small values just to illustrate the shape of the interface." - ], - "id": "billion-imperial" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "16", "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -364,90 +319,18 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Create some sample data\n", - "import pandas as pd\n", - "\n", "df = pd.DataFrame()\n", "df[\"id\"] = [\"A\", \"B\", \"C\", \"D\", \"E\"]\n", "df[\"vector\"] = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]\n", "df" - ], - "execution_count": 13, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "
[HTML rendering of the DataFrame omitted; it shows the same five rows (ids A to E with their 2-d vectors) as the text/plain output that follows]
" - ], - "text/plain": [ - " id vector\n", - "0 A [1.0, 1.0]\n", - "1 B [2.0, 2.0]\n", - "2 C [3.0, 3.0]\n", - "3 D [4.0, 4.0]\n", - "4 E [5.0, 5.0]" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "analyzed-charity" + ] }, { "cell_type": "markdown", + "id": "17", "metadata": { "id": "e3c126d0", "papermill": { @@ -461,11 +344,12 @@ }, "source": [ "We perform upsert operations in our index. The upsert operation will insert a new vector in the index or update the vector if the id was already present." - ], - "id": "e3c126d0" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "18", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -481,27 +365,15 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Upsert the vectors\n", "index.upsert(vectors=zip(df.id, df.vector))" - ], - "execution_count": 14, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "{'upserted_count': 5}" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "checked-christopher" + ] }, { "cell_type": "markdown", + "id": "19", "metadata": { "id": "psychological-estate", "papermill": { @@ -515,11 +387,12 @@ }, "source": [ "### Fetch vectors" - ], - "id": "psychological-estate" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "20", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -535,28 +408,16 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Fetch vectors by ID\n", "fetch_results = index.fetch(ids=[\"A\", \"B\"])\n", "fetch_results" - ], - "execution_count": 15, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "FetchResponse(namespace='', vectors={'A': Vector(id='A', values=[1.0, 1.0], metadata=None, sparse_values=None), 'B': Vector(id='B', values=[2.0, 2.0], metadata=None, sparse_values=None)}, usage={'read_units': 1})" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "varied-scene" + ] }, { "cell_type": 
"markdown", + "id": "21", "metadata": { "id": "frank-participation", "papermill": { @@ -570,11 +431,12 @@ }, "source": [ "### Query top-k vectors" - ], - "id": "frank-participation" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "22", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -590,28 +452,16 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Query top-k nearest neighbors\n", "query_results = index.query(vector=[1.1, 1.1], top_k=2)\n", "query_results" - ], - "execution_count": 16, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "{'matches': [], 'namespace': '', 'usage': {'read_units': 1}}" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "dried-demographic" + ] }, { "cell_type": "markdown", + "id": "23", "metadata": { "id": "binary-drama", "papermill": { @@ -625,11 +475,12 @@ }, "source": [ "### Update vectors by ID" - ], - "id": "binary-drama" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "24", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -645,28 +496,17 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Fetch current vectors by ID\n", "fetch_result = index.fetch(ids=[\"A\"])\n", "fetch_result" - ], - "execution_count": 17, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "FetchResponse(namespace='', vectors={'A': Vector(id='A', values=[1.0, 1.0], metadata=None, sparse_values=None)}, usage={'read_units': 1})" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "generic-witness" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "25", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -682,27 +522,16 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Update vectors by ID\n", "index.upsert(vectors=[(\"A\", [0.1, 0.1])])" - ], - "execution_count": 18, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - 
"{'upserted_count': 1}" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "comic-rwanda" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "26", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -718,28 +547,16 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Fetch vector by the same ID again\n", "fetch_result = index.fetch(ids=[\"A\"])\n", "fetch_result" - ], - "execution_count": 25, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "FetchResponse(namespace='', vectors={'A': Vector(id='A', values=[0.1, 0.1], metadata=None, sparse_values=None)}, usage={'read_units': 1})" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "gentle-messenger" + ] }, { "cell_type": "markdown", + "id": "27", "metadata": { "id": "manual-format", "papermill": { @@ -753,11 +570,12 @@ }, "source": [ "### Delete vectors by ID" - ], - "id": "manual-format" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "28", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -773,27 +591,16 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Delete vectors by ID\n", "index.delete(ids=[\"A\"])" - ], - "execution_count": 26, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "{}" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "hispanic-talent" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "29", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -809,28 +616,16 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Deleted vectors are empty\n", "fetch_results = index.fetch(ids=[\"A\", \"B\"])\n", "fetch_results" - ], - "execution_count": 28, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "FetchResponse(namespace='', vectors={'A': Vector(id='A', values=[0.1, 0.1], metadata=None, sparse_values=None), 'B': Vector(id='B', values=[2.0, 2.0], 
metadata=None, sparse_values=None)}, usage={'read_units': 1})" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "romantic-dubai" + ] }, { "cell_type": "markdown", + "id": "30", "metadata": { "id": "balanced-intellectual", "papermill": { @@ -844,11 +639,12 @@ }, "source": [ "### Get index statistics" - ], - "id": "balanced-intellectual" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "31", "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -864,32 +660,15 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Index statistics\n", "index.describe_index_stats()" - ], - "execution_count": 29, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "{'dimension': 2,\n", - " 'index_fullness': 0.0,\n", - " 'metric': 'euclidean',\n", - " 'namespaces': {'': {'vector_count': 4}},\n", - " 'total_vector_count': 4,\n", - " 'vector_type': 'dense'}" - ] - }, - "metadata": {}, - "execution_count": null - } - ], - "id": "nonprofit-popularity" + ] }, { "cell_type": "markdown", + "id": "32", "metadata": { "id": "directed-keyboard", "papermill": { @@ -903,11 +682,12 @@ }, "source": [ "### Delete the index" - ], - "id": "directed-keyboard" + ] }, { "cell_type": "code", + "execution_count": null, + "id": "33", "metadata": { "id": "supported-casino", "papermill": { @@ -919,13 +699,11 @@ }, "tags": [] }, + "outputs": [], "source": [ "# Delete the index\n", "pc.delete_index(name=index_name)" - ], - "execution_count": 30, - "outputs": [], - "id": "supported-casino" + ] } ], "metadata": {