feat: add semantic search marimo notebook by jhamon · Pull Request #581 · pinecone-io/examples

jhamon · 2026-05-20T19:24:40Z

Summary

Adds a new marimo notebook demonstrating semantic search with Pinecone, converted and significantly expanded from the existing docs/semantic-search.ipynb. The notebook uses Pinecone's Integrated Inference with the multilingual-e5-large model to demonstrate cross-lingual semantic search across English and Spanish sentences.

Changes

New notebook docs/semantic-search.py (marimo format) with:
- Pinecone SDK 9.0.1 API (pc.indexes.*, pc.index(), updated search signature)
- multilingual-e5-large embedding model for cross-lingual retrieval
- Refactored dataset prep: filter_pairs + extract_sentences(lang) to produce both English and Spanish records from Tatoeba
- to_records parameterized on column name with ID prefixes for multi-language upsert
- mo.ui.table for dataset inspection, mo.status.progress_bar replacing tqdm, mo.ui.run_button for safe index deletion
- Interactive query section with mo.ui.text and mo.ui.radio for language filter
- Language filtering section demonstrating metadata filters scoped to en/es
- Prose interspersed between code cells narrating the process
- "Meaning Over Keywords" and "How It Works" sections explaining model selection and cross-lingual retrieval
pyproject.toml: pins notebook dependencies (datasets==3.5.1, pinecone==9.0.1, numpy, tqdm)
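
The record-building step described above can be sketched roughly as follows. This is not the notebook's code: the `_id` and `text` field names, the `english`/`spanish` column names, and the sample pair are assumptions made for illustration; only the `lang` field and the idea of per-language ID prefixes come from the description.

```python
from typing import Any


def to_records(
    rows: list[dict[str, Any]], column: str, id_prefix: str, lang: str
) -> list[dict[str, str]]:
    """Build one record per row from the given column.

    The ID prefix keeps English and Spanish records from colliding
    when both sets are upserted into the same namespace.
    """
    return [
        {
            "_id": f"{id_prefix}-{i}",  # assumed ID field name
            "text": row[column],        # assumed text field name
            "lang": lang,               # metadata used by the language filter
        }
        for i, row in enumerate(rows)
    ]


# Made-up sentence pair standing in for the filtered Tatoeba rows.
pairs = [{"english": "Good morning", "spanish": "Buenos días"}]
records = to_records(pairs, "english", "en", "en") + to_records(
    pairs, "spanish", "es", "es"
)
```

Calling the same function once per language is what lets a single parameterized helper serve the multi-language upsert.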

Test Plan

Notebook runs end-to-end with a valid PINECONE_API_KEY
Index creation, upsert, and query cells execute without errors
Cross-lingual queries return results in both languages
Language filter correctly scopes results to en or es
Interactive query input updates results on change
Delete button safely removes the index

🤖 Generated with Claude Code

Note

Low Risk
Low risk: this PR only adds a new documentation notebook/script and does not modify production code paths; the main impact is increased dependency/runtime expectations when running the notebook (Pinecone API key, index create/delete).

Overview
Adds a new docs/semantic-search.py Marimo notebook that walks through semantic search with Pinecone Integrated Inference, including index creation with the multilingual-e5-large model, preparing English/Spanish Tatoeba records, batched upsert_records, and index.search queries.

The notebook also adds an interactive section for running queries with an optional lang metadata filter, plus a guarded cleanup flow to delete the created index via a UI button.

Reviewed by Cursor Bugbot for commit fc1dc00. Bugbot is set up for automated code reviews on this repo.
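
The batching behind the batched upsert mentioned in the overview might look like the sketch below. The helper name `chunked` and the batch size are hypothetical; the notebook's actual batch size and upsert call are not shown in this thread.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Illustrative only: five placeholder records split into batches of two.
batches = list(chunked(range(5), 2))
```

Each yielded batch would then be passed to the upsert call in turn, for example inside the progress bar loop the PR describes.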

Converts docs/semantic-search.ipynb to a marimo notebook with:
- Pinecone SDK 9.0.1 API (pc.indexes.*, pc.index(), new search signature)
- Refactored dataset preparation into prepare_sentences/to_records functions
- Keyword filtering with deduplication
- mo.ui.table for dataset inspection
- mo.status.progress_bar replacing tqdm
- mo.ui.run_button for safe index deletion
- Improved prose structure with explanations interspersed between code cells

Also pins notebook dependencies in pyproject.toml.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Replace print-based result output with mo.vstack containing a bold query header and a mo.ui.table, and fix three cells that were incorrectly configured as markdown cells.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Switch embedding model to multilingual-e5-large
- Refactor data prep: filter_pairs + extract_sentences(lang) to embed both English and Spanish sentences with prefixed IDs
- Upsert both languages to a single namespace
- Add lang column to search results table
- Add cross-lingual queries (English + Spanish) and a no-keyword query to demonstrate meaning-over-keywords retrieval
- Add language filtering section with lang= parameter on search()
- Update How It Works to explain model selection's role in vector space
- Improve prose throughout querying and filtering sections

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Add Try It Yourself section with mo.ui.text and mo.ui.radio for language filter; results update reactively on input change
- Fix empty cell and duplicate filter query cells
- Correct second language filter query to Spanish

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

cursor · 2026-05-20T19:30:43Z

+            top_k=top_k,
+            inputs={"text": query},
+            filter={"lang": {"$eq": lang}} if lang else None,
+        )


Incorrect index.search call shape

High Severity

index.search passes top_k, inputs, and filter as top-level keyword arguments without a query dict. For integrated text search, Pinecone v9 still expects query={"inputs": {"text": ...}, "top_k": ..., "filter": ...}, so these calls will raise a TypeError or fail validation and break all query cells.

Reviewed by Cursor Bugbot for commit 72b422c.
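
For reference, the nested query shape the review comment describes can be built as in the sketch below. It only constructs the dictionary and does not call Pinecone. Whether SDK 9.0.1 expects this form or the top-level keyword form in the diff is not settled in this thread, since the PR description mentions an updated search signature. The helper name `build_search_query` is hypothetical.

```python
from typing import Any, Optional


def build_search_query(
    text: str, top_k: int = 5, lang: Optional[str] = None
) -> dict[str, Any]:
    """Build the nested query dict described in the review comment."""
    query: dict[str, Any] = {"inputs": {"text": text}, "top_k": top_k}
    if lang:
        # Leave the filter key out entirely rather than passing None.
        query["filter"] = {"lang": {"$eq": lang}}
    return query


spanish_only = build_search_query("¿Dónde está la biblioteca?", top_k=3, lang="es")
unfiltered = build_search_query("Where is the library?")
```

Omitting the `filter` key when no language is selected avoids relying on how the client treats an explicit `None`.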

datasets and pinecone were added to [project.dependencies] by marimo's package manager during development. Notebook-specific deps belong in the notebook's inline PEP 723 metadata (# /// script block), not the root project config. Run with --sandbox to use the inline deps.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit fc1dc00.

cursor · 2026-05-20T19:43:56Z

+@app.cell
+def _(delete_button, index_name, mo, pc):
+    mo.stop(not delete_button.value)
+    pc.indexes.delete(index_name)


Delete call uses positional instead of keyword argument

Low Severity

pc.indexes.delete(index_name) uses a positional argument, while the same call earlier (line 98) correctly uses name=index_name. The project review rules require preferring named keyword arguments over positional arguments, and the inconsistency within the same notebook makes the example harder to follow.

Triggered by project rule: Bugbot Configuration

Reviewed by Cursor Bugbot for commit fc1dc00.
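
The keyword-argument preference behind this comment can be illustrated with a stand-in class. `FakeIndexes` and the index name are hypothetical and are not part of the Pinecone SDK; the sketch only shows how a keyword-only parameter makes the `name=` form the single accepted spelling.

```python
class FakeIndexes:
    """Hypothetical stand-in for an index manager; not the Pinecone SDK."""

    def __init__(self) -> None:
        self.names = {"semantic-search-demo"}  # made-up index name

    def delete(self, *, name: str) -> None:
        # The bare * makes name keyword-only, so delete("x") raises TypeError.
        self.names.discard(name)


indexes = FakeIndexes()

try:
    indexes.delete("semantic-search-demo")  # positional form is rejected
    positional_rejected = False
except TypeError:
    positional_rejected = True

indexes.delete(name="semantic-search-demo")  # keyword form succeeds
```

In the notebook itself the change suggested by the review would simply be to spell the existing call with `name=index_name`, matching the earlier cell.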

claude and others added 4 commits May 20, 2026 12:46

claude and others added 4 commits May 20, 2026 15:30

chore: remove unused numpy and tqdm dependencies

ca9b8dc

Both were added during development but are no longer used in the notebook; tqdm was replaced by mo.status.progress_bar.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

chore: restore uv.lock to main state

e539da8

No pyproject.toml changes on this branch that would affect the lock file.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

chore: fix ruff formatting

fc1dc00

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

jhamon wants to merge 8 commits into main from semantic-search-marimo
