feat: add semantic search marimo notebook #581
Converts docs/semantic-search.ipynb to a marimo notebook with:
- Pinecone SDK 9.0.1 API (pc.indexes.*, pc.index(), new search signature)
- Refactored dataset preparation into prepare_sentences/to_records functions
- Keyword filtering with deduplication
- mo.ui.table for dataset inspection
- mo.status.progress_bar replacing tqdm
- mo.ui.run_button for safe index deletion
- Improved prose structure with explanations interspersed between code cells

Also pins notebook dependencies in pyproject.toml.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Replace print-based result output with mo.vstack containing a bold query header and a mo.ui.table, and fix three cells that were incorrectly configured as markdown cells.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Switch embedding model to multilingual-e5-large
- Refactor data prep: filter_pairs + extract_sentences(lang) to embed both English and Spanish sentences with prefixed IDs
- Upsert both languages to a single namespace
- Add lang column to search results table
- Add cross-lingual queries (English + Spanish) and a no-keyword query to demonstrate meaning-over-keywords retrieval
- Add language filtering section with lang= parameter on search()
- Update How It Works to explain model selection's role in vector space
- Improve prose throughout querying and filtering sections

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
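As a rough illustration of the "prefixed IDs" idea in this commit, the sketch below builds records for two languages so that their IDs cannot collide in a single namespace. The helper name make_records and the field names _id, text, and lang are assumptions for illustration; the notebook's actual to_records function and record schema may differ.

```python
from typing import Iterable


def make_records(sentences: Iterable[str], lang: str) -> list[dict]:
    """Hypothetical helper: build records with language-prefixed IDs.

    Duplicate and empty sentences are skipped, so IDs stay dense per language.
    """
    records: list[dict] = []
    seen: set[str] = set()
    for sentence in sentences:
        text = sentence.strip()
        if not text or text in seen:
            continue
        seen.add(text)
        # The language prefix keeps "en-0" and "es-0" distinct in one namespace.
        records.append({"_id": f"{lang}-{len(records)}", "text": text, "lang": lang})
    return records


english = make_records(["The cat sleeps.", "The cat sleeps.", "It is raining."], "en")
spanish = make_records(["El gato duerme.", "Está lloviendo."], "es")
all_records = english + spanish
```

Because every ID carries its language prefix, the combined list can be upserted into one namespace without one language overwriting the other.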
- Add Try It Yourself section with mo.ui.text and mo.ui.radio for language filter, results update reactively on input change
- Fix empty cell and duplicate filter query cells
- Correct second language filter query to Spanish

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
        top_k=top_k,
        inputs={"text": query},
        filter={"lang": {"$eq": lang}} if lang else None,
    )
Incorrect index.search call shape
High Severity
index.search passes top_k, inputs, and filter as top-level keyword arguments without a query dict. For integrated text search, Pinecone v9 still expects query={"inputs": {"text": ...}, "top_k": ..., "filter": ...}, so these calls will raise a TypeError or fail validation and break all query cells.
Reviewed by Cursor Bugbot for commit 72b422c.
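For reference, the nested shape the review comment describes can be sketched as a plain dict builder. build_search_query is a hypothetical helper and not part of the Pinecone SDK; whether SDK 9.0.1 expects this nested form or the flattened keyword form used in the diff should be confirmed against the SDK's own documentation.

```python
from typing import Any, Optional


def build_search_query(text: str, top_k: int, lang: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: build the nested query payload from the review comment."""
    query: dict[str, Any] = {"inputs": {"text": text}, "top_k": top_k}
    if lang:
        # Only attach a filter when a language was requested, instead of passing None.
        query["filter"] = {"lang": {"$eq": lang}}
    return query


filtered = build_search_query("How do I get to the station?", top_k=5, lang="es")
unfiltered = build_search_query("How do I get to the station?", top_k=5)
```

If the reviewer's reading is correct, the returned dict would be what gets passed as the query argument; the sketch itself makes no SDK calls.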
Both were added during development but are no longer used in the notebook; tqdm was replaced by mo.status.progress_bar.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
datasets and pinecone were added to [project.dependencies] by marimo's package manager during development. Notebook-specific deps belong in the notebook's inline PEP 723 metadata (# /// script block), not the root project config. Run with --sandbox to use the inline deps.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
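A minimal sketch of what such an inline PEP 723 block could look like at the top of the notebook file. The datasets and pinecone pins are the ones listed in this PR's description; the marimo entry is an assumption, and the notebook's real header may list different packages or a Python version constraint.

```python
# Assumed header for docs/semantic-search.py; the "marimo" entry is an assumption.

# /// script
# dependencies = [
#     "marimo",
#     "datasets==3.5.1",
#     "pinecone==9.0.1",
# ]
# ///
```

With a header like this, running the notebook with marimo's --sandbox flag resolves the listed packages in an isolated environment rather than from the root project config.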
No pyproject.toml changes on this branch that would affect the lock file.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit fc1dc00.
    @app.cell
    def _(delete_button, index_name, mo, pc):
        mo.stop(not delete_button.value)
        pc.indexes.delete(index_name)
Delete call uses positional instead of keyword argument
Low Severity
pc.indexes.delete(index_name) uses a positional argument, while the same call earlier (line 98) correctly uses name=index_name. The project review rules require preferring named keyword arguments over positional arguments, and the inconsistency within the same notebook makes the example harder to follow.
Triggered by project rule: Bugbot Configuration
Reviewed by Cursor Bugbot for commit fc1dc00.
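To illustrate the keyword-argument rule cited in this comment, here is a small sketch of a keyword-only signature. delete_index is a hypothetical stand-in written for this example; it is not the Pinecone SDK's delete method, and nothing here implies the SDK rejects positional arguments.

```python
def delete_index(*, name: str) -> str:
    """Hypothetical stand-in for an index-deletion call; not the Pinecone SDK."""
    return f"deleted {name}"


# The keyword form reads unambiguously at the call site.
message = delete_index(name="semantic-search-demo")

# The positional form is rejected by the bare * in the signature.
try:
    delete_index("semantic-search-demo")
    positional_allowed = True
except TypeError:
    positional_allowed = False
```

A bare * in a project's own helpers turns the style rule into something the interpreter enforces; for third-party calls, the rule remains a review convention.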


Summary
Adds a new marimo notebook demonstrating semantic search with Pinecone, converted and significantly expanded from the existing docs/semantic-search.ipynb. The notebook uses Pinecone's Integrated Inference with the multilingual-e5-large model to demonstrate cross-lingual semantic search across English and Spanish sentences.

Changes
- docs/semantic-search.py (marimo format) with:
  - Pinecone SDK 9.0.1 API (pc.indexes.*, pc.index(), updated search signature)
  - multilingual-e5-large embedding model for cross-lingual retrieval
  - filter_pairs + extract_sentences(lang) to produce both English and Spanish records from Tatoeba
  - to_records parameterized on column name with ID prefixes for multi-language upsert
  - mo.ui.table for dataset inspection, mo.status.progress_bar replacing tqdm, mo.ui.run_button for safe index deletion
  - mo.ui.text and mo.ui.radio for language filter
  - en/es
- pyproject.toml: pins notebook dependencies (datasets==3.5.1, pinecone==9.0.1, numpy, tqdm)

Test Plan
- PINECONE_API_KEY
- en or es

🤖 Generated with Claude Code
Note
Low Risk
Low risk: this PR only adds a new documentation notebook/script and does not modify production code paths; the main impact is increased dependency/runtime expectations when running the notebook (Pinecone API key, index create/delete).
Overview
Adds a new docs/semantic-search.py Marimo notebook that walks through semantic search with Pinecone Integrated Inference, including index creation with the multilingual-e5-large model, preparing English/Spanish Tatoeba records, batched upsert_records, and index.search queries.

The notebook also adds an interactive section for running queries with an optional lang metadata filter, plus a guarded cleanup flow to delete the created index via a UI button.

Reviewed by Cursor Bugbot for commit fc1dc00. Bugbot is set up for automated code reviews on this repo.
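The batched upsert mentioned in the overview can be sketched as a plain chunking helper. The name batched and the batch size used below are assumptions for illustration; the notebook's own batching code and the batch size it sends to upsert_records are not shown on this page.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Hypothetical helper: yield successive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    # islice pulls up to `size` items each pass; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


records = [{"_id": f"en-{i}", "text": f"sentence {i}", "lang": "en"} for i in range(5)]
batch_sizes = [len(batch) for batch in batched(records, size=2)]
```

Each yielded batch would then be handed to the upsert call in turn, which also gives a natural unit for a progress indicator to advance on.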