Search engine for RoboCup Team Description Papers (TDPs). Hybrid dense + sparse search over 2000+ papers across all RoboCup leagues.
Live at tdpsearch.com.
Rust workspace with the following crates:
| Crate | Type | Description |
|---|---|---|
| `web` | Binary | Axum HTTP API server |
| `mcp` | Binary | MCP (Model Context Protocol) server for LLM integration |
| `frontend` | SvelteKit | Web UI with Tailwind CSS |
| `api` | Library | Shared business logic (search, list, filter) |
| `data_access` | Library | Trait-based clients: Qdrant, SQLite, OpenAI |
| `data_processing` | Library | Chunking, embedding, IDF, search orchestration |
| `data_structures` | Library | Shared types (TDPName, League, Chunk, ContentItem, Filter) |
| `event_processing` | Library | Fire-and-forget event system: activity logging, Telegram notifications |
| `configuration` | Library | Config loading and client initialization |
| `tools` | Binaries | CLI tools for initialization, search, and analytics |

- Rust (edition 2024)
- Docker (for Qdrant)
- Node.js 22+ (for frontend)

- Start the Qdrant vector database: `make qdrant-restart`
- Create `config.toml` from the example: `cp config.toml.example config.toml`. Fill in your OpenAI API key and the path to your TDP markdown files.

  Note: `embedding_size` must match the embed model's output dimension. If you change models, re-run `make init` to rebuild the Qdrant collection; mismatches cause silent failures.
- Initialize the database (parse TDPs, compute embeddings, build IDF): `make init`
- Start the web server and frontend:
  - `make web`: API server on :50000
  - `make ui`: SvelteKit dev server on :50080

  Note: `make web` runs `cargo run -p web`, which serves the built frontend from `./static/`. If you run it directly without `make ui`, create a symlink first: `ln -s frontend/build static`.

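The `embedding_size` note can be made concrete with a small guard. This is a minimal sketch, not code from this repository: the function name `check_embedding_size` and its error message are hypothetical, and it only illustrates the kind of check that would turn a silent dimension mismatch into a visible error.

```rust
/// Hypothetical guard: compares the configured `embedding_size` with the
/// length of a vector returned by the embedding model.
fn check_embedding_size(configured: usize, embedding: &[f32]) -> Result<(), String> {
    if embedding.len() == configured {
        Ok(())
    } else {
        // Surface the mismatch instead of writing a wrong-sized vector.
        Err(format!(
            "embedding_size is {} but the model returned {} dimensions",
            configured,
            embedding.len()
        ))
    }
}
```

Running a check like this once on the first embedding produced during `make init` would be enough to catch a model change before anything is written to Qdrant.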
make docker # build and start all services
make docker-logs # follow logs
make docker-down # stop
Before building Docker images, create your configuration files:
# Create Docker config from example
cp config.docker.toml.example config.docker.toml
# Edit config.docker.toml and add your OpenAI API key (or leave empty and use env vars)
# Create .env file for runtime overrides
cp .env.example .env
# Edit .env and configure your API keys

Docker images bake in `config.docker.toml` as the default config. Settings can be overridden at runtime via environment variables using the `TDP_` prefix and `__` (double underscore) as a separator for nested keys.
For example, to set the OpenAI API key:
TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY=sk-proj-...
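To illustrate the naming convention, the sketch below maps a variable name to a nested key path. It is an assumption-laden illustration rather than the project's actual config loader: the function `env_var_to_key_path` is hypothetical, and it assumes that a single `_` follows the prefix and that keys are lowercased.

```rust
/// Hypothetical helper: turns `TDP_A__B__C` into the key path ["a", "b", "c"].
/// Assumes one `_` after the prefix and `__` between nested keys.
fn env_var_to_key_path(name: &str, prefix: &str) -> Option<Vec<String>> {
    // Variables without the prefix are not config overrides.
    let rest = name.strip_prefix(prefix)?.strip_prefix('_')?;
    if rest.is_empty() {
        return None;
    }
    // Split on the double underscore and lowercase each segment.
    Some(rest.split("__").map(|segment| segment.to_lowercase()).collect())
}
```

Under these assumptions, `TDP_DATA_ACCESS__EMBED__OPENAI__API_KEY` maps to the path `data_access`, `embed`, `openai`, `api_key`, which would correspond to an `api_key` entry in a `[data_access.embed.openai]` table of the config file.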
There are two ways to pass environment variables to the containers:

- `env_file` directive in `docker-compose.yml`: add `env_file: .env` to a service to inject all variables from `.env` directly into the container.
- `environment` block with interpolation: reference host/`.env` variables using `${VAR}` syntax in `docker-compose.yml`. Note: Docker Compose auto-loads `.env` only for `${...}` interpolation within the compose file itself; it does not automatically pass `.env` variables into containers.

To get started, copy the example env file and fill in your values:
cp .env.example .env
The `mcp` and `web` services use `depends_on` with a health check on Qdrant's `/healthz` endpoint. They will not start until Qdrant is ready to accept connections. If you need to rebuild after changing `config.docker.toml`, run `docker compose up --build`, since the config is copied into the image at build time.
All CLI tools live in the `tools` crate and are run via `cargo run -p tools --bin <name>`.
Parses TDP markdown files, computes embeddings, builds IDF, and upserts everything into Qdrant + SQLite.
make init
# or: cargo run --release -p tools --bin initialize
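For readers unfamiliar with the IDF step, here is a minimal sketch of inverse document frequency over tokenized chunks. The README does not state which IDF variant the `initialize` tool uses, so the smoothed formula `ln(1 + N / df)` and the function name `compute_idf` are assumptions made for illustration only.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical IDF computation: ln(1 + N / df) per term, where N is the
/// number of documents and df the number of documents containing the term.
fn compute_idf(documents: &[Vec<&str>]) -> HashMap<String, f64> {
    let n = documents.len() as f64;
    let mut document_frequency: HashMap<&str, usize> = HashMap::new();
    for doc in documents {
        // Count each term at most once per document.
        let unique: HashSet<&str> = doc.iter().copied().collect();
        for term in unique {
            *document_frequency.entry(term).or_insert(0) += 1;
        }
    }
    document_frequency
        .into_iter()
        .map(|(term, df)| (term.to_string(), (1.0 + n / df as f64).ln()))
        .collect()
}
```

Terms that occur in few documents receive a higher weight, which is what lets the sparse side of the search favor distinctive keywords.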
End-to-end verification: searches every (league, year) combination across all three search types (sparse, dense, hybrid) against a live Qdrant instance. Run after reindexing to catch filter mismatches or embedding alignment issues.
make smoke-test
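The coverage described above amounts to a cross product of (league, year) pairs and search types. The sketch below enumerates those cases; `SearchType` and `smoke_test_cases` are hypothetical names, and the real tool presumably reads the pairs from the index rather than taking them as an argument.

```rust
/// The three search types named in the smoke-test description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SearchType {
    Sparse,
    Dense,
    Hybrid,
}

/// Hypothetical helper: builds one test case per (league, year, search type).
fn smoke_test_cases<'a>(pairs: &[(&'a str, u32)]) -> Vec<(&'a str, u32, SearchType)> {
    let mut cases = Vec::new();
    for &(league, year) in pairs {
        for search_type in [SearchType::Sparse, SearchType::Dense, SearchType::Hybrid].iter() {
            cases.push((league, year, *search_type));
        }
    }
    cases
}
```

Each case would then be run as a filtered query against the live Qdrant instance, with a failure reported when a combination that exists in the corpus returns no results.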
CLI search with --mode (dense/sparse/hybrid) and --type (text/table/image) flags.
make search "omniwheels"
make search-text "omniwheels" # text content only
make search-table "omniwheels" # tables only
make search-image "omniwheels" # images only
# or: cargo run -p tools --bin search_by_sentence -- "omniwheels" --type text
Query the activity log database for usage reports and scraper detection.
cargo run -p tools --bin activity -- summary # event counts by type/source, top queries
cargo run -p tools --bin activity -- summary --since 2025-06-01
cargo run -p tools --bin activity -- recent # last 20 events
cargo run -p tools --bin activity -- recent --limit 50
cargo run -p tools --bin activity -- agents # user-agent and IP breakdown
cargo run -p tools --bin activity -- agents --since 2025-06-01
Or via Make:
make activity ARGS="summary"
make activity ARGS="agents --since 2025-06-01"
make activity-docker ARGS="summary" # query the Docker-deployed activity DB
Corpus coverage analysis: checks parsing completeness, indexing status, and metadata gaps.
cargo run -p tools --bin coverage # all reports
cargo run -p tools --bin coverage -- parsing # PDFs vs markdowns
cargo run -p tools --bin coverage -- indexing # disk vs DB
cargo run -p tools --bin coverage -- heatmap # league×year grid
cargo run -p tools --bin coverage -- teams # missing team metadata
Generate an HMAC authentication code for a team.
cargo run -p tools --bin generate_team_code -- --team "RoboTeam Twente"
Upsert metadata (website, GitHub, socials) for a team in the registry.
cargo run -p tools --bin set_team_metadata -- --team "RoboTeam Twente" --key "github" --value "https://github.com/RoboTeamTwente"
Upsert metadata for a league in the registry.
cargo run -p tools --bin set_league_metadata -- --league "Soccer SmallSize" --key "github" --value "https://github.com/RoboCup-SSL"
Services:
| Target | Description |
|---|---|
| `make web` | Start the Axum API server on :50000 |
| `make mcp` | Start the MCP servers (:50001 open, :50002 OAuth) |
| `make ui` | Start the SvelteKit dev server on :50080 |

Tools:
| Target | Description |
|---|---|
| `make init` | Initialize database (parse, embed, index) |
| `make smoke-test` | End-to-end search verification across all leagues/years |
| `make search "query"` | Hybrid search for a query |
| `make search-text "query"` | Search text content only |
| `make search-table "query"` | Search tables only |
| `make search-image "query"` | Search images only |
| `make activity ARGS="..."` | Run the activity analytics CLI |
| `make activity-docker ARGS="..."` | Activity analytics against Docker-deployed DB |

Infrastructure:
| Target | Description |
|---|---|
| `make qdrant-restart` | Restart Qdrant Docker container |
| `make qdrant-snapshot` | Create Qdrant snapshot for Docker image |
| `make rebuild-index` | Full teardown → reindex → snapshot → Docker rebuild |
| `make docker` | Build and start all services via Docker Compose |
| `make docker-logs` | Follow Docker Compose logs |
| `make docker-down` | Stop Docker Compose |
| `make leagues` | Quick API test: list all leagues |

The MCP server exposes TDP search functionality to LLMs. Available tools:
- `search` - Hybrid semantic + keyword search across all TDPs
- `list_papers` - List papers with optional league/year/team filters
- `list_teams` - List team names with optional hint filter
- `list_leagues` - List all RoboCup leagues
- `list_years` - List years with optional league/year/team filters
- `get_tdp_contents` - Retrieve full markdown of a specific paper
- `get_table_of_contents` - Get the structured table of contents of a paper
- `get_abstract` - Get a paper's abstract
- `get_section` - Get a specific section by content sequence number
- `get_references` - Get the references/bibliography of a paper
- `get_paper_info` - Get paper metadata (team, league, year, authors)
- `get_team_info` - Get team metadata (website, GitHub, socials)
- `get_league_info` - Get league metadata (official sites, GitHub orgs, rules, socials)
- `submit_suggestion` - Submit feedback or suggestions about the TDP search system

cargo run -p mcp
All interactions (searches, paper opens, list operations) are logged to data/activity.db from both Web and MCP sources. HTTP requests from the web server also capture IP and user-agent for scraper detection.
Configure in config.toml:
[event_processing.activity.sqlite]
filename = "data/activity.db"
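
As a rough illustration of fire-and-forget activity logging, the sketch below sends events over a channel to a background thread so the caller never waits on storage. It is not the `event_processing` crate's implementation: `ActivityEvent`, `spawn_activity_logger`, and `record` are hypothetical names, and an in-memory `Vec` stands in for the SQLite file.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical event shape: where the interaction came from and what it was.
#[derive(Debug, Clone, PartialEq)]
struct ActivityEvent {
    source: String,
    kind: String,
}

/// Starts a background logger and returns a sender plus the worker handle.
fn spawn_activity_logger() -> (Sender<ActivityEvent>, JoinHandle<Vec<ActivityEvent>>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut stored = Vec::new();
        // The loop ends once every sender has been dropped.
        for event in rx {
            stored.push(event); // stand-in for a write to data/activity.db
        }
        stored
    });
    (tx, handle)
}

/// Fire-and-forget: a failed send is ignored so request handling never fails
/// because of logging.
fn record(tx: &Sender<ActivityEvent>, source: &str, kind: &str) {
    let _ = tx.send(ActivityEvent {
        source: source.to_string(),
        kind: kind.to_string(),
    });
}
```

The design choice this illustrates is that request handlers only enqueue an event; persistence happens elsewhere, so a slow or unavailable activity database does not delay search responses.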