Add query expansion to improve retrieval for ambiguous questions #249

@neuromechanist

Description

Problem

When users ask questions with ambiguous or overloaded terms, the assistant's retrieval tools may not find the most relevant documentation. This came up in feedback from MNE-Python maintainers (mne-tools/mne-python#13702):

  1. BDF status channel question: "How do I parse the status channel of a BDF file correctly?" -- The word "status" is ambiguous (general status vs. the specific BioSemi status channel). The assistant retrieved general BDF reading docs but missed the specific status channel behavior documented in read_raw_bdf().

  2. Eyetracking unit conversion: "How can I convert the units of eyetracking data from pixels-on-screen to radians of visual angle using MNE-Python?" -- The assistant said MNE doesn't have built-in functions for this, but mne.preprocessing.eyetracking.convert_units exists and is used in 3 tutorials.

In both cases, rephrasing the question (e.g., putting status in backticks, or asking about the specific function) produced correct answers. The retrieval worked; the query formulation was the bottleneck.

Proposed Solution

Add a query expansion step before tool calls. When the user asks a question, the agent (or a lightweight pre-processing step) generates multiple reformulations to improve retrieval coverage:

Original: "How do I parse the status channel of a BDF file correctly?"
Expanded queries:

  • read_raw_bdf status channel
  • BDF status channel trigger events
  • BioSemi status channel parsing

Original: "How can I convert eyetracking data from pixels to radians?"
Expanded queries:

  • eyetracking convert_units pixels radians
  • mne.preprocessing.eyetracking
  • eye tracking unit conversion visual angle
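
Issuing several queries only helps if their hits are merged back into a single ranking. The sketch below shows one way to do that, assuming a hypothetical `search(query)` function that returns document IDs best-first. It uses reciprocal rank fusion (RRF), which is our choice for illustration and not something this issue prescribes. `fuse_results`, `TOY_INDEX`, and the document IDs are all made up.

```python
from collections import defaultdict
from typing import Callable


def fuse_results(
    queries: list[str],
    search: Callable[[str], list[str]],
    k: int = 60,
    top_n: int = 5,
) -> list[str]:
    """Run every query and merge the ranked hits with reciprocal rank fusion.

    `search` is a hypothetical retrieval function returning document IDs,
    best match first. `k` damps the advantage of top-ranked hits; 60 is the
    constant commonly used for RRF.
    """
    scores: dict[str, float] = defaultdict(float)
    for query in queries:
        for rank, doc_id in enumerate(search(query), start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:top_n]


# Toy index standing in for the real documentation search (hypothetical IDs).
TOY_INDEX = {
    "How do I parse the status channel of a BDF file correctly?": [
        "tutorial_reading_bdf", "status_page"],
    "read_raw_bdf status channel": ["read_raw_bdf", "tutorial_reading_bdf"],
    "BDF status channel trigger events": ["read_raw_bdf", "find_events"],
    "BioSemi status channel parsing": ["read_raw_bdf"],
}

queries = list(TOY_INDEX)
print(fuse_results(queries, lambda q: TOY_INDEX.get(q, [])))
# → ['read_raw_bdf', 'tutorial_reading_bdf', 'find_events', 'status_page']
```

In this toy run the original question alone ranks a general tutorial first, while the fused ranking puts the `read_raw_bdf` page on top because three of the expanded queries agree on it.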

Implementation Options

  1. Prompt-based expansion: Add instructions to the system prompt telling the agent to generate multiple search queries per question and call tools multiple times with different phrasings.
  2. Pre-processing agent: A lightweight agent node that runs before the main agent, expanding the user's question into multiple retrieval queries. This could be a simple LLM call with a focused prompt.
  3. Tool-level expansion: Modify the retrieval tools themselves to do query expansion internally (e.g., generate synonyms, extract technical terms).
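
Option 3 could start without any model call at all. Here is a minimal sketch of tool-level expansion, assuming a hand-curated synonym table per community (the `SYNONYMS` entries and `expand_query` name are hypothetical). It pulls identifier-like tokens (snake_case or dotted names) into their own query and substitutes known synonyms as whole words.

```python
import re

# Hypothetical synonym table for one community's docs; a real deployment
# would curate (or generate) one per documentation corpus.
SYNONYMS: dict[str, list[str]] = {
    "bdf": ["BioSemi"],
    "eyetracking": ["eye tracking"],
    "radians": ["visual angle"],
}


def expand_query(question: str) -> list[str]:
    """Return the question plus synonym- and identifier-based variants."""
    # Word-like tokens; trailing/leading dots (sentence punctuation) removed.
    tokens = [t.strip(".") for t in re.findall(r"[A-Za-z0-9_.]+", question)]
    expansions = [question]

    # Identifier-like tokens (snake_case or dotted paths) become their own
    # query, since exact API names tend to match docstrings best.
    identifiers = [t for t in tokens if "_" in t or "." in t]
    if identifiers:
        expansions.append(" ".join(identifiers))

    # One variant per known synonym, substituted as a whole word.
    for token in tokens:
        for alt in SYNONYMS.get(token.lower(), []):
            pattern = rf"\b{re.escape(token)}\b"
            expansions.append(re.sub(pattern, alt, question))

    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(expansions))


print(expand_query("How can I convert eyetracking data from pixels to radians?"))
```

A static table like this cannot guess `convert_units` from "convert", which is one reason the LLM-based options remain attractive.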

Option 1 is simplest and can be tried first. Option 2 is more robust for production.
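
A pre-processing node for Option 2 might look like the following sketch. It assumes a hypothetical `complete(prompt) -> str` callable wrapping whatever model client the deployment uses; the prompt wording, `expand_with_llm`, and the 1 s timeout (taken from the latency criterion) are illustrative choices, not a settled design. If the model is slow or fails, the node falls back to the raw question so expansion never blocks an answer.

```python
import concurrent.futures
import re
from typing import Callable

EXPANSION_PROMPT = (
    "Rewrite the user's question as {n} short search queries for a "
    "documentation search tool. Prefer exact function, class, and module "
    "names when you can infer them. Return one query per line, nothing else."
    "\n\nQuestion: {question}"
)

# Leading list markers models often add: "-", "*", "•", "1.", "2)".
_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s*")

_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def expand_with_llm(
    question: str,
    complete: Callable[[str], str],
    n: int = 3,
    timeout_s: float = 1.0,
) -> list[str]:
    """Ask an LLM for reformulations; fall back to the raw question.

    The original question is always kept first, so expansion can only add
    retrieval coverage, never remove it.
    """
    prompt = EXPANSION_PROMPT.format(n=n, question=question)
    future = _POOL.submit(complete, prompt)
    try:
        raw = future.result(timeout=timeout_s)
    except Exception:  # Timeout or model/client error: skip expansion.
        return [question]
    queries = [question]
    for line in raw.splitlines():
        cleaned = _MARKER.sub("", line).strip()
        if cleaned and cleaned not in queries:
            queries.append(cleaned)
    return queries[: n + 1]
```

Note that a timed-out call keeps running in its worker thread; a production version would also set the model client's own request timeout.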

Acceptance Criteria

  • Ambiguous questions produce expanded search queries
  • The BDF status channel question returns the correct read_raw_bdf() documentation
  • The eyetracking conversion question finds mne.preprocessing.eyetracking.convert_units
  • Query expansion does not significantly increase response latency (< 1s additional)
  • Works across all communities, not just MNE
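
The two MNE-specific criteria and the latency budget could be turned into a small regression check. This is a sketch under assumptions: `retrieve` is a hypothetical hook that runs expansion plus retrieval and returns the IDs of the documents found, and `baseline_s` is the separately measured latency without expansion. Covering other communities would mean adding their own cases to `CASES`.

```python
import time
from typing import Callable

# Regression cases from the two reports in mne-tools/mne-python#13702.
CASES = [
    ("How do I parse the status channel of a BDF file correctly?",
     "mne.io.read_raw_bdf"),
    ("How can I convert the units of eyetracking data from pixels-on-screen "
     "to radians of visual angle using MNE-Python?",
     "mne.preprocessing.eyetracking.convert_units"),
]


def check_cases(
    retrieve: Callable[[str], list[str]],
    baseline_s: float,
    budget_s: float = 1.0,
) -> list[str]:
    """Return failure messages; an empty list means every case passed."""
    failures = []
    for question, expected in CASES:
        start = time.perf_counter()
        found = retrieve(question)
        extra = time.perf_counter() - start - baseline_s
        if expected not in found:
            failures.append(f"missing {expected!r} for {question!r}")
        if extra > budget_s:
            failures.append(f"{extra:.2f}s over budget for {question!r}")
    return failures
```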

Metadata

Assignees

No one assigned

    Labels

    P1 (Priority 1: Critical, fix as soon as possible)
    chat-experience
    duplicate (This issue or pull request already exists)
    feature (New feature or enhancement)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests