# Experiment: DOMShell vs Raw HTML interface comparison (#29)

## Summary

Run an apples-to-apples comparison of DOMShell's accessibility-tree (AX-tree) filesystem interface against raw HTML scraping, using the **same model** (Qwen3-4B) on the **same tasks**.

## Design

**Core matrix — `[nexa, ollama] x [domshell, html]`:**

| | **DOMShell** | **Raw HTML** |
|---|---|---|
| **Nexa serve** | agent.py via MCP | raw_html_agent.py via requests+BS4 |
| **Ollama** | agent.py via MCP | raw_html_agent.py via requests+BS4 |

All 4 cells use the same Qwen3-4B weights. The only variables are the interface and the backend.

## Tasks (simplified for a 4B model)

1. **Page title** — extract the page title from a Wikipedia article
2. **First paragraph** — extract the opening paragraph
3. **List headings** — list all section headings
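Scoring these three tasks needs a deterministic ground truth per page. The sketch below is one way to derive it; it is an assumption, not the experiment's actual scorer. It uses the standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup and assumes the title is the first `<h1>`, the opening paragraph is the first non-empty `<p>`, and section headings are `<h2>`/`<h3>` elements. Real Wikipedia markup adds noise (e.g. a "Contents" heading or edit-link text), so treat this as simplified. `WikiTaskExtractor` and `extract_tasks` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional


class WikiTaskExtractor(HTMLParser):
    """Collects the first <h1>, the first non-empty <p>, and all <h2>/<h3> texts."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.first_paragraph = ""
        self.headings: List[str] = []
        self._current_tag: Optional[str] = None  # element whose text we are collecting
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Only start collecting at a top-level target element; nested inline
        # tags such as <b> or <a> are skipped, but their text is still captured.
        if tag in ("h1", "h2", "h3", "p") and self._current_tag is None:
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current_tag:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "h1" and not self.title:
            self.title = text
        elif tag == "p" and text and not self.first_paragraph:
            self.first_paragraph = text
        elif tag in ("h2", "h3") and text:
            self.headings.append(text)
        self._current_tag = None


def extract_tasks(html: str) -> dict:
    """Return ground-truth answers for the three tasks from one HTML page."""
    parser = WikiTaskExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "first_paragraph": parser.first_paragraph,
        "headings": parser.headings,
    }
```

Model answers from any of the four cells could then be compared against these values, for example by normalized string match.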

12 trials in total (3 tasks × 4 matrix cells), each capped at 10 turns.
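The trial plan is the Cartesian product of backends, interfaces, and tasks. The sketch below enumerates it; the task identifiers, the `Trial` record, and `build_trials` are hypothetical names, while the agent paths come from the Files section.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

MODEL = "Qwen3-4B"
MAX_TURNS = 10
BACKENDS = ("nexa", "ollama")
INTERFACES = ("domshell", "html")
TASKS = ("page_title", "first_paragraph", "list_headings")  # hypothetical IDs

# Which agent script drives each interface (paths from the Files section).
AGENT_FOR_INTERFACE = {
    "domshell": "integrations/nexa/agent.py",
    "html": "experiments/nexa_interface/raw_html_agent.py",
}


@dataclass(frozen=True)
class Trial:
    backend: str
    interface: str
    task: str
    model: str = MODEL
    max_turns: int = MAX_TURNS

    @property
    def agent(self) -> str:
        return AGENT_FOR_INTERFACE[self.interface]


def build_trials() -> List[Trial]:
    """One trial per (backend, interface, task): 2 x 2 x 3 = 12."""
    return [Trial(b, i, t) for b, i, t in product(BACKENDS, INTERFACES, TASKS)]
```

Holding `model` and `max_turns` constant on every record keeps the interface and backend as the only varying fields.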

## Goal

Determine whether DOMShell's structured interface actually helps small models extract web content compared with feeding them raw HTML. Because the model is held fixed, this tests the **interface design**, not model capability.
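One way to read the results is to compute a success rate per matrix cell and then the DOMShell-minus-HTML difference within each backend, so the interface effect is never compared across backends. This is a sketch; the `(backend, interface, task, succeeded)` result tuple and the function names are assumptions, not the experiment's actual output format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Result = Tuple[str, str, str, bool]  # (backend, interface, task, succeeded)


def success_rates(results: Iterable[Result]) -> Dict[Tuple[str, str], float]:
    """Fraction of successful trials for each (backend, interface) cell."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    wins: Dict[Tuple[str, str], int] = defaultdict(int)
    for backend, interface, _task, ok in results:
        key = (backend, interface)
        totals[key] += 1
        wins[key] += int(ok)
    return {key: wins[key] / totals[key] for key in totals}


def interface_effect(rates: Dict[Tuple[str, str], float],
                     backends=("nexa", "ollama")) -> Dict[str, float]:
    """DOMShell success rate minus raw-HTML success rate, per backend.

    Raises KeyError if a cell has no results, rather than silently assuming 0.
    """
    return {b: rates[(b, "domshell")] - rates[(b, "html")] for b in backends}
```

With 3 trials per cell, each rate can only be 0, 1/3, 2/3, or 1, so any difference is coarse and best read as directional rather than conclusive.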

## Files

- `experiments/nexa_interface/` — experiment infrastructure, prompts, runner script
- `experiments/nexa_interface/raw_html_agent.py` — baseline agent using requests + BeautifulSoup
- `integrations/nexa/agent.py` — DOMShell agent (already exists)

## Related

- Previous experiment: `experiments/nexa_claude/` (model size comparison, 0/12 tasks completed)
- Follows up on a roadmap item in `README.md`
