
Experiment: DOMShell vs Vision (browser-use) with same local model #28

@apireno


Summary

Run the same Wikipedia tasks from experiments/nexa/ using Nexa's vision-based Web-Agent-Qwen3VL (Playwright + screenshots) and compare against the DOMShell results.

Why

The current Nexa experiment compares 1.7B/4B local models against Claude Opus — that's a model-size comparison, not an interface comparison. To validate DOMShell's text/AX-tree design, we need an apples-to-apples test: same model, same tasks, different interface (DOMShell text vs Playwright screenshots).

Approach

  • Use Nexa's existing Web-Agent cookbook (Playwright + browser-use + Qwen3-VL)
  • Run the same 3 Wikipedia tasks from experiments/nexa/nexa_prompts.md
  • Compare: tool calls, correctness, completeness, token usage
  • Key question: does DOMShell's structured text approach outperform pixel-based browsing at the same model size?
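The comparison step above could be tabulated with a small helper. This is a sketch only: the field names (`tool_calls`, `correct`, `tokens`) are hypothetical and not taken from the existing experiments/nexa/ harness.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    task: str        # e.g. one of the 3 Wikipedia tasks
    interface: str   # "domshell" or "vision" (hypothetical labels)
    tool_calls: int  # number of browser/tool actions taken
    correct: bool    # did the final answer match the rubric?
    tokens: int      # total prompt + completion tokens

def summarize(results: list[TaskResult]) -> dict[str, dict[str, float]]:
    """Aggregate success rate, mean tool calls, and mean tokens per interface."""
    summary: dict[str, dict[str, float]] = {}
    for iface in {r.interface for r in results}:
        rs = [r for r in results if r.interface == iface]
        summary[iface] = {
            "success_rate": sum(r.correct for r in rs) / len(rs),
            "mean_tool_calls": sum(r.tool_calls for r in rs) / len(rs),
            "mean_tokens": sum(r.tokens for r in rs) / len(rs),
        }
    return summary
```

Feeding both runs' per-task records through one aggregator like this keeps the DOMShell and vision numbers directly comparable.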

Expected Outcome

DOMShell should be more token-efficient (text vs images) and may enable smaller models to succeed where vision models need 4B+ parameters.
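The token-efficiency claim can be sanity-checked with back-of-envelope arithmetic before running anything. The constants below are assumptions, not measurements: Qwen-VL-family encoders spend very roughly one visual token per 28x28-pixel patch (after 2x2 patch merging), and English text averages about 4 characters per token.

```python
import math

def screenshot_tokens(width: int, height: int, px_per_token: int = 28) -> int:
    """Rough visual-token count for one screenshot, assuming ~28x28 px
    per token (Qwen2-VL-style encoder; an assumption, not a measurement)."""
    return math.ceil(width / px_per_token) * math.ceil(height / px_per_token)

def text_tokens(n_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough text-token count for a DOMShell/AX-tree observation."""
    return math.ceil(n_chars / chars_per_token)

# One 1280x720 screenshot vs. a hypothetical ~6 KB accessibility-tree dump:
per_step_vision = screenshot_tokens(1280, 720)  # 46 * 26 patches
per_step_text = text_tokens(6000)
```

Note the crossover depends heavily on screenshot resolution and on how large the text observation is per step, so the real answer has to come from the logged token counts of the two runs; these helpers just make the comparison explicit.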

References

  • experiments/nexa/ — existing DOMShell results (0/12 with 1.7B-4B)
  • integrations/nexa/agent.py — DOMShell agent
  • Web-Agent-Qwen3VL cookbook — vision-based comparison target

Labels: enhancement (New feature or request)
