[Feature Request] 'Vibe' Benchmark

A big part of OpenClaw's user experience is the 'vibe': How authentically human-like an agent is.

- Does it tell funny jokes?
- Is it sassy (as much as SOUL.md tells it to be)?
- Do the responses feel 'human'?
- Does it use casual punctuation?

Because of how important this is, it would be useful to have a 'Vibe' benchmark on PinchBench.

The evals for this might require human attention, since 'success' on many of these factors is an inherently subjective human opinion.
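To make the human-eval idea a bit more concrete, here is a rough sketch of how ratings from human reviewers could be rolled up into a score. Everything in it is an assumption for illustration: the criterion names (lifted from the questions above), the 1 to 5 scale, and the `Rating` / `vibe_scores` names are hypothetical and not part of PinchBench or OpenClaw.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical criteria, loosely based on the questions listed above.
CRITERIA = ("humor", "sass", "human_feel", "casual_punctuation")


@dataclass(frozen=True)
class Rating:
    rater: str      # who gave the rating
    criterion: str  # one of CRITERIA
    score: int      # assumed scale: 1 (poor) to 5 (great)


def vibe_scores(ratings):
    """Average each criterion across raters, then average the criteria.

    Returns (per_criterion, overall). Criteria with no ratings are
    skipped; overall is None when there are no ratings at all.
    """
    per_criterion = {}
    for criterion in CRITERIA:
        scores = [r.score for r in ratings if r.criterion == criterion]
        if scores:
            per_criterion[criterion] = mean(scores)
    overall = mean(list(per_criterion.values())) if per_criterion else None
    return per_criterion, overall
```

Averaging per criterion first keeps one heavily rated factor from dominating the overall number. Since the judgments are subjective, it would probably also be worth tracking how much raters disagree, but that is left out here.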

[Feature Request] 'Vibe' Benchmark #92
