This document defines a reusable development method for projects where:
- the product owner may not write code,
- requirements are often initially vague,
- an AI agent is expected to carry most of the engineering execution,
- quality still needs to be controlled through clear process and verification.
This method is designed to be reused across products, not only inside one repository.
The method is:
Harness-First Agentic Development
It is a layered combination of three ideas:
- Agentic Engineering defines the collaboration model.
- Harness Engineering defines the default execution system.
- Ralph-style loop defines the local recursive bugfix mechanism.
The key point is that these three ideas are not peers in practice.
- Agentic is the top-level collaboration paradigm.
- Harness is the primary operating model.
- Ralph-style loop is a local subroutine for specific problem classes.
Agentic Engineering answers:
- Who is responsible for what?
- How do humans and agents work together?
The human product owner is responsible for:
- defining goals,
- setting priorities,
- evaluating whether the product experience is right,
- providing final acceptance from a real user perspective.
The human product owner is not required to:
- design database schemas,
- design system architecture,
- implement code,
- debug infrastructure,
- write automated tests.
The agent is responsible for:
- turning vague requirements into executable plans,
- maintaining document consistency,
- implementing frontend, backend, data, and AI workflow changes,
- writing or updating tests,
- verifying builds,
- handling deployments and bugfixes,
- translating user feedback into concrete next actions.
The governing principle is:
Humans steer. Agents execute.
Harness Engineering is the main operating model.
It answers:
- How do we constrain and stabilize agent output?
- How do we maintain quality?
- How do we prevent drift between intent, implementation, and delivery?
The harness is composed of six parts.
Purpose:
- prevent direction drift,
- define scope and non-goals,
- anchor the main user path.
Typical artifacts:
- product context,
- PRD,
- journey / roadmap,
- phase plans,
- acceptance criteria.
Purpose:
- define the implementation boundary and repo rules.
Typical artifacts:
- repository rules,
- directory structure,
- naming conventions,
- commit conventions,
- UI language rules,
- deployment rules.
Purpose:
- constrain model behavior,
- reduce hallucination and output drift,
- enforce structured outputs.
Typical artifacts:
- prompts,
- schemas,
- model routing,
- fallback strategies,
- token / latency budgets.
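As an illustration of how schemas and fallback strategies can work together, the sketch below validates a model's raw output against a small schema and moves to the next model in routing order when validation fails. The field names in `REQUIRED_FIELDS`, the function names, and the two stub models are assumptions made for this example, not part of any specific product or library.

```python
import json
from typing import Callable, Optional

# Hypothetical schema: required field name -> expected Python type.
REQUIRED_FIELDS = {"summary": str, "confidence": float}


def parse_structured(raw: str) -> Optional[dict]:
    """Return the parsed object if it matches REQUIRED_FIELDS, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            return None
    return data


def call_with_fallback(prompt: str, models: list[Callable[[str], str]]) -> dict:
    """Try each model in routing order; return the first schema-valid output."""
    for model in models:
        parsed = parse_structured(model(prompt))
        if parsed is not None:
            return parsed
    raise ValueError("no model produced schema-valid output")


# Stub models standing in for real model calls (illustrative only).
def chatty_model(prompt: str) -> str:
    return "Sure! Here is a summary of your request."


def strict_model(prompt: str) -> str:
    return json.dumps({"summary": "ok", "confidence": 0.9})


result = call_with_fallback("Summarize the ticket.", [chatty_model, strict_model])
```

Here the first stub returns free text, so its output is rejected and the second stub's JSON is accepted. A real harness would likely also log each rejection and enforce the token and latency budgets before trying the next model.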
Purpose:
- ensure that “it seems to work” is not mistaken for completion.
Typical artifacts:
- unit tests,
- route tests,
- integration tests,
- build checks,
- smoke tests,
- regression tests.
Purpose:
- keep local, staging, and production behavior aligned.
Typical artifacts:
- GitHub repository,
- CI/CD pipeline,
- environment variable policy,
- migration process,
- production verification checklist.
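One way an environment variable policy might be checked is to compare each environment's defined variables against a required set. The sketch below is a minimal example under that assumption; the variable names and environment names are hypothetical placeholders.

```python
def find_missing_variables(
    required: set[str], environments: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Map each environment name to the required variables it lacks."""
    gaps: dict[str, list[str]] = {}
    for name, defined in environments.items():
        missing = sorted(required - defined)
        if missing:
            gaps[name] = missing
    return gaps


# Hypothetical policy and environment snapshots (names only, never values).
REQUIRED = {"DATABASE_URL", "MODEL_API_KEY"}
ENVIRONMENTS = {
    "local": {"DATABASE_URL", "MODEL_API_KEY"},
    "staging": {"DATABASE_URL"},
    "production": {"DATABASE_URL", "MODEL_API_KEY", "SENTRY_DSN"},
}

gaps = find_missing_variables(REQUIRED, ENVIRONMENTS)
```

In this example only `staging` is reported, because it lacks `MODEL_API_KEY`. Extra variables, such as the one in `production`, are not flagged.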
Purpose:
- turn real user experience into system improvement.
Typical artifacts:
- user testing notes,
- bug reports,
- support tickets,
- product feedback logs,
- regression scenarios learned from production.
The Ralph-style loop is not the global development method.
It is a local recursive problem-solving loop for narrow, well-defined issues.
Use it for:
- reproducible bugs,
- small refactors,
- schema or prompt compatibility issues,
- documentation / implementation drift.
Do not use it as the primary method for:
- product direction setting,
- information architecture redesign,
- major architectural changes,
- strategic business decisions.
The loop runs these steps in order:
- Reproduce the problem.
- Identify the root cause.
- Write a failing test or regression test.
- Make the smallest correct fix.
- Run targeted verification.
- Run broader test/build verification.
- Do production smoke validation if needed.
- Commit, push, and document the result.
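The control flow of these steps can be sketched as a small bounded loop. This is an illustration only: the callback names and the `max_attempts` budget are assumptions, and the root-cause analysis, test writing, and commit steps are left to the caller.

```python
from typing import Callable


def bugfix_loop(
    reproduce: Callable[[], bool],
    attempt_fix: Callable[[int], None],
    run_broader_checks: Callable[[], bool],
    max_attempts: int = 3,  # assumed budget, not prescribed by the method
) -> int:
    """Return the number of attempts needed to reach a verified fix.

    reproduce() returns True while the bug is still observable, so it acts
    as both the initial reproduction and the targeted verification.
    """
    if not reproduce():
        raise RuntimeError("bug is not reproducible; do not patch blindly")
    for attempt in range(1, max_attempts + 1):
        attempt_fix(attempt)
        # Targeted verification first, then the broader test/build checks.
        if not reproduce() and run_broader_checks():
            return attempt
    raise RuntimeError("fix not verified within the attempt budget; escalate")


# Toy stand-ins: the bug disappears on the second fix attempt.
state = {"bug_present": True}


def reproduce_bug() -> bool:
    return state["bug_present"]


def try_fix(attempt: int) -> None:
    if attempt == 2:
        state["bug_present"] = False


attempts_used = bugfix_loop(reproduce_bug, try_fix, lambda: True)
```

The bounded attempt count reflects the idea that the loop is a local subroutine: if it does not converge quickly, the problem probably belongs to a different class and should go back through the main cycle.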
The standard cycle has six stages.
Input is often vague:
- “This flow feels too complicated.”
- “I want this to be easier to use.”
- “This result does not feel trustworthy.”
At this stage, the agent must first clarify:
- who the user is,
- what task the user is actually trying to complete,
- what is in scope for this iteration,
- what is explicitly out of scope,
- what success looks like.
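The five clarification points can be captured as a small record that reports which ones are still unanswered. The sketch below is one possible shape; the class name, field names, and example values are assumptions for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ClarifiedRequirement:
    """One field per clarification question; empty means unanswered."""

    user: str = ""
    task: str = ""
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready_for_planning(self) -> bool:
        return not self.missing()


# Hypothetical example: only the user and task are known so far.
requirement = ClarifiedRequirement(
    user="first-time visitor",
    task="upload a document and get a summary",
)
open_questions = requirement.missing()
```

A gate like `ready_for_planning()` gives the agent a concrete signal for when to keep asking questions instead of starting to plan.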
Once the requirement is clear, the agent turns it into an executable plan:
- UI / interaction changes,
- data model changes,
- AI workflow changes,
- API changes,
- verification criteria,
- risks and assumptions.
The agent carries implementation across all required layers:
- frontend,
- backend,
- database,
- AI workflow,
- deployment configuration.
Before handoff, the agent must verify:
- targeted tests,
- broader tests,
- build success,
- smoke checks,
- critical path correctness.
Nothing should be marked complete without verification evidence.
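A completion gate along these lines could be sketched as follows. The check names mirror the list above, while the `Evidence` record and its `reference` field (for example a CI run identifier or log path) are assumptions for illustration.

```python
from dataclasses import dataclass

# Check names taken from the verification list; identifiers are illustrative.
REQUIRED_CHECKS = (
    "targeted_tests",
    "broader_tests",
    "build",
    "smoke",
    "critical_path",
)


@dataclass(frozen=True)
class Evidence:
    check: str
    passed: bool
    reference: str  # hypothetical pointer to a log, CI run, or screenshot


def can_mark_complete(evidence: list[Evidence]) -> tuple[bool, list[str]]:
    """Return (ok, unmet_checks); a check counts only if it passed with a reference."""
    satisfied = {e.check for e in evidence if e.passed and e.reference}
    unmet = [check for check in REQUIRED_CHECKS if check not in satisfied]
    return (not unmet, unmet)


# Example: the smoke check has no evidence and the build claim has no reference.
collected = [
    Evidence("targeted_tests", True, "ci-run-101"),
    Evidence("broader_tests", True, "ci-run-101"),
    Evidence("build", True, ""),
    Evidence("critical_path", True, "manual-walkthrough-notes"),
]
ok, unmet = can_mark_complete(collected)
```

In this example the gate refuses completion and names `build` and `smoke` as unmet, since a passing claim without a reference is treated the same as no evidence.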
The product owner uses the system as a real user would.
The focus is on:
- clarity,
- trust,
- speed,
- friction,
- goal completion.
User feedback is converted into the next cycle’s input.
This closes the loop.
When trade-offs conflict, use this order:
- user task completion over technical elegance,
- main path over side capability,
- stability over novelty,
- verified behavior over assumption,
- real feedback over imagined preference,
- document consistency over local convenience.
This method is violated when:
- implementation starts before the requirement is clarified,
- an agent claims completion without verification,
- documents and current system behavior are allowed to drift apart,
- “the model is powerful” is used as a substitute for clear constraints,
- bugs are patched without reproduction or tests,
- the product owner is forced to make low-level technical decisions unnecessarily.
To reuse the method:
- replace the product’s main path,
- redefine the product owner’s task,
- replace the AI harness details,
- redefine the feedback source,
- keep the role model and closed-loop structure intact.
At minimum, create:
- Product Context
- Project Journey
- Session Handoff Prompt
- Development Method
These documents create the first working harness.
Recommended external description:
Harness-First Agentic Development is a development method where humans define goals, priorities, and acceptance, while agents execute engineering end-to-end under structured harnesses for scope, quality, deployment, and feedback.