Workflow Runtime Proof V1

Status: deterministic MVP. Static validation only. No n8n runtime execution.

This document describes the Workflow Runtime Proof V1 loop and what it does — and does not — prove about the Open Workflow Library pipeline.

What V1 proves

End-to-end, on the developer machine, with standard-library Python only:

  1. A natural-language prompt is converted into a Universal Workflow IR (tools/prompt_to_ir.py).
  2. The IR is exported to an n8n-shaped workflow JSON (tools/export_ir_to_n8n.py).
  3. The exported workflow passes a static, structural n8n compatibility validator (tools/validate_n8n_workflow.py).
  4. Any validator finding becomes a human-reviewable repair proposal (tools/propose_runtime_repair.py) and a learning event (tools/create_learning_event.py) — both conforming to the existing project schemas.
  5. All review items roll up into a single human review queue (tools/build_review_queue.py).

Every artefact for every demo prompt is captured under reports/runtime-proof/<slug>/.

What V1 does not prove

  • The exported workflow has not been imported into n8n.
  • No node has been executed. No external service has been called.
  • No real credentials have been used. The exporter emits only safe placeholders.
  • The prompt-to-IR layer is a deterministic keyword-rule MVP, not an LLM-backed generator. Plurals, synonyms, and context shifts will be missed.
  • Repair proposals and learning events are evidence only. None are auto-applied. The curated wiki is not modified.
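To illustrate why a keyword-rule layer misses plurals and synonyms, here is a minimal sketch that assumes exact whole-token matching. The rule table, the step names, and the function `match_steps` are hypothetical; they are not taken from tools/prompt_to_ir.py, which may match differently.

```python
import re

# Hypothetical keyword -> IR step table (an assumption for illustration only).
KEYWORD_RULES = {
    "lead": "capture_lead",
    "slack": "notify_slack",
    "crm": "save_to_crm",
}

def match_steps(prompt: str) -> list[str]:
    """Return the step types whose keyword appears as an exact token."""
    tokens = re.findall(r"[a-z0-9]+", prompt.lower())
    return [step for keyword, step in KEYWORD_RULES.items() if keyword in tokens]

# Exact keywords are found.
print(match_steps("Save the lead to CRM and alert Slack"))
# The plural "leads" and the synonym "customer database" are both missed.
print(match_steps("Save the leads to our customer database"))
```

Under this assumption the second prompt yields no steps at all, which is the kind of silent miss a human reviewer of the IR needs to catch.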

Static validation vs behavioural execution

validate_n8n_workflow.py is a static, structural checker. It inspects the workflow JSON for:

  • Required top-level keys (nodes, connections).
  • A trigger node from a small conservative whitelist.
  • Unique node names and IDs.
  • active: false by default.
  • No credentials, no real-looking secrets, no JWTs / API keys / private keys in any node parameter.
  • Connection references that point to nodes that exist.
  • Webhook flows that include a respondToWebhook node or are explicitly documented.
  • Sticky-note setup/safety guidance.
  • httpRequest URLs that match the safe placeholder allow-list.

It does not call n8n. It does not parse n8n's internal parameter schemas. A workflow can pass this validator and still fail to import into n8n. The validator's job is to catch the classes of failure that we can check without n8n in the loop.
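A few of the structural checks listed above can be sketched as follows. This is not the code in tools/validate_n8n_workflow.py. The function name `validate_structure` and the finding strings are hypothetical, and the sketch assumes the usual n8n export shape in which `connections` is keyed by source node name and each target entry carries a `node` field.

```python
def validate_structure(workflow: dict) -> list[str]:
    """Return human-readable findings; an empty list means none were found."""
    findings = []

    # Required top-level keys.
    for key in ("nodes", "connections"):
        if key not in workflow:
            findings.append(f"missing top-level key: {key}")

    # Unique node names and IDs.
    nodes = workflow.get("nodes", [])
    names = [node.get("name") for node in nodes]
    ids = [node.get("id") for node in nodes]
    if len(set(names)) != len(names):
        findings.append("duplicate node name")
    if len(set(ids)) != len(ids):
        findings.append("duplicate node id")

    # The workflow must not be active.
    if workflow.get("active", False):
        findings.append("workflow is active; expected active: false")

    # Every connection must reference a node that exists.
    known = set(names)
    for source, outputs in workflow.get("connections", {}).items():
        if source not in known:
            findings.append(f"connection from unknown node: {source}")
        for branches in outputs.values():
            for branch in branches:
                for target in branch or []:
                    if target.get("node") not in known:
                        findings.append(f"connection to unknown node: {target.get('node')}")
    return findings

good = {
    "active": False,
    "nodes": [
        {"id": "1", "name": "Webhook", "type": "n8n-nodes-base.webhook"},
        {"id": "2", "name": "Respond", "type": "n8n-nodes-base.respondToWebhook"},
    ],
    "connections": {"Webhook": {"main": [[{"node": "Respond", "type": "main", "index": 0}]]}},
}
broken = {**good, "connections": {"Webhook": {"main": [[{"node": "Missing", "type": "main", "index": 0}]]}}}

print(validate_structure(good))
print(validate_structure(broken))
```

Nothing in the sketch touches n8n itself, so a clean result here says only that the JSON is internally consistent.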

Prompt-to-IR vs prompt-to-n8n

  • prompt_to_ir.py turns a prompt into a draft workflow.ir.json. The IR carries generationStatus: planned and validationStatus: needs-review. It is not a buildable workflow.
  • prompt_to_n8n.py orchestrates the IR step plus the n8n export and the static validation. The output is an n8n-shaped workflow JSON with safe placeholders and a sticky-note safety disclaimer.

The two are intentionally separate so the IR can be reviewed independently of any framework export.
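The separation can be pictured as three stages composed by a thin orchestrator. The sketch below is an assumption about the general shape, not the contents of tools/prompt_to_n8n.py. `run_proof` and the three stub stages are hypothetical names, and the stubs return placeholder data only.

```python
from typing import Callable

def run_proof(
    prompt: str,
    to_ir: Callable[[str], dict],
    to_n8n: Callable[[dict], dict],
    validate: Callable[[dict], list],
) -> dict:
    """Run prompt -> IR -> n8n JSON -> static validation, keeping every stage's output."""
    ir = to_ir(prompt)            # reviewable on its own, before any export
    workflow = to_n8n(ir)         # framework-specific step
    findings = validate(workflow)
    return {"ir": ir, "workflow": workflow, "findings": findings}

# Stub stages standing in for the real tools (hypothetical placeholders).
def stub_to_ir(prompt: str) -> dict:
    return {"prompt": prompt, "generationStatus": "planned", "validationStatus": "needs-review"}

def stub_to_n8n(ir: dict) -> dict:
    return {"active": False, "nodes": [], "connections": {}}

def stub_validate(workflow: dict) -> list:
    return [] if "nodes" in workflow else ["missing top-level key: nodes"]

result = run_proof("demo prompt", stub_to_ir, stub_to_n8n, stub_validate)
print(result["ir"]["validationStatus"])
print(result["findings"])
```

Because the IR is returned alongside the export, a reviewer can inspect it without reading any n8n-specific output.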

Repair proposals

propose_runtime_repair.py reads validation.json from a runtime-proof slug directory and emits one proposal per material validator finding. Each proposal conforms to schemas/repair-proposal.schema.json:

  • requiresHumanReview: true
  • status: "proposed"
  • framework: "n8n"
  • validationBefore carries the pre-repair validation state
  • validationAfter is unvalidated (no auto-apply)

The tool does not modify the workflow file.
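As a rough sketch of the one-proposal-per-finding rule, assume validation.json holds a `status` string and a `findings` list whose entries carry a `severity`. That layout, the `finding` field, and the rule that `info` findings are not material are all assumptions made for illustration; the real contract is schemas/repair-proposal.schema.json.

```python
def propose_repairs(validation: dict) -> list[dict]:
    """Build one proposal per material finding without touching the workflow."""
    proposals = []
    for finding in validation.get("findings", []):
        # Assumed definition of "material": anything above informational.
        if finding.get("severity") == "info":
            continue
        proposals.append({
            "framework": "n8n",
            "status": "proposed",
            "requiresHumanReview": True,
            "finding": finding,                      # hypothetical field name
            "validationBefore": validation.get("status"),
            "validationAfter": "unvalidated",        # nothing is applied, so nothing is re-validated
        })
    return proposals

sample = {
    "status": "failed",
    "findings": [
        {"code": "missing-trigger", "severity": "error"},
        {"code": "sticky-note-hint", "severity": "info"},
    ],
}
proposals = propose_repairs(sample)
print(len(proposals), proposals[0]["status"])
```

The function only reads its input and returns new dictionaries, which mirrors the no-apply guarantee.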

Learning events

create_learning_event.py records each validator finding as an evidence record in the shape defined by schemas/learning-event.schema.json. Every event carries:

  • source: "validation"
  • humanReviewStatus: "pending"
  • appliedToWiki: false

The tool does not promote events into curated wiki rules.
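A minimal sketch of an event record and of the gate that keeps it out of the wiki. The three fixed fields come from the list above. The `evidence` field, the `approved` review value, and both function names are hypothetical.

```python
def make_learning_event(finding: dict) -> dict:
    """Wrap a validator finding as a pending, unapplied evidence record."""
    return {
        "source": "validation",
        "humanReviewStatus": "pending",
        "appliedToWiki": False,
        "evidence": finding,  # hypothetical field name
    }

def promotable(event: dict) -> bool:
    """An event may feed a wiki rule only after a human has approved it."""
    # "approved" is an assumed status value; the schema may name it differently.
    return event["humanReviewStatus"] == "approved" and not event["appliedToWiki"]

event = make_learning_event({"code": "missing-trigger"})
print(promotable(event))
```

A freshly created event is never promotable, so promotion can only follow an explicit human decision.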

Human review queue

build_review_queue.py aggregates every item that needs human review into reports/review-queue.json and reports/review-queue.md. Items are grouped as follows:

  • highRiskWorkflows — from the unified catalog
  • repairProposals — pre-runtime catalog proposals
  • runtimeRepairProposals — V1 runtime-proof proposals
  • learningEvents — V1 evidence records
  • duplicateCandidates — from the duplicates analyzer
  • generatedPackReview — anchor item for behavioural testing of the expansion pack
  • promptToN8nProofs — one item per runtime-proof slug

Every item carries status: pending and humanReviewRequired: true. The queue builder does not auto-approve anything.
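The aggregation can be sketched as a pass over the seven groups that stamps every item as pending. The group names are the ones listed above. `build_queue`, its input layout, and the item fields in the example are assumptions, not the implementation in tools/build_review_queue.py.

```python
GROUPS = [
    "highRiskWorkflows",
    "repairProposals",
    "runtimeRepairProposals",
    "learningEvents",
    "duplicateCandidates",
    "generatedPackReview",
    "promptToN8nProofs",
]

def build_queue(sources: dict[str, list[dict]]) -> dict[str, list[dict]]:
    """Copy items into their groups, forcing the pending and human-review flags."""
    return {
        group: [
            # The flags are written last, so an incoming status cannot override them.
            {**item, "status": "pending", "humanReviewRequired": True}
            for item in sources.get(group, [])
        ]
        for group in GROUPS
    }

queue = build_queue({"learningEvents": [{"id": "e1", "status": "approved"}]})
print(queue["learningEvents"])
print(sum(len(items) for items in queue.values()))
```

Overwriting any incoming status is one way to guarantee that nothing arrives in the queue already approved.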

How to run the demo cases

# 1. End-to-end proof loop, one prompt at a time:
python tools/prompt_to_n8n.py "Create a workflow that receives website leads, scores them, saves qualified leads to CRM, and alerts Slack."
python tools/prompt_to_n8n.py "Create a workflow that checks support tickets every morning, summarizes urgent tickets, and sends a manager digest."
python tools/prompt_to_n8n.py "Create a workflow for homecare admin intake that receives a referral, checks required documents, sends an internal task, and escalates missing information to a human."
python tools/prompt_to_n8n.py "Create a workflow that monitors failed payments, updates a finance tracker, and notifies the customer success team."
python tools/prompt_to_n8n.py "Create a workflow that receives a content brief, creates a draft task, assigns it to a reviewer, and sends a status update."

# 2. (Optional) Run the static validator directly on any generated workflow:
python tools/validate_n8n_workflow.py reports/runtime-proof/<slug>/workflow.n8n.json

# 3. Generate repair proposals and learning events from every proof:
python tools/propose_runtime_repair.py --all
python tools/create_learning_event.py --all

# 4. Roll everything into the review queue:
python tools/build_review_queue.py

Honest limitations

  • Static validation only. No runtime execution.
  • No external API calls. No real credentials.
  • No autonomous self-improvement. Learning events are evidence; rules enter the wiki only via human review.
  • The prompt-to-IR engine is keyword-rule based. The pipeline runs end to end, but it cannot yet extract arbitrary intent from a prompt.
  • The exporter uses a conservative whitelist of n8n node types. Nodes outside the whitelist are intentionally not emitted.
  • Multi-framework export (Dify, LangGraph, Make, Zapier) is not implemented in V1. Only n8n.

File map

tools/
├── prompt_to_ir.py            # prompt -> Universal Workflow IR
├── export_ir_to_n8n.py        # IR  -> n8n workflow JSON
├── validate_n8n_workflow.py   # static n8n compatibility validator
├── prompt_to_n8n.py           # orchestrator (prompt -> IR -> n8n -> validate)
├── propose_runtime_repair.py  # validation -> repair proposals (no apply)
├── create_learning_event.py   # validation -> learning events (no promotion)
└── build_review_queue.py      # all-of-the-above -> human review queue

reports/runtime-proof/<slug>/
├── prompt.txt
├── workflow.ir.json
├── workflow.n8n.json
├── validation.json
├── validation.md
├── repair-proposals.json
├── repair-proposals.md
├── learning-events.json
├── learning-events.md
└── README.md

reports/
├── review-queue.json
├── review-queue.md
└── runtime-proof-v1.{json,md}    # summary report
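A quick way to confirm that a slug directory is complete is to compare it against the per-slug file list above. The helper below is a hypothetical sketch and is not part of tools/. The file names come from the file map, and the `reports/runtime-proof` root is taken from this document.

```python
from pathlib import Path

# Per-slug artefacts, as listed in the file map.
EXPECTED = [
    "prompt.txt",
    "workflow.ir.json",
    "workflow.n8n.json",
    "validation.json",
    "validation.md",
    "repair-proposals.json",
    "repair-proposals.md",
    "learning-events.json",
    "learning-events.md",
    "README.md",
]

def missing_artefacts(slug_dir: Path) -> list[str]:
    """Return the expected file names that are absent from slug_dir."""
    return [name for name in EXPECTED if not (slug_dir / name).is_file()]

root = Path("reports/runtime-proof")
if root.is_dir():
    for slug_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        print(slug_dir.name, missing_artefacts(slug_dir) or "complete")
```

Run from the repository root after the demo commands, it prints one line per slug. If the root directory does not exist it prints nothing.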