diff --git a/canon/constraints/telemetry-validation-gate.md b/canon/constraints/telemetry-validation-gate.md
index e5f1c04..60f393b 100644
--- a/canon/constraints/telemetry-validation-gate.md
+++ b/canon/constraints/telemetry-validation-gate.md
@@ -43,7 +43,7 @@ Sample size is one per tool per surface. Increase it for operator margin if desi
 1. Enumerate every `server.tool()` registration in `workers/src/index.ts`. This is the smoke target list.
 2. Drive one synthetic call per tool through the surface's `/mcp` endpoint. Record the exact `args` object sent (the JSON-RPC `params.arguments` payload) and the exact `{ content: [...] }` envelope returned by the handler — not the full HTTP request/response bodies, which include JSON-RPC framing the wrapper does not see.
-3. For each call, compute the expected values locally against the same in-memory values the wrapper measures per `klappy://canon/constraints/telemetry-governance` Rule 2: `bytes_in = utf8_byte_length(JSON.stringify(args))`, `bytes_out = utf8_byte_length(JSON.stringify(content_envelope))`, `tokens_in = cl100k_count(JSON.stringify(args))`, `tokens_out = cl100k_count(JSON.stringify(content_envelope))`. For SSE-streamed responses, expected `bytes_out = 0` and `tokens_out = 0` per the Emission Contract.
+3. For each call, compute the expected values locally against the same in-memory values the wrapper measures per `klappy://canon/constraints/telemetry-governance` Rule 2: `bytes_in = utf8_byte_length(JSON.stringify(args))`, `bytes_out = utf8_byte_length(JSON.stringify(content_envelope))`, `tokens_in = cl100k_count(JSON.stringify(args))`, `tokens_out = cl100k_count(JSON.stringify(content_envelope))`. Wire-level SSE framing does not zero out these values: the wrapper measures the in-memory envelope before transport, which is the failure mode the Emission Contract was designed to defeat. The "0 for streamed (SSE) responses" caveat in the §Numeric Values table refers to the old wire-edge instrumentation, not the wrapper.
 4. Query `oddkit_telemetry` with `event_type = 'tool_call'`, `worker_version = `, and a timestamp window covering the smoke run.
 5. Match each emitted row to the corresponding smoke call (by tool name and timing). Compare emitted versus expected on all four fields.
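
The expected-value computation in step 3 and the comparison in step 5 could be sketched roughly as follows. This is an illustrative sketch, not the wrapper's actual code: `expectedMetrics`, `mismatchedFields`, and the `Metrics` shape are hypothetical names, and the token counter is passed in as a parameter because the source only names `cl100k_count` without saying which tokenizer library provides it.

```typescript
// Hypothetical helper names; only the four formulas come from step 3.
type TokenCounter = (text: string) => number;

interface Metrics {
  bytes_in: number;
  bytes_out: number;
  tokens_in: number;
  tokens_out: number;
}

const FIELDS = ["bytes_in", "bytes_out", "tokens_in", "tokens_out"] as const;

// UTF-8 byte length, not UTF-16 code-unit length (string.length).
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Compute expected values from the in-memory args object and content
// envelope, never from the wire-level HTTP or SSE bodies.
function expectedMetrics(
  args: Record<string, unknown>,
  contentEnvelope: { content: unknown[] },
  countTokens: TokenCounter, // assumed to wrap a cl100k tokenizer in real use
): Metrics {
  const serializedArgs = JSON.stringify(args);
  const serializedEnvelope = JSON.stringify(contentEnvelope);
  return {
    bytes_in: utf8ByteLength(serializedArgs),
    bytes_out: utf8ByteLength(serializedEnvelope),
    tokens_in: countTokens(serializedArgs),
    tokens_out: countTokens(serializedEnvelope),
  };
}

// Return the names of the fields where the emitted row disagrees with
// the locally computed expectation (step 5 compares all four).
function mismatchedFields(expected: Metrics, emitted: Metrics): string[] {
  return FIELDS.filter((field) => expected[field] !== emitted[field]);
}
```

Under this sketch, an emitted row that reports `bytes_out = 0` or `tokens_out = 0` for a streamed response would surface in `mismatchedFields`, since the expectation is always derived from the in-memory envelope.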