
Cal-target prompt: document FigureExcerpt/TableExcerpt + denominator condition #32

Open
jeliason wants to merge 1 commit into main from fix/cal-target-prompt-figure-table-excerpt


@jeliason commented May 5, 2026

Three issues observed in Logfire traces of pdac-build cytokine extraction (gpt-5.5):

  1. FigureExcerpt subschema undocumented. The agent emitted floats for figure_excerpt.value (a str field), invented forbidden fields (panel, group, caption_excerpt), and omitted the required figure_id. One IL-10 retry racked up 80 validation errors.
  2. TableExcerpt subschema not documented at all.
  3. experimental_denominator was labeled optional, but the validator requires it for density/per-mass observables (units like pg/mg with support=positive).

This PR adds explicit YAML schemas for both excerpt types (all-string, extra=forbid), flags the common invented-field mistakes, and clarifies that figure_excerpt.value is a text annotation of what was read, not the digitized numeric value. It also changes experimental_denominator from optional to conditionally required, with the exact validator error message inline.

…condition

Three issues observed in Logfire traces of pdac-build cytokine
extraction stage 3 (gpt-5.5):

1. FigureExcerpt was mentioned only in passing in the prompt. The agent
   repeatedly emitted floats for `figure_excerpt.value` (the schema is
   `str`), invented forbidden fields (`panel`, `group`, `caption_excerpt`),
   and omitted required `figure_id`. Some single targets racked up
   80 validation errors per retry.

2. TableExcerpt was not documented at all, so it carries the same risk.

3. `observable.experimental_denominator` is conditionally required by
   the validator (whenever support='positive' and the units look like a
   density or per-mass quantity), but the prompt described it as
   optional. Targets like CCL2 in pg/mg of tissue triggered the
   validator's "experimental_denominator is not set" value_error and
   consumed a retry to fix.

This PR replaces validation requirement #6's brief "Exception for
figures" line with explicit YAML schemas for both FigureExcerpt
(4 string fields: figure_id, value, description, context) and
TableExcerpt (5 string fields: table_id, column, row, value, context),
flags the all-string requirement, lists the common invented-field
mistakes, and clarifies that figure_excerpt.value is a *text annotation*
of what was visually read (e.g. "~5", "lower whisker at ~5 pg/mg") —
NOT the digitized numeric value (which goes in the parent
EstimateInput.value field).
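
To make the failure modes above concrete, here is a minimal stdlib sketch of an all-string, extra-forbidding check. The field names come from this PR; `excerpt_errors`, the error wording, the assumption that every listed field is required, and the sample values ("Fig. 2B" and the rest) are hypothetical and are not the project's actual validator.

```python
from typing import Any

# Field names as listed in this PR. Treating all of them as required is an
# assumption; the PR only states explicitly that figure_id is required.
FIGURE_EXCERPT_FIELDS = ("figure_id", "value", "description", "context")
TABLE_EXCERPT_FIELDS = ("table_id", "column", "row", "value", "context")


def excerpt_errors(excerpt: dict[str, Any], allowed: tuple[str, ...]) -> list[str]:
    """Return one message per unknown key, missing key, or non-string value."""
    errors = []
    # extra=forbid: any key outside the schema is rejected.
    for key in excerpt:
        if key not in allowed:
            errors.append(f"{key}: extra field not permitted")
    # Every schema field must be present and must be a str.
    for key in allowed:
        if key not in excerpt:
            errors.append(f"{key}: field required")
        elif not isinstance(excerpt[key], str):
            errors.append(f"{key}: expected str, got {type(excerpt[key]).__name__}")
    return errors


# The mistakes seen in the traces: float value, invented fields, no figure_id.
bad = {"value": 5.0, "panel": "B", "group": "tumor", "caption_excerpt": "..."}

# value is a text annotation of what was read; the digitized number belongs
# in the parent EstimateInput.value field instead.
good = {
    "figure_id": "Fig. 2B",  # hypothetical identifier
    "value": "lower whisker at ~5 pg/mg",
    "description": "box plot of tissue cytokine levels",  # hypothetical
    "context": "tumor group",  # hypothetical
}

for message in excerpt_errors(bad, FIGURE_EXCERPT_FIELDS):
    print(message)
print(excerpt_errors(good, FIGURE_EXCERPT_FIELDS))
```

The same helper covers TableExcerpt by passing `TABLE_EXCERPT_FIELDS`. A single malformed excerpt already yields several messages, which is consistent with one target accumulating many validation errors per retry.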

Also bumps the experimental_denominator description from "optional"
to "conditionally required (density / per-mass observables)" with the
exact validator error message documented.
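
The conditional requirement can be sketched as below. Only the error text and the `support='positive'` condition come from this PR. The function name, its signature, and especially the regex used to decide that units "look like" a density or per-mass quantity are assumptions for illustration; the real validator's heuristic is not shown here.

```python
import re
from typing import Optional

# Assumption: a unit counts as density/per-mass when its denominator is a
# mass unit (g, mg, ug, ...) or a volume unit (L, mL, uL).
PER_MASS_OR_VOLUME = re.compile(r"/\s*(?:[munpk]?g|[mu]?[lL])\b")


def check_denominator(
    units: str, support: str, experimental_denominator: Optional[str]
) -> None:
    """Raise ValueError when a density-like positive observable lacks a denominator."""
    looks_like_density = PER_MASS_OR_VOLUME.search(units) is not None
    if support == "positive" and looks_like_density and not experimental_denominator:
        raise ValueError("experimental_denominator is not set")


# CCL2 reported in pg/mg of tissue: the denominator must be stated.
check_denominator("pg/mg", "positive", "mg tissue")  # passes silently

try:
    check_denominator("pg/mg", "positive", None)
except ValueError as exc:
    print(exc)
```

Under these assumptions, a plain concentration-free unit such as `pg` does not trigger the requirement, so the field stays optional outside the density/per-mass case.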