Cal-target prompt: document FigureExcerpt/TableExcerpt + denominator condition#32
Open
jeliason wants to merge 1 commit into
Open
Cal-target prompt: document FigureExcerpt/TableExcerpt + denominator condition#32jeliason wants to merge 1 commit into
jeliason wants to merge 1 commit into
Conversation
…condition Three issues observed in Logfire traces of pdac-build cytokine extraction stage 3 (gpt-5.5): 1. FigureExcerpt was mentioned only in passing in the prompt. The agent repeatedly emitted floats for `figure_excerpt.value` (the schema is `str`), invented forbidden fields (`panel`, `group`, `caption_excerpt`), and omitted required `figure_id`. Some single targets racked up 80 validation errors per retry. 2. TableExcerpt was not documented at all. Same risk class. 3. `observable.experimental_denominator` is conditionally required by the validator (whenever support='positive' and units look like a density/per-mass), but the prompt described it as optional. Targets like CCL2 in pg/mg of tissue triggered the validator's "experimental_denominator is not set" value_error and consumed a retry to fix. This PR replaces validation requirement #6's brief "Exception for figures" line with explicit YAML schemas for both FigureExcerpt (4 string fields: figure_id, value, description, context) and TableExcerpt (5 string fields: table_id, column, row, value, context), flags the all-string requirement, lists the common invented-field mistakes, and clarifies that figure_excerpt.value is a *text annotation* of what was visually read (e.g. "~5", "lower whisker at ~5 pg/mg") — NOT the digitized numeric value (which goes in the parent EstimateInput.value field). Also bumps the experimental_denominator description from "optional" to "conditionally required (density / per-mass observables)" with the exact validator error message documented.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Three issues observed in Logfire traces of pdac-build cytokine extraction (gpt-5.5):
This PR adds explicit YAML schemas for both excerpt types (all-string, extra=forbid), flags the common invented-field mistakes, clarifies that figure_excerpt.value is a text annotation of what was read (not the digitized numeric value), and updates experimental_denominator from optional → conditionally required with the exact validator error message inline.