feat: Improve generation failure reporting for schema and timeout failures#416
Conversation
Greptile SummaryThis PR significantly improves the observability of generation failures by threading structured Key changes:
Two issues remain worth attention:
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py | Adds column_name to fan-out context, new _classify_worker_failure / _extract_failure_detail / _format_worker_failure_warning helpers, and a fail-loud RuntimeError in _worker_error_callback. Two issues: progress_tracker.record_failure() is skipped when RuntimeError is raised, and the "schema" in detail fallback check is too broad. |
| packages/data-designer-engine/src/data_designer/engine/models/errors.py | Introduces _normalize_error_detail, expands GenerationValidationFailureError and ModelGenerationValidationFailureError with structured detail and failure_kind attributes, and fixes the trailing-period double-punctuation bug. Changes look correct and well-tested. |
| packages/data-designer-engine/src/data_designer/engine/models/facade.py | Adds _classify_generation_failure_kind and _build_generation_validation_error helpers; all four ParserException raise sites in generate and agenerate now use the helper to carry structured detail and failure_kind through to GenerationValidationFailureError. Logic is symmetric and well-tested. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_column_wise_builder.py | Adds tests for schema-validation detail, timeout detail, missing-context RuntimeError, and column_name propagation in both thread and async fan-out paths. Coverage is solid for the happy and fail-loud paths, but no test covers _make_error_callback to verify progress_tracker.record_failure() is (or is not) called when RuntimeError propagates. |
| packages/data-designer-engine/tests/engine/models/test_facade.py | New tests verify that both generate and agenerate surface detail and failure_kind on the raised ModelGenerationValidationFailureError. Tests are clear and symmetric. |
| packages/data-designer-engine/tests/engine/models/test_model_errors.py | Updated parametrized fixture and two new standalone tests confirm failure_kind propagation and double-period stripping in handle_llm_exceptions. All look correct. |
Sequence Diagram
sequenceDiagram
participant MF as ModelFacade.generate()
participant BVE as _build_generation_validation_error()
participant GVFE as GenerationValidationFailureError
participant HLE as handle_llm_exceptions()
participant MGVFE as ModelGenerationValidationFailureError
participant WEC as _worker_error_callback()
participant CWF as _classify_worker_failure()
participant EFD as _extract_failure_detail()
participant LOG as logger.warning()
MF->>MF: parser(response) → raises ParserException
MF->>BVE: _build_generation_validation_error(summary, exc)
BVE->>BVE: _classify_generation_failure_kind(exc)
Note right of BVE: "schema_validation" or "parse_error"
BVE-->>MF: GenerationValidationFailureError(summary, detail, failure_kind)
MF->>HLE: catch_llm_exceptions → handle_llm_exceptions(exc)
HLE->>HLE: detail_text = exc.detail.rstrip(".")
HLE-->>MGVFE: raise ModelGenerationValidationFailureError(msg, detail, failure_kind)
MGVFE->>WEC: _worker_error_callback(exc, context={index, column_name})
WEC->>WEC: _format_worker_failure_warning(exc, context)
WEC->>CWF: _classify_worker_failure(exc)
CWF->>CWF: getattr(exc, "failure_kind") → "schema_validation"
CWF-->>WEC: "schema validation"
WEC->>EFD: _extract_failure_detail(exc)
EFD->>EFD: getattr(exc, "detail") → structured detail string
EFD-->>WEC: detail string
WEC->>LOG: "⚠️ Generation for record at index N failed in column 'X' (schema validation). Detail: ..."
WEC->>WEC: _records_to_drop.add(context["index"])
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Line: 545-550
Comment:
**`progress_tracker.record_failure()` skipped when `RuntimeError` is raised**
`_make_error_callback` calls `progress_tracker.record_failure()` only if `_worker_error_callback` returns normally. When `context` is `None` or missing `"index"`, `_worker_error_callback` raises `RuntimeError` before returning, so `record_failure()` is never reached. This means the progress tracker will undercount failures for that execution slot.
In the thread executor path the `RuntimeError` is caught by Python's done-callback machinery and swallowed (as already noted). In the async path it is caught by the explicit `try/except Exception` in `_run_task`. In both cases `record_failure()` is never called, leaving the progress bar one count short.
A minimal fix is to call `record_failure()` before the guard raises:
```python
def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None:
"""If a worker fails, we can handle the exception here."""
logger.warning(self._format_worker_failure_warning(exc, context=context))
if context is None or "index" not in context:
raise RuntimeError("Worker error callback called without a valid context index.")
self._records_to_drop.add(context["index"])
```
Alternatively, move `progress_tracker.record_failure()` into a `finally` block inside the wrapper in `_make_error_callback` so it is always called regardless of whether `_worker_error_callback` raises.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Line: 527-528
Comment:
**Overly broad `"schema" in detail` fallback may misclassify unrelated errors**
The condition `"schema" in detail` runs for any exception that does not carry a structured `failure_kind` attribute (e.g., `ModelStructuredOutputError`, `ModelBadRequestError`, or future custom errors). If such an error's formatted cause string happens to contain the word "schema" — for instance, a bad-request error referencing an "API schema" or a structured-output error mentioning "output schema" — it will be labelled `"schema validation"` in the warning, which can mislead operators diagnosing the root cause.
The first sub-condition `"response_schema" in detail` is already highly specific. Replacing the broader fallback with an equally precise keyword (or removing it) would prevent false-positive classification:
```python
if "response_schema" in detail or "model_validate" in detail or "doesn't match requested" in detail:
return "schema validation"
```
Alternatively, demote the catch-all `"schema"` check to after the `"validation"` check so that less-specific matches are only reached if nothing else fits.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: f0f12fe
|
Follow-up update pushed in
Re-ran: |
|
Wanting to go ahead and get some more useful logging in to help for parsing structured output failures. |
packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Outdated
Show resolved
Hide resolved
packages/data-designer-engine/src/data_designer/engine/models/errors.py
Outdated
Show resolved
Hide resolved
|
Follow-up update pushed in
Re-ran: |
| def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None: | ||
| """If a worker fails, we can handle the exception here.""" | ||
| logger.warning( | ||
| f"⚠️ Generation for record at index {context['index']} failed. " | ||
| f"Will omit this record from the dataset.\n{exc}" | ||
| ) | ||
| logger.warning(self._format_worker_failure_warning(exc, context=context)) | ||
| if context is None or "index" not in context: | ||
| raise RuntimeError("Worker error callback called without a valid context index.") | ||
| self._records_to_drop.add(context["index"]) |
There was a problem hiding this comment.
RuntimeError silently swallowed by both executor paths
The RuntimeError raised here is intended to provide fail-loud behaviour when context is missing, but both executor implementations prevent it from ever reaching the main thread:
- Thread path (
ConcurrentThreadExecutor):_error_callbackis called at line 188 ofconcurrency.py, outside any try/except inside_handle_future._handle_futureis registered as adone_callbackviafuture.add_done_callback(_handle_future). Python'sconcurrent.futurescatches and only logs (viaLOGGER.exception) any exception raised from a done-callback — it does not propagate to the caller of_fan_out_with_threads. - Async path (
AsyncConcurrentExecutor): The callback is already wrapped in an explicittry/except Exceptionat lines 215–219 ofasync_concurrency.py, which catches theRuntimeErrorand logs"error_callback raised an exception".
In both production paths the RuntimeError is silently swallowed. The test test_worker_error_callback_requires_context_index exercises the direct method call — not the integrated path — so this gap is not caught by the test suite.
Because the _records_to_drop.add(context["index"]) line is never reached when context is missing, the failed record is also not dropped from the dataset (the exact silent-data-corruption risk flagged in a previous review comment).
Consider moving the context validation to the outermost fan-out site (before the executor loop) where the exception can propagate freely, or guard the record-drop with a logged error and a bare return so the callback's side-effect is clearly skipped without the false-safe of an unreachable raise.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Line: 545-550
Comment:
**`RuntimeError` silently swallowed by both executor paths**
The `RuntimeError` raised here is intended to provide fail-loud behaviour when context is missing, but both executor implementations prevent it from ever reaching the main thread:
- **Thread path** (`ConcurrentThreadExecutor`): `_error_callback` is called at line 188 of `concurrency.py`, outside any try/except inside `_handle_future`. `_handle_future` is registered as a `done_callback` via `future.add_done_callback(_handle_future)`. Python's `concurrent.futures` catches and only logs (via `LOGGER.exception`) any exception raised from a done-callback — it does **not** propagate to the caller of `_fan_out_with_threads`.
- **Async path** (`AsyncConcurrentExecutor`): The callback is already wrapped in an explicit `try/except Exception` at lines 215–219 of `async_concurrency.py`, which catches the `RuntimeError` and logs `"error_callback raised an exception"`.
In both production paths the `RuntimeError` is silently swallowed. The test `test_worker_error_callback_requires_context_index` exercises the direct method call — not the integrated path — so this gap is not caught by the test suite.
Because the `_records_to_drop.add(context["index"])` line is never reached when context is missing, the failed record is also not dropped from the dataset (the exact silent-data-corruption risk flagged in a previous review comment).
Consider moving the context validation to the outermost fan-out site (before the executor loop) where the exception can propagate freely, or guard the record-drop with a logged error and a bare `return` so the callback's side-effect is clearly skipped without the false-safe of an unreachable `raise`.
How can I resolve this? If you propose a fix, please make it concise.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Show resolved
Hide resolved
FindingsCritical — Must fix before mergeNo critical issues found. Warnings — Strongly recommend fixing
def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None:
"""If a worker fails, we can handle the exception here."""
logger.warning(self._format_worker_failure_warning(exc, context=context))
if context is None or "index" not in context:
logger.error("Worker error callback called without a valid context index; record cannot be dropped.")
return
self._records_to_drop.add(context["index"])This was flagged in a prior review; the current commit addressed it by adding the
omit_msg = (
"Will omit this record from the dataset."
if record_index != "unknown"
else "Record index is unknown; this record may not be omitted correctly."
)This was also flagged in a prior review and remains unaddressed.
Suggestions — Consider improving
What Looks Good
VerdictShip it (with nits) — The core error reporting improvement is well-designed and thoroughly tested. The Next steps
|
| def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None: | ||
| """If a worker fails, we can handle the exception here.""" | ||
| logger.warning( | ||
| f"⚠️ Generation for record at index {context['index']} failed. " | ||
| f"Will omit this record from the dataset.\n{exc}" | ||
| ) | ||
| logger.warning(self._format_worker_failure_warning(exc, context=context)) | ||
| if context is None or "index" not in context: | ||
| raise RuntimeError("Worker error callback called without a valid context index.") | ||
| self._records_to_drop.add(context["index"]) |
There was a problem hiding this comment.
progress_tracker.record_failure() skipped when RuntimeError is raised
_make_error_callback calls progress_tracker.record_failure() only if _worker_error_callback returns normally. When context is None or missing "index", _worker_error_callback raises RuntimeError before returning, so record_failure() is never reached. This means the progress tracker will undercount failures for that execution slot.
In the thread executor path the RuntimeError is caught by Python's done-callback machinery and swallowed (as already noted). In the async path it is caught by the explicit try/except Exception in _run_task. In both cases record_failure() is never called, leaving the progress bar one count short.
A minimal fix is to call record_failure() before the guard raises:
def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None:
"""If a worker fails, we can handle the exception here."""
logger.warning(self._format_worker_failure_warning(exc, context=context))
if context is None or "index" not in context:
raise RuntimeError("Worker error callback called without a valid context index.")
self._records_to_drop.add(context["index"])Alternatively, move progress_tracker.record_failure() into a finally block inside the wrapper in _make_error_callback so it is always called regardless of whether _worker_error_callback raises.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Line: 545-550
Comment:
**`progress_tracker.record_failure()` skipped when `RuntimeError` is raised**
`_make_error_callback` calls `progress_tracker.record_failure()` only if `_worker_error_callback` returns normally. When `context` is `None` or missing `"index"`, `_worker_error_callback` raises `RuntimeError` before returning, so `record_failure()` is never reached. This means the progress tracker will undercount failures for that execution slot.
In the thread executor path the `RuntimeError` is caught by Python's done-callback machinery and swallowed (as already noted). In the async path it is caught by the explicit `try/except Exception` in `_run_task`. In both cases `record_failure()` is never called, leaving the progress bar one count short.
A minimal fix is to call `record_failure()` before the guard raises:
```python
def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None:
"""If a worker fails, we can handle the exception here."""
logger.warning(self._format_worker_failure_warning(exc, context=context))
if context is None or "index" not in context:
raise RuntimeError("Worker error callback called without a valid context index.")
self._records_to_drop.add(context["index"])
```
Alternatively, move `progress_tracker.record_failure()` into a `finally` block inside the wrapper in `_make_error_callback` so it is always called regardless of whether `_worker_error_callback` raises.
How can I resolve this? If you propose a fix, please make it concise.| if "response_schema" in detail or "schema" in detail: | ||
| return "schema validation" |
There was a problem hiding this comment.
Overly broad "schema" in detail fallback may misclassify unrelated errors
The condition "schema" in detail runs for any exception that does not carry a structured failure_kind attribute (e.g., ModelStructuredOutputError, ModelBadRequestError, or future custom errors). If such an error's formatted cause string happens to contain the word "schema" — for instance, a bad-request error referencing an "API schema" or a structured-output error mentioning "output schema" — it will be labelled "schema validation" in the warning, which can mislead operators diagnosing the root cause.
The first sub-condition "response_schema" in detail is already highly specific. Replacing the broader fallback with an equally precise keyword (or removing it) would prevent false-positive classification:
if "response_schema" in detail or "model_validate" in detail or "doesn't match requested" in detail:
return "schema validation"Alternatively, demote the catch-all "schema" check to after the "validation" check so that less-specific matches are only reached if nothing else fits.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Line: 527-528
Comment:
**Overly broad `"schema" in detail` fallback may misclassify unrelated errors**
The condition `"schema" in detail` runs for any exception that does not carry a structured `failure_kind` attribute (e.g., `ModelStructuredOutputError`, `ModelBadRequestError`, or future custom errors). If such an error's formatted cause string happens to contain the word "schema" — for instance, a bad-request error referencing an "API schema" or a structured-output error mentioning "output schema" — it will be labelled `"schema validation"` in the warning, which can mislead operators diagnosing the root cause.
The first sub-condition `"response_schema" in detail` is already highly specific. Replacing the broader fallback with an equally precise keyword (or removing it) would prevent false-positive classification:
```python
if "response_schema" in detail or "model_validate" in detail or "doesn't match requested" in detail:
return "schema validation"
```
Alternatively, demote the catch-all `"schema"` check to after the `"validation"` check so that less-specific matches are only reached if nothing else fits.
How can I resolve this? If you propose a fix, please make it concise.
Summary
ModelFacadeexhausts correction attemptsColumnWiseDatasetBuilderWhat changed
GenerationValidationFailureErrornow carries normalized validation detail from the underlyingParserExceptionhandle_llm_exceptions()includes that preserved detail inModelGenerationValidationFailureErrorfailure_kindnow propagates throughModelGenerationValidationFailureErrorand is consumed directly by the dataset-builder warning pathcolumn_nameinto the worker error callback for both thread and async execution pathsDetail:lineClassification notes
schema validationblock) and missing fenced code output are reported asparse error`schema validationReporting examples
Before:
After (schema validation):
After (parse error: missing fenced JSON/code output):
After (timeout):
Testing