feat: Improve generation failure reporting for schema and timeout failures by eric-tramel · Pull Request #416 · NVIDIA-NeMo/DataDesigner

eric-tramel · 2026-03-13T21:00:33Z

Summary

preserve concise parser/schema validation details when ModelFacade exhausts correction attempts
include column name, failure category, and primary cause in the dropped-record warning emitted by ColumnWiseDatasetBuilder
keep timeout failures visible in the same reporting path without dumping raw model responses

What changed

GenerationValidationFailureError now carries normalized validation detail from the underlying ParserException
handle_llm_exceptions() includes that preserved detail in ModelGenerationValidationFailureError
failure_kind now propagates through ModelGenerationValidationFailureError and is consumed directly by the dataset-builder warning path
fan-out work items now pass column_name into the worker error callback for both thread and async execution paths
dropped-record warnings are reformatted to show record index, column, failure kind, and a concise Detail: line
tests cover schema-validation warnings, timeout warnings, preserved parser detail, and worker context propagation for both thread and async fan-out paths

Classification notes

Pydantic validation failures and JSON Schema validation failures are both reported as schema validation
missing fenced structured output (for example, no ```json block) and missing fenced code output are reported as `parse error`
this PR does distinguish validation failures from fence-extraction/parsing failures
this PR does not yet distinguish Pydantic validation from JSON Schema validation; both remain grouped under schema validation
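A minimal sketch of the two-way split described in these notes. The exception subclasses below are hypothetical stand-ins (the real `ParserException` hierarchy may be organized differently); only the two category strings come from this PR.

```python
from __future__ import annotations


# Hypothetical stand-ins for the parser's exception types; the real
# ParserException hierarchy in DataDesigner may look different.
class ParserException(Exception):
    """Base class for failures while parsing a model response."""


class SchemaValidationError(ParserException):
    """Output was extracted but did not satisfy the requested schema."""


class MissingFenceError(ParserException):
    """No fenced JSON/code block could be extracted from the response."""


def classify_generation_failure_kind(exc: ParserException) -> str:
    """Map a parser failure onto one of the two reported categories."""
    if isinstance(exc, SchemaValidationError):
        # Pydantic and JSON Schema failures are grouped under one label.
        return "schema_validation"
    # Everything else (missing fence, unparsable content) is a parse error.
    return "parse_error"
```

Splitting Pydantic from JSON Schema failures later would only need another branch here, which is one reason to keep classification in a single place.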

Reporting examples

Before:

⚠️ Generation for record at index 248 failed. Will omit this record from the dataset.

After (schema validation):

⚠️ Generation for record at index 248 failed in column 'test_column' (schema validation). Will omit this record from the dataset. Detail: The model output from 'test-model' could not be parsed into the requested format while running generation for column 'test_column'. Validation detail: Response doesn't match requested <response_schema> 'name' is a required property.

After (parse error: missing fenced JSON/code output):

⚠️ Generation for record at index 42 failed in column 'test_column' (parse error). Will omit this record from the dataset. Detail: No parsable JSON structure within ```json markdown fence.

After (timeout):

⚠️ Generation for record at index 17 failed in column 'test_column' (timeout). Will omit this record from the dataset. Detail: The request to model 'test-model' timed out while running generation for column 'test_column'.
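The "After" layout can be reproduced with a small formatter. This is an illustrative sketch, not the PR's `_format_worker_failure_warning`; the function name and parameters are assumptions chosen to mirror the fields visible in the examples.

```python
from __future__ import annotations


def format_worker_failure_warning(
    record_index: int | str,
    column_name: str,
    failure_kind: str,
    detail: str,
) -> str:
    """Build a single-line dropped-record warning in the 'After' layout."""
    return (
        f"⚠️ Generation for record at index {record_index} failed "
        f"in column '{column_name}' ({failure_kind}). "
        f"Will omit this record from the dataset. Detail: {detail}"
    )


message = format_worker_failure_warning(
    17,
    "test_column",
    "timeout",
    "The request to model 'test-model' timed out.",
)
```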

Testing

UV_CACHE_DIR=/tmp/uv-cache uv run --no-sync --group dev pytest packages/data-designer-engine/tests/engine/models packages/data-designer-engine/tests/engine/dataset_builders
UV_CACHE_DIR=/tmp/uv-cache uv run --no-sync --group dev ruff check packages/data-designer-engine/src/data_designer/engine/models/errors.py packages/data-designer-engine/src/data_designer/engine/models/facade.py packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py packages/data-designer-engine/tests/engine/models/test_model_errors.py packages/data-designer-engine/tests/engine/models/test_facade.py packages/data-designer-engine/tests/engine/dataset_builders/test_column_wise_builder.py

greptile-apps · 2026-03-13T21:05:21Z

Greptile Summary

This PR significantly improves the observability of generation failures by threading structured detail and failure_kind attributes through the exception chain — from ParserException caught in ModelFacade, through GenerationValidationFailureError, all the way to ModelGenerationValidationFailureError consumed by ColumnWiseDatasetBuilder. Dropped-record warnings now include the column name, a categorized failure kind, and a concise single-line cause rather than the previous bare "Will omit this record" message.

Key changes:

GenerationValidationFailureError and ModelGenerationValidationFailureError are expanded with detail and failure_kind structured attributes, eliminating duplicate text-based re-classification in the builder.
_build_generation_validation_error() and _classify_generation_failure_kind() centralise parser-error classification at the ModelFacade level.
Both _fan_out_with_threads and _fan_out_with_async now pass column_name in the worker context dict.
_worker_error_callback now raises RuntimeError on missing context rather than silently skipping the failed record.
handle_llm_exceptions strips trailing periods from detail before composing the cause string, preventing double punctuation.
New tests cover schema-validation warnings, timeout warnings, preserved parser detail, and column_name propagation for both concurrency paths.
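For the trailing-period handling listed above, a rough sketch of the idea. `compose_cause` is a hypothetical helper name; the sequence diagram in this summary only shows the `rstrip(".")` step itself.

```python
from __future__ import annotations


def compose_cause(summary: str, detail: str | None) -> str:
    """Append validation detail to a summary without doubling the final period."""
    if not detail:
        return summary
    # Drop any trailing period so the one added below is the only one.
    detail_text = detail.strip().rstrip(".")
    return f"{summary} Validation detail: {detail_text}."
```

With or without a trailing period on `detail`, the composed string ends in exactly one period.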

Two issues remain worth attention:

In _make_error_callback, progress_tracker.record_failure() is never reached when _worker_error_callback raises RuntimeError (the context-missing path), leaving the progress bar one count short for that slot.
The "schema" in detail fallback keyword in _classify_worker_failure is broad enough that unrelated errors whose formatted cause mentions "schema" will be mis-labelled "schema validation" in the warning log.

Confidence Score: 3/5

Safe to merge with minor fixes — core reporting logic is correct, but two small behavioral gaps remain in the error-callback path.
The primary goal of the PR (structured failure reporting with detail and failure_kind propagation) is implemented correctly and is well-tested end-to-end. The two remaining issues — progress_tracker.record_failure() not being called on the RuntimeError path, and the overly broad "schema" in detail fallback — are real logic gaps. The RuntimeError-path issue only manifests for a context=None case that should never occur in production given the updated fan-out code, but the absence of a test covering _make_error_callback means the gap is invisible in CI. The broad keyword match could produce misleading classification labels in operator logs for non-schema errors. Neither issue blocks functionality today, but both should be addressed before the pattern is extended further.
column_wise_builder.py — specifically _classify_worker_failure (lines 527–528) and the _make_error_callback / _worker_error_callback interaction (lines 418–423, 545–550).

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py | Adds `column_name` to fan-out context, new `_classify_worker_failure` / `_extract_failure_detail` / `_format_worker_failure_warning` helpers, and a fail-loud `RuntimeError` in `_worker_error_callback`. Two issues: `progress_tracker.record_failure()` is skipped when `RuntimeError` is raised, and the `"schema" in detail` fallback check is too broad. |
| packages/data-designer-engine/src/data_designer/engine/models/errors.py | Introduces `_normalize_error_detail`, expands `GenerationValidationFailureError` and `ModelGenerationValidationFailureError` with structured `detail` and `failure_kind` attributes, and fixes the trailing-period double-punctuation bug. Changes look correct and well-tested. |
| packages/data-designer-engine/src/data_designer/engine/models/facade.py | Adds `_classify_generation_failure_kind` and `_build_generation_validation_error` helpers; all four `ParserException` raise sites in `generate` and `agenerate` now use the helper to carry structured `detail` and `failure_kind` through to `GenerationValidationFailureError`. Logic is symmetric and well-tested. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_column_wise_builder.py | Adds tests for schema-validation detail, timeout detail, missing-context `RuntimeError`, and `column_name` propagation in both thread and async fan-out paths. Coverage is solid for the happy and fail-loud paths, but no test covers `_make_error_callback` to verify `progress_tracker.record_failure()` is (or is not) called when `RuntimeError` propagates. |
| packages/data-designer-engine/tests/engine/models/test_facade.py | New tests verify that both `generate` and `agenerate` surface `detail` and `failure_kind` on the raised `ModelGenerationValidationFailureError`. Tests are clear and symmetric. |
| packages/data-designer-engine/tests/engine/models/test_model_errors.py | Updated parametrized fixture and two new standalone tests confirm `failure_kind` propagation and double-period stripping in `handle_llm_exceptions`. All look correct. |

Sequence Diagram

sequenceDiagram
    participant MF as ModelFacade.generate()
    participant BVE as _build_generation_validation_error()
    participant GVFE as GenerationValidationFailureError
    participant HLE as handle_llm_exceptions()
    participant MGVFE as ModelGenerationValidationFailureError
    participant WEC as _worker_error_callback()
    participant CWF as _classify_worker_failure()
    participant EFD as _extract_failure_detail()
    participant LOG as logger.warning()

    MF->>MF: parser(response) → raises ParserException
    MF->>BVE: _build_generation_validation_error(summary, exc)
    BVE->>BVE: _classify_generation_failure_kind(exc)
    Note right of BVE: "schema_validation" or "parse_error"
    BVE-->>MF: GenerationValidationFailureError(summary, detail, failure_kind)
    MF->>HLE: catch_llm_exceptions → handle_llm_exceptions(exc)
    HLE->>HLE: detail_text = exc.detail.rstrip(".")
    HLE-->>MGVFE: raise ModelGenerationValidationFailureError(msg, detail, failure_kind)
    MGVFE->>WEC: _worker_error_callback(exc, context={index, column_name})
    WEC->>WEC: _format_worker_failure_warning(exc, context)
    WEC->>CWF: _classify_worker_failure(exc)
    CWF->>CWF: getattr(exc, "failure_kind") → "schema_validation"
    CWF-->>WEC: "schema validation"
    WEC->>EFD: _extract_failure_detail(exc)
    EFD->>EFD: getattr(exc, "detail") → structured detail string
    EFD-->>WEC: detail string
    WEC->>LOG: "⚠️ Generation for record at index N failed in column 'X' (schema validation). Detail: ..."
    WEC->>WEC: _records_to_drop.add(context["index"])

Prompt To Fix All With AI

This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Line: 545-550

Comment:
**`progress_tracker.record_failure()` skipped when `RuntimeError` is raised**

`_make_error_callback` calls `progress_tracker.record_failure()` only if `_worker_error_callback` returns normally. When `context` is `None` or missing `"index"`, `_worker_error_callback` raises `RuntimeError` before returning, so `record_failure()` is never reached. This means the progress tracker will undercount failures for that execution slot.

In the thread executor path the `RuntimeError` is caught by Python's done-callback machinery and swallowed (as already noted). In the async path it is caught by the explicit `try/except Exception` in `_run_task`. In both cases `record_failure()` is never called, leaving the progress bar one count short.

A minimal fix is to call `record_failure()` before the guard raises. For reference, the callback as it currently stands:

```python
def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None:
    """If a worker fails, we can handle the exception here."""
    logger.warning(self._format_worker_failure_warning(exc, context=context))
    if context is None or "index" not in context:
        raise RuntimeError("Worker error callback called without a valid context index.")
    self._records_to_drop.add(context["index"])
```

Alternatively, move `progress_tracker.record_failure()` into a `finally` block inside the wrapper in `_make_error_callback` so it is always called regardless of whether `_worker_error_callback` raises.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Line: 527-528

Comment:
**Overly broad `"schema" in detail` fallback may misclassify unrelated errors**

The condition `"schema" in detail` runs for any exception that does not carry a structured `failure_kind` attribute (e.g., `ModelStructuredOutputError`, `ModelBadRequestError`, or future custom errors). If such an error's formatted cause string happens to contain the word "schema" — for instance, a bad-request error referencing an "API schema" or a structured-output error mentioning "output schema" — it will be labelled `"schema validation"` in the warning, which can mislead operators diagnosing the root cause.

The first sub-condition `"response_schema" in detail` is already highly specific. Replacing the broader fallback with an equally precise keyword (or removing it) would prevent false-positive classification:

```python
if "response_schema" in detail or "model_validate" in detail or "doesn't match requested" in detail:
    return "schema validation"
```

Alternatively, demote the catch-all `"schema"` check to after the `"validation"` check so that less-specific matches are only reached if nothing else fits.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: f0f12fe

eric-tramel · 2026-03-13T21:17:51Z

Follow-up update pushed in b06c4734 to address the Greptile notes:

failure_kind now propagates through ModelGenerationValidationFailureError and is consumed directly by the dataset-builder warning path.
Added test_fan_out_with_async_passes_column_name_in_context for async parity with the thread path.
Removed the redundant pre-normalization in facade.py.

Re-ran:
UV_CACHE_DIR=/tmp/uv-cache uv run --no-sync --group dev pytest packages/data-designer-engine/tests/engine/models packages/data-designer-engine/tests/engine/dataset_builders

eric-tramel · 2026-03-13T21:18:53Z

I'd like to go ahead and get more useful logging in, to help diagnose structured-output parsing failures.

eric-tramel · 2026-03-13T21:36:12Z

Follow-up update pushed in df9cfe11 to address the newest Greptile notes:

_worker_error_callback() now fails loudly if the context index is missing, rather than silently keeping the failed record.
Added async parity coverage for agenerate() to verify validation detail and failure_kind propagation.
Removed the double-punctuation risk in handle_llm_exceptions() when validation detail already ends with a period.

Re-ran:
UV_CACHE_DIR=/tmp/uv-cache uv run --no-sync --group dev pytest packages/data-designer-engine/tests/engine/models packages/data-designer-engine/tests/engine/dataset_builders

greptile-apps · 2026-03-13T21:40:29Z

packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py

    def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None:
        """If a worker fails, we can handle the exception here."""
-        logger.warning(
-            f"⚠️ Generation for record at index {context['index']} failed. "
-            f"Will omit this record from the dataset.\n{exc}"
-        )
+        logger.warning(self._format_worker_failure_warning(exc, context=context))
+        if context is None or "index" not in context:
+            raise RuntimeError("Worker error callback called without a valid context index.")
        self._records_to_drop.add(context["index"])


RuntimeError silently swallowed by both executor paths

The RuntimeError raised here is intended to provide fail-loud behaviour when context is missing, but both executor implementations prevent it from ever reaching the main thread:

Thread path (ConcurrentThreadExecutor): _error_callback is called at line 188 of concurrency.py, outside any try/except inside _handle_future. _handle_future is registered as a done_callback via future.add_done_callback(_handle_future). Python's concurrent.futures catches and only logs (via LOGGER.exception) any exception raised from a done-callback — it does not propagate to the caller of _fan_out_with_threads.

Async path (AsyncConcurrentExecutor): The callback is already wrapped in an explicit try/except Exception at lines 215–219 of async_concurrency.py, which catches the RuntimeError and logs "error_callback raised an exception".

In both production paths the RuntimeError is silently swallowed. The test test_worker_error_callback_requires_context_index exercises the direct method call — not the integrated path — so this gap is not caught by the test suite.

Because the _records_to_drop.add(context["index"]) line is never reached when context is missing, the failed record is also not dropped from the dataset (the exact silent-data-corruption risk flagged in a previous review comment).

Consider moving the context validation to the outermost fan-out site (before the executor loop) where the exception can propagate freely, or guard the record-drop with a logged error and a bare return so the callback's side-effect is clearly skipped without the false-safe of an unreachable raise.
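The done-callback behaviour described for the thread path can be reproduced with the standard library alone. The sketch below is independent of `ConcurrentThreadExecutor`; the worker, callback, and `dropped` set are hypothetical stand-ins.

```python
from concurrent.futures import Future, ThreadPoolExecutor

dropped: set[int] = set()


def failing_worker() -> None:
    raise ValueError("generation failed")


def on_done(future: Future) -> None:
    # Mirrors a fail-loud guard inside an error callback.
    if future.exception() is not None:
        raise RuntimeError("callback called without a valid context index")
    dropped.add(0)  # never reached for a failed future


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(failing_worker)
    # concurrent.futures catches Exception subclasses raised by done
    # callbacks and logs them instead of propagating them to the caller.
    future.add_done_callback(on_done)

# Execution continues here: the RuntimeError never reached this thread,
# and the record was never added to `dropped`.
```

The traceback shows up in the log output, but nothing in the calling code can catch the `RuntimeError`, which is the gap described above.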

nabinchha

Approved — see review comment below.

nabinchha · 2026-03-13T21:59:52Z

Findings

Critical — Must fix before merge

No critical issues found.

Warnings — Strongly recommend fixing

column_wise_builder.py:548-549 — RuntimeError is unreachable in production

What: The RuntimeError("Worker error callback called without a valid context index.") raised at line 549 will never propagate to the main thread in either executor path. In the thread executor, _error_callback is invoked inside a done_callback registered via future.add_done_callback — Python's concurrent.futures silently swallows exceptions from done callbacks. In the async executor, the callback is wrapped in an explicit try/except Exception (line 216-219 of async_concurrency.py) that catches and logs the RuntimeError.
Why: The raise creates a false sense of safety. The warning message claims "Will omit this record from the dataset" but _records_to_drop.add(...) is never reached, so the corrupted record silently remains. The existing test (test_worker_error_callback_requires_context_index) exercises the method directly, not through the executor, so it doesn't catch this gap.
Suggestion: Replace the raise with a logged error and explicit return. The defensive check is still valuable — it just shouldn't rely on an exception that can't propagate:

def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None:
    """If a worker fails, we can handle the exception here."""
    logger.warning(self._format_worker_failure_warning(exc, context=context))
    if context is None or "index" not in context:
        logger.error("Worker error callback called without a valid context index; record cannot be dropped.")
        return
    self._records_to_drop.add(context["index"])

This was flagged in a prior review; the current commit addressed it by adding the RuntimeError, but the underlying swallowing issue in both executors remains.

column_wise_builder.py:540-542 — Warning message is misleading when index is unknown

What: _format_worker_failure_warning always says "Will omit this record from the dataset" even when record_index is "unknown" — i.e., when the record cannot actually be omitted because there's no valid index to add to _records_to_drop.
Why: Misleading log messages make incident investigation harder. An operator reading "Will omit this record" would assume the corrupted record was dropped.
Suggestion: Adjust the message when the index is unknown:

omit_msg = (
    "Will omit this record from the dataset."
    if record_index != "unknown"
    else "Record index is unknown; this record may not be omitted correctly."
)

This was also flagged in a prior review and remains unaddressed.
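The two suggestions above (log and return instead of raising, and wording that reflects whether the record can actually be dropped) could be combined roughly as follows. `DropTracker` and its method are hypothetical stand-ins for the builder's bookkeeping, not the PR's code.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


class DropTracker:
    """Hypothetical stand-in for the builder's dropped-record bookkeeping."""

    def __init__(self) -> None:
        self.records_to_drop: set[int] = set()

    def on_worker_error(self, exc: Exception, *, context: dict | None = None) -> bool:
        """Record the failed index; return False when no index is available."""
        index = (context or {}).get("index")
        if index is None:
            # No exception is raised here: inside an executor callback it
            # would be swallowed anyway, so the problem is logged instead.
            logger.error("No context index for failed record (%s); cannot drop it.", exc)
            return False
        logger.warning("Record %s failed (%s); omitting it from the dataset.", index, exc)
        self.records_to_drop.add(index)
        return True
```

Returning a boolean is one possible design choice; it lets a wrapper or a test observe the "could not drop" case without relying on exception propagation.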

column_wise_builder.py:508-531 — Fallback text classification in _classify_worker_failure can diverge from structured failure_kind

What: When failure_kind is not set on the exception (e.g., for non-ModelGenerationValidationFailureError errors), _classify_worker_failure falls back to string matching on the detail and exception class name. Several of these heuristics are fragile — for example, "schema" in detail would match any error message that happens to contain the word "schema" even if it's not a schema validation failure. Similarly, "validation" in exc_name would match any custom exception class with "validation" in its name.
Why: The structured failure_kind path (line 510-512) is the reliable path. The fallback heuristics are a best-effort classification that could misclassify errors, leading to confusing warning messages.
Suggestion: Consider adding a comment documenting that the text-matching branch is best-effort and may misclassify. Also consider making the fallback return value more generic (e.g., just "error" instead of specific categories) to avoid false confidence in the classification.
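One way to read this suggestion: trust the structured attribute when it is present, and keep the fallback deliberately generic. The sketch below is an assumption about shape only; the underscore-to-space mapping matches the labels shown in the sequence diagram, but the function body is not the PR's implementation.

```python
def classify_worker_failure(exc: Exception) -> str:
    """Prefer the structured failure_kind; fall back to a generic label."""
    failure_kind = getattr(exc, "failure_kind", None)
    if isinstance(failure_kind, str) and failure_kind:
        # e.g. "schema_validation" -> "schema validation"
        return failure_kind.replace("_", " ")
    # Best-effort fallback: rely on exception types, not message text.
    if isinstance(exc, TimeoutError):
        return "timeout"
    return "error"
```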

Suggestions — Consider improving

test_column_wise_builder.py:213-231 — Test for RuntimeError exercises the wrong path

What: test_worker_error_callback_requires_context_index calls _worker_error_callback directly, which correctly triggers the RuntimeError. However, in production, this method is only called from within executor callbacks where the RuntimeError is swallowed (as noted in the warning above).
Why: The test provides coverage of the method in isolation but doesn't verify the actual production behavior. If the RuntimeError is replaced with a return (per the suggestion above), this test should be updated to verify that _records_to_drop remains empty and the error is logged.
Suggestion: If you keep the RuntimeError, add an integration-level test that exercises the error callback through the actual executor to verify the exception's effect (or lack thereof) in the real execution context.

What Looks Good

Clean error propagation chain: The failure_kind and detail attributes flow naturally from ParserException → GenerationValidationFailureError → ModelGenerationValidationFailureError → _classify_worker_failure. The structured attribute is checked first before falling back to text matching, which is the right design.
Thorough test coverage: New tests cover schema validation warnings, timeout warnings, preserved parser detail, double-period stripping, missing context handling, and worker context propagation for both thread and async fan-out paths. The parametric facade tests for both sync and async paths are well-structured.
Good separation of concerns: _extract_failure_detail, _classify_worker_failure, and _format_worker_failure_warning are cleanly separated as static/class methods, making them independently testable and easy to reason about.

Verdict

Ship it (with nits) — The core error reporting improvement is well-designed and thoroughly tested. The RuntimeError swallowing issue (carried over from a prior review) is the most substantive concern but is a defensive-path issue that only matters if a future code change accidentally omits context — which is unlikely given the current fan-out code always provides it. The misleading warning message for unknown indices is a minor UX issue. Neither blocks merge.

Next steps

Consider addressing the RuntimeError → logger.error + return change as a quick follow-up, since it was flagged in the prior review round and the current fix doesn't fully resolve it.
The warning message wording for unknown indices could be tightened in the same follow-up.

greptile-apps · 2026-03-13T22:40:13Z

packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py

    def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None:
        """If a worker fails, we can handle the exception here."""
-        logger.warning(
-            f"⚠️ Generation for record at index {context['index']} failed. "
-            f"Will omit this record from the dataset.\n{exc}"
-        )
+        logger.warning(self._format_worker_failure_warning(exc, context=context))
+        if context is None or "index" not in context:
+            raise RuntimeError("Worker error callback called without a valid context index.")
        self._records_to_drop.add(context["index"])


progress_tracker.record_failure() skipped when RuntimeError is raised

_make_error_callback calls progress_tracker.record_failure() only if _worker_error_callback returns normally. When context is None or missing "index", _worker_error_callback raises RuntimeError before returning, so record_failure() is never reached. This means the progress tracker will undercount failures for that execution slot.

In the thread executor path the RuntimeError is caught by Python's done-callback machinery and swallowed (as already noted). In the async path it is caught by the explicit try/except Exception in _run_task. In both cases record_failure() is never called, leaving the progress bar one count short.

A minimal fix is to call record_failure() before the guard raises. For reference, the callback as it currently stands:

def _worker_error_callback(self, exc: Exception, *, context: dict | None = None) -> None:
    """If a worker fails, we can handle the exception here."""
    logger.warning(self._format_worker_failure_warning(exc, context=context))
    if context is None or "index" not in context:
        raise RuntimeError("Worker error callback called without a valid context index.")
    self._records_to_drop.add(context["index"])

Alternatively, move progress_tracker.record_failure() into a finally block inside the wrapper in _make_error_callback so it is always called regardless of whether _worker_error_callback raises.
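A sketch of the `finally` variant, under the assumption that the wrapper simply calls the inner callback and then records the failure. `ProgressTracker` and `make_error_callback` here are hypothetical minimal versions, not the builder's actual classes.

```python
from __future__ import annotations

from typing import Callable


class ProgressTracker:
    """Hypothetical minimal tracker; it only counts failures."""

    def __init__(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1


def make_error_callback(
    progress_tracker: ProgressTracker,
    worker_error_callback: Callable[..., None],
) -> Callable[..., None]:
    """Wrap the worker callback so the failure is counted even if it raises."""

    def callback(exc: Exception, *, context: dict | None = None) -> None:
        try:
            worker_error_callback(exc, context=context)
        finally:
            # Runs whether or not the inner callback raised.
            progress_tracker.record_failure()

    return callback
```

This keeps the progress count independent of how the inner callback handles a missing context.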

greptile-apps · 2026-03-13T22:40:14Z

packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py

+        if "response_schema" in detail or "schema" in detail:
+            return "schema validation"


Overly broad "schema" in detail fallback may misclassify unrelated errors

The condition "schema" in detail runs for any exception that does not carry a structured failure_kind attribute (e.g., ModelStructuredOutputError, ModelBadRequestError, or future custom errors). If such an error's formatted cause string happens to contain the word "schema" — for instance, a bad-request error referencing an "API schema" or a structured-output error mentioning "output schema" — it will be labelled "schema validation" in the warning, which can mislead operators diagnosing the root cause.

The first sub-condition "response_schema" in detail is already highly specific. Replacing the broader fallback with an equally precise keyword (or removing it) would prevent false-positive classification:

if "response_schema" in detail or "model_validate" in detail or "doesn't match requested" in detail:
    return "schema validation"

Alternatively, demote the catch-all "schema" check to after the "validation" check so that less-specific matches are only reached if nothing else fits.
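To make the false-positive concrete, here is a small comparison of the broad check and the narrower one proposed above. Both predicate names are hypothetical, and the bad-request message is an invented example string.

```python
def is_schema_validation_broad(detail: str) -> bool:
    """Current fallback: any mention of 'schema' counts."""
    return "response_schema" in detail or "schema" in detail


def is_schema_validation_narrow(detail: str) -> bool:
    """Proposed fallback: only match phrases specific to validation output."""
    return (
        "response_schema" in detail
        or "model_validate" in detail
        or "doesn't match requested" in detail
    )


# Invented example of an unrelated error whose text mentions "schema".
unrelated = "Bad request: payload violates the API schema."
genuine = "Response doesn't match requested <response_schema> 'name' is a required property."
```

The broad predicate labels `unrelated` as schema validation, while the narrow one matches only `genuine`.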

improve generation failure reporting

cf9977d

eric-tramel requested a review from a team as a code owner March 13, 2026 21:00

greptile-apps bot reviewed Mar 13, 2026

address greptile review feedback

b06c473

eric-tramel changed the title ~~Improve generation failure reporting for schema and timeout failures~~ feat: Improve generation failure reporting for schema and timeout failures Mar 13, 2026

eric-tramel self-assigned this Mar 13, 2026

greptile-apps bot reviewed Mar 13, 2026

tighten error reporting follow-ups

df9cfe1

greptile-apps bot reviewed Mar 13, 2026

nabinchha approved these changes Mar 13, 2026

eric-tramel added 3 commits March 13, 2026 18:24

Merge branch 'main' into feature/generation-error-reporting-v1

bc47201

Merge branch 'main' into feature/generation-error-reporting-v1

4a2b11e

Merge branch 'main' into feature/generation-error-reporting-v1

f0f12fe

eric-tramel merged commit 5783435 into main Mar 13, 2026
47 checks passed

greptile-apps bot reviewed Mar 13, 2026