fix(observability): demote unknown-provider list_models error (Sentry TAURI-RUST-X)#2783
Conversation
… TAURI-RUST-X)
`inference_list_models` calls `providers::ops::list_configured_models`,
which surfaces `"no cloud provider with id or slug '<id>' found"` when the
user-selected provider id isn't a registered cloud provider — e.g. picking
`"ollama"`, which is actually a local runtime. That's a user-config state,
not a code bug.
Before this change the emit site at `inference/ops.rs:248` unconditionally
called `error!`, which sentry-tracing ships to Sentry as the opaque title
`"[inference::ops] list_models:error"` — the underlying error sat in a
`Rust Tracing Fields` context where no body-level classifier in
`core::observability` could reach it. Result: TAURI-RUST-X accumulated
~5740 events with no actionable triage signal.
Two changes:
1. Add `is_unknown_provider_user_config(err)` helper anchored on the
literal phrase from `provider/ops.rs:54`. Match in the emit-site arm
and demote to `warn!` so the case stays in local logs but stops
reaching Sentry.
2. Embed `{err}` in the remaining `error!` format string so future
genuine `list_models` failures land in Sentry with actionable titles
instead of the opaque shape that made TAURI-RUST-X untriageable.
The matcher is anchored on the exact emit-site phrase so sibling
list_models Sentry issues that ARE real bugs (TAURI-RUST-12 JSON parse,
TAURI-RUST-2W reqwest builder, TAURI-RUST-JP local ollama_admin
transport) continue to surface — covered by negative test cases.
## Test plan
- [x] `cargo test is_unknown_provider_user_config` — 2 new tests pass
- [x] `cargo test openhuman::inference` — 687 tests pass, 0 regressions
- [x] `cargo check --bin openhuman-core` — passes
- [x] `cargo fmt --check` on touched files — clean
📝 WalkthroughWalkthroughThe PR refines error handling in the inference service by distinguishing user-configuration issues (missing cloud provider) from operational failures. A new helper function detects the specific error pattern and triggers log-level demotion from error to warn, improving visibility into root causes. Unit tests validate the detection logic across both matching and non-matching error variants. ChangesProvider Error Handling and Log Demotion
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
graycyrus
left a comment
There was a problem hiding this comment.
@CodeGhost21 hey! the code looks good to me — the fix is well-scoped and the tests are solid. there's one CI check still pending (Windows E2E Appium), so i'll hold off on formally approving until that clears. once it's green, i'll come back and approve.
a few things worth noting for anyone else looking at this:
is_unknown_provider_user_configis narrow by design and the negative test set covers the real sibling Sentry issues (TAURI-RUST-12, TAURI-RUST-2W, TAURI-RUST-JP) explicitly — good discipline there.- in the
error!arm,{err}appears both as the structured field (error = %err) and interpolated into the format string. the format string interpolation is what gets Sentry the actionable title, and the structured field is for local log consumption. intentional and fine, but worth a comment if this pattern gets extracted into a helper later. - the out-of-scope callout in the PR description (the same anti-pattern in 5 other ops functions) is the right call — keeping this scoped to TAURI-RUST-X is the right tradeoff.
overall: good contribution, clean implementation. just needs the Windows E2E to go green.
graycyrus
left a comment
There was a problem hiding this comment.
CI is fully green now, including the Windows E2E run that was still pending on the last pass.
Code is clean. The is_unknown_provider_user_config helper is correctly anchored on the exact phrase from provider/ops.rs:54, the demote is scoped tightly, real failures still escalate, and the negative test cases cover the actual sibling Sentry issues (TAURI-RUST-12, TAURI-RUST-2W, TAURI-RUST-JP). Embedding {err} in the surviving error! format string is the right call — future Sentry titles will be actionable.
Approved.
Summary
inference_list_modelsfromerror!towarn!, so it stops escalating to Sentry as a code bug.{err}in the survivingerror!format string so future genuinelist_modelsfailures land in Sentry with actionable titles instead of the opaque"[inference::ops] list_models:error"shape.Why
Sentry TAURI-RUST-X has accumulated ~5740 events. Every event has the same opaque title because the emit site at
src/openhuman/inference/ops.rs:248was:sentry-tracingships this as a Sentry message of"[inference::ops] list_models:error"; the actual error sits in theRust Tracing Fields.errorcontext, which no body-level classifier incore::observabilitycan reach (this is why none of the sibling "demote provider-config" classifiers caught it).The latest event's underlying error:
"no cloud provider with id or slug 'ollama' found"— emitted atsrc/openhuman/inference/provider/ops.rs:54when the user picks a provider id (e.g.ollama, which is actually a local runtime) that isn't a registered cloud provider. User-config state, not a code bug. Actionable for the user (pick a registered provider, or configure Ollama via the local runtime path), not actionable for any developer reading Sentry.What changed
is_unknown_provider_user_config(err: &str) -> boolininference/ops.rs, anchored on the literal phrase fromprovider/ops.rs:54. Narrow on purpose so sibling failures stay observable.inference_list_modelsnow:warn!on the user-config case → not picked up by sentry-tracing's error-level filtererror!on everything else, with{err}interpolated into the format string so the Sentry title carries the underlying cause and future events are triageableOut of scope (noted for follow-up)
error!(..., "{LOG_PREFIX} <op>:error")template anti-pattern exists in ~6 otherinference::opsfunctions (status,summarize,update_local_settings, …). If those spike as their own Sentry issues, the same pattern (or an extractedlog_inference_result(op, result)helper) can address them. Not bundling here to keep the PR scoped to TAURI-RUST-X.fix/sentry-28z-local-runtime-list-models, separate Sentry ID TAURI-RUST-28Z) makes Ollama/LMStudio lookups succeed by synthesizing transientcloud_providersentries. Different layer (provider/ops.rs, the source) vs. this PR (inference/ops.rs, the emit site). The two are complementary defense-in-depth: that PR removes most of the cases; this PR demotes any residual unknown-provider misconfigurations so they don't reappear as a new Sentry issue.Test plan
cargo test is_unknown_provider_user_config— 2 new tests pass (positive + negative)cargo test openhuman::inference— 687 tests pass, 0 regressionscargo check --manifest-path Cargo.toml --bin openhuman-core— passescargo fmt --checkon touched files — cleanPost-merge monitoring
Watch TAURI-RUST-X event rate drop to ~0 on the next release; any residual events should now carry the actual error string in their title (kept out of the Test-plan checklist because it's a follow-up observation, not a pre-merge gate — the PR Submission Checklist workflow only counts
[x]items).Summary by CodeRabbit
Release Notes