Skip to content

fix(observability): demote unknown-provider list_models error (Sentry TAURI-RUST-X)#2783

Open
CodeGhost21 wants to merge 1 commit into
tinyhumansai:mainfrom
CodeGhost21:fix/sentry-tauri-rust-x-list-models-unknown-provider
Open

fix(observability): demote unknown-provider list_models error (Sentry TAURI-RUST-X)#2783
CodeGhost21 wants to merge 1 commit into
tinyhumansai:mainfrom
CodeGhost21:fix/sentry-tauri-rust-x-list-models-unknown-provider

Conversation

@CodeGhost21
Copy link
Copy Markdown
Contributor

@CodeGhost21 CodeGhost21 commented May 27, 2026

Summary

  • Demote the user-config "no cloud provider with id or slug '' found" error in inference_list_models from error! to warn!, so it stops escalating to Sentry as a code bug.
  • Embed {err} in the surviving error! format string so future genuine list_models failures land in Sentry with actionable titles instead of the opaque "[inference::ops] list_models:error" shape.

Why

Sentry TAURI-RUST-X has accumulated ~5740 events. Every event has the same opaque title because the emit site at src/openhuman/inference/ops.rs:248 was:

Err(err) => error!(error = %err, "{LOG_PREFIX} list_models:error"),

sentry-tracing ships this as a Sentry message of "[inference::ops] list_models:error"; the actual error sits in the Rust Tracing Fields.error context, which no body-level classifier in core::observability can reach (this is why none of the sibling "demote provider-config" classifiers caught it).

The latest event's underlying error: "no cloud provider with id or slug 'ollama' found" — emitted at src/openhuman/inference/provider/ops.rs:54 when the user picks a provider id (e.g. ollama, which is actually a local runtime) that isn't a registered cloud provider. User-config state, not a code bug. Actionable for the user (pick a registered provider, or configure Ollama via the local runtime path), not actionable for any developer reading Sentry.

What changed

  1. New helper is_unknown_provider_user_config(err: &str) -> bool in inference/ops.rs, anchored on the literal phrase from provider/ops.rs:54. Narrow on purpose so sibling failures stay observable.
  2. Emit-site match arm in inference_list_models now:
    • warn! on the user-config case → not picked up by sentry-tracing's error-level filter
    • error! on everything else, with {err} interpolated into the format string so the Sentry title carries the underlying cause and future events are triageable
  3. Tests added: positive (canonical emit-site phrase across well-formed slugs) + negative (sibling Sentry shapes TAURI-RUST-12 JSON parse, TAURI-RUST-2W reqwest builder, TAURI-RUST-JP local ollama_admin transport, plus generic timeout/permission/empty cases must still escalate).

Out of scope (noted for follow-up)

  • The same error!(..., "{LOG_PREFIX} <op>:error") template anti-pattern exists in ~6 other inference::ops functions (status, summarize, update_local_settings, …). If those spike as their own Sentry issues, the same pattern (or an extracted log_inference_result(op, result) helper) can address them. Not bundling here to keep the PR scoped to TAURI-RUST-X.
  • Complementary work in flight (fix/sentry-28z-local-runtime-list-models, separate Sentry ID TAURI-RUST-28Z) makes Ollama/LMStudio lookups succeed by synthesizing transient cloud_providers entries. Different layer (provider/ops.rs, the source) vs. this PR (inference/ops.rs, the emit site). The two are complementary defense-in-depth: that PR removes most of the cases; this PR demotes any residual unknown-provider misconfigurations so they don't reappear as a new Sentry issue.

Test plan

  • cargo test is_unknown_provider_user_config — 2 new tests pass (positive + negative)
  • cargo test openhuman::inference — 687 tests pass, 0 regressions
  • cargo check --manifest-path Cargo.toml --bin openhuman-core — passes
  • cargo fmt --check on touched files — clean

Post-merge monitoring

Watch TAURI-RUST-X event rate drop to ~0 on the next release; any residual events should now carry the actual error string in their title (kept out of the Test-plan checklist because it's a follow-up observation, not a pre-merge gate — the PR Submission Checklist workflow only counts [x] items).

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Enhanced error handling for cloud provider configuration issues with improved diagnostic logging
    • Better distinction between user-configuration problems and system-level errors in application logs
    • More detailed error messages to support troubleshooting and accelerate issue resolution
    • Improved operational visibility through refined error categorization and more granular logging information

Review Change Stack

… TAURI-RUST-X)

`inference_list_models` calls `providers::ops::list_configured_models`,
which surfaces `"no cloud provider with id or slug '<id>' found"` when the
user-selected provider id isn't a registered cloud provider — e.g. picking
`"ollama"`, which is actually a local runtime. That's a user-config state,
not a code bug.

Before this change the emit site at `inference/ops.rs:248` unconditionally
called `error!`, which sentry-tracing ships to Sentry as the opaque title
`"[inference::ops] list_models:error"` — the underlying error sat in a
`Rust Tracing Fields` context where no body-level classifier in
`core::observability` could reach it. Result: TAURI-RUST-X accumulated
~5740 events with no actionable triage signal.

Two changes:

1. Add `is_unknown_provider_user_config(err)` helper anchored on the
   literal phrase from `provider/ops.rs:54`. Match in the emit-site arm
   and demote to `warn!` so the case stays in local logs but stops
   reaching Sentry.
2. Embed `{err}` in the remaining `error!` format string so future
   genuine `list_models` failures land in Sentry with actionable titles
   instead of the opaque shape that made TAURI-RUST-X untriageable.

The matcher is anchored on the exact emit-site phrase so sibling
list_models Sentry issues that ARE real bugs (TAURI-RUST-12 JSON parse,
TAURI-RUST-2W reqwest builder, TAURI-RUST-JP local ollama_admin
transport) continue to surface — covered by negative test cases.

## Test plan
- [x] `cargo test is_unknown_provider_user_config` — 2 new tests pass
- [x] `cargo test openhuman::inference` — 687 tests pass, 0 regressions
- [x] `cargo check --bin openhuman-core` — passes
- [x] `cargo fmt --check` on touched files — clean
@CodeGhost21 CodeGhost21 requested a review from a team May 27, 2026 20:07
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

📝 Walkthrough

Walkthrough

The PR refines error handling in the inference service by distinguishing user-configuration issues (missing cloud provider) from operational failures. A new helper function detects the specific error pattern and triggers log-level demotion from error to warn, improving visibility into root causes. Unit tests validate the detection logic across both matching and non-matching error variants.

Changes

Provider Error Handling and Log Demotion

Layer / File(s) Summary
Unknown provider detection helper
src/openhuman/inference/ops.rs
A private is_unknown_provider_user_config helper using substring matching detects "no cloud provider with id or slug … found" errors; warn import is added to the tracing module imports.
Error handling log-level demotion
src/openhuman/inference/ops.rs
inference_list_models error handling branches on the helper: unknown provider errors log at warn! with provider_id, other errors log at error! with the underlying error text included.
Detection helper validation tests
src/openhuman/inference/ops_tests.rs
Two unit test suites: positive validation that the helper matches the canonical provider-not-found message with various provider_id values, and negative validation that representative error messages (parse, HTTP, transport, timeout, permission issues) correctly return false.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

working, bug

Suggested reviewers

  • graycyrus
  • M3gA-Mind
  • senamakel

Poem

🐰 A helper hops to catch the lost provider,
Warns when clouds drift away from user's desire,
Tests ensure false alarms ne'er collide,
Error logs cleaner, with nowhere to hide! 🌩️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the primary change: demoting a specific unknown-provider error in list_models from error to warn level for better observability.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added working A PR that is being worked on by the team. bug labels May 27, 2026
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodeGhost21 hey! the code looks good to me — the fix is well-scoped and the tests are solid. there's one CI check still pending (Windows E2E Appium), so i'll hold off on formally approving until that clears. once it's green, i'll come back and approve.

a few things worth noting for anyone else looking at this:

  • is_unknown_provider_user_config is narrow by design and the negative test set covers the real sibling Sentry issues (TAURI-RUST-12, TAURI-RUST-2W, TAURI-RUST-JP) explicitly — good discipline there.
  • in the error! arm, {err} appears both as the structured field (error = %err) and interpolated into the format string. the format string interpolation is what gets Sentry the actionable title, and the structured field is for local log consumption. intentional and fine, but worth a comment if this pattern gets extracted into a helper later.
  • the out-of-scope callout in the PR description (the same anti-pattern in 5 other ops functions) is the right call — keeping this scoped to TAURI-RUST-X is the right tradeoff.

overall: good contribution, clean implementation. just needs the Windows E2E to go green.

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is fully green now, including the Windows E2E run that was still pending on the last pass.

Code is clean. The is_unknown_provider_user_config helper is correctly anchored on the exact phrase from provider/ops.rs:54, the demote is scoped tightly, real failures still escalate, and the negative test cases cover the actual sibling Sentry issues (TAURI-RUST-12, TAURI-RUST-2W, TAURI-RUST-JP). Embedding {err} in the surviving error! format string is the right call — future Sentry titles will be actionable.

Approved.

@oxoxDev oxoxDev assigned oxoxDev and unassigned oxoxDev May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants