Add resilience metrics and improved retry/fallback logging#150
Add resilience metrics and improved retry/fallback logging#150dougborg wants to merge 6 commits intoFoggedLens:mainfrom
Conversation
Extract duplicated retry logic into shared executeWithFallback() with ErrorDisposition-based classification (abort/fallback/retry). Services use hard-coded primary+fallback endpoint pairs: - OverpassService: overpass.deflock.org + overpass-api.de - RoutingService: dontgetflocked.com + alprwatch.org Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split the combined null/abort guard so Dart's flow analysis promotes fallbackUrl to non-null, eliminating the unnecessary_non_null_assertion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Evolve from hard-coded primary/fallback endpoint pairs to a user-configurable service registry. Each endpoint carries its own name, URL, enabled state, and optional retry/timeout overrides. - ServiceEndpoint model with JSON serialization and sensible defaults - ServiceRegistry<T> generic ordered list with SharedPreferences persistence - executeWithEndpointList() iterates enabled endpoints with per-endpoint policy - OverpassService and RoutingService accept List<ServiceEndpoint> - SettingsState uses ServiceRegistry instances with migration from old format - API Endpoints UI section for managing endpoint registries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ResilienceMetrics singleton to track request counts, retry/fallback invocations, error types, and latency. Instrument executeWithEndpointList and _executeWithRetries with structured debug logging that includes service name, attempt count, error type, and disposition. Remove deprecated executeWithFallback (no remaining callers). Introduce _ClassifiedError wrapper so callers receive the disposition from _executeWithRetries without re-classifying errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a generalized multi-endpoint resilience layer (retry + fallback) with in-memory metrics, and wires endpoint registries through settings/AppState plus a new Advanced Settings UI for managing API endpoints.
Changes:
- Introduce
executeWithEndpointList+ResiliencePolicy/ErrorDisposition+ResilienceMetricsinstrumentation for retry/fallback flows. - Add
ServiceRegistry<T>+ServiceEndpointmodels, settings migration from legacy single-string endpoint prefs, and AppState getters. - Add an “API Endpoints” settings section UI and expand tests to cover registries, migrations, and fallback behavior for routing/overpass.
Reviewed changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/state/settings_state_test.dart | Adds coverage for endpoint registries + migration behavior in SettingsState. |
| test/state/service_registry_test.dart | New test suite covering persistence/CRUD/reorder/reset semantics for ServiceRegistry. |
| test/services/service_policy_test.dart | Adds tests for ResiliencePolicy, executeWithEndpointList, and ResilienceMetrics. |
| test/services/routing_service_test.dart | Updates routing tests for explicit endpoints and adds fallback behavior coverage. |
| test/services/overpass_service_test.dart | Updates overpass tests for explicit endpoints and adds fallback behavior coverage. |
| test/models/service_endpoint_test.dart | Adds JSON/copyWith/equality/default-endpoints tests for ServiceEndpoint. |
| pubspec.lock | Updates locked transitive dependency versions/hashes. |
| lib/state/settings_state.dart | Adds endpoint registries, migration from legacy keys, and new getters for endpoints/registries. |
| lib/state/service_registry.dart | Introduces generic persisted ordered-list registry abstraction with mutation helpers. |
| lib/state/navigation_state.dart | Updates routing log messaging to be service-agnostic. |
| lib/services/service_policy.dart | Adds resilience primitives + execution engine + metrics tracking + structured debug logging. |
| lib/services/routing_service.dart | Switches routing to use multi-endpoint retry/fallback via executeWithEndpointList. |
| lib/services/overpass_service.dart | Switches overpass to use multi-endpoint retry/fallback via executeWithEndpointList. |
| lib/screens/settings/sections/api_endpoints_section.dart | New UI for viewing/reordering/toggling/adding/resetting endpoint registries. |
| lib/screens/advanced_settings_screen.dart | Adds the new API Endpoints section to advanced settings UI. |
| lib/models/service_registry_entry.dart | Defines shared interface for registry-managed entries. |
| lib/models/service_endpoint.dart | Adds endpoint model + built-in defaults for routing/overpass. |
| lib/app_state.dart | Exposes endpoint registries/endpoints from SettingsState through AppState. |
| lib/dev_config.dart | Marks old routing timeout as legacy and aligns timeout value with policy defaults. |
| lib/localizations/en.json | Adds strings for API endpoint settings UI. |
| lib/localizations/de.json | Adds strings for API endpoint settings UI. |
| lib/localizations/es.json | Adds strings for API endpoint settings UI. |
| lib/localizations/fr.json | Adds strings for API endpoint settings UI. |
| lib/localizations/it.json | Adds strings for API endpoint settings UI. |
| lib/localizations/nl.json | Adds strings for API endpoint settings UI. |
| lib/localizations/pl.json | Adds strings for API endpoint settings UI. |
| lib/localizations/pt.json | Adds strings for API endpoint settings UI. |
| lib/localizations/tr.json | Adds strings for API endpoint settings UI. |
| lib/localizations/uk.json | Adds strings for API endpoint settings UI. |
| lib/localizations/zh.json | Adds strings for API endpoint settings UI. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| final effectivePolicy = ResiliencePolicy( | ||
| maxRetries: endpoint.maxRetries ?? defaultPolicy.maxRetries, | ||
| httpTimeout: endpoint.timeoutSeconds != null | ||
| ? Duration(seconds: endpoint.timeoutSeconds!) | ||
| : defaultPolicy.httpTimeout, | ||
| retryBackoffBase: defaultPolicy.retryBackoffBase, |
There was a problem hiding this comment.
ResiliencePolicy.httpTimeout is computed per-endpoint (including timeoutSeconds overrides) but is never enforced by executeWithEndpointList/_executeWithRetries. As a result, endpoint timeout overrides (and even defaultPolicy.httpTimeout) have no effect unless each execute callback manually applies the timeout, which contradicts the function docs. Consider wrapping execute(url) with .timeout(policy.httpTimeout) inside _executeWithRetries, or change the API/docs so the execute callback receives the effective policy and is responsible for applying timeouts consistently.
| onReorder: (oldIndex, newIndex) { | ||
| registry.reorder(oldIndex, newIndex); |
There was a problem hiding this comment.
registry.reorder(oldIndex, newIndex) returns a Future, but it’s called without await/error handling. If persistence fails or an exception is thrown, it will become an unhandled async error and the UI will not be able to report it. Make this callback async and await the reorder (or use unawaited(...).catchError(...) and surface a SnackBar on failure).
| onReorder: (oldIndex, newIndex) { | |
| registry.reorder(oldIndex, newIndex); | |
| onReorder: (oldIndex, newIndex) async { | |
| try { | |
| await registry.reorder(oldIndex, newIndex); | |
| } catch (error) { | |
| ScaffoldMessenger.of(context).showSnackBar( | |
| const SnackBar( | |
| content: Text('Failed to reorder endpoints'), | |
| ), | |
| ); | |
| } |
| onPressed: () { | ||
| try { | ||
| registry.delete(endpoint.id); | ||
| } on StateError catch (e) { | ||
| ScaffoldMessenger.of(context).showSnackBar( | ||
| SnackBar(content: Text(e.message)), | ||
| ); | ||
| } | ||
| }, |
There was a problem hiding this comment.
These registry mutations are async but not awaited. In particular, the try { registry.delete(...) } on StateError block won’t catch the StateError because it will be raised on the returned Future, leading to unhandled async exceptions and no SnackBar. Make the handler async, await registry.delete(...), and catch errors from the awaited call (same applies to addOrUpdate/resetToDefaults).
| ServiceEndpoint copyWith({ | ||
| String? id, | ||
| String? name, | ||
| String? url, | ||
| bool? enabled, | ||
| bool? isBuiltIn, | ||
| int? maxRetries, | ||
| int? timeoutSeconds, | ||
| }) => ServiceEndpoint( | ||
| id: id ?? this.id, | ||
| name: name ?? this.name, | ||
| url: url ?? this.url, | ||
| enabled: enabled ?? this.enabled, | ||
| isBuiltIn: isBuiltIn ?? this.isBuiltIn, | ||
| maxRetries: maxRetries ?? this.maxRetries, | ||
| timeoutSeconds: timeoutSeconds ?? this.timeoutSeconds, | ||
| ); |
There was a problem hiding this comment.
ServiceEndpoint.copyWith can’t clear nullable override fields (maxRetries, timeoutSeconds) back to null because it uses maxRetries ?? this.maxRetries (same for timeoutSeconds). The codebase uses a sentinel _notProvided pattern to distinguish “not provided” vs “explicitly set to null” (e.g., NodeProfile.copyWith). Consider adopting the same pattern here so UI can remove an override once set.
Summary
ResilienceMetricssingleton for tracking retry/fallback behavior (request counts, success rates, error types, latency)executeWithEndpointListand_executeWithRetrieswith structured debug logging (service name, attempt count, error type, disposition)_ClassifiedErrorwrapper so callers receive disposition without re-classifying errorsexecuteWithFallback(no remaining callers)Stacked on #146 — will rebase after that merges.
Test plan
flutter analyzepassesservice_policy_test.darttests pass (67 tests)overpass_service_test.darttests pass (22 tests)ResilienceMetricstest group covers: primary success, fallback success, retries, abort, total failure, error types, summary format🤖 Generated with Claude Code