Desktop: add Gemini thinking budget controls to cut API costs ~50% by beastoin · Pull Request #7159 · BasedHardware/omi

beastoin · 2026-05-04T09:36:31Z

Summary

Cut Gemini 2.5 Flash thinking token costs by setting explicit thinkingBudget=0 on all production extraction/classification paths in the desktop macOS app, and adding defense-in-depth budget injection at the Rust proxy layer.

Problem

Gemini 2.5 Flash thinking output tokens cost about 5.8x as much as regular output tokens ($3.50/M vs $0.60/M)
Thinking tokens accounted for 65% of daily Gemini spend (~$513/day)
Without explicit thinkingConfig, Gemini defaults to unlimited thinking
All desktop Gemini usage is extraction/classification that doesn't need chain-of-thought reasoning
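The price gap above can be checked with a little arithmetic. The sketch below uses only the two per-million prices quoted in this description; the function names and the token counts in the usage note are hypothetical and do not come from the codebase.

```rust
/// Prices quoted in the PR description, in USD per one million output tokens.
const THINKING_USD_PER_M: f64 = 3.50;
const REGULAR_USD_PER_M: f64 = 0.60;

/// Output cost in USD of one response, given its thinking and regular
/// output token counts. (Hypothetical helper, for illustration only.)
fn output_cost_usd(thinking_tokens: u64, regular_tokens: u64) -> f64 {
    (thinking_tokens as f64 * THINKING_USD_PER_M
        + regular_tokens as f64 * REGULAR_USD_PER_M)
        / 1_000_000.0
}

/// Price of a thinking token relative to a regular output token.
fn thinking_price_ratio() -> f64 {
    THINKING_USD_PER_M / REGULAR_USD_PER_M
}
```

For an illustrative call that emits 200 regular output tokens, adding 2,000 thinking tokens raises the output cost from about $0.00012 to about $0.00712, which is why a budget of 0 matters on high-volume classification paths.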

Changes

Swift client (GeminiClient.swift):

Added ThinkingConfig struct with model-aware minimumBudget(for:) — Flash=0, Pro=128
All 4 production methods now clamp thinkingBudget to model minimum:
- sendRequest (image+schema) — Memory, Focus, Onboarding
- sendTextRequest (text only) — LiveNotes, Goals, Profile, PTT
- sendRequest (text+schema) — Prioritization, Dedup, Goals
- sendImageToolLoop (image+tools) — TaskAssistant, InsightAssistant
Removed 5 unused methods and 3 unused structs (685 lines of dead code):
- sendChatStreamRequest, sendToolChatRequest, continueWithToolResults, sendImageToolRequest, continueImageToolRequest
- GeminiChatRequest, GeminiStreamChunk, GeminiToolChatRequest
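The clamping rule described above (Flash may use 0, Pro needs at least 128) can be sketched in a few lines. This is not the Swift implementation of ThinkingConfig.minimumBudget(for:), which is not shown in this thread; it is a Rust rendering of the same rule, and the substring match on the model name is an assumption made for illustration.

```rust
/// Smallest thinkingBudget allowed for a model, per the PR description:
/// 128 for Pro models, 0 for Flash.
/// Assumption: a model counts as "Pro" if its name contains "pro".
fn minimum_budget(model: &str) -> u32 {
    if model.to_lowercase().contains("pro") {
        128
    } else {
        0
    }
}

/// Raise a requested budget to the model's minimum when it falls below it;
/// budgets at or above the minimum pass through unchanged.
fn clamp_budget(requested: u32, model: &str) -> u32 {
    requested.max(minimum_budget(model))
}
```

With this rule, a caller that always asks for 0 gets 0 on a Flash model and 128 on a Pro model, so an extraction path does not have to know which model it is talking to.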

Rust proxy (proxy.rs):

Defense-in-depth: sanitize_gemini_body() injects thinkingConfig(budget=1024) when client omits it
Creates generationConfig entirely when absent (caps legacy clients with no generation_config)
Handles both snake_case and camelCase field names
8 new tests: injection, preservation, embed skip, missing config, dual casing, null/string config
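The injection behaviour listed above can be sketched as follows. The real sanitize_gemini_body() works on serde_json values and is only partly visible in this thread, so this is a std-only approximation: the Json enum, inject_thinking_budget, and default_thinking_config are hypothetical names. Leaving a malformed (null or string) generation_config entry in place and injecting into both casings when both are present are also assumptions.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (the real proxy uses serde_json).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Num(i64),
    Str(String),
    Obj(BTreeMap<String, Json>),
}

const DEFAULT_THINKING_BUDGET: i64 = 1024;
const CONFIG_KEYS: [&str; 2] = ["generation_config", "generationConfig"];
const THINKING_KEYS: [&str; 2] = ["thinking_config", "thinkingConfig"];

/// Builds {"thinkingBudget": 1024}.
fn default_thinking_config() -> Json {
    let mut tc = BTreeMap::new();
    tc.insert("thinkingBudget".to_string(), Json::Num(DEFAULT_THINKING_BUDGET));
    Json::Obj(tc)
}

/// Ensure the request body carries a thinking budget.
/// - A client-supplied thinking config (either casing) is left untouched.
/// - A generation config object without one gets the default injected.
/// - If no generation config object exists (key absent, null, or a string),
///   a fresh `generationConfig` holding only the default is inserted.
fn inject_thinking_budget(body: &mut BTreeMap<String, Json>) {
    let mut found_object = false;
    for key in CONFIG_KEYS {
        if let Some(Json::Obj(gc)) = body.get_mut(key) {
            found_object = true;
            if !THINKING_KEYS.iter().any(|k| gc.contains_key(*k)) {
                gc.insert("thinkingConfig".to_string(), default_thinking_config());
            }
        }
    }
    if !found_object {
        let mut gc = BTreeMap::new();
        gc.insert("thinkingConfig".to_string(), default_thinking_config());
        body.insert("generationConfig".to_string(), Json::Obj(gc));
    }
}
```

The final `if !found_object` branch is the part that covers a contents-only request body, so legacy clients that never send a generation config are still capped.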

Expected Impact

Eliminates nearly all thinking token spend on the current app (Flash: budget=0; Pro: budget=128, the model minimum)
Old app versions capped at 1024 tokens via proxy (vs unlimited before)
No impact on extraction quality — these paths are classification, not reasoning

Test plan

Swift builds clean (0 errors, 14.81s)
All 202 Rust proxy tests pass (8 thinking budget tests)
Model-aware budget: Pro gets 128 minimum, Flash gets 0
Proxy creates generationConfig when absent entirely
Edge cases: dual casing, null, string generation_config all handled
Monitor Gemini billing post-deploy for thinking token reduction

🤖 Generated with Claude Code

greptile-apps · 2026-05-04T09:40:21Z

Greptile Summary

This PR adds ThinkingConfig with thinkingBudget to all Gemini request types in Swift (budget=0 for extraction, budget=4096 for chat) and adds a Rust proxy fallback that injects a default budget of 1024 when the client omits thinkingConfig. The cost-reduction rationale is sound, the Swift changes are clean, and 4 new Rust tests are included.

P1 - Proxy defense gap: The Rust injection only fires when a generation_config/generationConfig object is already present; requests that omit the key entirely bypass the cap, defeating the stated defense-in-depth contract.
P2 - ThinkingConfig key casing: Swift encodes as "thinking_budget" (snake_case) while the proxy injects "thinkingBudget" (camelCase); worth aligning for consistency.

Confidence Score: 3/5

Safe to merge for immediate cost reduction, but the proxy defense-in-depth has a logic gap that should be fixed before relying on it as a safety net.

One P1 logic bug — proxy doesn't inject thinking budget when generation_config is absent — means the safety net is incomplete. All current Swift callers are protected since they now always set generationConfig, but the gap undermines the stated contract and creates risk for future callers.

desktop/Backend-Rust/src/routes/proxy.rs — the thinking budget injection block needs a fallback for requests that omit generation_config entirely.

Important Files Changed

Filename	Overview
desktop/Backend-Rust/src/routes/proxy.rs	Adds DEFAULT_THINKING_BUDGET constant and injects thinkingConfig into generation_config when absent; injection is skipped entirely if generation_config is not present, leaving a gap in defense-in-depth.
desktop/Desktop/Sources/ProactiveAssistants/Core/GeminiClient.swift	Adds ThinkingConfig struct and wires thinkingBudget=0 to extraction calls and thinkingBudget=4096 to chat/streaming calls; responseMimeType correctly made optional; minor CodingKeys casing inconsistency.
desktop/CHANGELOG.json	Adds unreleased changelog entry for thinking budget controls — no issues.

Sequence Diagram

sequenceDiagram
    participant SW as Swift Client
    participant PX as Rust Proxy
    participant GM as Gemini API

    Note over SW: Extraction call (Focus/Task/Memory)
    SW->>PX: POST generateContent budget=0
    PX->>PX: thinking_config present, skip injection
    PX->>GM: forward with budget=0
    GM-->>SW: response (no thinking tokens)

    Note over SW: Chat / streaming call
    SW->>PX: POST generateContent budget=4096
    PX->>PX: thinking_config present, skip injection
    PX->>GM: forward with budget=4096
    GM-->>SW: response (moderate thinking)

    Note over PX: Defense-in-depth path
    SW->>PX: POST generateContent, generation_config present, NO thinking_config
    PX->>PX: thinking_config absent, inject budget=1024
    PX->>GM: forward with injected budget=1024
    GM-->>SW: response (capped thinking)

    Note over PX,GM: Gap: if generation_config absent entirely, no injection occurs

Reviews (1): Last reviewed commit: "Add changelog entry for thinking budget ..."

greptile-apps · 2026-05-04T09:40:26Z

+                // Defense-in-depth: inject default thinking budget if client omits it.
+                // Gemini 2.5 Flash defaults to unlimited thinking which is 5.8x more
+                // expensive than regular output tokens. Cap at 1024 when absent.
+                let has_thinking = gc.contains_key("thinking_config")
+                    || gc.contains_key("thinkingConfig");
+                if !has_thinking {
+                    gc.insert(
+                        "thinkingConfig".to_string(),
+                        serde_json::json!({"thinkingBudget": DEFAULT_THINKING_BUDGET}),
+                    );
+                }


Defense-in-depth bypass when generation_config is absent

The injection only fires when the request already contains a generation_config/generationConfig object. A request that omits the key entirely (valid Gemini API behavior — model uses defaults) skips this block, leaving thinking unlimited. The PR comment says "inject default budget=1024 when client omits thinkingConfig" but the actual contract is narrower: the budget is injected only when a generation_config exists without a thinkingConfig. Any future client call that forgets to set generationConfig bypasses the proxy's cost cap entirely, defeating the stated defense-in-depth goal.

The fix is to add a fallback after the loop: if neither generation_config nor generationConfig exists in the object, insert a new generation_config containing only the default thinkingConfig.

greptile-apps · 2026-05-04T09:40:27Z

+  enum CodingKeys: String, CodingKey {
+    case thinkingBudget = "thinking_budget"
+  }


thinking_budget key name inconsistency

Swift's ThinkingConfig maps thinkingBudget → "thinking_budget" (snake_case), while the Rust proxy injects "thinkingBudget" (camelCase). Both are accepted by Gemini's protobuf JSON layer today, but they're inconsistent with each other and could silently break if the API tightens JSON strictness.

Suggested change

-  enum CodingKeys: String, CodingKey {
-    case thinkingBudget = "thinking_budget"
-  }
+  enum CodingKeys: String, CodingKey {
+    case thinkingBudget = "thinkingBudget"
+  }

beastoin · 2026-05-06T10:11:31Z

PR #7159 Testing Friction Points (for @sora / workflow improvement)

1. Partial knowledge of `beast omi dev` tools

I didn't know about these commands until sora pointed them out mid-test:

beast omi dev auth-token <uid> — standalone dev token generator
beast omi dev doctor — environment health check
beast omi dev start — dev backend launcher
beast omi dev evidence — CP9 evidence capture

Impact: I manually built auth tokens from the prod app instead of using the dev token generator, which caused a cascade of auth/project mismatch issues.

Suggestion: Add beast omi dev tool inventory to the desktop-app-walkthrough skill prerequisites or CP9 section of the PR workflow skill.

2. GoogleService-Info-Dev.plist points to prod project

Both GoogleService-Info.plist and GoogleService-Info-Dev.plist in the Desktop package use PROJECT_ID=based-hardware (prod). There is no config pointing to based-hardware-dev. This means:

Dev tokens generated for based-hardware-dev are rejected by the app's Firebase Auth
Auth injection from a dev-signed-in app fails because no app is signed into a dev Firebase project
Testing requires prod-compatible tokens, which conflicts with the dev backend expecting based-hardware-dev

Impact: Required swapping FIREBASE_PROJECT_ID in backend .env from based-hardware-dev to based-hardware to match the app's Firebase config.

3. Other blockers encountered

SwiftPM lock contention: run.sh uses a broad pgrep pattern that matches shell command strings containing SWIFT_BUILD_DIR, falsely detecting lock contention. Had to kill 3 stale processes (one 21hr old).
Missing framework copies in run.sh: ContentsquareCore.framework, onnxruntime.framework, and Sentry.framework are not copied by run.sh's bundle creation logic (lines 381-455), causing runtime crashes.
Resource bundle path: Binary rename without matching resource bundle causes Fatal error: could not load resource bundle.

None of these are code flaws in PR #7159 — they're environment/tooling gaps in the desktop dev workflow.

by AI for @beastoin

…ction

Gemini 2.5 Flash thinking output costs $3.50/M tokens vs $0.60/M regular (5.8x). Without explicit thinkingConfig, the model defaults to unlimited thinking on every call, representing 65% of daily Gemini spend.

- Add ThinkingConfig struct with thinkingBudget field
- Add thinkingConfig to all three GenerationConfig structs
- Add thinkingBudget parameter to all 6 public GeminiClient methods
- Proactive extraction (Focus, Task, Insight, Memory): budget=0 (no thinking)
- User-facing chat (streaming + tool-calling): budget=4096 (moderate thinking)
- Make responseMimeType optional in GeminiRequest.GenerationConfig

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Inject default thinkingConfig (budget=1024) in sanitize_gemini_body when client omits it. Catches old app versions and any code path that bypasses the Swift-side ThinkingConfig. Respects both snake_case and camelCase existing configs. 4 new tests for injection, preservation, and embed skip.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…to all paths

5 unused methods removed (sendChatStreamRequest, sendToolChatRequest, continueWithToolResults, sendImageToolRequest, continueImageToolRequest) plus associated structs (GeminiChatRequest, GeminiStreamChunk, GeminiToolChatRequest). 685 lines of dead code eliminated. Added generationConfig with thinkingBudget=0 to GeminiImageToolRequest so task extraction and insight tool loop paths explicitly disable thinking tokens instead of relying on proxy default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Proxy default stays at 1024 to cap old clients that don't send thinkingConfig. Current Swift client explicitly sends budget=0 on all production paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ws 0)

Gemini 2.5 Pro requires minimum thinkingBudget=128 while Flash supports 0. Added ThinkingConfig.minimumBudget(for:) that returns 128 for Pro models and 0 for Flash. All methods now clamp budget to model minimum.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Old clients may send requests with no generation_config at all. Previously the proxy only injected thinkingConfig into an existing generation_config object. Now it creates generationConfig with the default thinking budget when the key is missing entirely. Added regression test for contents-only request body.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tests for: dual generation_config casings, null generation_config, string generation_config. All malformed cases get a fresh generationConfig with default thinking budget.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps Bot reviewed May 4, 2026


beastoin and others added 8 commits May 6, 2026 10:14

Add changelog entry for thinking budget cost reduction

fb1fa78

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update proxy thinking budget comment to reflect client-side budget=0

576e1e6


Add edge case tests for proxy thinking budget injection

fd46118


beastoin force-pushed the worktree-gemini-thinking-budget branch from d2c947f to fd46118 Compare May 6, 2026 10:14

beastoin wants to merge 8 commits into main from worktree-gemini-thinking-budget
