feat(stt): local Whisper transcription backend + transcribe_audio worker tool #177

Marenz wants to merge 4 commits into spacedriveapp:main
Conversation
Force-pushed from 139bb52 to d9b8764 (Compare)
When `routing.voice = "whisper-local://<spec>"`, audio attachments are transcribed locally instead of via the LLM provider HTTP path. `<spec>` is either:

- A known size name (tiny/base/small/medium/large) — fetched from ggerganov/whisper.cpp on HuggingFace via hf-hub, using the existing HF cache if already present
- An absolute path to a GGML model file

The `WhisperContext` is loaded once and cached in a `OnceLock` for the process lifetime. Audio decoding (ogg, opus, mp3, flac, wav, m4a) is handled by symphonia with linear resampling to 16 kHz mono f32. All three deps (whisper-rs, hf-hub, symphonia) are optional behind the `stt-whisper` feature flag.

Workers can now call `transcribe_audio(path)` to transcribe local audio files. The tool uses whatever is configured in `routing.voice` — local Whisper (`whisper-local://<spec>`) or any OpenAI-compatible HTTP provider. The transcription logic is extracted from `channel.rs` into `stt.rs` as `transcribe_bytes()`, shared by both the channel attachment handler and the new tool. The `stt` module is now always compiled (not gated on `stt-whisper`) since it handles all provider paths.
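A minimal sketch of how the `whisper-local://<spec>` route described above might be parsed. The `ModelSpec` enum, `parse_voice_route` function, and `KNOWN_SIZES` list are illustrative assumptions, not the PR's actual code; the real resolver additionally downloads named sizes from HuggingFace via hf-hub.

```rust
// Hypothetical parser for a `routing.voice` value of the form
// `whisper-local://<spec>`, mirroring the two cases from the PR description.
#[derive(Debug, PartialEq)]
enum ModelSpec {
    /// A known size name, fetched from ggerganov/whisper.cpp on HuggingFace.
    Named(String),
    /// An absolute path to a local GGML model file.
    Path(String),
}

fn parse_voice_route(route: &str) -> Option<ModelSpec> {
    let spec = route.strip_prefix("whisper-local://")?;
    const KNOWN_SIZES: &[&str] = &["tiny", "base", "small", "medium", "large"];
    if KNOWN_SIZES.contains(&spec) {
        Some(ModelSpec::Named(spec.to_owned()))
    } else if std::path::Path::new(spec).is_absolute() {
        Some(ModelSpec::Path(spec.to_owned()))
    } else {
        // Neither a known size name nor an absolute path.
        None
    }
}

fn main() {
    assert_eq!(
        parse_voice_route("whisper-local://base"),
        Some(ModelSpec::Named("base".to_owned()))
    );
    // Non-whisper routes fall through to the HTTP provider path.
    assert_eq!(parse_voice_route("whisper-1"), None);
}
```

Routes that don't match the `whisper-local://` prefix would be handled by the OpenAI-compatible HTTP path instead.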
Force-pushed from 37d9e0d to 89b57ea (Compare)
Walkthrough

Introduces Speech-to-Text (STT) transcription feature supporting local Whisper and OpenAI-compatible providers, along with extensive agent channel enhancements including worker/branch spawning APIs, concurrent worker limits, attachment download/transcription, history management, and user message formatting with attribution context.

Changes
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/agent/channel.rs (1)
Lines 3-9: ⚠️ Potential issue | 🔴 Critical

Remove the duplicated helper block and fix stale ProcessEvent payloads in channel.rs.

This file imports from the `channel_attachments`, `channel_dispatch`, and `channel_history` modules (lines 3–9) but also redefines those same helpers locally: `spawn_memory_persistence_branch` (1830), `extract_reply_from_tool_syntax` (2223), `format_user_message` (2277), `event_is_for_channel` (2343), `download_attachments` (2381), and `apply_history_after_turn` (2585). Additionally, ProcessEvent payloads at lines 2051 and 2149 are missing the required `worker_type` field, and line 2208 is missing the required `success` field — these will fail to compile. Delete the entire duplicated helper block (1800–2843) and keep the canonical implementations in the separate modules.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/agent/channel.rs` around lines 3 - 9, Remove the entire duplicated helper block that reimplements spawn_memory_persistence_branch, extract_reply_from_tool_syntax, format_user_message (and format_batched_user_message if present), event_is_for_channel, download_attachments, and apply_history_after_turn so the file uses the canonical implementations imported at the top; after deleting that block, verify there are no remaining local definitions or duplicate symbols and update any call sites to use the imported functions. Also scan this file for any ProcessEvent payload constructions and ensure they include the required fields (add worker_type and add success where missing) so payload shapes match the canonical ProcessEvent type, and run a compile to catch any unresolved references from the deletion.
🧹 Nitpick comments (2)
src/stt.rs (2)
Lines 310-313: Use `Path::is_absolute()` for absolute-path detection.

`spec.starts_with('/')` is Unix-specific. `std::path::Path::is_absolute()` is the correct path check across platforms.

♻️ Suggested fix

```diff
-    if spec.starts_with('/') {
-        if std::path::Path::new(spec).exists() {
+    if std::path::Path::new(spec).is_absolute() {
+        if std::path::Path::new(spec).exists() {
             return Ok(spec.to_owned());
         }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/stt.rs` around lines 310 - 313, Replace the Unix-specific check spec.starts_with('/') with the cross-platform Path::is_absolute() idiom: use std::path::Path::new(spec).is_absolute() in the same conditional that also calls Path::new(spec).exists(), updating the branch in the function/block that handles the spec variable so it returns Ok(spec.to_owned()) when the path is absolute and exists (look for the spec.starts_with('/') check near the Path::new(spec).exists() call in src/stt.rs).
Lines 251-523: Expand abbreviated variable names in local decoding/transcription paths.

Names like `ctx`, `n`, `dec`, and `mss` reduce readability in already dense logic. Prefer explicit names in this module.

As per coding guidelines, "Do not abbreviate variable names; use `queue` not `q`, `message` not `msg`, `channel` not `ch`; common abbreviations like `config` are acceptable".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/stt.rs` around lines 251 - 523, The review asks to replace abbreviated local variable names with explicit, descriptive names to improve readability: in transcribe_blocking rename ctx -> context and n -> segment_count (and any other short names in that function); in get_or_load_context keep variable names explicit (e.g., ctx -> context) when caching/loading; in decode_to_f32 rename mss -> media_source_stream, decoded -> decoded_packet_or_buffer (or decoded_buffer), and track-related locals to descriptive names (e.g., track_id, sample_rate, channel_count are fine); in decode_ogg_opus rename dec -> opus_decoder and pcm -> pcm_buffer and samples -> mono_samples; and in resample keep names clear (e.g., ratio, output_len, output). Update all usages consistently within each function (transcribe_blocking, get_or_load_context, resolve_model_path, decode_to_f32, is_ogg_opus, decode_ogg_opus, resample) and run cargo check to ensure no missing renames.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)

- `Cargo.lock` is excluded by `!**/*.lock`
- `Cargo.toml` is excluded by `!**/*.toml`

📒 Files selected for processing (9)

- README.md
- prompts/en/tools/transcribe_audio_description.md.j2
- src/agent/channel.rs
- src/agent/worker.rs
- src/lib.rs
- src/prompts/text.rs
- src/stt.rs
- src/tools.rs
- src/tools/transcribe_audio.rs
```rust
#[cfg(feature = "stt-whisper")]
pub mod stt;
```
🧩 Analysis chain

🏁 Scripts executed:

```shell
# Check for the stt module declaration and feature gates in src/lib.rs
rg -n 'pub mod stt|cfg\(feature = "stt-whisper"\)' src/lib.rs

# Find all references to crate::stt throughout the codebase
rg -n 'crate::stt' --type rust -A 2

# Also check for any use statements that might reference stt
rg -n 'use.*stt' --type rust

# Check the context around the call sites to see if they're gated
sed -n '85,100p' src/tools/transcribe_audio.rs

# Check if the files containing call sites have any feature gating
head -20 src/tools/transcribe_audio.rs
head -30 src/agent/channel.rs

# Verify if there are conditional compilations around those call sites
sed -n '2490,2515p' src/agent/channel.rs

# Check if the files with call sites are themselves feature-gated
head -1 src/tools/transcribe_audio.rs
grep -n '#\[cfg' src/tools/transcribe_audio.rs | head -5

# Check if transcribe_audio.rs is conditionally compiled
grep -B5 'pub struct TranscribeAudioTool' src/tools/transcribe_audio.rs

# Check the tools.rs file to see how transcribe_audio is included
cat src/tools.rs | head -50

# Check if the containing functions/methods are feature-gated
sed -n '2480,2520p' src/agent/channel.rs | grep -B20 'crate::stt'

# Check tools.rs to see if transcribe_audio is included unconditionally
grep -n 'transcribe_audio' src/tools.rs

# Check if there's a Cargo.toml to understand the stt-whisper feature
cat Cargo.toml | head -100

# Check if transcribe_audio tool usage is conditionally compiled elsewhere
grep -rn 'TranscribeAudioTool' --type rust src/

# Check if the stt module itself is structured to handle the feature gate gracefully
head -30 src/stt.rs 2>/dev/null || echo "stt.rs not found, checking for stt/"

# Check if there's a stt/ directory (module)
ls -la src/ | grep -i stt

# Check the full transcribe_bytes function signature in stt.rs
sed -n '30,80p' src/stt.rs

# Check how transcribe_bytes handles the case when stt-whisper is not enabled
grep -n 'pub.*fn transcribe_bytes' src/stt.rs -A 20

# Verify that the HTTP provider path is unconditionally available
grep -n 'http\|Http\|provider' src/stt.rs | head -20

# Verify that stt.rs itself is NOT feature-gated at the module level
head -5 src/stt.rs

# Check if there's a mod declaration for stt in src/lib.rs and if it matches the cfg gate
sed -n '20,35p' src/lib.rs

# Confirm that transcribe_audio.rs has no feature gates around the call site
sed -n '1,15p' src/tools/transcribe_audio.rs && sed -n '80,100p' src/tools/transcribe_audio.rs

# Double check whether tools.rs or tools/transcribe_audio.rs is itself feature-gated
grep -n '#\[cfg' src/tools.rs
grep -n '#\[cfg' src/tools/transcribe_audio.rs

# Check whether agent/channel.rs is feature-gated
head -50 src/agent/channel.rs | grep -n '#\[cfg'

# Final verification: confirm build would fail without stt-whisper
# by checking if crate::stt is definitely called in non-gated contexts
echo "=== Checking transcribe_audio.rs context ===" && \
sed -n '1,20p' src/tools/transcribe_audio.rs | grep -c 'cfg' && \
echo "=== Checking if TranscribeAudioTool is exposed in tools.rs ===" && \
sed -n '50,110p' src/tools.rs | grep -i transcribe
```
Remove the feature gate from the stt module declaration at the crate root.

The stt module is gated with `#[cfg(feature = "stt-whisper")]`, but call sites in src/tools/transcribe_audio.rs and src/agent/channel.rs reference `crate::stt` unconditionally. Since the stt module itself supports builds without the feature (dispatching to HTTP providers), the module should be unconditionally available.

🔧 Proposed fix

```diff
-#[cfg(feature = "stt-whisper")]
 pub mod stt;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib.rs` around lines 27 - 28, The crate root currently conditionally
declares the stt module with #[cfg(feature = "stt-whisper")], but other files
(e.g., references to crate::stt in transcribe_audio.rs and channel.rs) import it
unconditionally; remove the feature gate so the module is always declared
(change #[cfg(feature = "stt-whisper")] pub mod stt; to pub mod stt;), and if
needed keep internal cfgs inside the stt module implementation to select
whisper-specific code paths while preserving the unconditional module export.
```rust
let ctx = WhisperContext::new_with_params(&model_path, params)
    .map_err(|e| WhisperError::Load(e.to_string()))?;

let _ = CONTEXT.set((model_spec.to_owned(), ctx));
```
🧩 Analysis chain

🏁 Scripts executed:

```shell
# Find and examine the context around line 300 in src/stt.rs
head -n 310 src/stt.rs | tail -n 30

# Search for CONTEXT definition in the file to understand its type
rg "CONTEXT" src/stt.rs -B 2 -A 2

# Check for other uses of OnceLock or similar patterns in the codebase
rg "OnceLock" src/ -A 2 -B 2
```
Explicitly handle OnceLock::set failure instead of silently discarding it.
Line 300 silently discards the result of CONTEXT.set() via let _ =. Per coding guidelines, errors should be handled, logged, or propagated—not silently discarded. While the code handles the failure implicitly by retrieving the already-set value, the race condition should be acknowledged explicitly at debug level.
🔧 Suggested fix

```diff
-    let _ = CONTEXT.set((model_spec.to_owned(), ctx));
+    if CONTEXT.set((model_spec.to_owned(), ctx)).is_err() {
+        tracing::debug!("Whisper context was initialized concurrently; using existing context");
+    }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/stt.rs` at line 300, The call to CONTEXT.set((model_spec.to_owned(),
ctx)) currently discards the Result with let _ =; instead explicitly handle the
failure from OnceLock::set by checking its return (or catching the Err) and
logging a debug-level message that a race occurred and the context was already
initialized (include model_spec info), or propagate the error if appropriate;
reference CONTEXT, OnceLock::set, model_spec and ctx to locate the site and
ensure the branch still retrieves the existing value when set fails.
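A self-contained sketch of the `OnceLock` race the comment describes: only the first `set` wins, so the losing caller should acknowledge the race and fall back to the already-stored value. The cached value is simplified to a `String` here instead of a `WhisperContext`, and the logging is `eprintln!` rather than `tracing`.

```rust
use std::sync::OnceLock;

// (model_spec, context) pair cached for the process lifetime,
// mirroring the CONTEXT static from the review.
static CONTEXT: OnceLock<(String, String)> = OnceLock::new();

fn cache_context(model_spec: &str, context: String) -> &'static (String, String) {
    if CONTEXT.set((model_spec.to_owned(), context)).is_err() {
        // A concurrent caller initialized the cache first; note the race
        // explicitly instead of discarding the Result with `let _ =`.
        eprintln!("whisper context for {model_spec} already initialized; reusing it");
    }
    // Both branches end with a value present, so `get` always succeeds.
    CONTEXT.get().expect("CONTEXT is set on both branches")
}

fn main() {
    let first = cache_context("base", "ctx-a".to_owned());
    let second = cache_context("base", "ctx-b".to_owned()); // loses the race
    assert_eq!(first.1, "ctx-a");
    assert_eq!(second.1, "ctx-a"); // the first value is retained
}
```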
```rust
while let Ok(Some(packet)) = reader.read_packet() {
    if header_packets < 2 {
        if header_packets == 0 {
            // Parse OpusHead to get channel count and pre-skip.
            if packet.data.len() >= 11 && &packet.data[0..8] == b"OpusHead" {
                channels = packet.data[9] as usize;
                // Output sample rate is always 48000 for libopus.
                sample_rate = 48000;
            }
            decoder = Some(
                opus::Decoder::new(
                    sample_rate,
                    if channels == 2 {
                        opus::Channels::Stereo
                    } else {
                        opus::Channels::Mono
                    },
                )
                .map_err(|e| WhisperError::Decode(e.to_string()))?,
            );
        }
        header_packets += 1;
        continue;
    }

    let dec = decoder.as_mut().unwrap();
    // Max Opus frame: 120ms at 48kHz = 5760 samples per channel.
    let max_samples = 5760 * channels;
    let mut pcm = vec![0f32; max_samples];
    let n = dec
        .decode_float(&packet.data, &mut pcm, false)
        .map_err(|e| WhisperError::Decode(e.to_string()))?;

    // Mix down to mono.
    if channels == 1 {
        samples.extend_from_slice(&pcm[..n]);
    } else {
        for frame in 0..n {
            let mut sum = 0f32;
            for ch in 0..channels {
                sum += pcm[frame * channels + ch];
            }
            samples.push(sum / channels as f32);
        }
    }
}
```
🧩 Analysis chain

🏁 Scripts executed:

```shell
# First, let's look at the actual code around lines 454-499 in src/stt.rs
wc -l src/stt.rs

# Read the specific section mentioned in the review
sed -n '440,510p' src/stt.rs

# Also check the OnceLock issue mentioned at line 300
sed -n '295,305p' src/stt.rs

# Check for abbreviated variable names throughout the file
rg '\b(ctx|dec|n|mss)\b' src/stt.rs -n
```
Propagate Ogg packet read failures and remove panic path in decoder access.

Line 454 silently drops `read_packet()` errors by exiting the loop, and line 479 can panic on malformed headers via `unwrap()`. Both violate error-handling requirements and should return `WhisperError::Decode` instead. Additionally, variable names `dec` and `n` use disallowed abbreviations.

🔧 Suggested fix

```diff
-    while let Ok(Some(packet)) = reader.read_packet() {
+    loop {
+        let packet = match reader.read_packet() {
+            Ok(Some(packet)) => packet,
+            Ok(None) => break,
+            Err(error) => return Err(WhisperError::Decode(error.to_string())),
+        };
         if header_packets < 2 {
             if header_packets == 0 {
                 // Parse OpusHead to get channel count and pre-skip.
                 if packet.data.len() >= 11 && &packet.data[0..8] == b"OpusHead" {
                     channels = packet.data[9] as usize;
                     // Output sample rate is always 48000 for libopus.
                     sample_rate = 48000;
+                } else {
+                    return Err(WhisperError::Decode("invalid Opus header".into()));
                 }
                 decoder = Some(
                     opus::Decoder::new(
                         sample_rate,
                         if channels == 2 {
                             opus::Channels::Stereo
                         } else {
                             opus::Channels::Mono
                         },
                     )
                     .map_err(|e| WhisperError::Decode(e.to_string()))?,
                 );
             }
             header_packets += 1;
             continue;
         }
-        let dec = decoder.as_mut().unwrap();
+        let decoder = decoder
+            .as_mut()
+            .ok_or_else(|| WhisperError::Decode("Opus decoder not initialized".into()))?;
         // Max Opus frame: 120ms at 48kHz = 5760 samples per channel.
         let max_samples = 5760 * channels;
         let mut pcm = vec![0f32; max_samples];
-        let n = dec
+        let samples_decoded = decoder
             .decode_float(&packet.data, &mut pcm, false)
             .map_err(|e| WhisperError::Decode(e.to_string()))?;
```

Per coding guidelines: "Don't silently discard errors; use `let _ =` only on channel sends where the receiver may be dropped; handle, log, or propagate all other errors" and "Do not abbreviate variable names; use `queue` not `q`, `message` not `msg`, `channel` not `ch`."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/stt.rs` around lines 454 - 499, The loop currently swallows
reader.read_packet() errors and uses decoder.as_mut().unwrap() and abbreviated
names; change the packet loop to fully match reader.read_packet() and on Err(e)
return Err(WhisperError::Decode(e.to_string())), on Ok(None) break; replace
decoder.as_mut().unwrap() with a checked access that returns
Err(WhisperError::Decode("missing decoder".into())) if decoder is None; rename
local variables: dec -> decoder_mut or decoder_ref and n -> num_samples, and
rename inner ch -> channel to avoid abbreviations; ensure the decode_float error
is propagated as WhisperError::Decode and update subsequent uses to the new
variable names when extending samples or mixing down to mono.
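The mono mixdown in the loop above (average each interleaved frame across its channels) can be isolated as a small runnable sketch. The function name and test values are illustrative, not the PR's code.

```rust
// Average interleaved f32 PCM frames down to mono, as done after Opus
// decoding before feeding samples to Whisper.
fn mix_to_mono(pcm: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    pcm.chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

fn main() {
    // Stereo frames (L, R) are averaged per frame.
    let interleaved = [0.25f32, 0.75, -1.0, 1.0];
    assert_eq!(mix_to_mono(&interleaved, 2), vec![0.5, 0.0]);
    // Mono input passes through unchanged.
    assert_eq!(mix_to_mono(&[0.5f32, 0.25], 1), vec![0.5, 0.25]);
}
```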
```rust
if !voice_model.is_empty() {
    server = server.tool(TranscribeAudioTool::new(
        voice_model,
        llm_manager,
        http,
    ));
}
```
Trim voice_model before deciding tool registration.
Whitespace-only values currently register transcribe_audio and then fail at call time.
🔧 Proposed fix

```diff
-    if !voice_model.is_empty() {
+    let trimmed_voice_model = voice_model.trim().to_string();
+    if !trimmed_voice_model.is_empty() {
         server = server.tool(TranscribeAudioTool::new(
-            voice_model,
+            trimmed_voice_model,
             llm_manager,
             http,
         ));
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/tools.rs` around lines 434 - 440, Whitespace-only voice_model values
cause TranscribeAudioTool to be registered then fail; trim the voice_model
before the emptiness check and use the trimmed value when constructing
TranscribeAudioTool. Specifically, replace the current check of
voice_model.is_empty() with a check of a trimmed string (e.g., let trimmed =
voice_model.trim(); if trimmed.is_empty() { ... }) and pass that trimmed/owned
value into TranscribeAudioTool::new (the
server.tool(TranscribeAudioTool::new(...)) call) so only non-empty,
non-whitespace model names register.
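The guard the comment proposes can be shown in isolation: a whitespace-only model string passes an `is_empty()` check but fails a trimmed one. The helper name is hypothetical.

```rust
// Register the transcribe_audio tool only for non-empty, non-whitespace
// voice model specs.
fn should_register_transcribe_tool(voice_model: &str) -> bool {
    !voice_model.trim().is_empty()
}

fn main() {
    assert!(should_register_transcribe_tool("whisper-local://base"));
    // Whitespace-only values previously registered the tool, then failed at
    // call time; the trimmed check rejects them up front.
    assert!(!should_register_transcribe_tool("   "));
    assert!(!should_register_transcribe_tool(""));
}
```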
```rust
impl TranscribeAudioTool {
    /// Create a new transcribe audio tool.
    pub fn new(
        voice_model: impl Into<String>,
        llm_manager: Arc<LlmManager>,
        http: reqwest::Client,
    ) -> Self {
        Self {
            voice_model: voice_model.into(),
            llm_manager,
            http,
        }
```
Enforce workspace path boundaries before reading files.
transcribe_audio currently reads any host path supplied by the model. This bypasses workspace-scoped file safety.
🔒 Proposed fix

```diff
 use std::sync::Arc;
+use std::path::PathBuf;

 pub struct TranscribeAudioTool {
     /// The configured voice model spec (full `routing.voice` value).
     voice_model: String,
+    workspace: PathBuf,
     llm_manager: Arc<LlmManager>,
     http: reqwest::Client,
 }

 impl TranscribeAudioTool {
     pub fn new(
         voice_model: impl Into<String>,
+        workspace: PathBuf,
         llm_manager: Arc<LlmManager>,
         http: reqwest::Client,
     ) -> Self {
         Self {
             voice_model: voice_model.into(),
+            workspace,
             llm_manager,
             http,
         }
     }
 }
@@
     async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
-        let audio = tokio::fs::read(&args.path)
+        let requested = PathBuf::from(&args.path);
+        let candidate = if requested.is_absolute() {
+            requested
+        } else {
+            self.workspace.join(requested)
+        };
+
+        let workspace = std::fs::canonicalize(&self.workspace)
+            .map_err(|e| TranscribeAudioError(format!("invalid workspace: {e}")))?;
+        let canonical = std::fs::canonicalize(&candidate)
+            .map_err(|e| TranscribeAudioError(format!("failed to resolve {}: {e}", args.path)))?;
+        if !canonical.starts_with(&workspace) {
+            return Err(TranscribeAudioError(format!(
+                "path escapes workspace: {}",
+                args.path
+            )));
+        }
+
+        let audio = tokio::fs::read(&canonical)
             .await
             .map_err(|e| TranscribeAudioError(format!("failed to read {}: {}", args.path, e)))?;
```

Also applies to: 83-87
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/tools/transcribe_audio.rs` around lines 24 - 35, TranscribeAudioTool
currently allows the model to supply arbitrary host paths; update the
file-reading logic in the transcribe_audio method (and the similar code at the
other site referenced around lines 83-87) to enforce workspace boundaries by
canonicalizing the requested path and the workspace root (resolve symlinks),
then ensure the canonicalized requested path is inside the canonicalized
workspace root (starts_with); if not, return an error/deny access and do not
read the file. Use the existing TranscribeAudioTool method and struct members
(or add a workspace_root field if needed) to perform this check before any
filesystem access.
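The containment check at the heart of this fix can be demonstrated standalone: canonicalize both the workspace root and the requested path (resolving symlinks), then require `starts_with`. The demo paths under the OS temp dir are purely illustrative; the real tool would use its configured workspace root.

```rust
use std::fs;
use std::path::Path;

// Returns Ok(true) only when `requested` resolves to a location inside the
// canonicalized workspace root; symlinks are resolved by canonicalize().
fn is_inside_workspace(workspace: &Path, requested: &Path) -> std::io::Result<bool> {
    let workspace = fs::canonicalize(workspace)?;
    let requested = fs::canonicalize(requested)?;
    Ok(requested.starts_with(&workspace))
}

fn main() -> std::io::Result<()> {
    let root = std::env::temp_dir().join("transcribe_demo_workspace");
    fs::create_dir_all(root.join("audio"))?;
    fs::write(root.join("audio/voice.ogg"), b"fake audio bytes")?;

    // A file inside the workspace is allowed.
    assert!(is_inside_workspace(&root, &root.join("audio/voice.ogg"))?);
    // The temp dir itself lies outside the workspace root and is rejected.
    assert!(!is_inside_workspace(&root, &std::env::temp_dir())?);
    Ok(())
}
```

Note that both sides must be canonicalized before comparing: a raw `starts_with` on un-resolved paths can be defeated by `..` segments or symlinks.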
```rust
let audio = tokio::fs::read(&args.path)
    .await
    .map_err(|e| TranscribeAudioError(format!("failed to read {}: {}", args.path, e)))?;
```
Add a file-size cap before reading audio into memory.
Reading arbitrary-size files into memory can cause avoidable memory pressure.
🔧 Proposed fix

```diff
+const MAX_AUDIO_BYTES: u64 = 25 * 1024 * 1024; // 25MB
@@
-        let audio = tokio::fs::read(&canonical)
+        let metadata = tokio::fs::metadata(&canonical)
+            .await
+            .map_err(|e| TranscribeAudioError(format!("failed to stat {}: {}", args.path, e)))?;
+        if metadata.len() > MAX_AUDIO_BYTES {
+            return Err(TranscribeAudioError(format!(
+                "file too large: {} bytes (max {})",
+                metadata.len(),
+                MAX_AUDIO_BYTES
+            )));
+        }
+
+        let audio = tokio::fs::read(&canonical)
             .await
             .map_err(|e| TranscribeAudioError(format!("failed to read {}: {}", args.path, e)))?;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/tools/transcribe_audio.rs` around lines 84 - 87, Before reading the audio
file into memory in transcribe_audio.rs, check the file size via
tokio::fs::metadata(&args.path) (or equivalent) and enforce a configurable size
cap (e.g., MAX_AUDIO_BYTES constant); if metadata.len() exceeds the cap return a
TranscribeAudioError indicating the file is too large instead of calling
tokio::fs::read. Update the code paths around args.path and the
TranscribeAudioError usage so you only read into memory after the size check and
ensure the error message includes the actual file size and the allowed limit.
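A synchronous sketch of the stat-before-read pattern, using `std::fs` so the demo is self-contained (the tool itself would use `tokio::fs`). `MAX_AUDIO_BYTES` and the helper name are assumptions.

```rust
use std::fs;
use std::path::Path;

const MAX_AUDIO_BYTES: u64 = 25 * 1024 * 1024; // assumed 25MB cap

// Stat the file first; only read it into memory if it fits under the cap.
fn read_audio_capped(path: &Path, max_bytes: u64) -> Result<Vec<u8>, String> {
    let metadata = fs::metadata(path).map_err(|e| format!("failed to stat: {e}"))?;
    if metadata.len() > max_bytes {
        return Err(format!(
            "file too large: {} bytes (max {})",
            metadata.len(),
            max_bytes
        ));
    }
    fs::read(path).map_err(|e| format!("failed to read: {e}"))
}

fn main() {
    let path = std::env::temp_dir().join("size_cap_demo.wav");
    fs::write(&path, vec![0u8; 16]).unwrap();
    assert_eq!(read_audio_capped(&path, MAX_AUDIO_BYTES).unwrap().len(), 16);
    // A cap smaller than the file triggers the error path with no read.
    assert!(read_audio_capped(&path, 8).is_err());
}
```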
Re-submission of #105, which was merged into the `voice` branch before that branch landed in `main` — the commits never reached `main`.

Local Whisper backend

Set `routing.voice = "whisper-local://<spec>"` in config. When the channel processes an audio attachment, it bypasses the HTTP provider path and runs inference locally via whisper-rs. `<spec>` is a known size name (`tiny`, `base`, `small`, `medium`, `large`, `large-v3`) — downloaded from `ggerganov/whisper.cpp` on HuggingFace — or an absolute path to a GGML model file. The `WhisperContext` is cached for the process lifetime; switching models requires a restart.

Audio decoding: Ogg/Opus (Telegram voice messages) is handled via the `ogg` + `opus` crates. All other formats fall through to symphonia. Both paths resample to 16 kHz mono f32 before Whisper.

GPU acceleration via the `vulkan` feature (CUDA excluded due to GCC 14+/nvcc incompatibility on modern distros).

Everything is behind the `stt-whisper` cargo feature. Builds without it are unaffected.

STT unification + transcribe_audio worker tool

The transcription logic is extracted from `channel.rs` into a shared `stt::transcribe_bytes()` function that handles both the local Whisper path and any OpenAI-compatible HTTP provider. Workers now get a `transcribe_audio` tool that reads a local audio file and transcribes it using whatever is configured in `routing.voice` — no need to shell out to the whisper CLI.