Kokoro::stream hangs on non-EOS-terminated buffer; streamStop(false) never returns #1153

@msluszniak

Summary

Kokoro::stream hangs the streaming worker when the input buffer holds content that doesn't end in an end-of-sentence character. The buffer never drains, streamStop(false) waits forever, and only streamStop(true) recovers.

Repro

Build the speech app from branch @ms/tts-stress-tests (preset chips reproduce this in one tap; see no-term:a / no-term:long). Or directly:

model.streamInsert('a');
const p = model.stream({ text: '', ... });
await new Promise((r) => setTimeout(r, 6000));
model.streamStop(false);  // never returns
await p;

The same hang occurs with 'hello world', 2000× U+200D, or any content that does not end in one of the EOS characters . ? ! ;

Root cause

Kokoro.cpp:171-189:

size_t chunkSize = (eosIt != inputTextBuffer_.rend())
                       ? std::distance(eosIt, inputTextBuffer_.rend())
                       : 0;

if (chunkSize > 0 ||
    streamSkippedIterations >= params::kStreamMaxSkippedIterations) {
  input = inputTextBuffer_.substr(0, chunkSize);   // chunkSize still 0
  inputTextBuffer_.erase(0, chunkSize);            // erases nothing
  streamSkippedIterations = 0;                     // reset, loop forever
}

When no EOS exists in the buffer, chunkSize = 0. The force-flush threshold (streamSkippedIterations >= kStreamMaxSkippedIterations) fires correctly, but the extraction still uses chunkSize and pulls zero characters. Counter resets, loop continues, buffer never drains.
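
The failure mode can be reproduced in isolation. The following is a toy TypeScript model of the partition step, not the Kokoro.cpp code: EOS_CHARS, MAX_SKIPPED, partitionStep and remainingAfter are illustrative names, the EOS set is taken from the characters listed in the repro, and the counter increment in the non-flush branch is an assumption, since that part of the loop is not quoted above.

```typescript
// Toy model of the partition step; names and constants are illustrative.
const EOS_CHARS = '.?!;'; // assumed from the characters listed in this issue
const MAX_SKIPPED = 3;    // stand-in for params::kStreamMaxSkippedIterations

interface StreamState {
  buffer: string;
  skipped: number;
}

// Runs one loop iteration and returns the text handed to synthesis.
function partitionStep(state: StreamState): string {
  // Index of the last EOS character, or -1 if the buffer has none.
  let lastEos = -1;
  for (let i = state.buffer.length - 1; i >= 0; i--) {
    if (EOS_CHARS.includes(state.buffer.charAt(i))) {
      lastEos = i;
      break;
    }
  }
  const chunkSize = lastEos >= 0 ? lastEos + 1 : 0;

  if (chunkSize > 0 || state.skipped >= MAX_SKIPPED) {
    const input = state.buffer.slice(0, chunkSize); // empty when chunkSize is 0
    state.buffer = state.buffer.slice(chunkSize);   // removes nothing when chunkSize is 0
    state.skipped = 0;                              // threshold fired, nothing drained
    return input;
  }
  state.skipped += 1; // assumed: the real loop counts idle iterations here
  return '';
}

// Runs a bounded number of iterations and reports what is left in the buffer.
function remainingAfter(text: string, iterations: number): string {
  const state: StreamState = { buffer: text, skipped: 0 };
  for (let i = 0; i < iterations; i++) {
    partitionStep(state);
  }
  return state.buffer;
}
```

In this model an EOS-terminated buffer drains in one step, while an unterminated one is unchanged no matter how many iterations run, which matches the hang described above.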

Why the obvious fix is unsafe

Switching the force-flush branch to use the searchable window length:

} else if (streamSkippedIterations >= params::kStreamMaxSkippedIterations) {
  input = inputTextBuffer_.substr(0, searchLimit);
  inputTextBuffer_.erase(0, searchLimit);
  streamSkippedIterations = 0;
}

…breaks the LLM streaming mode. kStreamMaxSkippedIterations is effectively a wall-clock timeout expressed as a count of kStreamPause intervals, so at slow LLM token rates it trips before the LLM has produced a full sentence, forcing mid-word flushes and degrading speech quality. Tuning the threshold to satisfy both regimes is brittle (context noted in #1134 comment).

Why this isn't just a theoretical concern

The current JS hook (useTextToSpeech.ts:108-111) hides this by auto-appending . when calling stream({ text }). Two cases bypass that rescue:

  • Callers using streamInsert directly (incremental LLM tokens, dictation, partial captions).
  • A streaming caller whose final chunk doesn't end with EOS (LLM truncated, network error, user pressed stop mid-sentence). The trailing un-terminated suffix sits in the buffer permanently.

In both cases streamStop(false) blocks forever with no diagnostic.
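
For callers on the streamInsert path, the hook's rescue can be approximated by hand. This is a minimal sketch, assuming the EOS set is . ? ! ; as listed in the repro; withTerminator is a hypothetical helper, not part of the library, and whether a trailing . is enough for the native side to drain the buffer in every case is not verified here.

```typescript
// Hypothetical caller-side helper; not part of the library API.
const EOS = '.?!;'; // assumed end-of-sentence set, taken from this issue

// Returns text that ends in an EOS character, appending '.' when needed.
function withTerminator(text: string): string {
  const trimmed = text.replace(/\s+$/, ''); // ignore trailing whitespace
  if (trimmed.length === 0) {
    return text; // nothing to speak, leave as-is
  }
  const last = trimmed.charAt(trimmed.length - 1);
  return EOS.includes(last) ? text : trimmed + '.';
}
```

A caller would apply this to the final chunk it inserts before calling streamStop(false).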

Options that don't break LLM streaming

Each is a smaller change than re-tuning the skip counter:

  1. Caller-declared flush intent. streamInsert(text, { canFlush?: boolean }) — LLM mode passes false, normal apps pass true. Caller knows its own pacing.
  2. Explicit streamFlush() API. Caller signals "I'm done feeding for now, partition what's left." LLM mode never calls it; normal apps call it before streamStop(false). No threshold tuning.
  3. Wall-clock idle timeout. Track time since last streamInsert. If the buffer has content and N seconds passed without new inserts, flush. LLM streams keep inserting fast enough to never trip it.
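
Options 2 and 3 can be sketched together on the JS side. This is an illustration only: FlushableBuffer, insert, flush and poll are hypothetical names, the real buffer lives in Kokoro.cpp, and the clock is passed in explicitly so the idle logic is easy to test.

```typescript
// Hypothetical buffer wrapper; none of these names exist in the library.
class FlushableBuffer {
  private text = '';
  private lastInsertMs = 0;
  private idleTimeoutMs: number;

  constructor(idleTimeoutMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  // Appends text and records when the caller last fed the buffer.
  insert(chunk: string, nowMs: number): void {
    this.text += chunk;
    this.lastInsertMs = nowMs;
  }

  // Option 2: the caller explicitly asks for whatever is left.
  flush(): string {
    const out = this.text;
    this.text = '';
    return out;
  }

  // Option 3: flush only if content is pending and no insert arrived
  // for at least idleTimeoutMs.
  poll(nowMs: number): string {
    const idle = nowMs - this.lastInsertMs >= this.idleTimeoutMs;
    return this.text.length > 0 && idle ? this.flush() : '';
  }
}
```

Neither path depends on the skip counter, so a slow LLM stream that keeps inserting is never cut mid-word by poll, and flush only runs when the caller asks for it.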

Mitigations if a proper fix is out of scope

  • Document the hazard in streamInsert / streamStop JSDoc — currently nothing in the API surface hints that streamStop(false) can hang indefinitely.
  • JS-side safety net in the hook's stream() wrapper: time streamStop(false), fall back to streamStop(true) with console.warn after some threshold. Zero cost to LLM streaming; prevents downstream apps from soft-locking.
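
The JS-side safety net could look roughly like this. It is a sketch under one important assumption: streamStop returns a promise that settles once the worker has stopped. If the call blocks the JS thread synchronously, a timer cannot preempt it and this approach does not apply. StoppableStream, stopWithFallback and the parameter name instant are placeholders, not the hook's actual code.

```typescript
// Minimal shape this sketch relies on; the promise return type is an assumption.
interface StoppableStream {
  streamStop(instant: boolean): Promise<void>;
}

// Tries a graceful stop first and forces a stop if it takes longer than timeoutMs.
async function stopWithFallback(
  model: StoppableStream,
  timeoutMs: number,
): Promise<'graceful' | 'forced'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });
  const graceful = model.streamStop(false).then(() => 'done' as const);

  try {
    const winner = await Promise.race([graceful, timedOut]);
    if (winner === 'done') {
      return 'graceful';
    }
  } finally {
    clearTimeout(timer); // avoid a stray timer keeping the process alive
  }

  console.warn(`streamStop(false) did not return within ${timeoutMs} ms; forcing stop`);
  await model.streamStop(true);
  return 'forced';
}
```

The timeout only matters on the stop path, so it adds nothing to the cost of normal LLM streaming.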
