Summary
Kokoro::stream hangs the streaming worker when the input buffer holds content that doesn't end in an end-of-sentence character. The buffer never drains, streamStop(false) waits forever, and only streamStop(true) recovers.
Repro
Build the speech app from branch @ms/tts-stress-tests (preset chips reproduce this in one tap; see no-term:a / no-term:long). Or directly:
model.streamInsert('a');
const p = model.stream({ text: '', ... });
await new Promise((r) => setTimeout(r, 6000));
model.streamStop(false); // never returns
await p;
The same hang occurs with 'hello world', 2000× U+200D, or any content whose last character is not one of . ? ! ;
Root cause
Kokoro.cpp:171-189:
size_t chunkSize = (eosIt != inputTextBuffer_.rend())
                       ? std::distance(eosIt, inputTextBuffer_.rend())
                       : 0;
if (chunkSize > 0 ||
    streamSkippedIterations >= params::kStreamMaxSkippedIterations) {
  input = inputTextBuffer_.substr(0, chunkSize);  // chunkSize still 0
  inputTextBuffer_.erase(0, chunkSize);           // erases nothing
  streamSkippedIterations = 0;                    // reset, loop forever
}
When no EOS exists in the buffer, chunkSize = 0. The force-flush threshold (streamSkippedIterations >= kStreamMaxSkippedIterations) fires correctly, but the extraction still uses chunkSize and pulls zero characters. Counter resets, loop continues, buffer never drains.
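To make the failure easy to see outside the native build, here is a simplified TypeScript model of the chunking step. It is a sketch, not the Kokoro.cpp code: the threshold value, the names, and the assumption that the counter increments on every iteration that extracts nothing are all illustrative, and the search window (searchLimit) is ignored.

```typescript
// Simplified model of the chunking step described above.
// All names and the threshold value are assumptions for illustration only.
const EOS_CHARS = ".?!;";
const MAX_SKIPPED_ITERATIONS = 3; // stand-in for params::kStreamMaxSkippedIterations

interface StreamState {
  buffer: string;
  skippedIterations: number;
}

// Length of the prefix ending at the last end-of-sentence character, or 0 if none.
function chunkSizeFor(buffer: string): number {
  for (let i = buffer.length - 1; i >= 0; i--) {
    if (EOS_CHARS.includes(buffer.charAt(i))) return i + 1;
  }
  return 0;
}

// One worker iteration; returns the text that would be handed to synthesis.
function step(state: StreamState): string {
  const chunkSize = chunkSizeFor(state.buffer);
  if (chunkSize > 0 || state.skippedIterations >= MAX_SKIPPED_ITERATIONS) {
    const input = state.buffer.slice(0, chunkSize); // empty when chunkSize is 0
    state.buffer = state.buffer.slice(chunkSize); // removes nothing when chunkSize is 0
    state.skippedIterations = 0; // counter resets while the buffer is unchanged
    return input;
  }
  state.skippedIterations++;
  return "";
}

const state: StreamState = { buffer: "hello world", skippedIterations: 0 };
let emitted = "";
for (let i = 0; i < 100; i++) emitted += step(state);

// After 100 iterations the buffer is still full and nothing was emitted.
console.log(JSON.stringify(state.buffer), JSON.stringify(emitted));
```

In this model the force-flush branch is entered repeatedly, but each entry extracts an empty string, which matches the behaviour described above.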
Why the obvious fix is unsafe
Switching the force-flush branch to use the searchable window length:
} else if (streamSkippedIterations >= params::kStreamMaxSkippedIterations) {
  input = inputTextBuffer_.substr(0, searchLimit);
  inputTextBuffer_.erase(0, searchLimit);
  streamSkippedIterations = 0;
}
…breaks the LLM streaming mode. kStreamMaxSkippedIterations is wall-clock-divided-by-kStreamPause, so on slow LLM token rates it trips before the LLM has even produced a sentence, forcing mid-word flushes and degrading speech quality. Tuning the threshold to satisfy both regimes is brittle — context noted in #1134 comment.
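The mid-word flush can be illustrated with a small simulation. This is a hypothetical model, not the native code: each array entry is the text that arrives before one worker iteration (an empty string means nothing arrived), the threshold of 3 is made up, and flushing the whole buffer stands in for flushing searchLimit characters.

```typescript
// Model of the force-flush variant under different token arrival rates.
// Threshold and arrival patterns are hypothetical.
const EOS = ".?!;";
const MAX_SKIPPED = 3; // stand-in for params::kStreamMaxSkippedIterations

// Index just past the last end-of-sentence character, or 0 if there is none.
function lastEosEnd(buffer: string): number {
  for (let i = buffer.length - 1; i >= 0; i--) {
    if (EOS.includes(buffer.charAt(i))) return i + 1;
  }
  return 0;
}

// Runs one worker iteration per arrival and returns the chunks sent to synthesis.
function simulate(arrivals: string[]): string[] {
  let buffer = "";
  let skipped = 0;
  const chunks: string[] = [];
  for (const token of arrivals) {
    buffer += token;
    const eosEnd = lastEosEnd(buffer);
    if (eosEnd > 0) {
      // Normal path: emit up to the last complete sentence.
      chunks.push(buffer.slice(0, eosEnd));
      buffer = buffer.slice(eosEnd);
      skipped = 0;
    } else if (skipped >= MAX_SKIPPED && buffer.length > 0) {
      // Force-flush path: emit whatever is buffered, terminated or not.
      chunks.push(buffer);
      buffer = "";
      skipped = 0;
    } else {
      skipped++;
    }
  }
  return chunks;
}

// Fast producer: one token per iteration, the sentence completes before the threshold.
const fast = simulate(["The qu", "ick br", "own fox."]);
// Slow producer: one token every third iteration, the threshold trips mid-word.
const slow = simulate(["The qu", "", "", "ick br", "", "", "own fox."]);

console.log(fast); // [ 'The quick brown fox.' ]
console.log(slow); // [ 'The quick br', 'own fox.' ]
```

The same threshold that is harmless for the fast producer splits "brown" in two for the slow one, which is the quality regression described above.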
Why this isn't just a theoretical concern
The current JS hook (useTextToSpeech.ts:108-111) hides this by auto-appending . when calling stream({ text }). Two kinds of caller bypass that rescue:
- Callers using streamInsert directly (incremental LLM tokens, dictation, partial captions).
- A streaming caller whose final chunk doesn't end with EOS (LLM truncated, network error, user pressed stop mid-sentence). The trailing un-terminated suffix sits in the buffer permanently.
In both cases streamStop(false) blocks forever with no diagnostic.
Options that don't break LLM streaming
Each is a smaller change than re-tuning the skip counter:
- Caller-declared flush intent. streamInsert(text, { canFlush?: boolean }): LLM mode passes false, normal apps pass true. Caller knows its own pacing.
- Explicit streamFlush() API. Caller signals "I'm done feeding for now, partition what's left." LLM mode never calls it; normal apps call it before streamStop(false). No threshold tuning.
- Wall-clock idle timeout. Track time since last streamInsert. If the buffer has content and N seconds passed without new inserts, flush. LLM streams keep inserting fast enough to never trip it.
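As a rough illustration of the explicit-flush and idle-timeout options, here is a sketch of the buffering logic in TypeScript. The class, its method names, and the injected clock are all hypothetical; the real change would live in the native streaming worker, and the timeout value is arbitrary.

```typescript
// Hypothetical buffer combining an explicit flush with a wall-clock idle timeout.
// Nothing here is existing API; it only models the two options listed above.
const SENTENCE_END = ".?!;";

class SentenceBuffer {
  private buffer = "";
  private lastInsertMs: number;
  private readonly idleTimeoutMs: number;
  private readonly now: () => number;

  // `now` is injected so the example can run without real timers.
  constructor(idleTimeoutMs: number, now: () => number) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.now = now;
    this.lastInsertMs = now();
  }

  insert(text: string): void {
    this.buffer += text;
    this.lastInsertMs = this.now();
  }

  // Explicit flush: hand over everything that is buffered, terminated or not.
  flush(): string {
    const out = this.buffer;
    this.buffer = "";
    return out;
  }

  // Called once per worker iteration; returns the next chunk or "".
  poll(): string {
    let end = 0;
    for (let i = this.buffer.length - 1; i >= 0; i--) {
      if (SENTENCE_END.includes(this.buffer.charAt(i))) {
        end = i + 1;
        break;
      }
    }
    if (end > 0) {
      const out = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end);
      return out;
    }
    // No complete sentence: flush only if the producer has gone quiet.
    const idleMs = this.now() - this.lastInsertMs;
    if (this.buffer.length > 0 && idleMs >= this.idleTimeoutMs) {
      return this.flush();
    }
    return "";
  }
}

let clockMs = 0;
const demo = new SentenceBuffer(2000, () => clockMs);
demo.insert("hello wor");
clockMs = 500;
demo.insert("ld"); // the idle timer restarts on every insert
clockMs = 2400;
console.log(JSON.stringify(demo.poll())); // "" (only 1900 ms idle)
clockMs = 2500;
console.log(JSON.stringify(demo.poll())); // "hello world" (2000 ms idle)
```

Because the idle timer restarts on every insert, a producer that keeps feeding tokens never trips it, while a stalled or finished producer drains after the timeout. A caller that knows it is done can call flush() immediately instead of waiting.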
Mitigations if a proper fix is out of scope
- Document the hazard in streamInsert / streamStop JSDoc. Currently nothing in the API surface hints that streamStop(false) can hang indefinitely.
- JS-side safety net in the hook's stream() wrapper: time streamStop(false), fall back to streamStop(true) with console.warn after some threshold. Zero cost to LLM streaming; prevents downstream apps from soft-locking.
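The JS-side safety net could look roughly like the sketch below. It assumes streamStop returns a promise that settles once the worker has stopped; the StreamStoppable interface, the parameter name, and the stopWithFallback helper are hypothetical. If the real call blocks the JS thread synchronously, a timer-based race like this cannot help and the guard would have to live on the native side.

```typescript
// Assumed minimal shape of the model for this sketch; the real type may differ.
interface StreamStoppable {
  // false = wait for the buffer to drain, true = stop immediately (as described above).
  streamStop(instant: boolean): Promise<void>;
}

// Tries a graceful stop first and forces the stop if it takes longer than timeoutMs.
async function stopWithFallback(
  model: StreamStoppable,
  timeoutMs: number,
): Promise<"graceful" | "forced"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  const graceful = model.streamStop(false).then(() => "done" as const);

  let outcome: "done" | "timeout";
  try {
    outcome = await Promise.race([graceful, timedOut]);
  } finally {
    clearTimeout(timer);
  }
  if (outcome === "done") return "graceful";

  // The graceful stop may still reject later; swallow that to avoid an unhandled rejection.
  graceful.catch(() => undefined);
  console.warn(
    `streamStop(false) did not finish within ${timeoutMs} ms; falling back to streamStop(true)`,
  );
  await model.streamStop(true);
  return "forced";
}
```

A caller would replace a bare streamStop(false) with something like `await stopWithFallback(model, 5000)`, where the 5000 ms threshold is an arbitrary example value. The graceful path is unchanged when the buffer drains normally, so LLM streaming pays nothing for the guard.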
References