feat(gemini): support --resume flag and fix stuck session on abort #262
junmo-kim wants to merge 18 commits into tiann:main from
Conversation
When `hapi gemini` starts or restarts (e.g. after a disconnect), it previously always spawned a fresh gemini subprocess with no session context. The geminiSessionId was stored in the HAPI DB but never used to resume the underlying gemini process.

This change wires `--resume <gemini-session-uuid>` through the full call chain so that gemini is spawned with `--experimental-acp --resume <id>`, restoring the previous conversation.

- cli/src/commands/gemini.ts: parse the `--resume` arg
- cli/src/gemini/runGemini.ts: accept and forward `resumeGeminiSessionId`
- cli/src/gemini/loop.ts: pass the session ID to the GeminiSession constructor
- geminiBackend.ts already adds `--resume` when resumeSessionId is set

Adds unit tests for geminiBackend resume behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
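The arg wiring this commit describes can be sketched roughly as follows. This is a minimal illustration only; `buildGeminiArgs` is a hypothetical name, not the actual HAPI function, and the real code in geminiBackend.ts may assemble the args differently.

```typescript
// Hypothetical sketch: assemble gemini spawn args, adding --resume only when
// a stored session id exists. buildGeminiArgs is an illustrative name.
function buildGeminiArgs(resumeGeminiSessionId?: string): string[] {
  const args = ['--experimental-acp'];
  if (resumeGeminiSessionId) {
    args.push('--resume', resumeGeminiSessionId);
  }
  return args;
}
```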
The previous commit wired --resume through to the gemini spawn args, but the ACP session creation still called session/new unconditionally, which starts a fresh conversation regardless of --resume.

AcpSdkBackend already has loadSession(), which sends session/load to the ACP agent, restoring the previous conversation context. Use it when a resumeSessionId is present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Findings
- [Major] Remote resume has no fallback. A `loadSession` error now aborts the remote launch; expected behavior is to start a new session on an invalid/expired session id. Evidence: cli/src/gemini/geminiRemoteLauncher.ts:71-74
Suggested fix:
let acpSessionId: string;
if (session.sessionId) {
  try {
    acpSessionId = await backend.loadSession({ ...sessionConfig, sessionId: session.sessionId });
  } catch (error) {
    logger.warn('[gemini-remote] resume failed, starting new session', error);
    session.sendSessionEvent({ type: 'message', message: 'Gemini resume failed; starting a new session.' });
    acpSessionId = await backend.newSession(sessionConfig);
  }
} else {
  acpSessionId = await backend.newSession(sessionConfig);
}
Summary
Review mode: initial. 1 major issue.
Testing
Not run (automation)
});
const sessionConfig = { cwd: session.path, mcpServers: toAcpMcpServers(mcpServers) };
const acpSessionId = session.sessionId
  ? await backend.loadSession({ ...sessionConfig, sessionId: session.sessionId })
[MAJOR] Remote resume has no fallback. If session.sessionId is invalid/expired, loadSession throws and the remote loop aborts. Expected behavior: start a new session (similar to opencodeRemoteLauncher). Evidence cli/src/gemini/geminiRemoteLauncher.ts:71-74.
Suggested fix:
let acpSessionId: string;
if (session.sessionId) {
try {
acpSessionId = await backend.loadSession({ ...sessionConfig, sessionId: session.sessionId });
} catch (error) {
logger.warn('[gemini-remote] resume failed, starting new session', error);
session.sendSessionEvent({ type: 'message', message: 'Gemini resume failed; starting a new session.' });
acpSessionId = await backend.newSession(sessionConfig);
}
} else {
acpSessionId = await backend.newSession(sessionConfig);
}
Fixed: loadSession is wrapped in try/catch that falls back to newSession. See lines 79-89.
…pt utilities

Add two exported utility functions with no callers yet:
- findGeminiTranscriptPath(sessionId): locates the gemini session file on disk by scanning ~/.gemini/tmp/*/chats/ for a filename containing the first 8 chars of the session UUID
- readGeminiTranscript(filePath): thin public wrapper around the existing private readTranscript helper

No behaviour change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
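The lookup described for findGeminiTranscriptPath can be sketched like this. This is a hedged illustration based only on the commit message; the real implementation in the HAPI CLI may differ, and the optional `root` parameter is an assumption added here for testability.

```typescript
import { readdirSync, existsSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';

// Illustrative sketch: scan ~/.gemini/tmp/*/chats/ for a transcript filename
// containing the first 8 chars of the session UUID. The `root` parameter is
// an assumption of this sketch, not part of the actual API.
function findGeminiTranscriptPath(
  sessionId: string,
  root: string = join(homedir(), '.gemini', 'tmp')
): string | null {
  const prefix = sessionId.slice(0, 8);
  if (!existsSync(root)) return null;
  for (const projectDir of readdirSync(root)) {
    const chatsDir = join(root, projectDir, 'chats');
    if (!existsSync(chatsDir)) continue;
    const match = readdirSync(chatsDir).find((name) => name.includes(prefix));
    if (match) return join(chatsDir, match);
  }
  return null;
}
```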
…anscript

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a gemini session is resumed via loadSession(), the HAPI web UI previously showed an empty chat history even though Gemini had the full conversation context.

On resume, find the gemini transcript file on disk and inject existing user/assistant messages into the message buffer so the chat history appears in the UI before the user sends a new message.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
replayHistoricalMessages() was only calling messageBuffer.addMessage(), which feeds the local Ink terminal display. Web clients receive messages via sendUserMessage() and sendCodexMessage(), so historical messages were invisible in the web UI when reconnecting to a resumed session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If the stored sessionId is invalid or expired, backend.loadSession() throws and the main loop was aborting. Now we catch the error, warn the user via a session event, and start a fresh session instead, matching the behaviour of opencodeRemoteLauncher.

Also guard replayHistoricalMessages behind a didResume flag so history is only replayed when the session was actually restored.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Gemini stores user messages as Array<{text: string}> in the transcript,
not as plain strings. The previous code checked typeof content === 'string'
which always failed for user messages, so they were silently skipped.
Add extractMessageText() to handle both formats, and update the
GeminiTranscriptMessage type to reflect the actual data shape.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
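The dual-format handling this commit describes can be sketched as follows. This is an illustration based on the commit message; the actual extractMessageText in the HAPI CLI may differ in detail.

```typescript
// Hedged sketch: Gemini transcripts store user messages as Array<{text}>
// and assistant messages as plain strings; handle both shapes.
type GeminiMessageContent = string | Array<{ text?: string }> | undefined;

function extractMessageText(content: GeminiMessageContent): string {
  if (typeof content === 'string') return content;
  if (Array.isArray(content)) {
    // User messages arrive as an array of text parts in the transcript.
    return content.map((part) => part.text ?? '').join('');
  }
  return '';
}
```

A `typeof content === 'string'` check alone, as in the previous code, returns nothing for the array form, which is why user messages were silently skipped.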
Findings
- [Major] Transcript replay re-emits history to the hub, duplicating messages on resume; the hub has already persisted the history. Evidence: cli/src/gemini/geminiRemoteLauncher.ts:214.
Suggested fix:
if (text) {
  this.messageBuffer.addMessage(text, 'user');
}
// ...
if (message.type === 'gemini' && typeof message.content === 'string' && message.content) {
  this.messageBuffer.addMessage(message.content, 'assistant');
}
Summary
Review mode: follow-up after new commits. 1 major issue: resume replay duplicates messages in hub/UI.
Testing
- Not run (automation)
HAPI Bot
if (message.type === 'user') {
  const text = extractMessageText(message.content);
  if (text) {
    this.session.sendUserMessage(text);
[MAJOR] History replay re-emits transcript messages to the hub, which will duplicate existing history on resume. Evidence cli/src/gemini/geminiRemoteLauncher.ts:214.
Suggested fix:
if (text) {
this.messageBuffer.addMessage(text, 'user');
}
// ...
if (message.type === 'gemini' && typeof message.content === 'string' && message.content) {
this.messageBuffer.addMessage(message.content, 'assistant');
}
Not a real duplication: when HAPI creates a new resume session, the hub DB for that session is empty. The web client calls getMessages() on connect and gets nothing. sendUserMessage/sendCodexMessage fill it for the first time — no duplicates.
… switch

Add `historyReplayed` flag to `GeminiSession` to ensure transcript history is sent to the hub only once. Local mode sets the flag when its scanner starts (it handles live messages itself); remote mode checks the flag before replaying, preventing duplication when the user switches from local to remote mode.

Also fix type errors in `sessionScanner.test.ts` introduced by the content type expansion in GeminiTranscriptMessage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Findings
- [Major] Resume history replay re-emits transcript messages to the hub, which will duplicate already persisted history on resume. Evidence: cli/src/gemini/geminiRemoteLauncher.ts:215.
Suggested fix:
if (text) {
  this.messageBuffer.addMessage(text, 'user');
}
// ...
} else if (message.type === 'gemini' && typeof message.content === 'string' && message.content) {
  this.messageBuffer.addMessage(message.content, 'assistant');
}
Summary
Review mode: follow-up after new commits. 1 major issue: resume history replay duplicates persisted messages in hub/UI.
Testing
- Not run (automation)
HAPI Bot
Gemini CLI transcript is the SSOT for conversation history; the HAPI hub does not need a secondary copy. Remove sendUserMessage/sendCodexMessage from replayHistoricalMessages so hub messages are not duplicated on resume. Terminal display (messageBuffer) is still populated for the local Ink UI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, calling cancelPrompt() only sent a session/cancel notification to the Gemini CLI ACP backend. If the backend did not respond to the cancel, backend.prompt() would hang indefinitely (timeoutMs: Infinity), blocking the main message loop from processing any further user input.

Add AbortSignal support through the ACP stack:
- AcpStdioTransport.sendRequest() now accepts an optional AbortSignal; fires immediately if already aborted, or rejects on the abort event
- AcpSdkBackend.prompt() accepts a signal and passes it to sendRequest()
- geminiRemoteLauncher passes the session AbortController signal to prompt()

Tests added for both transport-level and backend-level abort behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
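The abort wiring can be illustrated with a minimal stand-in. This is not the actual AcpStdioTransport code: `sendRequest` here just races an arbitrary pending promise against an AbortSignal, which is the pattern the commit describes (reject immediately if already aborted, otherwise reject on the abort event).

```typescript
// Minimal sketch of AbortSignal-aware request handling (illustrative only).
function sendRequest<T>(pending: Promise<T>, signal?: AbortSignal): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Fire immediately if the caller aborted before the request was sent.
    if (signal?.aborted) {
      reject(new DOMException('ACP request aborted', 'AbortError'));
      return;
    }
    const onAbort = () =>
      reject(new DOMException('ACP request aborted', 'AbortError'));
    signal?.addEventListener('abort', onAbort, { once: true });
    pending.then(resolve, reject).finally(() => {
      signal?.removeEventListener('abort', onAbort);
    });
  });
}
```

With this shape, a hung backend no longer blocks the caller: aborting the controller rejects the wrapper promise even though the underlying request never settles.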
Findings
- [Major] Resume history is only written to the local message buffer; remote/web clients (headless runner) see no replay, so the --resume history feature is missing from the primary UI. Evidence: cli/src/gemini/geminiRemoteLauncher.ts:201
Suggested fix:
import { randomUUID } from 'node:crypto';
if (message.type === 'user') {
  const text = extractMessageText(message.content);
  if (text) {
    this.session.sendUserMessage(text);
    this.messageBuffer.addMessage(text, 'user');
  }
} else if (message.type === 'gemini' && typeof message.content === 'string' && message.content) {
  this.session.sendCodexMessage({ type: 'message', message: message.content, id: randomUUID() });
  this.messageBuffer.addMessage(message.content, 'assistant');
}
Summary
Review mode: follow-up after new commits. 1 major issue. Testing gap: not run (automation).
Testing
Not run (automation)
HAPI Bot
if (message.type === 'user') {
  const text = extractMessageText(message.content);
  if (text) {
    this.messageBuffer.addMessage(text, 'user');
[MAJOR] Resume history replay only updates local messageBuffer; remote/web clients (no TTY) see no history, so resume UX missing. Evidence cli/src/gemini/geminiRemoteLauncher.ts:201.
Suggested fix:
import { randomUUID } from 'node:crypto';
if (message.type === 'user') {
const text = extractMessageText(message.content);
if (text) {
this.session.sendUserMessage(text);
this.messageBuffer.addMessage(text, 'user');
}
} else if (message.type === 'gemini' && typeof message.content === 'string' && message.content) {
this.session.sendCodexMessage({ type: 'message', message: message.content, id: randomUUID() });
this.messageBuffer.addMessage(message.content, 'assistant');
}
Fixed in the current code — both sendUserMessage() and sendCodexMessage() are called during replay. Web clients receive history via hub.
Three issues fixed in replayHistoricalMessages:

1. Web UI showed an empty conversation on resume: replay only wrote to the local messageBuffer (Ink terminal). Now also calls sendUserMessage() and sendCodexMessage() so web clients see the history.
2. resumedFromSessionId was only set when loadSession() succeeded. When Gemini ACP cannot find the session in its registry (e.g. after a process restart), loadSession() fails and we fell back to newSession() without replaying history. Now resumedFromSessionId is set in both branches — Gemini CLI was already started with --resume, so its conversation context is intact regardless of the ACP session/load result.
3. replayHistoricalMessages used session.sessionId after onSessionFound() had already overwritten it with the new ACP session ID. Captured the original session ID before the call to fix the transcript lookup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
clearTimeout() does not accept null; declare the timer with let before the cleanup closure captures it, and guard with a null check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
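The pattern this commit describes can be sketched with illustrative names (this is not the actual transport code):

```typescript
// Declare the timer with `let` so the cleanup closure can capture it,
// and null-guard before clearTimeout in case it was never scheduled.
let timer: ReturnType<typeof setTimeout> | null = null;

const cleanup = () => {
  if (timer !== null) {
    clearTimeout(timer);
    timer = null;
  }
};

timer = setTimeout(() => { /* timeout path */ }, 1_000);
cleanup(); // safe to call whether or not the timer fired
```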
});
}, this.abortController.signal);
} catch (error) {
logger.warn('[gemini-remote] prompt failed', error);
[MAJOR] User abort now throws AbortError from backend.prompt() and is treated as a hard failure, so every abort shows "Gemini prompt failed" plus an extra status line. This is new with the abort signal wiring and makes normal aborts look like errors. Evidence cli/src/gemini/geminiRemoteLauncher.ts:138.
Suggested fix:
} catch (error) {
const isAbort = error instanceof DOMException && error.name === 'AbortError';
if (!isAbort) {
logger.warn('[gemini-remote] prompt failed', error);
session.sendSessionEvent({
type: 'message',
message: 'Gemini prompt failed. Check logs for details.'
});
messageBuffer.addMessage('Gemini prompt failed', 'status');
}
} finally {
Fixed in commit 6ee6eee — catch block now checks isAbortError() and silently ignores expected aborts without showing the error message.
When an interactive shell (e.g., SSH) hangs and the user aborts, the catch block previously treated AbortError the same as a real failure, showing 'Gemini prompt failed. Check logs for details.' in the web UI.

Add isAbortError() to errorUtils and use it in geminiRemoteLauncher to silently handle the expected abort, logging at debug level only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
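A plausible shape for the helper, assuming it only needs to recognize standard DOMException aborts (the actual errorUtils implementation may check more cases):

```typescript
// Sketch: treat only standard AbortError DOMExceptions as expected aborts.
function isAbortError(error: unknown): boolean {
  return error instanceof DOMException && error.name === 'AbortError';
}
```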
Review mode: follow-up after new commits
Major: Resume fallback replays history after starting a new ACP session; UI can imply model context that isn't loaded (see inline).
Minor: signal.aborted check happens after timer setup, leaving an unnecessary timer running when already aborted (see inline).
Testing: Not run (automation)
acpSessionId = await backend.newSession(sessionConfig);
// Replay history even when loadSession fails: Gemini CLI was started with --resume
// so it already has the conversation context, and UI should show the transcript too.
resumedFromSessionId = originalSessionId;
[MAJOR] Resume fallback replays history against a fresh ACP session
When session/load fails you create a new session but still set resumedFromSessionId, so replayHistoricalMessages runs even though ACP is now on a fresh context. That can make the UI show prior context the model does not have.
Suggested fix:
} catch (error) {
logger.warn('[gemini-remote] resume failed, starting new session', error);
session.sendSessionEvent({ type: 'message', message: 'Gemini resume failed; starting a new session.' });
acpSessionId = await backend.newSession(sessionConfig);
resumedFromSessionId = null;
}
Valid point — fixed in commit f3264ad. On loadSession failure, resumedFromSessionId stays null so history replay is skipped. The previous comment in the code was incorrect.
timer.unref?.();
}

if (signal?.aborted) {
[MINOR] signal.aborted check after timer setup leaves a pending timer
If the signal is already aborted, the timeout timer is still scheduled and left to fire later. Move the aborted check before creating the timer, or clear the timer before returning.
Suggested fix:
if (signal?.aborted) {
reject(new DOMException('ACP request aborted', 'AbortError'));
return;
}
if (Number.isFinite(timeoutMs)) {
// existing timer setup
}
Fixed in commit f3264ad — cleanup() is now called before rejecting on a pre-aborted signal, which clears the pending timer.
- AcpStdioTransport: call cleanup() before rejecting on a pre-aborted signal to clear the pending timer (bot comment 2902721160)
- geminiRemoteLauncher: don't replay history when session/load fails and we fall back to session/new — the model has no prior context in that case (bot comment 2902721157)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Findings
- [Major] Local resume sets historyReplayed without backfilling, so switching local→remote afterwards skips history replay (see the inline comment for evidence and a suggested fix).
Summary
Review mode: follow-up after new commits. 1 major issue. Testing not run (automation).
Testing
Not run (automation)
HAPI Bot
onSessionId: (sessionId) => session.onSessionFound(sessionId)
});
// Local mode handles live messages directly; prevent remote mode from re-replaying history
session.historyReplayed = true;
[MAJOR] Setting historyReplayed in local mode prevents resume history from reaching the hub when you later switch to remote (the local scanner only streams new transcript entries).
Duplicate of the comment below — same root cause. Fixed in commit e02a7da.
- AcpStdioTransport: verify cleanup() clears the timer when the signal is pre-aborted
- geminiRemoteLauncher: verify history is not replayed when loadSession fails, and is replayed when loadSession succeeds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
let scanner: GeminiScannerHandle | null = null;

- const handleTranscriptMessage = (message: { type?: string; content?: string }) => {
+ const handleTranscriptMessage = (message: { type?: string; content?: string | Array<{ text?: string }> }) => {
[MAJOR] content now allows array form, but local transcript handler only forwards user messages when content is a string. When transcripts use the array format, user messages never reach the hub (remote UI misses user history). Evidence cli/src/gemini/geminiRemoteLauncher.ts:274.
Suggested fix:
const text = extractMessageText(message.content);
if (message.type === 'user' && text) {
session.sendUserMessage(text);
return;
}
Valid point. Fixed in commit e02a7da: extracted extractMessageText to sessionScanner.ts (shared between both launchers) and updated handleTranscriptMessage in geminiLocalLauncher.ts to use it, so array-form content is now handled correctly.
onSessionId: (sessionId) => session.onSessionFound(sessionId)
});
// Local mode handles live messages directly; prevent remote mode from re-replaying history
session.historyReplayed = true;
[MAJOR] historyReplayed set here blocks remote history replay on local→remote switch, even when local started with --resume. The local scanner primes transcript and does not emit existing messages, so pre-existing history never reaches the hub. Evidence cli/src/gemini/utils/sessionScanner.ts:144 and cli/src/gemini/geminiRemoteLauncher.ts:94.
Suggested fix:
// after creating scanner
const transcript = await readGeminiTranscript(transcriptPath);
session.historyReplayCutoff = transcript?.messages?.length ?? 0;
session.historyReplayed = session.historyReplayCutoff === 0;
// in replayHistoricalMessages
const cutoff = this.session.historyReplayCutoff ?? messages.length;
for (let i = 0; i < Math.min(messages.length, cutoff); i += 1) {
const message = messages[i];
// existing replay logic
}
this.session.historyReplayed = true;
Valid point. Fixed in commit e02a7da: added historyReplayCutoff to GeminiSession. When the local scanner starts, it reads the existing transcript length and sets historyReplayCutoff = N (and keeps historyReplayed = false when N > 0). On local→remote switch, replayHistoricalMessages replays only messages[0..N] — exactly the ones the scanner primed/skipped — so the hub receives the full history without duplicating messages the scanner already forwarded.
- Extract extractMessageText to sessionScanner.ts (shared utility)
- Fix geminiLocalLauncher to handle array-form user message content
- Add historyReplayCutoff to GeminiSession: the local scanner records how many messages existed before it started; remote replays only those messages on a local→remote switch, avoiding hub history gaps
- Limit replayHistoricalMessages to the cutoff count when set (cutoff=0 means no limit, replaying all messages as before)

TDD: extractMessageText tests in sessionScanner.test.ts (6 cases), cutoff and array-content replay tests in geminiRemoteLauncher.test.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Findings
- [Minor] History replay cutoff is set on every local launch, even when the transcript was already synced from remote mode; the next remote switch can replay duplicates. Evidence: cli/src/gemini/geminiLocalLauncher.ts:76
Suggested fix:
if (!session.historyReplayed) {
  if (session.startingMode === 'remote') {
    session.historyReplayCutoff = 0;
    session.historyReplayed = true;
  } else {
    session.historyReplayCutoff = existingCount;
    session.historyReplayed = existingCount === 0;
  }
}
Summary
Review mode: follow-up after new commits. 1 issue. Duplicate history replay risk on remote→local→remote switches.
Testing
- Not run (automation)
// Track how many messages existed before the scanner started.
// Remote mode will replay exactly these messages if a local→remote switch happens.
// If there were no pre-existing messages, mark history as already replayed.
session.historyReplayCutoff = existingCount;
[MINOR] History replay cutoff always derived from existing transcript, even when those messages were already synced during remote mode. This can replay duplicates on the next remote switch.
Suggested fix:
if (!session.historyReplayed) {
if (session.startingMode === 'remote') {
session.historyReplayCutoff = 0;
session.historyReplayed = true;
} else {
session.historyReplayCutoff = existingCount;
    session.historyReplayed = existingCount === 0;
  }
}
Valid point. Fixed in commit 1061603: wrapped the cutoff assignment in a !session.historyReplayed guard. If startingMode === 'remote', we set historyReplayCutoff = 0 and historyReplayed = true so a subsequent remote switch skips replay entirely. If startingMode === 'local', the existing cutoff logic is unchanged.
…switch

When a session started in remote mode switches to local, the scanner was unconditionally overwriting historyReplayCutoff/historyReplayed, causing duplicates on the next remote switch.

Fix: guard the historyReplayCutoff update with !session.historyReplayed and check startingMode — remote-origin sessions need cutoff=0 and historyReplayed=true; local-origin resume sessions set the cutoff to the pre-existing message count as before.

TDD: new geminiLocalLauncher.test.ts covers all three cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Findings
- None.
Summary
- Review mode: follow-up after new commits. No issues found in modified lines. Residual risks: resume/history behavior depends on transcript availability and manual end-to-end testing; abort/cancel coverage is limited to unit tests.
Testing
- Not run (automation)
HAPI Bot
Summary

- `--resume <geminiSessionId>` support for resuming previous Gemini sessions in remote mode
- Fallback to a new session when `loadSession` fails on resume
- `cancelPrompt()` now properly unblocks `backend.prompt()` via AbortSignal

Bug Fix: Stuck Session on Abort

Previously, `cancelPrompt()` only sent a `session/cancel` notification to the Gemini CLI ACP backend. If the backend did not respond to the cancel, `backend.prompt()` would hang indefinitely (timeoutMs: Infinity), blocking the main message loop from processing any further user input — leaving the HAPI session permanently unresponsive.

Root cause: `AcpStdioTransport.sendRequest()` had no cancellation mechanism.

Fix: Added `AbortSignal` support through the ACP stack:
- `AcpStdioTransport.sendRequest()` accepts an optional `AbortSignal`
- `AcpSdkBackend.prompt()` accepts and forwards the signal
- `geminiRemoteLauncher` passes the session abort controller signal to `prompt()`

Test plan

- `loadSession` failure: session starts fresh without error
- `bun vitest run src/agent/backends/acp/`

🤖 Generated with Claude Code