Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
78 changes: 69 additions & 9 deletions .claude/skills/performance-optimization/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,18 @@
name: performance-optimization
description: >
Performance invariants and optimization knowledge for PolyPilot's render pipeline,
session switching, persistence, and caching layers. Use when: (1) Modifying
RefreshSessions, GetOrganizedSessions, or SafeRefreshAsync, (2) Touching
SaveActiveSessionsToDisk, SaveOrganization, or SaveUiState, (3) Working with
LoadPersistedSessions or session directory scanning, (4) Modifying markdown
rendering or the message cache, (5) Optimizing render cycle performance or
adding Blazor component rendering, (6) Working with debounce timers or
DisposeAsync cleanup. Covers: session-switch bottleneck fix, debounce flush
requirements, expensive operation guards, cache invalidation strategy, and
render cycle analysis.
session switching, persistence, caching layers, and startup performance. Use when:
(1) Modifying RefreshSessions, GetOrganizedSessions, or SafeRefreshAsync,
(2) Touching SaveActiveSessionsToDisk, SaveOrganization, or SaveUiState,
(3) Working with LoadPersistedSessions or session directory scanning,
(4) Modifying markdown rendering or the message cache,
(5) Optimizing render cycle performance or adding Blazor component rendering,
(6) Working with debounce timers or DisposeAsync cleanup,
(7) Debugging startup slowness or blue screen on launch,
(8) Modifying InitializeAsync or RestoreSessionsInBackgroundAsync.
Covers: session-switch bottleneck fix, debounce flush requirements, expensive
operation guards, cache invalidation strategy, render cycle analysis, and
startup performance debugging.
---

# Performance Optimization
Expand Down Expand Up @@ -99,3 +102,60 @@ item heights — chat messages have variable height from markdown rendering.
### ChatMessageItem has no ShouldRender()
Always re-renders when parent calls `StateHasChanged()`. Adding a content
hash comparison would skip re-renders for unchanged messages.

## Startup Performance Debugging

### Architecture
Startup has two phases:
1. **UI thread (blocking):** MAUI bootstrap → BlazorWebView creation → Blazor framework init → Dashboard `OnInitialized` → `InitializeAsync` → fires `RestoreSessionsInBackgroundAsync` on ThreadPool
2. **Background thread (non-blocking):** Read `active-sessions.json` → create lazy placeholders → eager-resume candidates → `ReconcileOrganization`

The UI renders as soon as `InitializeAsync` returns. Session restore NEVER blocks the UI thread (runs via `Task.Run`). If you see a blue screen, the problem is in phase 1 (UI thread), not phase 2.

### Instrumentation
`[STARTUP-TIMING]` log tags in `~/.polypilot/console.log`:
```
[STARTUP-TIMING] LoadOrganization: 63ms ← reads organization.json
[STARTUP-TIMING] Pre-restore: 64ms ← time before Task.Run
[STARTUP-TIMING] Session loop complete: 35056ms ← background thread, NOT blocking UI
[STARTUP-TIMING] RestoreSessionsInBackground: 35095ms ← total background time
```

### How to debug startup slowness

**Step 1: Measure the UI-visible delay**
```
BlazorDevFlow ConfigureHandler → Dashboard Restoring UI state
```
Find both timestamps in console.log for the current PID. The gap is the user-visible startup time. Normal: 5-8 seconds.

**Step 2: Identify which phase is slow**
- If `[STARTUP-TIMING] Pre-restore` is high (>500ms): `LoadOrganization` or `InitializeAsync` is slow on the UI thread
- If `[STARTUP-TIMING] Session loop` is high but UI rendered fast: background restore is slow but not blocking the user — acceptable
- If neither timing tag appears: the slowdown is in Blazor framework init (before our code runs) — check system load, WebView issues

**Step 3: A/B test against main**
```bash
# Save current branch
git stash && git checkout main
cd PolyPilot && ./relaunch.sh
# Measure: BlazorDevFlow Config → Dashboard Restore gap

# Switch back
git checkout <branch> && git stash pop
cd PolyPilot && ./relaunch.sh
# Compare gaps
```

⚠️ **Use `git stash` + `git checkout` — do NOT `git checkout origin/main -- .`** (checks out files but keeps wrong branch name, confuses the app's branch display).

**Step 4: Common false alarms**
- CPU load from concurrent test runs or builds causes 2-3x slowdown in Blazor init
- DLL file locks from running app cause build failures (retry after a few seconds)
- The first launch after a `dotnet clean` is always slower (JIT compilation)
- Run the comparison 2-3 times each to account for variance

### Critical rules
- **NEVER block the UI thread during restore.** All session loading uses `Task.Run` + `ConfigureAwait(false)`. Violations cause blue screen.
- **`LoadPersistedSessions()` is O(N) on ALL session directories** (750+). Never call from `InitializeAsync` or any UI-triggered path. See PERF-2.
- **InvokeOnUI callbacks during restore compete with Blazor rendering.** Minimize them. The restore path should batch state changes and call `NotifyStateChanged` once at the end, not per-session.
36 changes: 24 additions & 12 deletions .claude/skills/processing-state-safety/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,12 @@ description: >
SDK event handlers, (4) Debugging stuck sessions showing "Thinking..." forever
or spinner stuck, (5) Modifying IsResumed, HasUsedToolsThisTurn, or ActiveToolCallCount,
(6) Adding diagnostic log tags, (7) Modifying session restore paths
(RestoreSingleSessionAsync) that must initialize watchdog-dependent state,
(RestorePreviousSessionsAsync / EnsureSessionConnectedAsync) that must initialize
watchdog-dependent state,
(8) Modifying ReconcileOrganization or any code that reads Organization.Sessions
during the IsRestoring window, (9) Session appears hung or unresponsive after tool use.
Covers: 18 invariants from 13 PRs of fix cycles,
the 16 code paths that set/clear IsProcessing, and common regression patterns.
the 21+ code paths that set/clear IsProcessing, and common regression patterns.
---

# Processing State Safety
Expand Down Expand Up @@ -151,9 +152,9 @@ cross-thread fields without a tracking comment explaining the gap.
causing stale renders.

### INV-9: Session restore must initialize all watchdog-dependent state
The restore path (`RestoreSingleSessionAsync`) is separate from `SendPromptAsync`.
Any field that affects watchdog timeout selection or dispatch routing must be
initialized in BOTH paths:
The restore path (`RestorePreviousSessionsAsync` + `EnsureSessionConnectedAsync`) is
separate from `SendPromptAsync`. Any field that affects watchdog timeout selection or
dispatch routing must be initialized in BOTH paths:
- `IsMultiAgentSession` — set via `IsSessionInMultiAgentGroup()` before `StartProcessingWatchdog`
- `HasReceivedEventsSinceResume` / `HasUsedToolsThisTurn` — set via `GetEventsFileRestoreHints()`
- `IsResumed` — set on the `AgentSessionInfo` when `isStillProcessing` is true
Expand Down Expand Up @@ -315,10 +316,10 @@ complete the response while sub-agents are still working.
5. **Missing state initialization on session restore** — `IsMultiAgentSession`,
`IsResumed`, and other flags must be set on restored sessions BEFORE
`StartProcessingWatchdog` is called. The restore path in
`RestoreSingleSessionAsync` is separate from `SendPromptAsync` and must
independently initialize all state the watchdog depends on. PR #284 fixed
`IsMultiAgentSession` not being set during restore, causing the watchdog
to use 120s instead of 600s for multi-agent workers.
`RestorePreviousSessionsAsync` / `EnsureSessionConnectedAsync` is separate
from `SendPromptAsync` and must independently initialize all state the watchdog
depends on. PR #284 fixed `IsMultiAgentSession` not being set during restore,
causing the watchdog to use 120s instead of 600s for multi-agent workers.

**Retired mistake (was #2):** *ActiveToolCallCount as sole tool signal* — still relevant per
INV-5, but the more impactful version is #2 above (suppressing the fallback entirely).
Expand All @@ -345,15 +346,26 @@ When a session shows "Thinking..." indefinitely:
| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| `[SEND]` then silence | SDK never responded, watchdog will catch at 120s | Wait or abort |
| `[EVT] TurnEnd` but no `[IDLE]` | Zero-idle SDK bug | Watchdog catches at 30s fallback (INV-10) |
| `[IDLE-DEFER]` then long silence | Background tasks (sub-agents/shells) active but never completed | Check agent status; watchdog will eventually catch (INV-18) |
| `[EVT] TurnEnd` but no `[IDLE]` | `session.idle` is ephemeral (never on disk). If live events stopped: IDLE-DEFER deferred the idle and `IsProcessing` was cleared before the follow-up arrived (#403) | Watchdog catches; #403 fix re-arms IsProcessing |
| `[IDLE-DEFER]` then long silence | Background tasks (sub-agents/shells) active | Check `[IDLE-DIAG]` for backgroundTasks count; watchdog will eventually catch (INV-18) |
| `[IDLE-DEFER]` with `IsProcessing=False` | IDLE-DEFER fired but IsProcessing was already cleared by watchdog/reconnect | #403 IDLE-DEFER-REARM fix re-arms IsProcessing |
| `[COMPLETE]` fired but spinner persists | UI thread not notified | Check INV-2, INV-8 |
| `[WATCHDOG]` clears but re-sticks | New turn started before watchdog callback ran | Check INV-3 generation guard |
| After relaunch: session shows "Working" for 600s | Session was active on CLI during relaunch; poll-then-resume waiting for `session.shutdown` (only disk-persisted terminal event) | Normal — watchdog clears at 600s. `session.idle` is ephemeral, never on disk |

5. **Nuclear option** — user clicks Stop (AbortSessionAsync, path #5/#6).

## Key Facts About session.idle

- `session.idle` is **`ephemeral: true`** in the SDK schema — intentionally NOT written to events.jsonl
- Events written to disk: `session.start`, `session.resume`, `session.shutdown` — NOT `session.idle`
- PolyPilot receives `SessionIdleEvent` over the live SDK event stream
- CLI correctly populates `backgroundTasks` field (proven with `[IDLE-DIAG]` instrumentation)
- When `backgroundTasks` is active, IDLE-DEFER defers completion until a subsequent idle with empty backgroundTasks
- After app relaunch, the poll-then-resume pattern watches events.jsonl for `session.shutdown` only (the only terminal event on disk)

## Regression History

10 PRs of fix/regression cycles: #141 → #147 → #148 → #153 → #158 → #163 → #164 → #276 → #284 → #332.
Additional safety PRs: #373 (orphaned state guards), #375 (premature idle re-arm), #399 (IDLE-DEFER for background tasks).
Additional safety PRs: #373 (orphaned state guards), #375 (premature idle re-arm), #399 (IDLE-DEFER for background tasks), #472 (poll-then-resume + IDLE-DEFER-REARM + model selection).
See `references/regression-history.md` for the full timeline with root causes.
4 changes: 2 additions & 2 deletions .github/copilot-instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -239,7 +239,7 @@ The UI shows: "Sending…" → "Server connected…" → "Thinking…" → "Work
**CRITICAL**: Every code path that sets `IsProcessing = false` must clear 9 companion fields and call `FlushCurrentResponse`. This is the most recurring bug category (13 PRs of fix/regression cycles). **Read `.claude/skills/processing-state-safety/SKILL.md` before modifying ANY processing path.** There are 15+ such paths across CopilotService.cs, Events.cs, Bridge.cs, Organization.cs, and Providers.cs.

### Content Persistence
`FlushCurrentResponse` is also called on `AssistantTurnEndEvent` to persist accumulated response text at each sub-turn boundary. This prevents content loss if the app restarts between `turn_end` and `session.idle` (e.g., "zero-idle sessions" where the SDK never emits `session.idle`). The flush includes a dedup guard to prevent duplicate messages from event replay on resume.
`FlushCurrentResponse` is also called on `AssistantTurnEndEvent` to persist accumulated response text at each sub-turn boundary. This prevents content loss if the app restarts between `turn_end` and `session.idle`. When the IDLE-DEFER logic defers `session.idle` (active background tasks), the flush ensures content from the foreground turn is saved. The flush includes a dedup guard to prevent duplicate messages from event replay on resume.

### Processing Watchdog
The processing watchdog (`RunProcessingWatchdogAsync` in `CopilotService.Events.cs`) detects stuck sessions by checking how long since the last SDK event. It checks every 15 seconds and has three timeout tiers:
Expand All @@ -252,7 +252,7 @@ The processing watchdog (`RunProcessingWatchdogAsync` in `CopilotService.Events.

For multi-agent sessions, Case B also checks **file-size-growth**: if events.jsonl hasn't grown for `WatchdogCaseBMaxStaleChecks` (2) consecutive deferrals, the session is force-completed — the connection is dead. This catches `ConnectionLostException` scenarios where mtime stays fresh but no new data arrives, reducing detection from 30+ min to ~360s (3 cycles: 1 baseline + 2 stale checks). The 1800s freshness window is preserved.

Note: Some sessions never receive `session.idle` events (SDK/CLI bug). In these "zero-idle" cases, `IsProcessing` is only cleared by the watchdog or user abort. The turn_end flush (see Content Persistence above) ensures response content is not lost.
Note: `session.idle` is an ephemeral event (`ephemeral: true` in the SDK schema) — it is delivered over the live event stream but intentionally NOT written to `events.jsonl`. When `session.idle` includes active `backgroundTasks` (sub-agents, shells), the IDLE-DEFER logic defers completion until a subsequent idle arrives with empty/null backgroundTasks. In rare cases where `IsProcessing` was already cleared (by watchdog timeout or reconnect) before the deferred idle arrives, the session may appear stuck until the watchdog fires again — see issue #403.

When the watchdog fires, it marshals state mutations to the UI thread via `InvokeOnUI()` and adds a system warning message.

Expand Down
108 changes: 108 additions & 0 deletions PolyPilot.Tests/BackgroundTasksIdleTests.cs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
using GitHub.Copilot.SDK;
using PolyPilot.Models;
using PolyPilot.Services;

namespace PolyPilot.Tests;
Expand Down Expand Up @@ -110,4 +111,111 @@ public void HasActiveBackgroundTasks_DataNull_ReturnsFalse()
var idle = new SessionIdleEvent { Data = null! };
Assert.False(CopilotService.HasActiveBackgroundTasks(idle));
}

// --- IDLE-DEFER-REARM tests (issue #403) ---
// These test the expected re-arm conditions when session.idle arrives
// with active background tasks but IsProcessing is already false.

[Fact]
public void IdleDeferRearm_ShouldRearm_WhenBackgroundTasksActiveAndNotProcessing()
{
// Scenario: watchdog or reconnect cleared IsProcessing,
// then session.idle arrives with active background tasks.
// Expected: should re-arm IsProcessing=true.
var info = new AgentSessionInfo { Name = "test", Model = "test-model" };
info.IsProcessing = false; // Cleared by watchdog

var idle = MakeIdle(new SessionIdleDataBackgroundTasks
{
Agents = new[]
{
new SessionIdleDataBackgroundTasksAgentsItem
{
AgentId = "agent-1", AgentType = "copilot", Description = "worker"
}
},
Shells = Array.Empty<SessionIdleDataBackgroundTasksShellsItem>()
});

// Verify preconditions
Assert.False(info.IsProcessing);
Assert.True(CopilotService.HasActiveBackgroundTasks(idle));

// The re-arm logic: if !IsProcessing && HasActiveBackgroundTasks && !WasUserAborted → re-arm
bool shouldRearm = !info.IsProcessing && CopilotService.HasActiveBackgroundTasks(idle);
Assert.True(shouldRearm, "Should re-arm when background tasks active and not processing");

// Simulate re-arm
info.IsProcessing = true;
info.ProcessingPhase = 3; // Working (background tasks)
info.ProcessingStartedAt ??= DateTime.UtcNow;

Assert.True(info.IsProcessing);
Assert.Equal(3, info.ProcessingPhase);
Assert.NotNull(info.ProcessingStartedAt);
}

[Fact]
public void IdleDeferRearm_ShouldNotRearm_WhenNoBackgroundTasks()
{
// Scenario: session.idle arrives with no background tasks and IsProcessing=false.
// Expected: should NOT re-arm — this is a normal completion.
var info = new AgentSessionInfo { Name = "test", Model = "test-model" };
info.IsProcessing = false;

var idle = MakeIdle(bt: null);

bool shouldRearm = !info.IsProcessing && CopilotService.HasActiveBackgroundTasks(idle);
Assert.False(shouldRearm, "Should NOT re-arm when no background tasks");
}

[Fact]
public void IdleDeferRearm_ShouldNotRearm_WhenAlreadyProcessing()
{
// Scenario: session.idle with background tasks but IsProcessing is already true.
// Expected: should NOT re-arm (already armed) — just defer.
var info = new AgentSessionInfo { Name = "test", Model = "test-model" };
info.IsProcessing = true; // Already processing

var idle = MakeIdle(new SessionIdleDataBackgroundTasks
{
Agents = new[]
{
new SessionIdleDataBackgroundTasksAgentsItem
{
AgentId = "a1", AgentType = "copilot", Description = ""
}
},
Shells = Array.Empty<SessionIdleDataBackgroundTasksShellsItem>()
});

bool shouldRearm = !info.IsProcessing && CopilotService.HasActiveBackgroundTasks(idle);
Assert.False(shouldRearm, "Should NOT re-arm when already processing");
}

[Fact]
public void IdleDeferRearm_ShouldNotRearm_WhenUserAborted()
{
// Scenario: user clicked Stop, then session.idle with background tasks arrives.
// Expected: should NOT re-arm — user explicitly stopped.
var info = new AgentSessionInfo { Name = "test", Model = "test-model" };
info.IsProcessing = false;
bool wasUserAborted = true;

var idle = MakeIdle(new SessionIdleDataBackgroundTasks
{
Agents = new[]
{
new SessionIdleDataBackgroundTasksAgentsItem
{
AgentId = "a1", AgentType = "copilot", Description = ""
}
},
Shells = Array.Empty<SessionIdleDataBackgroundTasksShellsItem>()
});

// The full re-arm condition includes WasUserAborted check
bool shouldRearm = !info.IsProcessing && CopilotService.HasActiveBackgroundTasks(idle) && !wasUserAborted;
Assert.False(shouldRearm, "Should NOT re-arm when user aborted");
}
}
18 changes: 11 additions & 7 deletions PolyPilot.Tests/EventsJsonlParsingTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -87,23 +87,27 @@ public void TitleCleaning_RemovesNewlines()
[Fact]
public void IsSessionStillProcessing_ActiveEventTypes()
{
// Terminal events are the only ones that indicate processing is complete.
// Everything else (including intermediate events like assistant.turn_end,
// assistant.message, tool.execution_end) means the session is still active.
var terminalEvents = new[] { "session.idle", "session.error", "session.shutdown" };

// These should indicate the session is still processing (not terminal)
var activeEvents = new[]
{
"assistant.turn_start", "tool.execution_start",
"tool.execution_progress", "assistant.message_delta",
"assistant.reasoning", "assistant.reasoning_delta",
"assistant.intent"
"assistant.intent", "assistant.turn_end",
"assistant.message", "session.start"
};

// These should indicate the session is still processing
foreach (var eventType in activeEvents)
{
Assert.Contains(eventType, activeEvents);
Assert.DoesNotContain(eventType, terminalEvents);
}

// These should NOT indicate processing
var inactiveEvents = new[] { "session.idle", "assistant.message", "session.start" };
foreach (var eventType in inactiveEvents)
// These SHOULD indicate processing is complete (terminal) — must not appear in activeEvents
foreach (var eventType in terminalEvents)
{
Assert.DoesNotContain(eventType, activeEvents);
}
Expand Down
Loading