
feat(memory): user work-profile injected at SessionStart#72

Closed
fazleelahhee wants to merge 1 commit into main from feat/work-profile-summary

Summary

Companion to #71 (project_summary). Where project_summary
describes what the project IS, this describes how the user works
on it (cadence, top-touched files, recurring rollup themes,
decision volume), so each new Claude/Codex session opens with a
"you typically work in X area, ship Y sessions a week, here are
the themes that keep coming up" preamble.

Why both PRs together help

Each Claude/Codex session previously had to re-derive:

The two are surfaced as separate Markdown blocks in
build_session_resume, written by independent code paths. Either
PR can merge first; the resume just renders whichever blocks exist.

How work_profile is derived

Pure-SQL aggregation over existing memory tables — no LLM, no
embeddings, no network call.

  • sessions.started_at_epoch + prompt_count → cadence line
  • code_areas GROUP BY file_path → top N most-touched files
  • sessions.rollup_summary tokenised, stop-word-stripped, frequency-
    ranked → recurring themes (must repeat ≥ 2× to count, so
    one-off jargon doesn't pollute)
  • decisions COUNT → "N decisions on file"
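As a rough illustration of the pure-SQL aggregation described above, the sketch below ranks most-touched files and counts decisions using only the standard-library `sqlite3` module. The table and column names `code_areas.file_path` and `decisions` come from this PR; the function names, the `id` column, the default limit, and the demo data are assumptions made for the example and are not taken from `work_profile.py`.

```python
import sqlite3


def top_files(conn: sqlite3.Connection, limit: int = 5) -> list[tuple[str, int]]:
    """Return (file_path, touch_count) pairs, most-touched first."""
    rows = conn.execute(
        "SELECT file_path, COUNT(*) AS touches "
        "FROM code_areas GROUP BY file_path "
        "ORDER BY touches DESC, file_path ASC LIMIT ?",
        (limit,),
    ).fetchall()
    return [(path, int(count)) for path, count in rows]


def decision_count(conn: sqlite3.Connection) -> int:
    """Return the total number of rows in the decisions table."""
    return int(conn.execute("SELECT COUNT(*) FROM decisions").fetchone()[0])


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE code_areas (file_path TEXT)")
    conn.execute("CREATE TABLE decisions (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO code_areas (file_path) VALUES (?)",
        [("src/foo.py",)] * 3 + [("src/bar.py",)],
    )
    conn.executemany("INSERT INTO decisions (id) VALUES (?)", [(1,), (2,)])
    return conn
```

Ties are broken by path here only to keep the ordering deterministic; the real builder may order them differently.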

Stop-word set reuses grammar._FILLERS_ULTRA and adds a small
coding-prose-specific extension (fix, add, update, use, …)
so domain terms (cache, retry, auth, parser) survive.
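The theme extraction can be pictured roughly as follows. This is a minimal sketch, not the PR's code: the two stop-word sets are small hypothetical stand-ins for `grammar._FILLERS_ULTRA` and the coding-prose extension, the tokenising regex is an assumption, and "repeat ≥ 2×" is interpreted here as total occurrences across all rollups.

```python
import re
from collections import Counter

# Hypothetical stand-in for grammar._FILLERS_ULTRA (the real set is not shown in this PR).
_FILLERS = frozenset({"the", "a", "an", "and", "to", "of", "in", "for", "on", "with"})
# Coding-prose verbs named in the PR description; the real extension may be longer.
_CODING_STOP_WORDS = frozenset({"fix", "add", "update", "use"})
_STOP_WORDS = _FILLERS | _CODING_STOP_WORDS


def recurring_themes(rollups, top_n=5, min_repeat=2):
    """Return up to top_n tokens that occur at least min_repeat times."""
    counts = Counter()
    for text in rollups:
        # Lower-case word tokens of two or more characters (assumed tokenisation).
        for token in re.findall(r"[a-z][a-z0-9_]+", text.lower()):
            if token not in _STOP_WORDS:
                counts[token] += 1
    # Most frequent first; alphabetical order breaks ties deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, count in ranked if count >= min_repeat][:top_n]
```

With rollups such as "Fix retry logic in the cache layer" and "Add retry backoff; update cache eviction", the verbs are stripped while `retry` and `cache` each reach the two-occurrence threshold.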

What lands

  • memory.db v4: work_profile table (additive migration; safe
    for existing dbs, tolerated when partially-migrated)
  • src/context_engine/memory/work_profile.py: builder + upsert +
    load + is_stale + format_profile_block + refresh_work_profile
  • build_session_resume prepends the new block; degrades cleanly
    if v4 table is missing
  • cce profile CLI (with --force) — shows + refreshes
  • cce init runs a baseline refresh after the initial index so
    the very first SessionStart already has data
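Two of the behaviours listed above, TTL-based staleness and clean degradation when the v4 table is missing, could look something like the sketch below. The names `is_stale`, `work_profile`, and `generated_at_epoch` appear in this PR; the 24-hour TTL, the `block` column, the `load_profile_block` helper, and the function signatures are assumptions for illustration.

```python
import sqlite3
import time

# Hypothetical TTL; the PR does not state the real value.
PROFILE_TTL_SECONDS = 24 * 60 * 60


def is_stale(generated_at_epoch, now=None, ttl=PROFILE_TTL_SECONDS):
    """A profile is stale when it was never generated or is older than the TTL."""
    if generated_at_epoch is None:
        return True
    now = time.time() if now is None else now
    return (now - generated_at_epoch) >= ttl


def load_profile_block(conn: sqlite3.Connection) -> str:
    """Return the newest cached block, or "" if the table is missing or empty."""
    try:
        row = conn.execute(
            "SELECT block FROM work_profile "
            "ORDER BY generated_at_epoch DESC LIMIT 1"
        ).fetchone()
    except sqlite3.OperationalError:
        # Pre-v4 database: sqlite3 reports "no such table", so fall back quietly.
        return ""
    return row[0] if row else ""
```

Returning an empty string rather than raising lets a caller such as `build_session_resume` simply skip the block on an older database.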

Tests

21 new cases in tests/memory/test_work_profile.py:

  • schema migration
  • cadence (skipped <2 sessions; reports span + avg + last-active)
  • top_files ranking + empty case
  • themes extraction with stop-word strip + min-repeat enforcement
  • decision count
  • format_profile_block omits empty sections; pluralises "decisions"
  • refresh respects TTL but rebuilds on force=True
  • build_session_resume includes the block; tolerates missing v4
    table; empty on virgin state
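To make the formatter cases concrete, here is one way a block formatter might omit empty sections and pluralise "decisions". Only the function name and the "Your work profile" heading text come from this PR; the signature, the heading level, and the line labels are assumptions.

```python
def format_profile_block(cadence="", top_files=(), themes=(), decisions=0):
    """Render a Markdown block, skipping any section that has no data."""
    lines = []
    if cadence:
        lines.append(f"- Cadence: {cadence}")
    if top_files:
        files = ", ".join(f"{path} (×{count})" for path, count in top_files)
        lines.append(f"- Top files: {files}")
    if themes:
        lines.append(f"- Recurring themes: {', '.join(themes)}")
    if decisions > 0:
        noun = "decision" if decisions == 1 else "decisions"
        lines.append(f"- {decisions} {noun} on file")
    if not lines:
        # Nothing to report: return "" so the resume omits the block entirely.
        return ""
    return "\n".join(["## Your work profile", *lines])
```

A test in the spirit of the list above would assert that `format_profile_block()` returns an empty string and that a single decision renders as "1 decision on file".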

Deferred to a follow-up

Mirror the same block into the MCP context-engine-init bootstrap
prompt so Codex CLI (no hook support) gets the same content via
its system-prompt path. Data + formatter are ready — only wiring is
left, same as the project_summary PR.

Verification

  • pytest -n 4 → 896 passed, 1 skipped, 0 failed
  • ruff check on every changed file → clean

Test plan

  • cce init on a fresh project — passes silently (no
    session history yet, profile block stays empty)
  • After 3+ Claude Code sessions, run cce profile — see the
    block populate
  • cce profile --force after editing rollups — confirm the
    generated_at_epoch advances
  • Restart Claude Code — confirm SessionStart injects the new
    "Your work profile" block at the top of the resume
  • Older memory.db (pre-v4) opens cleanly and the resume falls
    back to the old shape

Companion to project_summary: where project_summary describes WHAT the
project is, this describes HOW THE USER works on it — cadence, top
files, recurring themes, decision volume — so each new Claude/Codex
session opens with a "you typically work in X, ship Y sessions a week,
and these themes keep coming up" preamble.

Built extractively from existing memory.db tables:

- sessions (cadence + prompt_count) → "N sessions over D days · ~K
  prompts/session · last active …"
- code_areas (file_path COUNT) → "src/foo.py (×7), src/bar.py (×3)"
- sessions.rollup_summary (stop-word-stripped token tally) →
  "retry, cache, auth, parser, …" (top _TOP_THEMES_N tokens that
  occur >=2 times across rollups)
- decisions (total COUNT) → "N decisions on file"

Zero LLM dep — reuses grammar._FILLERS_ULTRA + a small set of
extractive stop-words specific to coding-session prose so domain
terms survive.
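The cadence line quoted above ("N sessions over D days · ~K prompts/session · last active …") might be assembled along these lines. This is a sketch under assumptions: the function name, the input shape (pairs of `started_at_epoch` and `prompt_count`), the rounding, and the "last active" wording are all illustrative, and only the fewer-than-two-sessions skip comes from the test list.

```python
SECONDS_PER_DAY = 86400


def cadence_line(sessions, now_epoch):
    """sessions: iterable of (started_at_epoch, prompt_count) pairs."""
    sessions = list(sessions)
    if len(sessions) < 2:
        # Too little history to describe a cadence.
        return ""
    starts = [start for start, _ in sessions]
    span_days = max(1, round((max(starts) - min(starts)) / SECONDS_PER_DAY))
    day_word = "day" if span_days == 1 else "days"
    avg_prompts = round(sum(prompts for _, prompts in sessions) / len(sessions))
    idle_days = int((now_epoch - max(starts)) // SECONDS_PER_DAY)
    last_active = "today" if idle_days <= 0 else f"{idle_days}d ago"
    return (
        f"{len(sessions)} sessions over {span_days} {day_word} · "
        f"~{avg_prompts} prompts/session · last active {last_active}"
    )
```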

What lands

- memory.db v4: new work_profile table (additive migration; safe for
  existing dbs and survives partially-migrated states).
- src/context_engine/memory/work_profile.py — builder, upsert/load,
  is_stale, format_profile_block, refresh_work_profile.
- build_session_resume prepends the new block (graceful fallback if
  the v4 table is missing).
- `cce profile` CLI (with `--force`) — shows + refreshes the cached
  block on demand.
- `cce init` runs a baseline refresh after the initial index so the
  very first SessionStart already has data.

Tests (new file)

- schema migration
- cadence (skipped <2 sessions; reports span + avg + last-active)
- top_files ranking + empty case
- themes extraction with stop-word strip; min-repeat enforcement;
  empty when no rollups
- decision count
- format_profile_block omits empty sections, pluralises decisions
- refresh respects TTL but rebuilds on force
- build_session_resume includes the block, tolerates missing v4
  table, empty when no state

Composes with the project_summary work on
feat/auto-project-summary-on-session-start: independent tables,
independent code paths, both surfaced in the same resume by their
respective format functions. Merge order doesn't matter — the resume
just renders whichever blocks exist.

Suite: 896 passed, 1 skipped, 0 failed. Ruff clean.