Summary
Add intelligent conversation summarization as an alternative to simple trimming for long sessions.
Priority: Low - Only implement if users report losing important context with trimming.
Context
PR #35 adds token-aware trimming, which caps conversation history at 6K tokens. This works well for most cases but may lose important context in very long sessions.
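For reference, the Phase-1 behavior can be sketched as a pure-Python illustration: drop the oldest messages until the history fits under a token budget. The ~4-characters-per-token heuristic and the helper names here are assumptions for the sketch; PR #35's actual implementation may differ.

```python
MAX_TOKENS = 6_000  # budget from PR #35

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int = MAX_TOKENS) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):          # walk newest-first
        cost = approx_tokens(msg)
        if total + cost > budget and kept:  # always keep at least one message
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order
```

Older messages are discarded outright, which is exactly the failure mode this issue is about.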
Proposed Implementation
Add a SummarizationNode that triggers when history exceeds a threshold:
```python
from langchain_core.messages import SystemMessage
from langchain_core.messages.utils import count_tokens_approximately

SUMMARIZE_THRESHOLD = 10_000  # tokens of history before summarizing
KEEP_RECENT = 4               # most recent messages kept verbatim

def summarize_if_needed(state: BaseAgentState) -> dict:
    messages = state.get("messages", [])
    token_count = count_tokens_approximately(messages)
    if token_count > SUMMARIZE_THRESHOLD:
        # Keep the last few messages verbatim; summarize the rest.
        recent = messages[-KEEP_RECENT:]
        old = messages[:-KEEP_RECENT]
        # Render old messages to text and summarize them
        # (the summary is hidden from the user).
        old_text = "\n".join(f"{m.type}: {m.content}" for m in old)
        summary_prompt = f"Summarize this conversation context concisely:\n{old_text}"
        summary = llm.invoke(summary_prompt)
        return {
            "messages": [
                SystemMessage(content=f"Previous context: {summary.content}"),
                *recent,
            ],
            "context_summary": summary.content,
        }
    return {}
```
Trade-offs
| Approach | Pros | Cons |
|---|---|---|
| Trimming (current) | Zero extra LLM calls, fast | May lose older context |
| Summarization | Preserves key info | Extra LLM call per summary |
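To make the trade-off concrete, here is a self-contained sketch of the summarization path with a stubbed model call. `fake_summarize` is a hypothetical placeholder for the real `llm.invoke`, and the token heuristic is the same ~4-chars-per-token assumption used for illustration; only the 10K threshold and keep-last-4 policy come from the proposal above.

```python
THRESHOLD = 10_000  # tokens of history before summarizing (per the proposal)
KEEP_RECENT = 4     # messages kept verbatim

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough ~4 chars/token heuristic

def fake_summarize(text: str) -> str:
    # Placeholder for the extra llm.invoke(...) call a real node would make.
    return f"<summary of {approx_tokens(text)} tokens of prior context>"

def summarize_if_needed(messages: list[str]) -> list[str]:
    total = sum(approx_tokens(m) for m in messages)
    if total <= THRESHOLD:
        return messages  # under threshold: no extra LLM call, same as trimming
    recent = messages[-KEEP_RECENT:]
    old = messages[:-KEEP_RECENT]
    summary = fake_summarize("\n".join(old))
    # Old context survives as one synthetic message instead of being dropped.
    return [f"Previous context: {summary}", *recent]
```

The cost asymmetry in the table falls out directly: below the threshold this is free, above it every compaction pays one model call.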
When to Implement
Only implement if users report losing important context with the current trimming approach.
Related
- Optimize token usage: reduce ~25K tokens per iteration #34 - Token optimization (parent issue)
- feat: add token-aware conversation trimming #35 - Token trimming PR (Phase 1)