
feat: add ContextCompressor for context overflow handling#885

Open
vinci-grape wants to merge 4 commits into modelscope:main from vinci-grape:feat/context-compressor

Conversation

@vinci-grape
Collaborator

Add a new context compression mechanism inspired by opencode's compaction approach. Features include:

  • Token-based overflow detection
  • Tool output pruning to reduce context size
  • LLM-based conversation summarization

Change Summary

Related issue number

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Run pre-commit install and pre-commit run --all-files before git commit, and pass the lint check.
  • Documentation reflects the changes where applicable


🤖 Generated with [Qoder](https://qoder.com)
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly!

This pull request introduces a robust context compression mechanism for agents, designed to prevent context window overflow during prolonged conversations. It achieves this by intelligently detecting token limits, pruning less critical tool outputs, and leveraging large language models to summarize conversation history, thereby maintaining relevant context without exceeding token constraints.

Highlights

  • Context Compression Mechanism: Introduced a new ContextCompressor class to manage and mitigate context overflow in agent conversations, inspired by opencode's compaction approach.
  • Token-based Overflow Detection: Implemented functionality to monitor token usage and detect when conversation context exceeds predefined limits.
  • Tool Output Pruning: Added a strategy to prune older tool call outputs, truncating them to save context while protecting recent and relevant information.
  • LLM-based Conversation Summarization: Incorporated the ability to use an LLM to generate concise summaries of conversations, which can then replace extensive chat history to reduce context size.
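The compaction flow described in the highlights can be sketched end to end. This is a minimal illustration only: the class and parameter names (`context_limit`, `reserved_buffer`) mirror the fragments quoted later in the review, but the `Message` stand-in, the `protect_last_n` knob, and all defaults are assumptions, not the PR's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Message:
    # Stand-in for the project's Message class; the real one has more fields.
    role: str
    content: str
    tool_call_id: Optional[str] = None

class ContextCompressor:
    def __init__(self, context_limit: int = 128_000,
                 reserved_buffer: int = 8_000,
                 protect_last_n: int = 2):
        self.context_limit = context_limit
        self.reserved_buffer = reserved_buffer
        self.protect_last_n = protect_last_n  # recent tool outputs kept intact

    def estimate_tokens(self, text: str) -> int:
        # Rough chars/4 heuristic, as in the quoted estimate path below.
        return len(text) // 4 if text else 0

    def is_overflow(self, messages: List[Message]) -> bool:
        # Overflow when estimated usage reaches the usable window.
        total = sum(self.estimate_tokens(m.content) for m in messages)
        return total >= self.context_limit - self.reserved_buffer

    def prune_tool_outputs(self, messages: List[Message]) -> List[Message]:
        # Truncate older tool results while protecting the most recent ones.
        tool_idx = [i for i, m in enumerate(messages) if m.role == 'tool']
        protected = set(tool_idx[-self.protect_last_n:]) if self.protect_last_n else set()
        out = []
        for i, m in enumerate(messages):
            if m.role == 'tool' and i not in protected:
                m = Message(role=m.role,
                            content='[Output truncated to save context]',
                            tool_call_id=m.tool_call_id)
            out.append(m)
        return out
```

The LLM summarization step (replacing old history with a generated summary) would sit on top of this, triggered when pruning alone is not enough.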

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a ContextCompressor to manage context window overflow by pruning tool outputs and summarizing conversations. The implementation is well-structured and follows the described strategy. My review includes a few suggestions for improvement, mainly around making error handling more specific by avoiding broad exception catches, replacing a magic number with a constant for better maintainability, and simplifying a redundant conditional check.

Comment on lines +64 to +65
except Exception as e:
    logger.warning(f'Failed to init LLM for summary: {e}')

medium

Catching a generic Exception is generally discouraged as it can mask unexpected errors during LLM initialization and make debugging difficult. It's better to catch more specific exceptions that you anticipate LLM.from_config() might raise, such as configuration or import errors. This makes the error handling more robust and intentional.
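A sketch of the narrower handling the comment asks for. The `fake_from_config` helper is a hypothetical stand-in for `LLM.from_config()`, and the exception tuple is an assumption — substitute whatever errors the project's LLM client actually documents.

```python
import logging

logger = logging.getLogger(__name__)

def fake_from_config(config: dict):
    # Hypothetical stand-in for LLM.from_config(); the real factory and
    # the exceptions it raises belong to the project's LLM client.
    if 'model' not in config:
        raise KeyError('model')
    return object()

def init_summary_llm(config: dict):
    # Catch only the anticipated failure modes instead of a bare Exception,
    # so unexpected bugs still surface during development.
    try:
        return fake_from_config(config)
    except (ImportError, KeyError, ValueError) as e:
        logger.warning(f'Failed to init LLM for summary: {e}')
        return None
```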

content = msg.content if isinstance(msg.content, str) else str(msg.content)
if content:
    conv_parts.append(f'{role}: {content[:2000]}')

medium

The value 2000 for truncating message content is a "magic number". To improve readability and maintainability, it should be defined as a named constant at the module level (e.g., SUMMARY_CONTENT_TRUNCATE_LIMIT = 2000) or as a configurable class attribute initialized in __init__. This makes the code's intent clearer and simplifies future modifications to this limit.
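The suggestion amounts to something like the following sketch; the constant name comes from the review comment itself, and the `format_for_summary` helper is hypothetical, added here only to show the constant in use.

```python
# Named constant replacing the magic number 2000; the name follows the
# review's suggestion (SUMMARY_CONTENT_TRUNCATE_LIMIT), not the PR itself.
SUMMARY_CONTENT_TRUNCATE_LIMIT = 2000

def format_for_summary(role: str, content: str) -> str:
    # Truncate long message bodies before they reach the summarizer prompt.
    return f'{role}: {content[:SUMMARY_CONTENT_TRUNCATE_LIMIT]}'
```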

Comment on lines +158 to +159
except Exception as e:
    logger.error(f'Summary generation failed: {e}')

medium

This try...except Exception block is too broad. It can hide various issues from the LLM generation call, such as network errors, API key problems, or rate limits. Please consider catching more specific exceptions if the LLM client library provides them, and handle them accordingly. This will make your error handling more precise and robust.
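A minimal sketch of the same idea for the generation call. `flaky_generate` is a hypothetical stand-in, and the stdlib `TimeoutError`/`ConnectionError` types here are placeholders — real LLM clients usually raise their own library-specific network and rate-limit errors.

```python
import logging

logger = logging.getLogger(__name__)

def flaky_generate(prompt: str) -> str:
    # Hypothetical stand-in for the LLM generation call.
    raise TimeoutError('request timed out')

def generate_summary(prompt: str):
    # Narrow the except clause to failures you expect and can handle;
    # anything else propagates instead of being silently logged.
    try:
        return flaky_generate(prompt)
    except (TimeoutError, ConnectionError) as e:
        logger.error(f'Summary generation failed: {e}')
        return None
```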

Comment on lines +201 to +202
if last_user.content and last_user.content != result[-1].content:
    result.append(last_user)

medium

The condition last_user.content != result[-1].content is almost always true because result[-1].content is a formatted summary string, making this check redundant. If the intent is to append the last user message if it has content, the check can be simplified.

Suggested change
if last_user.content and last_user.content != result[-1].content:
    result.append(last_user)
if last_user.content:
    result.append(last_user)

msg = Message(
    role=msg.role,
    content='[Output truncated to save context]',
    tool_call_id=msg.tool_call_id,
Collaborator

Is keeping only tool_call_id sufficient? When constructing the new Message, fields such as tool_calls are not preserved — could that affect downstream processing?

Collaborator Author

This path trims tool results (role='tool' content only).
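If preserving every field ever becomes necessary, one hedged option (assuming `Message` is a dataclass, which may not hold for the real class) is `dataclasses.replace`, which copies all fields and overrides only `content`:

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional

@dataclass
class Message:
    # Stand-in: the project's real Message class may carry more fields.
    role: str
    content: str
    tool_call_id: Optional[str] = None
    tool_calls: List[dict] = field(default_factory=list)

def truncate_tool_result(msg: Message) -> Message:
    # replace() copies every field and overrides only content, so fields
    # like tool_calls survive for downstream consumers.
    return replace(msg, content='[Output truncated to save context]')
```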

return 0
return len(text) // 4

def estimate_message_tokens(self, msg: Message) -> int:
Collaborator

Is there a reason for not directly using the prompt_tokens / completion_tokens information already available on the Message class?

usable = self.context_limit - self.reserved_buffer
return total >= usable

def prune_tool_outputs(self, messages: List[Message]) -> List[Message]:
Collaborator

Would it be better to unify the style of the prune operation in this function — for example, follow the convention inside LLMAgent and mutate in place via msg.content = xxx?
Right now, messages that need no pruning reuse the old objects while pruned ones are newly created, so the returned list mixes both kinds of objects and the scope of any later modification is not very clear.
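The in-place style this comment suggests could look like the sketch below. It assumes mutating `msg.content` is acceptable to downstream consumers, and the `Message` stand-in and `protect_last_n` parameter are illustrative, not from the PR.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Message:
    # Stand-in for the project's Message class.
    role: str
    content: str
    tool_call_id: Optional[str] = None

def prune_tool_outputs_inplace(messages: List[Message],
                               protect_last_n: int = 2) -> None:
    # Mutate msg.content directly so the list holds only one kind of
    # object and the scope of later edits stays clear.
    tool_msgs = [m for m in messages if m.role == 'tool']
    prunable = tool_msgs[:-protect_last_n] if protect_last_n else tool_msgs
    for m in prunable:
        m.content = '[Output truncated to save context]'
```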
