Openclaw exp #17

Merged — binary-husky merged 5 commits into main from openclaw-exp on Mar 19, 2026
Conversation

@binary-husky (Collaborator)

No description provided.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive overhaul of the agent reward computation, moving from a single-factor (extraversion) to a multi-factor evaluation system. By integrating relevance, diversity, and a quality gate, the changes aim to foster more intelligent, varied, and non-degenerate agent responses. The update also provides greater configurability for reward weights and enhances observability through query history tracking, ultimately leading to more robust and controllable LLM agent training.

Highlights

  • Enhanced Reward System: The reward computation logic has been significantly upgraded to incorporate multiple dimensions beyond just extraversion, including relevance, diversity, and a quality gate for repetition and degeneration. This allows for more nuanced and robust evaluation of agent responses.
  • Configurable Reward Weights: New environment variables (W_EXTRAVERSION, W_RELEVANCE, W_DIVERSITY) have been introduced, allowing for flexible weighting of the different reward components. This enables fine-tuning the reward function to specific agent behaviors.
  • Diversity and Quality Gates: Implemented mechanisms to penalize responses that are repetitive (within a batch or across historical requests) or degenerate (e.g., looping text, special token leaks). This promotes more unique and high-quality outputs from the agents.
  • Improved Testing and Documentation: The reward system now includes comprehensive tests covering all new dimensions (relevance, diversity, quality gate) and a new cheatsheet for easier setup and testing of OpenClaw rewards. Documentation for multi-model training examples has also been updated with new diagrams.
  • Query History Tracking: The system now tracks a rolling buffer of recent user queries, which can be exposed via a new API endpoint. This helps in debugging and identifying potential data distribution issues.
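The composite reward described in the highlights can be sketched as follows. This is an illustrative assumption, not the PR's actual code: the function name `compute_reward` and the exact gating behavior are hypothetical, though the weight names and defaults match the environment variables the PR introduces.

```python
import os

# Env-configurable weights, as described in the PR (defaults shown there).
W_EXTRAVERSION = float(os.getenv("W_EXTRAVERSION", "0.5"))
W_RELEVANCE = float(os.getenv("W_RELEVANCE", "0.3"))
W_DIVERSITY = float(os.getenv("W_DIVERSITY", "0.2"))
# Validate the sum-to-1.0 constraint up front (the reviewer's suggestion below).
assert abs(W_EXTRAVERSION + W_RELEVANCE + W_DIVERSITY - 1.0) < 1e-9, \
    "Reward weights must sum to 1.0"

def compute_reward(extraversion: float, relevance: float, diversity: float,
                   passes_quality_gate: bool) -> float:
    """Weighted sum of per-dimension scores; degenerate outputs are zeroed out."""
    if not passes_quality_gate:
        return 0.0  # quality gate: repetitive/degenerate responses get no reward
    return (W_EXTRAVERSION * extraversion
            + W_RELEVANCE * relevance
            + W_DIVERSITY * diversity)

reward = compute_reward(0.8, 0.6, 1.0, True)  # ≈ 0.78 with the default weights
```

Zeroing the reward on a failed quality gate is one plausible design; the PR might instead apply a multiplicative penalty.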


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

binary-husky merged commit 78182cd into main on Mar 19, 2026 (2 checks passed)

@gemini-code-assist (bot) left a comment

Code Review

This pull request significantly enhances the reward mechanism for the OpenClaw agent by introducing a composite reward model. This new model considers not just extraversion, but also relevance, diversity, and a quality gate to penalize degenerate outputs. The implementation is well-supported by new logic for history tracking, scoring functions, and a comprehensive suite of tests. My review focuses on improving robustness, correcting documentation, and addressing a critical security vulnerability related to a hardcoded API key in the test files. I've also included minor suggestions for code style and documentation formatting.


sys.path.insert(0, os.path.dirname(__file__))
os.environ["DASHSCOPE_API_KEY"] = os.getenv("DASHSCOPE_API_KEY", "sk-xxx")
os.environ["DASHSCOPE_API_KEY"] = os.getenv("DASHSCOPE_API_KEY", "sk-***REDACTED***")

critical

A hardcoded DashScope API key has been found in this test file. Committing secrets like API keys to version control is a major security risk, as it exposes them to anyone with access to the repository. The key should be removed immediately. Please configure tests to load secrets from a secure source, such as environment variables, and ensure the hardcoded value is not present in the git history.

Suggested change
os.environ["DASHSCOPE_API_KEY"] = os.getenv("DASHSCOPE_API_KEY", "sk-***REDACTED***")
os.environ["DASHSCOPE_API_KEY"] = os.getenv("DASHSCOPE_API_KEY", "sk-xxx")

Comment on lines +36 to +38
W_EXTRAVERSION = float(os.getenv("W_EXTRAVERSION", "0.5"))
W_RELEVANCE = float(os.getenv("W_RELEVANCE", "0.3"))
W_DIVERSITY = float(os.getenv("W_DIVERSITY", "0.2"))

high

The comment on line 35 states that the reward weights must sum to 1.0, but this is not enforced in the code. An incorrect configuration could lead to skewed rewards and silently impact model training. It's good practice to add an assertion to validate this constraint at startup.

Suggested change
W_EXTRAVERSION = float(os.getenv("W_EXTRAVERSION", "0.5"))
W_RELEVANCE = float(os.getenv("W_RELEVANCE", "0.3"))
W_DIVERSITY = float(os.getenv("W_DIVERSITY", "0.2"))
W_EXTRAVERSION = float(os.getenv("W_EXTRAVERSION", "0.5"))
W_RELEVANCE = float(os.getenv("W_RELEVANCE", "0.3"))
W_DIVERSITY = float(os.getenv("W_DIVERSITY", "0.2"))
assert abs(W_EXTRAVERSION + W_RELEVANCE + W_DIVERSITY - 1.0) < 1e-9, "Reward weights must sum to 1.0"

Comment on lines +94 to +95


medium

There are excessive blank lines here, which can affect the document's readability. It's generally best to use a single blank line to separate paragraphs or elements.

Comment on lines +92 to +93


medium

There are excessive blank lines here, which can affect the document's readability. It's generally best to use a single blank line to separate paragraphs or elements.

Comment on lines +182 to +183


medium

There are excessive blank lines here, which can affect the document's readability. Please remove the extra blank lines to keep the formatting clean.

json_data["stream"] = is_stream

# Remove fields not supported by vLLM to avoid warnings
UNSUPPORTED_FIELDS = {"strict", "store"}

medium

The UNSUPPORTED_FIELDS set is defined within this function. Since its value is static, consider defining it as a module-level constant (e.g., near SWARM_URL). This improves maintainability by grouping configuration values together and makes it clear this is not a per-request value.
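The reviewer's suggestion amounts to hoisting the static set to module scope. A minimal sketch of the refactor, assuming a hypothetical wrapper function (`sanitize_payload` is illustrative; only `UNSUPPORTED_FIELDS` and the `stream` assignment come from the diff):

```python
# Module-level constant: static configuration grouped in one place, built once
# rather than on every request.
UNSUPPORTED_FIELDS = frozenset({"strict", "store"})

def sanitize_payload(json_data: dict, is_stream: bool) -> dict:
    """Set the stream flag and drop fields vLLM does not accept, to avoid warnings."""
    json_data["stream"] = is_stream
    return {k: v for k, v in json_data.items() if k not in UNSUPPORTED_FIELDS}

payload = sanitize_payload({"model": "m", "strict": True, "store": False}, True)
# payload is now {"model": "m", "stream": True}
```

Using `frozenset` additionally signals that the value is immutable configuration, not per-request state.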

# Diversity: n-gram overlap (fast, deterministic, no LLM needed)
# ---------------------------------------------------------------------------
def _get_ngrams(text: str, n: int = 3) -> collections.Counter:
"""Extract character-level n-grams from text."""

medium

The docstring states that this function extracts 'character-level n-grams', but the implementation uses text.lower().split() which operates on words. This is a discrepancy between the documentation and the code's behavior. Please update the docstring to reflect that it extracts word-level n-grams.

Suggested change
"""Extract character-level n-grams from text."""
"""Extract word-level n-grams from text."""

Compute a diversity score for each response (0 = duplicate, 1 = fully unique).

Two components:
1. Within-batch: average pairwise n-gram overlap with other responses in the batch

medium

The docstring states that the within-batch component is the 'average pairwise n-gram overlap', but the implementation on line 153 uses max(batch_overlaps). This calculates the worst-case overlap, not the average. Please update the docstring to accurately describe the implementation.

Suggested change
1. Within-batch: average pairwise n-gram overlap with other responses in the batch
1. Within-batch: maximum pairwise n-gram overlap with other responses in the batch
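Putting the two review findings together (word-level n-grams, worst-case rather than average pairwise overlap), a diversity scorer of this kind could look like the following sketch. The helper names mirror the diff's `_get_ngrams`; `_overlap` and `diversity_scores` are illustrative assumptions, not the PR's actual functions:

```python
import collections

def _get_ngrams(text: str, n: int = 3) -> collections.Counter:
    """Extract word-level n-grams from text (the corrected docstring wording)."""
    words = text.lower().split()
    return collections.Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def _overlap(a: collections.Counter, b: collections.Counter) -> float:
    """Jaccard-style overlap between two n-gram multisets, in [0, 1]."""
    if not a or not b:
        return 0.0
    return sum((a & b).values()) / sum((a | b).values())

def diversity_scores(responses: list[str]) -> list[float]:
    """Per-response score: 1 minus the worst-case (max) pairwise overlap in the batch."""
    grams = [_get_ngrams(r) for r in responses]
    scores = []
    for i, g in enumerate(grams):
        others = [_overlap(g, h) for j, h in enumerate(grams) if j != i]
        scores.append(1.0 - (max(others) if others else 0.0))
    return scores
```

With this scoring, exact duplicates in a batch get 0.0 and fully distinct responses get 1.0, matching the docstring's stated range. The historical-overlap component mentioned in the docstring would compare against the rolling query buffer the same way.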

Comment on lines +178 to +179
2. compute_string_madness — catches nonsense chars, special token leaks,
character-level repetition

medium

The comment mentions compute_string_madness, but this function is not directly called in the implementation below. Instead, has_repeat is used. This can be confusing. To improve clarity, please update the comment to reflect the actual implementation, which uses has_repeat to detect character-level and word-level repetition.

Suggested change
2. compute_string_madness — catches nonsense chars, special token leaks,
   character-level repetition
2. `has_repeat` — catches word/character-level repetition and special token leaks
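A minimal repetition check in the spirit of `has_repeat` could look like the sketch below. The thresholds are illustrative assumptions, the PR's actual implementation may differ, and special-token-leak scanning is omitted for brevity:

```python
def has_repeat(text: str, max_char_run: int = 10, max_ngram_repeats: int = 5) -> bool:
    """Flag looping text: a long run of one character, or a word trigram repeated many times."""
    # Character-level: any single character repeated max_char_run or more times in a row.
    run = 1
    for prev, cur in zip(text, text[1:]):
        run = run + 1 if cur == prev else 1
        if run >= max_char_run:
            return True
    # Word-level: any trigram occurring max_ngram_repeats or more times.
    words = text.lower().split()
    counts: dict[tuple, int] = {}
    for i in range(len(words) - 2):
        g = tuple(words[i:i + 3])
        counts[g] = counts.get(g, 0) + 1
        if counts[g] >= max_ngram_repeats:
            return True
    return False
```

Both checks are deterministic and require no LLM call, so they are cheap enough to run as a gate on every response before the weighted reward is computed.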
