Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a comprehensive overhaul of the agent reward computation, moving from a single-factor (extraversion) evaluation to a multi-factor system. By integrating relevance, diversity, and a quality gate, the changes aim to foster more intelligent, varied, and non-degenerate agent responses. The update also provides greater configurability for reward weights and enhances observability through query history tracking, ultimately leading to more robust and controllable LLM agent training.
Code Review
This pull request significantly enhances the reward mechanism for the OpenClaw agent by introducing a composite reward model. This new model considers not just extraversion, but also relevance, diversity, and a quality gate to penalize degenerate outputs. The implementation is well-supported by new logic for history tracking, scoring functions, and a comprehensive suite of tests. My review focuses on improving robustness, correcting documentation, and addressing a critical security vulnerability related to a hardcoded API key in the test files. I've also included minor suggestions for code style and documentation formatting.
```diff
  sys.path.insert(0, os.path.dirname(__file__))
- os.environ["DASHSCOPE_API_KEY"] = os.getenv("DASHSCOPE_API_KEY", "sk-xxx")
+ os.environ["DASHSCOPE_API_KEY"] = os.getenv("DASHSCOPE_API_KEY", "sk-311cfac3a0f94ff4b5ddf401f70fa338")
```
A hardcoded DashScope API key has been found in this test file. Committing secrets like API keys to version control is a major security risk, as it exposes them to anyone with access to the repository. The key should be removed immediately. Please configure tests to load secrets from a secure source, such as environment variables, and ensure the hardcoded value is not present in the git history.
Suggested change:

```diff
- os.environ["DASHSCOPE_API_KEY"] = os.getenv("DASHSCOPE_API_KEY", "sk-311cfac3a0f94ff4b5ddf401f70fa338")
+ os.environ["DASHSCOPE_API_KEY"] = os.getenv("DASHSCOPE_API_KEY", "sk-xxx")
```
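Beyond reverting the committed key, tests can refuse to run when no real key is configured instead of falling back to a placeholder. A minimal sketch; the helper name `require_api_key` is hypothetical and not part of the project:

```python
import os

def require_api_key(name: str = "DASHSCOPE_API_KEY") -> str:
    """Fetch the API key from the environment; fail loudly when it is missing."""
    key = os.getenv(name)
    if not key:
        # Better a clear failure here than a hardcoded secret in the repo.
        raise RuntimeError(f"{name} is not set; export it before running live-API tests")
    return key
```

Note that removing the line from the current commit is not enough: the key remains in the git history and should also be rotated.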
```python
W_EXTRAVERSION = float(os.getenv("W_EXTRAVERSION", "0.5"))
W_RELEVANCE = float(os.getenv("W_RELEVANCE", "0.3"))
W_DIVERSITY = float(os.getenv("W_DIVERSITY", "0.2"))
```
The comment on line 35 states that the reward weights must sum to 1.0, but this is not enforced in the code. An incorrect configuration could lead to skewed rewards and silently impact model training. It's good practice to add an assertion to validate this constraint at startup.
Suggested change:

```diff
  W_EXTRAVERSION = float(os.getenv("W_EXTRAVERSION", "0.5"))
  W_RELEVANCE = float(os.getenv("W_RELEVANCE", "0.3"))
  W_DIVERSITY = float(os.getenv("W_DIVERSITY", "0.2"))
+ assert abs(W_EXTRAVERSION + W_RELEVANCE + W_DIVERSITY - 1.0) < 1e-9, "Reward weights must sum to 1.0"
```
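An alternative to a hard assertion is to normalize whatever weights are supplied, so a slightly-off configuration degrades gracefully instead of crashing at startup. A sketch under the same three environment variables; the helper name `load_reward_weights` is illustrative, not the project's API:

```python
import os

def load_reward_weights() -> dict:
    """Read reward weights from the environment and normalize them to sum to 1.0."""
    raw = {
        "extraversion": float(os.getenv("W_EXTRAVERSION", "0.5")),
        "relevance": float(os.getenv("W_RELEVANCE", "0.3")),
        "diversity": float(os.getenv("W_DIVERSITY", "0.2")),
    }
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("Reward weights must be positive")
    # Dividing by the total guarantees the returned weights sum to exactly 1.0.
    return {k: v / total for k, v in raw.items()}
```

The trade-off: an assertion surfaces misconfiguration immediately, while normalization silently accepts it; which is preferable depends on how the training pipeline is operated.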
```python
json_data["stream"] = is_stream
```

```python
# Remove fields not supported by vLLM to avoid warnings
UNSUPPORTED_FIELDS = {"strict", "store"}
```
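The `UNSUPPORTED_FIELDS` set above is presumably used to strip those keys from the request payload before it is sent to vLLM. A sketch of how that filtering might look; `sanitize_payload` is a hypothetical helper name, not necessarily what the diff defines:

```python
UNSUPPORTED_FIELDS = {"strict", "store"}

def sanitize_payload(json_data: dict) -> dict:
    """Return a copy of the request payload without fields vLLM rejects."""
    return {k: v for k, v in json_data.items() if k not in UNSUPPORTED_FIELDS}
```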
```python
# Diversity: n-gram overlap (fast, deterministic, no LLM needed)
# ---------------------------------------------------------------------------
def _get_ngrams(text: str, n: int = 3) -> collections.Counter:
    """Extract character-level n-grams from text."""
```
The docstring states that this function extracts 'character-level n-grams', but the implementation uses text.lower().split() which operates on words. This is a discrepancy between the documentation and the code's behavior. Please update the docstring to reflect that it extracts word-level n-grams.
Suggested change:

```diff
-    """Extract character-level n-grams from text."""
+    """Extract word-level n-grams from text."""
```
```
Compute a diversity score for each response (0 = duplicate, 1 = fully unique).

Two components:
1. Within-batch: average pairwise n-gram overlap with other responses in the batch
```
The docstring states that the within-batch component is the 'average pairwise n-gram overlap', but the implementation on line 153 uses max(batch_overlaps). This calculates the worst-case overlap, not the average. Please update the docstring to accurately describe the implementation.
Suggested change:

```diff
- 1. Within-batch: average pairwise n-gram overlap with other responses in the batch
+ 1. Within-batch: maximum pairwise n-gram overlap with other responses in the batch
```
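Under the corrected reading, the within-batch component takes the worst-case (maximum) overlap of each response against every other response in the batch, so even one near-duplicate sibling drives the score toward zero. A minimal sketch of that scoring, assuming Jaccard overlap on word trigrams; this illustrates the idea and is not the PR's exact implementation:

```python
def _ngrams(text: str, n: int = 3) -> set:
    """Word-level n-grams as a set, for overlap comparison."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def diversity_scores(responses: list) -> list:
    """Score each response: 1.0 = fully unique, 0.0 = duplicate of a sibling."""
    grams = [_ngrams(r) for r in responses]
    scores = []
    for i, g in enumerate(grams):
        overlaps = [
            len(g & other) / len(g | other) if (g | other) else 0.0
            for j, other in enumerate(grams) if j != i
        ]
        # Worst case dominates: the highest overlap with any sibling sets the penalty.
        scores.append(1.0 - max(overlaps, default=0.0))
    return scores
```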
```
2. compute_string_madness — catches nonsense chars, special token leaks,
   character-level repetition
```
The comment mentions compute_string_madness, but this function is not directly called in the implementation below. Instead, has_repeat is used. This can be confusing. To improve clarity, please update the comment to reflect the actual implementation, which uses has_repeat to detect character-level and word-level repetition.
Suggested change:

```diff
- 2. compute_string_madness — catches nonsense chars, special token leaks,
-    character-level repetition
+ 2. `has_repeat` — catches word/character-level repetition and special token leaks
```
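A repetition detector along the lines of `has_repeat` could check for long runs of a single character and for a word repeated back-to-back. The sketch below is hypothetical and illustrates the kind of degenerate output the quality gate targets; the thresholds and the project's actual implementation may differ:

```python
import re

def has_repeat(text: str, max_char_run: int = 8, max_word_run: int = 4) -> bool:
    """Flag degenerate output: long character runs or a word repeated back-to-back."""
    # Character-level: any character repeated more than max_char_run times in a row.
    if re.search(r"(.)\1{%d,}" % max_char_run, text):
        return True
    # Word-level: the same token appearing more than max_word_run times consecutively.
    words = text.split()
    run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_word_run:
            return True
    return False
```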