Feat/enhance dsv2 #886

Merged
alcholiclg merged 6 commits into modelscope:main from alcholiclg:feat/enhance_dsv2
Mar 23, 2026

@alcholiclg (Collaborator) commented Mar 20, 2026

Change Summary

Agentic Insight V2

  • Externalized prompts: Moved system prompts out of YAML configs into standalone .txt files (en/gpt5, zh/qwen3); added file_resolver.py for file reference resolution.
  • Quality check module: Added quality_checker.py for report quality validation.
  • Researcher/Reporter split: Rewrote researcher_callback.py to restrict direct report edits; heavily expanded reporter_callback.py with updated delivery flow and post-report guidance.
  • Evidence tool overhaul: Major expansion of evidence_tool.py with richer functionality.
  • YAML config cleanup: Slimmed down researcher.yaml, reporter.yaml, searcher.yaml to reference external prompt files.
  • Benchmark tooling: Added run_benchmark.sh; enhanced dr_bench_runner.py.
  • README: Expanded both EN/ZH READMEs with usage and architecture docs.

Agent & Framework

  • In-process subagent: Refactored agent_tool.py to support running subagents in-process; fixed timeout issues.
  • Local code executor: Bug fix in local_code_executor.py.
  • TodoList / Filesystem tools: Minor enhancements and permission adjustments.

Tests

  • Added test_prompt_files.py for prompt file loading validation.
  • Extended test_server_tools_smoke.py with additional smoke tests.

Related issue number

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Run pre-commit install and pre-commit run --all-files before git commit, and passed lint check.
  • Documentation reflects the changes where applicable

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the Deep Research V2 framework by introducing a more flexible prompt management system, integrating advanced quality assurance mechanisms for generated reports, and enhancing the stability and execution of subagents. These changes aim to make the research process more autonomous, reliable, and easier to manage, ultimately leading to higher-quality research outputs.

Highlights

  • Externalized Prompt Management: System prompts for agents are now stored in external .txt or .md files, enhancing configurability and multi-language support. A new file_resolver.py dynamically loads prompts based on agent, language, and model family.
  • Enhanced Deep Research Workflow with Quality Checks: A new quality_checker.py introduces LLM-based auditing for reports, detecting issues like placeholder content or improper referencing. Both Reporter and Researcher agents now incorporate self-reflection callbacks to ensure report quality and existence before task completion.
  • Robust Subagent Execution: The agent_tool.py has been refactored to support running subagents in isolated processes, significantly improving timeout handling and overall stability for complex tasks.
  • Evidence Tool Overhaul: The evidence_tool.py now supports a new 'analysis' evidence type with full CRUD operations for structured interim summaries, comparisons, and decision records. The note structure was also simplified to a single content field.
  • Improved Documentation and Benchmarking: Extensive updates to both English and Chinese README.md files provide detailed configuration, usage, and troubleshooting guides. A new run_benchmark.sh script and enhancements to dr_bench_runner.py offer robust benchmarking capabilities.
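As a rough illustration of the externalized prompt lookup described in the first highlight, the sketch below resolves a prompt file from an agent name, a language, and a model family. The directory layout (`<root>/<agent>/<lang>/<family>.txt`), the `default.txt` fallback, and the function name `resolve_prompt` are assumptions made for this example; they are not taken from file_resolver.py.

```python
from pathlib import Path
from typing import Optional


def resolve_prompt(root: Path, agent: str, lang: str, family: str) -> Optional[str]:
    """Return the first matching prompt text, or None if nothing matches.

    Hypothetical layout: <root>/<agent>/<lang>/<family>.txt (or .md),
    with an assumed per-language default.txt as a fallback.
    """
    base = root / agent / lang
    candidates = [
        base / f'{family}.txt',
        base / f'{family}.md',
        base / 'default.txt',  # assumed fallback, not confirmed by the PR
    ]
    for path in candidates:
        if path.is_file():
            return path.read_text(encoding='utf-8')
    # Returning None lets the caller keep whatever prompt the YAML config holds.
    return None
```

Returning `None` rather than raising keeps a missing prompt file from blocking config loading, which matches the intent stated in the PR's own code comments.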

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Contributor, bot) left a comment

Code Review

The pull request introduces significant enhancements to the Agentic Insight V2 system, focusing on externalizing prompts, improving agent execution robustness with in-process subagents, and adding crucial self-reflection and quality check modules for both Researcher and Reporter agents. The overhaul of the evidence tool to include "analysis" records is a valuable addition for structuring intermediate reasoning. Documentation has been extensively updated in both English and Chinese, providing clearer guidance for configuration and troubleshooting. The benchmark tooling has also been improved with a more reliable task completion detection mechanism. Overall, these changes greatly improve the system's modularity, reliability, and user experience.

Comment on lines +104 to +106
    except Exception:
        # Never block config loading due to prompt resolving.
        pass

medium

Catching a broad Exception and then passing can hide potential issues during prompt file resolution. It's generally better to catch more specific exceptions or at least log the exception for debugging purposes, even if the intention is to not block config loading.
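A minimal sketch of what this suggestion could look like, assuming a module-level `logger`. The `apply_prompt_files` stub, the `load_config` wrapper, and the file path are hypothetical stand-ins so the example runs on its own; only the try/except shape mirrors the reviewed code.

```python
import logging

logger = logging.getLogger(__name__)


def apply_prompt_files(config: dict) -> dict:
    """Hypothetical stand-in for the PR's resolver; it always fails here."""
    raise FileNotFoundError('prompts/researcher/en/gpt5.txt')  # made-up path


def load_config(config: dict) -> dict:
    """Hypothetical wrapper showing the logging pattern only."""
    try:
        config = apply_prompt_files(config)
    except (OSError, ValueError) as e:
        # Expected failure modes: missing or unreadable file, bad reference.
        logger.warning('Prompt resolution failed: %s', e)
    except Exception:
        # Still never block config loading, but keep the traceback in the logs.
        logger.exception('Unexpected error while resolving prompt files')
    return config
```

Config loading still succeeds in both branches; the difference from the reviewed code is only that a failure leaves a trace in the logs.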

Comment on lines +220 to +222
    except Exception:
        # Be conservative: prompt loading must never break config loading.
        return config

medium

Similar to the issue in ms_agent/config/config.py, catching a broad Exception and then returning config can mask underlying problems during prompt file resolution. Consider logging the exception to aid debugging if unexpected issues arise.

Comment on lines +163 to +170
    except BaseException as exc:  # pragma: no cover
        result_queue.put({
            'ok': False,
            'error': str(exc),
            'traceback': traceback.format_exc(),
            'agent_tag': getattr(sub_agent, 'tag', None),
            'agent_type': getattr(sub_agent, 'AGENT_NAME', None),
        })

medium

While catching BaseException allows for handling asyncio.CancelledError, it is generally discouraged as it can also catch critical system-exiting exceptions like KeyboardInterrupt or SystemExit. It's usually safer to catch Exception for application-level errors and handle asyncio.CancelledError explicitly if needed, or re-raise BaseException types that are not Exception.
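One way to follow this advice is sketched below: `asyncio.CancelledError` is handled explicitly, ordinary errors go through `except Exception`, and `KeyboardInterrupt` or `SystemExit` are left to propagate. The helper name `run_and_report`, the two sample coroutines, and the use of a plain `queue.Queue` are assumptions for illustration; the PR's worker may be structured differently.

```python
import asyncio
import queue
import traceback


async def failing_task():
    raise ValueError('boom')


async def cancelled_task():
    raise asyncio.CancelledError()


def run_and_report(coro_factory, result_queue):
    """Run a coroutine and report failures without swallowing exit signals."""
    try:
        result = asyncio.run(coro_factory())
        result_queue.put({'ok': True, 'result': result})
    except asyncio.CancelledError:
        # Cancellation is reported explicitly instead of via BaseException.
        result_queue.put({
            'ok': False,
            'error': 'cancelled',
            'traceback': traceback.format_exc(),
        })
    except Exception as exc:
        # Application-level errors only.
        result_queue.put({
            'ok': False,
            'error': str(exc),
            'traceback': traceback.format_exc(),
        })
    # KeyboardInterrupt and SystemExit are not caught and propagate to the caller.
```

The trade-off is that a process killed by `SystemExit` no longer writes an error record to the queue, so the parent has to treat a missing result as a failure.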

Comment on lines 505 to +507
        }
        return {
            'code_executor': [
                t for t in tools['code_executor']
                if t['tool_name'] not in self.exclude_functions
            ]
        }

        return tools

medium

The removal of the exclusion logic for tools in _get_tools_inner means that all tools defined within LocalCodeExecutionTool will now be exposed. While this might be intended for higher-level filtering, it's important to ensure that no sensitive or unintended tools are exposed if the higher-level filtering is not robustly implemented.
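If filtering does move to a higher level, it could look roughly like the sketch below, which applies an exclusion set across every tool group rather than only `code_executor`. The function name `filter_tools` and the sample tool names are hypothetical; the dictionary shape (`tool_name` keys inside per-server lists) follows the snippet under review.

```python
def filter_tools(tools: dict, exclude_functions: set) -> dict:
    """Drop excluded tools from every server's tool list."""
    return {
        server: [
            t for t in tool_list
            if t['tool_name'] not in exclude_functions
        ]
        for server, tool_list in tools.items()
    }


# Usage with made-up tool names:
tools = {
    'code_executor': [
        {'tool_name': 'python_executor'},
        {'tool_name': 'shell_executor'},
    ]
}
visible = filter_tools(tools, {'shell_executor'})
```

A test that asserts the excluded names never appear in the exposed tool list would guard against the regression this comment describes.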

Comment on lines +417 to +418
'Which occurrence to replace (1-based). Default is 1 (first occurrence). '
'Use -1 to replace all occurrences.',

medium

Changing the default occurrence from -1 (all occurrences) to 1 (first occurrence) for replace_file_contents is a significant behavioral change. While making the default safer, it might break existing workflows that relied on replacing all occurrences by default. It's good that occurrence is now a required parameter, which forces explicit choice.
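The occurrence semantics in the parameter description (1-based index, `-1` for all) can be sketched as follows. The function `replace_occurrence` is a standalone illustration written for this note, not the implementation of replace_file_contents, and the error handling is an assumption.

```python
def replace_occurrence(text: str, old: str, new: str, occurrence: int) -> str:
    """Replace the Nth (1-based) non-overlapping match, or all when N is -1."""
    if not old:
        raise ValueError('old must be a non-empty string')
    if occurrence == -1:
        return text.replace(old, new)
    if occurrence < 1:
        raise ValueError('occurrence must be >= 1, or -1 for all')
    pos = 0
    idx = -1
    for _ in range(occurrence):
        idx = text.find(old, pos)
        if idx == -1:
            raise ValueError(f'occurrence {occurrence} not found')
        pos = idx + len(old)  # continue after this match (non-overlapping)
    return text[:idx] + new + text[pos:]
```

With `occurrence=1` on `'a-b-a-b'`, only the first `'a'` changes; callers that relied on the old replace-all default would need to pass `-1` explicitly.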

        config = apply_prompt_files(config)
    except Exception:
        # Never block config loading due to prompt resolving.
        pass
Collaborator

Or log it: logger.warning(f'Prompt resolution failed: {e}')

@alcholiclg merged commit de58fa5 into modelscope:main on Mar 23, 2026
1 of 2 checks passed