feat(agent): agent for article fact checking #348
Implement ArticleFactChecker using Agent-First architecture pattern with LangChain ReAct agent for autonomous claim extraction and verification.
Features include:
- Thread-safe context passing between eval() and aggregate_results()
- Dual-layer EvalDetail.reason: text summary + structured report dict
- Intermediate artifact saving (claims, verification details, report)
- Claims extraction from tool_calls and per-claim verification merging
- PromptTemplates with OUTPUT_FORMAT for structured agent responses
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add test data files for fact-checking scenarios:
- blog_article.md: tech blog about PaddleOCR-VL with institutional claims
- news_article_excerpt.md: news article excerpt for testing
- product_review_excerpt.md: product review with statistical claims
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive test coverage for ArticleFactChecker including:
- PromptTemplates validation and output format
- Claims extraction from tool_calls
- Per-claim verification merging
- Structured report generation
- Dual-layer EvalDetail.reason output
- File saving operations (article, claims, verification, report)
- News and product review article type tests
- Blog article real-world integration test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive test suites for agent tools:
- test_arxiv_search.py: ArxivSearchTool unit and integration tests
- test_claims_extractor.py: ClaimsExtractor with type filtering, dedup
- verify_setup.py: Environment verification script for agent setup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…fact_check
Remove 3 duplicate TestArxivSupport classes that incorrectly tested AgentFactCheck for arxiv_search support. AgentFactCheck only has tavily_search; arxiv_search is specific to ArticleFactChecker and is properly tested in test_article_fact_checker.py.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Demonstrate ArticleFactChecker usage with InputArgs + Executor pattern:
- JSONL temp file creation for article-level input
- Complete agent_config with claims_extractor, arxiv, tavily tools
- Dual-layer result display (text summary + structured report)
- Intermediate artifact output configuration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive documentation for article fact-checking:
- agent_architecture.md: Agent-First vs Custom architecture patterns
- article_fact_checking_guide.md: Complete usage guide with API reference
- quick_start_article_fact_checking.md: 5-minute quick start guide
- agent_development_guide.md: fix missing fields key in mix example
All docs use correct JSONL format and EvalPipline config structure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- _get_output_dir() now auto-generates outputs/article_factcheck_<ts>_<uuid>/ when no explicit output_path is configured, eliminating the need to manually specify artifact_output_path in examples and user configs
- Add save_artifacts=false opt-out to disable artifact saving entirely
- Add base_output_path config to override the auto-generate base directory
- Append uuid suffix to prevent timestamp collision in concurrent evaluations
- Fix agent_cfg None guard and empty base_output_path fallback
- Update example to remove manual path config and add try/finally cleanup
- Update docs to document all three output path options (priority order)
- Update tests: replace old None-when-unconfigured test with two new tests covering auto-generate and save_artifacts=false opt-out behaviors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
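For reviewers, the output-path behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the function name `resolve_output_dir`, the exact priority order (opt-out, then explicit path, then base directory), the timestamp format, and the 8-character uuid suffix are all assumptions; only the config keys `save_artifacts`, `artifact_output_path`, and `base_output_path` come from the commit message.

```python
import os
import time
import uuid
from typing import Optional


def resolve_output_dir(agent_cfg: Optional[dict]) -> Optional[str]:
    """Hypothetical sketch of a three-priority output path resolution."""
    cfg = agent_cfg or {}  # guard against agent_cfg being None

    # Opt-out: no artifacts are saved at all.
    if cfg.get("save_artifacts", True) is False:
        return None

    # Priority 1 (assumed): an explicitly configured path wins.
    explicit = cfg.get("artifact_output_path")
    if explicit:
        return explicit

    # Priority 2 (assumed): a custom base directory; an empty string
    # falls back to the default "outputs" base.
    base = cfg.get("base_output_path") or "outputs"

    # Priority 3: auto-generate a directory name. The uuid suffix keeps two
    # evaluations started within the same second from colliding.
    stamp = time.strftime("%Y%m%d_%H%M%S")
    suffix = uuid.uuid4().hex[:8]
    return os.path.join(base, f"article_factcheck_{stamp}_{suffix}")
```

The sketch only computes the path and leaves directory creation to the caller, which keeps it free of side effects.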
- Remove quick_start_article_fact_checking.md (redundant with article_fact_checking_guide.md Quick Start section)
- Trim agent_architecture.md from 1055 to 598 lines by removing Implementation Patterns, Configuration, and Examples sections (all fully covered in agent_development_guide.md)
- Update agent_development_guide.md: refresh _get_output_dir pattern to show new three-priority chain; update test count 82->88
- Fix 5 outdated references in article_fact_checking_guide.md from 'only when output_path is set' to reflect new auto-save default
- Stage dingo/model/llm/agent/__init__.py (previously uncommitted)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Trim module and class docstrings, removing content duplicated between the two
- Add _write_jsonl_file() helper to deduplicate identical JSONL save logic
- Replace manual dict-counting with collections.Counter
- Remove redundant hasattr/getattr double-check in _get_system_prompt()
- Replace decorative === section dividers with concise --- headers
- Extract intermediate variable in example's reason display
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
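As a small illustration of the Counter refactor mentioned above, the snippet below contrasts manual dict counting with `collections.Counter`. The verdict labels are hypothetical placeholders, not values taken from the PR.

```python
from collections import Counter

# Hypothetical per-claim verdicts, for illustration only.
verdicts = ["TRUE", "FALSE", "TRUE", "UNVERIFIABLE", "TRUE"]

# Manual counting: the pattern being replaced.
manual = {}
for verdict in verdicts:
    manual[verdict] = manual.get(verdict, 0) + 1

# Counter builds the same tallies in one call.
counted = Counter(verdicts)

# Both approaches agree; Counter also returns 0 for keys it has never seen.
same_tallies = dict(counted) == manual
```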
…syncio
Replace single-agent sequential path (~669s for 15 claims) with a
two-phase async architecture targeting 100-150s (4-6x speedup):
Phase 1: ClaimsExtractor direct call via run_in_executor (~30s)
Phase 2: asyncio.gather + Semaphore(max_concurrent=5) parallel
mini-agents, one per claim (~80-120s)
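The Phase 2 pattern named above (asyncio.gather bounded by a Semaphore) could look roughly like the sketch below. It is an assumption-laden illustration: `verify_claim`, `verify_all`, and the `stats` dict are hypothetical names, and the sleep stands in for a real per-claim mini-agent call.

```python
import asyncio
from typing import Dict, List

# Tracks how many verifications run at once, only to make the bound observable.
stats: Dict[str, int] = {"in_flight": 0, "peak": 0}


async def verify_claim(claim: str) -> dict:
    """Hypothetical stand-in for one per-claim mini-agent."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # placeholder for the real agent call
    stats["in_flight"] -= 1
    return {"claim": claim, "verdict": "UNVERIFIED"}


async def verify_all(claims: List[str], max_concurrent: int = 5) -> List[dict]:
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(claim: str) -> dict:
        # At most max_concurrent coroutines pass this point at a time.
        async with semaphore:
            return await verify_claim(claim)

    # gather returns results in the same order as the input claims.
    return await asyncio.gather(*(bounded(c) for c in claims))


results = asyncio.run(verify_all([f"claim {i}" for i in range(15)]))
```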
Changes:
- agent_wrapper: add async_invoke_and_format(); extract shared
_format_agent_result() and _make_error_result() helpers to
eliminate duplication between sync/async invoke paths
- agent_article_fact_checker: rewrite eval() with asyncio.run()
bridge and ThreadPoolExecutor fallback; add _async_eval(),
_async_extract_claims(), _async_verify_single_claim(), and
aggregation helpers; add PER_CLAIM_VERIFICATION_PROMPT and
max_concurrent_claims=5 config option
- Fix pre-existing NoneType bug in _build_eval_detail_from_verification
- Add test_async_article_fact_checker.py (16 tests, all passing)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
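One common way to bridge a synchronous eval() to async code, with a ThreadPoolExecutor fallback for callers that already run an event loop, is sketched below. This shows the general technique only and may differ from what the PR does; `eval_sync` and the trivial `_async_eval` body are hypothetical.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def _async_eval(text: str) -> str:
    """Hypothetical async evaluation; the real one would run both phases."""
    await asyncio.sleep(0)
    return f"checked: {text}"


def eval_sync(text: str) -> str:
    """Run the async path from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(_async_eval(text))

    # A loop is already running here and asyncio.run() would raise, so run
    # the coroutine in a worker thread that gets its own event loop.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(lambda: asyncio.run(_async_eval(text)))
        return future.result()
```

Note that the fallback blocks the calling thread until the worker finishes, so an outer event loop makes no progress in the meantime.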
Summary of Changes
This pull request introduces a robust ArticleFactChecker agent, significantly enhancing the platform's ability to perform autonomous, article-level fact-checking. The agent employs a novel two-phase asynchronous architecture, allowing for efficient parallel verification of claims extracted from long-form articles. It integrates new specialized tools, ClaimsExtractor and ArxivSearch, alongside existing web search capabilities, to systematically verify factual statements, institutional attributions, and other claim types. This feature provides a comprehensive, structured report of verification findings, including detailed evidence and sources, and saves intermediate artifacts for transparency and debugging.
Code Review
This is a substantial and well-executed feature addition, introducing the ArticleFactChecker agent. The new two-phase asynchronous architecture is a significant performance improvement, and the new tools (claims_extractor, arxiv_search) are well-designed with robust features like flexible parsing and rate limiting. The accompanying documentation is comprehensive and very helpful. I have a couple of suggestions for improvement regarding some leftover code and test coverage to further enhance the quality of this contribution.
Note: Security Review did not run due to the size of the PR.
/gemini review
Code Review
This pull request introduces a new ArticleFactChecker agent with a two-phase asynchronous architecture for parallel claim verification. The changes include the agent itself, new tools for arXiv search and claims extraction, extensive documentation, and a comprehensive test suite. The implementation demonstrates robust patterns for handling LLM interactions, including detailed prompts and fallback parsing logic. My review identifies one potential issue regarding state management in what appears to be a legacy code path.
Note: Security Review did not run due to the size of the PR.
No description provided.