support auto llm review and fix test_api_connection #8
MrJs133 wants to merge 8 commits into hugegraph:main
Conversation
Walkthrough
This update introduces automated testing and review tooling for multiple LLM models, covering configuration parsing, automated invocation, and result review. The frontend Gradio interface gains an "Evaluation Model Settings" section that supports uploading a YAML or text configuration, entering an evaluation prompt and a standard answer, and running the LLM test and review in one click. A new review prompt and a resource configuration file are added.

Changes
Sequence Diagram(s)

sequenceDiagram
    participant User
    participant GradioUI as Gradio UI
    participant other_tool_utils
    participant LLMClients as LLM clients
    participant Reviewer as OpenAI review model
    User->>GradioUI: Enter LLM configs, prompt, standard answer; click "Run LLM Test"
    GradioUI->>other_tool_utils: Call auto_test_llms
    other_tool_utils->>LLMClients: Send prompt to each LLM, collect answer and latency
    other_tool_utils->>Reviewer: Call judge with review prompt, standard answer, LLM answers
    Reviewer-->>other_tool_utils: Return review results as JSON
    other_tool_utils-->>GradioUI: Aggregate answers, latencies, reviews into JSON output
    GradioUI-->>User: Display test and review results
Actionable comments posted: 4
🧹 Nitpick comments (3)
hugegraph-llm/src/hugegraph_llm/resources/demo/llm_review.yaml (1)
1-12: Fix the trailing whitespace in the YAML file
The YAML file has trailing spaces on several lines (3, 4, 9, 10), which can trigger warnings or errors in some YAML parsers. The file is also missing a trailing newline.

Please apply the cleaned-up content below (same fields, trailing spaces removed, newline at end of file):

type: openai
model_name: ernie-4.5-8k-preview
api_key:
api_base:
max_tokens: 2048

type: openai
model_name: gpt-4.1-mini
api_key:
api_base:
max_tokens: 4096

🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 3-3: trailing spaces
(trailing-spaces)
[error] 4-4: trailing spaces
(trailing-spaces)
[error] 9-9: trailing spaces
(trailing-spaces)
[error] 10-10: trailing spaces
(trailing-spaces)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py (2)
49-55: Avoid the nested with statements and the redundant getattr calls
The current code nests two with statements and calls getattr with constant attribute names, which is no safer than plain attribute access. Per the static-analysis hints, the structure can be simplified for readability:

-        with gr.Accordion("Evaluation Model Settings (only support openai)", open=True):
-            with gr.Row():
-                review_model_name = gr.Textbox(label="Model Name", value="ernie-4.5-8k-preview", interactive=True)
-                review_max_tokens = gr.Textbox(label="Max Tokens", value=2048)
-                key = gr.Textbox(value=getattr(llm_settings, "openai_chat_api_key"), label="API Key")
-                base = gr.Textbox(value=getattr(llm_settings, "openai_chat_api_base"),label="API Base")
+        with gr.Accordion("Evaluation Model Settings (only support openai)", open=True), gr.Row():
+            review_model_name = gr.Textbox(label="Model Name", value="ernie-4.5-8k-preview", interactive=True)
+            review_max_tokens = gr.Textbox(label="Max Tokens", value=2048)
+            key = gr.Textbox(value=llm_settings.openai_chat_api_key, label="API Key")
+            base = gr.Textbox(value=llm_settings.openai_chat_api_base, label="API Base")

🧰 Tools
🪛 Ruff (0.11.9)
49-50: Use a single with statement with multiple contexts instead of nested with statements (SIM117)
53-53: Do not call getattr with a constant attribute value. It is not any safer than normal property access. Replace getattr with attribute access (B009)
54-54: Do not call getattr with a constant attribute value. It is not any safer than normal property access. Replace getattr with attribute access (B009)
55-71: Remove the unused variables and simplify the nested structure
The code assigns variables that are never used (tab_upload_file and tab_upload_text) and over-nests with statements. These structures can be simplified to improve readability:

-        with gr.Row():
-            with gr.Column():
-                with gr.Tab("file") as tab_upload_file:  # pylint: disable=W0612
-                    inp1_file = gr.File(
-                        value=os.path.join(resource_path, "demo", "llm_review.yaml"),
-                        label="yaml file",
-                        file_count="single",
-                    )
-                with gr.Tab("text") as tab_upload_text:  # pylint: disable=W0612
-                    inp1 = gr.Textbox(
-                        value="openai, model_name, api_key, api_base, max_tokens\n" \
-                              "qianfan_wenxin, model_name, api_key, secret_key\n" \
-                              "ollama/local, model_name, host, port, max_tokens\n" \
-                              "litellm, model_name, api_key, api_base, max_tokens\n",
-                        label="LLMs Config (every line represents a different LLM)",
-                        show_copy_button=True, lines=6
-                    )
+        with gr.Row(), gr.Column():
+            with gr.Tab("file"):
+                inp1_file = gr.File(
+                    value=os.path.join(resource_path, "demo", "llm_review.yaml"),
+                    label="yaml file",
+                    file_count="single",
+                )
+            with gr.Tab("text"):
+                inp1 = gr.Textbox(
+                    value="openai, model_name, api_key, api_base, max_tokens\n" \
+                          "qianfan_wenxin, model_name, api_key, secret_key\n" \
+                          "ollama/local, model_name, host, port, max_tokens\n" \
+                          "litellm, model_name, api_key, api_base, max_tokens\n",
+                    label="LLMs Config (every line represents a different LLM)",
+                    show_copy_button=True, lines=6
+                )

🧰 Tools
🪛 Ruff (0.11.9)
55-56: Use a single with statement with multiple contexts instead of nested with statements. Combine with statements (SIM117)
57-57: Local variable tab_upload_file is assigned to but never used. Remove assignment to unused variable tab_upload_file (F841)
63-63: Local variable tab_upload_text is assigned to but never used. Remove assignment to unused variable tab_upload_text (F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
hugegraph-llm/src/hugegraph_llm/config/prompt_config.py (1 hunks)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/configs_block.py (1 hunks)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py (3 hunks)
hugegraph-llm/src/hugegraph_llm/resources/demo/llm_review.yaml (1 hunks)
hugegraph-llm/src/hugegraph_llm/utils/other_tool_utils.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
hugegraph-llm/src/hugegraph_llm/utils/other_tool_utils.py (5)
hugegraph-llm/src/hugegraph_llm/config/prompt_config.py (1)
PromptConfig (23-410)
hugegraph-llm/src/hugegraph_llm/models/llms/ollama.py (1)
OllamaClient (29-154)
hugegraph-llm/src/hugegraph_llm/models/llms/openai.py (1)
OpenAIClient (34-226)
hugegraph-llm/src/hugegraph_llm/models/llms/qianfan.py (1)
QianfanClient (29-118)
hugegraph-llm/src/hugegraph_llm/models/llms/litellm.py (1)
LiteLLMClient(34-191)
🔇 Additional comments (7)
hugegraph-llm/src/hugegraph_llm/config/prompt_config.py (1)
391-410: The new review template is well-structured and fits the requirements
The new review_prompt template is reasonably designed: it clearly defines the evaluation criteria (accuracy, relevance, completeness) and a 1-5 scoring scheme, and requires the result to be returned in JSON format.

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/configs_block.py (1)
222-222: API test message content changed
Changing the test message from "test" to "hello" is a small but reasonable change; it brings the test message closer to real conversational content and helps verify the API's actual response behavior.
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py (3)
18-18: Import statement update
The necessary modules are imported to support the new automated evaluation feature; the structure is clear.
Also applies to: 28-31
89-98: Commented-out code
The file contains commented-out tab-selection handling code. It appears intended to control inputs when switching between tabs, but is currently inactive.
Please confirm whether this commented-out code is planned for a future version or can be safely removed. If it is to stay, add a TODO comment describing the plan; if it is no longer needed, delete it entirely to keep the code clean.

73-81: LLM test feature is well implemented
The added LLM test feature is well designed; the UI elements are organized clearly, including prompt input, standard-answer input, and a result output area, and are correctly wired to the auto_test_llms function.

hugegraph-llm/src/hugegraph_llm/utils/other_tool_utils.py (2)
108-153: Heavy duplication between the YAML parser and the text parser
The two function bodies are nearly identical, which raises maintenance cost. Consider extracting a shared normalize_config(dict) helper: let the text and YAML parsers each produce a dict, then reuse the common normalization.
[ suggest_optional_refactor ]
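The shared-helper idea can be sketched as follows (an illustration only: normalize_config and REQUIRED_FIELDS are hypothetical names, and the required fields per type are inferred from the config lines shown in the UI, not from the actual implementation):

```python
# Required fields per LLM type, inferred from the UI's example config lines.
REQUIRED_FIELDS = {
    "openai": ["model_name", "api_key", "api_base", "max_tokens"],
    "qianfan_wenxin": ["model_name", "api_key", "secret_key"],
    "ollama/local": ["model_name", "host", "port"],
    "litellm": ["model_name", "api_key", "api_base", "max_tokens"],
}

def normalize_config(raw: dict) -> dict:
    """Validate one raw config dict and return it in canonical form.

    Both the text parser and the YAML parser can call this after producing
    a dict, so the validation logic lives in exactly one place.
    """
    llm_type = raw.get("type")
    if llm_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unsupported LLM type: {llm_type!r}")
    missing = [f for f in REQUIRED_FIELDS[llm_type] if f not in raw]
    if missing:
        raise ValueError(f"{llm_type}: missing fields {missing}")
    # Keep only the canonical fields, in a stable shape.
    return {"type": llm_type, **{f: raw[f] for f in REQUIRED_FIELDS[llm_type]}}
```

With this in place, the text parser only needs to zip field names onto the comma-separated values, and the YAML parser only needs to pass each mapping through.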
224-232: The review-result structure assumption is too strict
reviews_dict = {item["model"]: item for item in reviews} assumes every review item contains a "model" key; if the review prompt changes or the upstream format shifts slightly, this raises KeyError.
Consider using .get("model") and filtering out empty values, or aligning by id order instead.
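A defensive version of that mapping might look like this (a sketch; index_reviews is a hypothetical helper, and the exact shape of the judge output is an assumption):

```python
def index_reviews(reviews):
    """Map model name -> review item, tolerating malformed entries.

    Items that are not dicts, or that lack a truthy "model" key, are
    skipped instead of raising KeyError.
    """
    if isinstance(reviews, dict):
        return reviews          # already keyed by model
    if not isinstance(reviews, list):
        return {}               # unexpected shape: degrade gracefully
    return {
        item["model"]: item
        for item in reviews
        if isinstance(item, dict) and item.get("model")
    }
```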
def parse_llm_configurations(config_text: str):
    configs = []
    lines = config_text.strip().split("\n")
    for i, line in enumerate(lines, 1):
        fields = [x.strip() for x in line.split(",")]
        if not fields:
            continue
        llm_type = fields[0]
Skip blank / comment lines to avoid mis-parsing
line.split(",") on an empty line still yields [""] (length 1), so the line falls through into the per-type branches and raises an "insufficient fields" error.

- lines = config_text.strip().split("\n")
+ lines = [l for l in config_text.split("\n") if l.strip() and not l.strip().startswith("#")]

This tolerates blank lines and lines starting with # in the user's input.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  def parse_llm_configurations(config_text: str):
      configs = []
-     lines = config_text.strip().split("\n")
+     lines = [l for l in config_text.split("\n") if l.strip() and not l.strip().startswith("#")]
      for i, line in enumerate(lines, 1):
          fields = [x.strip() for x in line.split(",")]
          if not fields:
              continue
          llm_type = fields[0]
          # …
🤖 Prompt for AI Agents
In hugegraph-llm/src/hugegraph_llm/utils/other_tool_utils.py around lines 54 to
61, the current code does not properly skip empty lines or comment lines
starting with '#', causing incorrect parsing and potential "insufficient fields"
errors. Modify the loop to explicitly skip lines that are empty after stripping
or that start with '#' before splitting and processing them, ensuring these
lines do not cause parsing errors.
Pull Request Overview
This PR introduces utilities and UI components for automated LLM testing and review, plus updates the default API test prompt.
- Add other_tool_utils.py with functions to parse LLM configs, run prompts, and review outputs.
- Integrate the new LLM test/review flow into the RAG demo UI and provide a sample llm_review.yaml.
- Update the default test message in configs_block.py from "test" to "hello" and add a review_prompt template in PromptConfig.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/hugegraph_llm/utils/other_tool_utils.py | New utility functions for parsing configs, running tests, and reviewing LLM outputs. |
| src/hugegraph_llm/resources/demo/llm_review.yaml | Sample YAML configuration for demoing LLM review settings. |
| src/hugegraph_llm/demo/rag_demo/other_block.py | UI changes to incorporate auto LLM testing and review. |
| src/hugegraph_llm/demo/rag_demo/configs_block.py | Change default API connection test prompt from “test” to “hello”. |
| src/hugegraph_llm/config/prompt_config.py | Add review_prompt template for evaluating model responses. |
Comments suppressed due to low confidence (2)

hugegraph-llm/src/hugegraph_llm/utils/other_tool_utils.py:112
- [nitpick] The raised error message is in Chinese; consider using a consistent language (e.g., English) for all user-facing messages in this codebase.
  if not isinstance(raw_configs, list):

hugegraph-llm/src/hugegraph_llm/utils/other_tool_utils.py:54
- New parsing logic in this function should be covered by unit tests (including edge cases like malformed lines and missing fields) to ensure robustness.
  def parse_llm_configurations(config_text: str):
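To make the suggested edge cases concrete, here is a sketch of such tests against a simplified stand-in parser (the real tests would import the actual parse_llm_configurations; the stand-in below only mirrors the comma-separated format described in the UI and is not the project's code):

```python
# Simplified stand-in for parse_llm_configurations, just to make the
# edge cases concrete; real tests would import the actual function.
def parse_lines(config_text):
    configs = []
    for line in config_text.split("\n"):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # tolerate blank and comment lines
        fields = [x.strip() for x in line.split(",")]
        # Example constraint: an "openai" line needs 5 comma-separated fields.
        if fields[0] == "openai" and len(fields) != 5:
            raise ValueError(f"openai config needs 5 fields, got {len(fields)}")
        configs.append({"type": fields[0], "fields": fields[1:]})
    return configs

def test_blank_and_comment_lines_are_skipped():
    text = "\n# a comment\nopenai, m, k, b, 2048\n"
    assert len(parse_lines(text)) == 1

def test_missing_fields_raise():
    try:
        parse_lines("openai, m, k")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Run under pytest, these two cases already cover the blank-line and field-count failure modes discussed above.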
    lines = config_text.strip().split("\n")
    for i, line in enumerate(lines, 1):
        fields = [x.strip() for x in line.split(",")]
        if not fields:

The check if not fields: will never skip lines because fields is always a non-empty list; consider testing if not line.strip(): to properly skip blank or whitespace-only lines.

- if not fields:
+ if not line.strip():
    answers = {}
    for config in configs:
        output = None

LLM requests are made sequentially, which could increase total latency; consider running these calls concurrently (e.g., with asyncio or threads) to improve throughput. The suggested refactor starts with:

    async def generate_output(config):
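Because the client calls are blocking I/O, a thread pool is one low-friction way to overlap them without rewriting the clients as async (a sketch only; create_client stands in for the real per-type client construction and is an assumption, not existing code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_one(config, prompt, create_client):
    """Run a single LLM call and time it."""
    start = time.perf_counter()
    client = create_client(config)
    output = client.generate(prompt=prompt)
    latency = time.perf_counter() - start
    return config["model_name"], {"answer": output, "latency": f"{round(latency, 2)}s"}

def run_all(configs, prompt, create_client, max_workers=4):
    """Fan the blocking client calls out over a thread pool."""
    answers = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, c, prompt, create_client) for c in configs]
        for fut in futures:
            name, info = fut.result()  # re-raises any worker exception here
            answers[name] = info
    return answers
```

Total wall time then approaches the slowest single model rather than the sum of all latencies.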
@codecov-ai-reviewer review

On it! We are reviewing the PR and will provide feedback shortly.
PR Description
This pull request introduces an automated LLM testing and evaluation framework. The primary goal is to enable developers to easily test and compare the performance of different LLMs against a given prompt and standard answer, providing insights into their accuracy, relevance, and completeness.

Key Technical Changes
Architecture Decisions
Dependencies and Interactions
Risk Considerations
Notable Implementation Details
    data = {
        "model": arg3,
        "temperature": 0.01,
-       "messages": [{"role": "user", "content": "test"}],
+       "messages": [{"role": "user", "content": "hello"}],

Changing the test message from 'test' to 'hello' is a minor change. Consider adding a comment explaining why this change was made or using a constant/configuration value instead of hardcoding the message. This would make the purpose of the change clearer and make it easier to update in the future.

Suggested change:

        "messages": [{"role": "user", "content": DEFAULT_TEST_MESSAGE}],  # Using a configurable message
    api_key:
    api_base:
    max_tokens: 2048

The YAML file contains empty API keys and base URLs. Consider adding clear comments or documentation on how users should set these values or provide environment variable references instead of empty values.

Suggested change:

    api_key: ${OPENAI_API_KEY}    # Set this to your API key or use environment variable
    api_base: ${OPENAI_API_BASE}  # Set this to your API base URL or use environment variable
    max_tokens: 2048
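If the sample YAML adopts ${VAR} placeholders as suggested, Python's os.path.expandvars can substitute them after the YAML is loaded (a sketch under that assumption; note that expandvars leaves references to unset variables unchanged, so missing keys stay visible rather than becoming empty strings):

```python
import os

def expand_env(value):
    """Recursively expand ${VAR} placeholders in parsed YAML values.

    Strings are passed through os.path.expandvars; lists and dicts are
    walked recursively; other types (ints, None) are returned as-is.
    """
    if isinstance(value, str):
        return os.path.expandvars(value)
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    return value
```

Applying expand_env to the result of yaml.safe_load would let users keep secrets out of the checked-in sample file.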
    return configs


def auto_test_llms(
    llm_configs,
    llm_configs_file,
    prompt,
    standard_answer,
    review_model_name,
    review_max_tokens,
    key,
    base,
    fmt=True
):
    configs = None
    if llm_configs_file and llm_configs:
        raise gr.Error("Please only choose one between file and text.")
    if llm_configs:
        configs = parse_llm_configurations(llm_configs)
    elif llm_configs_file:
        configs = parse_llm_configurations_from_yaml(llm_configs_file)
    log.debug("LLM_configs: %s", configs)
    answers = {}
    for config in configs:
        output = None
        time_start = time.perf_counter()
        if config["type"] == "openai":
            client = OpenAIClient(
                api_key=config["api_key"],
In the auto_test_llms function, there's no validation of the input parameters before using them. Consider adding validation to prevent potential errors.

Suggested change:

def auto_test_llms(
    llm_configs,
    llm_configs_file,
    prompt,
    standard_answer,
    review_model_name,
    review_max_tokens,
    key,
    base,
    fmt=True
):
    # Validate required inputs
    if not prompt:
        raise ValueError("Prompt cannot be empty")
    if not standard_answer:
        raise ValueError("Standard answer cannot be empty")
    if not review_model_name:
        raise ValueError("Review model name cannot be empty")
    if not key or not base:
        raise ValueError("API key and base URL are required for reviews")
    # Check that we have at least one config source
    if not llm_configs and not llm_configs_file:
        raise ValueError("Either text configuration or configuration file must be provided")
    configs = None
    if llm_configs_file and llm_configs:
        raise ValueError("Please only choose one between file and text.")
    if llm_configs:
        configs = parse_llm_configurations(llm_configs)
    elif llm_configs_file:
        configs = parse_llm_configurations_from_yaml(llm_configs_file)
    if not configs or len(configs) == 0:
        raise ValueError("No valid LLM configurations found")
    log.debug("LLM_configs: %s", configs)
    # Rest of the function...
                api_key=config["api_key"],
                api_base=config["api_base"],
                model_name=config["model_name"],
                max_tokens=config["max_tokens"],
            )
            output = client.generate(prompt=prompt)
        elif config["type"] == "qianfan_wenxin":
            client = QianfanClient(
                model_name=config["model_name"],
                api_key=config["api_key"],
                secret_key=config["secret_key"]
            )
            output = client.generate(prompt=prompt)
        elif config["type"] == "ollama/local":
            client = OllamaClient(
                model_name=config["model_name"],
                host=config["host"],
                port=config["port"],
            )
            output = client.generate(prompt=prompt)
        elif config["type"] == "litellm":
            client = LiteLLMClient(
                api_key=config["api_key"],
                api_base=config["api_base"],
                model_name=config["model_name"],
                max_tokens=config["max_tokens"],
            )
            output = client.generate(prompt=prompt)
        time_end = time.perf_counter()
        latency = time_end - time_start
        answers[config["model_name"]] = {

The function auto_test_llms has a lot of duplicated code for creating different LLM clients. Consider refactoring this into a factory pattern for better maintainability.
Suggested change:

def create_llm_client(config):
    """Factory function to create appropriate LLM client based on configuration"""
    if config["type"] == "openai":
        return OpenAIClient(
            api_key=config["api_key"],
            api_base=config["api_base"],
            model_name=config["model_name"],
            max_tokens=config["max_tokens"],
        )
    elif config["type"] == "qianfan_wenxin":
        return QianfanClient(
            model_name=config["model_name"],
            api_key=config["api_key"],
            secret_key=config["secret_key"]
        )
    elif config["type"] == "ollama/local":
        return OllamaClient(
            model_name=config["model_name"],
            host=config["host"],
            port=config["port"],
        )
    elif config["type"] == "litellm":
        return LiteLLMClient(
            api_key=config["api_key"],
            api_base=config["api_base"],
            model_name=config["model_name"],
            max_tokens=config["max_tokens"],
        )
    else:
        raise ValueError(f"Unsupported LLM type: {config['type']}")

# Then in auto_test_llms:
    answers = {}
    for config in configs:
        output = None
        time_start = time.perf_counter()
        try:
            client = create_llm_client(config)
            output = client.generate(prompt=prompt)
        except Exception as e:
            log.error(f"Error with {config['model_name']}: {str(e)}")
            output = f"Error: {str(e)}"
        time_end = time.perf_counter()
        latency = time_end - time_start
        answers[config["model_name"]] = {
            "answer": output,
            "latency": f"{round(latency, 2)}s"
        }
                api_key=config["api_key"],
                api_base=config["api_base"],
                model_name=config["model_name"],
                max_tokens=config["max_tokens"],
            )
            output = client.generate(prompt=prompt)
        elif config["type"] == "qianfan_wenxin":
            client = QianfanClient(
                model_name=config["model_name"],
                api_key=config["api_key"],
                secret_key=config["secret_key"]
            )
            output = client.generate(prompt=prompt)
        elif config["type"] == "ollama/local":
            client = OllamaClient(
                model_name=config["model_name"],
                host=config["host"],
                port=config["port"],
            )
            output = client.generate(prompt=prompt)
        elif config["type"] == "litellm":
            client = LiteLLMClient(
                api_key=config["api_key"],
                api_base=config["api_base"],
                model_name=config["model_name"],
                max_tokens=config["max_tokens"],
            )
            output = client.generate(prompt=prompt)
        time_end = time.perf_counter()
        latency = time_end - time_start
        answers[config["model_name"]] = {

The code in auto_test_llms doesn't handle any exceptions during LLM requests. If an LLM fails to respond, the entire function will fail. Consider adding proper error handling for each LLM call.
Suggested change:

    for config in configs:
        output = None
        time_start = time.perf_counter()
        try:
            if config["type"] == "openai":
                client = OpenAIClient(
                    api_key=config["api_key"],
                    api_base=config["api_base"],
                    model_name=config["model_name"],
                    max_tokens=config["max_tokens"],
                )
                output = client.generate(prompt=prompt)
            # ... other client types ...
        except Exception as e:  # Handle exceptions for each LLM call
            log.error(f"Error with {config['model_name']}: {str(e)}")
            output = f"Error generating response: {str(e)}"
        finally:
            time_end = time.perf_counter()
            latency = time_end - time_start
            answers[config["model_name"]] = {
                "answer": output or "No response received",
                "latency": f"{round(latency, 2)}s"
            }
        latency = time_end - time_start
        answers[config["model_name"]] = {
            "answer": output,
            "latency": f"{round(latency, 2)}s"
        }
    reviews = judge(
        {k: v["answer"] for k, v in answers.items()},
        standard_answer,
        review_model_name,
        review_max_tokens,
        key,
        base
    )
    log.debug("reviews: %s", reviews)
    result = {}
    reviews_dict = {item["model"]: item for item in reviews} if isinstance(reviews, list) else reviews
    for model_name, infos in answers.items():
        result[model_name] = {
            "answer": infos["answer"],
            "latency": infos["latency"],
            "review": reviews_dict.get(model_name, {})
        }
    return json.dumps(result, indent=4, ensure_ascii=False) if fmt else reviews

The function is not validating that the returned reviews match the expected format before trying to transform them. This could lead to errors if the review model doesn't return data in the expected format.
Suggested change:

    reviews = judge(
        {k: v["answer"] for k, v in answers.items()},
        standard_answer,
        review_model_name,
        review_max_tokens,
        key,
        base
    )
    log.debug("reviews: %s", reviews)
    # Validate reviews format
    if isinstance(reviews, dict) and "error" in reviews:
        # Handle error case
        result = {}
        for model_name, infos in answers.items():
            result[model_name] = {
                "answer": infos["answer"],
                "latency": infos["latency"],
                "review": {"error": reviews["error"]}
            }
        return json.dumps(result, indent=4, ensure_ascii=False) if fmt else reviews
    # Process valid reviews
    result = {}
    reviews_dict = {item["model"]: item for item in reviews} if isinstance(reviews, list) else reviews
    for model_name, infos in answers.items():
        result[model_name] = {
            "answer": infos["answer"],
            "latency": infos["latency"],
            "review": reviews_dict.get(model_name, {})
        }
Actionable comments posted: 1
🧹 Nitpick comments (3)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py (3)
48-55: Combine the nested with statements
The code uses nested with statements that can be collapsed into a single with statement with multiple contexts. In addition, getattr is called with constant attribute names and can be replaced with direct attribute access.

-        # auto test llm
-        with gr.Accordion("Evaluation Model Settings (only support openai)", open=True):
-            with gr.Row():
-                review_model_name = gr.Textbox(label="Model Name", value="ernie-4.5-8k-preview", interactive=True)
-                review_max_tokens = gr.Textbox(label="Max Tokens", value=2048)
-                key = gr.Textbox(value=getattr(llm_settings, "openai_chat_api_key"), label="API Key")
-                base = gr.Textbox(value=getattr(llm_settings, "openai_chat_api_base"),label="API Base")
+        # auto test llm
+        with gr.Accordion("Evaluation Model Settings (only support openai)", open=True), gr.Row():
+            review_model_name = gr.Textbox(label="Model Name", value="ernie-4.5-8k-preview", interactive=True)
+            review_max_tokens = gr.Textbox(label="Max Tokens", value=2048)
+            key = gr.Textbox(value=llm_settings.openai_chat_api_key, label="API Key")
+            base = gr.Textbox(value=llm_settings.openai_chat_api_base, label="API Base")

🧰 Tools
🪛 Ruff (0.11.9)
49-50: Use a single with statement with multiple contexts instead of nested with statements (SIM117)
53-53: Do not call getattr with a constant attribute value. It is not any safer than normal property access. Replace getattr with attribute access (B009)
54-54: Do not call getattr with a constant attribute value. It is not any safer than normal property access. Replace getattr with attribute access (B009)
56-71: Flatten the nesting and remove unused variables
The implementation has nested with statements that can be combined, and tab_upload_file and tab_upload_text are assigned but never used.

-        with gr.Row():
-            with gr.Column():
-                with gr.Tab("file") as tab_upload_file:  # pylint: disable=W0612
-                    inp1_file = gr.File(
-                        value=os.path.join(resource_path, "demo", "llm_review.yaml"),
-                        label="yaml file",
-                        file_count="single",
-                    )
-                with gr.Tab("text") as tab_upload_text:  # pylint: disable=W0612
-                    inp1 = gr.Textbox(
-                        value="openai, model_name, api_key, api_base, max_tokens\n" \
-                              "qianfan_wenxin, model_name, api_key, secret_key\n" \
-                              "ollama/local, model_name, host, port, max_tokens\n" \
-                              "litellm, model_name, api_key, api_base, max_tokens\n",
-                        label="LLMs Config (every line represents a different LLM)",
-                        show_copy_button=True, lines=6
-                    )
+        with gr.Row(), gr.Column():
+            with gr.Tab("file"):  # unused variable removed
+                inp1_file = gr.File(
+                    value=os.path.join(resource_path, "demo", "llm_review.yaml"),
+                    label="yaml file",
+                    file_count="single",
+                )
+            with gr.Tab("text"):  # unused variable removed
+                inp1 = gr.Textbox(
+                    value="openai, model_name, api_key, api_base, max_tokens\n" \
+                          "qianfan_wenxin, model_name, api_key, secret_key\n" \
+                          "ollama/local, model_name, host, port, max_tokens\n" \
+                          "litellm, model_name, api_key, api_base, max_tokens\n",
+                    label="LLMs Config (every line represents a different LLM)",
+                    show_copy_button=True, lines=6
+                )

🧰 Tools
🪛 Ruff (0.11.9)
57-57: Local variable tab_upload_file is assigned to but never used. Remove assignment to unused variable tab_upload_file (F841)
63-63: Local variable tab_upload_text is assigned to but never used. Remove assignment to unused variable tab_upload_text (F841)
48-81: Consider adding error handling and help text
The new LLM test feature lacks error handling and user guidance. Suggested improvements:
- Add error handling, in particular friendly messages when the API key is invalid or the network connection fails
- Provide brief usage notes or an example-configuration description
- Consider adding a link or button that points users to more detailed documentation
🧰 Tools
🪛 Ruff (0.11.9)
49-50: Use a single with statement with multiple contexts instead of nested with statements (SIM117)
53-53: Do not call getattr with a constant attribute value. It is not any safer than normal property access. Replace getattr with attribute access (B009)
54-54: Do not call getattr with a constant attribute value. It is not any safer than normal property access. Replace getattr with attribute access (B009)
55-56: Use a single with statement with multiple contexts instead of nested with statements. Combine with statements (SIM117)
57-57: Local variable tab_upload_file is assigned to but never used. Remove assignment to unused variable tab_upload_file (F841)
63-63: Local variable tab_upload_text is assigned to but never used. Remove assignment to unused variable tab_upload_text (F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py (3 hunks)
- hugegraph-llm/src/hugegraph_llm/utils/other_tool_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- hugegraph-llm/src/hugegraph_llm/utils/other_tool_utils.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py (2)
hugegraph-llm/src/hugegraph_llm/utils/other_tool_utils.py (1)
- `auto_test_llms` (156-236)

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/vector_graph_block.py (1)
- `timely_update_vid_embedding` (157-187)
🪛 Ruff (0.11.9)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py
49-50: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
53-53: Do not call getattr with a constant attribute value. It is not any safer than normal property access.
Replace getattr with attribute access
(B009)
54-54: Do not call getattr with a constant attribute value. It is not any safer than normal property access.
Replace getattr with attribute access
(B009)
55-56: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
57-57: Local variable tab_upload_file is assigned to but never used
Remove assignment to unused variable tab_upload_file
(F841)
63-63: Local variable tab_upload_text is assigned to but never used
Remove assignment to unused variable tab_upload_text
(F841)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
🔇 Additional comments (5)
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py (5)
18-18: Add `os` module import. The `os` module import was added for later access to file paths; this is a necessary and logical addition.

28-28: Introduce the `auto_test_llms` function. `auto_test_llms` is imported from `other_tool_utils`, which matches the new automatic LLM evaluation feature.

31-31: Add configuration-related imports. Imports of `llm_settings` and `resource_path` were added to access the LLM configuration and the resource file path.

72-75: Input fields are set up appropriately. The prompt and standard-answer inputs added for the LLM test are reasonable, use sensible defaults, and enable the copy button.

76-81: LLM test button and function call. A "Run LLM Test" button was added and correctly wired to `auto_test_llms` with all required arguments; the implementation meets the feature's needs.
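For context on the wiring above, the JSON the button ultimately renders plausibly merges per-model answers and latencies with the judge's verdict. The structure below is an assumption based on the PR walkthrough, not code taken from `other_tool_utils`:

```python
import json
import time


def time_llm_call(chat_fn, prompt: str):
    """Invoke one LLM client callable and record wall-clock latency in seconds."""
    start = time.perf_counter()
    answer = chat_fn(prompt)
    return answer, round(time.perf_counter() - start, 3)


def summarize(results: dict, judge: dict) -> str:
    """Merge per-model answers/latency with the judge verdict into display JSON."""
    return json.dumps({"models": results, "review": judge}, indent=2, ensure_ascii=False)


# Hypothetical usage: a fake client stands in for a real OpenAI/Ollama/LiteLLM call.
answer, latency = time_llm_call(lambda p: f"echo: {p}", "hello, how are you?")
print(summarize(
    {"gpt-4.1-mini": {"answer": answer, "latency_s": latency}},
    {"gpt-4.1-mini": {"score": 9}},
))
```

A `gr.Code(language="json")` component, as used in the PR, can display such a string directly.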
```python
        # auto test llm
        with gr.Accordion("Evaluation Model Settings (only support openai)", open=True):
            with gr.Row():
                review_model_name = gr.Textbox(label="Model Name", value="ernie-4.5-8k-preview", interactive=True)
                review_max_tokens = gr.Textbox(label="Max Tokens", value=2048)
                key = gr.Textbox(value=getattr(llm_settings, "openai_chat_api_key"), label="API Key")
                base = gr.Textbox(value=getattr(llm_settings, "openai_chat_api_base"), label="API Base")
            with gr.Row():
                with gr.Column():
                    with gr.Tab("file") as tab_upload_file:  # pylint: disable=W0612
                        inp1_file = gr.File(
                            value=os.path.join(resource_path, "demo", "llm_review.yaml"),
                            label="yaml file",
                            file_count="single",
                        )
                    with gr.Tab("text") as tab_upload_text:  # pylint: disable=W0612
                        inp1 = gr.Textbox(
                            value="openai, model_name, api_key, api_base, max_tokens\n"
                                  "qianfan_wenxin, model_name, api_key, secret_key\n"
                                  "ollama/local, model_name, host, port, max_tokens\n"
                                  "litellm, model_name, api_key, api_base, max_tokens\n",
                            label="LLMs Config (every line represents a different LLM)",
                            show_copy_button=True, lines=6
                        )
            with gr.Row():
                inp2 = gr.Textbox(value="hello, how are you?", label="Prompt", show_copy_button=True, lines=8)
                inp3 = gr.Textbox(value="I am fine, thank you", label="Standard Answer", show_copy_button=True, lines=8)
            out = gr.Code(label="Output", language="json", elem_classes="code-container-show")
            btn = gr.Button("Run LLM Test")
            btn.click(  # pylint: disable=no-member
                fn=auto_test_llms,
                inputs=[inp1, inp1_file, inp2, inp3, review_model_name, review_max_tokens, key, base],
                outputs=out
            )
```
💡 Verification agent
🧩 Analysis chain
Verify that the tab-switching functionality is complete

A tab-switching UI was added, but there is no callback responding to tab selection. If users switch between the two tabs, the data may become inconsistent. Please verify that the behaviour in this case matches expectations.
🏁 Script executed:
```shell
#!/bin/bash
# Check whether a tab-selection implementation or related feature already exists
grep -r "on_tab_select\|tab.*select\|select.*tab" --include="*.py" .
```

Length of output: 565
Add tab-switch callbacks to keep data consistent

The "file"/"text" tabs added in hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py (around lines 48-81) have no switch callbacks bound. Consider following the `on_tab_select` pattern already implemented in vector_graph_block.py:

- File: hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py
  Location: around lines 48-81
  Problem: `tab_upload_file` and `tab_upload_text` never call `.select(...)`, so the two input components may not stay in sync when switching tabs.
- Example (from vector_graph_block.py):

```python
def on_tab_select(input_f, input_t, evt: gr.SelectData):
    # ...return the corresponding input based on evt.index...
    return input_f, input_t

tab_upload_file.select(
    fn=on_tab_select,
    inputs=[input_file, input_text],
    outputs=[input_file, input_text]
)
tab_upload_text.select(
    fn=on_tab_select,
    inputs=[input_file, input_text],
    outputs=[input_file, input_text]
)
```

Please add similar `select` callbacks for `tab_upload_file` and `tab_upload_text` in other_block.py so that both components' values switch correctly when the active tab changes.
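The value-swapping behaviour described here can be exercised without a running Gradio app; the sketch below mimics the callback's decision logic, under the assumption that the selection index follows `gr.SelectData.index` with 0 for the file tab and 1 for the text tab:

```python
def select_active_input(file_value, text_value, selected_index: int):
    """Keep only the input belonging to the selected tab, clearing the other.

    selected_index mimics gr.SelectData.index: 0 selects the file tab,
    1 selects the text tab. Clearing the inactive input prevents a stale
    value from being sent to the test runner.
    """
    if selected_index == 0:
        return file_value, ""      # file tab active: drop stale text config
    return None, text_value        # text tab active: drop stale file path


print(select_active_input("llm_review.yaml", "openai, gpt-4.1-mini, ...", 1))
```

Inside Gradio, the same function body would run in the `fn=` of both `.select(...)` bindings, reading the index from the `gr.SelectData` event argument.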
🤖 Prompt for AI Agents
In hugegraph-llm/src/hugegraph_llm/demo/rag_demo/other_block.py around lines 48
to 81, the Tab components tab_upload_file and tab_upload_text lack select event
callbacks to handle tab switching. To fix this, define a callback function
similar to on_tab_select that synchronizes the values of the file and text
inputs based on the selected tab, then bind this function to both
tab_upload_file.select and tab_upload_text.select with appropriate inputs and
outputs to keep the input components consistent when switching tabs.
Force-pushed from 6a76d6c to fe94881
Summary by CodeRabbit
New Features
Optimizations