Add format validation for WriteFile tool#1738
Add format validation for WriteFile tool#1738MaxwellGengYF wants to merge 2 commits intoMoonshotAI:mainfrom
Conversation
Format validation is triggered after a file is successfully written or appended. **The performance impact is negligible** — validation runs only on JSON, XML, and Markdown files after write completion, incurring minimal overhead from standard library parsers. If validation fails: - The operation is marked as unsuccessful (returns `ToolError`) - Error messages exceeding 65,536 characters are saved to temporary files to prevent display issues - Error messages include line and column numbers to facilitate debugging
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5bd4f923c3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if line.strip().startswith('```'): | ||
| continue |
There was a problem hiding this comment.
Skip fenced code contents in markdown validators
check_md is now wired into WriteFile, but the markdown checks only skip lines that start with ``` and still lint the lines inside fenced blocks. This makes valid markdown fail when code samples contain characters like # or `[` (for example `#include <...>`), so `WriteFile` returns a `ToolError` for otherwise valid `.md` files. Track fence state across lines (as `_check_code_fence_balance` already does) and bypass header/link/table checks while inside a fenced block.
Useful? React with 👍 / 👎.
| i = 0 | ||
| while i < len(line): | ||
| # Look for opening bracket (not escaped) | ||
| if line[i] == '[' and not _is_escaped(line, i): |
There was a problem hiding this comment.
Limit link-balance checks to actual markdown links
The validator treats every unescaped [ as the start of a markdown link and reports an error if no matching ] is found, but unmatched brackets are valid plain text in markdown. With this change, writing a .md file containing normal prose like array[index now fails format validation and surfaces as a WriteFile error. The scan should only enforce parenthesis/bracket balance for recognized link/image syntaxes rather than all bracket characters.
Useful? React with 👍 / 👎.
|
|
||
| if md_callback: | ||
| md_callback(content) | ||
| return errors |
There was a problem hiding this comment.
🟡 check_md and check_md_str return empty list [] instead of None on valid input
When markdown validation passes (no errors), _validate_markdown returns an empty list []. The if errors: guard at line 80 is falsy for [], so the early return is skipped, and return errors at line 85 returns [] instead of the documented None. The same issue exists in check_md_str at line 110. While the current caller in write.py:203 uses if fmt_error: (which treats [] as falsy), this violates the function's str | None return contract and will break any future caller using if result is None: or similar identity checks.
| return errors | |
| return None |
Was this helpful? React with 👍 or 👎 to provide feedback.
|
|
||
| if md_callback: | ||
| md_callback(content) | ||
| return errors |
There was a problem hiding this comment.
🟡 check_md_str returns empty list [] instead of None on valid input
Same issue as check_md at line 85: when markdown is valid, return errors returns [] instead of None, violating the documented str | None return type.
| return errors | |
| return None |
Was this helpful? React with 👍 or 👎 to provide feedback.
| def _check_header_syntax(lines: list[str]) -> list[str]: | ||
| """Check if headers have proper syntax.""" | ||
| errors: list[str] = [] | ||
|
|
||
| for line_idx, line in enumerate(lines, start=1): | ||
| stripped = line.lstrip() | ||
|
|
||
| # Check for header syntax | ||
| if stripped.startswith('#'): | ||
| # Count leading hashes | ||
| match = re.match(r'^(#{1,6})\s', stripped) | ||
| if not match: | ||
| # Check if it's more than 6 hashes (invalid header level) | ||
| hash_match = re.match(r'^(#{7,})\s', stripped) | ||
| if hash_match: | ||
| col = line.index('#') + 1 | ||
| errors.append( | ||
| f"Markdown validation error at line {line_idx}, column {col}: Invalid header level (max 6), found {len(hash_match.group(1))} # characters") | ||
| else: | ||
| # Header without space after hashes | ||
| hash_only_match = re.match(r'^(#{1,6})([^\s]|$)', stripped) | ||
| if hash_only_match: | ||
| col = line.index('#') + 1 | ||
| errors.append( | ||
| f"Markdown validation error at line {line_idx}, column {col}: Header must be followed by a space") | ||
|
|
||
| return errors |
There was a problem hiding this comment.
🔴 Markdown validators don't track code block state, causing false positive errors on valid files
The _check_header_syntax, _check_link_balance, and _check_table_syntax functions all iterate through every line of the markdown without tracking whether lines are inside a fenced code block. This means content inside code blocks (e.g., #comment in a Python block, data[0] split across lines, or piped shell output) is validated as if it were markdown prose. For instance, #comment inside a ```python block triggers "Header must be followed by a space", and a multi-line array index like items[\n 0\n] triggers "Unclosed link bracket". Since check_md is called from write.py:201 and any returned error causes a ToolError at write.py:205-208, valid .md files with code blocks will be reported as having format validation failures, potentially causing the LLM to attempt unnecessary and harmful "fixes". Note that _check_code_fence_balance already correctly tracks open/close fence state but this information is not shared with the other validators.
Reproduction: Python comment inside code block
A markdown file containing:
# Title
\`\`\`python
#comment
\`\`\`
_check_header_syntax flags line 4 (#comment) as "Header must be followed by a space" because it doesn't know it's inside a code fence.
Similarly, _check_link_balance at line 216 has a comment saying "Skip code blocks" but only skips the fence marker lines themselves (if line.strip().startswith('\``'): continue`), not lines inside the block.
Was this helpful? React with 👍 or 👎 to provide feedback.
| for line_idx, line in enumerate(lines, start=1): | ||
| # Skip code blocks | ||
| if line.strip().startswith('```'): | ||
| continue |
There was a problem hiding this comment.
🔴 _check_link_balance only skips code fence lines, not content inside code blocks
At check_fmt.py:216, the function skips lines matching line.strip().startswith('\``')(the fence markers themselves), but does not track whether it is currently inside a fenced code block. All lines *between* opening and closing fences are still checked for bracket balance. Any[character inside a code block (e.g., array indexingarr[i], or example link syntax) that isn't closed on the same line will be reported as "Unclosed link bracket". This produces false positive ToolErrors from the WriteFile tool for valid .md` files.
Prompt for agents
The _check_link_balance function in src/kimi_cli/tools/file/check_fmt.py at line 216 only skips lines that ARE code fence markers (starting with ```), but does not skip content lines that are inside a fenced code block. This causes false positives for bracket characters inside code blocks.
The fix should add an in_code_block state variable that toggles when a ``` fence line is encountered, and skip all link balance checking while in_code_block is True. This is the same pattern needed by _check_header_syntax and _check_table_syntax.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 122a6b94a3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| """ | ||
| try: | ||
| js = None | ||
| with open(file_path, 'r', encoding='utf-8') as f: |
There was a problem hiding this comment.
Validate files through KaosPath backend
check_json reads file_path with Python open(), but WriteFile writes via KaosPath (which can target non-local backends such as ACP/remote Kaos). In those sessions this validation step reads the local filesystem path instead of the file that was just written, so .json/.xml/.md writes can incorrectly fail (or validate the wrong file). Please run format validation through Kaos-backed reads so validation uses the same filesystem backend as the write.
Useful? React with 👍 / 👎.
| if stripped.startswith('|') and stripped.endswith('|'): | ||
| if not in_table: | ||
| in_table = True | ||
| table_start_line = line_idx |
There was a problem hiding this comment.
Avoid treating any pipe-delimited line as a table
This branch marks every line that starts and ends with | as a table row, then later errors if no separator row is found. In Markdown, a lone line like | not a table | is valid plain text unless it is followed by a table separator, so this logic rejects valid .md content and causes WriteFile to return a format error unnecessarily. Detect a table only after confirming a valid header+separator pattern.
Useful? React with 👍 / 👎.
#1736
Format validation is triggered after a file is successfully written or appended. The performance impact is negligible — validation runs only on JSON, XML, and Markdown files after write completion, incurring minimal overhead from standard library parsers.
If validation fails:
ToolError)