feat(logging): add diagnostic logging at key error paths and bundle logs in export#1756
feat(logging): add diagnostic logging at key error paths and bundle logs in export#1756
Conversation
…rt bundling - Added logging for various error scenarios in tools, including file read/write, glob, and shell command execution. - Implemented logging for HTTP errors and timeouts in web fetch and search tools. - Introduced a new test suite to validate logging behavior for diagnostic purposes, ensuring key error paths are logged appropriately. - Updated the export functionality to include recent diagnostic logs and added warnings for potential sensitive information.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c9a04f8bfa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/kimi_cli/cli/export.py
Outdated
| for log_path in log_files: | ||
| zf.write(log_path, arcname=f"logs/{log_path.name}") |
There was a problem hiding this comment.
Tolerate log rotation while adding files to export ZIP
The new log bundling loop assumes every path in log_files still exists when zf.write runs, but these files live in the global rotating log directory and can be renamed/deleted by rotation/retention between collection and archive creation. In that case zipfile raises FileNotFoundError/OSError and aborts the whole export, even though the session data itself is still exportable. Please guard each log write with error handling (skip missing logs) so kimi export remains reliable during concurrent log rotation.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR adds broader diagnostic logging across error paths and enhances kimi export to bundle recent global log files into the exported session ZIP, so exported artifacts contain enough context to debug failures.
Changes:
- Added
logger.warning/logger.errorat multiple previously-silent failure points (tools, soul loop, hooks, LLM creation, subagent runner). - Updated
kimi exportto include recent global log files in the ZIP (with a privacy notice). - Added focused tests for log-file collection and for selected new logging call sites; updated EN/ZH docs to mention diagnostic logs.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/core/test_session_logging.py | Adds tests for export log collection and for representative new logging call sites. |
| src/kimi_cli/cli/export.py | Collects and zips recent global logs alongside session files; adds privacy notice. |
| src/kimi_cli/tools/web/search.py | Logs HTTP/parse/timeout/network errors for SearchWeb. |
| src/kimi_cli/tools/web/fetch.py | Logs HTTP/timeout/network errors for FetchURL (local + service paths). |
| src/kimi_cli/tools/shell/init.py | Logs exceptions for foreground shell execution and background task start failures. |
| src/kimi_cli/tools/file/read.py | Logs unexpected ReadFile failures before returning ToolError. |
| src/kimi_cli/tools/file/write.py | Logs unexpected WriteFile failures before returning ToolError. |
| src/kimi_cli/tools/file/replace.py | Logs unexpected StrReplaceFile failures before returning ToolError. |
| src/kimi_cli/tools/file/read_media.py | Logs unexpected ReadMediaFile failures before returning ToolError. |
| src/kimi_cli/tools/file/glob.py | Logs unexpected Glob failures before returning ToolError. |
| src/kimi_cli/tools/file/grep_local.py | Logs unexpected Grep failures before returning ToolError. |
| src/kimi_cli/subagents/runner.py | Adds warning logs for common subagent failure classes (max steps / provider errors). |
| src/kimi_cli/soul/toolset.py | Logs tool JSON parse failures, tool execution failures, tool-load failures, and MCP tool error/timeout paths. |
| src/kimi_cli/soul/kimisoul.py | Adds error/warning logs for compaction failures, step failures, connection recovery exhaustion, and retry attempts. |
| src/kimi_cli/llm.py | Logs a warning when LLM creation fails due to missing base_url/model. |
| src/kimi_cli/hooks/engine.py | Promotes some hook failure paths to warning logs and logs hook “block” events. |
| docs/en/reference/kimi-command.md | Updates kimi export docs to mention diagnostic logs. |
| docs/zh/reference/kimi-command.md | Updates kimi export docs to mention diagnostic logs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/kimi_cli/cli/export.py
Outdated
| result.append(f) | ||
|
|
||
| result.sort(key=lambda p: p.stat().st_mtime) | ||
| return result |
There was a problem hiding this comment.
_collect_recent_log_files() sorts using p.stat() again, which can raise (e.g., file rotated/deleted between the earlier stat() and the sort) and abort the entire export. Consider caching mtimes alongside paths and sorting on the cached value, or handling OSError during the sort to make export best-effort.
src/kimi_cli/soul/toolset.py
Outdated
| except ImportError: | ||
| logger.warning("Tool module import failed: {module_name}", module_name=module_name) |
There was a problem hiding this comment.
The except ImportError: path logs only the module name, but not the actual exception, which makes diagnosing missing deps / import-time errors harder. Capture the exception (except ImportError as e) and include it in the log (or use logger.exception) so diagnostics contain the root cause.
| except ImportError: | |
| logger.warning("Tool module import failed: {module_name}", module_name=module_name) | |
| except ImportError as e: | |
| logger.warning( | |
| "Tool module import failed: {module_name}: {error}", | |
| module_name=module_name, | |
| error=str(e), | |
| ) |
| if response.status >= 400: | ||
| logger.warning( | ||
| "FetchURL HTTP error: status={status}, url={url}", | ||
| status=response.status, | ||
| url=params.url, | ||
| ) |
There was a problem hiding this comment.
These warnings log the full params.url. URLs can embed credentials/tokens (userinfo or query params), and these logs are now bundled into kimi export. Consider redacting sensitive URL parts (e.g., strip userinfo and query string, or redact known secret keys) before logging.
| except Exception as e: | ||
| logger.error( | ||
| "Shell command execution failed: {command}: {error}", | ||
| command=params.command, | ||
| error=e, | ||
| ) |
There was a problem hiding this comment.
Logging the full shell command can leak secrets (tokens/passwords embedded in CLI args), and these logs are now bundled by kimi export. Consider logging a redacted/normalized command (or only the executable + a hint) while keeping the full command in the approval display/output where appropriate.
| @@ -712,6 +721,12 @@ async def _agent_loop(self) -> TurnOutcome: | |||
| back_to_the_future = e | |||
| except Exception as e: | |||
| # any other exception should interrupt the step | |||
| logger.error( | |||
| "Agent step {step_no} failed: {error_type}: {error}", | |||
| step_no=step_no, | |||
| error_type=type(e).__name__, | |||
| error=e, | |||
| ) | |||
There was a problem hiding this comment.
These exception handlers log only type/str via logger.error, which drops the traceback. Since this is a key diagnostic path (and logs are now exported for debugging), consider using logger.exception(...) (or equivalent) to include the stack trace for compaction/step failures.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fcc56c89a1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/kimi_cli/cli/export.py
Outdated
| ): | ||
| result.append(f) | ||
|
|
||
| result.sort(key=lambda p: p.stat().st_mtime) |
There was a problem hiding this comment.
Handle missing log files when sorting by mtime
The new export path can still fail under concurrent log rotation before ZIP creation: _collect_recent_log_files already handles OSError during the first stat, but result.sort(key=lambda p: p.stat().st_mtime) does a second uncaught stat. If retention/rotation removes or renames a log between collection and sort, FileNotFoundError bubbles up and aborts kimi export. Please sort using the already-read mtimes (or catch OSError in the sort key path) so export remains robust while logs rotate.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f07c30a3a5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for file_path in files: | ||
| zf.write(file_path, arcname=file_path.name) | ||
| with contextlib.suppress(OSError): | ||
| zf.write(file_path, arcname=str(file_path.relative_to(session_dir))) |
There was a problem hiding this comment.
Fail export when session files cannot be archived
Wrapping session-file writes in contextlib.suppress(OSError) makes kimi export succeed even when core artifacts (for example context.jsonl or wire.jsonl) cannot be read due to transient deletion, permission changes, or I/O errors. That silently produces incomplete archives with no signal to the user, which undermines the purpose of exporting a session for debugging. The suppression should be limited to optional global logs (or at least surfaced as an explicit warning/error) rather than required session data.
Useful? React with 👍 / 👎.
| if total_size + size > _MAX_LOG_BYTES: | ||
| break |
There was a problem hiding this comment.
Continue past oversized logs when enforcing size cap
The size-cap loop breaks on the first file that would exceed _MAX_LOG_BYTES, so one very large high-priority log can cause all subsequent smaller logs to be skipped even when they would fit within the cap. In practice this can yield an export with no diagnostic logs at all despite having many eligible small logs. To preserve diagnostic coverage, oversized entries should be skipped and iteration should continue.
Useful? React with 👍 / 👎.
…tication guidance
Signed-off-by: Kai <me@kaiyi.cool>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 61b974e2c7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| # Collect candidates with (mtime, size, path) | ||
| candidates: list[tuple[float, int, Path]] = [] | ||
| for f in log_dir.iterdir(): |
There was a problem hiding this comment.
Handle OSError when scanning the global log directory
_collect_recent_log_files() treats diagnostic logs as optional, but for f in log_dir.iterdir() is unguarded. If ~/.kimi/logs exists but is unreadable (permission change) or disappears between is_dir() and iteration, iterdir() raises OSError and aborts kimi export, even though session data is still exportable. Wrap directory iteration in try/except OSError (or fall back to []) so export remains best-effort for logs.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1688a3722
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| # Collect candidates with (mtime, size, path) | ||
| candidates: list[tuple[float, int, Path]] = [] | ||
| for f in log_dir.iterdir(): |
There was a problem hiding this comment.
Handle OSError when enumerating the global logs directory
_collect_recent_log_files only checks log_dir.is_dir() before iterating, but log_dir.iterdir() itself can raise OSError (for example if the logs directory is deleted, permission-changed, or transiently unavailable between the check and iteration). In that case kimi export aborts even though logs are optional and the session archive could still be produced. Please wrap the directory listing step in try/except OSError and treat this as a best-effort miss.
Useful? React with 👍 / 👎.
| for log_path in log_files: | ||
| with contextlib.suppress(OSError): | ||
| zf.write(log_path, arcname=f"logs/{log_path.name}") |
There was a problem hiding this comment.
Avoid buffering log-heavy exports fully in memory
This change can now add up to 100MB of global logs to each export, but the archive is still assembled in an in-memory BytesIO and then copied again via getvalue(). In low-memory environments, exporting a large session plus diagnostic logs can consume hundreds of MB transiently and fail with OOM, which makes the diagnostic path unreliable exactly when needed. Stream the zip directly to the destination file instead of buffering the whole archive in RAM.
Useful? React with 👍 / 👎.
Description
Add diagnostic logging across the codebase so that when users export a session via
kimi export, developers have sufficient error context to debug issues.Changes
1. Logger calls at key error paths (25+ call sites)
Added
logger.warning/logger.errorat previously silent error paths:create_llmmissing config, hook block events, hook callback failures with error details2. Export bundles global diagnostic logs
kimi exportnow includes recent global log files (logs/subdirectory in ZIP):wire.jsonltimestamps ± 2 days)wire.jsonlis absent3. Documentation
Updated
kimi exportdescriptions in both EN/ZH docs to mention diagnostic logs.