feat: Knowledge graph, search accuracy improvements, and bug fixes (#116-#179) by Kewton · Pull Request #183 · Kewton/CommandIndex

Kewton · 2026-03-25T12:33:25Z

Summary

LLM output optimization (#116, #117)

--format llm: body truncation, deduplication, and impacted_by collapsing
--max-tokens: token budget control (search/impact)

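Bug fixes (#123-#127, #157-#160, #165, #167, #177-#179)

--with-snippet: fix empty-string output (path normalization + empty-body skip)
Semantic search: fix to use EmbeddingStore; notify on rerank fallback
suggest: improved Japanese/English accuracy, knowledge graph integration, expansion limit
why: duplicate entry removal, progress-report classification fix, has_progress relation
before-change: limit now applies per Issue
Default embedding model changed to bge-m3
Hybrid search: semantic fallback when BM25 returns 0 results
Semantic search: snippet support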

Embedding improvements (#134, #135)

Support for the BGE-M3 multilingual embedding model
Larger batch size + SQLite transaction batching

Knowledge graph (#139-#142, #144, #150, #151)

SQLite knowledge graph (knowledge_nodes/edges)
issue show/list / why / before-change commands
file nodes and modifies edges, review/stage detection
suggest: RRF hybrid search

Knowledge graph improvements (#168-#171)

Inline snippet display for issue/before-change
issue list subcommand
Date information added to why/issue JSON output
Knowledge graph integrated into context

Test plan

cargo build / clippy / fmt all pass
UAT completed

Closes #116, #117, #123, #124, #125, #126, #127, #134, #135, #139, #140, #141, #142, #144, #150, #151, #157, #158, #159, #160, #165, #167, #168, #169, #170, #171, #177, #178, #179

🤖 Generated with Claude Code

…150) Add DocSubtype::StageReview variant and regex pattern to detect stage-specific review files in dev-reports/review/ directory. Also add DocSubtype::parse() method to centralize string-to-enum conversion (DRY improvement).

Changes:
- knowledge.rs: StageReview variant, as_str(), parse(), pattern rule
- issue.rs: display_label() and sort_order() for StageReview
- symbol_store.rs: delegate deserialization to DocSubtype::parse()
- e2e tests: add StageReview coverage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…151) Add `file` node type and `modifies` edge type to the knowledge graph, enabling `why` and `before-change` commands to trace code files back to related Issues and design documents via git commit history.

Key changes:
- Move ISSUE_RE to knowledge.rs as shared utility with extract_issue_numbers()
- Add KnowledgeRelation::Modifies variant
- Implement extract_file_modifies_from_git_log() for bulk git log parsing
- Add insert_file_modifies_entries() and clear_file_modifies() to symbol_store
- Update find_knowledge_related/find_knowledge_by_issue SQL queries for file nodes
- Integrate into index command (steps 8.6 and 13.6)
- Add modifies filtering in before-change and grouping display in why command

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(knowledge): detect stage-specific review files in review/ (#150)

feat(knowledge): implement file nodes and modifies edges (#151)

…fies extraction

Two bugs fixed:
1. Git log parser: empty line after COMMIT_END sentinel incorrectly set reading_files=false, causing ALL file paths to be skipped. Now empty lines are ignored during file reading; state resets on next COMMIT_START.
2. ISSUE_RE: added issue-NNN pattern (e.g. feat(issue-99)) which was not matched by existing regex. Added capture group 5 for issue[-]?(\d+).

Fixes #151

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
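The parser fix in the commit above can be pictured as a small state machine. The following is a minimal sketch, not the project's actual extract_file_modifies_from_git_log(): the struct CommitFiles, the function parse_git_log, the literal sentinel strings, and the assumed log layout (message lines between the sentinels, file paths after COMMIT_END) are all hypothetical. It only shows the corrected rule: blank lines are ignored while reading files, and state resets only at the next COMMIT_START.

```rust
/// One parsed commit: its message and the files it touched (hypothetical type).
#[derive(Debug, PartialEq)]
struct CommitFiles {
    message: String,
    files: Vec<String>,
}

// Assumed sentinel values; the real strings are not shown in the commit message.
const COMMIT_START: &str = "COMMIT_START";
const COMMIT_END: &str = "COMMIT_END";

fn parse_git_log(output: &str) -> Vec<CommitFiles> {
    let mut commits = Vec::new();
    let mut current: Option<CommitFiles> = None;
    let mut reading_files = false;

    for line in output.lines() {
        let trimmed = line.trim();
        if trimmed == COMMIT_START {
            // A new commit begins: flush the previous one and reset state here,
            // not on a blank line.
            if let Some(done) = current.take() {
                commits.push(done);
            }
            current = Some(CommitFiles { message: String::new(), files: Vec::new() });
            reading_files = false;
        } else if trimmed == COMMIT_END {
            reading_files = true;
        } else if let Some(commit) = current.as_mut() {
            if !reading_files {
                if !commit.message.is_empty() {
                    commit.message.push('\n');
                }
                commit.message.push_str(line);
            } else if !trimmed.is_empty() {
                // Blank lines after COMMIT_END are skipped instead of ending the file list.
                commit.files.push(trimmed.to_string());
            }
        }
    }
    if let Some(done) = current.take() {
        commits.push(done);
    }
    commits
}
```

Under the old behavior described in the commit, the blank line that follows COMMIT_END would have switched reading_files off before any path was read.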

fix(knowledge): fix git log parser and ISSUE_RE pattern (#151)

When the query contains issue number patterns (#NNN, Issue #NNN, issue-NNN), the suggest command now queries the knowledge graph via SymbolStore and prepends related document steps to the strategy. This ensures issue-related documents (design policies, reviews, work plans) are prioritized in suggestions.

- Add query_knowledge_graph() for best-effort KG lookup with graceful fallback
- Add prepend_knowledge_steps() to insert issue/context steps at strategy head
- Add matched_issues field to SuggestResult (skip_serializing_if empty)
- Deduplicate issue numbers via HashSet, cap at MAX_ISSUE_NUMBERS (3)
- Add 7 new unit tests for KG integration logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
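To illustrate the "deduplicate via HashSet, cap at MAX_ISSUE_NUMBERS (3)" step, here is a rough sketch using only the standard library. The function name collect_issue_numbers and the hand-written scanner are assumptions made for this example; the project itself uses a regex (ISSUE_RE), which is not reproduced here.

```rust
use std::collections::HashSet;

// Cap taken from the commit message.
const MAX_ISSUE_NUMBERS: usize = 3;

/// Collects numbers that follow `#` or `issue-` (case-insensitive), keeping
/// first-seen order, dropping duplicates, and stopping at the cap.
fn collect_issue_numbers(query: &str) -> Vec<u32> {
    let lower = query.to_ascii_lowercase();
    let bytes = lower.as_bytes();
    let mut seen = HashSet::new();
    let mut result = Vec::new();
    let mut i = 0;

    while i < bytes.len() && result.len() < MAX_ISSUE_NUMBERS {
        // Find where the digits would start, if this position opens a pattern.
        let start = if bytes[i] == b'#' {
            i + 1
        } else if bytes[i..].starts_with(b"issue-") {
            i + "issue-".len()
        } else {
            i += 1;
            continue;
        };
        let mut end = start;
        while end < bytes.len() && bytes[end].is_ascii_digit() {
            end += 1;
        }
        if end > start {
            // The slice holds only ASCII digits, so slicing by byte index is safe.
            if let Ok(n) = lower[start..end].parse::<u32>() {
                if seen.insert(n) {
                    result.push(n);
                }
            }
        }
        i = end.max(i + 1);
    }
    result
}
```

Scanning bytes rather than chars keeps the sketch safe for queries that mix Japanese text with issue references.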

Add SELECT DISTINCT to find_knowledge_related() SQL query to prevent Cartesian product from multiple edge paths. Extract group_knowledge_results() function with HashSet-based dedup on (issue_number, file_path, relation). Add modifies_count field to WhyIssueEntry instead of synthetic file_path strings, ensuring json/path formats output only real file paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
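The HashSet-based dedup on (issue_number, file_path, relation) mentioned above might look roughly like this. KnowledgeRow and dedup_rows are hypothetical names for this sketch; the real group_knowledge_results() also groups results, which is omitted here.

```rust
use std::collections::HashSet;

/// Hypothetical row shape; the real result struct has more fields.
#[derive(Debug, Clone, PartialEq)]
struct KnowledgeRow {
    issue_number: u32,
    file_path: String,
    relation: String,
}

/// Keeps the first occurrence of each (issue_number, file_path, relation) key
/// and preserves the original order of the remaining rows.
fn dedup_rows(rows: Vec<KnowledgeRow>) -> Vec<KnowledgeRow> {
    let mut seen: HashSet<(u32, String, String)> = HashSet::new();
    rows.into_iter()
        .filter(|r| seen.insert((r.issue_number, r.file_path.clone(), r.relation.clone())))
        .collect()
}
```

HashSet::insert returns false for a key that is already present, so it doubles as the filter predicate.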

feat(suggest): knowledge graph integration (#157)

fix(why): remove duplicate entries from output (#158)

…160) progress-report.md was incorrectly displayed as [review] in why command output because doc_subtype was not propagated from knowledge_edges metadata.

Changes:
- Add display_label_en() method to DocSubtype for English display labels
- Add doc_subtype field to KnowledgeRelatedResult and WhyDocumentEntry
- Update find_knowledge_related() SQL to SELECT ke2.metadata
- Update relation_display_label() to prioritize doc_subtype over relation
- Apply relation_display_label() in LLM output for consistency
- Update all existing tests with new doc_subtype field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…#159) Change the --limit option semantics from limiting the number of documents to limiting the number of issues displayed. This ensures all related issues are visible even when some issues have many associated documents.

Key changes:
- New group_and_limit_by_issue() groups findings by issue and selects up to 2 representative docs per issue (design + workplan priority)
- rank_by_max_similarity() now sorts by issue-level max similarity
- findings_without_ranking() uses numeric descending issue sort
- relation_priority: has_workplan now ranks above has_review
- BeforeChangeResult gains displayed_issues field
- --limit gets value_parser range(1..=1000) validation
- All output formatters updated (human/json/llm)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
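A sketch of the per-issue limiting described above, under stated assumptions: the Finding struct, the relation string "has_design", the priority numbers, and the tie-breaking rule are all invented for illustration. Only the ideas of grouping by issue, keeping up to 2 representative documents, ranking issues by their maximum similarity, and placing has_workplan above has_review come from the commit message.

```rust
use std::collections::BTreeMap;

/// Hypothetical finding shape for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    issue_number: u32,
    relation: String,
    file_path: String,
    similarity: f32,
}

// "up to 2 representative docs per issue" per the commit message.
const MAX_DOCS_PER_ISSUE: usize = 2;

/// Lower value means higher priority. "has_design" is an assumed relation name.
fn relation_priority(relation: &str) -> u8 {
    match relation {
        "has_design" => 0,
        "has_workplan" => 1,
        "has_review" => 2,
        _ => 3,
    }
}

/// Groups findings by issue, keeps the top documents per issue, then limits
/// the number of issues (not the number of documents).
fn group_and_limit(findings: Vec<Finding>, limit: usize) -> Vec<(u32, Vec<Finding>)> {
    let mut groups: BTreeMap<u32, Vec<Finding>> = BTreeMap::new();
    for f in findings {
        groups.entry(f.issue_number).or_default().push(f);
    }
    let mut ranked: Vec<(u32, f32, Vec<Finding>)> = groups
        .into_iter()
        .map(|(issue, mut docs)| {
            // Issue-level score is the best similarity among its documents.
            let max_sim = docs.iter().map(|d| d.similarity).fold(f32::MIN, f32::max);
            docs.sort_by_key(|d| relation_priority(&d.relation));
            docs.truncate(MAX_DOCS_PER_ISSUE);
            (issue, max_sim, docs)
        })
        .collect();
    // Highest similarity first; ties go to the larger issue number.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(b.0.cmp(&a.0)));
    ranked.truncate(limit);
    ranked.into_iter().map(|(issue, _, docs)| (issue, docs)).collect()
}
```

Because truncation happens after grouping, an issue with many documents can no longer crowd other issues out of the result.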

fix(before-change): apply limit per Issue (#159)

fix(why): fix progress-report classification (#160)

…s_progress (#165) Add HasProgress variant to KnowledgeRelation enum so progress-report documents are stored with their own dedicated relation instead of reusing has_review. This prevents JSON consumers from misidentifying progress reports as reviews and enables accurate review-count queries. Also refactor find_documents_by_issue() to use KnowledgeRelation::parse() instead of a hardcoded match, eliminating DRY violation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
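The idea of routing string-to-enum conversion through a single parse() can be sketched as follows. This is an illustrative enum, not the project's definition: the variants HasReview and HasWorkplan, the ALL constant, and the Option return type are assumptions; only HasProgress, Modifies, and the relation strings has_review, has_workplan, has_progress, and modifies appear in the commit messages.

```rust
/// Illustrative relation enum; the real KnowledgeRelation has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KnowledgeRelation {
    HasReview,
    HasWorkplan,
    HasProgress,
    Modifies,
}

impl KnowledgeRelation {
    const ALL: [KnowledgeRelation; 4] = [
        KnowledgeRelation::HasReview,
        KnowledgeRelation::HasWorkplan,
        KnowledgeRelation::HasProgress,
        KnowledgeRelation::Modifies,
    ];

    /// The single source of truth for the stored string form.
    fn as_str(self) -> &'static str {
        match self {
            KnowledgeRelation::HasReview => "has_review",
            KnowledgeRelation::HasWorkplan => "has_workplan",
            KnowledgeRelation::HasProgress => "has_progress",
            KnowledgeRelation::Modifies => "modifies",
        }
    }

    /// Derived from as_str(), so a new variant cannot get out of sync with a
    /// second hardcoded match elsewhere.
    fn parse(s: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|r| r.as_str() == s)
    }
}
```

With this shape, callers such as a document lookup can call parse() on the stored relation column instead of repeating the string table.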

fix(knowledge): change progress-report relation to has_progress (#165)

…et, and priority changes (#171)

- Add KnowledgeGraphMeta struct to carry issue_number, relation, doc_subtype metadata
- Change RelationType::KnowledgeGraph to struct variant with KnowledgeGraphMeta
- Add is_knowledge_graph() and kg_meta() helper methods on RelationType
- Increase KNOWLEDGE_GRAPH_WEIGHT from 0.8 to 0.95 for better KG visibility
- Move KnowledgeGraph from 6th to 3rd priority in relation_to_string()
- Add extract_kg_section() for doc_subtype-based snippet extraction
- Update pattern matches in human.rs, llm.rs, json.rs, impact.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

) Restructure the `issue` command into subcommands (`issue list` / `issue show`) to allow users and AI agents to discover all indexed issues without knowing their numbers beforehand.

Key changes:
- Add `issue list` with --format human/json/path/llm support
- Rename `issue <number>` to `issue show <number>` (breaking change)
- Add `list_all_issues()` to SymbolStore with SQL aggregation query
- Separate IssueListRow (data layer) from IssueListEntry (CLI layer)
- Update suggest.rs and help_llm.rs for new subcommand syntax
- Add 25 new tests (unit + E2E + CLI args)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development documentation for issue list subcommand:
- Design policy with 13 sections (architecture, SQL, security, tests)
- Multi-stage issue review (8 stages, Claude + Codex)
- Multi-stage design review (8 stages, Claude + Codex)
- Work plan with 16 tasks across 5 phases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…per issue (#167) Add filtering and per-issue limiting to suggest command's knowledge graph expansion, reducing proposals from ~80 to ~15-20 by selecting only representative documents (design policy, work plan, review summaries).

- Add KnowledgeRelation::priority() method for shared relation ordering
- Add filter_and_limit_kg_docs() with Modifies/HasProgress/StageReview exclusion
- Switch from find_knowledge_by_issue() to find_documents_by_issue() for doc_subtype support
- Limit to MAX_KG_DOCS_PER_ISSUE=4 documents per issue
- Refactor before_change.rs relation_priority() as compatibility wrapper (DRY)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add date information to JSON output of `why` and `issue` commands to enable tracking the timeline of design decisions. Dates are extracted from filename patterns (YYYY-MM-DD prefix) with git log fallback.

Breaking change: `issue --format json` output changes from string arrays to object arrays with {file_path, date} structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
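Extracting a date from a YYYY-MM-DD filename prefix could be done along these lines. The helper date_from_filename is a hypothetical name, the sketch only checks the digit and hyphen layout (it does not validate month or day ranges), and the git log fallback mentioned in the commit is left out.

```rust
/// Returns the leading `YYYY-MM-DD` of the file name, if the name starts with
/// that digit/hyphen layout. Calendar validity is not checked.
fn date_from_filename(path: &str) -> Option<&str> {
    // Take the last path component; assumes `/` separators.
    let name = path.rsplit('/').next()?;
    // `get` returns None for names shorter than 10 bytes or when byte 10
    // falls inside a multi-byte character.
    let prefix = name.get(..10)?;
    let well_formed = prefix.bytes().enumerate().all(|(i, b)| match i {
        4 | 7 => b == b'-',
        _ => b.is_ascii_digit(),
    });
    if well_formed {
        Some(prefix)
    } else {
        None
    }
}
```

A caller would fall back to another source, such as commit history, whenever this returns None.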

Add sanitize_label() to remove all control characters (including newlines) from labels before output, preventing output injection via malformed paths. Addresses Codex code review warning about newline-containing labels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
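Stripping control characters from a label can be as small as the sketch below. The function name matches the commit message, but the body is an assumption about how such a helper could be written, not the project's code.

```rust
/// Removes every Unicode control character (newlines, tabs, ESC, and so on)
/// so a label cannot break out of its line in the output.
fn sanitize_label(label: &str) -> String {
    label.chars().filter(|c| !c.is_control()).collect()
}
```

char::is_control covers the Unicode Cc category, so ordinary non-ASCII text such as Japanese file names passes through unchanged.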

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…#168) Add inline snippet display for issue and before-change commands, enabling users to see document summaries without reading each file individually.

Changes:
- Add snippet: Option<String> to BeforeChangeFinding and IssueDocumentEntry
- Add --with-snippet, --snippet-lines, --snippet-chars CLI options
- Add enrich_before_change_with_snippets() and enrich_issue_documents_with_snippets()
- Unify existing enrich functions to convert empty strings to None
- Update human/llm/json formatters for both commands
- issue JSON: --with-snippet off = string[] (backward compat), on = object[]
- Tantivy reader failure falls back to snippet: None (non-fatal)
- Add 14 new tests (formatter + CLI args)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
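How a snippet might be cut by line and character limits, with empty results mapped to None, is sketched below. make_snippet and its parameters are hypothetical; they loosely mirror the --snippet-lines and --snippet-chars options, whose exact semantics are not spelled out in the commit message.

```rust
/// Builds a one-line snippet from the first non-empty lines of `body`.
/// Returns None when nothing is left, so formatters can skip the field.
fn make_snippet(body: &str, max_lines: usize, max_chars: usize) -> Option<String> {
    let joined = body
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .take(max_lines)
        .collect::<Vec<_>>()
        .join(" ");
    // Count characters, not bytes, so multi-byte text is never split mid-character.
    let truncated: String = joined.chars().take(max_chars).collect();
    if truncated.is_empty() {
        None
    } else {
        Some(truncated)
    }
}
```

Returning Option<String> rather than an empty string matches the commit's note about converting empty strings to None.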

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(suggest)

feat(snippet)

feat(context)

…tion) Integrate --with-snippet feature from develop into issue subcommand structure:
- Add with_snippet/snippet_lines/snippet_chars to IssueCommands::Show
- Update snippet CLI tests to use `issue show` syntax
- Keep both issue list tests and snippet tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Integrate snippet feature from develop with date feature from #170. Both date and snippet fields are preserved in IssueDocumentEntry. JSON output always uses object arrays with file_path, date, and optionally snippet when --with-snippet is enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(issue)

feat(knowledge)

…dent test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Semantic search results now include body snippets instead of returning only headings with estimated tokens ~0. SnippetConfig and LlmFormatOptions are propagated through run_semantic_search() and format_semantic_results() to all output formatters (human/llm/json).

Changes:
- format_semantic_human(): accept SnippetConfig, replace hardcoded (2, 120)
- format_semantic_llm(): accept LlmFormatOptions, apply truncate_body_for_llm
- format_semantic_results(): accept SnippetConfig and LlmFormatOptions
- run_semantic_search(): accept and forward snippet/llm options
- enrich_with_metadata(): fallback to first section body on heading mismatch
- main.rs: construct LlmFormatOptions in semantic branch, pass snippet_config
- Add 3 new tests for semantic snippet functionality

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rid search

When BM25 returns 0 results, RRF merge compressed semantic scores to ~0.016 (1/61), making hybrid search nearly useless for queries without keyword matches. This adds a fallback path in try_hybrid_search() that returns semantic results with their original cosine similarity scores when BM25 is empty, consistent with the existing fallback in suggest.rs.

Closes #178

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
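The score compression and the fallback can be seen in a small Reciprocal Rank Fusion sketch. It assumes the common constant k = 60 (consistent with the 1/61 figure above, but not confirmed by the source), and hybrid_search is a hypothetical stand-in for try_hybrid_search(), taking pre-ranked inputs instead of running real searches.

```rust
use std::collections::HashMap;

// Assumed RRF constant; with 1-based ranks the top hit scores 1 / (60 + 1).
const RRF_K: f32 = 60.0;

/// `bm25` is a ranked list of document ids; `semantic` is ranked by cosine
/// similarity and carries that score.
fn hybrid_search(bm25: &[String], semantic: &[(String, f32)]) -> Vec<(String, f32)> {
    if bm25.is_empty() {
        // Fallback: keep the original cosine scores instead of RRF scores,
        // which would top out near 0.016 with only one ranking present.
        return semantic.to_vec();
    }
    let mut scores: HashMap<String, f32> = HashMap::new();
    for (rank, id) in bm25.iter().enumerate() {
        *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (RRF_K + rank as f32 + 1.0);
    }
    for (rank, (id, _)) in semantic.iter().enumerate() {
        *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (RRF_K + rank as f32 + 1.0);
    }
    let mut merged: Vec<(String, f32)> = scores.into_iter().collect();
    // Highest fused score first; ties ordered by id for determinism.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged
}
```

RRF scores depend only on rank, so a document found by a single ranking can never exceed 1/61 here, regardless of how similar it actually is.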

- Change default model from nomic-embed-text to qllama/bge-m3:q8_0 for significantly better multilingual (especially Japanese) search
- DRY: resolve_config() now calls default_model() instead of hardcoding
- Add model_not_found_hint() shared helper for install guidance
- Move delete_stale_model_embeddings() after first successful embed to prevent data loss when new model is not installed
- Add ModelNotFound early exit in embed/index commands
- Update README with new default, migration guide
- Update test assertions and add new tests for hint, dimension, default

Closes #177

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(embedding): change default model to bge-m3 (#177)

fix(search): semantic fallback when BM25 returns 0 results (#178)

fix(search): snippet support for semantic search (#179)

Kewton and others added 30 commits March 25, 2026 02:59

Merge pull request #154 from Kewton/feature/issue-150-review-detection

e1f6204

feat(knowledge): detect stage-specific review files in review/ (#150)

Merge pull request #155 from Kewton/feature/issue-151-file-modifies

b820ad7

feat(knowledge): implement file nodes and modifies edges (#151)

Merge pull request #156 from Kewton/feature/issue-151-file-modifies

65c7d00

fix(knowledge): fix git log parser and ISSUE_RE pattern (#151)

Merge pull request #161 from Kewton/fix/issue-157-suggest-kg

a148a17

feat(suggest): knowledge graph integration (#157)

Merge pull request #162 from Kewton/fix/issue-158-why-dedup

56460cd

fix(why): remove duplicate entries from output (#158)

Merge pull request #163 from Kewton/fix/issue-159-before-change-limit

5dffff3

fix(before-change): apply limit per Issue (#159)

Merge pull request #164 from Kewton/fix/issue-160-why-classification

6c0d2aa

fix(why): fix progress-report classification (#160)

Merge pull request #166 from Kewton/fix/issue-165-has-progress

9d61693

fix(knowledge): change progress-report relation to has_progress (#165)

docs(issue-169): add pm-auto-dev reports (TDD, Codex review, progress)

3025faa

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style(issue): apply cargo fmt

2cc1c43

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #172 from Kewton/fix/issue-167-suggest-limit

9c93ec3

fix(suggest)

Merge pull request #173 from Kewton/feature/issue-168-snippet-inline

78a3435

feat(snippet)

Merge pull request #176 from Kewton/feature/issue-171-context-kg

df3d9da

feat(context)

Kewton and others added 9 commits March 25, 2026 17:48

Merge pull request #174 from Kewton/feature/issue-169-issue-list

94aa8ff

feat(issue)

Merge pull request #175 from Kewton/feature/issue-170-json-date

4dae30d

feat(knowledge)

fix: add missing date field to KnowledgeEntry tests and fix env-depen…

17c49d4

…dent test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #180 from Kewton/fix/issue-177-bge-m3-default

6568615

fix(embedding): change default model to bge-m3 (#177)

Merge pull request #181 from Kewton/fix/issue-178-hybrid-bm25-zero

6ab5247

fix(search): semantic fallback when BM25 returns 0 results (#178)

Merge pull request #182 from Kewton/fix/issue-179-semantic-snippet

7a79c92

fix(search): snippet support for semantic search (#179)

Kewton merged commit eac7684 into main Mar 25, 2026
8 checks passed

Kewton merged 39 commits into main from develop
