
feat: Knowledge graph, search accuracy improvements, and bug fixes (#116-#179) #183

Merged: Kewton merged 39 commits into main from develop on Mar 25, 2026
Kewton (Owner) commented Mar 25, 2026

Summary

LLM output optimization (#116, #117)

  • --format llm: body truncation, deduplication, and impacted_by collapsing
  • --max-tokens: token budget control (search/impact)

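Bug fixes (#123-#127, #157-#160, #165, #167, #177-#179)

  • --with-snippet: fix empty-string snippets (path normalization + skipping empty bodies)
  • Semantic search: fix EmbeddingStore usage; notify on rerank fallback
  • suggest: improved Japanese/English accuracy, knowledge graph integration, expansion limit
  • why: duplicate entry removal, progress-report classification fix, has_progress relation
  • before-change: --limit now applies per issue
  • Default embedding model changed to bge-m3
  • Hybrid search: semantic fallback when BM25 returns 0 results
  • Semantic search: snippet support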

Embedding improvements (#134, #135)

  • Support for the BGE-M3 multilingual embedding model
  • Larger batch size + SQLite transaction batching

Knowledge graph (#139-#142, #144, #150, #151)

  • SQLite knowledge graph (knowledge_nodes/edges)
  • issue show/list, why, and before-change commands
  • file nodes and modifies edges; review/stage detection
  • suggest: RRF hybrid search

Knowledge graph improvements (#168-#171)

  • Inline snippet display in issue/before-change
  • issue list subcommand
  • Date information added to why/issue JSON output
  • Knowledge graph integrated into context

Test plan

  • cargo build / clippy / fmt all pass
  • UAT completed

Closes #116, #117, #123, #124, #125, #126, #127, #134, #135, #139, #140, #141, #142, #144, #150, #151, #157, #158, #159, #160, #165, #167, #168, #169, #170, #171, #177, #178, #179

🤖 Generated with Claude Code

Kewton and others added 30 commits March 25, 2026 02:59
…150)

Add DocSubtype::StageReview variant and regex pattern to detect
stage-specific review files in dev-reports/review/ directory.
Also add DocSubtype::parse() method to centralize string-to-enum
conversion (DRY improvement).

Changes:
- knowledge.rs: StageReview variant, as_str(), parse(), pattern rule
- issue.rs: display_label() and sort_order() for StageReview
- symbol_store.rs: delegate deserialization to DocSubtype::parse()
- e2e tests: add StageReview coverage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
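As an illustration of the DocSubtype::parse() centralization described in the commit above, here is a minimal sketch. The variant set and the string spellings ("review", "stage_review", "work_plan") are assumptions made for this example; only the StageReview variant and the as_str()/parse() method names come from the commit message.

```rust
/// Hypothetical subset of document subtypes; the real enum likely has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocSubtype {
    Review,
    StageReview,
    WorkPlan,
}

impl DocSubtype {
    /// Canonical string form (the exact spellings are assumptions).
    pub fn as_str(self) -> &'static str {
        match self {
            DocSubtype::Review => "review",
            DocSubtype::StageReview => "stage_review",
            DocSubtype::WorkPlan => "work_plan",
        }
    }

    /// The single place where stored strings are turned back into variants,
    /// so deserialization code can delegate here instead of repeating a match.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "review" => Some(DocSubtype::Review),
            "stage_review" => Some(DocSubtype::StageReview),
            "work_plan" => Some(DocSubtype::WorkPlan),
            _ => None,
        }
    }
}
```

Returning Option rather than panicking lets callers such as a store layer decide how to treat unknown strings read from older databases.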
…151)

Add `file` node type and `modifies` edge type to the knowledge graph,
enabling `why` and `before-change` commands to trace code files back to
related Issues and design documents via git commit history.

Key changes:
- Move ISSUE_RE to knowledge.rs as shared utility with extract_issue_numbers()
- Add KnowledgeRelation::Modifies variant
- Implement extract_file_modifies_from_git_log() for bulk git log parsing
- Add insert_file_modifies_entries() and clear_file_modifies() to symbol_store
- Update find_knowledge_related/find_knowledge_by_issue SQL queries for file nodes
- Integrate into index command (steps 8.6 and 13.6)
- Add modifies filtering in before-change and grouping display in why command

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(knowledge): detect stage-specific review files in review/ (#150)
feat(knowledge): implement file nodes and modifies edges (#151)
…fies extraction

Two bugs fixed:
1. Git log parser: empty line after COMMIT_END sentinel incorrectly set
   reading_files=false, causing ALL file paths to be skipped. Now empty
   lines are ignored during file reading; state resets on next COMMIT_START.
2. ISSUE_RE: added issue-NNN pattern (e.g. feat(issue-99)) which was not
   matched by existing regex. Added capture group 5 for issue[-]?(\d+).

Fixes #151

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
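The first bug above can be pictured with a small state machine. This is a sketch under stated assumptions, not the project's parser: the sentinel strings, the CommitFiles struct, and the parse_git_log name are hypothetical, and the input is assumed to look like a git log listing where each commit's header sits between the two sentinels and its file paths follow after a blank line.

```rust
/// One parsed commit: its subject line and the files it touched.
#[derive(Debug, PartialEq)]
pub struct CommitFiles {
    pub subject: String,
    pub files: Vec<String>,
}

// Sentinel strings are assumptions for this sketch.
const COMMIT_START: &str = "COMMIT_START";
const COMMIT_END: &str = "COMMIT_END";

pub fn parse_git_log(output: &str) -> Vec<CommitFiles> {
    let mut commits = Vec::new();
    let mut current: Option<CommitFiles> = None;
    let mut reading_files = false;

    for line in output.lines() {
        let line = line.trim_end();
        if line == COMMIT_START {
            // A new commit begins: flush the previous one and reset state here,
            // not on a blank line.
            if let Some(done) = current.take() {
                commits.push(done);
            }
            current = Some(CommitFiles { subject: String::new(), files: Vec::new() });
            reading_files = false;
        } else if line == COMMIT_END {
            reading_files = true;
        } else if let Some(commit) = current.as_mut() {
            if line.is_empty() {
                // Blank lines are ignored in both states; in particular the blank
                // line right after COMMIT_END must not end the file list.
                continue;
            }
            if reading_files {
                commit.files.push(line.to_string());
            } else if commit.subject.is_empty() {
                commit.subject = line.to_string();
            }
        }
    }
    if let Some(done) = current.take() {
        commits.push(done);
    }
    commits
}
```

The key design point is that only the next COMMIT_START resets reading_files, which matches the fix described in item 1.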
fix(knowledge): fix git log parser and ISSUE_RE pattern (#151)
When the query contains issue number patterns (#NNN, Issue #NNN, issue-NNN),
the suggest command now queries the knowledge graph via SymbolStore and
prepends related document steps to the strategy. This ensures issue-related
documents (design policies, reviews, work plans) are prioritized in suggestions.

- Add query_knowledge_graph() for best-effort KG lookup with graceful fallback
- Add prepend_knowledge_steps() to insert issue/context steps at strategy head
- Add matched_issues field to SuggestResult (skip_serializing_if empty)
- Deduplicate issue numbers via HashSet, cap at MAX_ISSUE_NUMBERS (3)
- Add 7 new unit tests for KG integration logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
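The deduplicate-and-cap step listed above might look roughly like the following. The project uses a regex (ISSUE_RE) for matching; this sketch swaps in a hand-written byte scan so that it needs only the standard library, and the function name and exact matching rules are assumptions.

```rust
use std::collections::HashSet;

/// Cap taken from the commit message.
const MAX_ISSUE_NUMBERS: usize = 3;

/// Collects issue numbers written as `#NNN` or `issue-NNN` (case-insensitive),
/// in first-seen order, without duplicates, up to MAX_ISSUE_NUMBERS.
pub fn extract_issue_numbers(query: &str) -> Vec<u32> {
    // ASCII lowercasing keeps byte offsets stable, so non-ASCII (for example
    // Japanese) text in the query is passed over safely.
    let lower = query.to_ascii_lowercase();
    let bytes = lower.as_bytes();
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    let mut i = 0;

    while i < bytes.len() && out.len() < MAX_ISSUE_NUMBERS {
        let digits_start = if bytes[i] == b'#' {
            i + 1
        } else if bytes[i..].starts_with(b"issue-") {
            i + "issue-".len()
        } else {
            i += 1;
            continue;
        };

        let mut end = digits_start;
        while end < bytes.len() && bytes[end].is_ascii_digit() {
            end += 1;
        }
        if end > digits_start {
            // Numbers too large for u32 fail to parse and are skipped.
            if let Ok(n) = lower[digits_start..end].parse::<u32>() {
                if seen.insert(n) {
                    out.push(n);
                }
            }
            i = end;
        } else {
            i += 1;
        }
    }
    out
}
```

Preserving first-seen order means the issue the user mentioned first stays at the head of the strategy even after the cap is applied.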
Add SELECT DISTINCT to find_knowledge_related() SQL query to prevent
Cartesian product from multiple edge paths. Extract group_knowledge_results()
function with HashSet-based dedup on (issue_number, file_path, relation).
Add modifies_count field to WhyIssueEntry instead of synthetic file_path
strings, ensuring json/path formats output only real file paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
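A minimal sketch of HashSet-based deduplication on the (issue_number, file_path, relation) key mentioned above. The KnowledgeRow struct and dedup_rows function are hypothetical stand-ins for the project's result type and group_knowledge_results().

```rust
use std::collections::HashSet;

/// Hypothetical row shape returned by a knowledge graph query.
#[derive(Debug, Clone, PartialEq)]
pub struct KnowledgeRow {
    pub issue_number: u32,
    pub file_path: String,
    pub relation: String,
}

/// Keeps the first occurrence of each (issue_number, file_path, relation) key
/// and preserves the original order of the surviving rows.
pub fn dedup_rows(rows: Vec<KnowledgeRow>) -> Vec<KnowledgeRow> {
    let mut seen: HashSet<(u32, String, String)> = HashSet::new();
    rows.into_iter()
        .filter(|r| seen.insert((r.issue_number, r.file_path.clone(), r.relation.clone())))
        .collect()
}
```

HashSet::insert returns false when the key is already present, so the filter drops repeated rows that a join over multiple edge paths can produce even after SELECT DISTINCT narrows them.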
feat(suggest): knowledge graph integration (#157)
fix(why): remove duplicate entries from output (#158)
…160)

progress-report.md was incorrectly displayed as [review] in why command
output because doc_subtype was not propagated from knowledge_edges metadata.

Changes:
- Add display_label_en() method to DocSubtype for English display labels
- Add doc_subtype field to KnowledgeRelatedResult and WhyDocumentEntry
- Update find_knowledge_related() SQL to SELECT ke2.metadata
- Update relation_display_label() to prioritize doc_subtype over relation
- Apply relation_display_label() in LLM output for consistency
- Update all existing tests with new doc_subtype field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…#159)

Change the --limit option semantics from limiting the number of
documents to limiting the number of issues displayed. This ensures
all related issues are visible even when some issues have many
associated documents.

Key changes:
- New group_and_limit_by_issue() groups findings by issue and selects
  up to 2 representative docs per issue (design + workplan priority)
- rank_by_max_similarity() now sorts by issue-level max similarity
- findings_without_ranking() uses numeric descending issue sort
- relation_priority: has_workplan now ranks above has_review
- BeforeChangeResult gains displayed_issues field
- --limit gets value_parser range(1..=1000) validation
- All output formatters updated (human/json/llm)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
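To make the per-issue limit concrete, here is a sketch of grouping findings by issue and keeping up to two representative documents per issue. The Finding struct, the "has_design" relation string, and the descending-issue-number ordering are assumptions for this example; the commit also describes a similarity-ranked path that is not shown here.

```rust
use std::collections::BTreeMap;

/// Hypothetical finding shape; the real BeforeChangeFinding has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub issue_number: u32,
    pub relation: String,
    pub file_path: String,
}

const DOCS_PER_ISSUE: usize = 2;

/// Lower value means higher priority; has_workplan ranks above has_review.
fn relation_priority(relation: &str) -> u8 {
    match relation {
        "has_design" => 0,
        "has_workplan" => 1,
        "has_review" => 2,
        _ => 3,
    }
}

/// `limit` counts issues, not documents.
pub fn group_and_limit_by_issue(findings: Vec<Finding>, limit: usize) -> Vec<(u32, Vec<Finding>)> {
    let mut groups: BTreeMap<u32, Vec<Finding>> = BTreeMap::new();
    for f in findings {
        groups.entry(f.issue_number).or_default().push(f);
    }
    groups
        .into_iter()
        .rev() // newest (highest-numbered) issues first
        .take(limit)
        .map(|(issue, mut docs)| {
            // Stable sort keeps input order among documents of equal priority.
            docs.sort_by_key(|d| relation_priority(&d.relation));
            docs.truncate(DOCS_PER_ISSUE);
            (issue, docs)
        })
        .collect()
}
```

Because truncation happens inside each group, an issue with many documents can no longer crowd other issues out of the result.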
fix(before-change): change limit to per-issue (#159)
fix(why): fix progress-report classification (#160)
…s_progress (#165)

Add HasProgress variant to KnowledgeRelation enum so progress-report
documents are stored with their own dedicated relation instead of
reusing has_review. This prevents JSON consumers from misidentifying
progress reports as reviews and enables accurate review-count queries.

Also refactor find_documents_by_issue() to use KnowledgeRelation::parse()
instead of a hardcoded match, eliminating a DRY violation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(knowledge): change progress-report relation to has_progress (#165)
…et, and priority changes (#171)

- Add KnowledgeGraphMeta struct to carry issue_number, relation, doc_subtype metadata
- Change RelationType::KnowledgeGraph to struct variant with KnowledgeGraphMeta
- Add is_knowledge_graph() and kg_meta() helper methods on RelationType
- Increase KNOWLEDGE_GRAPH_WEIGHT from 0.8 to 0.95 for better KG visibility
- Move KnowledgeGraph from 6th to 3rd priority in relation_to_string()
- Add extract_kg_section() for doc_subtype-based snippet extraction
- Update pattern matches in human.rs, llm.rs, json.rs, impact.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
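A sketch of what turning KnowledgeGraph into a struct variant with helper accessors could look like. The field types, the `meta` field name, and the sibling TagMatch variant are assumptions; the struct, variant, and helper names are taken from the list above.

```rust
/// Metadata carried by knowledge graph relations (field types are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct KnowledgeGraphMeta {
    pub issue_number: u32,
    pub relation: String,
    pub doc_subtype: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum RelationType {
    /// Hypothetical sibling variant, present only to make the enum non-trivial.
    TagMatch,
    /// Struct variant carrying the metadata.
    KnowledgeGraph { meta: KnowledgeGraphMeta },
}

impl RelationType {
    /// True for any KnowledgeGraph relation, regardless of its metadata.
    pub fn is_knowledge_graph(&self) -> bool {
        matches!(self, RelationType::KnowledgeGraph { .. })
    }

    /// Borrowed access to the metadata, or None for other variants.
    pub fn kg_meta(&self) -> Option<&KnowledgeGraphMeta> {
        match self {
            RelationType::KnowledgeGraph { meta } => Some(meta),
            _ => None,
        }
    }
}
```

Helpers like these keep formatter code from having to destructure the variant at every call site, which is one plausible reason the pattern matches in human.rs, llm.rs, json.rs, and impact.rs needed updating only once.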
)

Restructure the `issue` command into subcommands (`issue list` / `issue show`)
to allow users and AI agents to discover all indexed issues without knowing
their numbers beforehand.

Key changes:
- Add `issue list` with --format human/json/path/llm support
- Rename `issue <number>` to `issue show <number>` (breaking change)
- Add `list_all_issues()` to SymbolStore with SQL aggregation query
- Separate IssueListRow (data layer) from IssueListEntry (CLI layer)
- Update suggest.rs and help_llm.rs for new subcommand syntax
- Add 25 new tests (unit + E2E + CLI args)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Development documentation for issue list subcommand:
- Design policy with 13 sections (architecture, SQL, security, tests)
- Multi-stage issue review (8 stages, Claude + Codex)
- Multi-stage design review (8 stages, Claude + Codex)
- Work plan with 16 tasks across 5 phases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…per issue (#167)

Add filtering and per-issue limiting to suggest command's knowledge graph
expansion, reducing proposals from ~80 to ~15-20 by selecting only
representative documents (design policy, work plan, review summaries).

- Add KnowledgeRelation::priority() method for shared relation ordering
- Add filter_and_limit_kg_docs() with Modifies/HasProgress/StageReview exclusion
- Switch from find_knowledge_by_issue() to find_documents_by_issue() for doc_subtype support
- Limit to MAX_KG_DOCS_PER_ISSUE=4 documents per issue
- Refactor before_change.rs relation_priority() as compatibility wrapper (DRY)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
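The filtering and per-issue cap described above can be sketched as follows for the documents of a single issue. The KgDoc struct, the HasDesign variant, the concrete priority values, and modelling StageReview as a boolean flag are all assumptions; the exclusions and MAX_KG_DOCS_PER_ISSUE=4 come from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KnowledgeRelation {
    HasDesign,
    HasWorkplan,
    HasReview,
    HasProgress,
    Modifies,
}

impl KnowledgeRelation {
    /// Shared ordering; lower value sorts first (values are assumptions).
    pub fn priority(self) -> u8 {
        match self {
            KnowledgeRelation::HasDesign => 0,
            KnowledgeRelation::HasWorkplan => 1,
            KnowledgeRelation::HasReview => 2,
            KnowledgeRelation::HasProgress => 3,
            KnowledgeRelation::Modifies => 4,
        }
    }
}

/// Hypothetical document entry for one issue.
#[derive(Debug, Clone, PartialEq)]
pub struct KgDoc {
    pub relation: KnowledgeRelation,
    pub is_stage_review: bool,
    pub file_path: String,
}

pub const MAX_KG_DOCS_PER_ISSUE: usize = 4;

pub fn filter_and_limit_kg_docs(mut docs: Vec<KgDoc>) -> Vec<KgDoc> {
    // Drop file edges, progress reports, and per-stage reviews.
    docs.retain(|d| {
        !matches!(d.relation, KnowledgeRelation::Modifies | KnowledgeRelation::HasProgress)
            && !d.is_stage_review
    });
    docs.sort_by_key(|d| d.relation.priority());
    docs.truncate(MAX_KG_DOCS_PER_ISSUE);
    docs
}
```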
Add date information to JSON output of `why` and `issue` commands to
enable tracking the timeline of design decisions. Dates are extracted
from filename patterns (YYYY-MM-DD prefix) with git log fallback.

Breaking change: `issue --format json` output changes from string
arrays to object arrays with {file_path, date} structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
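The filename half of the date extraction described above could be sketched like this. The function name and the light range check on month and day are assumptions, and the git log fallback is deliberately left out (a caller would try it when this returns None).

```rust
/// Returns the `YYYY-MM-DD` prefix of the file name, if it has that shape.
pub fn date_from_filename(path: &str) -> Option<String> {
    // Look only at the last path component.
    let name = path.rsplit('/').next()?;
    // `get` returns None for short names and for non-ASCII boundaries.
    let prefix = name.get(..10)?;
    let shape_ok = prefix.bytes().enumerate().all(|(i, c)| match i {
        4 | 7 => c == b'-',
        _ => c.is_ascii_digit(),
    });
    if !shape_ok {
        return None;
    }
    let month: u32 = prefix[5..7].parse().ok()?;
    let day: u32 = prefix[8..10].parse().ok()?;
    if (1..=12).contains(&month) && (1..=31).contains(&day) {
        Some(prefix.to_string())
    } else {
        None
    }
}
```

The check is intentionally shallow: it rejects obviously invalid values such as month 13 but does not validate days per month.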
Add sanitize_label() to remove all control characters (including newlines)
from labels before output, preventing output injection via malformed paths.

Addresses Codex code review warning about newline-containing labels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
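One straightforward way to implement the label sanitization described above, assuming the function takes a string slice and returns an owned String (the signature is not given in the commit message):

```rust
/// Removes every Unicode control character (newlines, tabs, escape, and so on)
/// so a label can never spill onto another output line.
pub fn sanitize_label(label: &str) -> String {
    label.chars().filter(|c| !c.is_control()).collect()
}
```

Ordinary non-ASCII text such as Japanese file names is untouched, since char::is_control() matches only the control category.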
…#168)

Add inline snippet display for issue and before-change commands, enabling
users to see document summaries without reading each file individually.

Changes:
- Add snippet: Option<String> to BeforeChangeFinding and IssueDocumentEntry
- Add --with-snippet, --snippet-lines, --snippet-chars CLI options
- Add enrich_before_change_with_snippets() and enrich_issue_documents_with_snippets()
- Unify existing enrich functions to convert empty strings to None
- Update human/llm/json formatters for both commands
- issue JSON: --with-snippet off = string[] (backward compat), on = object[]
- Tantivy reader failure falls back to snippet: None (non-fatal)
- Add 14 new tests (formatter + CLI args)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion)

Integrate --with-snippet feature from develop into issue subcommand structure:
- Add with_snippet/snippet_lines/snippet_chars to IssueCommands::Show
- Update snippet CLI tests to use `issue show` syntax
- Keep both issue list tests and snippet tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Integrate snippet feature from develop with date feature from #170.
Both date and snippet fields are preserved in IssueDocumentEntry.
JSON output always uses object arrays with file_path, date, and
optionally snippet when --with-snippet is enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kewton and others added 9 commits March 25, 2026 17:48
…dent test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Semantic search results now include body snippets instead of returning
only headings with estimated tokens ~0. SnippetConfig and LlmFormatOptions
are propagated through run_semantic_search() and format_semantic_results()
to all output formatters (human/llm/json).

Changes:
- format_semantic_human(): accept SnippetConfig, replace hardcoded (2, 120)
- format_semantic_llm(): accept LlmFormatOptions, apply truncate_body_for_llm
- format_semantic_results(): accept SnippetConfig and LlmFormatOptions
- run_semantic_search(): accept and forward snippet/llm options
- enrich_with_metadata(): fallback to first section body on heading mismatch
- main.rs: construct LlmFormatOptions in semantic branch, pass snippet_config
- Add 3 new tests for semantic snippet functionality

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rid search

When BM25 returns 0 results, RRF merge compressed semantic scores to
~0.016 (1/61), making hybrid search nearly useless for queries without
keyword matches. This adds a fallback path in try_hybrid_search() that
returns semantic results with their original cosine similarity scores
when BM25 is empty, consistent with the existing fallback in suggest.rs.

Closes #178

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
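The arithmetic behind the ~0.016 figure, and the shape of the fallback, can be sketched as below. The constant k = 60 is inferred from the 1/61 value in the commit message; the function name and signature are hypothetical, and the semantic list is assumed to be sorted by descending cosine similarity.

```rust
use std::collections::HashMap;

/// RRF constant; with 1-based ranks the top hit of one list scores 1/(60+1).
const RRF_K: f64 = 60.0;

/// Merges a BM25 ranking with semantic hits via Reciprocal Rank Fusion.
/// When BM25 is empty, semantic hits are returned with their original scores.
pub fn hybrid_scores(bm25_ranked: &[String], semantic: &[(String, f64)]) -> Vec<(String, f64)> {
    if bm25_ranked.is_empty() {
        // Fallback: RRF over a single list would flatten every score to about
        // 1/61 or less, so keep the cosine similarities instead.
        return semantic.to_vec();
    }

    let mut scores: HashMap<String, f64> = HashMap::new();
    for (rank, id) in bm25_ranked.iter().enumerate() {
        *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (RRF_K + rank as f64 + 1.0);
    }
    for (rank, (id, _similarity)) in semantic.iter().enumerate() {
        *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (RRF_K + rank as f64 + 1.0);
    }

    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for deterministic output.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged
}
```

With both lists present, a document ranked first in each receives 2/61, so the fused scores remain comparable to one another; only the single-list case needed the special path.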
- Change default model from nomic-embed-text to qllama/bge-m3:q8_0
  for significantly better multilingual (especially Japanese) search
- DRY: resolve_config() now calls default_model() instead of hardcoding
- Add model_not_found_hint() shared helper for install guidance
- Move delete_stale_model_embeddings() after first successful embed
  to prevent data loss when new model is not installed
- Add ModelNotFound early exit in embed/index commands
- Update README with new default, migration guide
- Update test assertions and add new tests for hint, dimension, default

Closes #177

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(embedding): change default model to bge-m3 (#177)
fix(search): semantic fallback when BM25 returns 0 results (#178)
fix(search): snippet support for semantic search (#179)
Development

Successfully merging this pull request may close these issues.

Optimize the JSON output of search/related/impact for LLM prompts

1 participant