feat(index): multi-threaded parallel indexing via worker_threads #21
Conversation
- Remove all DSR-related descriptions from README.md
- Update architecture diagram to reflect current implementation
- Remove DSR tools from skill templates and references
- Update AGENTS.md files across codebase to remove DSR references
- Simplify core capabilities section to focus on vector + graph retrieval
- Update comparison table to highlight repo-map with PageRank
- Add worker thread pool for CPU-bound parse+embed+quantize operations
- LanceDB: parallel writes per language table (Promise.all)
- Incremental indexer: Promise concurrency + optional worker pool
- Config: useWorkerThreads, workerThreadsMinFiles (default 50)
- Fallback to single-threaded when pool unavailable or file count < threshold

Co-authored-by: Cursor <cursoragent@cursor.com>
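A minimal sketch of how the two new config knobs might be surfaced (field names come from the description above; the surrounding config shape is an assumption, not the PR's actual `src/core/indexing/config.ts`):

```ts
// Sketch only — the real IndexingConfig in this PR may differ in shape and defaults.
export interface IndexingThreadConfig {
  useWorkerThreads: boolean;      // gate the worker-pool path entirely
  workerThreadsMinFiles: number;  // below this file count, stay single-threaded
}

export const defaultThreadConfig: IndexingThreadConfig = {
  useWorkerThreads: true,
  workerThreadsMinFiles: 50, // default stated in the PR description
};
```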
Pull request overview
This PR introduces true multi-threaded indexing using Node.js worker_threads to speed up CPU-bound parsing/embedding/quantization, along with parallelized LanceDB writes. It also updates multiple docs/templates to remove DSR-related references and reflect the current tool/command set.
Changes:
- Add a worker_threads worker entry + a worker pool, and wire it into runParallelIndexing with config gating (useWorkerThreads, workerThreadsMinFiles).
- Update incremental and full indexers to perform LanceDB writes per-language in parallel.
- Refresh docs/templates/AGENTS to remove DSR references and emphasize repo-map + graph/vector retrieval.
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 9 comments.
Summary per file:
| File | Description |
|---|---|
| templates/agents/common/skills/git-ai-code-search/references/tools.md | Removes DSR tool documentation from templates. |
| templates/agents/common/skills/git-ai-code-search/references/constraints.md | Removes DSR-related constraints/rules from templates. |
| templates/agents/common/skills/git-ai-code-search/SKILL.md | Updates skill description/rules to remove DSR history references. |
| skills/git-ai-code-search/references/tools.md | Removes DSR tool documentation from shipped skill refs. |
| skills/git-ai-code-search/references/constraints.md | Removes DSR-related constraints/rules from shipped skill refs. |
| skills/git-ai-code-search/SKILL.md | Updates skill description/rules to remove DSR history references. |
| src/core/indexing/worker.ts | New worker entrypoint implementing CPU-bound indexing per file. |
| src/core/indexing/pool.ts | New fixed-size worker pool to distribute file tasks and collect results. |
| src/core/indexing/parallel.ts | Adds worker-pool path with fallback to existing single-threaded implementation. |
| src/core/indexing/config.ts | Adds worker-thread enablement config knobs and defaults. |
| src/core/indexerIncremental.ts | Adds optional worker-pool processing + parallel per-language LanceDB writes. |
| src/core/indexer.ts | Switches per-language LanceDB writes to Promise.all. |
| src/core/AGENTS.md | Updates core module overview/structure (removes DSR, adds repoMap). |
| src/commands/AGENTS.md | Updates command docs (removes dsr command references). |
| src/cli/AGENTS.md | Updates CLI architecture docs (removes dsr schemas/handlers, adds repo-map). |
| README.md | Updates product positioning/architecture diagrams away from DSR and toward repo-map/graph/vector. |
| AGENTS.md | Updates repository overview/structure text away from DSR references. |
| .gitignore | Adds .cursor/ to ignored paths. |
src/core/indexing/worker.ts (Outdated)
```ts
const parsed = parser.parseContent(filePath, content);
const symbols: SymbolInfo[] = parsed.symbols;
const fileRefs: AstReference[] = parsed.refs;
```
Worker parsing uses parser.parseContent(...) directly, so any parse failure results in the file being skipped (worker returns error → pool resolves null). This diverges from the single-threaded path which applies parseFailureFallback (text-only/line-chunk/etc). Pass the relevant error-handling config in WorkerRequest and implement the same fallback logic in the worker so results are consistent regardless of useWorkerThreads.
Suggested change:

```diff
-const parsed = parser.parseContent(filePath, content);
-const symbols: SymbolInfo[] = parsed.symbols;
-const fileRefs: AstReference[] = parsed.refs;
+let symbols: SymbolInfo[] = [];
+let fileRefs: AstReference[] = [];
+try {
+  const parsed = parser.parseContent(filePath, content);
+  symbols = parsed.symbols ?? [];
+  fileRefs = parsed.refs ?? [];
+} catch (err) {
+  // On parse failure, fall back to an empty symbol/ref set so the worker
+  // still produces a result instead of signalling a hard error. This
+  // mirrors the single-threaded behaviour where parse failures degrade
+  // gracefully rather than skipping the file entirely.
+  symbols = [];
+  fileRefs = [];
+}
```
src/core/indexerIncremental.ts (Outdated)
```ts
content,
dim: this.dim,
quantizationBits: 8,
existingChunkHashes: [],
```
In the worker-thread path, files are processed with existingChunkHashes: [], so chunks that already exist in LanceDB can be re-inserted on every incremental run. The single-threaded path explicitly queries existing content_hash values and filters them out. Add an equivalent “filter existing hashes” step for the worker-thread path (either by querying LanceDB before t.chunks.add(...) or by providing existing hashes to workers).
Suggested change:

```diff
-existingChunkHashes: [],
+existingChunkHashes: Array.from(seenChunkHashes),
```
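One possible way to populate `seenChunkHashes` before dispatching work to the pool (sketch; `languagesTouchedByChangeSet` and `loadExistingChunkHashes` are hypothetical helpers standing in for the content_hash query the single-threaded path already performs against each per-language table):

```ts
// Sketch: gather content hashes already stored in LanceDB so workers can skip them.
// loadExistingChunkHashes(lang) is hypothetical — it would wrap the same
// content_hash lookup the single-threaded incremental path runs today.
const seenChunkHashes = new Set<string>();
for (const lang of languagesTouchedByChangeSet) {
  for (const hash of await loadExistingChunkHashes(lang)) {
    seenChunkHashes.add(hash);
  }
}
// Later, per file: existingChunkHashes: Array.from(seenChunkHashes)
```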
```ts
const tasks: Array<Promise<void>> = [];
for (const item of filesToIndex) {
  const task = (async () => {
    processed++;
    this.onProgress?.({ totalFiles: state.totalFiles, processedFiles: processed, currentFile: item.filePosix });

    const content = this.source === 'staged'
      ? await readStagedFile(this.repoRoot, item.filePosix)
      : await readWorktreeFile(this.scanRoot, item.filePosix);
    if (content == null) return;

    const result = await pool.processFile({
      filePath: item.filePosix,
      content,
      dim: this.dim,
      quantizationBits: 8,
      existingChunkHashes: [],
    });

    if (result) mergeResult(result);
  })();
  tasks.push(task);
}

await Promise.all(tasks);
```
processFilesWithPool starts an async task per file and reads file contents before awaiting pool.processFile(...), which can trigger unbounded parallel file reads (and large in-memory contents) for big change sets. Limit concurrency for the read+dispatch stage (e.g., cap at pool.size or a small multiple, or reuse the queue/active scheduling pattern used in the single-threaded implementation).
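A bounded-concurrency version of the read+dispatch loop might look like the sketch below. Names such as `filesToIndex`, `pool`, `readStagedFile`, and `mergeResult` are taken from the snippet above; `pool.size` follows the reviewer's "cap at pool.size" suggestion and is assumed to exist:

```ts
// Sketch: cap concurrent read+dispatch at the pool size instead of one task per file.
const concurrency = Math.max(1, pool.size);
let next = 0;

const runOne = async (): Promise<void> => {
  while (next < filesToIndex.length) {
    const item = filesToIndex[next++];
    const content = this.source === 'staged'
      ? await readStagedFile(this.repoRoot, item.filePosix)
      : await readWorktreeFile(this.scanRoot, item.filePosix);
    if (content == null) continue;

    const result = await pool.processFile({
      filePath: item.filePosix,
      content,
      dim: this.dim,
      quantizationBits: 8,
      existingChunkHashes: [],
    });
    if (result) mergeResult(result);
  }
};

// Only `concurrency` reads/dispatches are ever in flight at once.
await Promise.all(Array.from({ length: concurrency }, () => runOne()));
```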
```ts
// Phase B: Process files — use worker threads when enough files, else single-threaded
const WORKER_THREAD_MIN_FILES = 20;
const useWorkerThreads = filesToIndex.length >= WORKER_THREAD_MIN_FILES;
let pool: IndexingWorkerPool | null = null;

if (useWorkerThreads) {
  const poolSize = Math.max(1, Math.min(filesToIndex.length, (os.cpus()?.length ?? 2) - 1));
  pool = IndexingWorkerPool.create({ poolSize });
}
```
The incremental indexer uses a hard-coded WORKER_THREAD_MIN_FILES = 20, while the main indexer uses configurable workerThreadsMinFiles. This makes behavior inconsistent and prevents tuning via config. Consider threading an indexing config into IncrementalIndexOptions (or at least reusing the same default/constant) so both indexers follow the same enablement rules.
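One way to thread the shared config into the incremental indexer instead of the hard-coded constant (sketch; the `IncrementalIndexOptions` extension and the `this.options.indexing` access path are assumptions):

```ts
// Sketch: reuse the shared indexing config instead of a local WORKER_THREAD_MIN_FILES.
interface IncrementalIndexOptions {
  // ...existing options...
  indexing?: {
    useWorkerThreads: boolean;
    workerThreadsMinFiles: number;
  };
}

// Enablement check, mirroring runParallelIndexing:
const minFiles = this.options.indexing?.workerThreadsMinFiles ?? 50;
const useWorkerThreads =
  (this.options.indexing?.useWorkerThreads ?? true) &&
  filesToIndex.length >= minFiles;
```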
```ts
private handleWorkerError(worker: Worker, err: Error): void {
  // Reject all pending resolvers for this worker (there should be at most 1)
  // The worker might be dead — remove it and try to replace if pool isn't closing
  const idx = this.workers.indexOf(worker);
  if (idx !== -1) {
    this.workers.splice(idx, 1);
  }
  const idleIdx = this.idleWorkers.indexOf(worker);
  if (idleIdx !== -1) {
    this.idleWorkers.splice(idleIdx, 1);
  }

  // Reject any resolvers waiting on this worker's current task
  for (const [id, entry] of this.resolvers.entries()) {
    entry.reject(err);
    this.resolvers.delete(id);
  }
}
```
handleWorkerError currently rejects and deletes all entries in this.resolvers, even though only one worker errored. This can incorrectly fail unrelated in-flight tasks and leave the pool in an inconsistent state. Track which task id is assigned to which worker (e.g., a Map<Worker, id> or include worker reference in the resolver entry) and only reject that task; consider also respawning a replacement worker or failing queued work deterministically.
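A sketch of tracking which task each worker owns so that only that task fails; `removeWorker` and `spawnWorker` are hypothetical helpers, and the map name is illustrative:

```ts
// Sketch: remember which task id a worker is currently running.
private readonly activeTaskByWorker = new Map<Worker, number>();

private handleWorkerError(worker: Worker, err: Error): void {
  this.removeWorker(worker); // splice out of workers/idleWorkers as before (hypothetical helper)

  // Only fail the task that was actually assigned to the crashed worker.
  const taskId = this.activeTaskByWorker.get(worker);
  if (taskId !== undefined) {
    this.resolvers.get(taskId)?.reject(err);
    this.resolvers.delete(taskId);
    this.activeTaskByWorker.delete(worker);
  }

  // Optionally respawn a replacement so queued work keeps making progress.
  if (!this.closed) this.spawnWorker(); // hypothetical helper
}
```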
```ts
async close(): Promise<void> {
  if (this.closed) return;
  this.closed = true;
  // Reject any queued tasks
  for (const pending of this.pendingTasks) {
    pending.reject(new Error('Pool closed before task could be dispatched'));
  }
  this.pendingTasks = [];
  await Promise.all(this.workers.map((w) => w.terminate()));
  this.workers = [];
  this.idleWorkers = [];
}
```
close() terminates workers but does not reject/resolve any in-flight tasks stored in this.resolvers. If close() is called while tasks are running, the returned promises can hang indefinitely. Before/while terminating workers, reject all outstanding resolvers (and clear the map) so callers always see completion.
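A sketch of a close() that also settles in-flight work so callers never hang (same structure as the snippet above):

```ts
// Sketch: reject queued AND in-flight tasks before terminating workers.
async close(): Promise<void> {
  if (this.closed) return;
  this.closed = true;

  const closeError = new Error('Pool closed before task completed');
  for (const pending of this.pendingTasks) pending.reject(closeError);
  this.pendingTasks = [];

  // In-flight tasks: reject every outstanding resolver so processFile() callers settle.
  for (const [id, entry] of this.resolvers.entries()) {
    entry.reject(closeError);
    this.resolvers.delete(id);
  }

  await Promise.all(this.workers.map((w) => w.terminate()));
  this.workers = [];
  this.idleWorkers = [];
}
```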
```ts
const lang = inferIndexLang(filePosix);
const existingHashes = existingHashArrayByLang[lang] ?? [];

const result = await pool.processFile({
  filePath: filePosix,
  content,
  dim: options.dim,
  quantizationBits: options.indexing.hnswConfig.quantizationBits,
  existingChunkHashes: existingHashes,
});
```
In the worker-thread path, existingHashes (potentially very large) is sent to the worker for every file. Serializing/transferring large arrays per task can dominate runtime and memory for big repos. Consider removing existingChunkHashes from per-file messages and doing dedupe only on the main thread, or pass a per-language snapshot once via workerData/an init message (or a compact Bloom filter) and keep it in the worker.
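One possible shape for sending the per-language hash snapshot to each worker once, instead of per task. This is a sketch: the `'init'` message type, the `pool.workers` accessor (or an equivalent broadcast method on the pool), and the worker-side handling are assumptions, not the PR's actual protocol:

```ts
// Sketch: push existing hashes to each worker a single time, then omit them
// from every per-file request.
for (const worker of pool.workers) {
  worker.postMessage({
    type: 'init',
    existingChunkHashesByLang: existingHashArrayByLang,
  });
}

// Worker side (sketch): keep the snapshot in module state and consult it per file.
// let existingByLang: Record<string, string[]> = {};
// parentPort?.on('message', (msg) => {
//   if (msg.type === 'init') { existingByLang = msg.existingChunkHashesByLang; return; }
//   // ...handle the per-file WorkerRequest without existingChunkHashes...
// });
```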
```ts
export async function runParallelIndexing(options: ParallelIndexOptions): Promise<ParallelIndexResult> {
  const { indexing, files } = options;
  const useThreads =
    indexing.useWorkerThreads &&
    files.length >= indexing.workerThreadsMinFiles;

  if (useThreads) {
    const pool = IndexingWorkerPool.create({ poolSize: Math.max(1, indexing.workerCount) });
    if (pool) {
      try {
        return await runWithWorkerPool(options, pool);
      } finally {
        await pool.close();
      }
    }
    // Pool creation failed — fall through to single-threaded path
  }

  return runSingleThreaded(options);
}
```
Current tests for runParallelIndexing won’t exercise the worker-thread path because workerThreadsMinFiles defaults to 50 and the fixtures are smaller. Add/adjust a test to set workerThreadsMinFiles low (and useWorkerThreads: true) to validate worker-thread behavior (including parse-failure fallback and chunk deduplication).
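A rough sketch of such a test. The test runner (node:test here), the `makeParallelIndexOptions` fixture helper, and the exact option fields beyond those shown in the snippet above are assumptions:

```ts
// Sketch: force the worker-thread path by lowering the enablement threshold.
import { test } from 'node:test';
import assert from 'node:assert/strict';

test('runParallelIndexing takes the worker-thread path when threshold is lowered', async () => {
  const options = makeParallelIndexOptions({ // hypothetical fixture helper
    indexing: {
      useWorkerThreads: true,
      workerThreadsMinFiles: 1, // below the fixture's file count, so the pool path runs
      workerCount: 2,
    },
  });
  const result = await runParallelIndexing(options);
  // Real assertions should compare against the single-threaded output
  // (including parse-failure fallback and chunk deduplication behaviour).
  assert.ok(result);
});
```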
src/core/indexing/worker.ts (Outdated)
```ts
 * Main → Worker : WorkerRequest (file path + content + config)
 * Worker → Main : WorkerResponse (parsed symbols, refs, chunks, AST data)
 */
import { parentPort, workerData } from 'worker_threads';
```
Unused import workerData.
Suggested change:

```diff
-import { parentPort, workerData } from 'worker_threads';
+import { parentPort } from 'worker_threads';
```
- Remove unused workerData import from worker.ts
- Add parse failure fallback logic in worker.ts for consistency with single-threaded path
- Fix close() to reject in-flight resolvers in pool.ts
- Fix handleWorkerError to only reject affected task in pool.ts
- Use configurable workerThreadsMinFiles instead of hardcoded value in indexerIncremental.ts
- Add missing existingChunkHashes query in indexerIncremental.ts worker path
- Add concurrency limit for read+dispatch in indexerIncremental.ts processFilesWithPool
- Optimize existingHashes transfer in parallel.ts by removing per-task transfer
- Track worker-task mapping via workerTaskIds Map in pool.ts
Semantic Review Report
🔍 CodaGraph Semantic Review — 📊 Overall assessment
🎯 Overall opinion: The multi-threaded indexing implemented in this PR is architecturally sound overall, but pool.ts contains an obvious code-duplication bug, and the Promise.all() error-handling design reduces the system's fault tolerance. Recommendation: fix the duplication, switch to Promise.allSettled(), and add error logging before merging.
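A sketch of the Promise.allSettled() pattern the review suggests for the per-language writes; `languages` and `writeLanguageTable(lang)` are hypothetical names standing in for the PR's actual per-table write step:

```ts
// Sketch: a failed write for one language surfaces as a logged error
// instead of rejecting the whole batch.
const results = await Promise.allSettled(
  languages.map((lang) => writeLanguageTable(lang)),
);
for (const [i, r] of results.entries()) {
  if (r.status === 'rejected') {
    console.error(`LanceDB write failed for ${languages[i]}:`, r.reason);
  }
}
```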
🔍 CodaGraph Semantic Review — 📊 Overall assessment
🎯 Overall opinion: The multi-threaded parallel indexing in this PR goes in the right direction, but pool.ts contains a serious code-duplication bug, and the parallel writes in indexer.ts carry a race-condition risk. Recommendation: fix both issues and resubmit for review.
🔍 CodaGraph Semantic Review — 📊 Overall assessment
🎯 Overall opinion: The PR introduces worker_threads-based parallel indexing, but there are serious code-quality problems: duplicated code in pool.ts, a syntax error in worker.ts, and hard-coded parameters in parallel.ts that undermine the logic. Recommendation: fix all blocking issues before merging.
```ts
quantizationBits: options.indexing.hnswConfig.quantizationBits,
existingChunkHashes: [],
});
```
Review completed by CodaGraph AI Agent.
Summary
- Config: useWorkerThreads, workerThreadsMinFiles (default 50)

Made with Cursor