
Setup / first-run issues: broken post-commit hook, model download stalls, /tmp cache wipe, silent lazy index #67

@AZagatti

Description


What happened?

Filing as a group because they're all in the install / first-run / setup path and I hit them in sequence while setting CCE up. Used Claude Code as a debug pair to dig into each one. Will split if you'd prefer, just felt spammy to file as five separate issues.

1. cce init installs a post-commit hook that errors silently on every commit.

The hook script written to .git/hooks/post-commit calls:

cce index --changed-only >/dev/null 2>&1 &

But in v0.4.19, cce index only accepts --full and --path. The --changed-only flag doesn't exist:

$ cce index --changed-only
Usage: cce index [OPTIONS]
Try 'cce index --help' for help.

Error: No such option: --changed-only

Because output is redirected with >/dev/null 2>&1, the error is invisible. My commits looked like they were keeping the index up to date, but nothing was happening. The flag is hardcoded in src/context_engine/indexer/git_hooks.py:36 — confirmed against the v0.4.19 source on main.

I think cce index (no flag) already does incremental indexing of changed files, so just dropping --changed-only from the hook template should fix it.
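A minimal sketch of what the corrected template could look like (the template string and function names below are my assumptions, not the actual contents of git_hooks.py):

```python
# Hypothetical corrected hook template: since plain `cce index` already
# indexes changed files incrementally, the nonexistent --changed-only
# flag is simply dropped from the template.
POST_COMMIT_TEMPLATE = """#!/bin/sh
# cce hook
{cce_bin} index >/dev/null 2>&1 &
"""


def render_post_commit(cce_bin: str) -> str:
    """Render the post-commit hook body for the given cce executable path."""
    return POST_COMMIT_TEMPLATE.format(cce_bin=cce_bin)
```

Existing installs would still need `cce init` re-run (or a manual edit of .git/hooks/post-commit) to pick up the fixed template.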

2. The model download via huggingface_hub has no timeout and stalls indefinitely.

On a fresh WSL my first cce serve (which preloads the embedding model) hung for several minutes with 5 ESTABLISHED IPv6 connections to the HF CDN but zero bytes downloaded into the ONNX blob file. Same machine, same minute, curl of the same URL pulled the 66 MB ONNX in under 5 seconds:

$ curl -sL -o /tmp/m.onnx -w "size=%{size_download} time=%{time_total}\n" \
    https://huggingface.co/qdrant/bge-small-en-v1.5-onnx-q/resolve/main/model_optimized.onnx
size=66465124 time=4.736027

I had this happen on two different WSL boots under different network conditions: once over IPv4 (stuck in TCP SYN-SENT) and once over IPv6 (5 sockets ESTABLISHED but no bytes transferring). I think the underlying issue is simply the lack of a timeout / retry budget around the TextEmbedding(...) call at indexer/embedder.py:64; huggingface_hub will happily wait forever.
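For what it's worth, huggingface_hub reads an HF_HUB_DOWNLOAD_TIMEOUT env var for its per-request timeout, which might bound the wait without code changes. On the code side, the kind of retry budget I mean is sketched below, assuming the model load is wrapped in a callable (the wrapper name is mine):

```python
import time


def with_retry_budget(fn, attempts=3, base_delay=2.0):
    """Run fn(); on failure, retry with exponential backoff and re-raise
    once the budget is exhausted instead of waiting forever."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, ...
```

The point is just that the embedder load fails loudly after a bounded wait instead of hanging indefinitely.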

Worse: when the download stalls partway, fastembed has already created the snapshot directory with the small config blobs, but model_optimized.onnx exists as a 0-byte .incomplete file. Every subsequent cce serve / cce search / cce index then crashes immediately on:

RuntimeError: Failed to load embedding model 'BAAI/bge-small-en-v1.5'. ...
Original error: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from
/tmp/fastembed_cache/.../snapshots/.../model_optimized.onnx failed:
Load model ... failed. File doesn't exist

The broken state is sticky until you manually rm -rf the cache.
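A self-healing check CCE could run before loading the model, sketched under the assumption that the cache layout is as described above (the function name is mine):

```python
import shutil
from pathlib import Path


def clear_stale_fastembed_cache(cache_dir: Path) -> bool:
    """If the cache contains a 0-byte *.incomplete blob, the snapshot can
    never load; wipe it so the next start re-downloads instead of crashing
    on NO_SUCHFILE. Returns True if a stale cache was removed."""
    if not cache_dir.is_dir():
        return False
    stale = any(p.stat().st_size == 0 for p in cache_dir.rglob("*.incomplete"))
    if stale:
        shutil.rmtree(cache_dir)
    return stale
```

That would at least turn the sticky broken state into a one-time retry rather than a manual rm -rf.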

3. Default cache lives in /tmp/fastembed_cache, which WSL wipes on reboot.

fastembed's default cache_dir is Path(tempfile.gettempdir()) / "fastembed_cache", which resolves to /tmp/fastembed_cache on Linux. CCE never overrides this — the TextEmbedding(...) call passes no cache_dir arg.

On Ubuntu under WSL with systemd=true (the default), /usr/lib/tmpfiles.d/tmp.conf contains:

D /tmp 1777 root root 30d

The D directive empties /tmp recursively whenever systemd-tmpfiles-setup.service runs, which is on every boot. So even when the model download succeeds, the next WSL restart wipes it and the next CCE start has to redownload — which can hit issue #2 again.

Configuration.md doesn't mention FASTEMBED_CACHE_PATH. The fix would be passing cache_dir=Path.home()/".cache"/"fastembed" (or anywhere persistent) to TextEmbedding, plus documenting it.
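The resolution order I'd expect, sketched below with FASTEMBED_CACHE_PATH taking priority over a persistent per-user default (the helper name is mine):

```python
import os
from pathlib import Path


def resolve_embed_cache_dir() -> Path:
    """Honor FASTEMBED_CACHE_PATH if set; otherwise fall back to a
    reboot-safe per-user cache instead of fastembed's /tmp default."""
    env = os.environ.get("FASTEMBED_CACHE_PATH")
    cache_dir = Path(env) if env else Path.home() / ".cache" / "fastembed"
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir

# CCE would then pass this explicitly, e.g.:
# TextEmbedding(model_name=..., cache_dir=str(resolve_embed_cache_dir()))
```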

4. Hook script blocks 1-2 seconds per call when serve.port is stale.

If cce serve ever dies but leaves its serve.port file behind in ~/.cce/projects/<basename>/, every hook fires curl -m 1 (or -m 2 for SessionStart) to a port nothing is listening on, waits for the full timeout, and exits silently. I timed it with a real stale port file:

$ time ~/.cce/hooks/cce_hook.sh PostToolUse < /dev/null
real    1.020s
$ time ~/.cce/hooks/cce_hook.sh SessionStart < /dev/null
real    2.012s

That's 1-2s of dead wait per Claude Code hook event. Long sessions fire hundreds of PostToolUse and UserPromptSubmit hooks, so this accumulates pretty fast. A quick bash -c "exec 3<>/dev/tcp/127.0.0.1/$PORT" 2>/dev/null liveness probe before the curl would skip the wait. Or cce serve should remove its port file on shutdown.
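An equivalent probe sketched in Python, in case the hook logic ever moves out of shell (the bash /dev/tcp trick above does the same thing; the function name is mine):

```python
import socket


def port_alive(port: int, host: str = "127.0.0.1", timeout: float = 0.1) -> bool:
    """TCP-connect liveness probe: a dead local port fails in milliseconds
    (connection refused) instead of curl waiting out its full -m timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```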

5. First context_search MCP call silently triggers a full project re-index if the index is empty.

I verified this by talking to a fresh cce serve directly via MCP stdio. With an empty project (no cce init, 0 chunks in the vector store), sending tools/call: context_search enters _ensure_indexed() at integration/mcp_server.py:886-903, which silently calls run_indexing(self._config, self._project_dir, full=False). The MCP search request blocks while indexing runs and only returns once it's done.

For a tiny project (3 files, ~9 chunks) this was just a one-time embedder load (cce serve RSS jumped 310 → 421 MB) with no forkserver pool. For a project the size of mine I'd expect it to spawn the same 4-worker pool that cce index does — same code path. I didn't directly measure that case because I didn't want to risk OOMing my WSL while testing.

The user-facing problem is the silence. From the MCP client's side it looks like context_search is "taking unusually long" with no progress indicator and no warning that an indexing pass is happening underneath. A response like "Index empty; indexing in background, retry in ~N seconds" would be much friendlier than blocking silently.
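The non-blocking behavior I'd expect, sketched under the assumption that _ensure_indexed() can hand off to a background thread (the class and field names below are mine, not the actual mcp_server.py API):

```python
import threading


class IndexGate:
    """Answer search calls immediately while a first-time index runs in
    the background, instead of blocking the MCP request until it's done."""

    def __init__(self, run_indexing):
        self._run_indexing = run_indexing
        self._lock = threading.Lock()
        self._started = False
        self._done = threading.Event()

    def ensure_indexed(self) -> dict:
        if self._done.is_set():
            return {"status": "ready"}
        with self._lock:
            if not self._started:
                self._started = True
                threading.Thread(target=self._index, daemon=True).start()
        # Tell the MCP client what's happening instead of silently blocking.
        return {"status": "indexing", "retry_after_s": 5}

    def _index(self):
        self._run_indexing()
        self._done.set()
```

Even just the status payload, without the thread handoff, would remove the "taking unusually long" mystery on the client side.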

Small related thing: cce search (CLI) doesn't inherit FASTEMBED_CACHE_PATH from cce serve's env. If I set the env in opencode's environment block for the MCP config (so cce serve uses my persistent cache), cce search from my own shell still falls back to /tmp unless I also export the var globally. Minor footgun, but it caught me out when I was testing.

What did you expect?

  1. Hook installed by cce init to work on first commit (drop the dead --changed-only flag).
  2. Model download to bound its wait and fail loudly with a useful error instead of hanging indefinitely.
  3. Cache to survive reboot (default to $HOME/.cache/fastembed or read FASTEMBED_CACHE_PATH from CCE config explicitly).
  4. Hook script to either probe the port for liveness first, or for cce serve to clean up its serve.port on exit.
  5. First context_search on an empty index to either fail with a clear "run cce init first" message, or to return an immediate "indexing in background" response instead of blocking.

Steps to reproduce

For #1 — clean repro, any project:

$ cd /any/project && cce init
$ cat .git/hooks/post-commit
#!/bin/sh
# cce hook
/path/to/cce index --changed-only >/dev/null 2>&1 &
$ cce index --changed-only
Error: No such option: --changed-only

For #2 / #3 — depends on network luck:

$ rm -rf /tmp/fastembed_cache       # or wait for WSL to reboot
$ cce serve --project-dir /any/project
# most of the time this works in a few seconds
# sometimes it hangs at "Fetching 5 files: 20%" with ss showing
# established connections to HF CDN but zero progress

When it does hang, the leftover cache contents will look like:

blobs/0d7726d0... (config.json, 706 B)
blobs/688882a7... (tokenizer.json, 711 KB)
blobs/75305659... (tokenizer_config.json, 1.2 KB)
blobs/9bbecc17... (special_tokens_map.json, 695 B)
blobs/51f1bd0...incomplete (0 B)   ← the missing ONNX

Every subsequent CCE invocation will then crash on the missing ONNX until manually cleaned.

For #4 — easy:

# start any cce serve, kill it
cce serve --project-dir /any/project &
SERVE_PID=$!
sleep 5
kill -9 $SERVE_PID
# serve.port still in ~/.cce/projects/<basename>/

time ~/.cce/hooks/cce_hook.sh PostToolUse < /dev/null
# ~1 second wasted per call

For #5 — empty project:

mkdir -p /tmp/empty-test/src
echo "def foo(): pass" > /tmp/empty-test/src/a.py
cce serve --project-dir /tmp/empty-test
# don't run cce init — leave index empty
# from another shell or MCP client, call tools/call: context_search
# watch the request block while run_indexing runs silently

Relevant logs or error output

Debug logs from the investigation (gist):
https://gist.github.com/AZagatti/7393f669a0fd785d7153e07a52a11127

Most relevant for this issue:

  • 01-syn-sent-hang.log — first stuck-download repro (IPv4 SYN-SENT)
  • 02-stuck-on-broken-cache.log — second one, 5 ESTABLISHED IPv6 sockets and 0 bytes
  • 05-healthy-serve-startup.log — what a healthy startup looks like once the model is pre-staged

Python version

3.13.5

OS

Ubuntu 24.04 LTS on WSL2 (kernel 6.6.87.2-microsoft-standard-WSL2)

CCE version

0.4.19
