perf(watchdog): batch registrations, exclude noise directories, and 9p mount exclude#1621
perf(watchdog): batch registrations, exclude noise directories, and 9p mount exclude#1621linkliti wants to merge 2 commits into
Conversation
…kip 9p mounts Add batch_watchdogs() context manager to defer observer refreshes until all watchdog registrations complete, eliminating redundant reschedule cycles during initialization. Switch from recursive=True to per-directory recursive=False scheduling with os.walk pruning so noise folders like __pycache__, .venv, and node_modules are never observed in the first place. Refresh the observer on directory create and move events to pick up new subdirectories under non-recursive watches. Extract a reusable get_noise_folders() into helpers/exclusion.py covering version control, build output, caches, and language-specific directories, replacing the hardcoded __pycache__ ignore patterns in watchdog defaults. Detect 9p remote mounts via /proc/mounts and skip them in add_watchdog since inotify produces zero events on 9p filesystems.
|
Hi @linkliti — thanks for this PR, the noise-folder exclusion and 9p mount handling are nice quality-of-life wins. I want to flag a related correctness issue I just root-caused independently in Symptom: full startup deadlock (not just "slow")On a stock build of Root cause: AB-BA lock-ordering bug
The author of the original code already knew the right pattern — py-spy evidence (from a wedged pid)MainThread holds How this PR interacts with the bugThis PR introduces the elegant if not self._batching:
self._refresh_observer()During startup (which is the only place anyone reliably hit the wedge today), that branch is skipped — so this PR eliminates the symptom at boot, and that's why it likely already "works" in your testing. However, the underlying lock-ordering bug is still latent — the gated call (and the equivalent ungated paths after Proposed amendment to this PRThree tiny dedents apply the def add(...):
with self._lock:
...
self._watches.update(watches)
self._watch_ids_by_group[id] = set(watches)
- if not self._batching:
- self._refresh_observer()
+ if not self._batching:
+ self._refresh_observer()
def remove(self, id: str) -> bool:
with self._lock:
...
- if removed and not self._batching:
- self._refresh_observer()
- return removed
+ if removed and not self._batching:
+ self._refresh_observer()
+ return removed
def clear(self) -> None:
with self._lock:
...
pending_batches = list(self._pending_batches.values())
self._pending_batches.clear()
- if not self._batching:
- self._refresh_observer()
+ if not self._batching:
+ self._refresh_observer()
for pending in pending_batches:
...Net diff: 3 lines moved up one indent level. No semantic change to your perf work; just closes the AB-BA window for the non-batch path. The new Also worth noting: the new Reproducer
If you'd prefer to keep this PR scope-locked to perfHappy to open a small follow-up PR (1 file, 6 lines) immediately after this one merges, referencing this discussion. Just let me know your preference — I have the patch and reproducer ready to go either way. Thanks again for tackling the watchdog perf work — once the lock fix is in, this whole subsystem will be substantially more robust. |
…ove/clear Dedent _refresh_observer() calls out of with self._lock: blocks in _WatchRegistry.add(), remove(), and clear() to break an AB-BA deadlock between _WatchRegistry._lock (L1) and BaseObserver._lock (L2) in `/opt/venv-a0/lib/python3.12/site-packages/watchdog/observers/api.py`. The main thread held L1 while calling unschedule_all() which needs L2; the observer dispatch thread held L2 while calling dispatch() which needs L1.
A0 startup was painfully slow on my machine. On Win 11 setup with A0 and Docker both on HDD, registering watchdogs took over 11 minutes because the observer was scheduling watches across 60k files and 8k folders (mostly because of .git and .venv).
Three changes fix this:
/a0/usris often a 9p bind mount from the Windows host. inotify (used by watchdog) produces zero events on 9p (confirmed with standalone tests: InotifyObserver fires nothing, while PollingObserver catches everything). Detecting 9p via/proc/mountsand skipping those roots inadd_watchdogdrops the total to under 1 second for WSL mount since no events produced anyway (Related: [WSL2] File changes made by Windows apps on Windows filesystem don't trigger notifications for Linux apps microsoft/WSL#4739).Benchmarks of _10_register_watchdogs.py (_time_travel plugin was disabled, but improvements also affect its watchdog):