Add lazy spaCy CLI loading and static launcher by honnibal · Pull Request #13933 · explosion/spaCy

honnibal · 2026-03-09T12:52:45Z

The spacy CLI takes ages to start the first time you run it because it loads everything, and it's still kind of slow subsequently. This has always sucked a bit, but it will suck especially in agentic coding workflows.

This PR tries to address the issue by adding a second package, spacy_cli, that will be bundled into the same PyPI distribution (spacy). The spacy entry point will be provided by the lightweight spacy_cli package so that help can run instantly.
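
To make the idea above concrete, here is a minimal sketch of a launcher that answers help requests from pre-generated text and only imports the heavy package when a real command runs. It is an illustration under stated assumptions, not the code in this PR: the `MANIFEST` contents, `static_help`, `main`, and the `load_full_cli` callback are all hypothetical names, and the help strings are placeholders.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical static manifest: help text generated ahead of time and shipped
# with the lightweight package. The strings here are placeholders.
MANIFEST: Dict[str, str] = {
    "": "Usage: spacy [OPTIONS] COMMAND [ARGS]...",
    "train": "Usage: spacy train [OPTIONS] CONFIG_PATH",
}


def static_help(argv: List[str]) -> Optional[str]:
    """Return pre-generated help text for a plain help request, else None."""
    if not argv or argv[-1] not in ("--help", "-h"):
        return None
    # The manifest key is the command path without the help flag.
    return MANIFEST.get(" ".join(argv[:-1]))


def main(
    argv: List[str],
    load_full_cli: Callable[[], Callable[[List[str]], int]],
) -> int:
    """Serve help statically when possible; otherwise load the full CLI."""
    text = static_help(argv)
    if text is not None:
        print(text)
        return 0
    # Only now pay the import cost of the full application.
    app = load_full_cli()
    return app(argv)
```

In a real launcher, `load_full_cli` would wrap the expensive import, so a help request that hits the manifest never triggers it.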

Wrap yield in try/finally in StringStore.memory_zone and Vocab.memory_zone so transient state is always cleaned up, even when an exception propagates through the context manager.
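
The pattern described above can be sketched with a toy class. This is not the StringStore or Vocab implementation; `ToyStringStore` and its attributes are hypothetical and only show why the `yield` needs to sit inside `try/finally`.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Set


class ToyStringStore:
    """Toy stand-in for a string store with a memory zone (hypothetical)."""

    def __init__(self) -> None:
        self._strings: Dict[int, str] = {}
        self._transient: Set[int] = set()
        self._in_zone = False

    def add(self, text: str) -> int:
        key = hash(text)
        if key not in self._strings:
            self._strings[key] = text
            if self._in_zone:
                # Remember entries created inside the zone for later cleanup.
                self._transient.add(key)
        return key

    def __contains__(self, text: str) -> bool:
        return hash(text) in self._strings

    @contextmanager
    def memory_zone(self) -> Iterator["ToyStringStore"]:
        self._in_zone = True
        try:
            yield self
        finally:
            # Runs on normal exit and when an exception propagates.
            for key in self._transient:
                del self._strings[key]
            self._transient.clear()
            self._in_zone = False
```

Without the `try/finally`, an exception raised inside the `with` block would skip the cleanup after `yield` and leave the transient entries and the zone flag in place.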

* change typer-slim dependency to typer
* set rich_markup_mode to None to preserve behaviour

Replace all pydantic.v1 compat imports with direct pydantic v2 imports. Migrate schemas to v2 API: ConfigDict instead of inner Config class, field_validator instead of validator, RootModel instead of __root__, model_dump() instead of dict(), model_validate() instead of parse_obj(), Annotated[str, StringConstraints()] instead of ConstrainedStr, min_length instead of min_items, populate_by_name instead of allow_population_by_field_name.

- Replace pydantic.v1 compat imports with direct v2 imports
- Replace class Config with model_config = ConfigDict(...)
- Replace @validator with @field_validator
- Replace ConstrainedStr with constr()
- Replace min_items with min_length, allow_population_by_field_name with populate_by_name
- Add model_rebuild() calls in __init__.py for forward ref resolution
- Update test error type assertions for v2

…pe annotation
- Update expected error counts in test_pattern_validation.py for pydantic v2 (v2 reports errors for all union members, increasing counts for OP and nested pattern validation)
- Fix AttributeRulerPatternType to include List[MatcherPatternType] in the union (v2 is strict about nested list-of-list-of-dict types that v1 accepted laxly)

- requirements.txt: remove black, isort, flake8; add ruff
- pyproject.toml: replace [tool.isort] with [tool.ruff] config
- setup.cfg: remove [flake8] section (rules moved to pyproject.toml)
- .pre-commit-config.yaml: replace black/flake8 hooks with ruff/ruff-format

Use confection v1.3 and Thinc v8.3.13, which implement custom validation logic in place of Pydantic, allowing us to properly adopt Pydantic v2 and provide full Python 3.14 support.

Our dependency tree used Pydantic v1 in unusual ways, and relied on behaviours that Pydantic v2 reformed. In the time since Pydantic v2 was released there were a few attempts to migrate over to it, but the task has been complicated by the fact that the confection library has a fairly tangled implementation and I had reduced availability for open-source work in 2024 and 2025.

Specifically, our library confection provides the extensible configuration system we use in spaCy and Thinc. The config system allows you to refer to values that will be supplied by arbitrary functions that, for example, define some neural network model or its sublayers. The functionality in confection is complicated because we aggressively prioritised user experience in the specification, even if it required increased implementation complexity. Confection's original implementation built a dynamic Pydantic v1 schema for function-supplied values ("promises"). We validate the schema before calling any promises, and then validate the schema again after calling all the promises and substituting in their values. The variable-interpolation system adds further difficulties to the implementation, and we have to do it all while subclassing the Python built-in configparser, which ties us to implementation choices I'd do differently if I had a clean slate.

Here's one summary of the Pydantic v1-specific behaviours that made the migration to v2 particularly difficult for us. This particular summary was produced during a session with Claude Code Opus 4.6, so some details of it might be wrong. The full history of attempts at doing this spans several refactors separated by a few months at a time, so I don't have a full record of all the things that I struggled with.

The core problem we kept hitting: Pydantic v2 compiles validation schemas upfront and has much stricter immutability. The whole session has been a series of workarounds for this:

```
1. Schema mutation: v1 let you mutate __fields__ in place; v2 needs model_rebuild() which loses forward ref namespaces, or create_model subclasses which don't propagate to parent schemas.
2. model_dump vs dict: v2 converts dataclasses to dicts, breaking resolved objects. Needed a custom _model_to_dict helper.
3. model_construct drops extras: v2 silently drops fields with extra="forbid", needed manual workarounds.
4. Strict coercion: v2 coerces ndarray to List[Floats1d] via iteration, needed strict=True.
5. Forward refs: Every schema with TYPE_CHECKING imports needs model_rebuild() with the right namespace, and that breaks when confection re-rebuilds later.

In order to adjust for behavioural differences like this, I'd refactored confection to build the different versions of the schema in multiple passes, instead of building all the representations together as we'd been doing. However this refactor itself had problems, further complicating the migration.
```

~I've now bitten the bullet and rolled back the refactor I'd been attempting of confection, and instead replaced the Pydantic validation with custom logic. This allows Confection to remove Pydantic as a dependency entirely.~

Update: Actually I went back and got the refactor working. All much nicer now.

I've gone to some lengths to explain this because migrating off a dependency after breaking changes can be a sensitive topic. I want to stress that the changes Pydantic made from v1 to v2 are very good, and I greatly appreciate them as a user of FastAPI in our services. It would be very bad for the ecosystem if Pydantic pinned themselves to exactly matching the behaviours they had in v1 just to avoid breaking support for the sort of thing we'd been doing. Instead, users like us who were relying on those behaviours should just find some way to adapt: either vendor the v1 version we need, or change our behaviours, or implement an alternative. I would have liked to do this sooner but we've ultimately gone with the third option.
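
For anyone unfamiliar with the "promises" idea described above, here is a deliberately tiny sketch of the two-pass flow: check the unresolved config, call the functions, then check the resolved values. It is not confection's implementation or API. The `"@function"` key, `REGISTRY`, and every function name here are hypothetical, and the checks are plain Python rather than a schema library.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping names to functions that supply config values.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = func
        return func

    return decorator


@register("make_range.v1")
def make_range(start: int, stop: int) -> list:
    return list(range(start, stop))


def check_promises(node: Any) -> None:
    """First pass: every promise must name a registered function."""
    if not isinstance(node, dict):
        return
    name = node.get("@function")
    if name is not None and name not in REGISTRY:
        raise ValueError(f"Unknown function: {name!r}")
    for key, value in node.items():
        if key != "@function":
            check_promises(value)


def resolve(node: Any) -> Any:
    """Call promises bottom-up and substitute their return values."""
    if not isinstance(node, dict):
        return node
    children = {k: resolve(v) for k, v in node.items() if k != "@function"}
    if "@function" in node:
        return REGISTRY[node["@function"]](**children)
    return children


def check_resolved(section: Dict[str, Any], expected: Dict[str, type]) -> None:
    """Second pass: resolved values must have the expected types."""
    for key, expected_type in expected.items():
        if not isinstance(section.get(key), expected_type):
            raise TypeError(f"{key!r} should be {expected_type.__name__}")
```

A config such as `{"data": {"values": {"@function": "make_range.v1", "start": 0, "stop": 3}}}` passes `check_promises`, resolves to a plain list under `values`, and can then be checked again with `check_resolved`.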

- setup.py: rename loop variable shadowing parameter (B020)
- _util.py: remove unused registry import (F401), use specific except clause (E722, B904)
- test_cli_app.py: use dict literals instead of dict() (C408)
- main.py: extract _try_static_group to reduce complexity (C901)

honnibal and others added 30 commits March 9, 2026 13:12

Add lazy spaCy CLI loading and static launcher

aa17eb9

Fix lazy load on modules where the function shadows

c7d7a72

Update manifest

126deac

Fix manifest

5c559fc

fix: ensure memory_zone cleanup runs on exception (#13924) (#13932)

cfa1d3a

Update test_cli_launcher

4168448

Switch dependency back from typer-slim to typer (#13922)

37b4a74

Require weasel 1.0

2f6142b

Allow use of uv as a fallback to pip in spacy download

c6c78d6

Require confection

ed20f79

Fix vuln scan by not calling test file requirements requirements.txt

f22ff91

isort

d41afc2

Escape braces in TokenPatternOperatorMinMax regex for Rust regex engine

2afc3fd

Revert to confection <1 and allow pydantic v1

60a19cb

Revert to weasel <0.5

4f19800

Revert pydantic v2 migration, restore v1 compat imports

3154ede

Update spaCy pydantic imports from v1 compat to v2 native API

d5f67dc

Increment version

f835985

Remove W503 from ruff ignore list (not a valid ruff rule)

adeb162

Fix ruff isort config: replace unsupported profile with equivalent settings

a7f629b

Format with ruff

32c4b63

Update CI validation workflow: replace black, isort, flake8 with ruff

79b5f81

Limit CI ruff lint to isort-only checks for now

86f7ce3

Autofix autofixable things from ruff

47b5504

Apply ruff formatting to 8 files

8e6bd6d

Fix import sorting for ruff isort compliance

24255bd

honnibal added 13 commits March 23, 2026 13:45

Add lazy spaCy CLI loading and static launcher

188c90d

Fix lazy load on modules where the function shadows

0a45289

Update manifest

f0abcf7

Fix manifest

8a318db

Update test_cli_launcher

4967496

Merge branch 'faster-cli' of https://github.com/explosion/spaCy into faster-cli

7bb0938

Fix import sorting (ruff I001) for CI validation

c5bcffd

Add local lint script matching CI validate + mypy checks

ec786c8

Regenerate CLI manifest for typer 0.24.1 plain-text output

cb67fe1

Debug: dump manifest diff in test_manifest_is_current

1041f8b

Debug: show per-key manifest diffs in test_manifest_is_current

66b1691

honnibal wants to merge 43 commits into master from faster-cli

3 participants
