
fix: resolve Python 3.14 SegFault and modernize PyO3 implementation #9

Open
ElissonRodrigues wants to merge 16 commits into qdrant:add-python3.14 from ElissonRodrigues:master

@ElissonRodrigues

Summary

This PR fixes the segmentation fault that occurs on Python 3.14.3 during stemmer initialization and modernizes the Rust-Python interface. It also adds type stubs for a better developer experience.

Context

This PR addresses the root cause of the Segmentation Fault reported in qdrant/fastembed#618 when running on Python 3.14. While the issue was observed in fastembed, the underlying crash occurs within the py_rust_stemmers extension during initialization.

Key Changes

  • Python 3.14 Compatibility: Upgraded to PyO3 0.28.2 and enabled the Python Stable ABI (abi3-py38), so a single build targets Python 3.8 and later without requiring recompilation for each release.
  • Robust Error Handling: Replaced panic! with a PyResult<Self> return type in the SnowballStemmer constructor. Invalid language inputs now raise a standard ValueError in Python instead of crashing the process.
  • Modern Bound API: Refactored the implementation to use the modern PyO3 Bound API, satisfying stricter trait requirements and improving memory safety.
  • IDE Support: Added Type Stubs (.pyi) and the py.typed marker (PEP 561) to enable autocompletion and type checking in IDEs and linters.
  • Test Suite Restoration: Fixed the unit test suite and resolved a TypeError in speedtest.py related to Python 3 string/bytes handling in the comparison baseline.
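
The exact failure in speedtest.py is not reproduced in this description. As a minimal sketch of the class of error involved, assuming the comparison baseline ended up mixing str and bytes tokens, the snippet below shows how Python 3 rejects the mix and one way to normalize the input first. The helper `join_words` and the sample data are hypothetical and are not taken from the repository.

```python
# Hypothetical illustration of a Python 3 str/bytes TypeError;
# this is not the actual speedtest.py code.

def join_words(words):
    """Join tokens with spaces, accepting either str or UTF-8 encoded bytes."""
    normalized = [
        w.decode("utf-8") if isinstance(w, bytes) else w
        for w in words
    ]
    return " ".join(normalized)


# Sample input that mixes str and bytes tokens (assumed for illustration).
mixed = ["running", b"jumps", "easily"]

# Python 3 never converts bytes to str implicitly, so str.join raises TypeError.
try:
    " ".join(mixed)
    raised = False
except TypeError:
    raised = True

# Decoding bytes up front gives the join a uniform list of str.
result = join_words(mixed)
print(raised, result)
```

Normalizing to str at the boundary keeps the baseline and the Rust-backed stemmer working on the same input type, which matters when comparing their outputs.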

Verification

  • Manually verified on Python 3.14.3 (Linux/WSL).
  • All unit tests pass (tests/test_py_rust_stemmers.py).
  • Benchmarks (speedtest.py and benchmark_for_quantile.py) verified for performance parity.

Performance

The Rust implementation remains roughly 35x to 40x faster than the pure Python snowballstemmer baseline in the benchmark runs.
