Releases: SamoraHunter/pat2vec

v0.3.1

10 Apr 13:24

Release v0.3.0: Elasticsearch Testing & Data Safety

10 Apr 12:04

🚀 What's New in v0.3.0

This release focuses on hardening the testing pipeline and improving data safety when interacting with Elasticsearch.

✨ Highlights

🔍 Integrated Elasticsearch Testing

Developers can now validate their clinical pipelines against a real, temporary Elasticsearch instance inside Docker. This replaces static mocks with actual search behavior.

🧪 Automated Synthetic Data Seeding

Includes new utilities to generate and seed realistic patient timelines into test clusters, complete with automated schema management via elastic_schemas.json.
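As a rough illustration of what a seeded patient timeline might look like, the sketch below generates time-ordered synthetic events for one patient using only the standard library. The function name `make_patient_timeline` and the `observation_value` field are hypothetical and are not taken from pat2vec; the sketch does not read `elastic_schemas.json` or talk to a cluster.

```python
import random
from datetime import datetime, timedelta


def make_patient_timeline(client_idcode, n_events, start, seed=0):
    """Return a list of synthetic, time-ordered event documents for one patient.

    Hypothetical helper for illustration; not part of the pat2vec API.
    """
    rng = random.Random(seed)  # seeded so the generated test data is reproducible
    events = []
    current = start
    for _ in range(n_events):
        # Advance by a random gap of 1 to 30 days so timestamps strictly increase.
        current = current + timedelta(days=rng.randint(1, 30))
        events.append(
            {
                "client_idcode": client_idcode,
                "timestamp": current.isoformat(),
                # Hypothetical measurement field with a plausible numeric range.
                "observation_value": round(rng.uniform(3.5, 5.5), 2),
            }
        )
    return events


timeline = make_patient_timeline("P0001", n_events=5, start=datetime(2020, 1, 1))
```

Documents shaped like these could then be bulk-indexed into a temporary test cluster; seeding the random generator keeps test runs comparable.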

🛡️ Data Ingestion Safety

Introduced strict guardrails that prevent accidental write operations to production Elasticsearch clusters during testing or development runs.
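One way such a guardrail could work is an allowlist check that runs before any write. The sketch below is an assumption about the general technique, not the pat2vec implementation: `SAFE_WRITE_HOSTS`, `UnsafeWriteError`, and `assert_safe_to_write` are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts considered safe targets for test writes.
SAFE_WRITE_HOSTS = {"localhost", "127.0.0.1", "elasticsearch-test"}


class UnsafeWriteError(RuntimeError):
    """Raised when a write targets a host outside the allowlist."""


def assert_safe_to_write(es_url):
    """Return the host if es_url is allowlisted, otherwise raise.

    Hypothetical helper for illustration; not part of the pat2vec API.
    """
    host = urlparse(es_url).hostname  # None when the URL has no scheme/netloc
    if host not in SAFE_WRITE_HOSTS:
        raise UnsafeWriteError(f"Refusing to write to non-test host: {host!r}")
    return host
```

Calling `assert_safe_to_write(url)` at the top of every seeding or indexing function makes the default behaviour "refuse", so a misconfigured URL fails loudly instead of writing to a live cluster.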

🤖 CI/CD Enhancements

Full support for running GitHub Actions workflows locally (via act), making it easier to debug complex notebook-based tests before pushing.


Full Changelog: v0.2.0...v0.3.0

v0.2.0

23 Mar 10:22

Release v0.2.0

Database Backend Implementation

This release introduces a robust database backend using SQLAlchemy, which replaces the legacy file-based system as the default storage mechanism.

New Features

  • Database Support: Added support for SQLite (default) and PostgreSQL.
    • Defaults to a local {project_name}.db SQLite database if no connection string is provided.
    • Supports in-memory SQLite for testing.
  • Schema Management: The pipeline now handles automatic table creation and schema updates for:
    • Raw Data: raw_data tables (e.g., raw_data.raw_bloods).
    • Annotations: MedCAT annotations tables.
    • Features: Feature vectors with JSON serialization for sparse/high-dimensional data.
  • Migration Utility: Added pat2vec/util/migrate_to_db.py to migrate existing file-based projects to the new database structure.
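To illustrate the idea of storing sparse feature vectors as JSON in an in-memory SQLite database, here is a minimal sketch. It uses the standard-library `sqlite3` module rather than SQLAlchemy to stay dependency-free, and the table name `features`, the column names, and the helper functions are assumptions, not the schema pat2vec creates.

```python
import json
import sqlite3

# In-memory SQLite database, as might be used in tests.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features ("
    "client_idcode TEXT PRIMARY KEY, "
    "feature_json TEXT NOT NULL)"
)


def save_features(conn, client_idcode, features):
    """Serialize a sparse feature dict to JSON and upsert it for one patient."""
    payload = json.dumps(features, sort_keys=True)
    conn.execute(
        "INSERT OR REPLACE INTO features (client_idcode, feature_json) VALUES (?, ?)",
        (client_idcode, payload),
    )
    conn.commit()


def load_features(conn, client_idcode):
    """Return the stored feature dict, or None if the patient has no row."""
    row = conn.execute(
        "SELECT feature_json FROM features WHERE client_idcode = ?",
        (client_idcode,),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Only non-empty features are stored, so wide, mostly empty vectors stay compact.
sparse = {"bloods_hb_mean": 13.2, "annotation_count_diabetes": 3}  # hypothetical names
save_features(conn, "P0001", sparse)
restored = load_features(conn, "P0001")
```

Storing one JSON document per patient avoids creating a column for every possible feature, at the cost of not being able to filter on individual features in plain SQL.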

Configuration Changes

  • Added storage_backend option to config_class (values: 'database', 'file').
  • Added db_connection_string option to config_class.
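The interaction between these two options and the default SQLite file can be sketched as follows. `StorageConfig` is a hypothetical stand-in and not the real `config_class`; the `sqlite:///` prefix is SQLAlchemy's URL form for a file-based SQLite database, and it is an assumption that pat2vec builds its default in this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageConfig:
    """Hypothetical stand-in for the storage-related options of config_class."""

    project_name: str
    storage_backend: str = "database"  # 'database' or 'file'
    db_connection_string: Optional[str] = None

    def __post_init__(self):
        if self.storage_backend not in ("database", "file"):
            raise ValueError(
                f"storage_backend must be 'database' or 'file', got {self.storage_backend!r}"
            )

    def resolved_connection_string(self):
        """Return the connection string to use, or None for the file backend."""
        if self.storage_backend == "file":
            return None
        if self.db_connection_string:
            return self.db_connection_string
        # Fall back to a local SQLite file named after the project.
        return f"sqlite:///{self.project_name}.db"


default_cfg = StorageConfig(project_name="my_project")
explicit_cfg = StorageConfig(
    project_name="my_project",
    db_connection_string="sqlite:///:memory:",  # in-memory SQLite for testing
)
```

Validating `storage_backend` at construction time surfaces a typo in the configuration before any data is read or written.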

Technical Improvements

  • Centralized Data Retrieval: Implemented get_df_from_db and updated retrieve_patient_data to abstract data access.
  • Performance: Implemented batch insertion and automatic index creation on primary keys (e.g., client_idcode, timestamps) to improve query performance.
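Batch insertion and index creation can be illustrated with a short sketch. It uses the standard-library `sqlite3` module instead of SQLAlchemy, an unqualified `raw_bloods` table rather than `raw_data.raw_bloods`, and a hypothetical index name and `value` column; it shows the general technique, not pat2vec's own code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_bloods (client_idcode TEXT, timestamp TEXT, value REAL)"
)

# 1000 synthetic rows spread over 100 patients and 28 days.
rows = [
    (f"P{i % 100:04d}", f"2020-01-{(i % 28) + 1:02d}T00:00:00", float(i))
    for i in range(1000)
]

with conn:  # one transaction: commits on success, rolls back on error
    # executemany sends all rows in a single batch instead of one call per row.
    conn.executemany("INSERT INTO raw_bloods VALUES (?, ?, ?)", rows)
    # Index the columns used to look up a patient's records within a time window.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_raw_bloods_patient_time "
        "ON raw_bloods (client_idcode, timestamp)"
    )

row_count = conn.execute("SELECT COUNT(*) FROM raw_bloods").fetchone()[0]
# PRAGMA index_list returns one row per index; column 1 is the index name.
index_names = [r[1] for r in conn.execute("PRAGMA index_list('raw_bloods')")]
```

Wrapping the batch in a single transaction avoids a commit per row, which is typically the main cost of row-by-row inserts in SQLite.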

First public release of pat2vec

22 Sep 21:33
