@WubbzyFromWuzzleburg

Contributor: Elliott Huang (elliott500800@gmail.com)

Contribution Type: New Dataset + Documentation + Example Script

Description:
This PR adds support for the KaggleERN (INRIA BCI Challenge) EEG dataset in PyHealth. It introduces a KaggleERNDataset class that validates the expected raw folder structure and provides an offline preprocessing utility to convert raw EEG CSV files into fixed-length epoch/window pickle files for downstream training (keeping the pickle schema compatible with the provided fine-tuning workflow). This PR also updates the API documentation to include the new dataset and adds an end-to-end fine-tuning example for EEGPT on KaggleERN, including instructions for downloading and placing the pretrained EEGPT checkpoint.
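For readers unfamiliar with the windowing step, here is a minimal sketch of turning per-sample CSV rows into fixed-length window pickles. It is illustrative only and is not the code in pyhealth/datasets/kaggleern.py: the helper name `window_signal`, the channel names, the window size and stride, and the pickle schema are all assumptions made for this example.

```python
import csv
import io
import pickle


def window_signal(rows, window_size, stride):
    """Split a list of per-sample rows into fixed-length windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    windows = []
    for start in range(0, len(rows) - window_size + 1, stride):
        windows.append(rows[start:start + window_size])
    return windows


# Hypothetical raw CSV: one row per time sample, one column per channel.
raw_csv = "Fp1,Fp2\n" + "\n".join(f"{i}.0,{i + 0.5}" for i in range(10))

reader = csv.reader(io.StringIO(raw_csv))
header = next(reader)
samples = [[float(v) for v in row] for row in reader]

# 10 samples, window of 4, stride of 2.
windows = window_signal(samples, window_size=4, stride=2)

# Serialize each window as a dict; this schema is illustrative only.
payloads = [
    pickle.dumps({"channels": header, "signal": w, "index": i})
    for i, w in enumerate(windows)
]
restored = pickle.loads(payloads[0])
```

Every window has the same length, so a downstream loader can stack them without padding; the real preprocessing may choose a different way to handle the incomplete trailing window.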

Files to Review:

  • pyhealth/datasets/kaggleern.py — KaggleERN dataset class + offline preprocessing to window pickle format
  • pyhealth/datasets/configs/kaggleern.yaml — Dataset config entry (tables/attributes schema)
  • pyhealth/datasets/__init__.py — Expose KaggleERNDataset in the datasets namespace
  • examples/kaggleern_finetune_EEGPT.py — Fine-tuning example script (paths are placeholders; includes pretrained checkpoint notes)
  • docs/api/datasets.rst — Register KaggleERN in dataset API docs index
  • docs/api/datasets/pyhealth.datasets.KaggleERNDataset.rst — New API doc page for pyhealth.datasets.KaggleERNDataset
  • tests/core/test_kaggleern.py — Unit tests for dataset verification and optional preprocessing integration

@Logiquo added the "component: dataset" label (Contribute a new dataset to PyHealth) on Dec 18, 2025

@Logiquo (Collaborator) commented Dec 24, 2025

LGTM, but the code probably needs to be updated to support the newest changes.

@Logiquo (Collaborator) commented Dec 27, 2025

The test failed with the following error: TypeError: KaggleERNPreprocessConfig.__init__() got an unexpected keyword argument 'pipeline'
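One common way this kind of TypeError arises, though not necessarily the cause here, is a caller passing a keyword that the config class does not declare as a field. The sketch below reproduces the pattern with a hypothetical dataclass; `PreprocessConfig`, its fields, and `build_config` are invented for illustration and do not reflect the real KaggleERNPreprocessConfig.

```python
from dataclasses import dataclass


@dataclass
class PreprocessConfig:
    # Hypothetical fields; the real KaggleERNPreprocessConfig may differ.
    window_seconds: float = 1.0
    sampling_rate: int = 200


def build_config(**kwargs):
    """Return (config, None) on success or (None, error message) on failure."""
    try:
        return PreprocessConfig(**kwargs), None
    except TypeError as exc:
        return None, str(exc)


# 'pipeline' is not a declared field, so the generated __init__ rejects it.
config, error = build_config(window_seconds=1.0, pipeline="default")
```

If that is what is happening, the fix is either to add the field to the config or to stop passing the keyword at the call site, depending on which side changed.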

@Logiquo added the "status: wait response" label (Pending PR author's response) on Dec 27, 2025
@Logiquo self-requested a review on December 27, 2025 10:25