feat: add PTBXLDataset and PTBXLMultilabelClassification task #2
anuragd-UIUC wants to merge 10 commits into jtwells2:master from
sl4mmy
left a comment
Left a few questions inline regarding the config file & dataset implementation, but otherwise this looks good to me. 👍
| @@ -0,0 +1,13 @@ | |||
| version: "1.0.0" | |||
This looks fine, but we should confirm with @jtwells2 (I didn't see a config file in his recent commits).
Going to manually deconflict this. I'd rather not use the YAML file at all.
| @@ -1,208 +1,277 @@ | |||
| import pandas as pd | |||
| from pathlib import Path | |||
| """PTB-XL ECG Dataset for PyHealth. | |||
I think I created a branch and didn't run git pull before pushing my changes. I will update the repo with the latest changes.
99c7a58 to d5e5ea6
anuragd-UIUC
left a comment
Earlier I was using a stubbed file; I have now rebased the ptbxl.py file.
| @@ -0,0 +1,13 @@ | |||
| version: "1.0.0" | |||
Going to manually deconflict this. I'd rather not use the YAML file at all.
I had to make this change to test PTBXLDataset. Pull the latest version; it should be there.
Examples need to go into the examples folder, anything else should be removed from this PR.
Examples need to go into the examples folder, anything else should be removed from this PR.
Examples need to go into the examples folder, anything else should be removed from this PR.
This change has already been made; please remove it.
Please remove this from the PR; we don't need it after the current PTBXL updates.
As discussed, I will cover the dataset testing. This needs to become a test for the multilabel classification task.
- pyhealth/datasets/ptbxl.py: BaseSignalDataset subclass for PTB-XL v1.0.3
- pyhealth/tasks/ptbxl_multilabel_classification.py: 5-class superdiagnostic task
- examples/ptbxl_superdiagnostic_sparcnet.ipynb: ablation study (SparcNet vs BiLSTM)
- tests/core/test_ptbxl.py: unit tests
- cs598_project/: CS-598 course project pipeline notebook

Results: SparcNet ROC-AUC 0.9278, BiLSTMECG ROC-AUC 0.9155 on PTB-XL test set
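For readers unfamiliar with the 5-class superdiagnostic task, here is a minimal sketch of how a multilabel target might be encoded as a multi-hot vector. The class order [NORM, MI, STTC, CD, HYP] is taken from the commit notes in this PR; the helper `to_multi_hot` and its input format are hypothetical and are not the task's actual code.

```python
# Class order as stated in this PR's commit notes; everything else is illustrative.
SUPERDIAG_CLASSES = ["NORM", "MI", "STTC", "CD", "HYP"]


def to_multi_hot(superclasses):
    """Return a 5-element 0/1 list marking which superclasses are present.

    `superclasses` is a hypothetical input: an iterable of superclass names
    already resolved for one recording.
    """
    present = set(superclasses)
    unknown = present - set(SUPERDIAG_CLASSES)
    if unknown:
        raise ValueError(f"unknown superclasses: {sorted(unknown)}")
    # One slot per class, in the fixed class order.
    return [1 if name in present else 0 for name in SUPERDIAG_CLASSES]


# A recording carrying both MI and CD sets two slots at once,
# which is what makes the task multilabel rather than multiclass.
example = to_multi_hot(["MI", "CD"])
```

Keeping the class order fixed matters because per-class metrics such as ROC-AUC are computed column by column.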
…n design
- SNOMED_TO_SUPERDIAG: use jtwells2's 46-code clinically correct mapping (from pyhealth/tasks/ptbxl_multilabel_classification.py)
- Signal: full 10 s at 100 Hz (decimate 500->100 Hz per jtwells2's signal[:, ::5]) → shape (12, 1000) instead of old (12, 1250) windowed slices
- Schema: 'labels' key (plural) matching jtwells2's PTBXLMultilabelClassification output
- Samples: 21,767 (1 per recording) vs old 152,859 (7 windows per recording)
- SUPERDIAG_CLASSES ordering: [NORM, MI, STTC, CD, HYP] per jtwells2
- Cache: cinc_100hz/ instead of old windows/ directory
- Fix: set dataset.refresh_cache=True to overwrite stale BaseSignalDataset cache
- Update all downstream cells: split, DataLoaders, SparcNet, BiLSTMECG
- Both models trained 5 epochs on CPU; pipeline fully validated end-to-end
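The signal change described above can be checked with a short sketch. It assumes a 12-lead, 10-second recording sampled at 500 Hz held in a NumPy array, and applies the `signal[:, ::5]` slicing quoted in the commit notes. The constants and the function name `decimate_by_slicing` are illustrative, and the zero-filled array is a placeholder rather than real ECG data.

```python
import numpy as np

# Figures taken from the commit notes; the names are illustrative.
SOURCE_HZ = 500
TARGET_HZ = 100
DURATION_S = 10
N_LEADS = 12


def decimate_by_slicing(signal, factor=SOURCE_HZ // TARGET_HZ):
    """Keep every `factor`-th sample along the time axis.

    This is plain stride slicing: no anti-aliasing filter is applied.
    """
    return signal[:, ::factor]


# Placeholder recording of shape (leads, samples) = (12, 5000).
raw = np.zeros((N_LEADS, SOURCE_HZ * DURATION_S), dtype=np.float32)
downsampled = decimate_by_slicing(raw)
```

With these assumptions `downsampled.shape` is `(12, 1000)`, matching the shape in the commit notes. Because stride slicing does no low-pass filtering, a filtered resampler such as `scipy.signal.decimate` may be worth considering if aliasing turns out to matter.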
033fed6 to 6ed6a48