Starting from 2.0a11, PyHealth uses a disk-based, memory-efficient dataset to reduce memory usage for large datasets such as MIMIC4.
This issue tracks any potential bugs or improvements required for the new memory-efficient dataset.
Improvements
Add option to cache transformed data from processors and skip pipeline entirely #783
.set_task gives write cache to the same directory for the same task with different configuration #764
Update the default task cache path to include task parameter names and values #766
Further optimization on task transformation. #750
Multiprocess task transformation #748
n_worker for dask. Add num_workers to BaseDataset #743
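
For context on the cache-path items (#764, #766): the sketch below shows one possible way to keep two configurations of the same task from sharing a cache directory. It is not PyHealth's implementation. The helper name `task_cache_dir`, the task name `mortality`, and the parameters `window_days` and `min_visits` are all hypothetical. It also hashes the parameters rather than spelling their names and values out in the path, which is a deliberate simplification to keep directory names short.

```python
import hashlib
import json
from pathlib import Path


def task_cache_dir(base_dir, task_name, params):
    """Return a cache directory that differs per task configuration.

    Hypothetical helper for illustration only; not part of PyHealth's API.
    """
    # Sort keys so that the order in which parameters are given
    # does not change the resulting path.
    canonical = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return Path(base_dir) / f"{task_name}_{digest}"


# Same configuration, different key order: same directory.
a = task_cache_dir("cache", "mortality", {"window_days": 30, "min_visits": 2})
b = task_cache_dir("cache", "mortality", {"min_visits": 2, "window_days": 30})

# Different configuration: different directory.
c = task_cache_dir("cache", "mortality", {"window_days": 90, "min_visits": 2})
```

With a scheme like this, rerunning a task with changed parameters writes to a fresh directory instead of silently reusing or overwriting the earlier result.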
Bugs
.clear_cache and .clear_task_cache method to avoid the need to manually delete the cache. #765
Add clear_cache and clear_task_cache methods to BaseDataset #770
Fix the code hanging at set_task if any of the workers has 0 samples written #784
Clean up tmpdir correctly, cache task transformation result, and better notebook support. #753
Fix incorrect null handling for patient_id and timestamp #746
Fix/processors fit process #744