Background
Running subimage registration tasks on a single workstation may take prohibitively long on massive, cloud-based image datasets. We would like to distribute registration tasks among a cluster of worker nodes so that they execute in parallel.
The itk_dreg framework is built with distributed registration in mind via streaming readers and dask.delayed tasks. However, output serialization is not fully supported in ITK v5.4rc2 or earlier.
ITK v5.4rc3 wheels will include support for unbuffered ITK images introduced in InsightSoftwareConsortium/ITK#4270. That support will allow us to serialize itk.Images describing oriented bounding boxes over which piecewise itk.Transform results are valid, which is required for distributed processing.
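To illustrate why serialization matters here: a distributed scheduler has to ship each task's output between processes as bytes, so every result must survive a serialize/deserialize round trip. The sketch below shows that check using only the standard library. The dict is a hypothetical stand-in for a per-block result (a bounding box plus transform parameters); it is not an itk_dreg or ITK type, and the field names are assumptions made for illustration.

```python
import pickle

# Hypothetical stand-in for one block's registration output:
# a bounding box (origin, size) plus transform parameters.
# These field names are illustrative, not itk_dreg API.
block_result = {
    "origin": (0.0, 0.0, 0.0),         # physical origin of the bounding box
    "size": (64, 64, 32),              # extent of the box in voxels
    "translation": (1.5, -0.25, 0.0),  # stand-in for transform parameters
}


def round_trip(obj):
    """Serialize to bytes and back, as happens when a result crosses a process boundary."""
    return pickle.loads(pickle.dumps(obj))


restored = round_trip(block_result)

# The restored object is a distinct copy that compares equal to the original.
assert restored == block_result
assert restored is not block_result
```

The same kind of round-trip check, applied to real itk.Image and itk.Transform outputs, is presumably what the serialize_pairwise_result test exercises.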
Steps to Investigate
When ITK v5.4rc3 is available on PyPI:
- Update pyproject.toml and CI workflows in itk-dreg to use the updated ITK version
- Run the localcluster and serialize_pairwise_result tests locally and verify that both tests pass
- Re-enable the localcluster and serialize_pairwise_result tests in CI and verify that automated tests pass
For further testing:
- Use dask.distributed.LocalCluster to mock a distributed cluster on your local system. Run serialized registration in an example notebook on a LocalCluster and verify that tasks are visible in the accompanying Dask dashboard.
- Set up access to a distributed cluster and test distributed registration on the cluster. (xref: Coiled, ACCESS)
- Explore Dask optimization to reduce task serialization requirements
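As a starting point for the LocalCluster step above, here is a minimal sketch of running dask.delayed tasks on a local cluster and printing the dashboard URL. The register_block function is a hypothetical placeholder for a real subimage registration task, and the worker counts are arbitrary; nothing here is taken from the itk_dreg code base.

```python
import dask
from dask.distributed import Client, LocalCluster


def register_block(block_index: int) -> dict:
    """Hypothetical stand-in for one subimage registration task."""
    # A real task would read a subimage and run registration here.
    return {"block": block_index, "status": "done"}


def run_on_local_cluster(num_blocks: int = 4) -> list:
    # processes=False runs workers as threads in this process, so the sketch
    # works in a notebook or script without a __main__ guard.
    # dashboard_address=":0" asks for any free port for the dashboard.
    with LocalCluster(
        n_workers=2,
        threads_per_worker=1,
        processes=False,
        dashboard_address=":0",
    ) as cluster, Client(cluster) as client:
        print("Dask dashboard:", client.dashboard_link)
        # Build one delayed task per block, then compute them on the cluster.
        tasks = [dask.delayed(register_block)(i) for i in range(num_blocks)]
        return list(dask.compute(*tasks))


results = run_on_local_cluster()
```

Note that in-process workers may pass results around without serializing them. To actually exercise result serialization, it is likely necessary to set processes=True (and, in a script, to run the cluster under an `if __name__ == "__main__":` guard).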