Open
Description
I just closed issue #10 based on parameterizing chained MLDataset transformations, deferring the FeatureUnion discussion there to this separate issue.
`FeatureUnion` in scikit-learn is a transformer that uses scikit-learn's parallelism (within one machine) to run several transformers on a feature matrix and concatenate their outputs, e.g. one transform for each column. `dask_searchcv` has a `FeatureUnion` based on `dask.distributed` (single- or multi-node parallelism) that follows the same usage patterns.

`FeatureUnion` is important relative to the `elm` / `xarray_filters` goals because most of the rest of our parallelism relates to tools for multiple models, where a `Pipeline`-like instance is the embarrassingly parallel task being automated. Some important workflows for our climate science and satellite imagery use cases may be slow in the per-column processing step(s), and that is where `FeatureUnion` can speed things up, e.g. a `Pipeline` with a histogram or Gaussian process on each column individually as a preprocessing step.

Also note that `FeatureUnion` is associated with scikit-learn, so people generally think of it in ML contexts, but the parallelism approach of `FeatureUnion` also has benefits outside of ML, e.g. preprocessing each column of a large array before visualization or summary stats. This is a documentation need for us in however we wrap `FeatureUnion` in `xarray_filters` / `elm`: make sure it is explained for usage inside and outside of ML contexts.
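For reference, here is a minimal sketch of the per-column pattern using only plain scikit-learn, not the eventual `xarray_filters` / `elm` wrapper, which this issue has not yet defined. The helper names `column_histogram` and `build_union` are hypothetical and exist only for illustration; the histogram binning stands in for whatever per-column preprocessing a real workflow would need.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer


def column_histogram(X, column=0, bins=4):
    """Hypothetical helper: map one column to histogram bin indices, shape (n, 1)."""
    values = X[:, column]
    edges = np.histogram_bin_edges(values, bins=bins)
    # Use interior edges only, so the indices fall in 0..bins-1.
    return np.digitize(values, edges[1:-1]).reshape(-1, 1)


def build_union(n_columns, n_jobs=2):
    """Hypothetical helper: one transformer per column, run in parallel by FeatureUnion."""
    steps = [
        (f"hist_{i}", FunctionTransformer(column_histogram, kw_args={"column": i}))
        for i in range(n_columns)
    ]
    return FeatureUnion(steps, n_jobs=n_jobs)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

union = build_union(X.shape[1])
# Each transformer handles one column; the outputs are concatenated column-wise.
binned = union.fit_transform(X)
print(binned.shape)
```

Nothing here requires a downstream estimator: the `binned` array could just as well feed a plot or summary statistics, which is the non-ML usage the documentation should cover.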