Proposal: Hierarchical Dimensionality Reduction module  #729

Author of Proposal: Dylan Stewart

Reason or Problem

A common issue with multi-dimensional raster image processing (at the extremes, hyperspectral imagery with hundreds of features) is significant redundancy within the feature space. Some datasets have tens or hundreds of bands when only a handful might be necessary for downstream use (e.g., classification, segmentation, clustering).

Proposal

This module takes high-dimensional data and either a desired number of output channels or a threshold, compares the distributions of the features within the data, and iteratively merges the most similar features so that the returned grouping contains the most mutually dissimilar features.

Design:

  1. Given a dataset containing $N$ pixels and $F$ features, produce a pairwise-distance matrix:
    $$C \in \mathbb{R}^{F \times F},$$
    where $C_{ij}$ is the distance between features $i$ and $j$. It can be computed using various metrics (e.g., Jensen-Shannon divergence, a symmetric Kullback-Leibler divergence, Mahalanobis distance (see "Add Mahalanobis Distance Metric", #114), Euclidean distance) evaluated over the distribution of pixels within the dataset.
  2. Then, select the most similar pair of features (or spectra) by finding the minimum (for a distance/divergence measure) or maximum (similarity measure, e.g., mutual information or cosine similarity) and merge them by a specified aggregation (e.g., mean, median, max, min).
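  3. Update $C$ to reflect the merge in step 2 and repeat until a stopping criterion is met (e.g., the desired number of output channels or the threshold is reached). Return the dataset with reduced dimensionality.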
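
The three steps above can be sketched roughly as follows. This is only an illustration under several assumptions, not the proposed implementation: the function names (`js_divergence`, `pairwise_distance`, `hierarchical_reduce`), the `bins` and `agg` parameters, and the choice of histogram-based Jensen-Shannon divergence are all hypothetical, the input is assumed to be already flattened to an `(N, F)` array, and only the "desired number of output channels" stopping criterion is handled.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two histograms (natural log)."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return np.sum(a * np.log((a + eps) / (b + eps)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pairwise_distance(X, bins=32):
    """Step 1: F x F matrix of divergences between per-feature histograms."""
    lo, hi = X.min(), X.max()  # shared bin edges so histograms are comparable
    n_feat = X.shape[1]
    hists = [
        np.histogram(X[:, j], bins=bins, range=(lo, hi))[0].astype(float)
        for j in range(n_feat)
    ]
    C = np.full((n_feat, n_feat), np.inf)  # inf diagonal: never pick (i, i)
    for i in range(n_feat):
        for j in range(i + 1, n_feat):
            C[i, j] = C[j, i] = js_divergence(hists[i], hists[j])
    return C


def hierarchical_reduce(X, n_out, bins=32, agg=np.mean):
    """Merge the closest pair of features until n_out features remain."""
    X = np.array(X, dtype=float)  # copy, so the caller's array is untouched
    if not 1 <= n_out <= X.shape[1]:
        raise ValueError("n_out must be between 1 and the number of features")
    groups = [[j] for j in range(X.shape[1])]  # original indices per output
    while X.shape[1] > n_out:
        # Step 3, done naively: recompute C from scratch after every merge.
        C = pairwise_distance(X, bins)
        # Step 2: the most similar pair has the minimum divergence.
        i, j = np.unravel_index(np.argmin(C), C.shape)
        i, j = int(min(i, j)), int(max(i, j))
        X[:, i] = agg(X[:, [i, j]], axis=1)  # merge j into i
        X = np.delete(X, j, axis=1)
        groups[i] = groups[i] + groups.pop(j)
    return X, groups


# Example: two nearly identical bands and one distinct band.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 500)
b = rng.normal(5.0, 1.0, 500)
reduced, groups = hierarchical_reduce(np.column_stack([a, a + 0.01, b]), n_out=2)
print(reduced.shape, groups)
```

Two simplifications are worth noting. Recomputing the full matrix on every iteration keeps the sketch short, whereas a real module would presumably update only the row and column affected by the merge. Also, applying `agg` to an already-merged feature and a new one is not the same as aggregating over all original members of the group (e.g., a mean of means is not size-weighted), so the aggregation semantics would need to be decided.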

Usage: reduces the dimensionality of an input by finding correlated features and removing the redundancy among them.

Value: provides support for high-dimensional raster processing applications (e.g., data fusion, hyperspectral, multispectral).

Additional Notes or Context

Some distance metrics already available to build from:
