CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

HyperTools is a Python library for visualizing and manipulating high-dimensional data. It provides a unified interface for dimensionality reduction, data alignment, clustering, and visualization, built on top of matplotlib, scikit-learn, and seaborn.

Key Commands

Testing

  • pytest - Run all tests (invoke it from the hypertools/ directory)
  • pytest tests/test_<module>.py - Run tests for a specific module
  • pytest tests/test_<module>.py::test_<function> - Run a specific test function
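
For orientation, pytest collects functions named test_* from files named test_*.py, and the node ID in the last command selects one of them. The sketch below shows that layout. The file name tests/test_example.py and the helper center_columns are hypothetical stand-ins, not part of HyperTools.

```python
# tests/test_example.py (hypothetical file name, for illustration only)
import numpy as np


def center_columns(x):
    """Hypothetical stand-in for a function under test: subtract column means."""
    x = np.asarray(x, dtype=float)
    return x - x.mean(axis=0)


def test_center_columns_zero_mean():
    data = np.array([[1.0, 2.0], [3.0, 6.0]])
    centered = center_columns(data)
    # Every column of the centered result should average to zero
    assert np.allclose(centered.mean(axis=0), 0.0)
```

Under these assumptions, the single test would be run with pytest tests/test_example.py::test_center_columns_zero_mean.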

Development Setup

  • pip install -e . - Install in development mode
  • pip install -r requirements.txt - Install dependencies
  • pip install -r docs/doc_requirements.txt - Install documentation dependencies

Documentation

  • cd docs && make html - Build HTML documentation
  • cd docs && make clean - Clean documentation build files

Code Architecture

Core Components

DataGeometry Class (hypertools/datageometry.py)

  • Central data container that holds raw data, transformed data, and transformation parameters
  • Stores matplotlib figure/axes handles and animation objects
  • Contains normalization, reduction, and alignment model parameters
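
To make the container's role concrete, here is a minimal sketch of a class that groups the three kinds of state listed above. GeometrySketch and all of its field names are assumptions chosen to mirror the bullets. This is not the actual DataGeometry definition, which should be read from hypertools/datageometry.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class GeometrySketch:
    """Hypothetical, simplified stand-in for a DataGeometry-style container."""

    data: List[Any]                  # raw input, kept untouched
    xform_data: List[Any]            # data after the transformations were applied
    normalize: Optional[str] = None  # normalization mode, if any
    reduce: Dict[str, Any] = field(default_factory=dict)  # reduction model and parameters
    align: Dict[str, Any] = field(default_factory=dict)   # alignment model and parameters
    fig: Any = None                  # matplotlib figure handle
    ax: Any = None                   # matplotlib axes handle
    line_ani: Any = None             # animation object, when the plot is animated


geo = GeometrySketch(
    data=[[1.0, 2.0, 3.0]],
    xform_data=[[0.1, 0.2]],
    normalize="across",
    reduce={"model": "PCA", "params": {"n_components": 2}},
)
print(geo.reduce["model"])
```

Keeping the raw data next to the transformation parameters means the same settings can be inspected later or reapplied to new data.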

Main API Functions (hypertools/__init__.py)

  • plot() - Primary visualization function
  • analyze() - Wrapper that chains normalization, dimensionality reduction, and alignment
  • reduce() - Dimensionality reduction utilities
  • align() - Data alignment across datasets
  • normalize() - Data normalization
  • describe() - Data description and summary
  • cluster() - Clustering functionality
  • load() - Data loading utilities

Tools Module (hypertools/tools/)

  • align.py - Hyperalignment and Procrustes alignment
  • reduce.py - Dimensionality reduction (PCA, t-SNE, UMAP, etc.)
  • normalize.py - Data normalization methods
  • cluster.py - K-means and other clustering algorithms
  • format_data.py - Data preprocessing and formatting
  • text2mat.py - Text-to-matrix conversion
  • df2mat.py - DataFrame-to-matrix conversion
  • load.py - Data loading from various sources
  • missing_inds.py - Missing data handling
  • procrustes.py - Procrustes analysis

Plot Module (hypertools/plot/)

  • plot.py - Main plotting interface and logic
  • backend.py - matplotlib backend configuration
  • draw.py - Low-level drawing functions

External Dependencies (hypertools/_externals/)

  • ppca.py - Probabilistic Principal Component Analysis
  • srm.py - Shared Response Model

Data Flow

  1. Input Processing: Data is formatted and validated through format_data()
  2. Normalization: Optional data normalization via normalize()
  3. Alignment: Optional cross-dataset alignment via align()
  4. Dimensionality Reduction: Data is reduced via reduce()
  5. Clustering: Optional clustering via cluster()
  6. Visualization: Final plotting through plot()
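
The numeric steps of this flow (1, 2, and 4) can be sketched with NumPy alone. The function bodies below are illustrative assumptions: they borrow the step names, but they are not HyperTools' implementations, and the optional alignment, clustering, and plotting steps are left out.

```python
import numpy as np


def format_data(x):
    """Coerce the input into a list of 2-D float arrays (samples x features)."""
    arrays = x if isinstance(x, list) else [x]
    return [np.atleast_2d(np.asarray(a, dtype=float)) for a in arrays]


def normalize_across(arrays):
    """Z-score each feature using statistics pooled over all arrays."""
    stacked = np.vstack(arrays)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return [(a - mean) / std for a in arrays]


def reduce_pca(arrays, ndims=3):
    """Project every array onto the top `ndims` principal axes of the pooled data."""
    stacked = np.vstack(arrays)
    center = stacked.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(stacked - center, full_matrices=False)
    components = vt[:ndims].T
    return [(a - center) @ components for a in arrays]


rng = np.random.default_rng(0)
raw = [rng.normal(size=(50, 10)), rng.normal(size=(40, 10))]
reduced = reduce_pca(normalize_across(format_data(raw)), ndims=3)
print([r.shape for r in reduced])  # [(50, 3), (40, 3)]
```

Each stage takes and returns a list of arrays, which is what lets optional steps be skipped without changing the stages around them.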

Key Design Patterns

  • Modular Architecture: Each major operation (align, reduce, normalize, etc.) is in its own module
  • Unified Interface: All functions accept similar input formats (lists of arrays, DataFrames, etc.)
  • Flexible Data Types: Supports numpy arrays, pandas DataFrames, text data, and mixed inputs
  • Matplotlib Integration: Deep integration with matplotlib for customizable visualizations
  • Animation Support: Built-in support for animated visualizations

Development Notes

  • The package follows a functional programming style with separate modules for each operation
  • All major functions are designed to work with multiple input formats
  • The DataGeometry class serves as the central data container and state manager
  • Tests are located in the tests/ directory and follow pytest conventions
  • Documentation is built with Sphinx and uses example galleries
  • The codebase maintains compatibility with Python 3.9+

Testing Strategy

  • Unit tests for individual tools and functions
  • Integration tests for end-to-end workflows
  • Example-based testing through documentation
  • Visual regression testing for plot outputs
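
As a rough illustration of the idea behind visual regression testing, the sketch below compares a rendered image against a stored baseline as pixel arrays and passes when the root-mean-square difference stays within a tolerance. The helper images_match and its default tolerance are hypothetical. The document does not say which comparison mechanism the repository actually uses.

```python
import numpy as np


def images_match(actual, expected, tol=2.0):
    """Return True when two same-shaped images differ by at most `tol` RMS."""
    actual = np.asarray(actual, dtype=float)
    expected = np.asarray(expected, dtype=float)
    if actual.shape != expected.shape:
        return False
    rms = np.sqrt(np.mean((actual - expected) ** 2))
    return bool(rms <= tol)


baseline = np.zeros((4, 4, 3))  # stand-in for a stored reference image
same = baseline.copy()          # stand-in for an unchanged render
changed = baseline + 10.0       # stand-in for a render that drifted
print(images_match(same, baseline), images_match(changed, baseline))  # True False
```

A tolerance rather than exact equality is the usual choice here, since font rendering and anti-aliasing can vary slightly between platforms.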