Add measures pages, Stats API and dataset analysis improvements #379

Open
HelderMendes wants to merge 6 commits into openml:master from HelderMendes:feat/measures-stats-analysis

@HelderMendes
Contributor

Summary

  • Add new measures pages (data, evaluation, procedures, detail view)
  • Add Stats API hook and Next.js route for dataset statistics
  • Enable Stats API for all dataset sizes, including huge datasets
  • Enhance dataset analysis: pagination, theme-aware correlation charts, per-feature distribution logic
  • Fix parquet-wasm loading and add theme-aware distribution charts
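
As a rough illustration of the theme-aware chart styling listed above, the sketch below derives font and grid colors from the active theme. It assumes a Plotly-style layout object (the PR does not name the charting library). `chartLayoutFor`, `correlationColorscale`, and every color value are hypothetical placeholders, not the code in this branch.

```typescript
// Hypothetical helper: the PR does not show its actual styling code.
type Theme = "light" | "dark";

interface ChartLayout {
  font: { color: string };
  xaxis: { gridcolor: string };
  yaxis: { gridcolor: string };
  paper_bgcolor: string;
  plot_bgcolor: string;
}

const TRANSPARENT = "rgba(0,0,0,0)";

// Placeholder colors chosen for illustration only.
const PALETTE: Record<Theme, { text: string; grid: string }> = {
  light: { text: "#1f2937", grid: "#e5e7eb" },
  dark: { text: "#e5e7eb", grid: "#374151" },
};

// Build layout overrides so text and grid lines stay readable in either theme.
function chartLayoutFor(theme: Theme): ChartLayout {
  const { text, grid } = PALETTE[theme];
  return {
    font: { color: text },
    xaxis: { gridcolor: grid },
    yaxis: { gridcolor: grid },
    // Transparent backgrounds let the page's own background show through.
    paper_bgcolor: TRANSPARENT,
    plot_bgcolor: TRANSPARENT,
  };
}

// Diverging colorscale whose midpoint is transparent, so near-zero
// correlations blend into the page background in both themes.
function correlationColorscale(): Array<[number, string]> {
  return [
    [0, "#2563eb"],
    [0.5, TRANSPARENT],
    [1, "#dc2626"],
  ];
}
```

Keeping the palette in one lookup table means a chart only needs the current theme name to restyle itself.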

Changes

  • app-next/src/app/[locale]/(explore)/measures/ — new measures pages
  • app-next/src/components/measure/ — new measure components (header, search, stats, analysis)
  • app-next/src/app/api/datasets/[id]/stats/route.ts — Stats API route
  • app-next/src/hooks/useDatasetStats.ts — Stats API hook
  • app-next/src/components/dataset/data-analysis-section.tsx — analysis improvements
  • app-next/src/components/benchmark/ and collection/ — navigation and section components

Notes

  • Stats API integration is an initial implementation; more improvements are planned
  • Parquet-wasm loading fixed for large datasets

Test plan

  • Measures pages load correctly
  • Stats API returns data for small and large datasets
  • Distribution charts render in light and dark theme
  • Dataset analysis pagination works

This is a work in progress.

Helder Mendes and others added 6 commits February 26, 2026 23:21
…ature distribution logic

- Add pagination to Features tab (50 per page) with grid/list view toggle
- Add pagination to Distribution tab feature selector (50 per page)
- Make correlation heatmap theme-aware (dark/light mode font + grid colors)
- Use transparent background for correlation colorscale midpoint
- Show nominal distributions from metadata for large datasets, "coming soon" for numeric
- Default feature selection: target (if nominal) + up to 5 numeric features
- Remove global "coming soon" block from correlation (always show)
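
A minimal sketch of the pagination and default-selection rules listed above, assuming a simple feature shape. The `Feature` interface, `paginate`, and `defaultSelection` are hypothetical names for illustration; only the page size of 50 and the limit of 5 numeric features come from the commit message.

```typescript
// Hypothetical feature shape: the real type in the PR is not shown.
interface Feature {
  name: string;
  type: "numeric" | "nominal";
  isTarget: boolean;
}

const PAGE_SIZE = 50; // "50 per page" in the commit message
const MAX_DEFAULT_NUMERIC = 5; // "up to 5 numeric features"

// Return the items on a 1-based page, clamping the page into the valid range.
function paginate<T>(items: T[], page: number, pageSize: number = PAGE_SIZE): T[] {
  const pageCount = Math.max(1, Math.ceil(items.length / pageSize));
  const requested = Number.isFinite(page) ? Math.floor(page) : 1;
  const safePage = Math.min(Math.max(1, requested), pageCount);
  const start = (safePage - 1) * pageSize;
  return items.slice(start, start + pageSize);
}

// Default selection: the target if it is nominal, then up to five numeric
// features in their original order. A numeric target is treated like any
// other numeric feature here, which is an interpretation of the commit text.
function defaultSelection(features: Feature[]): string[] {
  const target = features.find((f) => f.isTarget && f.type === "nominal");
  const numeric = features
    .filter((f) => f.type === "numeric")
    .slice(0, MAX_DEFAULT_NUMERIC);
  return [...(target ? [target.name] : []), ...numeric.map((f) => f.name)];
}
```

Clamping the page number avoids an empty view when the feature list shrinks (for example after a search filter) while the user is on a later page.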

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Enhance dataset analysis: pagination, theme-aware correlation, per-feature distribution logic
…tems doc

- Fix parquet-wasm: call .intoIPCStream() before tableFromIPC() (fixes metadata.map error)
- Bump MAX_PARQUET_SIZE from 5MB to 10MB (allows datasets like 1590)
- Add dark/light mode styling to distribution plots (font, grid colors)
- Add loading state and fallback message to correlation heatmap
- Add OPEN_ITEMS.md documenting outstanding items for team discussion
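
The parquet-wasm fix above can be sketched as follows. To keep the example self-contained, the two library functions are passed in rather than imported: in the app they would presumably be `readParquet` from parquet-wasm and `tableFromIPC` from apache-arrow. `parquetToArrow` and the `ParquetDeps` interface are hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption.

```typescript
// Minimal structural types so the sketch does not import either library.
interface WasmTable {
  intoIPCStream(): Uint8Array;
}

interface ParquetDeps<T> {
  readParquet: (bytes: Uint8Array) => WasmTable; // e.g. readParquet from parquet-wasm
  tableFromIPC: (ipc: Uint8Array) => T; // e.g. tableFromIPC from apache-arrow
}

// Assumption: "10MB" means 10 * 1024 * 1024 bytes; the commit does not say.
const MAX_PARQUET_SIZE = 10 * 1024 * 1024;

// Convert parquet bytes to an Arrow table, or return null when the file
// exceeds the size limit and should not be parsed on the client.
function parquetToArrow<T>(bytes: Uint8Array, deps: ParquetDeps<T>): T | null {
  if (bytes.byteLength > MAX_PARQUET_SIZE) {
    return null;
  }
  const wasmTable = deps.readParquet(bytes);
  // Serialize the Wasm-side table to an Arrow IPC stream first and hand
  // those bytes to tableFromIPC. Skipping this step is what the commit
  // message reports as causing the metadata.map error.
  return deps.tableFromIPC(wasmTable.intoIPCStream());
}
```

With the real libraries this would be called as `parquetToArrow(bytes, { readParquet, tableFromIPC })` after parquet-wasm has been initialized.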

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Stats API was incorrectly disabled for huge datasets (>5GB) by passing
!isHugeDataset as the enabled parameter. The Stats API was specifically
designed to handle large datasets server-side, so it should always be enabled.

This fixes the issue where large datasets fell back to the old architecture
(parquet download or Dash iframe) instead of using the new, more efficient Stats API.
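
The before/after behavior described here can be illustrated with a small sketch. Only the `!isHugeDataset` expression comes from the commit message; the function names and the `AnalysisSource` labels are hypothetical.

```typescript
// Hypothetical labels for the two code paths described in the commit message.
type AnalysisSource = "stats-api" | "legacy-fallback";

// Before the fix: the flag was derived from dataset size, so huge datasets
// never reached the Stats API.
function statsEnabledBefore(isHugeDataset: boolean): boolean {
  return !isHugeDataset;
}

// After the fix: the Stats API is enabled regardless of dataset size.
function statsEnabledAfter(_isHugeDataset: boolean): boolean {
  return true;
}

// When the Stats API is disabled, the UI falls back to the older path
// (parquet download or Dash iframe).
function pickSource(enabled: boolean): AnalysisSource {
  return enabled ? "stats-api" : "legacy-fallback";
}
```

For a huge dataset, `pickSource(statsEnabledBefore(true))` selects the legacy fallback, while `pickSource(statsEnabledAfter(true))` selects the Stats API.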

Co-Authored-By: Claude <noreply@anthropic.com>

enable Stats API for all dataset sizes (initial implementation)
- Add useDatasetStats hook for fetching pre-computed statistics
- Add Next.js API proxy route at /api/datasets/[id]/stats
- Required for Vercel deployment and Stats API functionality
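
A framework-free sketch of the request a hook like `useDatasetStats` might make against the proxy route. Only the `/api/datasets/[id]/stats` path comes from the commit message. `statsPath`, `fetchDatasetStats`, and the `FetchLike` type are hypothetical, and the response is left as `unknown` because the PR does not describe the payload.

```typescript
// Narrow fetch signature so the sketch can run with a stub; the global
// fetch is compatible with it.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Build the proxy route path named in the commit message.
function statsPath(datasetId: number): string {
  if (!Number.isInteger(datasetId) || datasetId <= 0) {
    throw new RangeError(`Invalid dataset id: ${datasetId}`);
  }
  return `/api/datasets/${datasetId}/stats`;
}

// Fetch pre-computed statistics through the proxy route.
// The payload shape is not specified in the PR, so it stays unknown.
async function fetchDatasetStats(
  datasetId: number,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl(statsPath(datasetId));
  if (!res.ok) {
    throw new Error(`Stats request failed with status ${res.status}`);
  }
  return res.json();
}
```

Passing the fetch function in as an argument keeps the helper easy to test without a network, and a React hook could wrap it with whatever data-fetching library the app already uses.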

Co-Authored-By: Claude <noreply@anthropic.com>
@HelderMendes
Contributor Author

Part of the app-next-v3 feature branch work — splitting changes into focused PRs for easier review.
