504 changes: 504 additions & 0 deletions COMPONENT_STRUCTURE.md

Large diffs are not rendered by default.

407 changes: 407 additions & 0 deletions DEBUGGING_GUIDE.md

Large diffs are not rendered by default.

465 changes: 465 additions & 0 deletions ERROR_ANALYSIS.md

Large diffs are not rendered by default.

470 changes: 470 additions & 0 deletions LEARNING_GUIDE.md

Large diffs are not rendered by default.

535 changes: 535 additions & 0 deletions NEXTJS_LEARNING_GUIDE.md

Large diffs are not rendered by default.

167 changes: 167 additions & 0 deletions TEAM_REPORT.md
@@ -0,0 +1,167 @@
# OpenML Search Implementation - Team Report

**Date:** November 16, 2025
**Status:** ✅ Completed

---

## Summary

Migrated OpenML search from the old `/d/search`-style routes to new SEO-friendly URLs. Search now queries Elasticsearch directly, bypassing the problematic API proxy layer.

---

## What Changed

### 1. New SEO-Friendly Routes

All search pages now use clean, SEO-optimized URLs:

| Old URL | New URL | Status |
| ----------- | ----------- | ------- |
| `/d/search` | `/datasets` | ✅ Live |
| `/t/search` | `/tasks` | ✅ Live |
| `/f/search` | `/flows` | ✅ Live |
| `/r/search` | `/runs` | ✅ Live |

**Benefits:**

- Better Google indexing
- Cleaner URLs for sharing
- Improved user experience
- All old URLs automatically redirect (no broken links)
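
The redirect behavior could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `LEGACY_ROUTES` map and the `buildRedirect` helper are hypothetical names, and it assumes the legacy pages use the Next.js Pages Router, where `getServerSideProps` may return a `redirect` object.

```javascript
// Hypothetical map from legacy search routes to the new SEO-friendly routes
// (taken from the table above).
const LEGACY_ROUTES = {
  "/d/search": "/datasets",
  "/t/search": "/tasks",
  "/f/search": "/flows",
  "/r/search": "/runs",
};

// Hypothetical helper: builds the object a Pages Router page can return
// from getServerSideProps to redirect, preserving the query string.
function buildRedirect(legacyPath, query = {}) {
  const destination = LEGACY_ROUTES[legacyPath];
  if (!destination) return { notFound: true };

  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(query)) {
    // Next.js passes repeated query keys as arrays; keep every value.
    for (const v of Array.isArray(value) ? value : [value]) {
      params.append(key, v);
    }
  }
  const qs = params.toString();
  return {
    redirect: {
      destination: qs ? `${destination}?${qs}` : destination,
      permanent: true, // permanent redirect, so crawlers pick up the new URL
    },
  };
}

// Possible usage in a legacy page such as app/src/pages/d/search.js:
// export async function getServerSideProps({ query }) {
//   return buildRedirect("/d/search", query);
// }
```

A permanent redirect is assumed here because the report describes the old URLs as replaced; a temporary redirect would be the safer choice while the new routes are still being validated.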

---

### 2. Custom Elasticsearch Connector

**Problem:** The `@elastic/search-ui-elasticsearch-connector` library was returning HTML error pages instead of JSON responses.

**Solution:** Created `OpenMLSearchConnector.js`, which queries Elasticsearch 6.8.23 directly at `https://www.openml.org/es/`.

**Key Features:**

- Direct `fetch()` calls to Elasticsearch
- Handles search terms, filters, pagination, sorting, and facets
- Wraps results in Search UI's expected `{ raw: value }` format
- Strips invalid `name` fields from range aggregations
- Compatible with ES 6.8.23 query format
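
To make the features above concrete, here is a minimal sketch of how a connector might translate Search UI request state into an Elasticsearch request body and wrap hits in the `{ raw: value }` shape. The function names (`buildRequestBody`, `wrapHit`) and the searched fields (`name`, `description`) are assumptions for illustration; the actual `OpenMLSearchConnector.js` may differ.

```javascript
// Hypothetical: build an ES 6.x request body from Search UI request state.
// `searchTerm`, `current`, `resultsPerPage`, and `sortList` follow Search UI's
// request-state naming; the searched fields are assumed.
function buildRequestBody({
  searchTerm = "",
  current = 1,
  resultsPerPage = 20,
  sortList = [],
} = {}) {
  return {
    from: (current - 1) * resultsPerPage, // pagination offset
    size: resultsPerPage,
    query: searchTerm
      ? { multi_match: { query: searchTerm, fields: ["name", "description"] } }
      : { match_all: {} },
    // Each sortList entry becomes one ES sort clause, in order.
    sort: sortList.map(({ field, direction }) => ({
      [field]: { order: direction },
    })),
  };
}

// Hypothetical: wrap one ES hit so every field has the { raw: value } shape
// that Search UI result components read.
function wrapHit(hit) {
  const result = { id: { raw: hit._id } };
  for (const [key, value] of Object.entries(hit._source || {})) {
    result[key] = { raw: value };
  }
  return result;
}
```

A connector along these lines would POST the body with `fetch()` to an index's `_search` endpoint and map `hits.hits` through `wrapHit`; filters and facets are omitted here for brevity.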

**Files Updated:**

- `app/src/services/OpenMLSearchConnector.js` (new)
- `app/src/search_configs/dataConfig.js`
- `app/src/search_configs/taskConfig.js`
- `app/src/search_configs/flowConfig.js`
- `app/src/search_configs/runConfig.js`

---

### 3. Bug Fixes

| Issue | Fix | File |
| --------------------------------------- | ------------------------------------------------------------------ | ---------------------------------- |
| Elasticsearch 400 error on range facets | Removed `name` field from range aggregations | `OpenMLSearchConnector.js` |
| Sort not working | Added `sortList` array handling instead of individual fields | `OpenMLSearchConnector.js` |
| Missing values sort error | Fixed typo: `"NumberOfMissing values"` → `"NumberOfMissingValues"` | `datasets.js` |
| Description font size not applying | Changed `fontSize: "12px"` to `fontSize: "inherit"` | `Teaser.js` |
| Grid responsive layout | Set xs=12, sm=6, md=4 for proper column display | `ResultGridCard.js` |
| Container width constraints | Set width: 100% with 24px horizontal margins | `Wrapper.js`, `SearchContainer.js` |
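
The range-facet fix in the first row could look roughly like the sketch below. It assumes the facet configuration carries Search UI style ranges with a display `name`, which Elasticsearch rejects inside a `range` aggregation. The helper name `toRangeAggregation` and the example field are hypothetical.

```javascript
// Hypothetical: convert a Search UI range facet into an ES range aggregation,
// dropping the display-only `name` property from every range.
function toRangeAggregation(field, ranges) {
  return {
    range: {
      field,
      // Keep only the bounds (`from` / `to`); `name` is not a valid ES key here.
      ranges: ranges.map(({ name, ...bounds }) => bounds),
    },
  };
}

// Example with an assumed field name:
const agg = toRangeAggregation("qualities.NumberOfInstances", [
  { to: 1000, name: "Small" },
  { from: 1000, to: 100000, name: "Medium" },
  { from: 100000, name: "Large" },
]);
console.log(JSON.stringify(agg));
```

Because the names are removed before the request is sent, a connector taking this approach would need to match the returned buckets back to the configured labels, for example by index or by comparing bounds.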

---

### 4. Architecture Changes

**Before:**

```
Browser → Next.js → SearchAPIConnector → API Proxy → Elasticsearch
```

**After:**

```
Browser → Next.js → OpenMLSearchConnector → Elasticsearch (direct)
```

**Benefits:**

- One less failure point
- Faster queries (no proxy overhead)
- Easier debugging with direct ES error messages
- No dependency on Flask backend for read operations

---

## Testing Results

- ✅ **Datasets:** 24,498 results, search/filter/sort working
- ✅ **Tasks:** Search and filters functional
- ✅ **Flows:** Search and filters functional
- ✅ **Runs:** Search and filters functional

**Elasticsearch Status:**

- Cluster: `openmlelasticsearch`
- Version: 6.8.23
- Endpoint: `https://www.openml.org/es/`

---

## Files Created/Modified

### New Files

- `app/src/pages/datasets.js`
- `app/src/pages/tasks.js`
- `app/src/pages/flows.js`
- `app/src/pages/runs.js`
- `app/src/services/OpenMLSearchConnector.js`
- `app/src/components/search/DatasetSearchResults.jsx`

### Modified Files

- `app/src/pages/d/search.js` (now redirects)
- `app/src/pages/t/search.js` (now redirects)
- `app/src/pages/f/search.js` (now redirects)
- `app/src/pages/r/search.js` (now redirects)
- All config files in `app/src/search_configs/`
- `app/src/components/search/SearchContainer.js`
- `app/src/components/search/Teaser.js`
- `app/src/components/Wrapper.js`
- `app/src/components/search/ResultGridCard.js`

---

## Known Issues

None at this time. All major functionality working as expected.

---

## Next Steps (Optional)

1. **Performance:** Consider adding request caching
2. **Analytics:** Add tracking for search queries
3. **UI:** Remove debug console.log statements (currently helpful for monitoring)
4. **Tests:** Add unit tests for OpenMLSearchConnector
5. **Documentation:** Update API documentation to reflect new routes

---

## Technical Notes

- **No Docker Required:** Frontend connects directly to production Elasticsearch
- **No Flask Backend Needed:** For read-only search operations
- **Backwards Compatible:** All old URLs redirect automatically
- **SEO Optimized:** Each page has proper meta tags, Open Graph tags, and canonical URLs

---

## Contact

For questions or issues, check:

- Browser console logs (prefixed with `[OpenMLSearchConnector]`)
- Network tab for Elasticsearch requests to `https://www.openml.org/es/`
- Server terminal for Next.js errors
56 changes: 56 additions & 0 deletions app-next/.dockerignore
@@ -0,0 +1,56 @@
# Dependencies (installed fresh in container)
node_modules
node_modules.nosync

# Build output (rebuilt in container)
.next
out
build

# Version control
.git
.gitignore

# Environment files (secrets must not be baked into image)
.env
.env.*
!.env.docker.example

# IDE and editor files
.vscode
.idea
*.swp
*.swo

# OS files
.DS_Store
Thumbs.db

# Development scripts and docs
scripts/
docs/
Internal_docs/

# Test and debug files
coverage
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Docker files (prevent recursive inclusion)
Dockerfile
docker-compose.yml
.dockerignore

# Build artifacts
*.tsbuildinfo
tsconfig.tsbuildinfo

# Python artifacts
__pycache__
*.pyc
venv

# Database files
*.db
*.sqlite
109 changes: 109 additions & 0 deletions app-next/.env.docker.example
@@ -0,0 +1,109 @@
# =====================================================
# OpenML Next.js Docker Environment Variables
# =====================================================
# Copy this file to .env.docker and fill in your values:
# cp .env.docker.example .env.docker
#
# Variables prefixed with NEXT_PUBLIC_ are baked into the
# JavaScript bundle at BUILD TIME. To change them, rebuild
# the image (or pass them as --build-arg to docker compose).
# =====================================================

# -----------------------------------------------
# DATABASE (MySQL - required for production)
# -----------------------------------------------
# When MYSQL_HOST is set, the app uses MySQL instead of SQLite
MYSQL_HOST=your-mysql-host.example.com
MYSQL_USER=openml
MYSQL_PASSWORD=your-secure-password
MYSQL_DATABASE=openml
MYSQL_PORT=3306

# -----------------------------------------------
# ELASTICSEARCH (server-side, runtime)
# -----------------------------------------------
ELASTICSEARCH_URL=https://es.openml.org/
ELASTICSEARCH_SERVER=https://www.openml.org/es/
# Optional: Elasticsearch authentication
# ELASTICSEARCH_USERNAME=
# ELASTICSEARCH_PASSWORD=

# -----------------------------------------------
# ELASTICSEARCH (client-side, BUILD TIME)
# -----------------------------------------------
NEXT_PUBLIC_ENABLE_ELASTICSEARCH=true
NEXT_PUBLIC_ELASTICSEARCH_SERVER=https://www.openml.org/es
NEXT_PUBLIC_URL_ELASTICSEARCH=https://www.openml.org/es
NEXT_PUBLIC_ELASTICSEARCH_VERSION_MAYOR=8

# -----------------------------------------------
# API URLS (BUILD TIME)
# -----------------------------------------------
NEXT_PUBLIC_API_URL=https://www.openml.org
NEXT_PUBLIC_URL_API=https://www.openml.org/api/v1
NEXT_PUBLIC_URL_SITE_BACKEND=https://www.openml.org
NEXT_PUBLIC_OPENML_API_URL=https://www.openml.org

# -----------------------------------------------
# MINIO / FILE STORAGE (BUILD TIME)
# -----------------------------------------------
NEXT_PUBLIC_ENABLE_MINIO=true
NEXT_PUBLIC_URL_MINIO=https://www.openml.org/data

# -----------------------------------------------
# AUTHENTICATION (NextAuth.js)
# -----------------------------------------------
# REQUIRED: Generate with: openssl rand -base64 32
NEXTAUTH_SECRET=your-nextauth-secret-here
# REQUIRED: Full URL where the app is deployed
NEXTAUTH_URL=https://your-domain.example.com
# Optional: JWT expiry in seconds (default: 7200 = 2 hours)
# JWT_ACCESS_TOKEN_EXPIRES=7200

# -----------------------------------------------
# OAUTH: GitHub
# -----------------------------------------------
# Get from: https://github.com/settings/developers
GITHUB_ID=your-github-oauth-client-id
GITHUB_SECRET=your-github-oauth-client-secret

# -----------------------------------------------
# OAUTH: Google
# -----------------------------------------------
# Get from: https://console.cloud.google.com/apis/credentials
GOOGLE_ID=your-google-oauth-client-id
GOOGLE_SECRET=your-google-oauth-client-secret

# -----------------------------------------------
# EMAIL / SMTP
# -----------------------------------------------
SMTP_SERVER=smtp.example.com
SMTP_PORT=587
SMTP_LOGIN=your-smtp-username
SMTP_PASS=your-smtp-password
# Server hostname used in confirmation email links
EMAIL_SERVER=your-domain.example.com
EMAIL_SENDER='"OpenML" <noreply@openml.org>'

# -----------------------------------------------
# WEBAUTHN / PASSKEYS
# -----------------------------------------------
# RP_ID: domain name only (no protocol, no port)
RP_ID=your-domain.example.com
# RP_ORIGIN: full origin URL with protocol
RP_ORIGIN=https://your-domain.example.com

# -----------------------------------------------
# FLASK HYBRID PROXY
# -----------------------------------------------
# URL of the Flask backend for routes not yet migrated:
# password reset, API key regen, likes/votes, /api/v1/*
FLASK_BACKEND_URL=https://www.openml.org

# -----------------------------------------------
# OPTIONAL
# -----------------------------------------------
# GitHub API token (avoids rate limits on /api/contributors)
# GITHUB_TOKEN=ghp_your_token_here
# Flickr API key (meet-us gallery)
# FLICKR_API_KEY=your-flickr-api-key