Paper blueprint | Implementation plan | Research plan
Anomaly detection is deployed exactly where labeled anomalies are scarce, expensive, or unrepresentative: medical signals, cybersecurity, fraud, industrial sensors, predictive maintenance, scientific telemetry. A practitioner can train dozens of detectors on the available normal data, but cannot reliably choose between them without labels. Across the standard time-series benchmark (TSB-AD, 1,070 series), the gap between the best and the median model on a single dataset routinely exceeds 0.30 VUS-PR. Picking the wrong detector is the difference between catching failures and missing them.
Model selection for anomaly detection is therefore a fundamental, daily, and expensive problem that the field has not solved.
- Internal performance measures (IREOS, ASOI, score-distribution heuristics) are statistically indistinguishable from random selection in Ma & Zhao's 297-detector study (SIGKDD Explorations 2023).
- Meta-learning approaches (MetaOD NeurIPS 2021, MSAD PVLDB 2023) require a historical performance database and assume future anomalies resemble the meta-training distribution. They transfer poorly out-of-distribution.
- Synthetic-injection methods (Goswami et al., ICLR 2023 Oral) inject a single perturbation family and aggregate three signals via a heuristic Borda count. State of the art for zero-label time-series selection, but brittle when anomaly types vary.
A reliable, label-free, single-dataset selection protocol is the open problem.
Normal data carries enough structural information to rank detectors.
We introduce Leave-Cluster-Out (LCO) validation: cluster the normal training data, hold out one cluster, train the candidate on the rest, and score the holdout as a pseudo-anomaly. The detector that best separates the held-out cluster from the rest is, empirically, more likely to be the one that separates real anomalies from normal data. To our knowledge no prior published method uses cluster-holdout for AD model selection.
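The LCO protocol above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the function name `lco_score` is ours, KMeans stands in for the multi-granularity clustering step, and IsolationForest is just one example candidate.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

def lco_score(X_normal, make_detector, n_clusters=4, seed=0):
    """Leave-Cluster-Out score for one candidate detector (sketch).

    Cluster the normal data, hold out each cluster in turn, train the
    candidate on the remaining clusters, and measure how well it separates
    the held-out cluster (pseudo-anomalies) from held-in normal points.
    """
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(X_normal)
    aucs = []
    for c in range(n_clusters):
        train = X_normal[labels != c]
        held_out = X_normal[labels == c]   # pseudo-anomalies
        held_in = train[: len(held_out)]   # normal reference points
        det = make_detector().fit(train)
        # score_samples is higher for normal points, so negate it
        s_out = -det.score_samples(held_out)
        s_in = -det.score_samples(held_in)
        y = np.r_[np.ones(len(s_out)), np.zeros(len(s_in))]
        aucs.append(roc_auc_score(y, np.r_[s_out, s_in]))
    return float(np.mean(aucs))

# Rank candidates by their mean LCO separation; pick the best.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
score = lco_score(X, lambda: IsolationForest(random_state=0))
```

Ranking the full candidate pool then reduces to sorting candidates by `lco_score`; the actual method additionally stratifies folds by difficulty and averages over clustering granularities.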
LCO alone is not enough. We combine three pseudo-anomaly sources via an anomaly-type-aware rank-fusion combiner:
- LCO at multiple granularities and clustering algorithms, with model-independent fold-difficulty stratification (data-domain and latent-domain variants);
- Six structured synthetic perturbation families (point, level shift, trend, frequency, contextual, mode exclusion), disjoint from the test set;
- Prediction-residual consistency (stationarity and entropy of normal-data residuals from forecasting models).
The combiner reweights sources based on a categorical anomaly-type prior inferred from normal-data diagnostics — never from anomaly labels.
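A weighted Borda fusion of the per-source rankings could look like the sketch below. The function name and the concrete weights are hypothetical; in the actual combiner the weights come from the inferred anomaly-type prior, not hard-coded constants.

```python
import numpy as np

def weighted_borda(rankings, weights):
    """Fuse rankings of one candidate pool by weighted Borda count.

    rankings: list of arrays, each an ordering of candidate indices,
              best first (one per pseudo-anomaly source).
    weights:  per-source weights, e.g. from an anomaly-type prior.
    Returns candidate indices sorted by fused score, best first.
    """
    n = len(rankings[0])
    points = np.zeros(n)
    for rank, w in zip(rankings, weights):
        for position, cand in enumerate(rank):
            points[cand] += w * (n - 1 - position)  # best gets n-1 points
    return np.argsort(-points)

# Three sources rank 4 candidates; the LCO source is upweighted.
fused = weighted_borda(
    [np.array([2, 0, 1, 3]),   # LCO
     np.array([0, 2, 3, 1]),   # synthetic perturbations
     np.array([2, 1, 0, 3])],  # residual consistency
    weights=[0.5, 0.3, 0.2],
)
print(fused[0])  # → 2
```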
Reliable normal-data-only AD model selection unlocks production deployment in every domain where labels arrive too late or never:
- Medical monitoring — physiological signal anomaly detection without per-patient ground truth.
- Predictive maintenance — industrial sensor failure detection where failures are by definition rare.
- Cybersecurity — attack detection on telemetry that has never been attacked yet.
- Fraud monitoring — new fraud patterns that historical data does not capture.
- Scientific telemetry — instrument anomaly detection (satellite, particle physics, sensor networks).
It also closes the gap between the Ma & Zhao 2023 negative result (IPMs no better than random) and the empirical demand for a positive method.
| Metric | Target | Comparison |
|---|---|---|
| Mean VUS-PR selection regret | < 0.05 | vs. ~0.10 for default Isolation Forest |
| Top-1 selection accuracy | ≥ 30 % | random ≈ 2 % (1 / 60 candidates) |
| Beat Goswami (ICLR 2023 Oral) | ≥ 0.03 absolute | paired bootstrap p < 0.01 |
| OOD result vs MSAD (PVLDB 2023) | win by 0.018–0.027 | despite using no historical labels |
| Failure-mode characterization | explicit | clear (anomaly type × diagnostic) cells |
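Selection regret, the headline metric in the table, is the gap to the per-dataset oracle. A sketch with illustrative VUS-PR values (the function name is ours):

```python
def selection_regret(perf_by_candidate, selected):
    """Regret = oracle-best VUS-PR minus VUS-PR of the selected candidate."""
    return max(perf_by_candidate.values()) - perf_by_candidate[selected]

perf = {"IForest": 0.41, "LOF": 0.62, "OCSVM": 0.35}  # illustrative VUS-PR
print(round(selection_regret(perf, "IForest"), 2))  # → 0.21
```

Selecting the oracle winner gives zero regret, so the < 0.05 mean-regret target says the chosen detector is, on average, within 0.05 VUS-PR of the best available one.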
- `index.html` — paper blueprint with simulated headline results (red ink); served via GitHub Pages at https://apartsinprojects.github.io/AutoAD/
- `implementation plan.md` — locked engineering plan: datasets, candidate pool, week-by-week schedule, artifact persistence, smoke tests, computational principles
- `plan/anomaly_detection_model_selection_plan.md` — the broader research plan
- `src/autoad/` — source code (data loaders, models, LCO, synthetic perturbations, aggregation, regret, robustness)
- `scripts/` — phase entry points; data download
- `tests/smoke/` — phase smoke tests, each runs end-to-end on minimal data in < 5 min
- `vendored/` — upstream code from TSB-AD, Goswami, MSAD (gitignored, cloned at setup)
| Phase | What | Status |
|---|---|---|
| 0 | Repo, env, BaseAD interface, registry, ECDF normalization | done |
| 0 | Three reference detectors (IForest, LOF, OCSVM) with hyperparameter grids | done |
| 0 | Artifact-persistence harness (parquet with provenance) | done |
| 0 | Smoke-test framework (32 tests, ~70s end-to-end) | done |
| 1 | UCR Anomaly Archive 2021 loader (250 series) | done |
| 1 | Synthetic suite with deterministic train/test split | done |
| 1 | Windowing utilities (sliding window + per-point reduction) | done |
| 2 | Oracle pipeline (E2E demo: 6 candidates × UCR + synthetic, AUC-PR, AUC-ROC) | done |
| 3 | Source 1 LCO (data-domain + PCA-latent, multi-granularity, difficulty-stratified) | done |
| 3 | Source 2 (synthetic perturbation scoring) | done |
| 3 | Rank aggregation (Borda + Kendall agreement) | done |
| 3 | Selection regret + summaries | done |
| 2 | Full 60-candidate pool (12 more model families) | pending |
| 2 | TSB-AD, SMD, MSAD-relevant TSB-UAD downloads | pending |
| 4 | Type-aware combiner | pending |
| 5 | Six baseline ports (Goswami, MSAD, N-1 Experts, Idan, IREOS, ASOI) | pending |
| 6-7 | Main experiments, ablations, failure-mode analysis | pending |
# Python 3.11 is required
python -m venv .venv
source .venv/Scripts/activate  # Windows Git Bash; use .venv/bin/activate on Linux/macOS
pip install --upgrade pip wheel
# CUDA 12.1 PyTorch (omit --index-url for CPU-only)
pip install --index-url https://download.pytorch.org/whl/cu121 "torch>=2.2,<2.5"
# Install AutoAD + remaining deps
pip install -e .
# Download UCR Anomaly Archive 2021 (~184 MB)
python scripts/01_download_ucr.py
# Run smoke tests (~ 70s on a laptop)
pytest tests/smoke/ -v
# Run a working end-to-end demo (single model, two hyperparams, two datasets)
python scripts/02_run_e2e_minimal.py --ucr-limit 20 --synth-per-family 5
# Run the full MS-PAS demo (oracle + LCO data + LCO latent + Source 2 + selection + regret)
python scripts/03_run_e2e_full.py --ucr-limit 10 --synth-per-family 5

A paper is in preparation. The blueprint and references are available at the project page.
MIT.
