Privacy-Preserving Transfer Learning Framework for Building Energy Forecasting with Fully Anonymized Data
Wonjun Choi · Sangwon Lee · Max Langtry · Ruchi Choudhary
Applied Energy, 2026
AI-driven forecasting offers a promising solution for optimal building energy control, yet is constrained by scarce labeled data and strict privacy regulations. While transfer learning can alleviate data scarcity by leveraging data from other buildings, conventional approaches rely on metadata — such as building type, climate zone, or occupancy schedules — that is unavailable in fully anonymized datasets.
The PPTL framework resolves this deadlock by learning similarity directly from anonymized time-series dynamics. Using an unsupervised contrastive encoder, the framework maps each building's dynamics to high-dimensional representation vectors learned solely from temporal patterns. Cosine distance between representations guides source selection to pretrain a lightweight forecaster, which is then fine-tuned on limited target data. Leave-one-out experiments on 89 real-world buildings validate that learned similarity strongly correlates with transfer performance.
| Metric | Value |
|---|---|
| Median MSE reduction vs. no-transfer-learning baseline | 27–31% |
| Configurations improved over no-transfer-learning baseline | 99.2% (353 / 356) |
| Maximum degradation vs. no-transfer-learning baseline (only 3 cases) | 2.2% |
| Communication bandwidth vs. federated learning | 0.51% |
Three modular components work in sequence to enable metadata-free transfer learning:
Encoder (TS2Vec) → Strategy Controller → Forecaster (TiDE)
- **Encoder — TS2Vec**: An unsupervised time-series encoder that learns temporal patterns (daily cycles, seasonal trends, load shapes) directly from raw data without labels or metadata.
- **Strategy Controller**: Computes a similarity score (cosine distance) between each source building and the target in the learned representation space, then ranks and selects the most similar sources for pretraining.
- **Forecaster — TiDE**: A lightweight MLP-based encoder–decoder for time-series forecasting. Scales linearly with input length and supports full parallel computation, making it efficient for deployment on resource-constrained systems.
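The Strategy Controller's selection step can be sketched in plain NumPy: given representation vectors for the target and candidate sources, sources are ranked by ascending cosine distance. The function and variable names are illustrative, not the repository's API, and the toy vectors are 4-dimensional (the paper's TS2Vec representations are 320-dimensional).

```python
import numpy as np

def rank_sources(target_repr, source_reprs, n_sources):
    """Rank source buildings by ascending cosine distance to the target."""
    def cosine_distance(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    distances = {bid: cosine_distance(target_repr, r)
                 for bid, r in source_reprs.items()}
    # Smaller cosine distance = more similar dynamics = better transfer candidate
    return sorted(distances, key=distances.get)[:n_sources]

# Toy representations for a target and three candidate sources
target = np.array([1.0, 0.5, 0.0, 0.2])
sources = {
    "b1": np.array([1.0, 0.4, 0.1, 0.2]),   # near-duplicate of the target
    "b2": np.array([-1.0, 0.5, 0.0, 0.2]),  # opposing load shape
    "b3": np.array([0.9, 0.6, 0.0, 0.1]),   # similar
}
print(rank_sources(target, sources, n_sources=2))  # → ['b1', 'b3']
```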
- **Metadata-free transfer learning framework** — Enables effective transfer learning using exclusively anonymized time-series data, establishing a data-native methodology that bypasses reliance on metadata.
- **Representation distance as a transferability proxy** — Shows that similarity in the learned representation space reliably predicts transfer learning success, replacing manual heuristics with data-driven source selection.
- **Negative transfer as a manageable engineering risk** — Characterizes the trade-off between source quantity and similarity, identifying a distinct performance sweet spot and transforming negative transfer from an unpredictable risk into a systematic engineering decision.
- **Scalable deployment complementing federated learning** — Requires only 0.51% of the communication bandwidth of federated learning while offloading all computation to the server, enabling deployment on legacy building systems.
| Dimension | Federated Learning | PPTL |
|---|---|---|
| Privacy approach | Structural locality (raw data stays on client) | Regulatory compliance (identifiers stripped before pooling) |
| Communication | High — continuous sync over many rounds (~608 MB) | Minimal — single upload/download cycle (~3.1 MB) |
| Client computation | Heavy — iterative local gradient computation (GPU required) | Negligible — all training offloaded to server |
| Data heterogeneity | Vulnerable — performance degrades when building data differ significantly | Robust by design — automatically selects similar sources |
| Model personalization | Generic global model (averaged behavior) | Target-specific model (fine-tuned per building) |
| Scalability | Bottlenecked by edge network reliability | Bounded by server storage/compute |
FL and PPTL are complementary, not competing. PPTL's similarity-based clustering can enhance FL by grouping buildings into operationally compatible cohorts, addressing FL's vulnerability to heterogeneous data.
The experiments use the Cambridge University Estates Building Energy Archive, a fully anonymized dataset spanning 24 years (2000–2023) of hourly electricity usage and weather observations for ~120 buildings at the University of Cambridge. To preserve privacy, all buildings are identified only by randomized numerical indices, with no accompanying metadata.
A 16-month interval [2009-01-01, 2010-05-01) was curated to maximize gap-free coverage, yielding 89 buildings:
| Period | Role | Duration |
|---|---|---|
| Jan–Dec 2009 | Source data (pretraining) | 12 months |
| Jan–Feb 2010 | Target data (fine-tuning & similarity) | 2 months |
| Mar–Apr 2010 | Test data (evaluation) | 2 months |
Features: 10 covariates (time-of-day, day-of-week, temperature, humidity, solar irradiance) + 1 target feature (hourly electricity usage).
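The chronological split above can be carved out of an hourly-indexed frame with pandas; a minimal sketch using a toy series over the curated 16-month interval (the column name is illustrative):

```python
import pandas as pd

# Toy hourly series covering [2009-01-01, 2010-05-01)
idx = pd.date_range("2009-01-01", "2010-05-01", freq="h", inclusive="left")
df = pd.DataFrame({"electricity_kwh": range(len(idx))}, index=idx)

# Chronological splits matching the table above
source = df.loc["2009-01-01":"2009-12-31"]   # 12 months: pretraining
target = df.loc["2010-01-01":"2010-02-28"]   # 2 months: fine-tuning & similarity
test   = df.loc["2010-03-01":"2010-04-30"]   # 2 months: evaluation

print(len(source), len(target), len(test))  # → 8760 1416 1464
```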
The PPTL framework follows a 4-step sequential pipeline, preceded by a one-time hyperparameter tuning step. Each step must be executed in order.
Script: scripts/tune_hyperparameter.py
Performs hyperparameter optimization for the TiDE forecaster using Optuna (400 trials).
```bash
uv run python scripts/tune_hyperparameter.py <device_id>
```

Output: output/assets/tide-hypertune.db
Script: scripts/train_encoder.py
Trains TS2Vec contrastive encoders for each target building. For each target, an encoder is trained on data from all 88 other buildings (leave-one-out), constructing the representation space used for similarity assessment.
```bash
uv run python scripts/train_encoder.py
```

| Parameter | Value |
|---|---|
| Hidden dimensions | 64 |
| Output dimensions | 320 |
| Max train length | 3000 |
| Training iterations | 200 |
| Batch size | 16 |
Output: output/assets/weights/encoder_b{bid}.pt
Script: scripts/calculate_similarity.py
Generates representation vectors and computes similarity scores (cosine distance) between each target building (Jan–Feb 2010) and each source building (Jan–Feb 2009).
```bash
uv run python scripts/calculate_similarity.py
```

Output: output/assets/similarities.json
Script: scripts/train_tide.py
Pretrains TiDE forecasters on source buildings selected by similarity ranking.
```bash
uv run python scripts/train_tide.py --bid <building_id> --mode <mode> --n-sources <n> --device <device_id>
```

Source selection strategies (paper terminology in parentheses):
- `best` (Closest) — Top N most similar sources
- `worst` (Farthest) — Bottom N least similar sources
- `all` — All 88 source buildings
The paper tests N ∈ {2, 4, 8, 16}.
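The three strategies reduce to simple list slicing once a similarity ranking exists. A small sketch, assuming a list of building IDs already sorted from most to least similar (function name and IDs are illustrative):

```python
def select_sources(ranked_bids, mode, n_sources=None):
    """Pick pretraining sources from a ranking sorted most-to-least similar."""
    if mode == "best":    # Closest: top-N most similar sources
        return ranked_bids[:n_sources]
    if mode == "worst":   # Farthest: bottom-N least similar sources
        return ranked_bids[-n_sources:]
    if mode == "all":     # every available source building
        return list(ranked_bids)
    raise ValueError(f"unknown mode: {mode}")

ranking = ["b7", "b3", "b12", "b9", "b1"]  # hypothetical 5-building ranking
print(select_sources(ranking, "best", 2))   # → ['b7', 'b3']
print(select_sources(ranking, "worst", 2))  # → ['b9', 'b1']
```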
TiDE Hyperparameters (selected via Optuna)
| Parameter | Value |
|---|---|
| Input chunk length | 168 (7 days × 24 hours) |
| Output chunk length | 24 (1 day) |
| Batch size | 256 |
| Hidden size | 256 |
| Encoder / Decoder layers | 1 / 1 |
| Decoder output dim | 8 |
| Temporal decoder hidden | 32 |
| Dropout | 0.3981 |
| Learning rate | 5.3954 × 10⁻⁴ |
Output: output/assets/weights/tide_bid_{bid}_{mode}_{n_sources}.pt
Script: scripts/transfer_tide.py
Fine-tunes the pretrained TiDE model on the target building's data (Jan–Feb 2010) and evaluates on the test period (Mar–Apr 2010).
```bash
uv run python scripts/transfer_tide.py --bid <building_id> --mode <mode> --n-sources <n> --device <device_id>
```

- Transfer modes (`best`, `worst`, `all`): Learning rate scaled to 1/10 of the pretraining rate
- No-transfer-learning baseline (`none`): Learning rate unscaled
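The learning-rate rule can be expressed as a one-line helper, using the Optuna-selected pretraining rate from the TiDE hyperparameter table (the helper itself is illustrative, not the repository's code):

```python
PRETRAIN_LR = 5.3954e-4  # Optuna-selected TiDE learning rate

def finetune_lr(mode, pretrain_lr=PRETRAIN_LR):
    """Fine-tuning LR: 1/10 of pretraining for transfer modes, unscaled for 'none'."""
    return pretrain_lr if mode == "none" else pretrain_lr / 10

print(finetune_lr("best"))  # → 5.3954e-05
print(finetune_lr("none"))  # → 0.00053954
```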
Output Database Schema
| Column | Description |
|---|---|
| `bid` | Building ID |
| `mode` | Transfer learning mode |
| `n_sources` | Number of source buildings used |
| `last_val_loss` / `best_val_loss` | Validation losses |
| `last_test_loss` / `best_test_loss` | Test losses (MSE) |
| `run_id` | MLFlow run ID |
Output: output/assets/transfer_learning.db
Script: scripts/visualize_forecast.py
Visualizes the forecast of a single fine-tuned TiDE checkpoint against the ground truth. Requires Steps 1–4 to have been completed for the target building.

```bash
uv run python scripts/visualize_forecast.py --bid <building_id> --mode <mode> [--n <n_sources>] [--output <path>]
```

- Python 3.10 (Python 3.11 is not supported)
- CUDA-compatible GPU (recommended)
- `uv` package manager
```bash
uv sync
```

```bash
cd datasets
git clone https://github.com/EECi/Cambridge-Estates-Building-Energy-Archive.git
cd Cambridge-Estates-Building-Energy-Archive
# Reset to the specific commit used in the paper
git reset --hard b2f5d4e
```

```bash
# Step 0: Hyperparameter tuning (one-time)
uv run python scripts/tune_hyperparameter.py 0

# Step 1: Train encoders for all buildings
uv run python scripts/train_encoder.py

# Step 2: Calculate similarities
uv run python scripts/calculate_similarity.py

# Step 3: Pretrain TiDE (example: building 0, Closest 4 sources)
uv run python scripts/train_tide.py --bid 0 --mode best --n-sources 4 --device 0

# Step 4: Fine-tune and evaluate
uv run python scripts/transfer_tide.py --bid 0 --mode best --n-sources 4 --device 0

# No-transfer-learning baseline comparison
uv run python scripts/transfer_tide.py --bid 0 --mode none --device 0

# Visualize a single model's forecast
uv run python scripts/visualize_forecast.py --bid 0 --mode best --n 4
```

```bash
# Pretrain all buildings with Closest / Farthest sources
bash scripts/train_tide_best.sh
bash scripts/train_tide_worst.sh

# Transfer learning for all buildings
bash scripts/transfer_tide_best_worst.sh   # Closest + Farthest
bash scripts/transfer_tide_none.sh         # No-transfer-learning baseline
```

```python
import sqlite3

# Inspect the transfer learning results database
conn = sqlite3.connect("output/assets/transfer_learning.db")
cursor = conn.cursor()
cursor.execute("SELECT * FROM transfer_learning LIMIT 10")
for row in cursor.fetchall():
    print(row)
conn.close()
```

```
PPTL_codes/
├── scripts/                      # Main experiment scripts
│   ├── tune_hyperparameter.py    # Step 0: Hyperparameter tuning
│   ├── train_encoder.py          # Step 1: TS2Vec encoder training
│   ├── calculate_similarity.py   # Step 2: Cosine similarity calculation
│   ├── train_tide.py             # Step 3: TiDE pretraining
│   ├── transfer_tide.py          # Step 4: Fine-tuning & evaluation
│   ├── visualize_forecast.py     # Forecast visualization
│   └── *.sh                      # Batch processing shell scripts
├── utils/                        # Utility functions
│   └── data.py                   # Data loading & preprocessing
├── ts2vec/                       # TS2Vec library (modified for compatibility)
├── datasets/                     # Dataset directory
│   └── Cambridge-Estates-.../    # Cloned dataset repository
├── output/                       # Output directory (auto-created)
│   └── assets/
│       ├── weights/              # Encoder & TiDE weights
│       ├── tide_transfer/        # Fine-tuning checkpoints
│       ├── similarities.json     # Building similarity scores
│       ├── tide-hypertune.db     # Optuna study database
│       ├── transfer_learning.db  # Transfer learning results
│       └── forecast_b{bid}_*.png # Forecast visualization plots
├── pyproject.toml                # Project dependencies
├── LICENSE                       # MIT License
└── README.md                     # This file
```
All scripts resolve paths relative to the script file location. Key paths:
| Path | Used In | Purpose |
|---|---|---|
| `../datasets/Cambridge-Estates-Building-Energy-Archive` | All scripts | Dataset root |
| `../output/assets/weights/encoder_b{bid}.pt` | `train_encoder.py`, `calculate_similarity.py` | Encoder weights |
| `../output/assets/similarities.json` | `calculate_similarity.py`, `train_tide.py` | Similarity scores |
| `../output/assets/weights/tide_bid_{bid}_{mode}_{n_sources}.pt` | `train_tide.py`, `transfer_tide.py` | Pretrained TiDE weights |
| `../output/assets/tide-hypertune.db` | `tune_hyperparameter.py` | Optuna study database |
| `../output/assets/transfer_learning.db` | `transfer_tide.py` | Transfer learning results |
| `../output/assets/tide_transfer/` | `transfer_tide.py` | Fine-tuning checkpoints |
| `../ts2vec` | `train_encoder.py`, `calculate_similarity.py` | TS2Vec library |
Note: Output directories are created automatically when scripts are executed.
- Scripts can be executed from any directory (paths are resolved relative to the script file)
- GPU is required for training (CUDA device)
- The dataset must be properly set up before running any scripts
- Scripts use fixed random seeds for reproducibility
- MLFlow is used for experiment tracking
- Early stopping is configured in all training scripts to prevent overfitting
If you use this code, please cite:
```bibtex
@article{choi2026pptl,
  title   = {Privacy-Preserving Transfer Learning Framework for Building
             Energy Forecasting with Fully Anonymized Data},
  author  = {Choi, Wonjun and Lee, Sangwon and Langtry, Max and Choudhary, Ruchi},
  journal = {Applied Energy},
  year    = {2026},
  doi     = {10.1016/j.apenergy.2026.127600}
}
```

The `ts2vec` directory contains a modified version of the TS2Vec codebase from the official repository. Only library version compatibility issues were resolved; no functional changes were made.
This project is licensed under the MIT License — see the LICENSE file for details.