Privacy-Preserving Transfer Learning (PPTL)

Privacy-Preserving Transfer Learning Framework for Building Energy Forecasting with Fully Anonymized Data

Wonjun Choi · Sangwon Lee · Max Langtry · Ruchi Choudhary

Applied Energy, 2026



📖 Abstract

AI-driven forecasting offers a promising solution for optimal building energy control, yet is constrained by scarce labeled data and strict privacy regulations. While transfer learning can alleviate data scarcity by leveraging data from other buildings, conventional approaches rely on metadata — such as building type, climate zone, or occupancy schedules — that is unavailable in fully anonymized datasets.

The PPTL framework resolves this deadlock by learning similarity directly from anonymized time-series dynamics. Using an unsupervised contrastive encoder, the framework maps each building's dynamics to high-dimensional representation vectors learned solely from temporal patterns. Cosine distance between representations guides source selection to pretrain a lightweight forecaster, which is then fine-tuned on limited target data. Leave-one-out experiments on 89 real-world buildings validate that learned similarity strongly correlates with transfer performance.


✨ Key Results

| Metric | Value |
| --- | --- |
| Median MSE reduction vs. no-transfer-learning baseline | 27–31% |
| Configurations improved over the no-transfer-learning baseline | 99.2% (353 / 356) |
| Maximum degradation vs. the baseline (only 3 cases) | 2.2% |
| Communication bandwidth vs. federated learning | 0.51% |

🏗️ Framework Architecture

Three modular components work in sequence to enable metadata-free transfer learning:

Encoder (TS2Vec) → Strategy Controller → Forecaster (TiDE)

  • Encoder (TS2Vec): An unsupervised time-series encoder that learns temporal patterns (daily cycles, seasonal trends, load shapes) directly from raw data without labels or metadata.

  • Strategy Controller: Computes a similarity score (cosine distance) between each source building and the target in the learned representation space, then ranks and selects the most similar sources for pretraining (see the sketch after this list).

  • Forecaster (TiDE): A lightweight MLP-based encoder–decoder for time-series forecasting. It scales linearly with input length and supports fully parallel computation, making it efficient for deployment on resource-constrained systems.
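
To make the Strategy Controller concrete, here is a minimal sketch of similarity scoring and source ranking. It assumes each building has already been pooled into a single representation vector (see Step 2); the function names are illustrative, not the repo's API.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity; smaller means more similar."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_sources(target_vec: np.ndarray, source_vecs: dict[int, np.ndarray], n: int) -> list[int]:
    """Return the IDs of the n source buildings closest to the target."""
    scores = {bid: cosine_distance(target_vec, vec) for bid, vec in source_vecs.items()}
    return sorted(scores, key=scores.get)[:n]
```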


🔬 Contributions

  1. Metadata-free transfer learning framework — Enables effective transfer learning using exclusively anonymized time-series data, establishing a data-native methodology that bypasses reliance on metadata.

  2. Representation distance as a transferability proxy — Shows that similarity in the learned representation space reliably predicts transfer learning success, replacing manual heuristics with data-driven source selection.

  3. Negative transfer as a manageable engineering risk — Characterizes the trade-off between source quantity and similarity, identifying a distinct performance sweet spot and transforming negative transfer from an unpredictable risk into a systematic engineering decision.

  4. Scalable deployment complementing federated learning — Requires only 0.51% of federated learning's communication bandwidth while offloading all computation to the server, enabling deployment on legacy building systems.


🆚 Comparison with Federated Learning

| Dimension | Federated Learning | PPTL |
| --- | --- | --- |
| Privacy approach | Structural locality (raw data stays on client) | Regulatory compliance (identifiers stripped before pooling) |
| Communication | High — continuous sync over many rounds (~608 MB) | Minimal — single upload/download cycle (~3.1 MB) |
| Client computation | Heavy — iterative local gradient computation (GPU required) | Negligible — all training offloaded to server |
| Data heterogeneity | Vulnerable — performance degrades when building data differ significantly | Robust by design — automatically selects similar sources |
| Model personalization | Generic global model (averaged behavior) | Target-specific model (fine-tuned per building) |
| Scalability | Bottlenecked by edge network reliability | Bounded by server storage/compute |

FL and PPTL are complementary, not competing. PPTL's similarity-based clustering can enhance FL by grouping buildings into operationally compatible cohorts, addressing FL's vulnerability to heterogeneous data.


📊 Dataset

The experiments use the Cambridge University Estates Building Energy Archive — a fully anonymized dataset spanning 24 years (2000–2023) of hourly electricity usage and weather observations for ~120 buildings at the University of Cambridge. For privacy, buildings are identified only by randomized numerical indices, with no accompanying metadata.

A 16-month interval [2009-01-01, 2010-05-01) was curated to maximize gap-free coverage, yielding 89 buildings:

| Period | Role | Duration |
| --- | --- | --- |
| Jan–Dec 2009 | Source data (pretraining) | 12 months |
| Jan–Feb 2010 | Target data (fine-tuning & similarity) | 2 months |
| Mar–Apr 2010 | Test data (evaluation) | 2 months |

Features: 10 covariates (time-of-day, day-of-week, temperature, humidity, solar irradiance) + 1 target feature (hourly electricity usage).
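
For concreteness, the split amounts to plain datetime slicing. A minimal sketch, assuming one building's series has been loaded into a pandas DataFrame with an hourly DatetimeIndex (the actual loading logic lives in utils/data.py; the file path is a placeholder):

```python
import pandas as pd

# Placeholder path; see utils/data.py for the real loader.
df = pd.read_csv("building_0.csv", index_col=0, parse_dates=True)

source = df.loc["2009-01-01":"2009-12-31"]  # 12 months: pretraining
target = df.loc["2010-01-01":"2010-02-28"]  # 2 months: fine-tuning & similarity
test = df.loc["2010-03-01":"2010-04-30"]    # 2 months: evaluation
```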


🧪 Experimental Workflow

The PPTL framework follows a 4-step sequential pipeline, preceded by a one-time hyperparameter tuning step. Each step must be executed in order.

Step 0 · Hyperparameter Tuning (one-time prerequisite)

Script: scripts/tune_hyperparameter.py

Performs hyperparameter optimization for the TiDE forecaster using Optuna (400 trials).

```bash
uv run python scripts/tune_hyperparameter.py <device_id>
```

Output: output/assets/tide-hypertune.db
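
The Output line implies an Optuna study backed by SQLite. A minimal sketch of that setup; the study name and the placeholder objective are assumptions, not the script's actual identifiers:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Placeholder: the real objective trains a TiDE model with the sampled
    # hyperparameters and returns its validation loss.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    return lr + dropout  # stand-in score

study = optuna.create_study(
    study_name="tide-hypertune",                          # assumed study name
    storage="sqlite:///output/assets/tide-hypertune.db",  # matches the Output line
    direction="minimize",
    load_if_exists=True,
)
study.optimize(objective, n_trials=400)  # 400 trials, per the step description
```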


Step 1 · Unsupervised Encoder Training

Script: scripts/train_encoder.py

Trains TS2Vec contrastive encoders for each target building. For each target, an encoder is trained on data from all 88 other buildings (leave-one-out), constructing the representation space used for similarity assessment.

```bash
uv run python scripts/train_encoder.py
```

| Parameter | Value |
| --- | --- |
| Hidden dimensions | 64 |
| Output dimensions | 320 |
| Max train length | 3000 |
| Training iterations | 200 |
| Batch size | 16 |

Output: output/assets/weights/encoder_b{bid}.pt
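
These settings map directly onto the TS2Vec constructor. A minimal sketch, assuming the bundled ts2vec library keeps the official repository's API; the random array stands in for the stacked leave-one-out source data:

```python
import numpy as np
import torch
from ts2vec import TS2Vec

# Placeholder data: (n_buildings, n_timestamps, n_features) for the 88 sources.
train_data = np.random.randn(88, 8760, 11).astype(np.float32)

model = TS2Vec(
    input_dims=train_data.shape[-1],
    hidden_dims=64,         # per the table above
    output_dims=320,
    max_train_length=3000,  # longer series are split before training
    batch_size=16,
    device="cuda" if torch.cuda.is_available() else "cpu",
)
model.fit(train_data, n_iters=200, verbose=True)
model.save("output/assets/weights/encoder_b0.pt")  # naming per the Output line
```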


Step 2 · Similarity-Based Source Selection

Script: scripts/calculate_similarity.py

Generates representation vectors and computes similarity scores (cosine distance) between each target building (Jan–Feb 2010) and each source building (Jan–Feb 2009).

```bash
uv run python scripts/calculate_similarity.py
```

Output: output/assets/similarities.json
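
One way to obtain a single vector per building is TS2Vec's whole-series encoding; a sketch continuing the Step 1 code (whether the script pools exactly this way is an assumption):

```python
# `model` is the trained TS2Vec encoder from Step 1.
# encoding_window="full_series" collapses each series to one 320-d vector.
target_repr = model.encode(target_data, encoding_window="full_series")  # (1, 320)
source_repr = model.encode(source_data, encoding_window="full_series")  # (88, 320)
# Cosine distances between target_repr and each row of source_repr are
# then written to similarities.json (see the ranking sketch above).
```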


Step 3 · Forecaster Pretraining

Script: scripts/train_tide.py

Pretrains TiDE forecasters on source buildings selected by similarity ranking.

```bash
uv run python scripts/train_tide.py --bid <building_id> --mode <mode> --n-sources <n> --device <device_id>
```

Source selection strategies (paper terminology in parentheses):

  • best (Closest) — Top N most similar sources
  • worst (Farthest) — Bottom N least similar sources
  • all — All 88 source buildings

The paper tests N ∈ {2, 4, 8, 16}.

TiDE Hyperparameters (selected via Optuna)

| Parameter | Value |
| --- | --- |
| Input chunk length | 168 (7 days × 24 hours) |
| Output chunk length | 24 (1 day) |
| Batch size | 256 |
| Hidden size | 256 |
| Encoder / decoder layers | 1 / 1 |
| Decoder output dim | 8 |
| Temporal decoder hidden | 32 |
| Dropout | 0.3981 |
| Learning rate | 5.3954 × 10⁻⁴ |

Output: output/assets/weights/tide_bid_{bid}_{mode}_{n_sources}.pt
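
For reference, these hyperparameters translate into the following model construction. This is a hedged sketch assuming the TiDEModel implementation from the Darts library, which the README does not confirm here:

```python
from darts.models import TiDEModel

# Assumed Darts TiDEModel; values mirror the Optuna-selected table above.
model = TiDEModel(
    input_chunk_length=168,   # 7 days x 24 hours
    output_chunk_length=24,   # 1 day
    num_encoder_layers=1,
    num_decoder_layers=1,
    decoder_output_dim=8,
    hidden_size=256,
    temporal_decoder_hidden=32,
    dropout=0.3981,
    batch_size=256,
    optimizer_kwargs={"lr": 5.3954e-4},
)
# model.fit(series=..., past_covariates=...)  # concatenated selected-source series
```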


Step 4 · Fine-tuning and Evaluation

Script: scripts/transfer_tide.py

Fine-tunes the pretrained TiDE model on the target building's data (Jan–Feb 2010) and evaluates on the test period (Mar–Apr 2010).

```bash
uv run python scripts/transfer_tide.py --bid <building_id> --mode <mode> --n-sources <n> --device <device_id>
```

  • Transfer modes (best, worst, all): Learning rate scaled to 1/10 of the pretraining rate
  • No-transfer-learning baseline (none): Learning rate unscaled
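
A one-line sketch of that learning-rate rule (the constant and function name are illustrative, not the script's actual code):

```python
PRETRAIN_LR = 5.3954e-4  # Optuna-selected pretraining rate (see Step 3)

def finetune_lr(mode: str) -> float:
    # Fine-tuning on top of pretrained weights uses a 1/10-scaled rate;
    # the from-scratch baseline ("none") keeps the full rate.
    return PRETRAIN_LR if mode == "none" else PRETRAIN_LR / 10
```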
Output Database Schema

| Column | Description |
| --- | --- |
| bid | Building ID |
| mode | Transfer learning mode |
| n_sources | Number of source buildings used |
| last_val_loss / best_val_loss | Validation losses |
| last_test_loss / best_test_loss | Test losses (MSE) |
| run_id | MLFlow run ID |

Output: output/assets/transfer_learning.db


Visualization

Script: scripts/visualize_forecast.py

Visualizes the forecast of a single fine-tuned TiDE checkpoint against the ground truth. Requires Steps 1–4 to have been completed for the target building.

```bash
uv run python scripts/visualize_forecast.py --bid <building_id> --mode <mode> [--n <n_sources>] [--output <path>]
```

🚀 Quick Start

Prerequisites

  • Python 3.10 (Python 3.11 is not supported)
  • CUDA-compatible GPU (recommended)
  • uv package manager

Installation

```bash
uv sync
```

Dataset Setup

```bash
cd datasets
git clone https://github.com/EECi/Cambridge-Estates-Building-Energy-Archive.git
cd Cambridge-Estates-Building-Energy-Archive

# Reset to the specific commit used in the paper
git reset --hard b2f5d4e
```

Complete Workflow Example

```bash
# Step 0: Hyperparameter tuning (one-time)
uv run python scripts/tune_hyperparameter.py 0

# Step 1: Train encoders for all buildings
uv run python scripts/train_encoder.py

# Step 2: Calculate similarities
uv run python scripts/calculate_similarity.py

# Step 3: Pretrain TiDE (example: building 0, Closest 4 sources)
uv run python scripts/train_tide.py --bid 0 --mode best --n-sources 4 --device 0

# Step 4: Fine-tune and evaluate
uv run python scripts/transfer_tide.py --bid 0 --mode best --n-sources 4 --device 0

# No-transfer-learning baseline comparison
uv run python scripts/transfer_tide.py --bid 0 --mode none --device 0

# Visualize a single model's forecast
uv run python scripts/visualize_forecast.py --bid 0 --mode best --n 4
```

Batch Processing

```bash
# Pretrain all buildings with Closest / Farthest sources
bash scripts/train_tide_best.sh
bash scripts/train_tide_worst.sh

# Transfer learning for all buildings
bash scripts/transfer_tide_best_worst.sh  # Closest + Farthest
bash scripts/transfer_tide_none.sh        # No-transfer-learning baselines
```

Querying Results

```python
import sqlite3

# Inspect the Step 4 results database (schema described above).
with sqlite3.connect("output/assets/transfer_learning.db") as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM transfer_learning LIMIT 10")
    for row in cursor.fetchall():
        print(row)
```

📁 Repository Structure

```
PPTL_codes/
├── scripts/                       # Main experiment scripts
│   ├── tune_hyperparameter.py     # Step 0: Hyperparameter tuning
│   ├── train_encoder.py           # Step 1: TS2Vec encoder training
│   ├── calculate_similarity.py    # Step 2: Cosine similarity calculation
│   ├── train_tide.py              # Step 3: TiDE pretraining
│   ├── transfer_tide.py           # Step 4: Fine-tuning & evaluation
│   ├── visualize_forecast.py      # Forecast visualization
│   └── *.sh                       # Batch processing shell scripts
├── utils/                         # Utility functions
│   └── data.py                    # Data loading & preprocessing
├── ts2vec/                        # TS2Vec library (modified for compatibility)
├── datasets/                      # Dataset directory
│   └── Cambridge-Estates-.../     #   └─ Cloned dataset repository
├── output/                        # Output directory (auto-created)
│   └── assets/
│       ├── weights/               #   ├─ Encoder & TiDE weights
│       ├── tide_transfer/         #   ├─ Fine-tuning checkpoints
│       ├── similarities.json      #   ├─ Building similarity scores
│       ├── tide-hypertune.db      #   ├─ Optuna study database
│       ├── transfer_learning.db   #   └─ Transfer learning results
│       └── forecast_b{bid}_*.png  #       Forecast visualization plots
├── pyproject.toml                 # Project dependencies
├── LICENSE                        # MIT License
└── README.md                      # This file
```

🔧 Hardcoded File Paths

All scripts resolve paths relative to the script file location. Key paths:

| Path | Used In | Purpose |
| --- | --- | --- |
| ../datasets/Cambridge-Estates-Building-Energy-Archive | All scripts | Dataset root |
| ../output/assets/weights/encoder_b{bid}.pt | train_encoder.py, calculate_similarity.py | Encoder weights |
| ../output/assets/similarities.json | calculate_similarity.py, train_tide.py | Similarity scores |
| ../output/assets/weights/tide_bid_{bid}_{mode}_{n_sources}.pt | train_tide.py, transfer_tide.py | Pretrained TiDE weights |
| ../output/assets/tide-hypertune.db | tune_hyperparameter.py | Optuna study database |
| ../output/assets/transfer_learning.db | transfer_tide.py | Transfer learning results |
| ../output/assets/tide_transfer/ | transfer_tide.py | Fine-tuning checkpoints |
| ../ts2vec | train_encoder.py, calculate_similarity.py | TS2Vec library |

Note: Output directories are created automatically when scripts are executed.
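
A minimal sketch of the script-relative resolution idiom this implies (variable names are illustrative, not the scripts' actual code):

```python
from pathlib import Path

# Resolve against the script file, not the working directory, so the
# scripts can be launched from anywhere.
SCRIPT_DIR = Path(__file__).resolve().parent
DATASET_ROOT = SCRIPT_DIR / ".." / "datasets" / "Cambridge-Estates-Building-Energy-Archive"
ASSETS_DIR = (SCRIPT_DIR / ".." / "output" / "assets").resolve()
ASSETS_DIR.mkdir(parents=True, exist_ok=True)  # output dirs are auto-created
```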


📝 Notes

  • Scripts can be executed from any directory (paths are resolved relative to the script file)
  • GPU is required for training (CUDA device)
  • The dataset must be properly set up before running any scripts
  • Scripts use fixed random seeds for reproducibility
  • MLFlow is used for experiment tracking
  • Early stopping is configured in all training scripts to prevent overfitting

📜 Citation

If you use this code, please cite:

```bibtex
@article{choi2026pptl,
  title   = {Privacy-Preserving Transfer Learning Framework for Building
             Energy Forecasting with Fully Anonymized Data},
  author  = {Choi, Wonjun and Lee, Sangwon and Langtry, Max and Choudhary, Ruchi},
  journal = {Applied Energy},
  year    = {2026},
  doi     = {10.1016/j.apenergy.2026.127600}
}
```

TS2Vec Library

The ts2vec directory contains a modified version of the TS2Vec codebase from the official repository. Only library version compatibility issues were resolved; no functional changes were made.

License

This project is licensed under the MIT License — see the LICENSE file for details.
