Rainflow-aware DQN for economic battery storage dispatch with physically consistent market logic, reserve commitment, and economically calibrated degradation.
- Rainflow-exact degradation based on switching points instead of purely heuristic penalties
- Physically consistent step logic with capacity limits, efficiency, reserve activation, and a safety layer
- YAML-based configuration for battery, market, reward, features, agent, and evaluation
- Versioned sample data for arbitrage and load following
- Extended evaluation with economic KPIs, Rainflow cycle statistics, and baseline comparisons
The figures below were generated directly from the versioned sample dataset `data/sample_arbitrage_30d.csv` with `configs/default.yaml`. For the README assets, the repo intentionally uses deterministic baselines instead of a bundled checkpoint so the results stay reproducible within seconds:

```bash
python scripts/build_readme_assets.py
```

This writes `docs/readme_assets/*.png` and `docs/readme_assets/example_metrics.json`.
The demo uses 720 hourly steps and combines arbitrage prices, reserve compensation, and an FR signal. That makes the market mechanics of the environment easy to inspect without needing an extra dataset.
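The shape of these inputs can be sketched with plain NumPy (an illustrative signal model only, not the repo's `generate_sample_data.py`):

```python
import numpy as np

rng = np.random.default_rng(42)
hours = np.arange(720)  # 30 days x 24 hourly steps

# Daily price cycle with noise (EUR/MWh) -- illustrative, not the repo's generator
price = 60 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, size=720)

# Zero-mean frequency-regulation signal, clipped to [-1, 1]
fr_signal = np.clip(rng.normal(0, 0.3, size=720), -1.0, 1.0)

# Flat reserve compensation (EUR/MW per step)
reserve_price = np.full(720, 1.5)
```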
| Agent | Net Profit [EUR] | Reserve [EUR] | Degradation [EUR] | Throughput [MWh] | Cycles/Day |
|---|---|---|---|---|---|
| MovingAvg | 232.15 | 0.00 | 13.24 | 11.84 | 0.197 |
| MovingAvg-Reserve | 228.46 | 2.10 | 13.97 | 12.06 | 0.201 |
| Threshold | 205.66 | 0.00 | 11.02 | 6.65 | 0.111 |
| Rule-Based | 202.73 | 3.27 | 13.27 | 12.34 | 0.206 |
| Quantile | 126.13 | 5.65 | 10.17 | 9.20 | 0.153 |
This example highlights two recurring tradeoffs in the repo:
- More market activity usually increases gross profit, but it also raises throughput and degradation cost.
- Reserve revenue alone does not guarantee the best net dispatch; on this sample, `MovingAvg` wins without reserve commitment.
The current best reproducible README run is `MovingAvg` on `data/sample_arbitrage_30d.csv`:

- Net profit: 232.15 EUR
- Arbitrage revenue: 245.39 EUR
- Degradation cost: 13.24 EUR
- Throughput: 11.84 MWh
- Rainflow cycles: 63
- Mean cycle depth: 0.069
- Final SoH: 99.989 %
If you want to document a trained policy instead of baselines, you can train on the same data with `train.py` and then evaluate it with `evaluate.py`.
```bash
git clone <repo-url>
cd "BESS Pricing"
python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux / macOS
source .venv/bin/activate
pip install -r requirements.txt
```

```bash
# Train on the 30-day arbitrage sample
python train.py --data data/sample_arbitrage_30d.csv

# Smoke test with fewer timesteps
python train.py --data data/sample_arbitrage_30d.csv --timesteps 50000

# Load following with a custom config
python train.py --config configs/default.yaml --data data/sample_load_following_30d.csv
```

```bash
# Default run with synthetic data
python train.py

# Generate fresh sample data
python generate_sample_data.py --days 30
```

```bash
# Evaluate a trained model
python evaluate.py checkpoints/best_model.zip --data data/sample_arbitrage_30d.csv

# Save plots without opening windows
python evaluate.py checkpoints/best_model.zip --data data/sample_arbitrage_30d.csv --no-plot --output evaluation_results

# Compare against baselines
python evaluate.py checkpoints/best_model.zip --data data/sample_arbitrage_30d.csv --baseline
```

```bash
python scripts/build_readme_assets.py
```

Generated files:

- `docs/readme_assets/market_inputs.png`
- `docs/readme_assets/baseline_comparison.png`
- `docs/readme_assets/dispatch_example.png`
- `docs/readme_assets/example_metrics.json`
```
BESS Pricing/
├── configs/
│   └── default.yaml
├── data/
│   ├── sample_arbitrage.csv
│   ├── sample_arbitrage_30d.csv
│   ├── sample_load_following.csv
│   └── sample_load_following_30d.csv
├── docs/
│   └── readme_assets/            # Reproducible README figures
├── tests/
├── scripts/
│   └── build_readme_assets.py    # Generates the figures used in this README
├── baselines.py                  # Deterministic comparison agents
├── config_loader.py              # YAML config loader
├── data_loader.py                # CSV and synthetic data loader
├── evaluate.py                   # Evaluation, KPIs, and plots
├── features.py                   # Time features and observation stacking
├── generate_sample_data.py       # Data generator
├── market_env.py                 # Gymnasium environment
├── rainflow_sp.py                # Switching points and degradation
├── train.py                      # DQN training
└── Readme.md
```
The main configuration file is `configs/default.yaml`. Key parameter blocks:
```yaml
task: arbitrage

battery:
  capacity_mwh: 1.0
  p_max_mw: 0.25
  eta_charge: 0.95
  eta_discharge: 0.95
  soc_min: 0.1
  soc_max: 0.9

degradation:
  alpha_d: 0.0045
  beta: 1.3
  use_economic_degradation: true
  reference_dod: 0.8
  cycle_life: 6000.0
  replacement_cost_eur_per_mwh: 120000.0

env:
  dt_hours: 1.0
  n_power_levels: 17
  n_reserve_levels: 4
  reserve_max_fraction: 0.3
  stack_k: 4

agent:
  learning_rate: 0.00048
  batch_size: 256
  gamma: 0.975

training:
  total_timesteps: 250000
  seed: 42
```

CLI overrides are still supported:
```bash
python train.py --data data/sample_arbitrage_30d.csv --timesteps 100000 --seed 123
```

| Column | Description | Required |
|---|---|---|
| `timestamp` | Timestamp in ISO format | Optional |
| `price` | Power price in EUR/MWh | Yes |
| `fr_signal` | Frequency regulation signal in [-1, 1] | Optional |
| `reserve_price` | Reserve compensation in EUR/MW per step | Optional |
| `temperature` | Temperature in °C | Optional |
Example:
```csv
timestamp,price,fr_signal,reserve_price
2024-01-01 00:00:00,45.2,0.12,1.5
2024-01-01 01:00:00,42.8,-0.08,1.5
```

Included in the repo:

- `data/sample_arbitrage.csv` for fast 48h smoke tests
- `data/sample_arbitrage_30d.csv` for more realistic dispatch and backtest runs
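A quick sanity check of a CSV against this column layout could look like the following pandas sketch (not a repo utility; only `price` is mandatory per the table above):

```python
import io

import pandas as pd

# Inline stand-in for a real file such as data/sample_arbitrage.csv
csv_text = """timestamp,price,fr_signal,reserve_price
2024-01-01 00:00:00,45.2,0.12,1.5
2024-01-01 01:00:00,42.8,-0.08,1.5
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# The only required column is `price`
assert "price" in df.columns, "the `price` column is mandatory"

# Optional columns may be absent
for optional in ("fr_signal", "reserve_price", "temperature"):
    if optional not in df.columns:
        print(f"optional column missing: {optional}")

# If an FR signal is present, it should stay within [-1, 1]
if "fr_signal" in df.columns:
    assert df["fr_signal"].between(-1, 1).all()
```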
Additional columns:
| Column | Description |
|---|---|
| `demand` | Load demand in MW |
| `re_gen` | Renewable generation in MW |
Included in the repo:

- `data/sample_load_following.csv`
- `data/sample_load_following_30d.csv`
Degradation depends on the cycle depth of the SoC trajectory. Using the last three switching points (c₀, c₁, c₂), the repo computes a per-step increment that remains consistent with Rainflow cycle costs:
h_t^d = α_d · exp(β|c_t + b_t - c₂|) - α_d · exp(β|c_t - c₂|)
The repo then calibrates these units to reference cycle life and replacement cost. That makes degradation directly interpretable as economic asset wear in EUR.
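As a sketch (function and variable names are illustrative, not the actual `rainflow_sp.py` API), the per-step increment and the economic anchor implied by the default config can be written as:

```python
import math

def degradation_increment(c_t, b_t, c2, alpha_d=0.0045, beta=1.3):
    """Per-step degradation increment h_t^d from the formula above.

    c_t : current SoC, b_t : planned SoC change, c2 : last switching point.
    The increment is the change in exponential cycle-depth cost caused by
    moving from c_t to c_t + b_t relative to the last switching point.
    """
    return (alpha_d * math.exp(beta * abs(c_t + b_t - c2))
            - alpha_d * math.exp(beta * abs(c_t - c2)))

# No movement means no degradation increment
assert degradation_increment(0.5, 0.0, 0.5) == 0.0

# Economic anchor implied by the default config: the replacement value of
# one reference cycle (1.0 MWh, 6000 cycles, 120000 EUR/MWh)
capacity_mwh = 1.0
cycle_life = 6000.0
replacement_cost_eur_per_mwh = 120000.0
cost_per_reference_cycle = replacement_cost_eur_per_mwh * capacity_mwh / cycle_life
print(cost_per_reference_cycle)  # 20.0 EUR per reference cycle
```

The scaling that maps the raw exponential units onto this EUR anchor is part of the repo's calibration and is not reproduced here.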
Each action is split into two components:
- planned energy position for arbitrage
- reserve ratio that preserves headroom in both directions
The realized SoC change then accounts for:
- SoC limits
- charge and discharge efficiency
- reserve activation through the FR signal
- safety-layer projection into the feasible action space
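A minimal sketch of such a feasibility projection, assuming the sign convention below and the default limits from `configs/default.yaml` (the repo's actual safety layer lives in `market_env.py` and may differ in detail):

```python
import numpy as np

def project_action(soc, p_plan_mw, r_mw, dt_h=1.0,
                   cap_mwh=1.0, soc_min=0.1, soc_max=0.9,
                   p_max_mw=0.25, eta_c=0.95, eta_d=0.95):
    """Project a planned power/reserve pair into the feasible set.

    Sign convention (assumed): p > 0 charges, p < 0 discharges.
    Committed reserve r_mw must leave power headroom in both directions.
    """
    # Reserve commitment symmetrically shrinks the usable power range
    p_lo = -(p_max_mw - r_mw)
    p_hi = p_max_mw - r_mw
    p = float(np.clip(p_plan_mw, p_lo, p_hi))

    # SoC limits, accounting for one-way efficiencies
    if p >= 0:  # charging: SoC rises by p * eta_c * dt / cap
        p = min(p, (soc_max - soc) * cap_mwh / (eta_c * dt_h))
    else:       # discharging: SoC falls by |p| / eta_d * dt / cap
        p = max(p, -(soc - soc_min) * cap_mwh * eta_d / dt_h)
    return p
```

For example, near a full battery a large planned charge is cut back to the remaining SoC headroom rather than rejected outright.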
- Discrete actions map naturally to power levels and reserve levels with a safety layer.
- Time features and observation stacking are already built into the environment.
- `evaluate.py` can compare trained DQN policies directly against baselines.
- Checkpoints: `checkpoints/bess_dqn_*.zip`
- Best model: `checkpoints/best_model.zip`
- TensorBoard logs: `tensorboard/`
- Config snapshots: `checkpoints/config_*.yaml`
Start TensorBoard:
```bash
tensorboard --logdir tensorboard
```

- KPI JSON: `evaluation_results/evaluation_*.json`
- Evaluation plot: `evaluation_results/evaluation_plot_*.png`
- Cycle plot: `evaluation_results/cycles_*.png`
- Agent comparison: `evaluation_results/comparison_*.png`
Included metrics:
- Net profit and arbitrage revenue
- Reserve revenue and reserve compliance
- Throughput, EFC, and cycles per day
- Profit per throughput
- Daily risk metrics such as Sharpe, VaR, and drawdown
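One plausible way to compute the daily risk metrics from a per-day profit series is sketched below (the exact definitions in `evaluate.py` may differ, e.g. in the VaR level or any annualization):

```python
import numpy as np

def daily_risk_metrics(daily_profit):
    """Sharpe-style ratio, 5% historical VaR, and max drawdown of daily profits (EUR)."""
    p = np.asarray(daily_profit, dtype=float)
    std = p.std(ddof=1)
    sharpe = p.mean() / std if std > 0 else np.nan
    var_5 = -np.quantile(p, 0.05)  # loss magnitude at the 5% quantile
    cum = np.cumsum(p)             # cumulative profit curve
    drawdown = np.max(np.maximum.accumulate(cum) - cum)
    return {"sharpe": sharpe, "var_5": var_5, "max_drawdown": drawdown}

metrics = daily_risk_metrics([12.0, -3.0, 8.0, 15.0, -6.0, 9.0])
print(metrics)
```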
```python
from market_env import BessMultiMarketEnv
import numpy as np

price = np.random.uniform(30, 100, size=24 * 7)

env = BessMultiMarketEnv(
    price=price,
    dt_hours=1.0,
    c_min=0.1,
    c_max=0.9,
    c_init=0.5,
    b_max=0.08,
    alpha_d=4.5e-3,
    beta=1.3,
)

obs, info = env.reset()
for _ in range(100):
    action = env.action_space.sample()
    obs, reward, done, trunc, info = env.step(action)
    print(f"SoC: {info['soc']:.2f}, Reward: {reward:.2f}")
    if done:
        break
```

```python
from data_loader import load_csv_data, generate_synthetic_data

data = load_csv_data(
    path="data/my_data.csv",
    columns={"price": "price", "fr_signal": "fr_signal"},
    task="arbitrage",
)

synthetic = generate_synthetic_data(days=30, seed=42)
print(synthetic)
```

- The degradation model uses an exponential DoD and Rainflow formulation and should be calibrated to the actual cell chemistry.
- The README figures show reproducible baseline runs, not the performance of a bundled pretrained DQN checkpoint.
- For fair model comparisons, training and evaluation data should be separated in time.
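A time-based split can be sketched with pandas (hypothetical column and split point; not a repo utility):

```python
import numpy as np
import pandas as pd

# Hypothetical 30-day hourly series; in practice this would come from a
# dataset such as data/sample_arbitrage_30d.csv
idx = pd.date_range("2024-01-01", periods=720, freq="h")
df = pd.DataFrame({"price": np.random.uniform(30, 100, size=720)}, index=idx)

# Time-based split: first 24 days for training, last 6 for evaluation,
# so the evaluation window never overlaps the training window
split = idx[0] + pd.Timedelta(days=24)
train = df[df.index < split]
test = df[df.index >= split]

print(len(train), len(test))  # 576 144
```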
- Kwon & Zhu: Rainflow-exact degradation in MDPs with DQN
- Comparative DRL studies on BESS dispatch with cyclical time features and observation stacking
MIT License


