
Commit ddd3725

readme fixes, PL1 info removal
1 parent 4747245 commit ddd3725

7 files changed

Lines changed: 22 additions & 346 deletions


README.md

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ All key knobs are exposed via YAML in the `opensr_srgan/configs` folder:
* **EMA smoothing:** Enable `Training.EMA.enabled` to keep a shadow copy of the generator. Decay values in the 0.995–0.9999 range balance responsiveness with stability and are swapped in automatically for validation/inference.
* **Spectral normalization:** Optional for the SRGAN discriminator via `Discriminator.use_spectral_norm` to better control its Lipschitz constant and stabilize adversarial updates. [Miyato et al., 2018](https://arxiv.org/abs/1802.05957)
* **Wasserstein critic + R1 penalty:** Switch `Training.Losses.adv_loss_type: wasserstein` to enable a critic objective and pair it with the configurable `Training.Losses.r1_gamma` gradient penalty on real images for smoother discriminator updates. [Arjovsky et al., 2017](https://arxiv.org/abs/1701.07875); [Mescheder et al., 2018](https://arxiv.org/abs/1801.04406)
- * **Relativistic average GAN (BCE):** Set `Training.Losses.relativistic_average_d: true` to train D/G on relative real-vs-fake logits instead of absolute logits. This is supported in both Lightning training paths (PL1 and PL2).
+ * **Relativistic average GAN (BCE):** Set `Training.Losses.relativistic_average_d: true` to train D/G on relative real-vs-fake logits instead of absolute logits. This is supported in the Lightning 2+ manual-optimization training path.
The schedule and ramp make training **easier, safer, and more reproducible**.

---
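
The stabilisation knobs listed in this hunk all live in the YAML configuration. As a rough sketch (key names come from the bullets above; the nesting and the numeric values are illustrative placeholders, not the shipped defaults), they can be combined like this:

```python
# Hedged sketch of the stabilisation knobs discussed above; values are placeholders.
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "Training": {
        "EMA": {"enabled": True, "decay": 0.999},      # 0.995–0.9999 is the recommended range
        "Losses": {
            "adv_loss_type": "wasserstein",            # or "bce" for the classic SRGAN objective
            "r1_gamma": 10.0,                          # R1 penalty on real images (0.0 disables it)
            "relativistic_average_d": False,           # BCE-only relativistic-average switch
        },
    },
    "Discriminator": {"use_spectral_norm": True},
})
print(OmegaConf.to_yaml(cfg))
```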

docs/architecture.md

Lines changed: 7 additions & 6 deletions
@@ -2,9 +2,9 @@

This document outlines how ESA OpenSR organises its super-resolution GAN, the major components that make up the model, and how each piece interacts during training and inference.

- ## Vackground
+ ## Background

- OpenSR-SRGAN follows the single-image super-resolution (SISR) formulation in which the generator learns a mapping from a low-resolution observation $x$ to a plausible high-resolution reconstruction $x'$. The generator head widens the receptive field, a configurable trunk of $N$ residual-style blocks extracts features, and an upsampling tail increases spatial resolution. The residual fusion keeps skip connections active so the network focuses on high-frequency corrections rather than relearning the full signal:
+ OpenSR-SRGAN follows the single-image super-resolution (SISR) formulation in which the generator learns a mapping from a low-resolution observation \(x\) to a plausible high-resolution reconstruction \(x'\). The generator head widens the receptive field, a configurable trunk of \(N\) residual-style blocks extracts features, and an upsampling tail increases spatial resolution. The residual fusion keeps skip connections active so the network focuses on high-frequency corrections rather than relearning the full signal:
$$
x' = \mathrm{Upsample}\!\left( \mathrm{Conv}_{\text{tail}}\!\left(\mathrm{Body}(x_{\text{head}}) + x_{\text{head}}\right)\! \right).
$$
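
The equation above can be read as a three-stage network. The PyTorch sketch below mirrors that flow under stated assumptions: layer sizes, the block contents, and the PixelShuffle tail are illustrative placeholders, not the actual `opensr_srgan` generator code.

```python
# Minimal sketch of head -> body -> residual fusion -> tail -> upsample.
import torch
import torch.nn as nn

class TinySISRGenerator(nn.Module):
    def __init__(self, in_ch=4, feats=64, n_blocks=8, scale=4):
        super().__init__()
        self.head = nn.Conv2d(in_ch, feats, kernel_size=3, padding=1)
        self.body = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(feats, feats, 3, padding=1), nn.PReLU(),
                nn.Conv2d(feats, feats, 3, padding=1),
            )
            for _ in range(n_blocks)
        ])
        self.tail = nn.Conv2d(feats, feats, kernel_size=3, padding=1)
        self.upsample = nn.Sequential(
            nn.Conv2d(feats, in_ch * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        x_head = self.head(x)                    # widen receptive field
        fused = self.body(x_head) + x_head       # Body(x_head) + x_head (residual fusion)
        return self.upsample(self.tail(fused))   # Upsample(Conv_tail(...))

sr = TinySISRGenerator()(torch.randn(1, 4, 32, 32))
print(sr.shape)  # torch.Size([1, 4, 128, 128])
```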
@@ -24,9 +24,10 @@ Because every generator variant (residual, RCAB, RRDB, large-kernel attention, E
total-variation terms. Adversarial supervision uses `torch.nn.BCEWithLogitsLoss` with optional label smoothing.
* **Optimiser scheduling.** `configure_optimizers()` returns paired Adam optimisers (generator + discriminator) with
`ReduceLROnPlateau` schedulers that monitor a configurable validation metric.
- * **Training orchestration.** `training_step()` alternates discriminator (`optimizer_idx == 0`) and generator (`optimizer_idx ==
- 1`) updates. During the warm-up period configured by `Training.pretrain_g_only`, discriminator weights are frozen via
- `on_train_batch_start()` and a dedicated `pretraining_training_step()` computes purely content-driven updates.
+ * **Training orchestration.** `setup_lightning()` binds `training_step_PL2()` and enables manual optimisation
+ (`automatic_optimization = False`). Each step performs explicit discriminator and generator optimiser updates; during the
+ warm-up period configured by `Training.pretrain_g_only`, the generator runs content-driven updates while discriminator metrics
+ are logged without stepping discriminator weights.
* **Validation and logging.** `validation_step()` computes the same content metrics, logs discriminator diagnostics, and pushes
qualitative image panels to Weights & Biases according to `Logging.num_val_images`.
* **Inference pipeline.** `predict_step()` automatically normalises Sentinel-2 style 0–10000 inputs, runs the generator,
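
For readers unfamiliar with Lightning's manual-optimisation mode, the sketch below shows what such a step can look like in Lightning >= 2.0. It is a simplified stand-in, not the repository's `training_step_PL2()`: the loss terms, warm-up bookkeeping, and logging are reduced to the bare minimum.

```python
# Hedged sketch of a manual-optimization GAN step (Lightning >= 2.0).
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class GanStepSketch(pl.LightningModule):
    def __init__(self, generator, discriminator, pretrain_steps=1000):
        super().__init__()
        self.automatic_optimization = False      # manual optimisation, as described above
        self.G, self.D, self.pretrain_steps = generator, discriminator, pretrain_steps

    def training_step(self, batch, batch_idx):
        lr, hr = batch
        opt_d, opt_g = self.optimizers()
        sr = self.G(lr)
        pretraining = self.global_step < self.pretrain_steps

        if not pretraining:
            # Discriminator update on real vs. detached fake logits.
            real_logits, fake_logits = self.D(hr), self.D(sr.detach())
            d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) \
                   + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
            opt_d.zero_grad(); self.manual_backward(d_loss); opt_d.step()
            self.log("d_loss", d_loss)

        # Generator update: content loss always, adversarial term only after warm-up.
        g_loss = F.l1_loss(sr, hr)
        if not pretraining:
            fake_logits = self.D(sr)
            g_loss = g_loss + 1e-3 * F.binary_cross_entropy_with_logits(
                fake_logits, torch.ones_like(fake_logits))
        opt_g.zero_grad(); self.manual_backward(g_loss); opt_g.step()
        self.log("g_loss", g_loss)

    def configure_optimizers(self):
        return (torch.optim.Adam(self.D.parameters(), lr=1e-4),
                torch.optim.Adam(self.G.parameters(), lr=1e-4))
```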
@@ -53,7 +54,7 @@ The generator zoo lives under `opensr_srgan/model/generators/` and can be select
* **Stochastic GAN generator (`cgan_generator.py`).** Extends the flexible generator with conditioning inputs and latent noise,
enabling experiments where auxiliary metadata influences the super-resolution output.
* **ESRGAN generator (`esrgan.py`).** Implements the RRDBNet trunk introduced with ESRGAN, exposing `n_blocks`, `growth_channels`,
- and `res_scale` so you can dial in deeper receptive fields and sharper textures. The implementation supports original features like Relativistic Average GAN (RaGAN) and the codebase allows to perform two step training phase (content-oriented pretraining of generator followed by adversarial training with Discriminator) as originally proposed by ESRGAN authors.
+ and `res_scale` so you can dial in deeper receptive fields and sharper textures. The implementation supports original features like Relativistic Average GAN (RaGAN), and the codebase allows a two-step training phase (content-oriented pretraining of the generator followed by adversarial training with the discriminator), as originally proposed by the ESRGAN authors.
* **Advanced variants (`SRGAN_advanced.py`).** Provides additional block implementations and compatibility aliases exposed in
`__init__.py` for backwards compatibility.

docs/configuration.md

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ stable validation imagery. The EMA is fully optional and controlled through the
| `adv_loss_beta` | `1e-3` | Target weight applied to the adversarial term after ramp-up. |
| `adv_loss_schedule` | `cosine` | Ramp shape (`linear` or `cosine`). |
| `adv_loss_type` | `bce` | Adversarial objective (`bce` for classic SRGAN logits, `wasserstein` for a non-saturating critic-style loss). |
- | `relativistic_average_d` | `False` | BCE-only switch for relativistic-average GAN training (real/fake logits are compared against each other's batch mean). Supported in both PL1 and PL2 training-step implementations. |
+ | `relativistic_average_d` | `False` | BCE-only switch for relativistic-average GAN training (real/fake logits are compared against each other's batch mean). Supported in the Lightning 2+ manual-optimization training-step implementation. |
| `r1_gamma` | `0.0` | Strength of the R1 gradient penalty applied to real images (useful with Wasserstein critics). |
| `l1_weight` | `1.0` | Weight of the pixelwise L1 loss. |
| `sam_weight` | `0.05` | Weight of the spectral angle mapper loss. |
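
The `relativistic_average_d` row describes logits judged against the other set's batch mean. A minimal, self-contained sketch of that discriminator objective (the function name and the 0.5 weighting are illustrative, not the repository's API):

```python
# Relativistic-average BCE: each set of logits is compared to the batch mean of the other.
import torch
import torch.nn.functional as F

def relativistic_average_d_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor):
    real_rel = real_logits - fake_logits.mean()   # "more real than the average fake"
    fake_rel = fake_logits - real_logits.mean()   # "more fake than the average real"
    loss_real = F.binary_cross_entropy_with_logits(real_rel, torch.ones_like(real_rel))
    loss_fake = F.binary_cross_entropy_with_logits(fake_rel, torch.zeros_like(fake_rel))
    return 0.5 * (loss_real + loss_fake)

# Example: random logits for an 8-sample batch.
print(relativistic_average_d_loss(torch.randn(8, 1), torch.randn(8, 1)))
```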

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ OpenSR-SRGAN is a comprehensive toolkit for training and evaluating super-resolu
that make adversarial optimisation tractable—generator warm-up phases, learning-rate scheduling, adversarial-weight ramping, and more. All options are driven by concise YAML configuration files so you can explore new architectures or datasets without
rewriting pipelines.

- Whether you are reproducing published results, exploring new remote-sensing modalities, or are trying to esablish some benchmarks, OpenSR-SRGAN gives you a clear and extensible foundation for multispectral super-resolution research.
+ Whether you are reproducing published results, exploring new remote-sensing modalities, or trying to establish benchmarks, OpenSR-SRGAN gives you a clear and extensible foundation for multispectral super-resolution research.

> This repository and the configs represent the experiences that were made with SR-GAN training for remote sensing imagery. It's neither complete nor claims to perform SOTA SR, but it implements all tweaks and tips that make training SR-GANs easier.
@@ -82,7 +82,7 @@ Whether you are reproducing published results, exploring new remote-sensing moda
* [Results](results.md) showcases results for some generator/discriminator and dataset combinations.

## ESA OpenSR
- OpenSR-SRGAN is part of the ESA [OpenSR](https://www.opensr.eu) ecosystem — an open framework for trustworthy super-resolution of multispectral satellite imagery. Within this initiative, this repository serves as the adversarial benchmark suite: it provides standardized GAN architectures, training procedures, and evaluation utilities that complement the other model types implemented in the project (diffusion, transformers, regression) and interfaces with from companion packages such as opensr-utils.
+ OpenSR-SRGAN is part of the ESA [OpenSR](https://www.opensr.eu) ecosystem — an open framework for trustworthy super-resolution of multispectral satellite imagery. Within this initiative, this repository serves as the adversarial benchmark suite: it provides standardized GAN architectures, training procedures, and evaluation utilities that complement the other model types implemented in the project (diffusion, transformers, regression), and it interfaces with companion packages such as opensr-utils.

## Citation

docs/training-guideline.md

Lines changed: 8 additions & 8 deletions
@@ -4,7 +4,7 @@ This section goes over the most important metrics and settings to achieve a bala


## Best Practices
- It is recommended to use the training warmups and schedulers as explained above. The following images present how these rpactices are reflected in the logs.
+ It is recommended to use the training warmups and schedulers as explained above. The following images present how these practices are reflected in the logs.

### Objectives and loss composition

@@ -16,23 +16,23 @@ Each coefficient maps directly to the `Training.Losses` block in the configurati

### Exponential Moving Average (EMA)

- For smoother validation curves and more stable inference, the trainer can maintain an exponential moving average of the generator parameters. After each optimisation step, the EMA weights $\theta_{\text{EMA}}$ are updated toward the current generator state $\theta$:
+ For smoother validation curves and more stable inference, the trainer can maintain an exponential moving average of the generator parameters. After each optimisation step, the EMA weights \(\theta_{\text{EMA}}\) are updated toward the current generator state \(\theta\):
$$
\theta_{\text{EMA}}^{(t)} = \beta \, \theta_{\text{EMA}}^{(t-1)} + (1 - \beta)\, \theta^{(t)},
$$
- where the decay $\beta \in [0,1)$ controls how much history is retained. During validation and inference, the EMA snapshot replaces the live weights so that predictions are less sensitive to short-term oscillations. The final super-resolved output therefore comes from the smoothed generator,
+ where the decay \(\beta \in [0,1)\) controls how much history is retained. During validation and inference, the EMA snapshot replaces the live weights so that predictions are less sensitive to short-term oscillations. The final super-resolved output therefore comes from the smoothed generator,
$$
\hat{y}_{\text{SR}} = G(x; \theta_{\text{EMA}}),
$$
which empirically reduces adversarial artefacts and improves perceptual consistency.


#### Generator LR Warmup
- When starting to train, the learning rate slowly raises from 0 to the indicated value. This prevents exploding gradients after a random initialization of the weights when training the model from scratch. The length of the LR warmup is defined with the `Schedulers.g_warmup_steps` parameter in the config. Wether the increase is linear or more smooth is defined with the `Schedulers.g_warmup_type` setting, ideally this should be set to `cosine`.
+ When starting to train, the learning rate slowly rises from 0 to the indicated value. This prevents exploding gradients after random initialization of the weights when training the model from scratch. The length of the LR warmup is defined with the `Schedulers.g_warmup_steps` parameter in the config. Whether the increase is linear or smoother is defined with the `Schedulers.g_warmup_type` setting; ideally this should be set to `cosine`.
![lr_gen_warmup](assets/lr_generator_warmup.png)

#### Generator Pre-training
- After the loss stabilizes, the generator continues to be trained while the discriminator sits idle. This prevents the discriminator form overpowering the generator in early stages of the training, where the generator output is easily identifyable as synthetic. The binary flag `training/pretrain_phase` is logged to indicate wether the model is still in pretraining or not. Wether the pretraining is enabled or not is defined with the `Training.pretrain_g_only` parameter in the config, the parameter `Training.g_pretrain_steps` defines how many steps this pretraining takes in total. The parameter `Training.g_warmup_steps` decides how many training steps (batches) this smooth LR increase takes, setting it to `0` turns it off.
+ After the loss stabilizes, the generator continues to be trained while the discriminator sits idle. This prevents the discriminator from overpowering the generator in early training stages, where the generator output is still easily identifiable as synthetic. The binary flag `training/pretrain_phase` is logged to indicate whether the model is still in pretraining. Whether pretraining is enabled is defined with the `Training.pretrain_g_only` parameter in the config; `Training.g_pretrain_steps` defines how many steps this pretraining takes in total. The parameter `Training.g_warmup_steps` defines how many training steps (batches) the smooth LR increase lasts; setting it to `0` turns it off.

During this generator-only pretraining window, the optimization target is hardwired to plain L1 loss only. Once pretraining ends, the normal configured content-loss mix (L1/SAM/perceptual/TV) is used again.
![gen_warmup](assets/pretrain_phase.png)
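
The EMA update shown earlier in this hunk is a one-line operation per parameter. A minimal sketch, assuming a plain `torch.nn.Module` generator and a deep-copied shadow model (this mirrors the math only, not the repository's EMA helper):

```python
# theta_EMA = beta * theta_EMA + (1 - beta) * theta, applied parameter-by-parameter.
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, beta=0.999):
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(beta).add_(p, alpha=1.0 - beta)

generator = torch.nn.Linear(4, 4)            # stand-in for the SR generator
ema_generator = copy.deepcopy(generator)     # shadow copy swapped in for validation
# ... after each optimiser step:
update_ema(ema_generator, generator, beta=0.999)
```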
@@ -42,8 +42,8 @@ Once the `training/pretrain_phase` flag is `0`, pretraining of the generator is
![adv_warmup](assets/adv_loss_warmup.png)

#### Continued Training
- As training continues, the generator is trying to fool the discriminator and the discriminator is trying to distinguish between true/synthetic, we monitor the overall loss of the models independantly. When the overall loss metric of one model reaches a plateau, we reduce it's learning rate in order to optimnally train the model.
- ![lr_scheduler](assets/lr_scheduler.png). The patience, LR decrease factor inc ase of plateau and the metric to be used for these LR schedulers are all defined individually for $G$ and $D$ in the `Schedulers.` section of the config file.
+ As training continues, the generator tries to fool the discriminator, while the discriminator tries to distinguish between real and synthetic samples. We monitor the overall loss of both models independently. When the overall loss metric of one model reaches a plateau, we reduce its learning rate to train the model optimally.
+ ![lr_scheduler](assets/lr_scheduler.png). The patience, LR decrease factor in case of a plateau, and the metric used for these LR schedulers are all defined individually for \(G\) and \(D\) in the `Schedulers` section of the config file.

The schedulers now expose a `cooldown` period and `min_lr` floor. Cooldown waits a configurable number of epochs before watching for the next plateau, preventing back-to-back reductions, while `min_lr` guarantees that the optimiser never stalls at zero. Use these knobs to keep the momentum of long trainings without overshooting into vanishing updates.
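
The cooldown and `min_lr` knobs map onto PyTorch's built-in `ReduceLROnPlateau`. A hedged sketch with placeholder values (in the repository these come from the `Schedulers` section of the YAML config):

```python
# Plateau-based LR reduction with cooldown and a learning-rate floor.
import torch

generator = torch.nn.Linear(4, 4)  # stand-in module
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
sched_g = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt_g,
    mode="min",        # watch a loss-style validation metric
    factor=0.5,        # LR decrease factor on plateau
    patience=5,        # epochs without improvement before reducing
    cooldown=2,        # wait before watching for the next plateau
    min_lr=1e-6,       # floor so the optimiser never stalls at zero
)

# After each validation epoch, step the scheduler with the monitored metric:
val_metric = 0.123
sched_g.step(val_metric)
```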

@@ -59,7 +59,7 @@ If you observe faint checkerboard textures, especially in flat/low-frequency are
- `Training.Losses.fixed_idx: [0, 1, 2]` for 4-band inputs so VGG perceptual loss uses RGB consistently.

#### Final stages of the Training
- With further progression of the training, it is important not only to monitor the absolute reconstruction quality of the generator, but also to keep an eye on the balance between the generator and discriminator. Ideally, we try to reach the Nash equilibrium, where the discriminator can not distinguish between real and synthetic anymore, meaning the super-resolution is (at least fdor the discriminator) indistinguishable from the real high-resolution image. This equilibrium is achieved when both $D(y)$ and $D(G(x))$ approach `0.5`.
+ With further progression of training, it is important not only to monitor the absolute reconstruction quality of the generator, but also to keep an eye on the balance between the generator and discriminator. Ideally, we try to reach the Nash equilibrium, where the discriminator cannot distinguish between real and synthetic anymore, meaning the super-resolution is (at least for the discriminator) indistinguishable from the real high-resolution image. This equilibrium is achieved when both \(D(y)\) and \(D(G(x))\) approach `0.5`.
![adv1](assets/discr_y_prob.png)
![adv2](assets/discr_x_prob.png)

docs/training.md

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@
!!! note "PyTorch Lightning 2+ only"
The training stack uses a single manual-optimisation path. `SRGAN_model.setup_lightning()` enforces Lightning >= 2.0 and binds `training_step_PL2()` where GAN training runs with `automatic_optimization = False`. `opensr_srgan.utils.build_trainer_kwargs.build_lightning_kwargs()` forwards resume checkpoints through `Trainer.fit(..., ckpt_path=...)`. See [Trainer Details](trainer-details.md) for a step-by-step breakdown of warm-up checks, adversarial updates, and EMA lifecycle.

- This section is a more technical overview, [Training Guideline](training-guideline.md) gives a more broad overview how to sirveill the training process.
+ This section is a more technical overview; [Training Guideline](training-guideline.md) provides a broader overview of how to monitor the training process.

## Data module construction
In order to train, you need a dataset. `Data.dataset_type` decides which dataset to use and wraps them in a `LightningDataModule`. Should you implement your own, you will need to add it to the dataset_selector.py file with the settings of your choice (see [Data](data.md)). Optionally, the selector instantiates `ExampleDataset` by default—perfect for smoke tests after downloading the sample data, a dataset of 200 RGB-NIR image pairs. The module inherits batch sizes, worker counts, and prefetching parameters from the configuration and prints a summary including dataset size.
@@ -33,7 +33,7 @@ Both entry points accept the same configuration file. The CLI exposes a single o
GPU assignment is handled directly in the configuration. Set `Training.gpus` to a list of device indices (for example `[0, 1, 2, 3]`) to enable multi-GPU training; a single value such as `[0]` keeps the run on one card. When more than one device is listed the trainer automatically activates PyTorch Lightning's Distributed Data Parallel (DDP) backend for significantly faster epochs.

## Initialisation steps - Overview
- The code performs the following, no matter if the script is launched form the CLI or through the import.
+ The code performs the following, regardless of whether the script is launched from the CLI or via import.
1. **Import dependencies.** Torch, PyTorch Lightning, OmegaConf, and logging backends are loaded up-front.
2. **Parse arguments.** `argparse` reads the configuration path and ensures the file exists.
3. **Load configuration.** `OmegaConf.load()` parses the YAML file into an object used throughout the run.
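
Steps 2–3 of this list amount to a few lines of `argparse` and OmegaConf. A minimal sketch follows; the later model/datamodule/Trainer steps are only indicated in comments because their exact factory functions are not shown in this hunk:

```python
# Parse the config path and load it with OmegaConf, as in the overview above.
import argparse
from omegaconf import OmegaConf

parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True, help="Path to the YAML config")
args = parser.parse_args()

cfg = OmegaConf.load(args.config)     # parses the YAML into an OmegaConf object
print(OmegaConf.to_yaml(cfg))         # quick sanity check of the loaded settings

# Subsequent steps (not shown): instantiate the SRGAN LightningModule and the
# LightningDataModule from `cfg`, build the Trainer (devices taken from
# Training.gpus), and call trainer.fit(model, datamodule, ckpt_path=<resume path>).
```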
