Some implementations let the Optimizers maintain intermediate variables themselves,
for example, the reference weights in the proximal term, or the gradient cache for variance reduction.
In contrast, we let the Algorithms maintain such variables and pass them as input parameters to the relevant Optimizers.
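This separation of responsibilities can be sketched as follows. The class and parameter names (`ProxSGD`, `prox_center`) are hypothetical and chosen for illustration; they are not the library's actual API. The key point is that the optimizer holds no intermediate state of its own: the proximal center is supplied by the Algorithm on every step.

```python
import numpy as np

class ProxSGD:
    """Minimal sketch of a ProxSGD-style optimizer (hypothetical API).

    Following the design described above, the optimizer does NOT store the
    proximal center; the Algorithm maintains it and passes it in each step.
    """

    def __init__(self, lr=0.1, mu=0.01):
        self.lr = lr  # learning rate
        self.mu = mu  # weight of the proximal term

    def step(self, weights, grad, prox_center):
        # ProxSGD update: w <- w - lr * (grad + mu * (w - w_ref)),
        # where w_ref (prox_center) is owned by the Algorithm, not by us.
        return weights - self.lr * (grad + self.mu * (weights - prox_center))

# The Algorithm owns the intermediate variable (here, the global model)
# and feeds it to the optimizer at each local update.
global_model = np.zeros(3)
local_model = np.array([1.0, -2.0, 0.5])
opt = ProxSGD(lr=0.1, mu=0.1)
grad = 2 * local_model  # gradient of ||w||^2 as a toy local objective
local_model = opt.step(local_model, grad, prox_center=global_model)
```

Because the proximal center (or a gradient cache, in the variance-reduction case) arrives as an argument, the same Optimizer instance can be reused across rounds without any state synchronization logic inside it.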
Most (inner) optimizers are based on the ProxSGD optimizer, including:

Other (inner) optimizers include:
- FedPD Optimizers (not yet verified)