A from-scratch numerical optimization engine in C++ with Python bindings via pybind11.
This library implements five optimization algorithms over a shared abstract interface, supporting both analytic and finite-difference derivatives. The design separates algorithm logic from objective function definitions, allowing any optimizer to be applied to any function without modification.
Core design principles:
- Uniform optimizer interface: all algorithms implement a single
optimize(f, x0)method. - Separation of objectives and algorithms: functions subclass
ObjectiveFunction; optimizers subclassOptimizer. - Numerical robustness: explicit handling of indefinite Hessians, curvature pair degeneracy, line search bracket escape, and floating-point overflow in acceptance probability.
- Python interoperability: full binding of all types and configs, including support for Python-defined objective functions.
All optimizers inherit from Optimizer and implement:
OptimResult optimize(ObjectiveFunction &f, const Eigen::VectorXd &x_0);Objective functions inherit from ObjectiveFunction and implement evaluate, grad, hessian, and dim. The line search and finite difference wrapper operate on the same interface.
Gradient Descent (src/optimizers/gradient_descent.cpp)
Standard steepest descent: x_{n+1} = x_n - alpha_n * grad f(x_n). Step size is determined by Wolfe line search when backtracking = true (default), or a fixed learning_rate otherwise. Convergence is checked on relative gradient norm.
Newton's Method (src/optimizers/newton.cpp)
Computes d_n = (H + mu*I)^{-1} * grad f(x_n) where H = hessian f(x_n). The diagonal shift mu is initialized at initial_mu and doubled until the Cholesky factorization of H + mu*I succeeds. The step is scaled via Wolfe line search.
L-BFGS (src/optimizers/lbfgs.cpp)
Limited-memory quasi-Newton method. The inverse Hessian approximation H_n is built implicitly from the m most recent displacement pairs (s_n, y_n) and applied via the two-loop recursion (Nocedal 1980). Curvature pairs with s^T y <= 1e-8 * ||s|| * ||y|| are skipped. The line search uses c2 = 0.1. Convergence criterion: ||grad f|| < tol * max(1, |f(x)|).
Hill Climbing (src/optimizers/hill_climbing.cpp)
Derivative-free stochastic local search. At each step, a candidate x' = x + eps is drawn as eps ~ N(0, step_size^2 * I). The candidate is accepted only if f(x') < f(x). Two stopping criteria run in parallel: improvement below tol (CONVERGED), or no accepted step for stagnation_limit consecutive iterations (STAGNATED).
Simulated Annealing (src/optimizers/simulated_annealing.cpp)
Metropolis-Hastings global search with geometric cooling T_{n+1} = cooling_rate * T_n. Worsening moves are accepted with probability exp(-delta_f / T), clamped to prevent overflow when T is near min_temp. The best solution seen across all iterations is returned, not the final chain position.
Wolfe Line Search (src/core/line_search.cpp)
Implements the More-Thuente bracket-and-zoom algorithm (Nocedal & Wright). The bracket phase expands alpha from 1.0 until a Wolfe-satisfying interval is identified. The zoom phase narrows the interval via cubic interpolation, with the trial point clamped to the inner 80% of the bracket to prevent degenerate near-boundary steps. The cubic interpolation falls back to bisection when the discriminant is negative or the denominator is near-zero.
Finite Difference Wrapper (include/core/finite_difference.hpp)
Wraps any ObjectiveFunction to provide numerically approximated derivatives when analytic forms are unavailable. Uses central differences for the gradient and second/mixed differences for the Hessian, both O(h^2) accurate, with default h = 1e-5. Hessian symmetry is enforced explicitly.
- Newton: Cholesky-based indefinite Hessian detection with iterative diagonal shift; fails gracefully if shift exceeds
1e10. - L-BFGS: Relative curvature guard
s^T y > 1e-8 * ||s|| * ||y||on each pair before storage. - Line search: Cubic interpolation result clamped to bracket interior; denominator near-zero check with bisection fallback.
- Simulated Annealing: Acceptance exponent clamped at
-500beforeexp()to prevent IEEE 754 overflow.
Python bindings are implemented with pybind11 (python/bindings.cpp). The module is compiled as a shared library (optim_engine.cpython-*.so) and importable from the project root.
Trampoline classes (PyObjectiveFunction, PyOptimizer) enable Python subclassing of the abstract C++ base classes. Each virtual method is forwarded via PYBIND11_OVERRIDE_PURE, allowing Python-defined objective functions to be passed directly to C++ optimizers.
Exposed to Python:
- All five optimizer classes and their config structs
ObjectiveFunctionbase class (subclassable)FiniteDiffWrapperwithpy::keep_alive<1,2>()to prevent dangling reference to the wrapped objectOptimResult(read-only fields)StatusenumWolfeConfigand thelineSearchfree functionSphere,Rosenbrock,Rastrigin
| Dependency | Version tested |
|---|---|
| Compiler | Apple Clang 21 / GCC 12+ with C++20 support |
| Eigen | 3.4+ (/opt/homebrew/include/eigen3 or equivalent) |
| pybind11 | 3.0+ |
| Python | 3.10+ (headers required) |
| Catch2 | v2 (vendored in vendor/catch2/) |
# Build everything: optimizer objects, test binaries, benchmarks, Python extension
make all
# Individual targets
make src # compile src/ objects only
make benchmarks # build bin/benchmarks
make prelim-tests # build bin/prelim-tests/<name> for each test
make python # build optim_engine.cpython-*.so
make clean # remove build/ and bin/bin/benchmarks
bin/prelim-tests/{gd_test,newton_test,lbfgs_test,hc_test,sa_test}
optim_engine.cpython-*.so
If SDKROOT is stale (common after macOS SDK updates), add unexport SDKROOT to the top of the Makefile (already present) or run unset SDKROOT in your shell before building.
#include "umbrella_header.hpp"
Rosenbrock f(2);
LBFGSConfig config;
config.tol = 1e-10;
LBFGS optimizer(config);
Eigen::VectorXd x0(2);
x0 << 1.0, -1.0;
OptimResult result = optimizer.optimize(f, x0);
// result.optimal_x -- Eigen::VectorXd
// result.optimal_f -- double
// result.status -- Status::CONVERGED | MAX_EPOCHS | STAGNATED | FAILED
// result.iterations -- intimport optim_engine as opt
import numpy as np
f = opt.Rosenbrock(2)
config = opt.LBFGSConfig()
config.tol = 1e-10
optimizer = opt.LBFGS(config)
result = optimizer.optimize(f, np.array([1.0, -1.0]))
print(result.optimal_f) # float
print(result.optimal_x) # numpy array
print(result.status) # opt.Status enum value
print(result.iterations) # intC++:
class MyFunction : public ObjectiveFunction {
public:
double evaluate(const Eigen::VectorXd &x) const override { ... }
Eigen::VectorXd grad(const Eigen::VectorXd &x) const override { ... }
Eigen::MatrixXd hessian(const Eigen::VectorXd &x) const override { ... }
int dim() const override { return n_; }
private:
int n_;
};Python with analytic derivatives:
class MyFunction(opt.ObjectiveFunction):
def evaluate(self, x): return float(np.dot(x, x))
def grad(self, x): return 2.0 * x
def hessian(self, x): return 2.0 * np.eye(len(x))
def dim(self): return 2Python without derivatives (finite differences):
class MyFunction(opt.ObjectiveFunction):
def evaluate(self, x): return float(x[0]**4 + x[1]**4)
def grad(self, x): raise NotImplementedError
def hessian(self, x): raise NotImplementedError
def dim(self): return 2
f = MyFunction()
wrapped = opt.FiniteDiffWrapper(f, h=1e-5) # f must outlive wrapped
result = opt.GradientDescent(opt.GDConfig()).optimize(wrapped, np.array([2.0, 2.0]))make run-tests # build and run all test binaries sequentially
# or individually:
./bin/prelim-tests/gd_test
./bin/prelim-tests/newton_test
./bin/prelim-tests/lbfgs_test
./bin/prelim-tests/hc_test
./bin/prelim-tests/sa_testpython3 tests/python-tests/bindings_test_suite.py
python3 tests/python-tests/fdw_test.pyEach optimizer has two C++ test cases: convergence on a unimodal function and a defined non-convergence scenario. Python tests cover the binding layer for all five optimizers, a Python-subclassed objective, and the finite difference wrapper.
Hill Climbing and Simulated Annealing are stochastic. Individual runs may return STAGNATED rather than CONVERGED under default config; this does not indicate a correctness failure. The final optimal_f value is the relevant metric for these optimizers.
include/
core/ # Abstract interfaces and supporting utilities
objective.hpp
optimizer.hpp
result.hpp
line_search.hpp
finite_difference.hpp
optimizers/ # Config structs and class declarations
functions/ # Sphere, Rosenbrock, Rastrigin (header-only)
src/
core/
line_search.cpp
optimizers/ # Algorithm implementations
python/
bindings.cpp # pybind11 module
tests/
prelim-tests/ # Catch2 C++ tests, one file per optimizer
python-tests/ # Python binding and integration tests
benchmarks/
benchmarks.cpp # C++ benchmark runner
benchmarks.py # Python comparison benchmark
plots.py # Convergence plotting utilities
vendor/
catch2/ # Vendored Catch2 v2 single-header
Header-only functions vs. compiled optimizers: Test functions (Sphere, Rosenbrock, Rastrigin) are header-only since they contain no state and are used across tests, benchmarks, and Python bindings. Optimizers are split into header/source to keep compile times reasonable as the optimizer count grows.
FiniteDiffWrapper holds a raw reference: The wrapper takes non-const ObjectiveFunction &f rather than owning a copy or shared pointer. This avoids imposing a heap allocation on callers and keeps the interface consistent with how optimizers receive objectives. The lifetime requirement is documented and enforced at the Python boundary via py::keep_alive.
Wolfe c2 = 0.1 for L-BFGS vs. c2 = 0.9 for GD/Newton: The tighter curvature condition for L-BFGS is not a tuning choice, it is a correctness requirement. The BFGS update requires y^T s > 0, which the strong Wolfe condition with c2 < 1 guarantees. Using c2 = 0.9 in L-BFGS would allow curvature pairs with near-zero y^T s into the history, corrupting the inverse-Hessian approximation.
Relative convergence criterion: All gradient-based optimizers use ||grad f|| < tol * max(1, |f(x)|) rather than the absolute ||grad f|| < tol. The absolute form fails silently on functions with large objective values where the gradient is proportionally large at the optimum.
No PRNG seeding in HC/SA: Both stochastic optimizers seed from std::random_device rather than a fixed seed. Reproducibility is left to the caller; a fixed seed would require exposing it in the config and complicates the default-construct-and-run usage pattern.
The engine currently supports unconstrained optimization only. The following are not implemented:
- Box constraints (
x_min <= x <= x_max): required for L-BFGS-B style bounded optimization. - Equality constraints (
g(x) = 0): no augmented Lagrangian or penalty method. - Inequality constraints (
h(x) <= 0): no interior point or active set method.
For constrained problems, a penalty or barrier can be added manually to the objective function before passing it to any optimizer. This is not a substitute for a proper constrained solver but works for simple cases.
Line search: The More-Thuente zoom phase maintains the standard bracket invariants. Cubic interpolation results are clamped to the inner 80% of the bracket to prevent degenerate near-boundary trial steps. A denominator near-zero check and discriminant less than zero falls back to bisection.
L-BFGS curvature pairs: Pairs are accepted only when s^T y > 1e-8 * ||s|| * ||y|| (relative threshold, scale-invariant).
Newton regularization: The modified Cholesky approach doubles mu on each failed Cholesky factorization. At mu >= 1e10 the step is abandoned and Status::FAILED is returned, rather than silently producing a non-descent direction.
Simulated annealing overflow: The exponent in exp(-delta_f / T) is clamped to [-500, 0] before evaluation, corresponding to a minimum acceptance probability near 7e-218.