numerical-optimization-engine

A from-scratch numerical optimization engine in C++ with Python bindings via pybind11.

Overview

This library implements five optimization algorithms over a shared abstract interface, supporting both analytic and finite-difference derivatives. The design separates algorithm logic from objective function definitions, allowing any optimizer to be applied to any function without modification.

Core design principles:

Uniform optimizer interface: all algorithms implement a single optimize(f, x0) method.
Separation of objectives and algorithms: functions subclass ObjectiveFunction; optimizers subclass Optimizer.
Numerical robustness: explicit handling of indefinite Hessians, curvature pair degeneracy, line search bracket escape, and floating-point overflow in acceptance probability.
Python interoperability: full binding of all types and configs, including support for Python-defined objective functions.

Architecture

Optimizer Abstraction

All optimizers inherit from Optimizer and implement:

OptimResult optimize(ObjectiveFunction &f, const Eigen::VectorXd &x_0);

Objective functions inherit from ObjectiveFunction and implement evaluate, grad, hessian, and dim. The line search and finite difference wrapper operate on the same interface.

Optimizers

Gradient Descent (src/optimizers/gradient_descent.cpp)

Standard steepest descent: x_{n+1} = x_n - alpha_n * grad f(x_n). Step size is determined by Wolfe line search when backtracking = true (default), or a fixed learning_rate otherwise. Convergence is checked on relative gradient norm.

Newton's Method (src/optimizers/newton.cpp)

Computes d_n = (H + mu*I)^{-1} * grad f(x_n) where H = hessian f(x_n). The diagonal shift mu is initialized at initial_mu and doubled until the Cholesky factorization of H + mu*I succeeds. The step is scaled via Wolfe line search.

L-BFGS (src/optimizers/lbfgs.cpp)

Limited-memory quasi-Newton method. The inverse Hessian approximation H_n is built implicitly from the m most recent displacement pairs (s_n, y_n) and applied via the two-loop recursion (Nocedal 1980). Curvature pairs with s^T y <= 1e-8 * ||s|| * ||y|| are skipped. The line search uses c2 = 0.1. Convergence criterion: ||grad f|| < tol * max(1, |f(x)|).

Hill Climbing (src/optimizers/hill_climbing.cpp)

Derivative-free stochastic local search. At each step, a candidate x' = x + eps is drawn as eps ~ N(0, step_size^2 * I). The candidate is accepted only if f(x') < f(x). Two stopping criteria run in parallel: improvement below tol (CONVERGED), or no accepted step for stagnation_limit consecutive iterations (STAGNATED).

Simulated Annealing (src/optimizers/simulated_annealing.cpp)

Metropolis-Hastings global search with geometric cooling T_{n+1} = cooling_rate * T_n. Worsening moves are accepted with probability exp(-delta_f / T), clamped to prevent overflow when T is near min_temp. The best solution seen across all iterations is returned, not the final chain position.

Supporting Components

Wolfe Line Search (src/core/line_search.cpp)

Implements the More-Thuente bracket-and-zoom algorithm (Nocedal & Wright). The bracket phase expands alpha from 1.0 until a Wolfe-satisfying interval is identified. The zoom phase narrows the interval via cubic interpolation, with the trial point clamped to the inner 80% of the bracket to prevent degenerate near-boundary steps. The cubic interpolation falls back to bisection when the discriminant is negative or the denominator is near-zero.

Finite Difference Wrapper (include/core/finite_difference.hpp)

Wraps any ObjectiveFunction to provide numerically approximated derivatives when analytic forms are unavailable. Uses central differences for the gradient and second/mixed differences for the Hessian, both O(h^2) accurate, with default h = 1e-5. Hessian symmetry is enforced explicitly.

Numerical Safeguards

Newton: Cholesky-based indefinite Hessian detection with iterative diagonal shift; fails gracefully if shift exceeds 1e10.
L-BFGS: Relative curvature guard s^T y > 1e-8 * ||s|| * ||y|| on each pair before storage.
Line search: Cubic interpolation result clamped to bracket interior; denominator near-zero check with bisection fallback.
Simulated Annealing: Acceptance exponent clamped at -500 before exp() to prevent IEEE 754 overflow.

Python Bindings

Python bindings are implemented with pybind11 (python/bindings.cpp). The module is compiled as a shared library (optim_engine.cpython-*.so) and importable from the project root.

Trampoline classes (PyObjectiveFunction, PyOptimizer) enable Python subclassing of the abstract C++ base classes. Each virtual method is forwarded via PYBIND11_OVERRIDE_PURE, allowing Python-defined objective functions to be passed directly to C++ optimizers.

Exposed to Python:

All five optimizer classes and their config structs
ObjectiveFunction base class (subclassable)
FiniteDiffWrapper with py::keep_alive<1,2>() to prevent dangling reference to the wrapped object
OptimResult (read-only fields)
Status enum
WolfeConfig and the lineSearch free function
Sphere, Rosenbrock, Rastrigin

Build Instructions

Dependencies

Dependency	Version tested
Compiler	Apple Clang 21 / GCC 12+ with C++20 support
Eigen	3.4+ (`/opt/homebrew/include/eigen3` or equivalent)
pybind11	3.0+
Python	3.10+ (headers required)
Catch2	v2 (vendored in `vendor/catch2/`)

Commands

# Build everything: optimizer objects, test binaries, benchmarks, Python extension
make all

# Individual targets
make src            # compile src/ objects only
make benchmarks     # build bin/benchmarks
make prelim-tests   # build bin/prelim-tests/<name> for each test
make python         # build optim_engine.cpython-*.so
make clean          # remove build/ and bin/

Expected outputs

bin/benchmarks
bin/prelim-tests/{gd_test,newton_test,lbfgs_test,hc_test,sa_test}
optim_engine.cpython-*.so

If SDKROOT is stale (common after macOS SDK updates), add unexport SDKROOT to the top of the Makefile (already present) or run unset SDKROOT in your shell before building.

Usage

C++

#include "umbrella_header.hpp"

Rosenbrock f(2);

LBFGSConfig config;
config.tol = 1e-10;
LBFGS optimizer(config);

Eigen::VectorXd x0(2);
x0 << 1.0, -1.0;

OptimResult result = optimizer.optimize(f, x0);
// result.optimal_x  -- Eigen::VectorXd
// result.optimal_f  -- double
// result.status     -- Status::CONVERGED | MAX_EPOCHS | STAGNATED | FAILED
// result.iterations -- int

Python

import optim_engine as opt
import numpy as np

f = opt.Rosenbrock(2)
config = opt.LBFGSConfig()
config.tol = 1e-10

optimizer = opt.LBFGS(config)
result = optimizer.optimize(f, np.array([1.0, -1.0]))

print(result.optimal_f)   # float
print(result.optimal_x)   # numpy array
print(result.status)      # opt.Status enum value
print(result.iterations)  # int

Custom Objective Function

C++:

class MyFunction : public ObjectiveFunction {
public:
    double evaluate(const Eigen::VectorXd &x) const override { ... }
    Eigen::VectorXd grad(const Eigen::VectorXd &x) const override { ... }
    Eigen::MatrixXd hessian(const Eigen::VectorXd &x) const override { ... }
    int dim() const override { return n_; }
private:
    int n_;
};

Python with analytic derivatives:

class MyFunction(opt.ObjectiveFunction):
    def evaluate(self, x): return float(np.dot(x, x))
    def grad(self, x):     return 2.0 * x
    def hessian(self, x):  return 2.0 * np.eye(len(x))
    def dim(self):         return 2

Python without derivatives (finite differences):

class MyFunction(opt.ObjectiveFunction):
    def evaluate(self, x): return float(x[0]**4 + x[1]**4)
    def grad(self, x):     raise NotImplementedError
    def hessian(self, x):  raise NotImplementedError
    def dim(self):         return 2

f = MyFunction()
wrapped = opt.FiniteDiffWrapper(f, h=1e-5)  # f must outlive wrapped

result = opt.GradientDescent(opt.GDConfig()).optimize(wrapped, np.array([2.0, 2.0]))

Testing

C++ tests (Catch2)

make run-tests          # build and run all test binaries sequentially

# or individually:
./bin/prelim-tests/gd_test
./bin/prelim-tests/newton_test
./bin/prelim-tests/lbfgs_test
./bin/prelim-tests/hc_test
./bin/prelim-tests/sa_test

Python tests

python3 tests/python-tests/bindings_test_suite.py
python3 tests/python-tests/fdw_test.py

Coverage

Each optimizer has two C++ test cases: convergence on a unimodal function and a defined non-convergence scenario. Python tests cover the binding layer for all five optimizers, a Python-subclassed objective, and the finite difference wrapper.

Hill Climbing and Simulated Annealing are stochastic. Individual runs may return STAGNATED rather than CONVERGED under default config; this does not indicate a correctness failure. The final optimal_f value is the relevant metric for these optimizers.

Project Structure

include/
  core/           # Abstract interfaces and supporting utilities
    objective.hpp
    optimizer.hpp
    result.hpp
    line_search.hpp
    finite_difference.hpp
  optimizers/     # Config structs and class declarations
  functions/      # Sphere, Rosenbrock, Rastrigin (header-only)

src/
  core/
    line_search.cpp
  optimizers/     # Algorithm implementations

python/
  bindings.cpp    # pybind11 module

tests/
  prelim-tests/   # Catch2 C++ tests, one file per optimizer
  python-tests/   # Python binding and integration tests

benchmarks/
  benchmarks.cpp  # C++ benchmark runner
  benchmarks.py   # Python comparison benchmark
  plots.py        # Convergence plotting utilities

vendor/
  catch2/         # Vendored Catch2 v2 single-header

Design Tradeoffs

Header-only functions vs. compiled optimizers: Test functions (Sphere, Rosenbrock, Rastrigin) are header-only since they contain no state and are used across tests, benchmarks, and Python bindings. Optimizers are split into header/source to keep compile times reasonable as the optimizer count grows.

FiniteDiffWrapper holds a raw reference: The wrapper takes non-const ObjectiveFunction &f rather than owning a copy or shared pointer. This avoids imposing a heap allocation on callers and keeps the interface consistent with how optimizers receive objectives. The lifetime requirement is documented and enforced at the Python boundary via py::keep_alive.

Wolfe c2 = 0.1 for L-BFGS vs. c2 = 0.9 for GD/Newton: The tighter curvature condition for L-BFGS is not a tuning choice, it is a correctness requirement. The BFGS update requires y^T s > 0, which the strong Wolfe condition with c2 < 1 guarantees. Using c2 = 0.9 in L-BFGS would allow curvature pairs with near-zero y^T s into the history, corrupting the inverse-Hessian approximation.

Relative convergence criterion: All gradient-based optimizers use ||grad f|| < tol * max(1, |f(x)|) rather than the absolute ||grad f|| < tol. The absolute form fails silently on functions with large objective values where the gradient is proportionally large at the optimum.

No PRNG seeding in HC/SA: Both stochastic optimizers seed from std::random_device rather than a fixed seed. Reproducibility is left to the caller; a fixed seed would require exposing it in the config and complicates the default-construct-and-run usage pattern.

Constraints

The engine currently supports unconstrained optimization only. The following are not implemented:

Box constraints (x_min <= x <= x_max): required for L-BFGS-B style bounded optimization.
Equality constraints (g(x) = 0): no augmented Lagrangian or penalty method.
Inequality constraints (h(x) <= 0): no interior point or active set method.

For constrained problems, a penalty or barrier can be added manually to the objective function before passing it to any optimizer. This is not a substitute for a proper constrained solver but works for simple cases.

Notes on Numerical Stability

Line search: The More-Thuente zoom phase maintains the standard bracket invariants. Cubic interpolation results are clamped to the inner 80% of the bracket to prevent degenerate near-boundary trial steps. A denominator near-zero check and discriminant less than zero falls back to bisection.

L-BFGS curvature pairs: Pairs are accepted only when s^T y > 1e-8 * ||s|| * ||y|| (relative threshold, scale-invariant).

Newton regularization: The modified Cholesky approach doubles mu on each failed Cholesky factorization. At mu >= 1e10 the step is abandoned and Status::FAILED is returned, rather than silently producing a non-descent direction.

Simulated annealing overflow: The exponent in exp(-delta_f / T) is clamped to [-500, 0] before evaluation, corresponding to a minimum acceptance probability near 7e-218.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

numerical-optimization-engine

Overview

Architecture

Optimizer Abstraction

Optimizers

Supporting Components

Numerical Safeguards

Python Bindings

Build Instructions

Dependencies

Commands

Expected outputs

Usage

C++

Python

Custom Objective Function

Testing

C++ tests (Catch2)

Python tests

Coverage

Project Structure

Design Tradeoffs

Constraints

Notes on Numerical Stability

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 92 Commits
benchmarks		benchmarks
include		include
python		python
src		src
tests		tests
vendor/catch2		vendor/catch2
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

numerical-optimization-engine

Overview

Architecture

Optimizer Abstraction

Optimizers

Supporting Components

Numerical Safeguards

Python Bindings

Build Instructions

Dependencies

Commands

Expected outputs

Usage

C++

Python

Custom Objective Function

Testing

C++ tests (Catch2)

Python tests

Coverage

Project Structure

Design Tradeoffs

Constraints

Notes on Numerical Stability

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages