|
| 1 | +--- |
| 2 | +title: 'PyNumDiff: A Python Package for Numerical Differentiation of Noisy Data' |
| 3 | +tags: |
| 4 | + - Python |
| 5 | + - numerical differentiation |
| 6 | + - time series |
| 7 | + - denoising |
| 8 | + - dynamics |
| 9 | + - signal processing |
| 10 | +authors: |
| 11 | + - name: Pavel Komarov |
| 12 | + orcid: 0000-0000-0000-0000 # TODO: fill in |
| 13 | + corresponding: true |
| 14 | + affiliation: 1 |
| 15 | + - name: Floris van Breugel |
| 16 | + orcid: 0000-0000-0000-0000 # TODO: fill in |
| 17 | + affiliation: 2 |
| 18 | + - name: J. Nathan Kutz |
| 19 | + orcid: 0000-0003-0944-900X |
| 20 | + affiliation: 1 |
| 21 | +affiliations: |
| 22 | + - name: Department of Applied Mathematics, University of Washington, USA |
| 23 | + index: 1 |
| 24 | + - name: Department of Mechanical Engineering, University of Nevada, Reno, USA |
| 25 | + index: 2 |
| 26 | +date: 1 April 2026 |
| 27 | +bibliography: paper.bib |
| 28 | +--- |
| 29 | + |
| 30 | +# Summary |
| 31 | + |
| 32 | +Computing derivatives from measured data is a foundational requirement across science and |
| 33 | +engineering. Whether fitting governing equations from experimental observations, designing |
| 34 | +control laws, or processing sensor streams, researchers routinely need derivative estimates |
| 35 | +from discrete, noisy measurements. The textbook approach — finite differencing — amplifies |
| 36 | +noise proportionally to $1/\Delta t$, making it unreliable for real data. Smoothing before |
| 37 | +differencing helps, but the choice of algorithm and its tuning parameters substantially |
| 38 | +affect the result, and no single choice is universally best. |
| 39 | + |
| 40 | +PyNumDiff is an open-source Python package that addresses this by providing a broad suite of |
| 41 | +numerical differentiation methods for noisy data under a unified API. Seven families of |
| 42 | +algorithms are implemented: (1) prefiltering followed by finite difference calculation; |
| 43 | +(2) iterated finite differencing; (3) polynomial fitting; (4) basis function fitting; |
| 44 | +(5) total variation regularization; (6) Kalman smoothing and its outlier-robust variant; |
| 45 | +and (7) local approximation with linear models. All methods return a smoothed signal estimate |
| 46 | +and a derivative estimate as a matched pair `(x_hat, dxdt_hat)`. A companion paper |
| 47 | +[@komarov2025] provides a comprehensive theoretical taxonomy of these methods, benchmarks |
| 48 | +their relative performance, and guides method selection for different application scenarios. |
| 49 | + |
| 50 | + |
| 51 | +# Statement of Need |
| 52 | + |
| 53 | +Estimating derivatives from noisy measurements arises throughout experimental science, |
| 54 | +data-driven modeling, system identification, and control. Standard finite differences amplify |
| 55 | +noise, and while smoothing helps, the field has produced a diverse ecosystem of specialized |
| 56 | +algorithms — each with different strengths regarding outlier robustness, computational cost, |
| 57 | +handling of irregular time steps, or treatment of missing data. Without a consolidated |
| 58 | +library, practitioners either default to the nearest available tool or implement bespoke |
| 59 | +solutions that are hard to compare or reproduce. |
| 60 | + |
| 61 | +PyNumDiff fills this gap. Its unified interface lets users compare methods directly on the |
| 62 | +same data, exploit specialized capabilities, and select hyperparameters without requiring |
| 63 | +ground-truth derivatives. The package is particularly valuable in workflows that use |
| 64 | +derivatives as regression targets: for example, Sparse Identification of Nonlinear Dynamics |
| 65 | +(SINDy) [@brunton2016discovering] learns governing equations by regressing measured |
| 66 | +derivatives, making clean, reliable derivative estimates a prerequisite for accurate model |
| 67 | +identification. |
| 68 | + |
| 69 | + |
| 70 | +# State of the Field |
| 71 | + |
| 72 | +Relevant Python tools exist, but none covers the full scope of PyNumDiff. `numpy.gradient` |
| 73 | +and `scipy.signal.savgol_filter` [@virtanen2020scipy] are widely available but address only |
| 74 | +a narrow slice of the method space. The `findiff` package provides high-order finite |
| 75 | +difference stencils for clean simulation data rather than noisy measurements. The standalone |
| 76 | +`TVRegDiff` code [@chartrand2011numerical] implements a single total variation regularization |
| 77 | +method; PyNumDiff includes and extends this. No existing Python package combines seven method |
| 78 | +families with a unified API, NaN and variable-step support, multidimensional capability, |
| 79 | +circular domain handling, and integrated hyperparameter optimization. |
| 80 | + |
| 81 | +The original PyNumDiff [@vanBreugel2022] introduced four method families with basic parameter |
| 82 | +optimization. The present version substantially extends that foundation in scope, capability, |
| 83 | +and software quality. |
| 84 | + |
| 85 | + |
| 86 | +# Software Design |
| 87 | + |
| 88 | +**Unified API.** All differentiation methods share the call signature |
| 89 | + |
| 90 | +```python |
| 91 | +x_hat, dxdt_hat = method(x, dt_or_t, **params) |
| 92 | +``` |
| 93 | + |
| 94 | +where `x` is a NumPy array [@harris2020array] of measurements, `dt_or_t` is either a |
| 95 | +scalar step size or an array of sample locations (for variable step size), and keyword |
| 96 | +arguments configure the method. The two return values always match the shape of `x`. This |
| 97 | +uniformity makes method comparison straightforward and enables drop-in substitution. |
| 98 | + |
| 99 | +**Multidimensional support.** An `axis` parameter controls which axis is differentiated, |
| 100 | +allowing all methods to operate on blocks of time series simultaneously. The implementation |
| 101 | +iterates over all non-time axes with `np.ndindex`, applying the algorithm independently to |
| 102 | +each vector, so a 2D array of shape `(N, D)` is handled without reshaping or looping in |
| 103 | +user code. |
| 104 | + |
| 105 | +**Missing data and variable step size.** Several methods handle NaN-valued entries as missing |
| 106 | +observations and non-uniform sample spacing: `splinediff`, `polydiff`, `rbfdiff`, `rtsdiff`, |
| 107 | +and `robustdiff`. This supports realistic experimental scenarios where sensors skip samples or |
| 108 | +operate at irregular rates. |
| 109 | + |
| 110 | +**Circular and wrapped domains.** `rtsdiff` accepts a `circular=True` flag for quantities |
| 111 | +like angles that live on a periodic domain. Innovation residuals are wrapped to $[-\pi, \pi]$ |
| 112 | +before each Kalman update, and `x_hat` is returned wrapped to the same range. This avoids the |
| 113 | +erroneous large-magnitude spikes that naive smoothers produce when a signal crosses the |
| 114 | +$\pm\pi$ boundary. The wrapping is injected via an optional `innovation_fn` hook on the |
| 115 | +underlying `kalman_filter` primitive, keeping the filter itself general. |
| 116 | + |
| 117 | +**Outlier robustness.** `robustdiff` solves a convex optimization problem that replaces the |
| 118 | +least-squares Kalman cost with Huber loss on both measurement and process residuals, |
| 119 | +dramatically reducing outlier influence. It uses `cvxpy` [@diamond2016cvxpy] as its |
| 120 | +optimization backend, available via `pip install pynumdiff[advanced]`. `tvrdiff` similarly |
| 121 | +minimizes a total variation penalty on the derivative, promoting piecewise-smooth solutions |
| 122 | +and tolerating abrupt changes; it supports regularization of the velocity, acceleration, or |
| 123 | +jerk (orders 1–3). |
| 124 | + |
| 125 | +**Hyperparameter optimization.** Because every method has tuning parameters, PyNumDiff |
| 126 | +provides a multi-objective optimization framework in `pynumdiff.optimize` that minimizes a |
| 127 | +weighted sum of a smoothness penalty and a data fidelity term [@vanBreugel2020numerical]. |
| 128 | +When ground-truth derivatives are available, the optimizer minimizes derivative error |
| 129 | +directly. The smoothness weight `tvgamma` can be initialized from the signal's estimated |
| 130 | +cutoff frequency $f_c$ via the empirical formula |
| 131 | +$$\texttt{tvgamma} = \exp(-1.6\ln f_c - 0.71\ln \Delta t - 5.1).$$ |
| 132 | + |
| 133 | + |
| 134 | +# Research Impact |
| 135 | + |
| 136 | +The original PyNumDiff paper [@vanBreugel2022] has accumulated over 60 citations since its |
| 137 | +2022 publication, and the package has been used in experimental biology to estimate flight |
| 138 | +kinematics from motion capture, in control engineering for observer design, and in |
| 139 | +data-driven dynamics identification. The companion Taxonomy paper [@komarov2025], submitted |
| 140 | +to the Journal of Computational Physics, provides the theoretical analysis underpinning the |
| 141 | +method collection and benchmarks all included methods across diverse test signals. The |
| 142 | +package is available on PyPI (`pip install pynumdiff`), documented at |
| 143 | +[pynumdiff.readthedocs.io](https://pynumdiff.readthedocs.io/master/), and archived on |
| 144 | +Zenodo [@pynumdiff_zenodo]. |
| 145 | + |
| 146 | + |
| 147 | +# AI Usage Disclosure |
| 148 | + |
| 149 | +The draft of this paper was prepared with assistance from Claude Sonnet 4.6 (Anthropic). |
| 150 | +This included integrating material from the repository, documentation, and related papers |
| 151 | +into a coherent draft. The authors reviewed and edited all content and take full |
| 152 | +responsibility for its accuracy. |
| 153 | + |
| 154 | + |
| 155 | +# Acknowledgements |
| 156 | + |
| 157 | +The authors thank Yuying Liu and Bingni W. Brunton for their contributions to the original |
| 158 | +PyNumDiff package [@vanBreugel2022]. |
| 159 | + |
| 160 | + |
| 161 | +# References |
0 commit comments