ODEParameterEstimation/odepe.tex at main · orebas/ODEParameterEstimation · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
%===============================================================================
% ifacconf.tex 2022-02-11 jpuente
% 2022-11-11 jpuente change length of abstract
% Template for IFAC meeting papers
% Copyright (c) 2022 International Federation of Automatic Control
%===============================================================================
\documentclass{ifacconf}

\usepackage{graphicx}      % include this line if your document contains figures
\usepackage{natbib}        % required for bibliography
%===============================================================================
\begin{document}
	\begin{frontmatter}

		\title{Robust Algebraic Parameter Estimation via Gaussian Process Regression\thanksref{footnoteinfo}}
		% Title, preferably not more than 10 words.

		\thanks[footnoteinfo]{Sponsor and financial support acknowledgment
			goes here. Paper titles should be written in uppercase and lowercase
			letters, not all uppercase.}

		\author[First]{Oren Bassik}5
		\author[Second]{Alexander Demin}
		\author[Third]{Alexey Ovchinnikov}

\address[First]{Graduate Center, City University of New York,
	365 Fifth Avenue, New York, NY 10016, USA (e-mail: obassik@gradcenter.cuny.edu).}
\address[Second]{Higher School of Economics,
	National Research University, Moscow, Russia (e-mail: asdemin_2@edu.hse.ru).}
\address[Third]{Department of Mathematics,
	Queens College, City University of New York,
	65-30 Kissena Blvd, Queens, NY 11367, USA (e-mail: aovchinnikov@qc.cuny.edu).}

		\begin{abstract}                % Abstract of 50--100 words
			To estimate parameters in ODE systems from noisy data, we propose adding a Gaussian Process Regression (GPR) step to an otherwise algebraic approach.  This replaces the interpolation step in the original algorithm, and allows for robust estimation even in the presence of noise.  We demonstrate the method on a suite of benchmark problems from multiple disciplines.  With GPR, we are able to retain the robust features inherent in the algebraic approach, while extending applicability to realistic data with noise.
		\end{abstract}

		\begin{keyword}
			Parameter and state estimation, software for system identification, continuous time system estimation
		\end{keyword}

	\end{frontmatter}
	%===============================================================================

	\section{Introduction}
	Parameter estimation for systems of ordinary differential equations is a fundamental problem in systems modeling and dynamics.  A typical approach to estimating parameters from experiments uses nonlinear optimization to search for parameters which minimize error against observed data.  This inherits various difficulties from nonlinear optimization: the need for good initial guesses, getting stuck in local minima, and only finding a single solution.

	An algorithm outlined in [1] combines differential algebra with baryrational interpolation and multivariate polynomial systems  solving.  This method does not suffer from the non-robustness inherent in nonlinear optimization.  It needs no initial guesses, requires little input from the user, and ideally finds all solutions (in the case of locally identifiability.)

	This differential algebraic method shows excellent performance on noise-free or synthetic data, but due to the reliance on interpolation degrades severely with even minimal measurement noise.  To overcome this limitation, we replace the baryrational interpolation with a Gaussian Process (GP) regression using a squared exponential kernel, and automatically learn the noise hyperparameter from the data.

	This simple replacement significantly improves robustness to measurement noise in observed data.  We demonstrate the method on a suite of benchmark problems from multiple disciplines.  With GP regression, we are able to retain the robust features inherent in the algebraic approach, while extending applicability to realistic data with noise.


	\section{Statement of problem}
	We are given an ODE model
	\[ x'(t) = f(x(t), u(t), p)\]
	\[ y(t) = g(x(t), u(t), p)\]
	\[ x(0) = x_0\]
	where $x$ are state variables, $u$ are control variables, $p$ are unknown parameters, $x_0$ are unknown initial conditions, and $f$ and $g$ are rational functions.

	We are given a set of measurements of the form $(t_i, y_i)$ for $i = 1, 2, \ldots, N$.  We seek to estimate the parameters $p$ and the unknown initial conditions $x_0$ from the measurements.

	\section{Method}
	We give a bird's eye view of the method, and track a toy problem through the algorithm.  While the toy problem is simple, the method generalizes fully to higher-dimensional systems.  The main constraint is that all functions must be rational.
	\subsection{Toy problem}
	Let $x' = (ax)^2 + b $ and $y = x^2+x$.  We seek to estimate the parameters from measurements of $y$ taken at several points.

	\subsection{Step 1: Differentiate  the ODE system}
	\[ x' = a^2x^2 + b\]
	\[ x'' = 2a^2x x' \]
	\[ y = x^2 + x \]
	\[ y' = 2x x' + x' \]
	\[ y'' = 2(x'x' + x x'') + x'' \]
	\subsubsection{Step 2: Approximate the derivatives from data}
	Pick a time point $t$.  Use any method to approximate the derivatives of the measurements $y$ at $t$.  For example, let $y_0 \approx y(t), y_1 \approx y'(t), y_2 \approx y''(t).$  More on this below

	\subsubsection{Step 3: Form the system of equations}
	Use the measurements $y_0, y_1, y_2$ to form a system of polynomial equations:
	\[x_1 = a^2x_0^2 + b\]
	\[x_2 = 2a^2x_0 x_1\]
	\[ y_0 = x_0^2 + x_0 \]
	\[ y_1 = 2x_0 x_1 + x_1 \]
	\[ y_2 = 2(x_1x_1 + x_0 x_2) + x_2 \]
	Care must be taken in this and previous steps to get a square system.

	\subsubsection{Step 4: Solve the system of equations}
	Use any method to solve the system of equations to find the parameters $a, b, x_0, x_1, x_2$. Ideally we find all solutions.  In this case we expect at least two solutions, because of the symmetry between $a$ and $-a$.  Our software supports many solvers, and is currently using HomotopyContinuation.jl followed by some Newton root polishing as a default.

	\subsection{Backsolve and filter the solutions}  %perhaps says "propogate ODE solutions to full timespan" or similar
	If $t$ was not the initial time, then for each solution we found, we can backsolve the ODE to find the initial conditions $x(0)$.  We can also forward solve and compare to the measurements to calculate error.  We filter the solutions to find the one(s) with the least error.

	\section{ The data processing step }
	It is a theorem that for locally identifiable parameters, using (a) perfect approximations of the derivatives in step 2, and a (b) certified polynomial root finder, we can recover all parameters.  Both of these issues are significant numerical analysis problem in their own right.  In [2] we tested this algorithm on synthetic data sampled from precise ODE solutions and found that for estimating derivatives, AAA rational interpolation following by algorithmic differentiation of the interpolant outperformed many alternatives (including polynomial interpolants, splines, fourier interpolation, and finite differences).

	However, AAA is an interpolation scheme, and as such, even small amounts of measurement noise ($10^{-8}$) cause overfitting, oscillation, and particularly poor estimates for derivatives of observables.

	In this work, we modify our algorithm by simply replacing the interpolation step with a Gaussian Process (GP) regression.  This is a standard regression used in machine learning which estimates (via maximizing likelihood) the mean function and noise most likely to have generated the observed data.  Specifically, we use a squared-exponential kernel, and we learn the regression hyperparameters (noise variance and lengthscale) by tuning to the data.  The mean function returned from the regression is smooth (infinitely differentiable), and we apply algorithmic differentiation to that.  See figure 1 for an example.

	\section{Results}
	We tested the extended algorithm on a suite of benchmark problems, including some population dynamics models (Lotka-Volterra, SEIR) and some systems from biology (Crauste NELM model, HIV dynamics, Fitzhugh-Nagumo).
	For each system, we generated synthetic data, and added varying levels of noise (from 1e-8 to 1e-2.)  We ran parameter estimation using both AAA interpolation and the GPR-enhanced method, as well as a baseline comparison vs a traditional loss-function minimizer estimation package.  The metrics reported in Table 1 can be interpreted as follows: for each estimation run, we compute relative errors for each identifiable parameter.  We judge each algorithm by its worst relative error.  This is averaged over 10 runs with random parameters to highlight different conditions.

	As expected, interpolation fails even at low levels of noise.  However, the GPR enhanced method maintains accuracy at useful levels of noise, and while estimation loses precision with higher noise (as it must) the algebraic approach is no longer handicapped vs optimization based approaches.

	\section{Discussion and future directions}
	The algorithm above is currently bottle-necked by the polynomial system solving, and as such, the integration of GPR does not meaningfully affect computation run time of our method.  This modest and low-cost addition significantly increases the applicability of algebraic parameter estimation, as most real-world data is sampled imprecisely.  Because GPR is self-tuning, the algorithm retains its "hands-off" nature, demanding little from the user.  Given these benefits, we are enabling GPR by default in our package.

	Gaussian process regression gives not only an estimate of noise variance, but actually furnishes confidence bounds around the mean function and is used for uncertainly quantification.  While most GPR packages do not provide such uncertainty quantification for higher derivatives, an interesting avenue for future research could be to use this information, for picking optimal timepoints or providing confidence intervals on parameter estimates.


	\begin{ack}
		Place acknowledgments here.
	\end{ack}

	\bibliography{ifacconf}             % bib file to produce the bibliography
	% with bibtex (preferred)

	%\begin{thebibliography}{xx}  % you can also add the bibliography by hand

	%\bibitem[Able(1956)]{Abl:56}
	%B.C. Able.
	%\newblock Nucleic acid content of microscope.
	%\newblock \emph{Nature}, 135:\penalty0 7--9, 1956.

	%\bibitem[Able et~al.(1954)Able, Tagg, and Rush]{AbTaRu:54}
	%B.C. Able, R.A. Tagg, and M.~Rush.
	%\newblock Enzyme-catalyzed cellular transanimations.
	%\newblock In A.F. Round, editor, \emph{Advances in Enzymology}, volume~2, pages
	%  125--247. Academic Press, New York, 3rd edition, 1954.

	%\bibitem[Keohane(1958)]{Keo:58}
	%R.~Keohane.
	%\newblock \emph{Power and Interdependence: World Politics in Transitions}.
	%\newblock Little, Brown \& Co., Boston, 1958.

	%\bibitem[Powers(1985)]{Pow:85}
	%T.~Powers.
	%\newblock Is there a way out?
	%\newblock \emph{Harpers}, pages 35--47, June 1985.

	%\bibitem[Soukhanov(1992)]{Heritage:92}
	%A.~H. Soukhanov, editor.
	%\newblock \emph{{The American Heritage. Dictionary of the American Language}}.
	%\newblock Houghton Mifflin Company, 1992.

	%\end{thebibliography}

	\appendix
	\section{A summary of Latin grammar}    % Each appendix must have a short title.
	\section{Some Latin vocabulary}              % Sections and subsections are supported
	% in the appendices.
\end{document}