Differential methods for assessing sensitivity in biological models

Rachel Mester (Department of Computational Medicine, University of California Los Angeles, Los Angeles, California, United States of America), Alfonso Landeros, Chris Rackauckas (Computer Science and Artificial Intelligence Laboratory), and Kenneth Lange

Date: 2022-08

Abstract

Differential sensitivity analysis is indispensable in fitting parameters, understanding uncertainty, and forecasting the results of both thought and lab experiments. Although there are many methods currently available for performing differential sensitivity analysis of biological models, it can be difficult to determine which method is best suited for a particular model. In this paper, we explain a variety of differential sensitivity methods and assess their value in some typical biological models. First, we explain the mathematical basis for three numerical methods: adjoint sensitivity analysis, complex perturbation sensitivity analysis, and forward mode sensitivity analysis. We then carry out four instructive case studies. (a) The CARRGO model for tumor-immune interaction highlights the additional information that differential sensitivity analysis provides beyond traditional naive sensitivity methods, (b) the deterministic SIR model demonstrates the value of using second-order sensitivity in refining model predictions, (c) the stochastic SIR model shows how differential sensitivity can be attacked in stochastic modeling, and (d) a discrete birth-death-migration model illustrates how the complex perturbation method of differential sensitivity can be generalized to a broader range of biological models. Finally, we compare the speed, accuracy, and ease of use of these methods. We find that forward mode automatic differentiation has the quickest computational time, while the complex perturbation method is the simplest to implement and the most generalizable.

Author summary

Over the past few decades, mathematical modeling has become an indispensable tool in the biologist’s toolbox. From deterministic to stochastic to statistical models, computational modeling is ubiquitous in almost every field of biology. Because model parameter estimates are often noisy or depend on poorly understood interactions, it is crucial to examine how both quantitative and qualitative predictions change as parameter estimates change, especially as the number of parameters increases. Sensitivity analysis is the process of understanding how a model’s behavior depends on parameter values. Sensitivity analysis simultaneously quantifies prediction certainty and clarifies the underlying biological mechanisms that drive computational models. While sensitivity analysis is universally recognized to be an important step in modeling, it is often unclear how to best leverage the available differential sensitivity methods. In this manuscript we explain and compare various differential sensitivity methods in the hope that best practices will be widely adopted. We stress the relative advantages of existing software and their limitations. We also present a new numerical technique for computing differential sensitivity.

Citation: Mester R, Landeros A, Rackauckas C, Lange K (2022) Differential methods for assessing sensitivity in biological models. PLoS Comput Biol 18(6): e1009598.
https://doi.org/10.1371/journal.pcbi.1009598

Editor: Attila Csikász-Nagy, Pázmány Péter Catholic University (Pázmány Péter Katolikus Egyetem), HUNGARY

Received: October 29, 2021; Accepted: May 12, 2022; Published: June 13, 2022

Copyright: © 2022 Mester et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: There is no primary data in the paper; all materials are available at https://github.com/rachelmester/SensitivityAnalysis.

Funding: RM research supported by NIH grant T32HG002536. KL research supported by USPHS grants HG006139 and GMS141798. CR research supported by the National Science Foundation under grant no. OAC-1835443, grant no. SII-2029670, grant no. ECCS-2029670, grant no. OAC-2103804, and grant no. PHY-2021825, the U.S. Agency for International Development through Penn State for grant no. S002283-USAID, the Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, under Award no. DE-AR0001211 and DE-AR0001222, The Research Council of Norway and Equinor ASA through Research Council project “308817 - Digital wells for optimal production and drainage”, the United States Air Force Research Laboratory and the U.S. Air Force Artificial Intelligence Accelerator under Cooperative Agreement Number FA8750-19-2-1000. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. The funders had no role in study design, data collection, and analysis.

Competing interests: The authors have declared that no competing interests exist.

This is a PLOS Computational Biology Methods paper.

1 Introduction

In many mathematical models, underlying parameters are poorly specified. This problem is particularly acute in biological and biomedical models. Model predictions can have profound implications for scientific understanding, further experimentation, and even public-policy decisions. For instance, in an epidemic, some model parameters can be tweaked by societal or scientific interventions to drive infection levels down. Differential sensitivity can inform medical judgement about the steps to take with the greatest impact at the least cost. Similar considerations apply in economic modeling. Additionally, parameter estimation for model fitting usually involves differential sensitivity through maximum likelihood or least squares criteria. These optimization techniques depend heavily on gradients and Hessians with respect to parameters. While some parameter estimation methods rely on Bayesian computational techniques [1] rather than gradients, these techniques tend to scale poorly as the number of model parameters increases. A common way to alleviate the poor scaling of Bayesian inference is Hamiltonian Monte Carlo [2], which itself requires gradient calculations. Techniques for assessing sensitivity of stochastic models often rely on the gradient-dependent Fisher information matrix of the model, which is the basis for a variety of multi-step local sensitivity analysis techniques for discrete stochastic models [3]. Calculation of gradients and Hessians of a model can also be important in other steps of the scientific process. For example, iterative model development [4] involves using the Fisher information matrix to inform experimental design.
Extended Kalman filtering [5] incorporates differential sensitivity into model construction. Regardless of the method, parameter estimation is an important step in fitting a biological model, and the success of this step strongly impacts the ultimate utility of the model. Understanding the uses and limitations of differential sensitivity can aid in determining the identifiability of model parameters, how sensitive they are to experimental error or measurement noise, and their overall importance in the model. Finally, it is worth noting that while local sensitivity analysis is the focus of this manuscript, global sensitivity analysis often relies on local differential sensitivity estimates to inform optimal stepsizes in regional searching [6] or to resolve inconsistencies that arise when local sensitivity is non-monotonic [7].

In any case it is imperative to know how sensitive model predictions are to changes in parameter values. Unfortunately, assessment of model sensitivity can be time-consuming, computationally intensive, inaccurate, and simply confusing. Most models are nonlinear and resistant to exact mathematical analysis. Understanding their behavior is only approachable by solving differential equations or by intensive and noisy simulations. Sensitivity analysis is often conducted over an entire bundle of neighboring parameters to capture interactions. If the parameter space is large or high-dimensional, it is often unclear how to choose representative points from this bundle. Faced with this dilemma, it is common for modelers to fall back on varying just one or two parameters at a time. Model predictions also often take the form of time trajectories. In this setting, sensitivity analysis is based on lower and upper trajectories bounding the behavior of the dynamical system.

The differential sensitivity of a model quantity is measured by its gradient with respect to the underlying parameters at their estimated values. The existing literature on differential sensitivity is summarized in the modern references [8,9]. There are a variety of software packages that evaluate parameter sensitivity. For example, the Julia software DifferentialEquations.jl [10] makes sensitivity analysis routine for many problems. Additionally, PESTO [11] is a current MATLAB toolbox for parameter estimation that uses adjoint sensitivities implemented as part of the CVODES method from SUNDIALS [12]. Although the physical sciences have widely adopted the method of differential sensitivity [13,14], the papers and software generally focus on a single sensitivity analysis method rather than a comparison of the various approaches. This singular focus leaves open many questions when biologists conduct sensitivity analyses. Should the continuous sensitivity equations be used, or would automatic differentiation of solvers be more efficient on biological models? On the types of models biologists generally explore, would implicit parallelism within the sensitivity equations be beneficial, or would the overhead cost of thread spawning overrule any benefits? How close do simpler methods based on complex perturbation get to these techniques? The purpose of the current paper is to explore these questions on a variety of models of interest to computational biologists.
In the current paper we also suggest an accurate method of approximating gradients that compares favorably against forward automatic differentiation techniques, provided a model involves analytic functions without discontinuities, maxima, minima, absolute values, or any other excursion outside the universe of analytic functions. In the sections immediately following, we summarize known theory, including the important adjoint method for computing the sensitivity of functions of solutions [13,14]. Then we illustrate sensitivity analysis for a few deterministic models and a few stochastic models. Our exposition includes some straightforward Julia code that readers can adapt to their own sensitivity needs. These examples are followed by an evaluation of the accuracy and speed of the suggested numerical methods. The concluding discussion summarizes our experience, indicates limitations of the methods, and suggests new potential applications.

For the record, here are some notational conventions used throughout the paper. All functions that we differentiate have real or real-vector arguments and real or real-vector values. All vectors and matrices appear in boldface. The superscript ⊤ indicates a vector or matrix transpose. For a smooth real-valued function f(x), we write its gradient (column vector of partial derivatives) as ∇f(x) and its differential (row vector of partial derivatives) as df(x) = ∇f(x)⊤. If g(x) is vector-valued with ith component g_i(x), then the differential (Jacobi matrix) dg(x) has ith row dg_i(x). The chain rule is expressed as the equality of differentials d(g ∘ f)(x) = dg[f(x)] df(x). The transpose (adjoint) form of the chain rule is ∇(g ∘ f)(x) = df(x)⊤ ∇g[f(x)]. For a twice-differentiable function, the second differential (Hessian matrix) d²f(x) = d∇f(x) is the differential of the gradient. Finally, i will denote √−1.

2 Methods for computing sensitivity

2.1 Forward method

S3 Appendix briefly discusses sensitivity analysis for the linear constant coefficient system of ordinary differential equations (ODEs). Sensitivity of the nonlinear system d/dt x(t, β) = f[x(t, β), β] can be evaluated by differentiating the original ODE with respect to β_j, interchanging the order of differentiation, and numerically integrating the system

    d/dt ∂x/∂β_j (t, β) = d_x f[x(t, β), β] ∂x/∂β_j (t, β) + ∂f/∂β_j [x(t, β), β].

This formulation of the problem depends on knowing x(t, β). In practice, one solves the joint system

    d/dt x(t, β) = f[x(t, β), β]
    d/dt d_β x(t, β) = d_x f[x(t, β), β] d_β x(t, β) + d_β f[x(t, β), β],    (1)

where d_β x(t, β) is the Jacobi matrix of x(t, β) with respect to β. This is commonly referred to as forward sensitivity analysis and is carried out by software suites such as DifferentialEquations.jl and SUNDIALS CVODES [12]. We note that a common implementation of sensitivity analysis is to base calculations on directional derivatives. Thus, the directional derivative version of the forward method allows one to evolve dynamical systems without ever computing full Jacobians. The forward method can also be applied when quantities of interest are defined recursively.

2.2 Adjoint methods

The adjoint method is incorporated in the biological parameter estimation software PESTO through CVODES [12]. This method [8,9] is defined directly on a function g[x(β), β] of the solution of the ODE. The adjoint method introduces a Lagrange multiplier λ(β), numerically solves the ODE system forward in time over [t_0, t_n], then solves the system

    d/dt λ(t) = −d_x f[x(t, β), β]⊤ λ(t),  λ(t_n) = ∇_x g

for λ(β) in reverse time, and finally uses the introduced multiplier to compute derivatives via

    d_β g[x(β), β] = ∂_β g + ∫_{t_0}^{t_n} λ(t)⊤ d_β f[x(t, β), β] dt.

The second and third stages are commonly combined by appending the last equation to the set of ODEs being solved in reverse.
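To make both recipes concrete, here is a minimal, self-contained Julia sketch of our own devising (not taken from the paper's repository). The scalar model dx/dt = βx, the fixed-step fourth-order Runge-Kutta integrator, and all numerical values are illustrative assumptions; the exact sensitivity of x(T) with respect to β is x0·T·exp(βT), which both passes should reproduce.

    # Forward (Eq 1) and adjoint sensitivities for the hypothetical model
    # dx/dt = β x with quantity of interest g(β) = x(T).
    function forward_and_adjoint(x0, β; T = 1.0, n = 1000)
        h = T / n
        # joint vector field for (x, s) with s = ∂x/∂β: x' = βx, s' = βs + x
        g(u) = [β * u[1], β * u[2] + u[1]]
        x = zeros(n + 1); s = zeros(n + 1)
        x[1] = x0
        for i in 1:n  # forward pass, classical 4th-order Runge-Kutta
            u = [x[i], s[i]]
            k1 = g(u); k2 = g(u .+ h/2 .* k1)
            k3 = g(u .+ h/2 .* k2); k4 = g(u .+ h .* k3)
            u = u .+ h/6 .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
            x[i+1], s[i+1] = u
        end
        # reverse pass: λ' = -β λ with terminal condition λ(T) = ∇_x g = 1
        λ = zeros(n + 1); λ[n+1] = 1.0
        for i in n:-1:1
            k1 = -β * λ[i+1];           k2 = -β * (λ[i+1] - h/2 * k1)
            k3 = -β * (λ[i+1] - h/2 * k2); k4 = -β * (λ[i+1] - h * k3)
            λ[i] = λ[i+1] - h/6 * (k1 + 2k2 + 2k3 + k4)
        end
        # adjoint sensitivity: ∫ λ(t) ∂f/∂β dt with ∂f/∂β = x, trapezoid rule
        integrand = λ .* x
        adjoint = h * (sum(integrand) - (integrand[1] + integrand[end]) / 2)
        return (forward = s[n+1], adjoint = adjoint, exact = x0 * T * exp(β * T))
    end

    forward_and_adjoint(1.0, 0.5)

On this toy problem all three returned numbers agree to roughly the accuracy of the integrator, which is the behavior the equivalence of the two methods predicts.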
The adjoint tactic achieves a lower computational complexity than other techniques, which require solving an n-dimensional ODE system p times for p parameters. In contrast, the adjoint method solves an n-dimensional ODE forwards and then solves an n-dimensional and a p-dimensional system in reverse, changing the computational complexity from O(np) to O(n + p). Whether such asymptotic cost advantages lead to more efficiency on practical models is precisely one of the points studied in this paper.

Alternatively, one can find the partial derivatives using finite differences. The simplest method here is to compute a slightly perturbed trajectory x(t, β + Δv) and form the forward differences

    [x(t, β + Δv) − x(t, β)] / Δ

at all specified time points as approximations to the forward directional derivatives of x(t, β) in the direction v. Choosing v to be unit vectors along each coordinate axis gives ordinary partial derivatives. The accuracy of this crude method suffers from round-off error in subtracting two nearly equal function values. These round-off errors are in addition to the usual errors committed in integrating the differential equation numerically. Round-off errors can be ameliorated by using central differences rather than forward differences. However, the central difference method requires twice the number of computations as the forward difference method. Thus, the choice of a difference method depends on prioritization of accuracy versus computational efficiency. In small models, computational efficiency may be less of a priority, in which case central difference methods are preferred.

2.3 Complex perturbation methods

There is a far more accurate way of computing model sensitivity when the function f[x, β] defining the ODE is analytic in the parameter vector β [15]. An analytic function can be expanded in a locally convergent power series around every point of its domain. This implies that the trajectory x(t, β) is also analytic in β. For a real analytic function g(β) of a single variable β, the derivative approximation

    g′(β) ≈ Im g(β + iΔ) / Δ

in the complex plane avoids roundoff and is highly accurate for Δ > 0 very small [16,17]. Thus, in calculating a directional derivative of x(t, β), it suffices to (a) solve the governing ODE with β + Δiv replacing β, (b) take the imaginary part of the result, and (c) divide by Δ. To make these calculations feasible, the computer language implementing the calculations should support complex arithmetic and ideally have an automatic dispatching mechanism so that only one implementation of each function is required. In contrast to numerical integration of the joint system (Eq 1), the complex perturbation method is much more simply parallelizable across parameters. The following straightforward Julia routine for computing sensitivities takes advantage of the simplicity of multithreading the complex perturbation method by parameter:

    function differential(f::F, p, Δ) where F
        fvalue = real(f(p))                     # function value
        df = zeros(length(fvalue), length(p))   # states x parameters
        pworker = [map(complex, p) for _ in 1:Threads.nthreads()]
        Threads.@threads for j = 1:length(p)
            _p = pworker[Threads.threadid()]    # thread worker array
            _p[j] = _p[j] + Δ * im              # perturb parameter
            fj = f(_p)                          # compute perturbed function value
            _p[j] = complex(real(_p[j]), 0.0)   # reset parameter
            df[:, j] .= imag(fj) ./ Δ           # fill in jth partial
        end
        return (fvalue, df)
    end

This function requires a function f(p): ℝⁿ ↦ ℝᵐ of a real vector p declared as complex. The perturbation scalar Δ should be small and real, say 10⁻¹⁰ to 10⁻¹² in double precision.
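As a quick usage sketch (our own illustration; the two-component function below is a hypothetical stand-in for a model output, not an example from the paper):

    # hypothetical test map ℝ² ↦ ℝ²
    f(p) = [p[1]^2 * sin(p[2]), exp(p[1] * p[2])]

    p = complex([1.0, 0.5])              # parameters declared as complex
    value, df = differential(f, p, 1e-12)
    # df[i, j] ≈ ∂f_i/∂p_j; for instance df[1, 1] matches the exact 2 * 1.0 * sin(0.5)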
If the parameters p_j vary widely in magnitude, then a good heuristic is to perturb p_j by p_jΔi instead of Δi. The returned value df is an m × n real matrix. The Julia commands real and complex effect conversions between real and complex numbers, and Julia substitutes im for √−1. We will employ these functions later in some case studies.

A recent extension [18] of the complex perturbation method facilitates accurate approximation of second derivatives. The relevant formula is

    ∂²f/∂β_j² ≈ Im[f(β + dp e_j) + f(β − dp e_j)] / Δ²,    (2)

where dp = Δ(1 + i)/√2 and e_j is the unit vector along coordinate j. Roundoff errors can now occur but are usually manageable. Here we present a novel result for how to extend the complex perturbation method to approximate mixed partials. Our derivation is condensed into the following equation:

    Im[f(β + dp e_j + dp e_k) + f(β − dp e_j − dp e_k)] ≈ Δ² (∂²f/∂β_j² + 2 ∂²f/∂β_j∂β_k + ∂²f/∂β_k²).

This approximation is accurate to order O(Δ⁶) and allows us to infer that

    ∂²f/∂β_j∂β_k ≈ (1/2) { Im[f(β + dp e_j + dp e_k) + f(β − dp e_j − dp e_k)] / Δ² − ∂²f/∂β_j² − ∂²f/∂β_k² }.    (3)

Since we can approximate ∂²f/∂β_j² and ∂²f/∂β_k² by formula (2), we can now approximate ∂²f/∂β_j∂β_k to order O(Δ⁴). These approximations are derived in S1 Appendix. The following Julia code for computing second derivatives operates on a scalar-valued function f(p) of a real vector p declared as complex:

    function hessian(f::F, p, Δ) where F
        d2f = zeros(length(p), length(p))       # hessian
        dp = Δ * (1.0 + 1.0 * im) / sqrt(2)
        for j = 1:length(p)                     # compute diagonal entries of d2f
            p[j] = p[j] + dp
            fplus = f(p)
            p[j] = p[j] - 2 * dp
            fminus = f(p)
            p[j] = complex(real(p[j]), 0.0)     # reset parameter
            d2f[j, j] = imag(fplus + fminus) / Δ^2
        end
        for j = 2:length(p)                     # compute off diagonal entries
            for k = 1:(j - 1)
                (p[j], p[k]) = (p[j] + dp, p[k] + dp)
                fplus = f(p)
                (p[j], p[k]) = (p[j] - 2 * dp, p[k] - 2 * dp)
                fminus = f(p)
                (p[j], p[k]) = (complex(real(p[j]), 0.0), complex(real(p[k]), 0.0))
                d2f[j, k] = imag(fplus + fminus) / Δ^2
                d2f[j, k] = (d2f[j, k] - d2f[j, j] - d2f[k, k]) / 2
                d2f[k, j] = d2f[j, k]
            end
        end
        return d2f
    end

The second-order complex perturbation method can also be multithreaded by parameter, provided the unmixed second partials are computed prior to the mixed ones. Because roundoff error is now a concern, the perturbation scalar Δ should be in the range 10⁻³ to 10⁻⁶ in double precision. The returned value d2f is a symmetric matrix.

2.4 Automatic differentiation

Another technique one can use to calculate the derivatives of model solutions is to differentiate the numerical algorithm that calculates the solution. This can be done with computational tools collectively known as automatic differentiation [19]. Forward mode automatic differentiation is performed by carrying forward Jacobian-vector products at each successive calculation. This is accomplished by defining higher-dimensional numbers, known as dual numbers [20], coupled to primitive functions f(x) through the action

    f(x + vϵ) = f(x) + df(x)v ϵ.

Here ϵ is a dimensional marker, similar to the complex i, which is a two-dimensional number. For a composite function f = f_2 ∘ f_1, the chain rule is

    f(x + vϵ) = f_2[f_1(x) + df_1(x)v ϵ] = f(x) + df_2[f_1(x)] df_1(x)v ϵ.

The ith column of the Jacobian appears in the expression f(x + e_i ϵ) = f(x) + df(x)e_i ϵ. Since computational algorithms can be interpreted as the composition of simpler functions, one need only define automatic differentiation on a small set of base cases (such as +, *, sin, and so forth, known as the primitives) and then apply the accepted rules in sequence to differentiate more elaborate functions. The ForwardDiff.jl package [20] in Julia accomplishes this by defining dispatches for such primitives on a dual number type and provides convenience functions for easily extracting common objects like gradients, Jacobians, and Hessians. Hessians are calculated by layering automatic differentiation twice on the same algorithm to effectively take the derivative of a derivative.
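As a minimal illustration of these convenience functions (again our own sketch; the toy functions are hypothetical stand-ins for a model output):

    using ForwardDiff

    g(p) = [p[1]^2 * sin(p[2]), exp(p[1] * p[2])]   # hypothetical vector-valued output
    h(p) = p[1]^2 * sin(p[2])                       # hypothetical scalar-valued quantity
    p = [1.0, 0.5]

    J = ForwardDiff.jacobian(g, p)   # 2 x 2 Jacobian computed with dual numbers
    H = ForwardDiff.hessian(h, p)    # Hessian computed by layering dual numbers twice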
In this form, forward mode automatic differentiation shares many similarities with the complex perturbation methods described above, without the requirement that the extension of f(x) be complex analytic. At every stage of the calculation f(x) must be differentiable, a weaker yet still restrictive assumption. Conveniently, automatic differentiation allows for arbitrarily many derivatives to be calculated simultaneously. By defining higher-dimensional dual numbers that act independently via

    f(x + v_1 ϵ_1 + ⋯ + v_k ϵ_k) = f(x) + df(x)v_1 ϵ_1 + ⋯ + df(x)v_k ϵ_k,

one can calculate entire Jacobians in a single function call f(x + e_1 ϵ_1 + ⋯ + e_n ϵ_n). This use of higher-dimensional dual numbers is a practice known as chunking. Chunking reduces the number of primal (non-derivative) calculations required for computing the Jacobian. Because the ForwardDiff.jl package uses chunking by default, we will investigate the extent to which this detail is applicable in biological models.

5 Discussion

Our purpose throughout has been to demonstrate the ease and utility of incorporating differential sensitivity analysis in dynamical modeling. Because models are always approximate, and parameters are measured imprecisely, uncertainty plagues virtually all dynamical models. While improving models is incremental and domain specific, sensitivity analysis provides a handle on local parameter uncertainty across models.

Of the methods mentioned in this text, the adjoint method, forward method, and complex perturbation methods all require that the functions defining a model be differentiable in the underlying parameters. While the complex perturbation method has the additional requirement that these functions be complex analytic, it is the only method explored in this manuscript that can be extended to discrete stochastic models in addition to ODE systems. For the modeler who prefers a one-size-fits-all approach, or who prefers to prioritize ease of implementation, we argue that the complex perturbation method should be the method of choice. In addition to its wide range of applicability, the complex perturbation method can be easily multi-threaded and requires only implementation of the component functions of the model.

In contrast to the second-order complex perturbation method, forward differentiation slows dramatically in calculating a Hessian directly. It becomes competitive if one calculates the gradient of the gradient. The gradient-of-the-gradient method is not always available natively and usually must be implemented separately, as we have done in the current manuscript. Crucially, implementing a specialized forward mode method was possible due to the underlying automatic differentiation software’s flexibility and support for composition.

In situations demanding computational speed, our results suggest that choosing a method tailored to a model may be pertinent. In the case of stochastic models, manually differentiating and applying the chain rule must be balanced against the complex perturbation method, which requires less effort up front but longer processing after the derivatives have been determined. For ODE system models, forward mode is the most computationally efficient when multichunking is available. If multichunking is not available, then the complex perturbation method has comparable speed to the forward method when run with the same tolerance. In maximizing computational efficiency, it is important to note that the use of automatic differentiation tools may require more user input for algorithm selection or multi-threading implementation.
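As one example of such user input, the chunk size in ForwardDiff.jl can be fixed by the caller rather than chosen heuristically. A brief sketch, reusing the hypothetical g and p from the earlier example:

    using ForwardDiff

    g(p) = [p[1]^2 * sin(p[2]), exp(p[1] * p[2])]   # hypothetical model output
    p = [1.0, 0.5]

    # request a chunk of size 2, so both partials ride along one primal evaluation
    cfg = ForwardDiff.JacobianConfig(g, p, ForwardDiff.Chunk{2}())
    J = ForwardDiff.jacobian(g, p, cfg)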
Choice of software is critical as well; not all software packages with automatic forward differentiation support the kind of chunking implemented in the ForwardDiff.jl package, which so dramatically improves the computational efficiency of this method.

There are additional challenges to computing model sensitivity that we do not address. For example, not all models use functions that are differentiable in their parameters. Additionally, models may be differentiable yet extremely stiff, in which case the computational time for each sensitivity method discussed here will suffer disproportionately as the number of parameters grows. Furthermore, assessing global parameter sensitivity is more challenging. It can be attacked by techniques such as Latin hypercube sampling or Sobol quasi-random sampling, but these become infeasible in high dimensions [34].

Given the availability of appropriate software, differential sensitivity is computationally feasible, even for high-dimensional systems. In the case of stochastic models, traditional methods require costly and inaccurate simulation over a bundle of parameter values. Differential sensitivity is often out of the question. Current automatic differentiation systems, such as PyTorch, Zygote, and ForwardDiff, treat generated random numbers as constants and thus are not reliable methods for calculating differential sensitivity of model outcomes that depend on these random variables. This limits the ability of researchers to understand a biological system and how it responds to parameter changes. If a system index such as a mean, variance, extinction probability, or extinction time can be computed by a reasonable algorithm, then differential parameter sensitivity analysis can be undertaken. We have indicated in a handful of examples how this can be accomplished.

In summary, across many models representative of computational biology, we have reached the following conclusions:

- Forward mode, adjoint, and complex perturbation sensitivity methods all converge to the same differential sensitivity values in non-stiff models, thus offering the same level of accuracy.
- For stiff models, forward mode and complex perturbation methods converge, but adjoint sensitivity struggles and does not achieve convergence for realistic tolerance parameters.
- Chunked forward mode automatic differentiation and forward mode sensitivity analysis tend to be the most computationally efficient on the tested models. The complex perturbation methods described in this manuscript are competitive and often outperform the unchunked version of forward mode automatic differentiation, while being less sensitive to stiffness than the adjoint methods.
- Shared memory multi-threading of the complex perturbation and forward mode automatic differentiation methods provides a performance gain, but only in high-dimensional systems.
- The forward mode automatic differentiation method requires that each step of a calculation be differentiable. This renders it unusable for calculating the derivative of ensemble means of discrete state models, such as birth-death processes. For these cases, the complex perturbation method outperforms manual differentiation.
- The complex perturbation method is competitive with automatic differentiation methods in accuracy, is more straightforward to implement, and can be applied to a wider variety of models.

These conclusions are tentative but supported by our limited number of biological case studies.
We note that the performance differences may change depending on the efficiency of the implementations. The Julia DifferentialEquations.jl library and its DiffEqSensitivity.jl package have been shown to be highly efficient, outperforming other libraries in both equation solving and derivative calculations in Python, MATLAB, C, and Fortran [19,33]. Details on the current state of performance can be found at https://github.com/SciML/SciMLBenchmarks.jl. The automatic differentiation implementations in machine learning libraries optimize array operations much more than scalar operations. This can work to the detriment of forward mode AD. MATLAB- or Python-style vectorization improves the performance of forward mode AD sensitivity analysis by reducing interpreter overhead. Therefore, our conclusions serve as guidelines for the case where all implementations are well-optimized. For programming languages with high overheads or without compile-time optimization of the automatic differentiation passes, the balance in efficiency shifts more favorably towards the complex perturbation method.

One last point worth making is on the coding effort required by the various methods. Both automatic differentiation and the complex perturbation method have comparable accuracy when applied to systems of ODEs, with automatic differentiation having the advantage in speed when it is implemented with the additional level of parallelization provided by chunking. However, the complex perturbation method can easily be generalized to other kinds of objective functions and may be more straightforward to implement for those less sophisticated in computer science. While automatic differentiation is the basis of many large scientific packages, the code required for the complex perturbation methods is fully contained within this manuscript and is easily transferable to other programming languages with similar dispatching on complex numbers. This hard-to-measure benefit should not be ignored by practicing biologists who simply wish to quickly arrive at reasonably fast code.

Acknowledgments

We wish to thank Chris Elrod for assistance in adding multithreading to parts of our software. We wish to thank Janet Sinsheimer, Mary Sehl, Xiang Ji, and Jason Xu for helpful comments on the manuscript and biological applications. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.