Parameter estimation fits the unknown rate constants, binding affinities, and Hill coefficients of a biological model to experimental data. The challenge in systems biology is that models typically have more parameters than the data can constrain. The result is non-identifiability, which can be structural (parameters enter the model only in fixed combinations, so many parameter sets fit the data equally well no matter how much data is collected) or practical (the available data leave parameters poorly constrained and correlated, creating ridges in the likelihood landscape). Methods range from optimization-based (least squares, maximum likelihood with global search algorithms) to Bayesian (MCMC sampling of the posterior distribution over parameters). Ensemble approaches that characterize the full range of plausible parameter sets, rather than seeking a single "best fit," are increasingly recognized as essential for making reliable predictions from biological models.
Building an ODE model of a biological system is only half the battle. The model contains parameters — production rates, degradation rates, binding affinities, Hill coefficients, Michaelis-Menten constants — that determine its quantitative behavior. Most of these parameters have never been measured directly, so they must be estimated by fitting the model to experimental data. This sounds like standard curve fitting, but parameter estimation in systems biology is vastly more challenging than fitting a polynomial to a scatter plot.
The first challenge is non-identifiability. Consider a model with 30 parameters fit to a time series measuring 5 molecular species at 10 time points. The 50 data points nominally outnumber the 30 unknowns, but measurements taken along a smooth trajectory are noisy and carry overlapping information, so they impose far fewer independent constraints than the raw count suggests. The fit is effectively underdetermined, and many parameter sets produce indistinguishable fits. This non-identifiability can be structural (a mathematical property of the model: certain parameters always appear together and cannot be separated regardless of data) or practical (the data are insufficiently informative to constrain parameters that are theoretically distinguishable). Identifiability analysis, ideally performed before data collection, determines which parameters can be estimated from the planned experiments and what additional measurements would resolve ambiguities.
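A small numerical check can make structural non-identifiability concrete. The sketch below is illustrative only: it assumes a hypothetical toy observable y(t) = a·b·exp(−k·t), in which a and b appear only as a product, and it inspects the rank of the finite-difference sensitivity matrix. The model, parameter values, and tolerance are assumptions chosen for the example, not part of any particular published workflow.

```python
import numpy as np

def model(theta, t):
    # Hypothetical toy observable: only the product a*b is visible in the output.
    a, b, k = theta
    return a * b * np.exp(-k * t)

def sensitivity_matrix(theta, t, rel_step=1e-6):
    # Central finite differences of the output with respect to each parameter.
    theta = np.asarray(theta, dtype=float)
    J = np.empty((t.size, theta.size))
    for j in range(theta.size):
        h = rel_step * max(abs(theta[j]), 1.0)
        up, down = theta.copy(), theta.copy()
        up[j] += h
        down[j] -= h
        J[:, j] = (model(up, t) - model(down, t)) / (2 * h)
    return J

t = np.linspace(0.0, 5.0, 10)          # 10 measurement times (assumed)
theta0 = np.array([2.0, 3.0, 0.7])     # nominal (a, b, k), chosen arbitrarily

J = sensitivity_matrix(theta0, t)
_, singular_values, Vt = np.linalg.svd(J)

# Count singular values that are non-negligible relative to the largest one.
rank = int(np.sum(singular_values > 1e-6 * singular_values[0]))
null_direction = Vt[-1]                # parameter direction the data cannot see

print("number of parameters:", theta0.size)
print("numerical rank of sensitivity matrix:", rank)
print("unconstrained direction in (a, b, k):", null_direction)
```

A rank lower than the number of parameters signals that some combination of parameters leaves the output unchanged to first order. Here the unconstrained direction trades a against b while leaving k alone, which matches the fact that only the product a·b is observable. This is a local, numerical diagnostic; it does not replace a formal structural identifiability analysis.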
The second challenge is the objective function landscape. The distance between model predictions and data, plotted as a function of the parameters, is typically highly non-convex: riddled with local minima, flat ridges, and narrow valleys. Nonlinear dynamics with Hill functions and feedback loops create parameter correlations and compensatory effects (increasing one rate while decreasing another can maintain the fit). Standard gradient-based local optimizers readily get trapped in local minima, returning parameter estimates that depend on the starting point. Global optimization methods (differential evolution, particle swarm, simulated annealing) search the parameter space broadly, and multi-start strategies (running local optimization from many random starting points) map out the landscape's multimodal structure.
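The multi-start strategy described above can be sketched in a few lines. This is a minimal illustration, assuming synthetic data generated from a Hill function with made-up "true" parameters, log-transformed parameters, and SciPy's `least_squares` as the local optimizer; the bounds, noise level, and number of starts are arbitrary choices for the example.

```python
import numpy as np
from scipy.optimize import least_squares

def hill(x, vmax, K, n):
    # Hill-type dose-response curve.
    return vmax * x**n / (K**n + x**n)

rng = np.random.default_rng(0)
x = np.logspace(-1, 1, 12)                    # assumed input concentrations
true_params = np.array([1.0, 1.5, 3.0])       # assumed (vmax, K, n) for synthetic data
y_obs = hill(x, *true_params) + rng.normal(0.0, 0.02, x.size)

def residuals(log_params):
    # Optimize in log space so all parameters stay positive.
    vmax, K, n = np.exp(log_params)
    return hill(x, vmax, K, n) - y_obs

lower = np.log([1e-2, 1e-2, 0.5])
upper = np.log([1e2, 1e2, 8.0])

fits = []
for _ in range(30):
    start = rng.uniform(lower, upper)         # random start inside the bounds
    result = least_squares(residuals, start, bounds=(lower, upper))
    fits.append((result.cost, np.exp(result.x)))

fits.sort(key=lambda f: f[0])                 # best (lowest cost) first
costs = np.array([c for c, _ in fits])
best_cost, best_params = fits[0]

print("best cost:", best_cost)
print("best (vmax, K, n):", best_params)
print("sorted costs across starts:", np.round(costs, 4))
```

Inspecting the sorted costs is the useful part: if many starts reach the same lowest cost, the optimum is probably reliable, whereas a spread of distinct final costs suggests that some runs stalled in local minima or on flat regions of the landscape.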
A widely recommended approach is Bayesian parameter estimation, which treats parameters as random variables with prior distributions and uses the data to compute posterior distributions. Markov chain Monte Carlo (MCMC) sampling explores the posterior, characterizing not just the best-fit parameters but the full range of plausible values and their correlations. The posterior distribution directly quantifies parameter uncertainty and propagates it to model predictions, revealing which predictions are robust (narrow posterior predictive interval) and which are uncertain (wide interval). This ensemble approach is philosophically more honest than reporting a single "best-fit" parameter set: it acknowledges that in systems biology, we rarely know parameter values precisely, and our predictions should reflect this uncertainty.
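To illustrate how posterior sampling turns into an uncertainty statement about a prediction, here is a minimal random-walk Metropolis sketch. It assumes a hypothetical exponential decay model y(t) = A·exp(−k·t), synthetic data, a known noise level, and flat priors on the log-parameters within a broad box. The step size, chain length, and burn-in are untuned guesses; real analyses would typically rely on an established sampler and convergence diagnostics.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 4.0, 10)
sigma = 0.05                                   # assumed known measurement noise
y_obs = 2.0 * np.exp(-0.8 * t) + rng.normal(0.0, sigma, t.size)  # synthetic data

def log_posterior(log_theta):
    # Flat prior on log-parameters inside a broad box, Gaussian likelihood.
    if np.any(log_theta < -5.0) or np.any(log_theta > 5.0):
        return -np.inf
    A, k = np.exp(log_theta)
    resid = y_obs - A * np.exp(-k * t)
    return -0.5 * np.sum((resid / sigma) ** 2)

n_steps, step = 20000, 0.05
chain = np.empty((n_steps, 2))
current = np.zeros(2)                          # start at A = 1, k = 1
current_lp = log_posterior(current)
accepted = 0

for i in range(n_steps):
    proposal = current + step * rng.normal(size=2)   # symmetric Gaussian proposal
    proposal_lp = log_posterior(proposal)
    # Metropolis rule: accept with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
        accepted += 1
    chain[i] = current

samples = np.exp(chain[5000:])                 # drop burn-in, back to (A, k)

# Propagate every posterior sample to a prediction at an unmeasured time.
t_new = 6.0
prediction = samples[:, 0] * np.exp(-samples[:, 1] * t_new)
lo, hi = np.percentile(prediction, [2.5, 97.5])

print("acceptance rate:", accepted / n_steps)
print("posterior mean (A, k):", samples.mean(axis=0))
print("95% predictive interval at t = 6:", (lo, hi))
```

The interval at the end is computed from the whole ensemble of sampled parameter sets rather than from a single best fit, which is the sense in which the posterior propagates parameter uncertainty into predictions.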