# Dynamic protomer sampling

We recently submitted an R01 grant proposal to the NIH focused on addressing a major limitation on the accuracy of alchemical free energy calculations: the lack of fully consistent treatments of protonation states and tautomers for both ligands and proteins, especially in key systems such as kinase:inhibitor binding. This post reviews some of the theory and describes the algorithm we’ve come up with to make this sampling efficient.

# Response to NIH RFI on Interim Research Products

The NIH has posted an RFI on including “preprints and interim research products” in NIH applications and reports. Many others have provided responses to this RFI that they have shared publicly, including thoughtful responses from David Mobley, Steven Floor, and others that have been collected by ASAPBio. Here is my response, written very quickly on a train ride back from the NIH.

# Integrated likelihood methods for eliminating nuisance parameters

The process of inferring conclusions from experimental observations often requires the construction and evaluation of a likelihood function, which gives the probability, or probability density, of the observed data given a particular statistical model.

The likelihood function forms the basis for many statistical inference techniques. For example, one could estimate the parameters of a model by maximizing the likelihood. The problem is that likelihood functions frequently contain more parameters than we care about: the inexplicable or uninteresting noise that buffets the processes of interest must also be parameterized and accounted for in the likelihood function.

As Berger, Liseo, and Wolpert discuss in this paper, the existence of these so-called ‘nuisance parameters’ severely hampers inference in many cases. The authors review a few of the common frequentist techniques for dealing with nuisance parameters in likelihood functions, but come down strongly in favor of integrating the likelihood function over the nuisance parameters. Although this method has a Bayesian flavor to it, the authors emphasize the practical benefits of integrated likelihoods, even for statisticians with more frequentist leanings.
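To make the idea concrete, here is a small numerical sketch (my own toy example, not one from the paper): for Gaussian data where the mean $\mu$ is the parameter of interest and the noise scale $\sigma$ is a nuisance parameter, the integrated likelihood $L(\mu) = \int L(\mu, \sigma)\,\pi(\sigma)\,d\sigma$ can be approximated on a grid, working in log space for numerical stability. The grids, prior, and seed are all illustrative choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=50)  # synthetic observations

mus = np.linspace(0.0, 4.0, 201)     # parameter of interest: the mean
sigmas = np.linspace(0.2, 5.0, 200)  # nuisance parameter: the noise scale

# Log-likelihood on a (mu, sigma) grid, summing over the data via broadcasting.
logL = norm.logpdf(
    data[None, None, :],
    loc=mus[:, None, None],
    scale=sigmas[None, :, None],
).sum(axis=2)

# Integrate out sigma (flat prior on the grid), stably in log space:
# subtract the per-row maximum before exponentiating, then add it back.
shift = logL.max(axis=1, keepdims=True)
dsigma = sigmas[1] - sigmas[0]
int_logL = shift[:, 0] + np.log(np.exp(logL - shift).sum(axis=1) * dsigma)

# Maximizer of the integrated likelihood; for this model it lands at
# the grid point nearest the sample mean.
mu_hat = mus[int_logL.argmax()]
```

The nuisance parameter $\sigma$ never appears in the final inference about $\mu$: it has been averaged away rather than profiled or plugged in, which is exactly the approach the authors advocate.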

# Nonequilibrium candidate Monte Carlo

In a recent Computational Statistics Club meeting, we covered the paper on nonequilibrium candidate Monte Carlo (NCMC) by Jerome Nilmeier, Gavin Crooks, David Minh, and John Chodera. Our lab is interested in predicting equilibrium thermodynamic properties of materials, such as binding affinities of small molecules to proteins. Using the framework of statistical mechanics, we can frame these prediction problems as (intractable) integrals over all possible configurations of the flexible protein and flexible ligand, weighted by their Boltzmann weight (the exponential of the negative reduced potential energy, $e^{-u(x)}$). For realistic systems, these integrals are complicated enough that there is no analytical solution. Therefore, our objective is to approximate these integrals in clever ways. One way is to construct and simulate a stochastic process whose time averages converge to configurational averages. If we simulate long enough, time averages become good estimates for configurational averages. If the stochastic process we construct exhibits slow transitions between configurations (for example, due to high energy barriers between configurations), “long enough” can become prohibitively costly. We’re interested in coming up with more efficient sampling methods to tackle this issue.
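As a toy illustration of the time-average idea (this is plain Metropolis Monte Carlo, not NCMC itself, and the potential and parameters are my own choices): sample a 1D harmonic well with Boltzmann weight $e^{-u(x)}$ and check that the time average of $x^2$ approaches the configurational average, which is exactly 1 for $u(x) = x^2/2$.

```python
import numpy as np

rng = np.random.default_rng(1)

def u(x):
    # Reduced potential for a 1D harmonic well; the Boltzmann weight is exp(-u(x)).
    return 0.5 * x**2

x = 0.0
samples = []
for step in range(200_000):
    x_new = x + rng.normal(scale=1.0)            # symmetric random-walk proposal
    # Metropolis acceptance: accept with probability min(1, exp(u(x) - u(x_new)))
    if np.log(rng.uniform()) < u(x) - u(x_new):
        x = x_new
    samples.append(x)

# Time average approximates the configurational average <x^2> = 1.
x2_mean = np.mean(np.square(samples[10_000:]))   # discard burn-in
```

For this single smooth well the random walk mixes quickly; the slow-transition problem described above appears when the potential has multiple wells separated by high barriers, which is the regime NCMC is designed to address.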