Dynamic protomer sampling

We recently submitted an R01 grant proposal to the NIH focused on removing a major limitation on the accuracy of alchemical free energy calculations: the lack of a fully consistent treatment of protonation states and tautomers for both ligands and proteins, especially in key systems like kinase:inhibitor binding. This post reviews some of the theory and describes the algorithm we’ve come up with to make this sampling efficient.


Click here to read the full post

Response to NIH RFI on Interim Research Products

The NIH has posted an RFI on including “preprints and interim research products” in NIH applications and reports. Many others have provided responses to this RFI that they have shared publicly, including thoughtful responses from David Mobley, Steven Floor, and others that have been collected by ASAPBio. Here is my response, written very quickly on a train ride back from the NIH.


Click here to read the full post

Integrated likelihood methods for eliminating nuisance parameters

The process of inferring conclusions from experimental observations often requires the construction and evaluation of a likelihood function, which gives the probability, or probability density, of the observed data given a particular statistical model.

The likelihood function forms the basis for many statistical inference techniques; for example, one can estimate the parameters of a model by maximizing the likelihood. The problem is that likelihood functions frequently contain more parameters than we care about: the inexplicable or uninteresting noise that buffets the processes of interest has to be parameterized and accounted for in the likelihood function. As Berger, Liseo, and Wolpert discuss in this paper, the existence of these so-called ‘nuisance parameters’ severely hampers inference in many cases. The authors review a few of the common frequentist techniques for dealing with nuisance parameters in likelihood functions, but come down strongly in favor of integrating the likelihood function over the nuisance parameters. Although this method has a Bayesian flavor to it, the authors emphasize the practical benefits of integrated likelihoods, even for statisticians with more frequentist leanings.
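To make the idea concrete, here is a minimal numerical sketch (my own illustration, not code from the paper): the mean mu of a Gaussian model is the parameter of interest, the standard deviation sigma is a nuisance parameter, and sigma is eliminated by integrating the likelihood against a weight function. The data values and the exponential weight are arbitrary choices for illustration.

```python
# A minimal sketch of an integrated likelihood: the nuisance parameter sigma
# is integrated out of a Gaussian likelihood against a weight pi(sigma).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

data = np.array([1.2, 0.8, 1.5, 1.1, 0.9])  # hypothetical observations

def likelihood(mu, sigma):
    """Full likelihood L(mu, sigma) of the data under N(mu, sigma^2)."""
    return np.prod(norm.pdf(data, loc=mu, scale=sigma))

def integrated_likelihood(mu):
    """Integrated likelihood: integral of L(mu, sigma) * pi(sigma) over sigma,
    using an illustrative exponential weight pi(sigma) = exp(-sigma)."""
    integrand = lambda sigma: likelihood(mu, sigma) * np.exp(-sigma)
    value, _ = quad(integrand, 1e-6, 10.0)
    return value

# Estimate mu by maximizing the integrated likelihood on a grid:
mus = np.linspace(0.0, 2.0, 201)
mu_hat = mus[np.argmax([integrated_likelihood(mu) for mu in mus])]
print(f"integrated-likelihood estimate of mu: {mu_hat:.3f}")
```

Inference about mu then proceeds from the integrated likelihood alone, with sigma no longer appearing as a free parameter.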


Click here to read the full post

Nonequilibrium candidate Monte Carlo

In a recent Computational Statistics Club meeting, we covered Nonequilibrium candidate Monte Carlo (NCMC), a method by Jerome Nilmeier, Gavin Crooks, David Minh, and John Chodera. Our lab is interested in predicting equilibrium thermodynamic properties of materials, such as binding affinities of small molecules to proteins. Using the framework of statistical mechanics, we can frame these prediction problems as (intractable) integrals over all possible configurations of the flexible protein and flexible ligand, each weighted by its Boltzmann factor (the exponential of the negative reduced potential energy). For realistic systems, these integrals are complicated enough that there is no analytical solution. Therefore, our objective is to approximate these integrals in clever ways. One way is to construct and simulate a stochastic process whose time averages eventually converge to the desired configuration averages. If we simulate long enough, time averages become good estimates of configuration averages. If the stochastic process exhibits slow transitions between configurations (for example, due to high energy barriers), “long enough” can become prohibitively costly. We’re interested in developing more efficient sampling methods to tackle this issue.
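As a toy illustration of the “time averages converge to configuration averages” idea (my own sketch on a one-dimensional double-well potential, not the NCMC algorithm itself), here is a plain Metropolis Monte Carlo simulation. The high barrier between the two wells is exactly the kind of feature that makes such samplers converge slowly, motivating methods like NCMC.

```python
# A minimal Metropolis Monte Carlo sketch on a 1D double-well reduced
# potential u(x); time averages of this chain converge to Boltzmann-weighted
# configuration averages. (Illustrative model, not a molecular system.)
import numpy as np

rng = np.random.default_rng(0)

def u(x):
    """Reduced potential with minima at x = -1 and x = +1, barrier at x = 0."""
    return (x**2 - 1.0)**2 / 0.1  # dividing by 0.1 makes the barrier high

x = -1.0  # start in the left well
samples = []
for step in range(100_000):
    x_new = x + rng.normal(scale=0.2)           # propose a small local move
    if rng.random() < np.exp(u(x) - u(x_new)):  # Metropolis acceptance rule
        x = x_new
    samples.append(x)

# The time average of an observable (here <x^2>) estimates its configuration
# average; with a high barrier, well-to-well transitions are rare, so
# convergence is slow. This is the regime NCMC is designed to alleviate.
print(f"<x^2> estimate: {np.mean(np.array(samples)**2):.3f}")
```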


Click here to read the full post

Fitting the Correlation Function

Single-molecule fluorescence experiments provide a means to study subtle, time-dependent motions, such as structural rearrangements, that cannot easily be studied in bulk. Such experiments yield many trajectories of observed intensity over time, from which useful information must be extracted. These trajectories can be used to compute the correlation function, which describes the correlation between intensity values as a function of lag time. The correlation function captures the essential information in the trajectories in compact form and can be used to extract relaxation timescales in a model-free way. Since the relaxation timescales of slow processes are intrinsic properties of the system, independent of the observable used to probe them, these timescales can be compared with timescales extracted through other means as a way to invalidate proposed kinetic models.
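As a rough sketch of this procedure (using synthetic data with a known relaxation time rather than real fluorescence trajectories), one can compute a normalized autocorrelation function from an intensity trajectory and fit an exponential decay to recover the timescale:

```python
# A minimal sketch: estimate the autocorrelation function of a trajectory
# and extract a relaxation timescale by fitting an exponential decay.
# The "intensity" trace here is a synthetic autoregressive process with a
# known relaxation time, standing in for experimental data.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

tau_true, n = 50.0, 200_000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = x[t - 1] * np.exp(-1.0 / tau_true) + rng.normal()

def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(lag) for lags 0..max_lag-1."""
    dx = x - x.mean()
    N = len(dx)
    c = np.array([np.dot(dx[:N - lag], dx[lag:]) / (N - lag)
                  for lag in range(max_lag)])
    return c / c[0]

lags = np.arange(300)
C = autocorrelation(x, len(lags))

# Fit a single-exponential decay C(t) ~ exp(-t / tau) to recover the timescale.
model = lambda t, tau: np.exp(-t / tau)
(tau_fit,), _ = curve_fit(model, lags, C, p0=[10.0])
print(f"fitted relaxation timescale: {tau_fit:.1f} frames (true: {tau_true})")
```

Real data would typically require a sum of several exponentials, one per resolvable relaxation process, but the single-exponential fit shows the basic idea.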


Click here to read the full post