Algorithm
Overview of the mathematical model
FRET enables the study of molecular interactions in diverse settings. To design a widely applicable analysis technique, we consider a general system containing proteins (or other molecules) that form complexes and are labelled with fluorophores that act as FRET donors and acceptors (Figure 1A). We assume that all the molecules of interest are fluorescently tagged, so instead of referring to 'donor-tagged proteins' or 'acceptor-tagged proteins', we simply refer to 'donors' and 'acceptors'.
Our model relies on a few other assumptions. First, we assume that donors and acceptors may be free or form bimolecular, donor-acceptor complexes (i.e. [D] + [A] ⇌ [DA]) with dissociation constant
. Whether free or in complex, the fluorophores undergo fluorescence excitation and emission (Figure 1B). In complexes, the donor and acceptor are together and FRET may also occur. We assume that FRET due to random collisions between donors and acceptors occurs seldom enough to be neglected. Finally, we assume that donor-acceptor complexes all have the same FRET efficiency, which is denoted E
fr
and refers to the fraction of instances where exciting a donor in complex leads to excitation of the acceptor.
For our general mathematical description of the FRET system, we use a previously described spectral mixing model [25] with slightly modified matrices. This model relates concentrations of fluorescent molecules to the fluorescence observed at various excitation wavelengths and ranges of emission wavelengths (spectral channels). It can be written as a matrix equation in the following form [6]:
where C is a vector of the concentrations of fluorophores (C = ([D], [A], [DA]) T ), I
obs
is a vector of the fluorescence intensities observed for each spectral channel, and M is a matrix of the photophysical parameters that relate the concentrations of fluorophore to the observed intensities (Figure 1B).
The observed fluorescence intensities could represent data obtained from fluorescence microscopy, spectrophotometry or a flow cytometer. Microscopy images would require standard image processing steps for quantification, such as subtracting background fluorescence and defining regions of interest such as areas within the cytosol or nucleus for localized fluorescence [26].
Photophysics in the mathematical model
The matrix M contains all the relevant information about the photophysical processes we model. In addition to the expected direct excitation and emission, we include resonance energy transfer, cross-talk (unintended excitation) and bleed-through (acceptors emitting in the donors' typical emission range).
The spectral mixing framework can represent any number of spectral channels by increasing the dimensions of the matrix, but we focus on the particular case of three, which describes the most common 'three-cube' FRET experiments [12, 7–9, 14, 18, 17, 21, 16]. The first spectral channel, known as the donor channel, consists of an excitation wavelength that primarily excites donors and an emission filter that collects photons from the donor's emission range. The second channel, the acceptor channel, consists of an excitation wavelength that primarily excites acceptors and an emission filter that collects photons from the acceptor's emission range. The third channel, the FRET channel, combines the donor channel's excitation wavelength with the acceptor channel's emission filter, so that donors are excited preferentially and the emissions are filtered to primarily collect photons from acceptors. For three-cube FRET, the detailed form of Eq (1) is
where
. The parameters
each represent a combination of excitation and emission information [25]:
Here the subscript i refers to the spectral channel (for the three-cube case, it is 1 for the donor channel, 2 for the acceptor channel, and 3 for the FRET channel) and the superscript (S) refers to the fluorescent species (D, the donor; A, the acceptor; and DA, the complex). The variable I
i
is the illumination intensity for the excitation wavelength of channel i;
is the molar extinction coefficient of species (S) with excitation wavelength from channel i; Q(S) is the quantum yield of species (S); and
is the product of the emission of species (S) at the wavelengths of channel i and the sensitivity of the detector to the emitted photons. We describe a calibration procedure for obtaining these values in the Methods.
To illustrate Eq (2), consider the expression it yields for
, the intensity in the donor channel. The intensity will be the sum of contributions from free donors, free acceptors, and donor- acceptor complexes. The contribution from free donors is
and the contribution from free acceptors is
, which results from cross-talk and bleed-through. The complex, DA, can potentially produce contributions from donors not undergoing FRET
, from acceptors
and from FRET
. Adding these contributions together, we obtain the following expression for the predicted intensity in the donor channel,
,
Continuing the matrix multiplication yields analogous expressions for
and
.
Molecular interactions in the mathematical model
To include K
d
in our model, we model the chemical equilibrium, [D] + [A] ⇌ [DA]. Defining [D0 ] = [D] + [DA] and [A0] = [A] + [DA], and given that
, we can compute the values of [DA], [D], and [A] in terms of [D0], [A0], and K
d
.
with [D] = [D0] - [DA] and [A] = [A0] - [DA].
Bayesian inference
Our aim is to infer the values of K
d
and E
fr
from the data. To this end, Bayesian inference is an ideal tool. Given a model, it quantifies the knowledge gained from a set of experimental observations about the parameters of interest, including the uncertainty of that knowledge [27]. That is, it allows us to update the prior probability distribution, P (K
d
, E
fr
), which represents what is known about the parameters before data has been acquired, to the posterior probability distribution, P (K
d
, E
fr
|data), which reflects both the prior information and what is learned from the data. According to Bayes' theorem,
The likelihood function, P (data|K
d
, E
fr
), can be explicitly expressed through our photophysical model and a model of measurement noise. It quantifies the likelihood of different values for K
d
and E
fr
given the current data. We assume that the data follows a Gaussian distribution centred around the predicted intensity given by Eq (2). For three measurements, one from each of the three spectral channels, the likelihood is:
The observed intensities for each channel are I
D
, I
A
, and I
F
.
,
and
denote the predicted intensities and σ
D
, σ
A
and σ
F
measure the magnitude of the measurement noise in each channel and are approximated by the variances of the data in that channel (Methods). For fitting, we use a Gaussian approximation of Eq (7) based around the values of D0 and A0 that maximize the likelihood (see section on marginalization of D0 and A0 in the Methods). Combining the likelihood and prior probability distribution allows us to calculate the posterior probability distribution and determine the most probable values for our parameters of interest.
Illustration of method
To illustrate our method, we simulated data from three cells (Figure 2A, left), although in practice, data could come instead from cellular compartments or other sub-cellular regions of interest. For simplicity, each cell contains equal concentrations of donor ([D0]) and acceptor ([A0]). In the first cell, [D0] = [A0] = 0.2 μM; in the second, [D0] = [A0] = 1 μM; and in the third, [D0] = [A0] = 5 μM. Other parameter values are given in the Methods. We then simulated three-cube FRET measurements, generating data for the donor, acceptor, and FRET channels (Figure 2A, right). We made a series of ten measurements for each channel and each cell, to which we added Gaussian noise so that the standard deviation of the measurements was around 5% of the mean signal strength. We define r as the ratio of the standard deviation of the measurements to the mean signal strength; in this case, r ≈ 0.05.
To obtain the posterior probability distribution that corresponds to this data for two parameters of interest, K
d
and E
fr
, we used two methods, the results of which should agree. First, because we are restricting our analysis to two parameters, we can visualize the distribution by computing the energy, the negative logarithm of the posterior probability, over a grid of values of K
d
and E
fr
. The resulting energy surface, shown in Figure 2B, was minimal around the true values of the parameters. If we were to extend our analysis to more than two parameters, including uncertainty in the
constants for instance, we would use a Markov chain Monte Carlo (MCMC) algorithm to sample from the parameter space. To show that the MCMC algorithm gives results consistent with the energy surface, we used the algorithm to sample the posterior probability distribution for K
d
and E
fr
. We ran the algorithm three times, starting each time at different initial values. All three walks converged on the same small region of equally probable values: they are superimposed on the energy surface in Figure 2B. Both methods therefore recover the true values used to generate the data as optimal parameter values. The approximate posterior probability distributions for K
d
and E
fr
are shown in Figure 2C. The distributions are estimated as the histograms of the three walks, excluding the burn-in period before convergence [28]. They peak near the parameter values used to simulate the data, indicating that the highly probable values for K
d
and E
fr
predicted by the methods are accurate.
Inferring other quantities from FRET data
To further illustrate the versatility of our method, we show that we can also use our model and Bayesian framework to estimate other parameters. Two instrument-independent parameters that have been a focus of interest are the apparent FRET efficiency,
, and the ratio,
[15–19, 14]. The analogous apparent FRET efficiency for the acceptor,
, is the product of E
d
and r
da
. We can estimate these quantities using our method, provided that a calibration has been carried out with cells expressing only donors, only acceptors, and only a donor-acceptor construct. The FRET efficiency of the construct need not be known. Supposing we have calibrated the parameters
of Eq 2 relative to
, which is taken to be unity, then we can express Eq 2 just in terms of [A0] with [D0] = r
da
[A0] and E
fr
[DA] = E
d
r
da
[A0]. We can analytically maximize the posterior probability of r
da
and E
d
as a function of [A0] by setting the derivative of the probability with respect to [A0] equal to zero and solving for this optimal [A0] in terms of r
da
and E
d
. Our MCMC algorithm can be used to sample values of r
da
and E
d
that are consistent with the data.
Testing
Algorithm reflects data quality
An algorithm for estimating parameter values should report not just an estimate of the most probable parameter values but also the reliability of that estimate. To evaluate this aspect of our algorithm, we measured both the error and the uncertainty of its output. We defined the error as the discrepancy between the mean of the posterior probability distribution and the true value. We calculated error as
, meaning that a perfect estimate would have zero error. Because under experimental conditions we would not know the true value, we also calculated the uncertainty of the estimate as an indicator of the reliability even when true values are unknown. The uncertainty corresponds to the width of the posterior probability distributions for K
d
and E
fr
. To prevent the magnitude of the parameter of interest from skewing the uncertainty, we use the relative uncertainty, calculated as the coefficient of variation (the standard deviation divided by the mean).
We first tested our algorithm in the presence of varying levels of measurement noise. As the examples in Figure 3A suggest, when measurement noise increased, the optimal regions in K
d
and E
fr
-space grew wider and longer and the peak of the posterior probability distribution for K
d
often deviated from the true value (see Figure 3A inset). In Figures 3C and 3E, the error (upper plot) and uncertainty (lower plot) for estimating K
d
are shown and confirm that both the error and uncertainty of the estimate increased with increasing noise. Similar results were obtained for the estimate of E
fr
(data not shown).
Next, we verified that gathering more data would improve the quality of the estimates of K
d
and E
fr
. Such data could be obtained by, for example, collecting several images of the same sample or dividing a region of interest into subregions. The examples in Figure 3B show that with more measurements, the optimal region became smaller and the peak of the posterior probability distribution for K
d
moved closer to the true value (see Figure 3B inset). Figures 3D and 3F show, as expected, that increasing the number of measurements decreases the error and uncertainty for the K
d
estimate. The effect of data quantity on the error and uncertainty of the E
fr
estimate was similar (data not shown).
Overall, our algorithm reliably reflects the quality of the data analyzed in the uncertainty it gives of its estimate and accurately infers the values of K
d
and E
fr
for 10 measurements per cell per channel with 10% measurement noise. We also note that in Figures 3C and 3D, the mean error in K
d
remained close to zero, indicating that neither measurement noise nor number of measurements systematically biased the location of the peak.
The total amount of donor, [D
0
], and the total amount of acceptor, [ A
0
], affect inference of K
d
and E
fr
Inferring unique values of K
d
and E
fr
requires variations in [D0] and [A0]
A challenge in analyzing FRET data is that both E
fr
and K
d
affect the extent of FRET that may occur, making it difficult to tease out the contribution of each parameter. In general, for data from a single pair of donor and acceptor concentrations, a weak E
fr
and strong K
d
can give rise to the same observations as a strong E
fr
and weak K
d
. However, because our algorithm can analyze data from multiple cells at once, it can overcome this challenge.
To demonstrate the problem, we analyzed a set of simulated data from a single sample with [A0] = [D0] = 1 μM. As Figure 4A shows, the resulting region of optimal values had an elongated shape, indicating that many equally probable solutions exist for {K
d
, E
fr
} for that dataset. To show that collecting more data from similar cells does not help, we repeated the experiment, simulating data from 3 cells with the same concentrations as the first. As Figure 4B shows, including this data made the region slightly narrower but did not change its elongated shape.
The difficulty in determining K
d
without knowing E
fr
or vice versa can be overcome by analyzing data from samples containing varying fluorophore concentrations [16, 21]. To quantitatively justify how varying concentrations facilitates parameter fitting, we simulated data from three cells containing different concentrations of donors and acceptors. As Figure 4C shows, analyzing data from two different cells decreased the size of the region of equally probable values. Combining data from a cell containing [D0] = [A0] = 1 μM with data from a cell containing 0.5 μM eliminated high values of K
d
(Figure 4C); combining data from a 1 μM cell and a 5 μM cell eliminated low values of K
d
(Figure 4C, inset). As Figure 4D indicates, analyzing data from all three cells (0.5 μM, 1 μM, and 5 μM) together eliminated higher and lower K
d
values, leaving a small region of equally probable values that included the true values. To show why varying fluorophore concentrations is effective, we analyzed data from each of the three cells in Figure 4D individually. The optimal regions for each cell are plotted in Figure 4E and clearly intersect at the true value. Varying the concentrations more broadly can further shrink the solution region (Figure 4F).
Inference is limited if [D0] = [A0] ≫ K
d
or [D0] = [A0] ≪ K
d
Plots of complex formation vs. K
d
(Figure 5A, insets) show that only K
d
values within a limited range significantly affect complex formation: outside this range, further changes in K
d
have little effect. For instance, when [D0] = [A0] ≈ K
d
, small changes in K
d
in either direction affect [DA] (Figure 5A centre, inset). However, when [D0] = [A0] ≪ K
d
(Figure 5A, left inset) or [D0] = [A0] ≫ K
d
(Figure 5A, right inset), only changes to K
d
in one direction affect the amount of [DA]. When a range of K
d
values are consistent with the data, K
d
cannot be fit to a single value.
By finding the posterior probability distribution of K
d
, our algorithm predicts when a range of K
d
values will be consistent with the data. First, we simulated data with equal concentrations of acceptors and donors that were very low or very high relative to K
d
. As shown in Figure 5A, the approximate posterior probability distributions of K
d
for data simulated with relatively low or high concentrations form plateaus instead of single peaks. The plateaus indicate that a range of K
d
values is equally probable and consistent with the data. The posterior distribution obtained for [D0] = [A0] ≪ K
d
reveals that K
d
could be any number greater than approximately 10-6M; the distribution for [D0] = [A0] ≫ K
d
reveals that K
d
< 10-6M.
This analysis illustrates how posterior probability distributions for the parameters of interest can be more informative than single values. For example, information from posterior distributions can be useful for improving experimental design. If one were to obtain plateaued distributions like the two shown in Figure 5A, one could recognize that the fluorophore concentrations were too low or high relative to K
d
and devise a more informative experiment.
The ratio [D0]:[A0] affects inference
We also used our algorithm to explore the effects of varying the ratio, [D0] : [A0], on the algorithm's ability to infer K
d
(Figure 5B) and how these effects arise. While other authors have reported that FRET data becomes increasingly unreliable as the ratio [D0] : [A0] deviates from unity [13], they measured how noise propagates in FRET formulas and did not directly address how the ratio impacts the measurement of interaction strength. As Figure 5B shows, the width of the approximate posterior distributions for K
d
broaden as the ratio increases, meaning that the uncertainty in the estimate increases. The peaks of the distributions however remain close to the true values. The same effect occurs when the ratio decreases (data not shown).
The insets in Figure 5 illustrate why the ratio affects the uncertainty, showing data from the donor and FRET channels in the presence and absence of FRET. Here, FRET contributes only a fraction of the observed signals. This small contribution is particularly evident in the FRET channel, where the change in the signal due to FRET is small compared to the measurement error: measurement error is 3% and the increase in the FRET channel from FRET is only 2.9% for [D0] : [A0] = 1 and drops to 0.7% for [D0] : [A0] = 10 (FRET is included by changing the FRET efficiency from 0 to 0.4). As the ratio increased, the relative shortage of acceptors meant that fewer complexes could form, making the contributions from FRET smaller and causing the uncertainty of the estimate to grow. In the absence of measurement noise, however, our algorithm can estimate K
d
even from very extreme ratios. These data also demonstrate that with our method, we can make inferences about K
d
even when FRET causes perceptible changes in only one channel (e.g. the donor channel here).
Knowing how spectral overlap and the ratio of donors to acceptors affect the inference of protein interaction strength helps make it possible to design informative experiments. For example, we find that in each channel, the difference generated by FRET depends on spectral overlap. In the donor channel, where FRET decreases the signal, the relative change is
, meaning that the observed change depends on the difference between acceptor bleed-through,
, and donor fluorescence,
. Analogous changes occur in the other channels. When [D0] = [A0], the change in the donor channel is proportional to
, the change in the acceptor channel to
and the change in the FRET channel to
. In each channel, increasing the difference between
and
would make experiments more informative.
Prior information improves estimate
A further advantage of our method is its flexibility. The Bayesian framework makes it possible to incorporate additional details we know about the system as prior information, allowing us to more accurately represent the system being analyzed and potentially improve our estimate of the parameters of interest. To exploit this feature, we tested whether including additional information about the FRET efficiency would improve our inference of E
fr
and K
d
. Information about E
fr
could be obtained, for example, from lifetime imaging experiments [29].
Figure 6 shows the approximate posterior probability distributions for E
fr
and K
d
that result from analyzing the same data in the presence and absence of prior information for E
fr
. We included slightly inaccurate prior information with large uncertainty, supposing that E
fr
was measured to be 0.45 (instead of the true value, E
fr
= 0.4) with an uncertainty of 0.15. Even with this limited information, the resulting histograms for both K
d
and E
fr
have taller, narrower peaks which are closer to the true values compared to the histograms produced by analyzing the same data in the absence of the prior information. As expected, this improvement was more substantial when we analyzed a set of data consisting of just 3 measurements per cell per channel (Figure 6, right column): prior information becomes more important the less relevant information there is in the data. As another form of prior information, we could also include error in the calibration measurements for the constants that relate molecular concentration to fluorescence in Eq (2). Incorporating this information would make it possible to monitor the impact of that error on the uncertainty of the final estimate of K
d
.
Summary of testing
We have shown, using typical, simulated FRET data, that our algorithm accurately recovers the parameters of interest. It responded consistently and intuitively to changes in the amount of measurement noise present in the data and the quantity of data. We also used the posterior probability distributions we obtain for the parameters to gain insight into how the magnitude and variation of the donor and acceptor concentrations affect the ability of the algorithm to infer K
d
. The algorithm gave informative results even when given data simulated with very high or low concentrations of donors and acceptors relative to the K
d
although, as expected, with this data it could not converge on a unique pair of values for K
d
and E
fr
but gave upper or lower bounds. Gaining an understanding of how experimental parameters such as measurement noise, number of measurements, spectral overlap and fluorophore concentrations affect inference allows experimenters to design optimal experiments that yield information with less uncertainty [24].