Evaluation of the coarse-grained OPEP force field for protein-protein docking

Kynast, Philipp; Derreumaux, Philippe; Strodel, Birgit

doi:10.1186/s13628-016-0029-y

Research Article
Open access
Published: 21 April 2016

BMC Biophysics volume 9, Article number: 4 (2016)


Abstract

Background

Knowing the binding site of protein–protein complexes helps understand their function and shows possible regulation sites. The ultimate goal of protein–protein docking is the prediction of the three-dimensional structure of a protein–protein complex. Docking itself only produces plausible candidate structures, which must be ranked using scoring functions to identify the structures that are most likely to occur in nature.

Methods

In this work, we rescore rigid body protein–protein predictions using the optimized potential for efficient structure prediction (OPEP), which is a coarse-grained force field. Using a force field based on continuous functions rather than a grid-based scoring function allows the introduction of protein flexibility during the docking procedure. First, we produce protein–protein predictions using ZDOCK, and after energy minimization via OPEP we rank them using an OPEP-based soft rescoring function. We also train the rescoring function for different complex classes and demonstrate its improved performance for an independent dataset.

Results

The trained rescoring function produces a better ranking than ZDOCK for more than 50 % of targets, rising to over 70 % when considering only enzyme/inhibitor complexes.

Conclusions

This study demonstrates for the first time that energy functions derived from the coarse-grained OPEP force field can be employed to rescore predictions for protein–protein complexes.

Background

One of the main goals of proteomic research is to understand the biological function of proteins. Many proteins perform their function not as monomers but as part of complexes. Knowledge about protein–protein interactions is therefore fundamental and can enable the regulation of protein structure and function. The Protein Data Bank (PDB) [1] contains more than one hundred thousand protein structures. However, structures of protein–protein complexes are often difficult to determine experimentally: these complexes are usually very large, which hampers structure elucidation via nuclear magnetic resonance (NMR), and the interactions are often too transient to be captured by X-ray crystallography.

Protein–protein docking is an in silico method for predicting the structures of protein–protein complexes. Possible binding sites in a complex are predicted from the protein structures in their unbound state. The binding partners can be single proteins or smaller protein–protein complexes. To increase computational efficiency, the proteins are usually modelled as rigid bodies in the first, six-dimensional (6D) global search stage. Most of these global search methods are based on the convolution of grids, where the surfaces of the binding partners are parametrized such that an overlap between the surfaces of the two binding partners becomes possible. The aim of this surface description is to implicitly account for conformational changes upon binding. The convolution of the grids is accelerated by fast Fourier transformation (FFT) [2–5]. In the simplest approach, the convolution produces possible docking positions based solely on the shape of the proteins. However, more sophisticated grid maps exist that take chemical and knowledge-based properties into account. To refine the initial predictions, various methods are commonly applied, for instance Monte Carlo (MC) simulations [6, 7], clustering [8, 9], or side-chain optimization using rotamer libraries [10]. As computation time is usually the limiting factor, an MC simulation should start from a conformation close to the binding site; a complete global search with this method would be impossible in a reasonable computing time.

The global search, which is performed via ZDOCK in this study [11], usually finds many similar solutions [4]. Therefore, it is common practice to cluster and rerank the docking predictions. Reranking classifies and distinguishes native or near-native solutions from non-native or wrong predictions [12, 13]. The number of predictions in a cluster can also be used for reranking [14]. The aim of both approaches is to narrow down the list of possible interaction sites, significantly decreasing computational cost and effort for further analysis of the remaining docking predictions.

To investigate protein–protein complexes produced by ZDOCK, docking approaches that allow for more protein flexibility than ZDOCK at low computational cost are needed. A coarse-grained force field should be a good choice here. Various coarse-grained force fields have already been developed for the treatment of protein–protein complexes, including the calculation of thermodynamic and structural properties of multi-protein complexes with relatively low binding affinities [15]. Coarse-grained models are also used for molecular dynamics (MD) simulations of protein–protein association [16, 17], where the proteins are modelled using the MARTINI force field [18, 19] or with a Go-model approach [20]. In the latter approach [17], the electrostatic and hydrophobic interactions between proteins are modelled via a Coulomb potential with a distance-dependent dielectric constant and the Miyazawa–Jernigan potential [21].

In the current study, we apply the coarse-grained ‘Optimized Potential for Efficient structure Prediction’ (OPEP) [22] to the protein–protein docking problem. A coarse-grained force field is used because its reduced number of degrees of freedom makes it computationally more efficient than an all-atom potential. Moreover, a coarse-grained model is believed to smooth the underlying free energy landscape, facilitating exploration of the corresponding phase space [23]. OPEP has already been successfully employed with different techniques, including MD and MC simulations. It has been applied to RNA/DNA/protein systems to investigate the effect of crowding, to amyloid formation, and to protein 3D structure prediction. A recent overview of OPEP and its applications can be found in [22]. This work investigates OPEP’s applicability to protein–protein complexes. To test its performance for protein–protein docking, the first step is to investigate OPEP’s power to discriminate between correctly and wrongly docked complexes. We use global docking predictions produced by ZDOCK, which we coarse-grain and energy minimize using OPEP, followed by rescoring with an OPEP-based soft potential. Moreover, we enhance the performance of the rescoring function via an iterative learning procedure and test the resulting scoring function on a subset of the Dockground benchmark [24].

Methods

We perform unbound docking, which starts from the binding partners in their unbound conformations. The methods applied for predicting and rescoring protein–protein complexes can be summarized in the following pipeline. For each of the 96 targets, we produce 54,000 docking predictions with ZDOCK and retain the best 2000 of these complexes, as recommended by the ZDOCK developers. These predictions are energy minimized using the OPEP force field (step (1) in Fig. 1). For each prediction we perform 140 minimization steps in full Cartesian space with the limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) minimizer [25], which leads to minimization times between 3.5 s for the target with PDB ID 1AY7 (185 amino acids) and 250 s for the target with PDB ID 2HMI (1413 amino acids) on a single CPU core. For 85 % of targets, the overall minimization time for the 2000 ZDOCK predictions is thus less than 24 h. Afterwards, the minimized predictions are reranked. For this, we replaced the side chain–side chain and the backbone van der Waals potentials of OPEP with softer 8-6 Lennard-Jones potentials, while preserving the optimal distances and energies (step (2) in Fig. 1). At this stage, the OPEP potentials for salt bridges and H-bonds are not changed. In a further step, we trained the parameters of the side chain–side chain interactions, including salt bridge interactions, with an iterative learning approach, aiming to further improve the performance of the OPEP-based rescoring function (steps (3)–(6) in Fig. 1). The resulting scoring function is tested on another dataset to independently assess its ability to distinguish between native and non-native complexes.
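As a quick consistency check on the timing figures above, the following snippet converts the quoted per-prediction minimization times into serial hours per target. It is an illustrative calculation that uses only the numbers stated in the text: the smallest target (1AY7) needs roughly 2 h, the largest (2HMI) roughly 139 h, and a 24 h budget per target corresponds to an average of about 43 s per prediction.

```python
# Serial wall-clock estimate for minimizing all retained predictions of one target.
# The inputs are the figures quoted in the text; nothing here is measured.
N_PREDICTIONS = 2000
SECONDS_PER_HOUR = 3600.0


def total_hours(seconds_per_prediction: float, n: int = N_PREDICTIONS) -> float:
    """Hours needed to minimize n predictions one after another on a single core."""
    return n * seconds_per_prediction / SECONDS_PER_HOUR


fastest = total_hours(3.5)    # 1AY7, 185 residues
slowest = total_hours(250.0)  # 2HMI, 1413 residues
# Largest average per-prediction time that still fits into 24 h per target.
budget_per_prediction = 24 * SECONDS_PER_HOUR / N_PREDICTIONS

print(f"1AY7: {fastest:.1f} h, 2HMI: {slowest:.1f} h, "
      f"24 h budget: {budget_per_prediction:.1f} s per prediction")
```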

The dataset

We use two different benchmarks to perform unbound docking: a subset of ZDOCK benchmark 4.0 [26] serves as the training dataset, while the Dockground benchmark 2.0 is used for further evaluation. For the ZDOCK benchmark, we downloaded the docking predictions for 6° angular sampling from the ZDOCK website, which were obtained using ZDOCK 3.02 [27]. Ninety-six complexes were selected, comprising 39 enzyme/inhibitor, 19 antigen/antibody, and 38 other types of complexes; the latter are called ‘other complexes’ for the remainder of this paper. One selection condition is that ZDOCK found at least one hit in the top 2000 predictions, where a hit is defined as a prediction with an interface root mean square deviation (IRMSD) from the target of less than 4 Å. Complexes that contain small molecules such as ATP and GTP, for which OPEP is not parametrized, were not considered. The 1N2C complex could not be used because it has more than 15,000 beads after coarse graining, whereas the fixed file format for parametrization in OPEP currently allows for at most 9999 beads.
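The three selection criteria can be expressed as a simple filter. This is a minimal sketch: the `Candidate` record, its field names, and the example entries (identifiers `AAAA` to `DDDD` and their bead counts) are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass
from typing import List

HIT_CUTOFF = 4.0       # IRMSD threshold in Å that defines a hit
TOP_N = 2000           # ZDOCK predictions retained per target
MAX_OPEP_BEADS = 9999  # limit of OPEP's fixed-width file format


@dataclass
class Candidate:
    """Hypothetical per-target record; field names are illustrative only."""
    pdb_id: str
    irmsd_by_zdock_rank: List[float]  # IRMSD (Å) of predictions in ZDOCK rank order
    has_small_molecule: bool          # e.g. bound ATP or GTP, not parametrized in OPEP
    n_beads: int                      # size after coarse graining


def is_usable(c: Candidate) -> bool:
    """Apply the three selection criteria described in the text."""
    has_hit = any(r < HIT_CUTOFF for r in c.irmsd_by_zdock_rank[:TOP_N])
    return has_hit and not c.has_small_molecule and c.n_beads <= MAX_OPEP_BEADS


candidates = [
    Candidate("AAAA", [12.0, 3.1, 8.5], False, 3200),  # kept
    Candidate("BBBB", [9.0, 15.2], False, 2100),       # no hit in the top 2000
    Candidate("CCCC", [2.5], True, 1800),              # contains a small molecule
    Candidate("DDDD", [1.9], False, 15500),            # too many beads for OPEP
]
selected = [c.pdb_id for c in candidates if is_usable(c)]
```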

The second dataset is a subset of the Dockground benchmark [24]. Here we follow the same selection criteria as for the ZDOCK benchmark. Furthermore, we remove complexes present in ZDOCK benchmark 4.0 in order to generate an independent and unbiased test set. The resulting test set contains 74 targets with 18 enzyme/inhibitor, 16 antigen/antibody, and 40 other complexes. As before, to generate complex predictions we applied ZDOCK with 6° sampling, using a local ZDOCK 3.02 installation and keeping the top 2000 predictions. As in the ZDOCK dataset, the docking for the antigen/antibody complexes was restricted to the complementarity determining regions (CDRs).

ZDOCK

ZDOCK is an FFT-based rigid-body protein–protein docking algorithm. During the search procedure one protein is kept fixed while the other is moved around it. The fixed protein, usually the larger of the two, is called the receptor, and the other protein is the ligand. ZDOCK generates grid-based representations of the full-atom receptor and ligand chains, and after each ligand rotation the grids are rapidly convolved via FFT. The three rotational angles of the ligand are sampled with a 6° spacing, and the three translational degrees of freedom with a 1.2 Å spacing. For each set of rotational angles, only the best translationally sampled prediction (based on the ZDOCK score) is retained [28]. This leads to 54,000 ZDOCK predictions, of which we consider the top 2000 for further refinement. To account for some flexibility, ZDOCK uses a soft docking approach in which the receptor has a 3.4 Å thick surface layer [3]. This allows for some overlap between receptor and ligand and accounts for possible movements during docking; however, it may also lead to atom clashes between receptor and ligand. The ZDOCK scoring function contains a shape complementarity term [29], a knowledge-based contact term for atoms and residues [11], and an electrostatic term [30].
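The translational part of an FFT-based search can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a shape-only, Katchalski-Katzir-style scoring (receptor surface cells score +1, interior cells a negative penalty `rho`, occupied ligand cells 1) rather than ZDOCK's actual scoring function; the value `rho = -15` and all helper names are illustrative assumptions. Rotating and re-gridding the ligand is omitted, so `search` takes pre-rotated ligand grids and, as described above, keeps only the best translation for each orientation.

```python
import numpy as np


def correlation_scores(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Score all cyclic translations t of the ligand grid against the receptor grid.

    scores[t] = sum_x receptor[x] * ligand[x - t] (indices modulo the grid size),
    obtained for every t at once with three FFTs instead of one sum per translation.
    """
    r_hat = np.fft.fftn(receptor)
    l_hat = np.fft.fftn(ligand)
    return np.fft.ifftn(r_hat * np.conj(l_hat)).real


def surface_core_receptor(occupied: np.ndarray, rho: float = -15.0) -> np.ndarray:
    """Receptor grid with surface cells +1 and interior cells rho (< 0).

    Assumes the grid is padded so that the molecule does not touch its edges.
    """
    occ = occupied.astype(bool)
    interior = occ.copy()
    for axis in range(occ.ndim):   # interior: all six face neighbours are occupied
        for shift in (1, -1):
            interior &= np.roll(occ, shift, axis=axis)
    grid = np.zeros(occ.shape)
    grid[occ & ~interior] = 1.0    # surface layer rewards contacts
    grid[interior] = rho           # core penalizes ligand penetration
    return grid


def best_translation(receptor: np.ndarray, ligand: np.ndarray):
    """Best-scoring translation (grid offsets) and its score for one ligand orientation."""
    scores = correlation_scores(receptor, ligand)
    t = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in t), float(scores[t])


def search(receptor: np.ndarray, rotated_ligands):
    """Keep only the best translation for each pre-rotated ligand grid."""
    return [best_translation(receptor, lig) for lig in rotated_ligands]
```

Because the FFT evaluates all translations at once, the cost per orientation is dominated by the transforms, O(N log N) for N grid cells, instead of O(N²) for explicit sums over every translation.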

Missing residues and atoms

Some of the complex structures considered are missing certain residues in the receptor and/or ligand. Although this is no problem for a grid-based method like ZDOCK, it must be resolved for treatment with OPEP. Missing residues lead to gaps in the backbone chain and, if untreated, they would be considered overstretched bonds. In order to resolve this problem, polypeptides with missing residues are treated as separate chains. The distance between the terminal carbon and the terminal nitrogen of the gap is kept fixed via a harmonic potential with the equilibrium distance equal to the initial gap length and a force constant of 100 kcal/(mol·Å²).
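The gap restraint can be written as a one-line energy term. The sketch below assumes the Amber-style harmonic form E = k(r − r0)², without a factor of 1/2, because the text does not state which convention OPEP uses; the function name and arguments are hypothetical. Here r0 is the C···N distance measured in the input structure, so the restraint is zero at the start and only penalizes changes of the gap length.

```python
import math

K_GAP = 100.0  # kcal/(mol·Å^2), force constant quoted in the text


def gap_restraint(c_xyz, n_xyz, r0, k=K_GAP):
    """Energy and force (on the N bead) of a harmonic restraint across a chain gap.

    Assumes E = k (r - r0)^2 (no factor 1/2). c_xyz is the terminal carbon before
    the gap, n_xyz the terminal nitrogen after it; coordinates in Å.
    """
    d = [n - c for n, c in zip(n_xyz, c_xyz)]
    r = math.sqrt(sum(x * x for x in d))
    if r == 0.0:
        raise ValueError("coincident beads: restraint direction undefined")
    energy = k * (r - r0) ** 2
    # F_N = -dE/dr * (unit vector from C to N); the C bead feels the opposite force.
    scale = -2.0 * k * (r - r0) / r
    return energy, [scale * x for x in d]
```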

OPEP

As rescoring function we use the coarse-grained potential OPEP or variations of it. OPEP uses a six-bead representation for every amino acid except proline and glycine. The amide nitrogen N, the Cα, and the carbonyl carbon C′ atoms of the backbone are each modelled by one bead. In addition, the amide hydrogen H and the carbonyl oxygen O are explicitly represented. Side chains are described by only one bead, except for proline, where all heavy side chain atoms are modelled. The local energy terms in OPEP were developed based on the functional form of the Amber force field [31], and several rounds of minor adaptations to the side chain–side chain interactions have been conducted [22]. We use the latest version of OPEP, OPEPv5 [32], which for the first time includes an explicit potential for salt bridges, parametrized with an iterative Boltzmann inversion method with parameters extracted from all-atom MD simulations. A complete description of the OPEP potential can be found in the original OPEP publications [22, 31–33]. Here, we only present the nonbonded interactions, as they are used to rescore the protein–protein complexes. The nonbonded potential consists of four terms: (1) van der Waals interactions involving backbone atoms ($E_{\text{VDW}}$), (2) hydrophobic and hydrophilic side chain–side chain interactions ($E_{\text{SS}}$), (3) hydrogen bond (H-bond) interactions between backbone atoms ($E_{\text{HB}}$), and (4) a potential for salt bridges ($E_{\text{SB}}$). Interactions between side chains, $E_{\text{SS}}$, are modelled differently for attractive and repulsive interactions [34]:

where $r_{ij}$ is the distance between interacting beads $i$ and $j$, the equilibrium distance $\sigma_{ij}$ is related to $r_{ij}^{0}$ via

$$ \sigma_{ij} \approx 1.0729\, r_{ij}^{0} - 0.3992, \tag{3} $$

$\varepsilon_{ij}$ is the interaction strength, and

$$ G\left(r_{ij}^{0}\right) = \left[-0.7\, \mathrm{e}^{2\left(r_{ij}^{0}-0.5\right)/5.0} \left(r_{ij}^{0}-0.5\right)\right]^{6}. \tag{4} $$

Figure 2a shows a matrix of the energies of the side chain–side chain interactions at the minimum distances $\sigma_{ij}$. Equation (1) replaces the common 12-6 Lennard-Jones potential in order to limit $E_{\text{SS}}$ at longer distances. Figure 2b shows an example of the form of the potential for the Phe/Phe interaction. For proline and glycine, the center of interaction is the Cα atom, while for all other side chains the interaction center is a bead representing the center of mass of the side chain [33]. The potential $E_{\text{SS}}$ is not used for salt bridges between side chains. Instead, salt bridges are modelled with a potential, $E_{\text{SB}}$, derived from all-atom MD simulations [32], in which the distance-dependent contact probability is translated into free energy profiles. These free energy profiles have one minimum for Arg/Asp and Arg/Glu pairs and two minima for Lys/Asp and Lys/Glu interactions. To describe backbone–backbone and backbone–side chain interactions, OPEP contains a van der Waals term, $E_{\text{VDW}}$, which is modelled via a 12-6 Lennard-Jones potential. H-bond interactions, $E_{\text{HB}}$, are modelled between the backbone N-H and the backbone C′-O groups. In addition, OPEP has special terms for stabilizing α-helices and β-sheets. The two-body term for H-bonds between residues in the same chain uses different equilibrium distances for residues at most four positions apart and for residues more than four positions apart. For stabilizing α-helices, the intra-chain potentials also contain a four-body H-bond term. Furthermore, 11 side chain–side chain interactions were identified to occur more frequently in (i, i + 3) and (i, i + 4) contacts in α-helices; these side chain–side chain interactions with this particular separation were therefore made more attractive [34].
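The conversion of a distance-dependent contact probability into a free energy profile can be illustrated by a direct Boltzmann inversion, F(r) = −k_B T ln g(r). This is only a sketch of the first step: an iterative Boltzmann inversion, as used for OPEPv5 [32], would repeatedly correct the potential until simulations with it reproduce the target distribution. The shell-volume normalization, temperature, and bin edges below are our assumptions, not the published parametrization.

```python
import numpy as np

KB_KCAL = 0.0019872  # Boltzmann constant, kcal/(mol·K)


def boltzmann_inversion(distances, bin_edges, temperature=300.0):
    """Direct Boltzmann inversion F(r) = -k_B T ln g(r) of sampled pair distances.

    g(r) is the histogram divided by the shell volume 4*pi*r^2*dr, so uniformly
    distributed pairs would give a flat profile. Empty bins yield +inf. The profile
    is shifted to zero at the largest-distance non-empty bin.
    """
    counts, edges = np.histogram(distances, bins=bin_edges)
    centers = 0.5 * (edges[1:] + edges[:-1])
    shell = 4.0 * np.pi * centers**2 * np.diff(edges)
    with np.errstate(divide="ignore"):
        free_energy = -KB_KCAL * temperature * np.log(counts / shell)
    finite = np.isfinite(free_energy)
    free_energy = free_energy - free_energy[finite][-1]
    return centers, free_energy
```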

The scoring function

Before rescoring the predictions, we perform an energy minimization using OPEPv5 to relax the complexes after their transformation from the grid representation to the coarse-grained model. We perform 140 minimization steps, as we found this to be the best compromise between computational efficiency and optimization result. We tested the effect of fewer and more minimization steps. Extending the minimization beyond 140 steps does not change the rescoring result, because for ∼90 % of the structures the energy changes only marginally at this point. Especially for misdocked complexes, the energy minimum may not have been reached within 140 minimization steps; however, there is no need to further optimize such misdocked decoys. Reducing the number of minimization steps below 140 bears the risk that near-native structures are not yet properly minimized either, which would lead to a poor ranking for them. We found that the scoring function becomes more reliable if we introduce a softer potential that allows for more overlap between the beads than the original OPEPv5 energy function. To obtain a softer scoring function, we replace both the side chain–side chain interaction potential $E_{\text{SS}}$ from Eq. (1) and the 12-6 Lennard-Jones potential $E_{\text{VDW}}$ with an 8-6 Lennard-Jones potential. This kind of soft potential is also used in the Attract force field, which was developed for protein–protein docking [35]. We call the new potentials $E_{\text{SS86}}$ and $E_{\text{VDW86}}$; the formula for $E_{\text{SS86}}$ is given as:
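The effect of a hard cap on the number of minimization steps can be illustrated with a generic L-BFGS routine. The sketch below is not the minimizer of [25] or OPEP's implementation: it uses the standard two-loop recursion with a simple backtracking (Armijo) line search, and the names `lbfgs_capped`, `memory`, and `gtol` are our own. Whether or not the gradient criterion is met, it stops after `max_steps` iterations and returns the current coordinates, mirroring the choice of 140 steps discussed above.

```python
import numpy as np


def lbfgs_capped(f_and_grad, x0, max_steps=140, memory=10, gtol=1e-6):
    """Minimize a function with L-BFGS, stopping after at most max_steps iterations.

    f_and_grad(x) must return (f(x), gradient of f at x). Returns (x, f(x), steps_taken).
    """
    x = np.asarray(x0, dtype=float).copy()
    f, g = f_and_grad(x)
    s_hist, y_hist = [], []
    steps = 0
    while steps < max_steps and np.linalg.norm(g) > gtol:
        # Two-loop recursion: apply the inverse-Hessian approximation to g.
        q = g.copy()
        coeffs = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest pair first
            rho = 1.0 / y.dot(s)
            alpha = rho * s.dot(q)
            q -= alpha * y
            coeffs.append((rho, alpha, s, y))
        gamma = s_hist[-1].dot(y_hist[-1]) / y_hist[-1].dot(y_hist[-1]) if s_hist else 1.0
        r = gamma * q
        for rho, alpha, s, y in reversed(coeffs):             # oldest pair first
            beta = rho * y.dot(r)
            r += (alpha - beta) * s
        d = -r
        if g.dot(d) >= 0.0:   # not a descent direction: fall back to steepest descent
            d = -g
            s_hist.clear()
            y_hist.clear()
        # Backtracking line search until the Armijo sufficient-decrease condition holds.
        t = 1.0
        while True:
            x_new = x + t * d
            f_new, g_new = f_and_grad(x_new)
            if f_new <= f + 1e-4 * t * g.dot(d) or t < 1e-12:
                break
            t *= 0.5
        s_new, y_new = x_new - x, g_new - g
        if y_new.dot(s_new) > 1e-12:  # store only pairs with positive curvature
            s_hist.append(s_new)
            y_hist.append(y_new)
            if len(s_hist) > memory:
                s_hist.pop(0)
                y_hist.pop(0)
        x, f, g = x_new, f_new, g_new
        steps += 1
    return x, f, steps
```

For a force field, `f_and_grad` would return the total energy and its gradient with respect to all bead coordinates flattened into one vector.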

Here, $\sigma'_{ij} = 0.866\,\sigma_{ij}$ and $\varepsilon'_{ij} = 9.481\,E_{\text{SS}}(\sigma_{ij})$, with $\sigma_{ij}$ given in Eq. (3). The values $\sigma'_{ij}$ and $\varepsilon'_{ij}$ are chosen such that the minimum energies at the equilibrium distances are identical for $E_{\text{SS}}$ and $E_{\text{SS86}}$. From Eq. (6), one can see that the repulsive-only potential is not modified. An example of the attractive $E_{\text{SS86}}$ term is shown in Fig. 2b for the Phe/Phe side chain interaction. As the 8-6 potentials $E_{\text{SS86}}$ and $E_{\text{VDW86}}$ have broader wells than their OPEPv5 counterparts, some overlap between beads is tolerated and, in addition, imperfectly fitted contacts are more strongly attractive at larger distances. The potentials for H-bonds and salt bridges were not modified, leading to our new scoring function, $E_{86}$, with the modified potentials $E_{\text{VDW86}}$ and $E_{\text{SS86}}$:
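The factors 0.866 and 9.481 follow from matching the minimum of an 8-6 potential to that of the original term. Since Eqs. (5) and (6) are not reproduced here, the sketch assumes the form E(r) = c[(σ′/r)⁸ − (σ′/r)⁶] with a positive prefactor c and a positive well depth; the sign convention of the published ε′ may differ. Under this assumption, both quoted factors are reproduced to the stated precision (√(3/4) ≈ 0.866 and 256/27 ≈ 9.481). A comparison with a 12-6 Lennard-Jones well of the same minimum shows the softer repulsion and longer-ranged attraction described above.

```python
import math


def lj86(r: float, sigma_p: float, c: float) -> float:
    """Assumed 8-6 form E(r) = c * ((sigma_p/r)**8 - (sigma_p/r)**6), with c > 0."""
    x = sigma_p / r
    return c * (x**8 - x**6)


def lj86_matching(sigma: float, well_depth: float):
    """Parameters giving an 8-6 well with its minimum at r = sigma and E = -well_depth.

    dE/dr = 0  ->  8 x^8 = 6 x^6  ->  x^2 = 3/4 at the minimum, so sigma' = sqrt(3/4) * sigma.
    E_min = c * ((3/4)**4 - (3/4)**3) = -27 c / 256, so c = (256/27) * well_depth.
    """
    return math.sqrt(0.75) * sigma, (256.0 / 27.0) * well_depth


def lj126(r: float, sigma: float, well_depth: float) -> float:
    """12-6 Lennard-Jones with minimum -well_depth at r = sigma, for comparison."""
    x = sigma / r
    return well_depth * (x**12 - 2.0 * x**6)


sigma, depth = 5.0, 1.0  # hypothetical equilibrium distance (Å) and well depth
sigma_p, c = lj86_matching(sigma, depth)
print(round(sigma_p / sigma, 3), round(c / depth, 3))  # prints 0.866 9.481
# The 8-6 value is lower (softer) at short range than the 12-6 value.
print(lj86(0.8 * sigma, sigma_p, c), lj126(0.8 * sigma, sigma, depth))
```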

$$ E_{86} = E_{\text{VDW86}} + E_{\text{SS86}} + E_{\text{HB}} + E_{\text{SB}}, \tag{7} $$

which calculates the binding energy between receptor and ligand for scoring purposes. Note that each binding partner can consist of several proteins (chains); all chains of one binding partner are treated as a single protein. Hence, we only consider non-bonded energies between the two binding partners, i.e., between receptor and ligand.
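Restricting the score to the receptor–ligand interface amounts to summing pair terms over the cross distance matrix only. In this sketch, `pair_energy` is a hypothetical single pair function standing in for the bead-type-dependent $E_{86}$ terms, and multi-chain partners are merged by stacking their bead coordinates, as described above.

```python
import numpy as np


def interface_energy(receptor_xyz: np.ndarray, ligand_xyz: np.ndarray, pair_energy) -> float:
    """Sum a pair potential over receptor-ligand bead pairs only.

    Pairs within the receptor or within the ligand are never formed, so the result is
    the receptor-ligand interaction energy. pair_energy maps an array of distances (Å)
    to energies; it is a hypothetical stand-in for the bead-type-specific E86 terms.
    """
    diff = receptor_xyz[:, None, :] - ligand_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (n_receptor_beads, n_ligand_beads)
    return float(np.sum(pair_energy(dist)))


# A binding partner made of several chains is treated as one body by stacking its beads.
chain_a = np.array([[0.0, 0.0, 0.0]])
chain_b = np.array([[0.0, 0.0, 2.0]])
receptor = np.vstack([chain_a, chain_b])
ligand = np.array([[0.0, 0.0, 4.0]])
e = interface_energy(receptor, ligand, lambda d: -1.0 / d)  # toy attractive pair term
```

The chain A–chain B pair (2 Å apart) does not contribute, because both chains belong to the same binding partner.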

Interface RMSD

The interface RMSD (IRMSD) is defined as the RMSD between the interface Cα atoms of the co-crystallized complex and the prediction after superposition. Interface Cα atoms are all Cα atoms within 10 Å of the binding partner in the co-crystallized complex [36]. For the superposition we use the corresponding function from Biopython [37].
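A NumPy sketch of the IRMSD calculation follows. It uses a Kabsch superposition in place of Biopython's superposition routine. It also defines interface atoms by Cα–Cα distances to the partner, which is an assumption because the text does not specify which partner atoms enter the 10 Å criterion. Native and model coordinates must list the same residues in the same order.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, target: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p = mobile - mobile.mean(axis=0)
    q = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return float(np.sqrt(np.mean(np.sum((p @ rot.T - q) ** 2, axis=1))))


def irmsd(nat_rec, nat_lig, mod_rec, mod_lig, cutoff=10.0):
    """Interface RMSD over Cα atoms; rows must correspond between native and model.

    Interface atoms: Cα atoms within `cutoff` Å of any Cα atom of the partner in the
    native complex (measuring to partner Cα atoms is an assumption of this sketch).
    """
    dist = np.linalg.norm(nat_rec[:, None, :] - nat_lig[None, :, :], axis=-1)
    rec_iface = dist.min(axis=1) < cutoff
    lig_iface = dist.min(axis=0) < cutoff
    native = np.vstack([nat_rec[rec_iface], nat_lig[lig_iface]])
    model = np.vstack([mod_rec[rec_iface], mod_lig[lig_iface]])
    return kabsch_rmsd(model, native)
```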

Definition of a hit

As is standard [38, 39], we define a hit as a docked conformation with an IRMSD lower than 4 Å.

Performance evaluation

The performance is evaluated by ranking the predictions according to their (re)scoring energy in increasing order. From this list, the rank of the best-ranked prediction with an IRMSD lower than 4 Å is reported. Furthermore, we calculate the success rate as a function of the number of predictions, $N_{\text{pred}}$, considered from the top of the sorted prediction list. It is averaged over the number of targets, $N_{\text{target}}$, according to the following equation:

$$ \text{success rate}(N_{\text{pred}}) = \frac{1}{N_{\text{target}}}\sum_{i=1}^{N_{\text{target}}} S_{i}(N_{\text{pred}}), \tag{8} $$

where $S_i(N_{\text{pred}}) = 1$ if the top $N_{\text{pred}}$ predictions ($N_{\text{pred}} = 1, 2, \ldots, 2000$) for target $i$ contain at least one hit, and $S_i(N_{\text{pred}}) = 0$ otherwise. Thus, the success rate corresponds to the probability of finding a near-native complex (a hit) among the first $N_{\text{pred}}$ models based on the (re)scoring energy.
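Eq. (8) and the best-hit rank reported in Table 1 can both be computed from the IRMSD values of the predictions sorted by energy. The following is a minimal sketch with hypothetical function names and toy values:

```python
from typing import List, Optional, Sequence

HIT_CUTOFF = 4.0  # Å


def sort_by_energy(energies: Sequence[float], irmsds: Sequence[float]) -> List[float]:
    """IRMSD values reordered by increasing (re)scoring energy, i.e. by rank."""
    return [irmsd for _, irmsd in sorted(zip(energies, irmsds), key=lambda p: p[0])]


def first_hit_rank(ranked_irmsds: Sequence[float]) -> Optional[int]:
    """1-based rank of the best-ranked hit, or None if the list contains no hit."""
    for rank, value in enumerate(ranked_irmsds, start=1):
        if value < HIT_CUTOFF:
            return rank
    return None


def success_rate(ranked_targets: Sequence[Sequence[float]], n_pred: int) -> float:
    """Eq. (8): fraction of targets with at least one hit among their top n_pred predictions."""
    successes = 0
    for ranked in ranked_targets:
        rank = first_hit_rank(ranked)
        if rank is not None and rank <= n_pred:  # S_i(N_pred) = 1
            successes += 1
    return successes / len(ranked_targets)


targets = [
    sort_by_energy([-3.0, -9.0, -1.0], [3.0, 5.0, 9.0]),  # ranked IRMSDs: 5.0, 3.0, 9.0
    [2.0, 8.0, 7.0],
    [6.0, 7.0, 8.0],
]
```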

Training the scoring function

After minimization, a residue–residue contact map between receptor and ligand is produced for each prediction. A contact is present if any beads of the two residues are closer than 8 Å. Depending on the ranking with $E_{86}$, the predictions for each complex can be classified into one of four groups: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). TPs have an IRMSD < 4 Å and a rank ≤ 20, while TNs have an IRMSD ≥ 4 Å and a rank > 20. All other predictions are either FNs (IRMSD < 4 Å, rank > 20) or FPs (IRMSD ≥ 4 Å, rank ≤ 20). We only consider the first N = 20 TPs or, if only N < 20 hits are found, those N hits, because ideally the correct predictions should be among the top-ranked ones. Twenty complexes is a small enough number for further processing by computationally more expensive approaches and visual inspection. For training purposes, we further limit the number of FNs and FPs to 20 − N. Thus, we do not consider FN and FP predictions for targets with ≥ 20 hits, as $E_{86}$ already produces satisfying results for such targets. For each TP, FN, FP, and TN prediction considered, we calculate the frequency map for residue–residue contacts and average the maps over all targets of the enzyme/inhibitor, antigen/antibody, and other complex classes. Next, we select residue–residue contacts whose frequency is higher in the TP and FN maps than in the FP and TN maps. We assume that these contacts need to be strengthened so that current FN predictions become TPs without further favoring FP predictions; we therefore decrease the energy value $E_{\text{SS86}}$ or $E_{\text{SB}}$ for these contacts. The other contacts whose potential we modify are those whose frequency is lower for TPs and FNs than for FPs and TNs. These contacts appear to be unimportant for the complex class in question and should thus be disfavored, with the aim of turning a current FP prediction into a TN prediction. Therefore, we increase $E_{\text{SS86}}$ or $E_{\text{SB}}$ for such contacts. Figure 1 illustrates the training procedure.
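The classification and contact-map steps can be sketched as follows. The residue identifiers and coordinates in the test are hypothetical. `adjustment_direction` compares pooled TP+FN and FP+TN contact frequencies, which is our reading of the selection rule rather than a documented implementation. The per-target limits on the numbers of TPs, FNs, and FPs are omitted for brevity.

```python
import numpy as np

HIT_CUTOFF = 4.0      # Å
TOP_RANK = 20         # ranks 1..20 count as predicted positives
CONTACT_CUTOFF = 8.0  # Å between any beads of two residues


def classify(rank: int, irmsd: float) -> str:
    """TP/FN/FP/TN label of one prediction given its E86 rank and IRMSD."""
    hit, top = irmsd < HIT_CUTOFF, rank <= TOP_RANK
    if hit:
        return "TP" if top else "FN"
    return "FP" if top else "TN"


def residue_contacts(receptor: dict, ligand: dict) -> set:
    """Receptor-ligand residue pairs with any bead pair closer than CONTACT_CUTOFF.

    receptor/ligand map a residue identifier to an (n_beads, 3) coordinate array.
    """
    contacts = set()
    for i, a in receptor.items():
        for j, b in ligand.items():
            if np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).min() < CONTACT_CUTOFF:
                contacts.add((i, j))
    return contacts


def adjustment_direction(freq_tp_fn: float, freq_fp_tn: float) -> int:
    """-1: lower the contact energy (strengthen), +1: raise it (weaken), 0: leave it."""
    if freq_tp_fn > freq_fp_tn:
        return -1
    if freq_tp_fn < freq_fp_tn:
        return 1
    return 0
```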

The amount of change for the selected interaction between residues $i$ and $j$ is determined by the ratio of the corresponding frequencies $\text{FN}_{ij}$ and $\text{FP}_{ij}$. A ratio greater than one means that this interaction energy has to be decreased, while a ratio smaller than one indicates that it has to be increased. We do this by changing the interaction potentials $E_{\text{SS86}}(i,j)$ and $E_{\text{SB}}(i,j)$ according to

$$ E^{\text{trained}}_{\mathrm{X}}(i,j) = E^{\text{old}}_{\mathrm{X}}(i,j) - k \ln\left(\frac{\text{FN}_{ij}}{\text{FP}_{ij}}\right), \tag{9} $$

where $E_{\mathrm{X}} = E_{\text{SB}}$ or $E_{\mathrm{X}} = E_{\text{SS86}}$, depending on the residue contact $(i,j)$. For the parameter $k$, values between 0.1 and 0.6 were tested, and $k = 0.2$ was found to be optimal. Equation (9) was applied iteratively, so we had to determine when to stop the training to obtain the best parametrization and avoid overfitting. To this end, we performed a 4-fold cross-validation on the enzyme/inhibitor training dataset, which gives meaningful set sizes for training and validation. This enzyme/inhibitor set contains 39 targets, of which 29 complexes were used for training, with the remaining 10 used for validation. For these 10 targets, we measured the quality with $\sum_{i=1}^{10} \ln(\text{rank}(\text{target}_{i}))$, where rank() returns the rank of the best-ranked hit. This function should decrease during training, while an increase is indicative of overfitting. We observe that overfitting becomes an issue after 30 iterations of Eq. (9). Therefore, we set the number of learning iterations to 30, yielding our new scoring function $E_{86}^{\text{trained}}$:
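One training update and the early-stopping criterion can be written compactly. The function names are hypothetical, and the optional `pseudocount`, which guards against ln 0 when a contact never occurs in one group, is our addition and not part of Eq. (9).

```python
import math
from typing import Sequence

K = 0.2  # value of k found optimal in the text


def train_step(e_old: float, fn_freq: float, fp_freq: float, k: float = K,
               pseudocount: float = 0.0) -> float:
    """One application of Eq. (9) to a single residue-residue contact energy.

    FN > FP lowers (strengthens) the energy, FP > FN raises it. With the default
    pseudocount of 0.0 this is exactly Eq. (9).
    """
    return e_old - k * math.log((fn_freq + pseudocount) / (fp_freq + pseudocount))


def validation_score(best_hit_ranks: Sequence[int]) -> float:
    """Sum of ln(rank of best hit) over the validation targets; lower is better."""
    return sum(math.log(r) for r in best_hit_ranks)


def stopping_iteration(scores_per_iteration: Sequence[float]) -> int:
    """Iteration with the lowest validation score; later increases indicate overfitting."""
    return min(range(len(scores_per_iteration)), key=lambda i: scores_per_iteration[i])
```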

$$ E_{86}^{\text{trained}} = E_{\text{VDW86}} + E_{\text{SS86}}^{\text{trained}} + E_{\text{HB}} + E_{\text{SB}}^{\text{trained}}. \tag{10} $$

Results

Overall performance

The ranks of the first hit using ZDOCK and after rescoring are shown in Table 1. The ZDOCK column gives the results for ZDOCK 3.02. The $E_{86}^{\text{initial}}$ column shows the rank after rescoring with Eq. (7) before energy minimization with the OPEP potential, while the $E_{86}$ column reports the rank after minimization. Column five reports the rank of the first hit when using all intra- and interprotein contributions of the original OPEPv5 potential [32], while column six shows the rank of the first hit when the predictions are ranked by the OPEPv5 energy considering only the non-bonded energies between receptor and ligand beads. These rescoring energies are denoted by $E_{\text{OPEP}}$ and $E^{\text{int}}_{\text{OPEP}}$ in the following. Figure 3 shows the success rate, as defined in Eq. (8), for the different complex classes. In general, ZDOCK and $E_{86}$ perform better than $E_{\text{OPEP}}$ and $E^{\text{int}}_{\text{OPEP}}$, and their performance is about equal when all complex classes are considered together (Fig. 3a). However, there are differences between the three complex classes.

Table 1 Best rank for (re)scoring with ZDOCK, $E_{86}^{\text {initial}}$, E ₈₆, E _OPEP, and $E_{\text {OPEP}}^{\text {int}}$ for complexes from the ZDOCK benchmark 4.0. $\varnothing $ indicates the average rank for the complex class in question


Enzyme/inhibitor

For enzyme/inhibitor complexes, $E_{86}$ finds at least as many hits as ZDOCK if more than four predictions are considered, i.e., $N_{\text{pred}} \geq 5$ (Fig. 3b). When more than 50 predictions are considered, $E_{86}$ becomes substantially better than ZDOCK. Table 1 shows that we can improve or maintain the rank using $E_{86}$ for 25 out of 39 enzyme/inhibitor targets. For 1AVX, the rank is only slightly worse, increasing from 1 with ZDOCK to 3 with $E_{86}$. Comparing the performance of $E_{86}$ to that of $E^{\text{int}}_{\text{OPEP}}$, it becomes evident that 140 minimization steps are not always sufficient to place every side chain at the minimum of its well, because the rank with $E^{\text{int}}_{\text{OPEP}}$ is considerably higher than with $E_{86}$. Rescoring with the softer potential is thus necessary. When $E_{\text{OPEP}}$ is used for ranking, the ranks of only 16 targets are kept or improved. The average rank shows that $E_{86}$ is generally better than ZDOCK, while $E^{\text{int}}_{\text{OPEP}}$ produces a ranking similar to that of ZDOCK, and $E_{\text{OPEP}}$ performs worst.

Antigen/antibody

For antigen/antibody complexes, rescoring with $E_{86}$ was least successful. For $N_{\text{pred}} \lesssim 500$, the success rate of $E_{86}$ is clearly lower than that of ZDOCK (Fig. 3c). Out of 19 antigen/antibody complexes, $E_{86}$ improves the rank for only six targets and worsens it for the other 13. $E_{\text{OPEP}}$ improves the ranking of only six complexes, while $E^{\text{int}}_{\text{OPEP}}$ improves the rank of only one complex. The average rank shows that ZDOCK performs considerably better than any of the OPEP-based rescoring approaches. However, ZDOCK is not a perfect scoring function for antigen/antibody complexes either, as revealed by comparing its average ranks for these complexes with those for enzyme/inhibitor complexes.

Other complexes

For other complexes, the success rate is always higher for rescoring with $E_{86}$ than for scoring with ZDOCK, independent of the number of predictions considered (Fig. 3d). $E_{86}$ improves or maintains the rank of 21 complexes and worsens it for the other 17; however, for 1ML0 the rank only changes from 1 to 2, and for 1RV6 from 1 to 4. While $E_{\text{OPEP}}$ improves the rank of 20 targets and worsens the rank of 18 targets, the improvements mostly occur for higher ranks, and only four targets have a hit at rank 1, compared with eight for $E_{86}$. $E^{\text{int}}_{\text{OPEP}}$ improves the rank of only 15 targets and worsens it for the other 23. On average, for other complexes, rescoring with $E_{86}$ performs best, $E^{\text{int}}_{\text{OPEP}}$ is least suited for this task, and $E_{\text{OPEP}}$ yields a ranking similar to that of ZDOCK. The strikingly different performance of $E_{86}$ and $E^{\text{int}}_{\text{OPEP}}$ suggests that optimal shape complementarity, implying favourable residue–residue interactions, is very important for protein binding in this complex category.

Structural changes upon energy minimization

We tested whether the structures of the complexes are affected by energy minimization with the OPEP potential. To this end, the secondary structures of the complexes are determined before and after energy minimization using STRIDE [40]. Since we use crystal structures of the unbound receptor and ligand as input, all 2000 ZDOCK predictions per target have the same secondary structures before minimization, while the secondary structures can change during minimization with the OPEP potential. However, we find that the changes in secondary structure are generally small (<5 %). The near-native structures with IRMSD <3 Å are the least affected by energy minimization, indicating that correct binding helps stabilize the complex structure. Still, the overall changes in secondary structure are small and do not follow a clear pattern, which prevents us from deriving a general dependence between IRMSD and secondary structure.

We further tested whether the IRMSD is affected by minimization with OPEP and found that it changes only slightly. Figure 4 shows the average change in IRMSD as a function of the initial IRMSD obtained from ZDOCK. For most predictions, the IRMSD slightly increases upon minimization, with the average IRMSD change fluctuating around 0.1 Å. For some predictions, the IRMSD also decreases: this is the case for 4.3 % of the predictions with IRMSD <4 Å before minimization and for 8.7 % of all predictions. The predominant IRMSD increase for near-native predictions is likely an effect of the tight packing at the binding site, which leads to more bead clashes after transformation from the grid to the coarse-grained representation, causing the atoms or beads to reorient during minimization. Nonetheless, the structures stay close to the conformations predicted by ZDOCK, as Fig. 4 shows. Only for severely misdocked complexes (IRMSD $\gtrsim 35$ Å) does the IRMSD change increase to around 0.2 Å.

Comparison of columns three and four of Table 1 reveals the effect of minimizing the energy before rescoring with $E_{86}$. Column three reports the best rank without energy minimization, which we denote as $E_{86}^{\text {initial}}$. For the comparison we concentrate on the complexes for which $E_{86}$, $E_{86}^{\text {initial}}$, or both predict a best rank ≤10, because in the Critical Assessment of PRedicted Interactions (CAPRI) experiment [41] only 10 predictions can be submitted per target. Thus, the aim is to rank the decoys closest to the native structure within the top 10. For enzyme/inhibitor complexes, energy minimization is most successful: $E_{86}$ places a hit among the top 10 predictions for more than 38 % of them (see the success rate for $N_{\text{pred}}=10$ in Fig. 3b). By contrast, $E_{86}^{\text {initial}}$ achieves a top-10 best rank for only four of these 15 complexes (namely 1CLV, 1JTG, 1PPE and 4CPA), and no enzyme/inhibitor complex loses a top-10 hit found by $E_{86}^{\text {initial}}$ upon energy minimization. In two cases (1F34 and 1UDI), energy minimization improves the rank by more than 400 places, moving the best hit to second place in the rank list. A similar picture emerges for the other complexes: $E_{86}$ finds a top-10 best rank for more than 34 % of them (see the success rate for $N_{\text{pred}}=10$ in Fig. 3d), whereas $E_{86}^{\text {initial}}$ achieves a top-10 rank for only three complexes. For one of these three (1WDW), the rank worsens from 3 to 23 upon energy minimization, while the other two are also ranked in the top 10 by $E_{86}$. Only for antigen/antibody complexes does prior energy minimization offer no advantage over applying the rescoring function directly: $E_{86}$ and $E_{86}^{\text {initial}}$ find a top-10 hit for 2 and 3, respectively, of the 19 complexes. For two complexes (1E6J and 1FSK), the top-10 rank is lost after energy minimization, while for 1IQD the best rank improves by 40 places, to first place with $E_{86}$. It should be noted, however, that the average rank for $E_{86}^{\text {initial}}$ is considerably lower (i.e., better) than for both ZDOCK and $E_{86}$. Thus, energy minimization of antigen/antibody complexes is not strictly necessary. Apart from saving computing time, however, omitting this step would not considerably increase our chances of identifying the correct prediction, because the higher average rank for $E_{86}$ originates mainly from further deterioration of ranks that were already poor with $E_{86}^{\text {initial}}$ (e.g., complexes 1AHW, 1K4C and 2VIS). A general improvement of the $E_{86}$ scoring function for antigen/antibody complexes would be more important.
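The best-rank and success-rate bookkeeping used above can be sketched as follows. This assumes that lower energies rank higher and, hypothetically, that a prediction counts as a hit when its IRMSD is below 4 Å; targets without any hit are counted as failures, which is also an assumption. The helper names are illustrative. The toy ranks at the end show why the average best rank can grow while the number of top-10 successes stays the same: the mean is dominated by targets that were already ranked poorly.

```python
from statistics import mean


def best_rank(energies, irmsds, hit_cutoff=4.0):
    """Rank of the best-ranked hit for one target (rank 1 = lowest energy).

    A prediction counts as a hit if its IRMSD (Å) is below hit_cutoff
    (an assumed threshold). Returns None if the target has no hit at all.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    for rank, i in enumerate(order, start=1):
        if irmsds[i] < hit_cutoff:
            return rank
    return None


def success_rate(best_ranks, n_pred=10):
    """Fraction of targets whose best-ranked hit lies within the top n_pred."""
    return sum(r is not None and r <= n_pred for r in best_ranks) / len(best_ranks)


def mean_best_rank(best_ranks):
    """Average best rank over the targets that have at least one hit."""
    return mean(r for r in best_ranks if r is not None)


# One hypothetical target: the lowest-energy prediction is a miss, the next one a hit.
print(best_rank([-50.0, -80.0, -65.0], [2.1, 15.0, 3.0]))  # 2

# Hypothetical best ranks for five targets (illustrative only, not from Table 1).
initial = [3, 7, 120, 450, 800]
minimized = [2, 9, 180, 900, 1500]
print(success_rate(initial), success_rate(minimized))  # 0.4 0.4
print(mean_best_rank(initial), mean_best_rank(minimized))  # 276 518.2
```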

Energy contributions to the protein-protein interactions

Figure 5 shows the contributions $E_{\text{SS86}}$, $E_{\text{SB}}$, and $E_{\text{HB}}$ to the $E_{86}$ energy for the three complex classes, averaged over predictions binned by IRMSD with a bin size of 1 Å. For the enzyme/inhibitor complexes, $E_{\text{SS86}}$ has a minimum for predictions with IRMSD up to 5 Å. However, for IRMSD values above 25 Å, $E_{\text{SS86}}$ becomes small again, in some cases even smaller than for the hits. This is more than counterbalanced by the H-bond energy: only the near-native hits form more and better-oriented H-bonds, yielding $E_{\text{HB}}$ values more than 10 kcal/mol lower than those of all other predictions. Salt bridges seem to be of minor importance for protein binding in enzyme/inhibitor complexes, as there is no correlation between $E_{\text{SB}}$ and IRMSD, and the contribution of $E_{\text{SB}}$ to $E_{86}$ is generally small, with all values fluctuating around −5 kcal/mol. Thus, the sum of $E_{\text{SS86}}$ and $E_{\text{HB}}$ is mainly responsible for distinguishing between correct and incorrect complex predictions. This partly agrees with previous findings that protease–inhibitor complexes interact predominantly through main chain–main chain interactions [42], which are represented by H-bonds in the $E_{86}$ function.
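A minimal sketch of the binning behind Fig. 5 is shown below. It assumes that per-prediction values of each energy term are already available; the function name `bin_energy_terms` and the dictionary layout are hypothetical, and the numbers in the usage lines are placeholders.

```python
from collections import defaultdict
from statistics import mean


def bin_energy_terms(irmsds, terms, bin_width=1.0):
    """Average each energy term over IRMSD bins of width bin_width (Å).

    irmsds: IRMSD of each prediction.
    terms: dict mapping a term name (e.g. "E_HB") to per-prediction values,
    in the same order as irmsds.
    Returns {term name: {lower bin edge: mean value}}.
    """
    binned = {name: defaultdict(list) for name in terms}
    for i, r in enumerate(irmsds):
        # Lower edge of the bin this prediction falls into.
        edge = int(r // bin_width) * bin_width
        for name, values in terms.items():
            binned[name][edge].append(values[i])
    return {name: {edge: mean(v) for edge, v in sorted(bins.items())}
            for name, bins in binned.items()}


# Hypothetical energies (kcal/mol) for three predictions, not values from Fig. 5.
profiles = bin_energy_terms(
    irmsds=[0.5, 0.8, 12.3],
    terms={"E_HB": [-12.0, -14.0, -2.0], "E_SB": [-5.0, -4.0, -6.0]},
)
print(profiles["E_HB"])  # {0.0: -13.0, 12.0: -2.0}
```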

For antigen/antibody complexes, none of the three energy contributions clearly decreases with decreasing IRMSD. Instead, both $E_{\text{SS86}}$ and $E_{\text{HB}}$ reach their lowest values at IRMSD ≈20 Å, which explains why $E_{86}$ performs poorly for this complex class. Compared to enzyme/inhibitor complexes, backbone H-bonds are less important for the native complex. This agrees with the previous observation that antigen/antibody complexes predominantly bind through side chain–side chain or side chain–main chain interactions [42], which are represented by other contributions to $E_{86}$ but not by $E_{\text{HB}}$. Salt-bridge formation is also of minor importance for antigen/antibody complexes, with one exception at IRMSD ≈34 Å, where $E_{\text{SB}} \approx -13$ kcal/mol is the lowest salt-bridge energy observed across all three complex classes.
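One way to make this comparison explicit is to locate, for each term, the IRMSD bin in which its average is lowest and to check whether that bin lies in the near-native range. The sketch assumes binned profiles of the form {lower bin edge in Å: mean energy in kcal/mol}; the function name `locate_minima`, the 4 Å cutoff, and the profile values are placeholders, not numbers read from Fig. 5.

```python
def locate_minima(profiles, near_native_cutoff=4.0):
    """For each binned energy profile, find the IRMSD bin with the lowest average.

    profiles: {term name: {lower bin edge in Å: mean energy in kcal/mol}}.
    Returns {term name: (bin edge, energy, whether the bin is near-native)}.
    """
    result = {}
    for name, profile in profiles.items():
        edge, value = min(profile.items(), key=lambda kv: kv[1])
        result[name] = (edge, value, edge < near_native_cutoff)
    return result


# Placeholder profiles mimicking the qualitative shape described in the text.
minima = locate_minima({
    "E_SS86": {2.0: -30.0, 20.0: -34.0, 30.0: -25.0},
    "E_HB": {2.0: -8.0, 20.0: -9.5, 30.0: -7.0},
})
print(minima["E_SS86"])  # (20.0, -34.0, False)
```

A term whose minimum falls outside the near-native range, as in this placeholder example, cannot by itself rank hits ahead of misdocked predictions.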

The hits for other complexes are stabilized by side chain–side chain interactions, as the lowest $E_{\text{SS86}}$ values are found for predictions with IRMSD <4 Å. H-bonds seem to be of minor importance for binding receptor and ligand in this category, as all $E_{\text{HB}}$ values are >−1 kcal/mol, an order of magnitude higher than those in enzyme/inhibitor and antigen/antibody complexes. On the other hand, other complexes are the only class in which salt bridges contribute to complex stability, as $E_{\text{SB}}$ increases with IRMSD for IRMSD >5 Å. This trend breaks only for IRMSD ≤5 Å, where $E_{\text{SB}}$ does not decrease further for the near-native predictions. This suggests that either $E_{\text{SS86}}$ dominates these binding modes or the $E_{86}$ potential can be further improved in this range.
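The trend described above is read qualitatively from Fig. 5. As a swapped-in quantitative check that is not part of the original analysis, one could fit an ordinary least-squares slope to the binned $E_{\text{SB}}$ profile over bins at or beyond 5 Å; a positive slope means the salt-bridge energy becomes less favorable as predictions move away from the native structure. The function name and the profile values are hypothetical.

```python
def slope_beyond(profile, min_irmsd=5.0):
    """Least-squares slope (kcal/mol per Å) of a binned energy profile,
    using only bins whose lower edge is at least min_irmsd (Å)."""
    pts = [(x, y) for x, y in profile.items() if x >= min_irmsd]
    if len(pts) < 2:
        raise ValueError("need at least two bins at or beyond min_irmsd")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    # Bin edges are distinct dict keys, so sxx > 0 whenever n >= 2.
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


# Placeholder E_SB profile; a positive slope means salt bridges weaken with IRMSD.
e_sb = {2.0: -6.0, 6.0: -8.0, 10.0: -6.0, 14.0: -4.0}
print(slope_beyond(e_sb))  # 0.5
```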

Improving the rescoring function

Next we tested whether the performance of $E_{86}$ can be enhanced by training it according to Eq. (9), yielding the new rescoring function $E_{86}^{\text {trained}}$ defined in Eq. (10). As the energy analysis revealed that complex formation in the three categories is driven by different interactions, we optimized $E_{86}$ separately for enzyme/inhibitor, antigen/antibody, and other complexes. The resulting $E_{86}^{\text {trained}}$ assigns new energies at the optimal distances between the side chains at the binding sites, which can be represented as a matrix. Subtracting the new energy matrix from the original potential energy matrix shown in Fig. 2a gives, for each complex category, a matrix representing the change in interaction energies. These matrices are shown in Fig. 6.
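The construction of the difference matrices in Fig. 6 amounts to an element-wise subtraction, sketched below. This assumes each interaction-energy matrix is stored as a nested list indexed by side-chain type in the same order; the function name `energy_change_matrix` and the 2×2 values are hypothetical placeholders, not OPEP parameters. With this sign convention (original minus trained), a positive entry means the trained interaction is more attractive than the original one.

```python
def energy_change_matrix(original, trained):
    """Element-wise difference (original - trained) of two interaction-energy
    matrices given as nested lists indexed by side-chain type in the same order."""
    if len(original) != len(trained) or any(
        len(row_o) != len(row_t) for row_o, row_t in zip(original, trained)
    ):
        raise ValueError("matrices must have the same shape")
    return [[o - t for o, t in zip(row_o, row_t)]
            for row_o, row_t in zip(original, trained)]


# Hypothetical 2x2 placeholder values (kcal/mol), not the OPEP parameters.
original = [[-1.0, -0.5], [-0.5, -2.0]]
trained_by_class = {"enzyme/inhibitor": [[-1.5, -0.5], [-0.5, -1.0]]}
changes = {cls: energy_change_matrix(original, m) for cls, m in trained_by_class.items()}
print(changes["enzyme/inhibitor"])  # [[0.5, 0.0], [0.0, -1.0]]
```

One matrix per complex category results, since the training was carried out separately for each class.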