Science Blog of Dr. Yuan: this blog is mainly on molecular modelling and simulations


Do we really need expensive calculations for binding dG?

2633 reads | 2015-7-22 02:32 | Category: Research Notes

Predicting ligand activity against a specific target has always been a hot topic in computational structural biology and drug discovery. Many techniques have been developed or are being improved for this purpose: from scoring-function optimization to QSAR models; from MM/GBSA to MM/PBSA; from potential of mean force (PMF) to linear interaction energy (LIE); from thermodynamic integration (TI) to free energy perturbation (FEP). More recently, several tools for relative binding free energy calculations based on alchemical methods have been developed, including:

  • Schrodinger FEP Mapper

  • Amber free energy workflow (FEW)

  • Gromacs PMX

  • FESetup

  • Lead Optimization Mapper (LOMAP)

The basic principle of these tools is the same: build a closed free energy cycle that connects the ligands through alchemical mutations.

The free energy changes summed around such a closed ligand-mutation cycle should be zero.
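As a rough illustration of the cycle-closure idea, here is a minimal Python sketch. The three ligands and the relative free energies are hypothetical values chosen for illustration; they are not taken from our calculations or from any of the tools listed above.

```python
# Hypothetical relative binding free energies (kcal/mol) for a three-ligand cycle.
# Keys are (from_ligand, to_ligand); the numbers are illustrative only.
ddG = {
    ("A", "B"): -1.2,
    ("B", "C"): 0.7,
    ("C", "A"): 0.4,
}


def cycle_closure(cycle, ddG):
    """Sum the relative free energies along a closed path of ligands."""
    total = 0.0
    for start, end in zip(cycle, cycle[1:] + cycle[:1]):
        if (start, end) in ddG:
            total += ddG[(start, end)]
        else:
            # Traversing an edge in the reverse direction flips the sign.
            total -= ddG[(end, start)]
    return total


closure = cycle_closure(["A", "B", "C"], ddG)
print(f"cycle closure error: {closure:+.2f} kcal/mol")
```

In an ideal calculation the sum is exactly zero; in practice the deviation from zero (the closure error) gives a rough consistency check on the individual mutations.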


This principle sounds good, and many developers hope it will improve the accuracy of ligand binding energy predictions. The "Mapper" concept was therefore introduced into both TI (Amber FEW) and FEP (e.g. Schrodinger FEP Mapper) calculations.


Mapper network generated by "LOMAP"

However, this new concept has not yet been validated by a large number of users, and its reliability is still uncertain. We recently tested this technology systematically on a specific target. For comparison, we also ran the widely used MM/PBSA calculation.


Result 1

A 10-compound test set on a 4-GPU workstation

                            Mapper-based methods    MM/PBSA
  • Efficiency:             5 ns/day                50 ns/day
  • Productivity:           0.6 compounds/day       20 compounds/day
  • Total time for test:    18 days                 2 days

Figure legend | Upper: Mapper-based prediction; Lower: MM/PBSA calculation

As the plot above shows, although the Mapper-based free energy calculation consumed far more CPU/GPU resources and took much longer to finish, the results did not improve at all: there is almost no correlation between the experimental data and the Mapper predictions. By contrast, the inexpensive MM/PBSA calculation finished much faster, and its accuracy on this test set is quite good. Notably, the original Amber Mapper paper, published in Journal of Computational Chemistry 2013, 34, 965-973, also reported a poor correlation between Mapper-based free energy calculations and experimental data (R² = 0.26-0.28 in its tests). There is clearly a lot of room for improvement in this concept.

In addition, we ran two more tests with different classes of compounds for the same target. MM/PBSA achieved R² = 0.65-0.85, whereas the expensive Mapper-based free energy calculation reached only R² = 0.1-0.2 on the same test sets.
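For readers who want to run the same kind of comparison on their own data, the sketch below computes R² as the squared Pearson correlation between predicted and experimental values. This definition is an assumption on our side for the sketch, and the numbers are made up for illustration; they are not the data behind the plots in this post.

```python
def r_squared(predicted, experimental):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov * cov / (var_p * var_e)


# Made-up example values (kcal/mol), for illustration only.
experimental_dG = [-9.1, -8.4, -7.9, -7.2, -6.8]
predicted_dG = [-30.2, -27.5, -28.1, -24.0, -22.3]

print(f"R^2 = {r_squared(predicted_dG, experimental_dG):.2f}")
```

Because the Pearson correlation is insensitive to scale and offset, MM/PBSA energies can rank compounds well (high R²) even when their absolute values are far from the experimental binding free energies.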

Result 2

Ten compounds is a very small test set, and we may simply have been lucky to obtain a good correlation between the MM/PBSA predictions and the experimental data. To further validate the reliability of MM/PBSA, we increased the test set to 46 compounds. Here are the results:

As the plot above shows, the correlation dropped markedly as the sample size increased: from R² = 0.72 to 0.34. We would also have liked to run the Mapper-based prediction on the larger set, but 46 compounds would take us about 80 days! Moreover, since correlations on large sets are usually much poorer than on small ones, the Mapper-based results are unlikely to be better than those for the small test set shown in Result 1. It is also extremely difficult to generate a Mapper that connects 46 compounds in one big cycle, so a Mapper-based free energy calculation on a set of this size is infeasible in practice. Even if it could be done, the closure errors of the inner cycles would probably not be minimized, which could make the predictions even worse.
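The time estimate follows directly from the Mapper productivity quoted earlier in this post (0.6 compounds/day). A back-of-the-envelope sketch, assuming the throughput stays constant as the set grows:

```python
def days_needed(n_compounds, compounds_per_day):
    """Wall-clock days to process a set at a constant throughput."""
    return n_compounds / compounds_per_day


mapper_rate = 0.6  # compounds/day, the Mapper productivity quoted in this post

for n in (10, 46):
    print(f"{n} compounds: about {days_needed(n, mapper_rate):.0f} days")
```

This gives roughly 17 days for 10 compounds and roughly 77 days for 46 compounds, in line with the 18 days we measured and the 80 days we estimated. The real cost is likely higher, because a larger Mapper usually also needs more mutation edges per compound.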

Result 3

Can we somehow improve the MM/PBSA results, especially for large sets? The answer is "YES"! We decomposed the MM/PBSA result into its individual energy terms and introduced a QSAR concept: 25 compounds served as the "training set" and the rest as the "test set". Using multiple linear regression, we obtained fantastic results:

As we can see, the R² of the prediction for the "test set" increased from 0.34 to 0.60, while the training-set and overall R² are 0.94 and 0.78, respectively. That is a really amazing improvement!
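The workflow can be sketched in plain Python as follows. This is only an illustration of the approach, not the script we used: the four descriptor names (van der Waals, electrostatic, polar and non-polar solvation) are an assumption about which MM/PBSA terms go into the regression, and the data are synthetic random numbers, so the printed R² values say nothing about real compounds.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_mlr(features, targets):
    """Least-squares fit of target = w0 + sum(w_i * x_i) via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(weights, features):
    """Apply the fitted intercept and coefficients to each feature vector."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], f)) for f in features]


def r_squared(xs, ys):
    """Squared Pearson correlation between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Synthetic stand-in data: 46 "compounds" with four hypothetical MM/PBSA terms
# (vdW, electrostatic, polar solvation, non-polar solvation) in kcal/mol.
rng = random.Random(0)
hidden_weights = [-2.0, 0.10, 0.02, 0.03, 0.50]  # arbitrary, for data generation only
features, targets = [], []
for _ in range(46):
    terms = [rng.uniform(-60, -30), rng.uniform(-40, -5),
             rng.uniform(10, 50), rng.uniform(-6, -2)]
    noise = rng.gauss(0, 0.5)
    features.append(terms)
    targets.append(hidden_weights[0]
                   + sum(w * v for w, v in zip(hidden_weights[1:], terms)) + noise)

# 25 compounds for training, the remaining 21 for testing.
train_x, test_x = features[:25], features[25:]
train_y, test_y = targets[:25], targets[25:]

weights = fit_mlr(train_x, train_y)
print(f"training R^2: {r_squared(predict(weights, train_x), train_y):.2f}")
print(f"test R^2:     {r_squared(predict(weights, test_x), test_y):.2f}")
```

The key design point is that the coefficients are fitted on the training set only, so the test-set R² is an honest estimate of how well the reweighted MM/PBSA terms transfer to unseen compounds.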

Interestingly, when we ran QSAR alone (without MM/PBSA) using two popular commercial software packages, both failed: the training-set R² was 0.3-0.5, but the test-set R² for both packages was only 0.1 or no correlation at all, no matter how we optimized the parameters!


Although Mapper-based alchemical free energy calculations sound quite popular and are advertised everywhere these days, in our tests they did not actually improve ligand binding free energy predictions. By contrast, the inexpensive MM/PBSA can work very well, at least for certain targets, though a benchmark should be run for validation. Combining MM/PBSA with QSAR can improve the prediction accuracy dramatically.
