Report on the outcomes of a Short-Term Scientific Mission

Action number: CA21101

Applicant name: Heribert Reis

Details of the STSM

Title: Ultra-large docking virtual screening

Start and end date: 18/08/2023 to 18/10/2023

Description of the work carried out during the STSM

The goal of my STSM was to learn the tools and techniques of large-scale docking, namely DOCK [1] and the ZINC database, in order to develop a computational workflow for discovering novel ligands by screening virtual compound libraries containing millions of molecules. Specifically, I applied the software developed at the Host Institution [1] to the olfactory receptor OR51E1 as a case study. The structure of OR51E2 was recently released [2] and provides a template for modeling OR51E1.
The work was organized into the following tasks:

1. Literature and practice with DOCK
During the first couple of weeks, I familiarized myself with DOCK and the other tools developed in Prof. Shoichet's lab by following tutorials and reading the literature. I also took an active part in a project on the GABA transporter type 1 (GAT1), which allowed me to apply docking to a real case study before starting work on OR51E1.
2. OR51E1 ligand database curation and control set
I collected all available information on experimentally tested active and inactive compounds of OR51E1 [3,4]. We further augmented this dataset with active and inactive compounds for OR51E1 from a previous high-throughput screening (HTS) conducted by Professor Vladlen Z. Slepak (unpublished data). Of the 154 compounds collected, only 24 had a reported significant EC50, and from these we selected 18 active compounds. Since almost all inactive molecules from the literature were chemically similar and carried fatty-acid-like moieties, we used DUDE-Z to obtain 900 decoy molecules (50 per active ligand). Decoys closely match the physicochemical properties of the known ligands but differ in their 2D topology, which makes them unlikely to bind.

Figure 1. Chemical space of active molecules (green dots), inactive molecules from the HTS (grey dots), and property-matched decoys (orange dots).
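The decoy idea described above (same physical properties, different topology) can be illustrated with a small sketch. DUDE-Z's actual procedure is not described in this report, so the matched properties (molecular weight, logP, net charge), the tolerances, and the bit-set fingerprint below are illustrative assumptions, not the DUDE-Z implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mol:
    """Minimal molecule record; the property names are illustrative assumptions."""
    name: str
    mw: float        # molecular weight (Da)
    logp: float      # calculated logP
    charge: int      # net formal charge
    fp: frozenset    # 2D fingerprint as a set of "on" bit indices


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two bit-set fingerprints (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_decoys(active, candidates, n=50, mw_tol=25.0, logp_tol=1.0, max_tc=0.35):
    """Pick up to n decoys that resemble `active` physically but not topologically."""
    matched = [
        c for c in candidates
        if abs(c.mw - active.mw) <= mw_tol            # similar size
        and abs(c.logp - active.logp) <= logp_tol     # similar lipophilicity
        and c.charge == active.charge                 # same net charge
        and tanimoto(c.fp, active.fp) <= max_tc       # dissimilar 2D topology
    ]
    # Among qualifying candidates, prefer the closest property match.
    matched.sort(key=lambda c: abs(c.mw - active.mw) + abs(c.logp - active.logp))
    return matched[:n]


# Toy example: only d1 matches the properties while being topologically dissimilar.
active = Mol("act", 200.0, 2.0, -1, frozenset({1, 2, 3, 4}))
candidates = [
    Mol("d1", 210.0, 2.2, -1, frozenset({5, 6, 7})),
    Mol("d2", 205.0, 1.9, 0, frozenset({8})),            # wrong net charge
    Mol("d3", 300.0, 2.0, -1, frozenset({9})),           # too heavy
    Mol("d4", 195.0, 2.5, -1, frozenset({1, 2, 3, 4})),  # same topology as the active
]
print([d.name for d in select_decoys(active, candidates)])  # ['d1']
```

With `n=50` per active and 18 actives, this scheme yields at most 18 × 50 = 900 decoys, the number used above.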

3. OR51E1 structural modelling protocol
The model selection protocol consists of three main steps: i) homology modeling, ii) retrospective docking, and iii) model selection. We first generated 1000 conformations of OR51E1 with MODELLER [5], using OR51E2 (PDB ID: 8F76) as the template. The two receptors share 56% sequence identity overall and 85% within the binding site. Multiple conformations were generated to sample the conformational space of the binding site (Figure 2).
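The overall and binding-site identities quoted above are both percent identities computed over an alignment, the second restricted to binding-site positions. The report does not state the exact convention or the binding-site residue list, so the sketch below assumes one common convention (identical residues divided by ungapped columns) and uses a toy alignment, not the real OR51E1/OR51E2 alignment.

```python
def percent_identity(aln_a: str, aln_b: str, columns=None) -> float:
    """Percent identity of two aligned sequences (same length, '-' = gap).

    Assumed convention: identical residues divided by columns where neither
    sequence has a gap. `columns` optionally restricts the count to a subset
    of alignment columns, e.g. hypothetical binding-site positions.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    idx = range(len(aln_a)) if columns is None else columns
    aligned = identical = 0
    for i in idx:
        a, b = aln_a[i], aln_b[i]
        if a == "-" or b == "-":
            continue  # skip gapped columns
        aligned += 1
        identical += a == b  # bool counts as 0 or 1
    return 100.0 * identical / aligned if aligned else 0.0


# Toy alignment, for illustration only.
print(percent_identity("ACDE-G", "ACDFHG"))          # 80.0
print(percent_identity("ACDE-G", "ACDFHG", [3, 5]))  # 50.0
```

Restricting `columns` to pocket residues is what lets two receptors with moderate overall identity share a much more conserved binding site, as reported here for OR51E1 and OR51E2.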

Figure 2. OR51E1 modeling protocol.
Then, the hydrogen atoms and side chains of all generated models were optimized at physiological pH with the Protein Preparation Wizard (Schrödinger Release 2021-3, Maestro, Schrödinger, LLC, New York, NY, 2021). We then assessed how well each model prioritized known OR51E1 ligands over property-matched decoys by docking both sets into the binding site with DOCK3.8. Each model was evaluated by its receiver operating characteristic (ROC) curve, summarized as logAUC, and by visual inspection of the docked poses.

Figure 3. Left: scatter plot of the logAUC values obtained for each model in the retrospective docking. Right: docked poses of the active compounds and the ROC plot for conformation 980.
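The logAUC used to rank the models is the area under a ROC curve whose false-positive-rate axis is logarithmic, which rewards enrichment of actives at the very top of the ranking. The sketch below assumes the common definition (FPR window from 0.001 to 1, normalized to 0-1, random ranking subtracted to give an "adjusted" value); it is an illustration, not the DOCK3.8 analysis code, and the scores are toy values.

```python
import math


def roc_points(scores, labels):
    """ROC curve as (FPR, TPR) points; lower (more negative) DOCK score ranks first.

    labels: 1 = known active, 0 = decoy. Ties are broken by input order (sketch only).
    """
    n_act = sum(labels)
    n_dec = len(labels) - n_act
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_dec, tp / n_act))
    return points


def log_auc(points, fpr_min=1e-3):
    """Area under TPR vs log10(FPR) for FPR in [fpr_min, 1], scaled to [0, 1]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 <= fpr_min or x1 == x0:
            continue  # segment lies left of the window, or is vertical (zero width)
        if x0 < fpr_min:
            # Clip the segment at fpr_min by linear interpolation of TPR.
            y0 += (y1 - y0) * (fpr_min - x0) / (x1 - x0)
            x0 = fpr_min
        area += 0.5 * (y0 + y1) * (math.log10(x1) - math.log10(x0))
    return area / -math.log10(fpr_min)


def random_log_auc(fpr_min=1e-3):
    """logAUC of a random ranking (TPR = FPR), about 0.145 for fpr_min = 0.001."""
    return (1 - fpr_min) / (math.log(10) * -math.log10(fpr_min))


# Toy DOCK scores in which both actives rank ahead of all decoys.
scores = [-40.2, -35.1, -12.0, -8.5, -3.3]
labels = [1, 1, 0, 0, 0]
adjusted = log_auc(roc_points(scores, labels)) - random_log_auc()
print(round(adjusted, 3))  # 0.855
```

A perfect ranking gives a logAUC of 1.0 (adjusted about 0.855), while ranking every decoy ahead of the actives gives 0.0, so higher values in Figure 3 correspond to better early enrichment.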

4. Docking parameters optimization
Next, we identified the 10 best-performing models and fine-tuned their docking parameters to improve discrimination between active and inactive ligands. We used DOCKopt, an automated approach developed in Shoichet's lab, to test many parameter combinations in parallel (Figure 2). Specifically, we varied the electrostatic and ligand desolvation parameters, testing approximately 4500 combinations for each selected conformation (about 45,000 docking runs in total). We then selected the three best optimized models based on logAUC, visual inspection, and their ability to prefer negatively charged molecules over positively charged ones. This last criterion was assessed with the Extrema set, a test set of molecules with diverse net charges that measures whether a model prioritizes molecules whose net charge matches that of the known actives.

Figure 4. Landscape of the electrostatic and ligand desolvation parameters for each model examined. In the heatmap, colors shift from blue to red as logAUC increases, indicating better performance.
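Conceptually, the parameter scan above is an exhaustive grid search over (electrostatic, desolvation) pairs followed by a charge-preference check on the Extrema set. The sketch below shows only that logic; `evaluate`, `toy_logauc`, and the parameter values are hypothetical stand-ins, since DOCKopt's actual parameters, ranges, and scoring are not described in this report.

```python
from itertools import product


def grid_search(evaluate, elec_values, desolv_values):
    """Score every (electrostatic, desolvation) pair; return results best-first.

    `evaluate` is a hypothetical stand-in for a full retrospective docking run
    that returns the logAUC obtained with the given parameter pair.
    """
    results = [((e, d), evaluate(e, d)) for e, d in product(elec_values, desolv_values)]
    results.sort(key=lambda r: r[1], reverse=True)
    return results


def charge_preference(ranked_charges, target_charge=-1, top_n=10):
    """Fraction of the top_n ranked Extrema-set molecules with the target net charge."""
    top = ranked_charges[:top_n]
    return sum(c == target_charge for c in top) / len(top) if top else 0.0


# Toy evaluation surface peaking at (0.5, 1.0); real values would come from docking.
def toy_logauc(elec, desolv):
    return 0.4 - (elec - 0.5) ** 2 - (desolv - 1.0) ** 2


best_params, best_score = grid_search(toy_logauc, [0.0, 0.5, 1.0], [0.5, 1.0, 1.5])[0]
print(best_params, best_score)                         # (0.5, 1.0) 0.4
print(charge_preference([-1, -1, 0, 1, -1], top_n=4))  # 0.5
```

Because each grid point is an independent docking run, the scan parallelizes trivially; at roughly 4500 points for each of 10 conformations, this gives the approximately 45,000 runs reported above.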

Description of the STSM main achievements and planned follow-up activities

The developed protocol allowed us to model the binding site of OR51E1 and yielded refined models able to discriminate between active and inactive molecules. The best-performing optimized model was selected for the virtual screening campaign based on its logAUC, the ability of the docked molecules to maintain key interactions with the receptor, and the volume of the binding pocket (Figure 5).

Figure 5. Optimized models of conformation 980, shown with ROC curves and with DOCK score, van der Waals (vdW) energy, and electrostatic energy violation plots for active and inactive molecules. The results of the binding pocket volume and Extrema set tests are also included.

Since all active molecules carry a carboxylate moiety, we restricted the virtual screening to carboxylate-containing molecules and divided them by atom count into Set 1 (HC: 6-16), Set 2 (HC: 17-20), and Set 3, which contain 0.81 million, 17.3 million, and 17.0 million compounds, respectively. We also screened a fourth library of 14.9 million fragments containing only neutral molecules. In total, we screened about 50 million virtual compounds from ZINC.
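The library split above can be expressed as a simple lookup, and the reported set sizes add up to the stated total. In this sketch we assume that "HC" denotes heavy-atom count; Set 3's range is not given in the text, so it is deliberately left out of the hypothetical `HC_RANGES` lookup rather than guessed, and the carboxylate flag is assumed to be precomputed.

```python
# Library sizes reported above, in millions of compounds.
LIBRARY_SIZES_M = {"Set 1": 0.81, "Set 2": 17.3, "Set 3": 17.0, "Neutral fragments": 14.9}

# Heavy-atom-count ranges stated in the text (HC assumed = heavy-atom count).
# Set 3's range is not reported, so it is intentionally omitted here.
HC_RANGES = {"Set 1": (6, 16), "Set 2": (17, 20)}


def assign_set(heavy_atoms: int, has_carboxylate: bool):
    """Return the set name for a carboxylate-containing molecule, or None if unassigned."""
    if not has_carboxylate:
        return None  # only carboxylates were kept for Sets 1-3
    for name, (lo, hi) in HC_RANGES.items():
        if lo <= heavy_atoms <= hi:
            return name
    return None


total = sum(LIBRARY_SIZES_M.values())
print(f"Total screened: {total:.2f} million")  # Total screened: 50.01 million
```

The sum (0.81 + 17.3 + 17.0 + 14.9 = 50.01 million) matches the rounded figure of 50 million compounds.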
The next step, following the project presentation, is the analysis of the virtual screening campaign, often referred to as “hit picking”. In this phase, we will first select the top 0.1% to 0.01% of docked molecules from each library segment (Sets 1-5) based on their DOCK score. We will then apply additional filters, such as checking for key interactions with interaction fingerprint (IFP) methods and assessing metabolic liabilities. These filters will help us pinpoint promising hits among the 300,000 to 1,000,000 top-scoring molecules. Next, we will cluster the compounds by 2D structural similarity. The filtered compounds will then undergo post-docking molecular dynamics simulations (Di Pizio's lab) to assess the stability of the predicted binding poses. Finally, the most promising molecules will be tested experimentally by the Krautwurst group (Home Institution).
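Two of the hit-picking steps above, keeping the best-scoring fraction of each library segment and clustering by 2D similarity, can be sketched as follows. The choice of greedy leader clustering, the Tanimoto threshold, and the bit-set fingerprints are our illustrative assumptions; the report does not specify the clustering method, and the IFP and metabolic-liability filters are not sketched.

```python
import heapq
import math


def top_fraction(scores, fraction):
    """Indices of the best-scoring `fraction` of molecules (lower DOCK score is better)."""
    k = max(1, math.ceil(len(scores) * fraction))
    # nsmallest avoids fully sorting a library segment of millions of scores.
    return heapq.nsmallest(k, range(len(scores)), key=scores.__getitem__)


def tanimoto(a, b):
    """Tanimoto similarity of two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(fps, threshold=0.5):
    """Greedy leader clustering of fingerprints given best-score-first.

    Each molecule joins the first cluster whose leader (first member) has
    Tanimoto >= threshold to it; otherwise it founds a new cluster.
    """
    clusters = []
    for i, fp in enumerate(fps):
        for cluster in clusters:
            if tanimoto(fp, fps[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


scores = [-5.0, -20.0, -1.0, -30.0, -10.0, 0.0, -2.0, -3.0, -4.0, -8.0]
print(top_fraction(scores, 0.2))  # [3, 1]
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8}, {7, 8, 9}]
print(leader_cluster(fps))        # [[0, 1], [2, 3]]
```

Feeding molecules best-score-first makes each cluster leader its best-docked member, so picking one representative per cluster favors both score and chemical diversity. For scale, 0.1% of the 17.3 million compounds in Set 2 corresponds to about 17,300 molecules.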
We plan to write a manuscript reporting the results of the STSM project. The STSM is an important step toward strengthening the collaboration between Prof. Brian Shoichet's group and Antonella Di Pizio's group. The skills I gained will be shared with members of Di Pizio's group at the Home Institution, in line with one of the key objectives of COST Actions, namely the transfer of knowledge. Once validated, the developed protocol, which integrates state-of-the-art molecular modeling methods such as molecular docking and molecular dynamics (MD) simulations, will contribute to the objectives of Working Groups 1 and 2 (WG1 and WG2) of the COST Action COSY. These objectives include developing computational methods that accurately predict molecular interactions between biomolecules. Accurate predictions of molecular interactions will, in turn, facilitate the development of innovative functional materials and drug delivery systems, advancing the overarching goal of the COSY Action, i.e., generating useful knowledge that forms the basis for applications. Additionally, ultra-large virtual screening (VS) shows substantial promise for identifying ligands that could become valuable pharmacological tools or serve as starting points for innovative therapeutic agents.


1. Bender BJ, Gahbauer S, Luttens A, Lyu J, Webb CM, Stein RM, et al. A practical guide to large-scale docking. Nat Protoc 2021, 16(10): 4799-4832.
2. Billesbolle CB, de March CA, van der Velden WJC, Ma N, Tewari J, Del Torrent CL, et al. Structural basis of odorant recognition by a human odorant receptor. Nature 2023, 615(7953): 742-749.
3. Jovancevic N, Dendorfer A, Matzkies M, et al. Medium-chain fatty acids modulate myocardial function via a cardiac odorant receptor. Basic Res Cardiol 2017, 112: 13.
4. Bushdid C, de March CA, Fiorucci S, Matsunami H, Golebiowski J. Agonists of G-protein-coupled odorant receptors are predicted from chemical features. J Phys Chem Lett 2018, 9(9): 2235–224
5. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics 2006.
