Learning the Edit Costs of Graph Edit Distance Applied to Ligand-Based Virtual Screening

Carlos       Garcia-Hernandez; Alberto       Fernández; Francesc       Serratosa

doi:10.2174/1568026620666200603122000

Abstract

Background: Graph edit distance is a methodology used to solve error-tolerant graph matching. This methodology estimates a distance between two graphs by determining the minimum number of modifications required to transform one graph into the other. These modifications, known as edit operations, have an edit cost associated that has to be determined depending on the problem.

Objective: This study focuses on the use of optimization techniques in order to learn the edit costs used when comparing graphs by means of the graph edit distance.

Methods: Graphs represent reduced structural representations of molecules using pharmacophore-type node descriptions to encode the relevant molecular properties. This reduction technique is known as extended reduced graphs. The screening and statistical tools available on the ligand-based virtual screening benchmarking platform and the RDKit were used.

Results: In the experiments, the graph edit distance using learned costs performed better or equally good than using predefined costs. This is exemplified with six publicly available datasets: DUD-E, MUV, GLL&GDD, CAPST, NRLiSt BDB, and ULS-UDS.

Conclusion: This study shows that the graph edit distance along with learned edit costs is useful to identify bioactivity similarities in a structurally diverse group of molecules. Furthermore, the target-specific edit costs might provide useful structure-activity information for future drug-design efforts.

Keywords: Structure-activity relationships, Graph edit distance, Extended reduced graph, Virtual screening, Molecular similarity, Machine learning.

« Previous Next »

Graphical Abstract

[1] 
Kubinyi, H.; Mannhold, R.; Timmerman, H. Virtual screening for bioactive molecules; John Wiley & Sons: Hoboken, 2008. Vol. 10. 
[2] 
Schneider, G.; Clément-Chomienne, O.; Hilfiger, L.; Schneider, P.; Kirsch, S.; Böhm, H.J.; Neidhart, W. Virtual screening for bioactive molecules by evolutionary de novo design. Angew. Chem. Int. Ed. Engl.,  2000, 39(22), 4130-4133.
[http://dx.doi.org/10.1002/1521-3773(20001117)39:22<4130:AID-ANIE4130>3.0.CO;2-E ] [PMID:  11093229] 
[3] 
Bajorath, J. Selected concepts and investigations in compound classification, molecular descriptor analysis, and virtual screening. J. Chem. Inf. Comput. Sci.,  2001, 41(2), 233-245.
[http://dx.doi.org/10.1021/ci0001482 ] [PMID:  11277704] 
[4] 
Heikamp, K.; Bajorath, J. The future of virtual compound screening. Chem. Biol. Drug Des.,  2013, 81(1), 33-40.
[http://dx.doi.org/10.1111/cbdd.12054 ] [PMID:  23253129] 
[5] 
Cereto-Massagué, A.; Ojeda, M.J.; Valls, C.; Mulero, M.; Garcia-Vallvé, S.; Pujadas, G. Molecular fingerprint similarity search in virtual screening. Methods,  2015, 71, 58-63.
[http://dx.doi.org/10.1016/j.ymeth.2014.08.005 ] [PMID:  25132639] 
[6] 
Kroemer, R.T. Structure-based drug design: docking and scoring. Curr. Protein Pept. Sci.,  2007, 8(4), 312-328.
[http://dx.doi.org/10.2174/138920307781369382 ] [PMID:  17696866] 
[7] 
Cavasotto, C.N.; Orry, A.J.; Andrew, J. Ligand docking and structure-based virtual screening in drug discovery. Curr. Top. Med. Chem.,  2007, 7(10), 1006-1014.
[http://dx.doi.org/10.2174/156802607780906753 ] [PMID:  17508934] 
[8] 
Sun, H. Pharmacophore-based virtual screening. Curr. Med. Chem.,  2008, 15(10), 1018-1024.
[http://dx.doi.org/10.2174/092986708784049630 ] [PMID:  18393859] 
[9] 
Kirchmair, J.; Distinto, S.; Markt, P.; Schuster, D.; Spitzer, G.M.; Liedl, K.R.; Wolber, G. How to optimize shape-based virtual screening: choosing the right query and including chemical information. J. Chem. Inf. Model.,  2009, 49(3), 678-692.
[http://dx.doi.org/10.1021/ci8004226 ] [PMID:  19434901] 
[10] 
Melville, J.L.; Burke, E.K.; Hirst, J.D. Machine learning in virtual screening. Comb. Chem. High Throughput Screen.,  2009, 12(4), 332-343.
[http://dx.doi.org/10.2174/138620709788167980 ] [PMID:  19442063] 
[11] 
Johnson, M.A.; Maggiora, G.M. Concepts and applications of molecular similarity; Wiley & Sons: Hoboken, 1990. 
[12] 
Bender, A.; Glen, R.C. Molecular similarity: a key technique in molecular informatics. Org. Biomol. Chem.,  2004, 2(22), 3204-3218.
[http://dx.doi.org/10.1039/b409813g ] [PMID:  15534697] 
[13] 
Nikolova, N.; Jaworska, J. Approaches to measure chemical similarity - a review. Mol. Inform.,  2003, 22, 1006-1026.
[14] 
Willett, P. Evaluation of molecular similarity and molecular diversity methods using biological activity data. In: Cheminformatics; Springer: Berlin, 2004; pp. 51-63.
[http://dx.doi.org/10.1385/1-59259-802-1:051] 
[15] 
Lajiness, M. Molecular similarity-based methods for selecting compounds for screening. Computational chemical graph theory; Nova Science Publishers, Inc., 1990, pp. 299-316.
[16] 
Willett, J. Similarity and clustering in chemical information systems; John Wiley & Sons, Inc.: Hoboken, 1987. 
[17] 
Sheridan, R.P.; Kearsley, S.K. Why do we need so many chemical similarity search methods? Drug Discov. Today,  2002, 7(17), 903-911.
[http://dx.doi.org/10.1016/S1359-6446(02)02411-X ] [PMID:  12546933] 
[18] 
Willett, P.; Barnard, J.M.; Downs, G.M. Chemical similarity searching. J. Chem. Inf. Comput. Sci.,  1998, 38, 983-996.
[http://dx.doi.org/10.1021/ci9800211] 
[19] 
Xue, L.; Bajorath, J. Molecular descriptors in chemoinformatics, computational combinatorial chemistry, and virtual screening. Comb. Chem. High Throughput Screen.,  2000, 3(5), 363-372.
[http://dx.doi.org/10.2174/1386207003331454 ] [PMID:  11032954] 
[20] 
Menard, P.R.; Mason, J.S.; Morize, I.; Bauerschmidt, S. Chemistry space metrics in diversity analysis, library design, and compound selection. J. Chem. Inf. Comput. Sci.,  1998, 38, 1204-1213.
[http://dx.doi.org/10.1021/ci9801062] 
[21] 
Pearlman, R.S.; Smith, K.M. Metric validation and the receptor-relevant subspace concept. J. Chem. Inf. Comput. Sci.,  1999, 39, 28-35.
[http://dx.doi.org/10.1021/ci980137x] 
[22] 
Schnur, D. Design and diversity analysis of large combinatorial libraries using cell-based methods. J. Chem. Inf. Comput. Sci.,  1999, 39, 36-45.
[http://dx.doi.org/10.1021/ci980138p] 
[23] 
Livingstone, D.J. The characterization of chemical structures using molecular properties. A survey. J. Chem. Inf. Comput. Sci.,  2000, 40(2), 195-209.
[http://dx.doi.org/10.1021/ci990162i ] [PMID:  10761119] 
[24] 
Barnard, J.M. Substructure searching methods: Old and new. J. Chem. Inf. Comput. Sci.,  1993, 33, 532-538.
[http://dx.doi.org/10.1021/ci00014a001] 
[25] 
James, C.; Weininger, D. Daylight, 4.41 Theory Manual; Daylight Chemical Information Systems Inc.: Irvine, CA, USA, 1995. 
[26] 
MACCS. K. MDL Information Systems. Inc; San Leandro: CA, 1984. 
[27] 
McGregor, M.J.; Pallai, P.V. Clustering of large databases of compounds: using the MDL “keys” as structural descriptors. J. Chem. Inf. Comput. Sci.,  1997, 37, 443-448.
[http://dx.doi.org/10.1021/ci960151e] 
[28] 
Güner, O.F. Pharmacophore perception, development and use in drug design. Molecules,  2000, 5(7), 987-989.
[29] 
Beno, B.R.; Mason, J.S. The design of combinatorial libraries using properties and 3D pharmacophore fingerprints. Drug Discov. Today,  2001, 6(5), 251-258.
[http://dx.doi.org/10.1016/S1359-6446(00)01665-2 ] [PMID:  11182598] 
[30] 
Rarey, M.; Dixon, J.S. Feature trees: a new molecular similarity measure based on tree matching. J. Comput. Aided Mol. Des.,  1998, 12(5), 471-490.
[http://dx.doi.org/10.1023/A:1008068904628 ] [PMID:  9834908] 
[31] 
Takahashi, Y.; Sukekawa, M.; Sasaki, S. Automatic identification of molecular similarity using reduced-graph representation of chemical structure. J. Chem. Inf. Model.,  1992, 32, 639-643.
[http://dx.doi.org/10.1021/ci00010a009] 
[32] 
Barker, E.J.; Buttar, D.; Cosgrove, D.A.; Gardiner, E.J.; Kitts, P.; Willett, P.; Gillet, V.J. Scaffold hopping using clique detection applied to reduced graphs. J. Chem. Inf. Model.,  2006, 46(2), 503-511.
[http://dx.doi.org/10.1021/ci050347r ] [PMID:  16562978] 
[33] 
Fisanick, W.; Lipkus, A.H.; Rusinko, A. Similarity searching on CAS registry substances. 2. 2D structural similarity. J. Chem. Inf. Comput. Sci.,  1994, 34, 130-140.
[http://dx.doi.org/10.1021/ci00017a016] 
[34] 
Gillet, V.J.; Downs, G.M.; Holliday, J.D.; Lynch, M.F.; Dethlefsen, W. Computer storage and retrieval of generic chemical structures in patents. 13. Reduced graph generation. J. Chem. Inf. Comput. Sci.,  1991, 31, 260-270.
[http://dx.doi.org/10.1021/ci00002a011] 
[35] 
Gillet, V.J.; Willett, P.; Bradshaw, J. Similarity searching using reduced graphs. J. Chem. Inf. Comput. Sci.,  2003, 43(2), 338-345.
[http://dx.doi.org/10.1021/ci025592e ] [PMID:  12653495] 
[36] 
Stiefl, N.; Watson, I.A.; Baumann, K.; Zaliani, A. ErG: 2D pharmacophore descriptions for scaffold hopping. J. Chem. Inf. Model.,  2006, 46(1), 208-220.
[http://dx.doi.org/10.1021/ci050457y ] [PMID:  16426057] 
[37] 
Barker, E.J.; Gardiner, E.J.; Gillet, V.J.; Kitts, P.; Morris, J. Further development of reduced graphs for identifying bioactive compounds. J. Chem. Inf. Comput. Sci.,  2003, 43(2), 346-356.
[http://dx.doi.org/10.1021/ci0255937 ] [PMID:  12653496] 
[38] 
Harper, G.; Bravi, G.S.; Pickett, S.D.; Hussain, J.; Green, D.V.S. The reduced graph descriptor in virtual screening and data-driven clustering of high-throughput screening data. J. Chem. Inf. Comput. Sci.,  2004, 44(6), 2145-2156.
[http://dx.doi.org/10.1021/ci049860f ] [PMID:  15554685] 
[39] 
Garcia-Hernandez, C.; Fernández, A.; Serratosa, F. Ligand-based virtual screening using graph edit distance as molecular similarity measure. J. Chem. Inf. Model.,  2019, 59(4), 1410-1421.
[http://dx.doi.org/10.1021/acs.jcim.8b00820 ] [PMID:  30920214] 
[40] 
Munkres, J. Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math.,  1957, 5, 32-38.
[http://dx.doi.org/10.1137/0105003] 
[41] 
Sanfeliu, A.; Fu, K.S. A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern.,  1983, 353-362.
[http://dx.doi.org/10.1109/TSMC.1983.6313167] 
[42] 
Gao, X.; Xiao, B.; Tao, D.; Li, X. A survey of graph edit distance. Pattern Anal. Appl.,  2010, 13, 113-129.
[http://dx.doi.org/10.1007/s10044-008-0141-y] 
[43] 
Solé, A.; Serratosa, F.; Sanfeliu, A. On the graph edit distance cost: properties and applications. Int. J. Pattern Recognit. Artif. Intell.,  2012, 261260004
[http://dx.doi.org/10.1142/S021800141260004X] 
[44] 
Verma, J.; Khedkar, V.M.; Coutinho, E.C. 3D-QSAR in drug design--a review. Curr. Top. Med. Chem.,  2010, 10(1), 95-115.
[http://dx.doi.org/10.2174/156802610790232260 ] [PMID:  19929826] 
[45] 
Lewis, R.A. A general method for exploiting QSAR models in lead optimization. J. Med. Chem.,  2005, 48(5), 1638-1648.
[http://dx.doi.org/10.1021/jm049228d ] [PMID:  15743205] 
[46] 
Birchall, K.; Gillet, V.J.; Harper, G.; Pickett, S.D. Training similarity measures for specific activities: application to reduced graphs. J. Chem. Inf. Model.,  2006, 46(2), 577-586.
[http://dx.doi.org/10.1021/ci050465e ] [PMID:  16562986] 
[47] 
Xia, J.; Tilahun, E.L.; Reid, T.E.; Zhang, L.; Wang, X.S. Benchmarking methods and data sets for ligand enrichment assessment in virtual screening. Methods,  2015, 71, 146-157.
[http://dx.doi.org/10.1016/j.ymeth.2014.11.015 ] [PMID:  25481478] 
[48] 
Gatica, E.A.; Cavasotto, C.N. Ligand and decoy sets for docking to G protein-coupled receptors. J. Chem. Inf. Model.,  2012, 52(1), 1-6.
[http://dx.doi.org/10.1021/ci200412p ] [PMID:  22168315] 
[49] 
Mysinger, M.M.; Carchia, M.; Irwin, J.J.; Shoichet, B.K. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem.,  2012, 55(14), 6582-6594.
[http://dx.doi.org/10.1021/jm300687e ] [PMID:  22716043] 
[50] 
Lagarde, N.; Ben Nasr, N.; Jérémie, A.; Guillemain, H.; Laville, V.; Labib, T.; Zagury, J.F.; Montes, M. NRLiSt BDB, the manually curated nuclear receptors ligands and structures benchmarking database. J. Med. Chem.,  2014, 57(7), 3117-3125.
[http://dx.doi.org/10.1021/jm500132p ] [PMID:  24666037] 
[51] 
Rohrer, S.G.; Baumann, K. Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data. J. Chem. Inf. Model.,  2009, 49(2), 169-184.
[http://dx.doi.org/10.1021/ci8002649 ] [PMID:  19434821] 
[52] 
Sanders, M.P.; Barbosa, A.J.; Zarzycka, B.; Nicolaes, G.A.; Klomp, J.P.; de Vlieg, J.; Del Rio, A. Comparative analysis of pharmacophore screening tools. J. Chem. Inf. Model.,  2012, 52(6), 1607-1620.
[http://dx.doi.org/10.1021/ci2005274 ] [PMID:  22646988] 
[53] 
Skoda, P.; Hoksza, D.  In: Benchmarking platform for ligand-based virtual screening; Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM) Shenzhen, China, , 2016; pp. 1220-1227.
[http://dx.doi.org/10.1109/BIBM.2016.7822693] 
[54] 
Riniker, S.; Landrum, G.A. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J. Cheminform.,  2013, 5(1), 26.
[http://dx.doi.org/10.1186/1758-2946-5-26 ] [PMID:  23721588] 
[55] 
Kearsley, S.K.; Sallamack, S.; Fluder, E.M.; Andose, J.D.; Mosley, R.T.; Sheridan, R.P. Chemical similarity using physiochemical property descriptors. J. Chem. Inf. Comput. Sci.,  1996, 36, 118-127.
[http://dx.doi.org/10.1021/ci950274j] 
[56] 
Blumenthal, D.B.; Gamper, J. On the exact computation of the graph edit distance. Pattern Recognit. Lett.,  2018, 134, 1-12.
[http://dx.doi.org/10.1016/j.patrec.2018.05.002] 
[57] 
Serratosa, F. Fast computation of bipartite graph matching. Pattern Recognit. Lett.,  2014, 45, 244-250.
[http://dx.doi.org/10.1016/j.patrec.2014.04.015] 
[58] 
Santacruz, P.; Serratosa, F. Error-tolerant graph matching in linear computational cost using an initial small partial matching. Pattern Recognit. Lett.,  2018, 134, 10-19.
[http://dx.doi.org/10.1016/j.patrec.2018.04.003] 
[59] 
Conte, D.; Foggia, P.; Sansone, C.; Vento, M. Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell.,  2004, 18, 265-298.
[http://dx.doi.org/10.1142/S0218001404003228] 
[60] 
Vento, M. A long trip in the charming world of graphs for pattern recognition. Pattern Recognit.,  2015, 48, 291-301.
[http://dx.doi.org/10.1016/j.patcog.2014.01.002] 
[61] 
Goldberg, D.E.; Holland, J.H. Genetic algorithms and machine learning. Mach. Learn.,  1988, 3, 95-99.
[http://dx.doi.org/10.1023/A:1022602019183] 
[62] 
Storn, R.; Price, K. Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim.,  1997, 11, 341-359.
[http://dx.doi.org/10.1023/A:1008202821328] 
[63] 
Rosasco, L.; De Vito, E.; Caponnetto, A.; Piana, M.; Verri, A. Are loss functions all the same? Neural Comput.,  2004, 16(5), 1063-1076.
[http://dx.doi.org/10.1162/089976604773135104 ] [PMID:  15070510] 

Rights & Permissions Print Cite

Article Metrics

18

1

Journal Information

For Authors

For Editors

For Reviewers

Explore Articles

Open Access

Open Access Articles

For Visitors

DOI https://dx.doi.org/10.2174/1568026620666200603122000	Print ISSN 1568-0266
Publisher Name Bentham Science Publisher	Online ISSN 1873-4294

Current Topics in Medicinal Chemistry

Learning the Edit Costs of Graph Edit Distance Applied to Ligand-Based Virtual Screening

Abstract Play Pause

Graphical Abstract

Related Journals

Related Books

Abstract