Review Article

Accessing Public Compound Databases with KNIME

Volume 27, Issue 38, 2020

Page: [6444 - 6457] Pages: 14

DOI: 10.2174/0929867326666190801152317


Abstract

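Background: The KNIME platform offers a range of tools for analysing chemoinformatics and pharmacoinformatics data. Unless sufficient in-house data are available for the analysis of interest, third-party data must be fetched into KNIME. Many data sources offer valuable data, but including these data in a workflow is not always straightforward. Objective: Here we discuss different ways of accessing public data sources. We give an overview of the KNIME nodes available for different sources, with references to available example workflows. For data sources that have no dedicated KNIME node, we present a general approach to accessing a web interface via KNIME. In addition, we discuss the steps required before the data can be analysed, such as data curation, chemical standardisation and the merging of datasets.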

Keywords: KNIME, database, data mining, web services, data curation, chemical standardisation, REST, API.
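
As an illustration of the general web-interface approach mentioned in the abstract, the sketch below shows the three steps such access usually involves: building a request URL, sending a GET request, and flattening the JSON response into table rows. It is not taken from the article. It uses PubChem's PUG-REST service as the example source, and the function names (build_property_url, parse_property_table, fetch_properties) are hypothetical helpers introduced here. Inside KNIME the same steps would be carried out with workflow nodes rather than a script.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of PubChem's PUG-REST service (the example source chosen here).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(name, properties):
    """Build a PUG-REST URL that looks up a compound by name and
    requests the given properties as JSON."""
    encoded_name = quote(name, safe="")
    return f"{PUG_REST}/compound/name/{encoded_name}/property/{','.join(properties)}/JSON"


def parse_property_table(payload):
    """Turn a PUG-REST property response into a list of row dictionaries.
    Returns an empty list if the expected keys are missing."""
    data = json.loads(payload)
    return data.get("PropertyTable", {}).get("Properties", [])


def fetch_properties(name, properties, timeout=30):
    """Send the GET request and return the parsed rows (needs network access)."""
    with urlopen(build_property_url(name, properties), timeout=timeout) as response:
        return parse_property_table(response.read().decode("utf-8"))
```

Calling fetch_properties("aspirin", ["MolecularFormula", "InChIKey"]) would return one dictionary per matching compound. Keeping URL construction and response parsing in separate functions mirrors the way a workflow separates the request step from the parsing step, and lets each be checked without network access.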


© 2024 Bentham Science Publishers