Abstract
Background: Chemical compounds and proteins/genes are an important class of entities in biomedical research, and their interactions play a key role in precision medicine, drug discovery, basic clinical research, and building knowledge bases. Many computational methods have been proposed to identify chemical–protein interactions. However, most of these models cannot capture long-distance dependencies between chemical and protein mentions, the neural networks they use suffer from gradient problems such as vanishing gradients, and few take the chemical structure characteristics of the compound into account.
Methods: To address the above limitations, we propose a novel model, SIMEON, to identify chemical–protein interactions. First, an input sequence is represented with a pre-trained language model, and an attention mechanism is used to uncover the contribution of different words to entity relations and to potential semantic information. Second, key features are extracted by a multi-layer stacked Bidirectional Gated Recurrent Unit (Bi-GRU)-normalization residual network module, which resolves higher-order dependencies while overcoming network degradation. Finally, the representation is enhanced with external knowledge about the chemical structure characteristics of the compound.
Results: Experimental results show that our stacked integration model combines the complementary advantages of Bi-GRU, normalization methods, and external knowledge to improve performance.
Conclusion: Our proposed model shows good performance in chemical–protein interaction extraction and can serve as a useful complement to biological experiments for identifying chemical–protein interactions.
Keywords: chemical–protein interaction, normalization, stacked, residual network, molecular and protein representation, biomedical text.
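The stacked Bi-GRU-normalization residual module named in the Methods can be illustrated with a minimal pure-Python sketch. It is not the SIMEON implementation: the abstract does not specify the layer count, the dimensions, the type of normalization, or the order of normalization and residual addition. The sketch assumes layer normalization applied to the concatenated forward and backward GRU states, followed by a skip connection from the block input. All names (`GRUCell`, `BiGRUResidualBlock`, `stacked_encoder`) are hypothetical, weights are random, and biases are omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Standard GRU cell; biases are omitted for brevity."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # One common convention: interpolate old state and candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, seq):
        h = [0.0] * self.hid_dim
        states = []
        for x in seq:
            h = self.step(x, h)
            states.append(h)
        return states

def layer_norm(v, eps=1e-5):
    # Normalize one vector to zero mean and (approximately) unit variance.
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

class BiGRUResidualBlock:
    """Bi-GRU -> layer norm -> residual add (ordering is an assumption)."""
    def __init__(self, dim, rng):
        assert dim % 2 == 0, "dim must be even to split across two directions"
        self.fwd = GRUCell(dim, dim // 2, rng)
        self.bwd = GRUCell(dim, dim // 2, rng)

    def __call__(self, seq):
        fwd_states = self.fwd.run(seq)
        # Run the backward GRU on the reversed sequence, then re-align.
        bwd_states = self.bwd.run(seq[::-1])[::-1]
        out = []
        for x, hf, hb in zip(seq, fwd_states, bwd_states):
            normed = layer_norm(hf + hb)  # concatenate both directions
            out.append([xi + ni for xi, ni in zip(x, normed)])  # skip connection
        return out

def stacked_encoder(seq, num_layers=3, seed=0):
    # Stack several blocks; each keeps the per-token dimension unchanged.
    rng = random.Random(seed)
    dim = len(seq[0])
    for _ in range(num_layers):
        seq = BiGRUResidualBlock(dim, rng)(seq)
    return seq
```

Calling `stacked_encoder(tokens)` on a list of equal-length token vectors returns a list of the same shape. Each direction uses half the input width so that the concatenated output matches the input and the skip connection can be added element-wise.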