Abstract
Background: As polypharmacy becomes more common, surveillance of clinical drug toxicity has become an important concern. Named Entity Recognition (NER) is essential for extracting drug safety information from the biomedical literature. Deep learning models have advanced considerably on NER tasks in recent years. However, these techniques depend on large volumes of annotated data, and producing such annotations is labor-intensive and inefficient.
Methods: This study introduces an approach that does not rely on manually annotated data. It applies a transformer-based Positive-Unlabeled Learning (PULearning) technique with adaptive learning to the clinical cancer drug toxicity corpus. To improve prediction precision, we use relative position embeddings in the transformer encoder. We also formulate a composite loss function that integrates two Kullback-Leibler (KL) regularizers to align with PULearning assumptions. The results show that our approach attains the targeted NER performance while relying solely on unlabeled data and named entity dictionaries.
Conclusion: Our model achieves an overall NER F1 of 0.819, with F1 scores of 0.841, 0.801, and 0.815 for DRUG, CANCER, and TOXI entities, respectively. A comprehensive analysis of the results validates the effectiveness of our approach compared with existing PULearning methods on biomedical NER tasks. We also provide a visualization of the associations among the three identified entity types, which offers a useful reference for querying their interrelationships.
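The dictionary-plus-unlabeled-data setting described in the Methods can be illustrated with a minimal sketch. It is not the composite loss of this study: the two KL regularizers, the adaptive learning step, and the transformer encoder are not reproduced here. Instead, the sketch swaps in the widely used non-negative PU risk estimator with an absolute loss, and the function names (`dictionary_label`, `nn_pu_risk`), the toy dictionary, and the class prior value are all hypothetical choices made for illustration.

```python
from typing import List, Set


def dictionary_label(tokens: List[str], entity_dict: Set[str]) -> List[int]:
    """Mark tokens found in the dictionary as positive (1).

    Every other token stays unlabeled (0); it may or may not be an entity.
    """
    return [1 if tok.lower() in entity_dict else 0 for tok in tokens]


def nn_pu_risk(probs: List[float], labels: List[int], prior: float) -> float:
    """Non-negative PU risk with absolute loss (illustrative, not the paper's loss).

    probs  : predicted probability that each token is an entity
    labels : 1 = dictionary positive, 0 = unlabeled
    prior  : assumed fraction of true entity tokens (hypothetical hyperparameter)
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    unl = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not unl:
        raise ValueError("need at least one positive and one unlabeled token")

    risk_pos_as_pos = sum(1.0 - p for p in pos) / len(pos)  # positives scored as positive
    risk_pos_as_neg = sum(pos) / len(pos)                   # positives scored as negative
    risk_unl_as_neg = sum(unl) / len(unl)                   # unlabeled scored as negative

    # Unlabeled data is a mixture of positives and negatives, so the
    # positive share is subtracted; the result is clipped at zero.
    neg_risk = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, neg_risk)


if __name__ == "__main__":
    tokens = ["Cisplatin", "caused", "nephrotoxicity", "in", "patients"]
    labels = dictionary_label(tokens, {"cisplatin"})  # toy DRUG dictionary
    probs = [0.9, 0.1, 0.8, 0.1, 0.1]                 # made-up model outputs
    print(labels, nn_pu_risk(probs, labels, prior=0.3))
```

In this toy sentence the dictionary misses "nephrotoxicity", so that token stays unlabeled rather than being treated as a confirmed negative, which is the situation PU learning is designed for.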