Abstract
Biomedical information continues to grow beyond the capacity of scientists to capture and use all that is produced. Much of this information is presented in scientific journal articles and expressed in natural language. Biomedical text data mining is concerned with automated methods for analyzing the content of these documents and for discovering and extracting the knowledge they contain. Numerical data mining has long been used to uncover patterns in numerical data and to make predictions based on those patterns. Text data mining builds on the success of numerical data mining but presents additional challenges. This article examines text data mining for biomedical text, paying particular attention to the complexities of natural language that must be taken into account and to the role of biomedical knowledge sources. From this perspective, recent patents on data mining specific to biomedical text are discussed, and expected future patent activity is appraised.
Keywords: Text data mining, biomedical discovery, MeSH, PubMed, Gene Ontology, GO, controlled vocabulary, ontology, entity extraction, exploratory data analysis, natural language processing, patents