Abstract
Background: TextRank, a well-known keyphrase extraction algorithm, is an analog of the PageRank algorithm that relies heavily on term-frequency statistics gathered through co-occurrence analysis.
Objective: This reliance on frequency has become a bottleneck for further performance gains, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into keyphrase extraction and thereby achieve better performance.
Method: In this research, taking both syntactic and semantic information into account, we integrated a syntactic tree algorithm with word embeddings and propose the Word Embedding and Syntactic Information Algorithm (WESIA), which improves the accuracy of the TextRank algorithm.
Results: We applied our method to a self-built test set and a public test set. The results indicated that the proposed unsupervised keyphrase extraction algorithm outperformed the other algorithms to some extent.
Keywords: Key phrases, Syntactic distance, Word embedding, Algorithm, TextRank, Word Embedding and Syntactic Information Algorithm (WESIA).
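For readers unfamiliar with the baseline, the following is a minimal sketch of the frequency-driven TextRank described in the Background: a word graph built from co-occurrence counts, scored with a PageRank-style iteration. It is not WESIA and not the authors' implementation. The function names, the window size of 2, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions, and tokenization and candidate filtering are omitted.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted graph from a token list.

    The weight of an edge is the number of times the two words appear
    within `window` tokens of each other (window size is an assumption).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:  # skip self-loops
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Score nodes with a weighted PageRank-style iteration.

    Each node passes its score to its neighbours in proportion to the
    edge weights; `damping` and `iterations` are illustrative defaults.
    """
    scores = {word: 1.0 for word in graph}
    total_weight = {word: sum(nbrs.values()) for word, nbrs in graph.items()}
    for _ in range(iterations):
        updated = {}
        for word in graph:
            incoming = sum(
                graph[nbr][word] / total_weight[nbr] * scores[nbr]
                for nbr in graph[word]
            )
            updated[word] = (1.0 - damping) + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    text = "keyphrase extraction ranks keyphrase candidates by graph centrality"
    ranking = textrank(cooccurrence_graph(text.split()))
    for word, score in sorted(ranking.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{score:.3f}")
```

Because the edge weights are raw co-occurrence counts, a word's score is driven largely by how often it appears near other words. This is the frequency dependence that the semantic and syntactic extensions discussed in this paper aim to reduce.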
Graphical Abstract