Abstract
Background: Text summarization is the process of generating a short description of a document that would otherwise be difficult to read in full. It offers a convenient way to extract the most useful information from documents in condensed form. Existing research addresses this task with the Fuzzy Rule-based Automated Summarization Method (FRASM). However, FRASM has limitations that restrict its applicability to real-world applications: it is suitable only for single-document summarization, whereas many applications, such as those in research industries, need to summarize information from multiple documents.
Methods: This paper proposes the Multi-document Automated Summarization Method (MDASM), a summarization framework intended to produce accurate summaries from multiple documents, whereas the existing system handles only a single document. First, document clustering is performed with a modified k-means clustering algorithm to group documents with similar meaning; similarity is identified through frequent term measurement. After clustering, pre-processing is performed with a hybrid TF-IDF and singular value decomposition technique, which eliminates irrelevant content and retains the required content. Sentence measurement is then done by introducing an additional metric, title measurement, alongside the metrics of the existing work, to retrieve the most similar sentences more accurately. Finally, a fuzzy rule system is applied to perform text summarization.
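The sentence measurement step can be illustrated with a minimal sketch. It is not the MDASM implementation: it omits the modified k-means clustering and the singular value decomposition, and it replaces the fuzzy rule system with a simple weighted average. The function names, the `title_weight` parameter, and the smoothed IDF formula are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as terms."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(sentences):
    """Score each sentence by the sum of its TF-IDF term weights.

    Each sentence is treated as one document. TF is normalized by
    sentence length and IDF is smoothed (an assumed variant).
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        scores.append(sum(
            (count / len(toks)) * math.log((1 + n) / (1 + df[term]))
            for term, count in tf.items()
        ))
    return scores


def title_scores(sentences, title):
    """Title measurement: fraction of title terms found in each sentence."""
    title_terms = set(tokenize(title))
    if not title_terms:
        return [0.0] * len(sentences)
    return [len(title_terms & set(tokenize(s))) / len(title_terms)
            for s in sentences]


def summarize(sentences, title, k=2, title_weight=0.5):
    """Pick the k best sentences and return them in document order.

    The weighted average below stands in for the fuzzy rule system.
    """
    if not sentences:
        return []
    tfidf = tfidf_scores(sentences)
    top = max(tfidf) or 1.0  # avoid division by zero
    overlap = title_scores(sentences, title)
    combined = [(1 - title_weight) * (t / top) + title_weight * o
                for t, o in zip(tfidf, overlap)]
    best = sorted(range(len(sentences)),
                  key=lambda i: combined[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

In this sketch, a sentence that shares terms with the title is ranked above an off-topic sentence even when the off-topic sentence has a slightly higher TF-IDF score, which is the effect the additional title measurement is meant to capture.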
Results: The evaluation was conducted in the MATLAB simulation environment and shows that the proposed method outperforms the existing method in terms of summarization accuracy. Compared with FRASM, MDASM produces an 89.28% increase in accuracy, an 89.28% increase in precision, an 89.36% increase in recall, and a 70% increase in F-measure.
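For reference, precision, recall, and F-measure are commonly computed from the overlap between the selected sentences and a reference summary. The sketch below assumes sentence-level matching against a reference set; the abstract does not state how the evaluation was actually set up, and the function name is hypothetical.

```python
def precision_recall_f(selected, reference):
    """Return (precision, recall, F-measure) for two collections of items.

    Items are compared by equality, e.g. sentence identifiers.
    """
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

The F-measure here is the harmonic mean of precision and recall, so it is high only when both are high.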
Conclusion: The summarization process carried out in this work provides an accurate summarized outcome.
Keywords: Summarization, frequent term measurement, irrelevant content, multi documents, sentence measurement, TF-IDF.