Abstract
Background: Pattern mining is the process of extracting useful information from large datasets. One sub-field of web mining is the sequential extraction of noisy data from user queries, which is considered together with redundancy handling. In the existing literature, this redundancy handling mechanism is known as ambiguity handling. The clustering mechanisms employed in existing systems include k-means and semantic search, and these must cope with the incremental growth of the internet.
Aims: The proposed work analyzes techniques used to extract useful URLs to replace noisy data.
Methods: We consider noisy data extraction from user queries together with redundancy handling, which the existing literature refers to as ambiguity handling. The clustering mechanisms used in existing systems include k-means and semantic search. These mechanisms are static, which degrades performance in terms of execution time. This work suggests a mechanism to improve performance.
Results: The MPV (Most-Probable-Values) clustering and N-gram techniques considered in the existing literature can be further improved using the research methodology specified in this work.
Conclusion: In the proposed system, results are based on MPV clustering combined with N-gram techniques. N-gram analysis examines the instances of a word or phrase across all query data. Results are reported in terms of execution time and the number of URLs retrieved for web page classification.
Keywords: Ambiguity, clustering, noisy data, user query, URL, webpage.
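As a minimal illustration of the N-gram analysis mentioned in the Conclusion, the sketch below counts word-level N-grams across a small set of user queries. The function names, the whitespace tokenization, and the sample queries are assumptions made for illustration only; they are not taken from the proposed system.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples found in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(queries, n):
    """Count n-gram occurrences across all queries.

    Assumption: queries are lowercased and split on whitespace;
    a real system would likely use a more careful tokenizer.
    """
    counts = Counter()
    for query in queries:
        counts.update(ngrams(query.lower().split(), n))
    return counts


# Hypothetical query log used only for this example.
queries = [
    "cheap flights to paris",
    "cheap flights to rome",
    "flights to paris today",
]

bigram_counts = count_ngrams(queries, 2)
for gram, freq in bigram_counts.most_common(3):
    print(" ".join(gram), freq)
```

Phrases that recur across many queries receive high counts, which is one plausible way such frequencies could inform the grouping of queries before URLs are retrieved.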
Graphical Abstract