Abstract
A parallel and distributed mining system is proposed for finding literature on protein-protein interactions in databases on the Internet. The system uses text mining to identify words that discriminate protein-protein interaction literature from other literature. A threshold called the matching degree is also evaluated to check whether a given paper might be related to protein-protein interactions. Furthermore, a keypage-based search mechanism is adopted to find papers on protein-protein interactions that are related to a given document. The system is designed with a web-based graphical user interface and a parallel and distributed job-dispatching kernel. Experimental results indicate that the proposed system helps researchers find protein-protein interaction literature in an overwhelming volume of information. Moreover, the parallel and distributed architecture makes the system scalable, and its speedup and efficiency are promising: with two servers the speedup is 1.95, and with three servers the speedup is 3.97, giving efficiencies of 0.975 and 0.9925, respectively.
Keywords: Protein-protein interactions, parallel and distributed computing, intelligent computing, data mining, information retrieval, keypage-based search.
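The efficiency figures quoted in the abstract can be checked against the usual definition, efficiency = speedup / number of servers. The sketch below is only an illustration of that arithmetic and is not part of the proposed system; the function name `parallel_efficiency` is hypothetical. Under this definition, the reported efficiency of 0.9925 is reproduced when the speedup of 3.97 is divided by four servers, whereas dividing by three gives a value above 1.

```python
def parallel_efficiency(speedup: float, servers: int) -> float:
    """Return parallel efficiency, assuming efficiency = speedup / servers."""
    if servers <= 0:
        raise ValueError("servers must be a positive integer")
    return speedup / servers


# Speedups as reported in the abstract.
eff_two = parallel_efficiency(1.95, 2)    # matches the reported 0.975
eff_three = parallel_efficiency(3.97, 3)  # exceeds 1 (would be superlinear)
eff_four = parallel_efficiency(3.97, 4)   # matches the reported 0.9925

for label, value in [("2 servers", eff_two),
                     ("3 servers", eff_three),
                     ("4 servers", eff_four)]:
    print(f"{label}: efficiency = {value:.4f}")
```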
Graphical Abstract