Cluster and Outlier Analysis for Ground Water Quality Data in the Regions of Kadapa District in Andhra Pradesh

S.V.S. Ganga Devi

doi:10.2174/1872212113666190211144935

Abstract

Background: Patents suggest that groundwater contaminated with chemicals, bacteria, oils, or gases causes many types of disease in people. Fresh, clean water plays a significant role in human life. In this study, water samples were collected from different regions of the Kadapa district, Andhra Pradesh.

Methods: Water samples were collected in tightly capped plastic bottles that had been washed with distilled water. In total, 57 samples were collected and analyzed in the laboratory for physicochemical properties: EC (Electrical Conductivity), pH, TH (Total Hardness), TDS (Total Dissolved Solids), Ca, Cl and F. K-means, K-medoids and hierarchical clustering methods were used to group the sampled regions by water quality. Outlier analysis was then carried out, and various interesting patterns were identified.
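
The grouping step can be illustrated with a minimal K-means sketch. This is not the author's implementation: the data are synthetic stand-ins for 57 samples with 7 parameters, and the helper names `standardize` and `kmeans`, the z-score scaling, and the random seeds are assumptions made purely for illustration.

```python
import math
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance so that
    large-valued parameters (e.g. TDS) do not dominate the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Synthetic stand-in for 57 samples x 7 parameters
# (EC, pH, TH, TDS, Ca, Cl, F); NOT the data from the study.
data_rng = random.Random(42)
samples = [[data_rng.uniform(0.0, 1.0) for _ in range(7)] for _ in range(57)]

labels, centroids = kmeans(standardize(samples), k=3)
sizes = [labels.count(j) for j in range(3)]
print("cluster sizes:", sizes)
```

Standardizing first is a design choice rather than something the abstract states; without it, parameters measured on large scales would drive the Euclidean distances almost entirely.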

Results: According to the calculated WQI (Water Quality Index) values, all the collected samples were suitable for drinking purposes. By WQI category, the 57 samples comprised 13 poor, 13 good and 31 excellent tuples. K-means clustering produced 3 clusters of sizes 8, 17 and 32. According to the outlier analysis, the sample from the Pullareddypet region (Sample No. 7) had the highest EC, TH and TDS values among the 57 collected water samples, and the sample from the Veerapalli region (Sample No. 37) had the highest fluoride value, 3.58.
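
A weighted WQI of the kind often used in groundwater studies can be sketched as follows. The abstract does not state which WQI formula, permissible limits, weights or class boundaries were used, so every number in `LIMITS` and `WEIGHTS` and every threshold in `category` is a hypothetical placeholder, and the function names are illustrative only.

```python
# Hypothetical permissible limits per parameter (placeholders, not
# taken from the paper or quoted from any standard).
LIMITS = {"pH": 8.5, "EC": 1500.0, "TDS": 500.0, "TH": 300.0,
          "Ca": 75.0, "Cl": 250.0, "F": 1.5}

# Hypothetical importance weights (placeholders).
WEIGHTS = {"pH": 4, "EC": 4, "TDS": 4, "TH": 2, "Ca": 2, "Cl": 3, "F": 4}


def wqi(sample):
    """Weighted index: sum over parameters of relative weight times a
    quality rating, where the rating is 100 * measured / limit."""
    total_weight = sum(WEIGHTS.values())
    score = 0.0
    for name, limit in LIMITS.items():
        relative_weight = WEIGHTS[name] / total_weight
        quality_rating = 100.0 * sample[name] / limit
        score += relative_weight * quality_rating
    return score


def category(score):
    """Map a WQI score to a class label (assumed class boundaries)."""
    if score < 50:
        return "excellent"
    if score < 100:
        return "good"
    if score < 200:
        return "poor"
    if score < 300:
        return "very poor"
    return "unsuitable"


# Made-up sample, for illustration only.
example = {"pH": 7.4, "EC": 900.0, "TDS": 420.0, "TH": 210.0,
           "Ca": 48.0, "Cl": 110.0, "F": 0.8}
print(round(wqi(example), 1), category(wqi(example)))
```

With this formulation, a sample sitting exactly at every limit scores 100, so lower scores indicate better water quality.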

Conclusion: Unsupervised learning methods, namely K-means, K-medoids and hierarchical clustering, were applied to the physicochemical parameters of the collected water samples. The cluster analysis results were compared with the calculated WQI values. The three clusters overlapped with each other to a small degree. In the study area, only tuples in the excellent, good and poor categories were found for drinking purposes. Outlier analysis was then performed using the box plot method and the K-means clustering method. Outlier analysis with K-means clustering extracted various interesting hidden patterns from the data.
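
The box plot method of outlier detection can be sketched with the usual 1.5 x IQR whisker rule. This is a minimal illustration, assuming that conventional rule and the "inclusive" quartile method from Python's `statistics` module; the abstract does not say which quartile convention was used. The function name `boxplot_outliers` is hypothetical, and the readings are made up, with 3.58 included only to echo the reported maximum fluoride value.

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return (sample_number, value) pairs lying outside the box plot
    whiskers, i.e. below Q1 - whisker*IQR or above Q3 + whisker*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    # Sample numbers are 1-based to match how samples are usually labeled.
    return [(i, v) for i, v in enumerate(values, start=1)
            if v < lower or v > upper]


# Made-up readings for illustration; NOT the study's data.
readings = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 3.58]
print(boxplot_outliers(readings))
```

Applied per parameter, this flags individual extreme readings; a cluster-based approach would instead flag samples that sit far from every cluster centre across all parameters at once.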

Keywords: Groundwater, data mining, clustering, outliers, classification, WQI.

DOI: https://dx.doi.org/10.2174/1872212113666190211144935
Print ISSN: 1872-2121
Online ISSN: 2212-4047
Publisher Name: Bentham Science Publisher
Journal: Recent Patents on Engineering