Preface
Page: ii-iii (2)
Author: Krishna Kumar Mohbey, Arvind Pandey and Dharmendra Singh Rajput
DOI: 10.2174/9789811490491120010002
List of Contributors
Page: iv-iv (1)
Author: Krishna Kumar Mohbey, Arvind Pandey and Dharmendra Singh Rajput
DOI: 10.2174/9789811490491120010003
Data Analytics on Various Domains with Categorized Machine Learning Algorithms
Page: 1-18 (18)
Author: R. Suguna and R. Uma Rani
DOI: 10.2174/9789811490491120010004
Abstract
Data analytics is an emerging area concerned with analyzing many kinds of data. Predictive analytics, one of its essential techniques, uses machine learning algorithms to predict outcomes from data. Machine learning algorithms fall broadly under supervised and unsupervised methods, and they perform well on data when combined with various analytics methods. Regression is a useful and familiar statistical method for analyzing data. Analysis of medical data helps both patients and experts identify and rectify problems and avoid future ones. Autism is a neurological disorder that, according to the chapter, is increasing among children from birth because of chemicals in some food items, side effects of other treatments, and various other causes. Logistic regression is a supervised machine learning algorithm that operates on datasets with a binary outcome, coded as 0 and 1.
Agricultural data is among the primary data that should be analyzed for the benefit of future generations. Rainfall is an elementary requirement worldwide, and especially for countries whose economic backbone is agriculture. Agriculture is affected by topography, geography, politics, and other socio-economic factors. At the same time, the demand for food and food products is intensifying. Crop production in particular depends on rainfall, so predicting rainfall and crop production is essential. Analysis of data on social crime is indispensable because analytics can yield results that help reduce the crime level. Child abuse is increasing day by day in India. Linear regression is a supervised machine learning algorithm for predicting quantitative data efficiently.
This chapter covers datasets from several domains: autism data from the medical domain, rainfall and crop production data from agriculture, and child abuse data from the social domain. Predictive analytics is an analytical approach that predicts future values from data. Supervised machine learning algorithms, namely linear and logistic regression, are used to perform the prediction.
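As a rough illustration of the two algorithms named in this abstract, the following minimal sketch fits a one-predictor linear regression by ordinary least squares and a one-predictor logistic regression by batch gradient descent. It is not the chapter's implementation: the function names, the learning rate, the number of epochs, and the sample numbers are all assumptions made up for illustration.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """One-predictor logistic regression for 0/1 labels; returns (w, b).

    lr and epochs are illustrative defaults, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log-loss
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_label(w, b, x):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical quantitative data (e.g. rainfall vs. crop yield): made-up numbers.
slope, intercept = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])

# Hypothetical binary data (e.g. a screening score vs. a 0/1 diagnosis).
w, b = fit_logistic([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
```

Linear regression returns a numeric prediction, while logistic regression returns a probability that is thresholded into a 0/1 class, which is why the abstract pairs the former with quantitative data and the latter with binary data.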
Quantifying Players’ Monopoly in a Cricket Team: An Application of Bootstrap Sampling
Page: 19-30 (12)
Author: Bireshwar Bhattacharjee and Dibyojyoti Bhattacharjee
DOI: 10.2174/9789811490491120010005
Abstract
Cricket is a bat-and-ball game played between two teams of 11 players each. In limited-overs cricket, the teams play for a fixed number of overs, usually 50 or 20. At the end of the match, the team that scores the most runs in its allotted overs wins. In this paper, using data from the ICC Cricket World Cup 2019, an attempt is made to identify the type of competition that exists among the players of a team, i.e., batsmen and bowlers, using the Herfindahl-Hirschman Index (HHI). This index is a statistical device used for estimating the degree of concentration in a particular market. A team is said to have a monopoly in scoring runs if the bulk of its runs in the tournament is scored by only a few batsmen while the other batsmen make an insignificant contribution with the bat. Likewise, a team is said to have a bowlers' monopoly if the majority of its wickets are taken by a few bowlers while the other bowlers dismiss an insignificant number of opposing batsmen in the tournament. Applying bootstrap sampling, the teams are classified into three groups, viz., monopoly, moderately competitive, and perfectly competitive. The analysis finds that India, Australia, Bangladesh, and New Zealand are the teams where a batting monopoly exists, i.e., most of the runs are scored by two or three batsmen. All other teams except Pakistan, i.e., Afghanistan, West Indies, Sri Lanka, South Africa, and England, are categorized as having perfect competition in the task of run-scoring. In the case of bowlers, on the other hand, Australia, Pakistan, Sri Lanka, and Bangladesh show a monopolistic pattern. All other teams, i.e., India, New Zealand, West Indies, Afghanistan, South Africa, and England, are categorized as having perfect competition in the task of taking wickets. The study finds that, of the four semifinalists, three teams have a monopoly of batsmen and three teams have perfect competition among bowlers.
Thus, the work concludes that the monopoly of batsmen in a cricket team and perfect competition amongst bowlers have a role to play in the performance of teams in the tournament.
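The two ingredients named in this abstract, the HHI and bootstrap sampling, can be sketched as follows. This is only an illustration under stated assumptions: the HHI is computed as the sum of squared shares, the bootstrap resamples players with replacement and reports a 95% percentile interval, and the run totals are invented. The abstract does not give the chapter's resampling scheme or its cut-offs for the three groups, so none are assumed here.

```python
import random


def hhi(contributions):
    """Herfindahl-Hirschman Index: sum of squared shares of the total.

    Ranges from 1/k (k equal contributors) to 1 (a single contributor).
    """
    total = sum(contributions)
    if total <= 0:
        raise ValueError("contributions must sum to a positive number")
    return sum((c / total) ** 2 for c in contributions)


def bootstrap_hhi(contributions, n_resamples=2000, seed=0):
    """Percentile bootstrap interval (2.5%, 97.5%) for the HHI.

    Resamples the players' contributions with replacement; this scheme is
    an assumption for illustration, not the chapter's stated procedure.
    """
    rng = random.Random(seed)
    k = len(contributions)
    values = []
    for _ in range(n_resamples):
        sample = [rng.choice(contributions) for _ in range(k)]
        if sum(sample) > 0:  # skip degenerate all-zero resamples
            values.append(hhi(sample))
    values.sort()
    lower = values[int(0.025 * (len(values) - 1))]
    upper = values[int(0.975 * (len(values) - 1))]
    return lower, upper


# Hypothetical tournament run totals for one team's batsmen (made-up numbers).
runs = [648, 443, 273, 226, 223, 81, 77, 14]
team_hhi = hhi(runs)
interval = bootstrap_hhi(runs)
```

A team whose HHI sits near 1/k spreads its runs evenly across players, while a value well above that indicates concentration in a few batsmen; the bootstrap interval gives a sense of how stable that reading is for a small squad.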
On Mean Estimation Using a Generalized Class of Chain Type Estimator under Successive Sampling
Page: 31-46 (16)
Author: Shashi Bhushan, Nishi Rastogi and Shailja Pandey
DOI: 10.2174/9789811490491120010006
Abstract
The present paper proposes a generalized class of chain type estimators for estimating the population mean on the current occasion under successive sampling, based on auxiliary information available on both occasions. The proposed generalized class includes two well-known chain type classes proposed by Singh and Vishwakarma [1, 2]. As a particular case, we propose an improvement over their formulation, under some relaxed regularity conditions, which consists of chain type regression estimators in addition to chain type ratio estimators. The construction of the proposed class is useful as a template for constructing chain type classes of estimators in successive sampling. In terms of efficiency, we provide a comparative study of the proposed class over the sample mean estimator, Cochran's estimator [3], the Sukhatme et al. estimator [4], and Singh’s estimator [5]. A numerical illustration is given in support of the proposed class.
Log Type Estimators of Population Mean Under Ranked Set Sampling
Page: 47-74 (28)
Author: Shashi Bhushan and Anoop Kumar
DOI: 10.2174/9789811490491120010007
Abstract
This paper considers some log type and regression cum log type classes of estimators under ranked set sampling. The suggested classes of estimators are found to be better than most of the estimators proposed to date and as efficient as the usual regression estimator under ranked set sampling. The theoretical findings are supported by a simulation study carried out over some artificially generated symmetric and asymmetric populations. Also, following McIntyre [1], Dell [2], and Dell and Clutter [3], we have investigated the effect of skewness and kurtosis on the efficiency of the proposed classes of estimators.
Analysis of Bivariate Survival Data using Shared Inverse Gaussian Frailty Models: A Bayesian Approach
Page: 75-88 (14)
Author: Arvind Pandey, Shashi Bhushan, Lalpawimawha and Shikhar Tyagi
DOI: 10.2174/9789811490491120010008
Abstract
Frailty models are used in survival analysis to account for unobserved heterogeneity in individual risks of disease and death. Shared frailty models have been suggested for analyzing bivariate data on related survival times (e.g., matched pairs experiments, twin or family data). This paper introduces the shared Inverse Gaussian (IG) frailty model with Weibull exponential, Lomax, and logistic exponential baseline distributions. We introduce a Bayesian estimation procedure using the Markov Chain Monte Carlo (MCMC) technique to estimate the parameters involved in these models. We present a simulation study to compare the true values of the parameters with the estimated values. We also apply these models to the real-life bivariate survival data set of McGilchrist and Aisbett [1] on kidney infection, and a better model is suggested for the data.
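For orientation, a shared frailty model of this kind is commonly written as below. This is the standard textbook form under stated assumptions, not a formula taken from the chapter: the frailty Z_i is shared by both members of pair i, it follows an inverse Gaussian distribution with mean 1 and variance \theta, the covariates X_i are common to the pair, and H_{0j} denotes the cumulative baseline hazard of the j-th survival time. The chapter's own notation and parameterization may differ.

```latex
% Conditional hazard of the j-th survival time (j = 1, 2) in pair i,
% given the shared frailty Z_i
h(t_{ij} \mid Z_i) = Z_i \, h_{0j}(t_{ij}) \, \eta_i ,
\qquad \eta_i = \exp(X_i^{\top}\beta)

% Laplace transform of an inverse Gaussian frailty with mean 1 and variance \theta
L_Z(s) = \exp\!\left[ \frac{1 - (1 + 2\theta s)^{1/2}}{\theta} \right]

% Unconditional bivariate survival function, obtained by evaluating L_Z
% at s = \eta_i \{ H_{01}(t_{i1}) + H_{02}(t_{i2}) \}
S(t_{i1}, t_{i2}) = \exp\!\left[
  \frac{1 - \bigl( 1 + 2\theta\,\eta_i \{ H_{01}(t_{i1}) + H_{02}(t_{i2}) \} \bigr)^{1/2}}{\theta}
\right]
```

Under these assumptions, swapping the baseline distribution (Weibull exponential, Lomax, or logistic exponential) only changes the cumulative hazards H_{01} and H_{02}; the frailty part of the expression stays the same.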
An Efficient Approach for Weblog Analysis using Machine Learning Techniques
Page: 89-98 (10)
Author: Brijesh Bakariya
DOI: 10.2174/9789811490491120010009
Abstract
Information on the internet is growing rapidly day by day. Some of this information is relevant to a given user, and some is not. The amount of data on the internet is vast, and it is difficult to store and manage. The organization of such massive amounts of data has therefore also created problems in accessing data. The rapid expansion of the web has provided an excellent opportunity to analyze web access logs. Data mining techniques have been applied to extract relevant information from massive collections of data, but these are now regarded as traditional techniques. Web data is either unstructured or semi-structured, so no data mining method applies to it directly. Here, the Python programming language and a Machine Learning (ML) approach are used for handling such types of data. In this paper, we analyze weblog data using Python. This approach is useful from the point of view of time and space because Python has many libraries for data analysis.
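A first step in this kind of weblog analysis is turning semi-structured log lines into records that can be counted. The sketch below uses only the Python standard library and assumes the logs follow the Common Log Format; the abstract does not state the log format, the libraries, or the ML technique the chapter uses, so the pattern, the function names, and the sample lines are all hypothetical.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return the named fields of one log line as a dict, or None if malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count requests per path and per HTTP status, skipping malformed lines."""
    hits = Counter()
    statuses = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        hits[record["path"]] += 1
        statuses[record["status"]] += 1
    return hits, statuses


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.1 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.7 - - [10/Oct/2020:13:56:01 +0000] "GET /about.html HTTP/1.1" 404 -',
    'this line is not a valid log entry',
]
page_hits, status_counts = summarize(sample)
```

Once the lines are parsed into dictionaries, the same records can be handed to a data-analysis or ML library for clustering or classification of sessions.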
An Epidemic Analysis of COVID-19 using Exploratory Data Analysis Approach
Page: 99-111 (13)
Author: Chemmalar Selvi G. and Lakshmi Priya G. G.
DOI: 10.2174/9789811490491120010010
Abstract
The recent explosion of data has empowered business growth by adding business value from the available digital information. Data is gathered from diverse information systems to draw meaningful inferences that help promote business value. The approach used to study the characteristics of such data and analyze it thoroughly is Exploratory Data Analysis (EDA), which is the most critical phase of data analysis. The main objective of the EDA process is to uncover the hidden facts in massive data and discover meaningful patterns of information that affect business value. EDA can be divided into two methods, namely graphical and non-graphical EDA. Graphical EDA is a quick and powerful technique that presents a summary of the data in graphical or pictorial form. Graphical visualization displays the correlation and distribution of the data even before statistical techniques are applied to it. Non-graphical EDA, on the other hand, presents a statistical evaluation of the data, covering its key characteristics and statistical summary. Based on the nature of the attributes, these two methods are further divided into univariate, bivariate, and multivariate EDA processes. Univariate EDA shows the statistical summary of an individual attribute in the raw dataset, bivariate EDA demonstrates the correlation or interdependencies between actual and target attributes, and multivariate EDA is performed to identify the interactions among more than two attributes. Hence, EDA techniques are used to clean, preprocess, and visualize the data to draw the conclusions required to solve business problems. This chapter therefore presents a comprehensive synopsis of the different tools and techniques that can be applied, with a suitable programming framework, during the initial phase of the EDA process.
To make them easier to understand, the aforementioned EDA techniques are explained with appropriate theoretical concepts along with a suitable case study.
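The non-graphical univariate and bivariate steps described in this abstract can be sketched with the Python standard library alone. This is an illustrative sketch, not the chapter's code: the function names are hypothetical, the daily counts are made-up numbers rather than real COVID-19 data, and the abstract does not name the programming framework the chapter uses.

```python
import math
import statistics


def univariate_summary(values):
    """Non-graphical univariate EDA: basic descriptive statistics of one attribute."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_r(xs, ys):
    """Non-graphical bivariate EDA: Pearson correlation between two attributes."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical daily figures (made-up numbers for illustration only).
tests_done = [120, 150, 180, 240, 300, 310, 400]
new_cases = [10, 14, 15, 22, 30, 29, 41]

cases_summary = univariate_summary(new_cases)
tests_vs_cases = pearson_r(tests_done, new_cases)
```

The graphical counterparts of these two steps would be a histogram of one attribute and a scatter plot of one attribute against another.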
Subject Index
Page: 112-112 (1)
Author: Krishna Kumar Mohbey, Arvind Pandey and Dharmendra Singh Rajput
DOI: 10.2174/9789811490491120010011
Introduction
This book presents a selection of recent, representative developments in predictive analytics using big data technologies. It focuses on some critical aspects of big data and machine learning and provides case studies for readers. The chapters address a comprehensive range of advanced data technologies used for statistical modeling towards predictive analytics. Topics covered in this book include:
- Categorized machine learning algorithms
- Player monopoly in cricket teams
- Chain type estimators
- Log type estimators
- Bivariate survival data using shared inverse Gaussian frailty models
- Weblog analysis
- COVID-19 epidemiology
This reference book will be of significant benefit to the predictive analytics community as a useful guide to the latest research in this emerging field.