Abstract
With the growing volume, velocity, and variety of data, traditional data analysis
approaches are no longer adequate for diverse analysis challenges. Traditionally,
organizations have relied on the data warehouse, an integrated repository of data drawn
from various sources and used for management and business decision-making.
Data in the warehouse is already transformed and structured, and it is kept on costly but
reliable storage. The warehouse also omits any data that was not considered necessary at
the time it was constructed. With the advent of big data
and to address the data silo problem, the concept of the data lake was introduced to
support data analysis. Data lakes have not replaced the data warehouse but rather complement
it. This chapter first introduces the data lake and compares it with its predecessor
technologies, then discusses various tools and techniques for implementing a data lake.