Performance Challenges and Solutions in Big Data Platform Hadoop

Balraj Singh; Harsh K Verma; Vishu Madaan
doi:10.2174/2666255816666230608165146
Abstract

Background: Executing complex analytics on large-scale data demands continuous improvement and capabilities beyond those of traditional systems.
Objective: The need to process diverse data types and to deliver solutions for different industry domains is rising. This need increases the demand for sophisticated techniques and methods that further enhance existing platforms and mechanisms, and it gives the research community an opportunity to examine existing systems, identify potential issues, and propose improvements. Hadoop is a popular choice for managing and processing Big data. It is an open-source platform and a front-runner in the batch processing of large-scale jobs, and the cost of scaling a cluster is low compared with other platforms. However, this popularity by no means guarantees high performance in all scenarios. As data and industrial requirements continue to evolve, it is imperative to investigate new methods and techniques that advance the existing system.
Method: This paper presents a systematic review to provide insight into current progress in this field. Research publications from various sources are collected and analyzed. The performance of a cluster depends largely on the job processing mechanisms and policies associated with it.
Conclusion: Although extensive studies and solutions have been proposed, performance bottlenecks in load balancing, resource utilization, content management, and efficient processing persist. Few scheduling solutions address the trade-off between different parameters, the process of content splitting and merging has not been explored to a large extent, and skew mitigation solutions focus mainly on the Reduce side of MapReduce, while the Map side is little used for load balancing.

DOI: https://dx.doi.org/10.2174/2666255816666230608165146
Print ISSN: 2666-2558
Online ISSN: 2666-2566
Publisher: Bentham Science Publisher
Journal: Recent Advances in Computer Science and Communications
