
Recent Patents on Engineering

ISSN (Print): 1872-2121
ISSN (Online): 2212-4047

Research Article

Evaluation of Reliability in Request-segregated Clouds through Remediations Predicting Hardware Failure Scheme

Author(s): Rohit Sharma*, Vibhash Yadav and Raghuraj Singh

Volume 19, Issue 1, 2025

Published on: 16 October, 2023

Article ID: e161023222245 Pages: 16

DOI: 10.2174/0118722121260903231009104614

Abstract

Background: Cloud services have become a popular way to deliver efficient services for a wide range of activities. Predicting hardware failures in a cloud data center can reduce downtime and make the system more reliable and fault-tolerant.

Objective: This research aims to analyze a predictive hardware failure model based on machine learning that anticipates the required remediations for undiagnosed failures in a cloud computing system serving multiclass requests.

Methods: The model is tested on a cloud data center designed to categorize incoming requests as web, compute, storage, or dedicated server requests. To demonstrate improved reliability, a test case is run on ReliaCloud-NS, a simulator for building a cloud computing system (CCS) and computing its reliability.

Results: Using this model considerably enhanced the reliability of the cloud computing system compared with not using it.

Conclusion: Although various patented estimation methods exist for evaluating the system reliability of a cloud computing network, this study focused mainly on improving the reliability of request-segregated clouds when hardware resources such as CPU, memory, bandwidth, and hard disk fail. The prediction model could potentially be extended to other system resources such as GPUs, software, and database packages.
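
Illustrative sketch: The comparison described in the abstract (reliability with and without predictive remediation) can be illustrated with a small Monte Carlo simulation. This is not ReliaCloud-NS and not the authors' model. The function name `estimate_reliability`, the per-resource failure probability, and the prediction recall are hypothetical values chosen only for illustration; the request classes and resource types are the ones named in the abstract.

```python
import random

# Request classes and hardware resources named in the abstract.
REQUEST_CLASSES = ("web", "compute", "storage", "dedicated")
RESOURCES = ("cpu", "memory", "bandwidth", "disk")


def estimate_reliability(trials, fail_prob, predict_recall, seed=0):
    """Monte Carlo estimate of the fraction of requests served, per class.

    fail_prob: hypothetical probability that a given resource fails
        while serving one request.
    predict_recall: hypothetical probability that a failure is predicted
        and remediated in time (0.0 models a system without prediction).
    """
    rng = random.Random(seed)
    served = {c: 0 for c in REQUEST_CLASSES}
    total = {c: 0 for c in REQUEST_CLASSES}
    for _ in range(trials):
        cls = rng.choice(REQUEST_CLASSES)  # segregate the request by class
        total[cls] += 1
        ok = True
        for _resource in RESOURCES:
            # Draw both numbers every time so that runs with the same seed
            # see the same failures regardless of predict_recall.
            failed = rng.random() < fail_prob
            remediated = rng.random() < predict_recall
            if failed and not remediated:
                ok = False
        served[cls] += ok
    return {c: served[c] / total[c] for c in REQUEST_CLASSES if total[c]}


baseline = estimate_reliability(20000, fail_prob=0.05, predict_recall=0.0)
with_model = estimate_reliability(20000, fail_prob=0.05, predict_recall=0.8)
for cls in REQUEST_CLASSES:
    print(f"{cls:10s} baseline={baseline[cls]:.3f} with_model={with_model[cls]:.3f}")
```

Because both runs share a seed and consume random numbers in the same order, every failure in the baseline run also occurs in the run with prediction, and a subset of them is remediated. The per-class reliability with prediction therefore can never fall below the baseline in this sketch; how large the gain is depends entirely on the assumed probabilities.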


© 2025 Bentham Science Publishers