A Method based on Multi-agent Systems and Passive Replication Technique for Predicting Failures in Cloud Computing

Ouided Hioual; Ouassila Hioual; Sofiane Mounine Hemam; Lyes Maifi

doi:10.2174/2666255815666220429092324

Abstract

Background: Cloud computing gives users access to the computing capacity of remote machines, so they obtain considerable computing power without owning the hardware. The probability of a failure occurring during execution increases with the number of nodes. Since failures cannot be completely avoided, one solution is to use fault-tolerance mechanisms. Predicting failures has become a major task for engineers and software developers, because failures increase resource usage costs.

Objectives and Methods: This paper presents a hybrid method for predicting failures in a cloud computing environment, based on the passive replication technique and multi-agent systems. It detects failures, improves the average response time and minimizes lost time. The method makes it possible to guarantee, efficiently and transparently, the continuity of cloud computing services in the presence of failures.

Results: The results show that the proposed method performs well in the presence of failures, improves the response time and minimizes the additional costs caused by the failures.

Conclusion: This paper proposes a hybrid method for predicting failures in cloud computing, based on the passive replication technique and multi-agent systems, to detect failures and minimize lost time. The replication technique works by duplicating some system components, which are deployed simultaneously across different resources. This technique aims to make the system robust, increase availability and guarantee the execution of jobs. In addition, it is suitable for long-running tasks.
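As an illustration only, the general idea of passive (primary-backup) replication can be sketched in a few lines of Python. This is a minimal sketch of the textbook scheme, not the method evaluated in the paper: the names `Replica`, `PassiveReplicaGroup`, `submit` and `_fail_over` are hypothetical, failures are simulated by flipping an `alive` flag, and the agents of the proposed method are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a service component (hypothetical name)."""
    name: str
    state: dict = field(default_factory=dict)
    alive: bool = True  # assumption: a failure is simulated by setting this to False

    def execute(self, key, value):
        # In passive replication only the primary executes requests.
        if not self.alive:
            raise RuntimeError(f"{self.name} has failed")
        self.state[key] = value
        return dict(self.state)  # checkpoint: a copy of the updated state


class PassiveReplicaGroup:
    """Primary plus backups; backups only receive state checkpoints."""

    def __init__(self, replicas):
        self.primary = replicas[0]
        self.backups = list(replicas[1:])

    def submit(self, key, value):
        try:
            checkpoint = self.primary.execute(key, value)
        except RuntimeError:
            # The primary failed: promote a backup and retry the request there.
            self._fail_over()
            checkpoint = self.primary.execute(key, value)
        # Propagate the new state to every live backup.
        for backup in self.backups:
            if backup.alive:
                backup.state = dict(checkpoint)
        return checkpoint

    def _fail_over(self):
        live = [b for b in self.backups if b.alive]
        if not live:
            raise RuntimeError("no live replica left")
        self.primary = live[0]
        self.backups = [b for b in self.backups if b is not self.primary]
```

In this sketch a request submitted after the primary fails is served by the promoted backup, which already holds the last checkpointed state, so earlier work is not lost. This is the property that makes the scheme attractive for long-running tasks.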

Keywords: Failure prediction, cloud-computing, multi-agent system, replication, controller agent, replication agent.


Journal: Recent Advances in Computer Science and Communications
Publisher Name: Bentham Science Publisher
Print ISSN: 2666-2558
Online ISSN: 2666-2566
DOI: https://dx.doi.org/10.2174/2666255815666220429092324
