Abstract
A source code clone is a type of code smell in which pieces of code share
the same functional semantics but differ in their syntactic representation. In the
past few years, there have been several studies on code clone detection, driven by
numerous machine learning models, software techniques, and other mathematical
measures. This paper aims to conduct an impartial comparative study of the existing
literature on Deep Learning (DL) and Non-Deep Learning (non-DL) techniques. Because
little work has compared previous and current state-of-the-art tools in code clone
detection, there is no concrete evidence to support the use of Deep Learning
approaches in clone detection, apart from a preference from an evolutionary point of
view. We will investigate several research questions about the motivations
for using DL techniques for code clone detection compared with those for non-DL
approaches (token-based, text-based, AST-based, metrics-based, and others). Furthermore, we will
discuss the challenges faced when implementing Deep Learning for clone detection
and, where feasible, their potential resolutions. This review will help readers
understand how different approaches aid the clone detection process, along with their
performance measures, limitations, issues, and challenges.