Abstract
Aims: We propose a tool that can automatically generate datasets for software defect prediction from GitHub repositories.
Background: DevOps is a software development approach that emphasizes collaboration, communication, and automation in order to improve the speed and quality of software delivery.
Objective: To demonstrate the effectiveness of the tool, we conducted a series of experiments on several popular GitHub repositories and compared the performance of our generated datasets with that of existing datasets.
Methods: The tool works by analyzing the commit history of a given repository and extracting relevant features that can be used to predict defects. These features include code complexity metrics, code churn, and the number of developers involved in a particular code change.
Results: Our results show that the datasets generated by our tool are comparable in quality to existing datasets and can be used to train effective software defect prediction models.
Conclusion: The proposed tool provides a convenient and effective way to generate datasets for software defect prediction that are comparable in quality to existing datasets and can support accurate and reliable prediction models.
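
To illustrate the feature extraction described under Methods, the following is a minimal sketch of how code churn and the number of developers per file could be aggregated from a commit history. It is not the tool's actual implementation: the `Commit` record, its fields, and the `extract_file_features` function are hypothetical names introduced here, the commit data is invented for the example, and code complexity metrics are omitted because they require parsing the source files.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Commit:
    """Hypothetical, simplified commit record (not the tool's actual schema)."""
    sha: str
    author: str
    # Maps file path -> (lines added, lines deleted) in this commit.
    changes: dict


def extract_file_features(commits):
    """Aggregate per-file churn, commit count, and distinct developer count."""
    churn = defaultdict(int)
    n_commits = defaultdict(int)
    authors = defaultdict(set)
    for commit in commits:
        for path, (added, deleted) in commit.changes.items():
            churn[path] += added + deleted   # churn = added + deleted lines
            n_commits[path] += 1
            authors[path].add(commit.author)
    return {
        path: {
            "churn": churn[path],
            "commits": n_commits[path],
            "developers": len(authors[path]),
        }
        for path in churn
    }


# Invented example history, for illustration only.
history = [
    Commit("a1", "alice", {"core.py": (10, 2), "util.py": (3, 0)}),
    Commit("b2", "bob", {"core.py": (4, 4)}),
    Commit("c3", "alice", {"core.py": (1, 0)}),
]
features = extract_file_features(history)
print(features["core.py"])
```

In a real pipeline, the commit records would be read from the repository history (for example, from the output of `git log --numstat`), and each feature row would additionally need a defect label before it could be used to train a prediction model.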