Code smell detection using ensemble machine learning algorithms
Peer reviewed, Journal article
Published version
Permanent link
https://hdl.handle.net/11250/3123409
Publication date
2022
Abstract
Code smells result from not following software engineering principles during software development, especially in the design and coding phases. They lead to low maintainability. Code smell detection can help evaluate the quality of software and its maintainability. Many machine learning algorithms are used to detect code smells. In this study, we applied five ensemble machine learning and two deep learning algorithms to detect code smells. Four code smell datasets were analyzed: the Data class, God class, Feature-envy, and Long-method datasets. In previous works, machine learning and stacking ensemble learning algorithms were applied to these datasets and the results were acceptable, but there is room for improvement. A class balancing technique (SMOTE) was applied to handle the class imbalance problem in the datasets. The Chi-square feature extraction technique was applied to select the more relevant features in each dataset. All five algorithms obtained the highest accuracy, 100%, for the Long-method dataset with the different selected sets of metrics, and the poorest accuracy, 91.45%, was achieved by the Max voting method for the Feature-envy dataset for the selected twelve sets of metrics.
Keywords: code smell, code smell detection, ensemble method, deep learning, Chi-square feature extraction technique, SMOTE class balancing technique
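Two of the techniques named in the abstract, SMOTE class balancing and Max voting, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the abstract does not say which libraries or parameters were used, and the function names `smote_like_oversample` and `max_vote`, the neighbour count `k`, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours, which is
    the core idea behind SMOTE. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def max_vote(predictions):
    """Hard (max) voting: per sample, return the label most models predicted.

    predictions is a list with one list of labels per model (hypothetical
    layout). Ties go to the label that appears first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy predictions from three hypothetical classifiers on four samples
# (1 = smelly, 0 = not smelly).
model_outputs = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
combined = max_vote(model_outputs)  # [1, 0, 1, 1]

# Toy minority-class feature vectors, oversampled with five synthetic points.
minority_samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
synthetic_samples = smote_like_oversample(minority_samples, n_new=5)
```

In practice a study like this would more plausibly rely on maintained library implementations of SMOTE, Chi-square scoring, and voting ensembles than on hand-written versions; the sketch only shows what each step computes.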