Browsing by Author "Seema Dewangan"

A novel approach for code smell detection: An empirical study
(Institute of Electrical and Electronics Engineers Inc., 2021) Seema Dewangan; Rajwant Singh Rao; Alok Mishra; Manjari Gupta
Code smell detection helps improve the understandability and maintainability of software while reducing the chance of system failure. In this study, six machine learning algorithms are applied to predict code smells. Four code smell datasets (God-class, Data-class, Feature-envy, and Long-method), generated from 74 open-source systems, are considered. To evaluate the performance of the machine learning algorithms on these datasets, 10-fold cross-validation is applied, which assesses the model by partitioning the original dataset into a training set to train the model and a test set to evaluate it. Two feature selection techniques, Chi-squared and Wrapper-based, are applied to improve the prediction accuracy of all six machine learning methods by choosing the top metrics in each dataset, and the results obtained with the two techniques are compared. Grid search-based parameter optimization is applied to further improve the accuracy of these algorithms. An accuracy of 100% was obtained for the Long-method dataset using the Logistic Regression algorithm with all features, while the worst performance, 95.20%, was obtained by the Naive Bayes algorithm for the Long-method dataset using the Chi-squared feature selection technique. © 2021 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
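The 10-fold partitioning mentioned in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' code: the function name `k_fold_indices`, the seed, and the sample count of 25 are assumptions chosen only for illustration.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first n_samples % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical dataset of 25 samples split into 10 folds.
folds = list(k_fold_indices(n_samples=25, k=10))
```

Each sample lands in exactly one test fold, so every instance is used for evaluation once and for training in the remaining nine rounds.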
A study for method-level code smells detection using machine learning algorithms
(Academic Press, 2025) Rajwant Singh Rao; Seema Dewangan; Alok Mishra; Manjari Gupta
Motivation: Code smells reflect poor design decisions that degrade software quality and maintainability. Although several machine learning algorithms have been proposed to detect code smells, the impact of feature selection and cross-validation on certain method-level smells, specifically Long Parameter List and Switch Statement, has not been adequately explored in prior research. Methodology: This study employs a rigorous methodology to investigate the detection of four method-level code smells, namely Long Parameter List (LPL), Switch Statement (SS), Feature Envy (FE), and Long Method (LM), using twenty machine learning algorithms. We apply the Information Gain feature selection algorithm and the Equal Width Discretization (EWD) class balancing method. Performance is evaluated using 10-fold cross-validation across multiple metrics: accuracy, precision, recall, F-measure, MCC, ROC-area, and PRC-area. Key Findings: The proposed framework achieved 99.77% accuracy for the Long Method dataset using the Filtered Classifier with feature selection and class balancing. Importantly, this study is the first to demonstrate the effect of feature selection and cross-validation on the LPL and SS datasets, where significant performance improvements are also observed. Contributions: A comprehensive comparative analysis of 20 machine learning algorithms on four method-level code smell datasets. © 2025 The Author(s)
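As a rough illustration of two techniques named in the abstract, the sketch below bins a numeric metric into equal-width intervals and then scores the binned feature by information gain against the class label. Equal Width Discretization is taken here in its usual sense of splitting a value range into equal-width bins. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def equal_width_bins(values, n_bins):
    """Map each value to the index of its equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is placed in the last bin rather than opening a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(binned, labels):
    """Entropy of the labels minus the weighted entropy within each bin."""
    total = len(labels)
    remainder = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: parameter counts per method and a smelly (1) / clean (0) label.
param_counts = [1, 2, 2, 3, 7, 8, 9, 10]
is_smelly = [0, 0, 0, 0, 1, 1, 1, 1]
bins = equal_width_bins(param_counts, n_bins=3)
gain = information_gain(bins, is_smelly)
```

In this toy case the bins separate the two classes perfectly, so the gain equals the full label entropy of one bit; features with higher gain would be ranked first by an Information Gain selector.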
A study of dealing class imbalance problem with machine learning methods for code smell severity detection using PCA-based feature selection technique
(Nature Research, 2023) Rajwant Singh Rao; Seema Dewangan; Alok Mishra; Manjari Gupta
Detecting code smells can be highly helpful for reducing maintenance costs and raising source code quality. Code smells help developers and researchers understand several types of design flaws. Code smells with high severity can cause significant problems for the software and may make the system harder to maintain. It is essential to assess the severity of the code smells detected in software, as this allows refactoring efforts to be prioritized. The class imbalance problem further increases the difficulty of code smell severity detection. In this study, four code smell severity datasets (Data class, God class, Feature envy, and Long method) are selected to detect code smell severity. To address the issue of class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) class balancing technique is applied. Each dataset's relevant features are chosen using a feature selection technique based on principal component analysis. The severity of code smells is determined using five machine learning techniques: K-nearest neighbor, Random forest, Decision tree, Multi-layer Perceptron, and Logistic Regression. This study obtained a severity accuracy score of 0.99 with the Random forest and Decision tree approaches on the Long method code smell dataset. The models' performance is compared based on accuracy and three other performance measurements (Precision, Recall, and F-measure). Performance is also compared and presented with and without applying SMOTE. The results obtained in the study are promising and can be beneficial for paving the way for further studies in this area. © 2023, Springer Nature Limited.
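The core idea behind SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors, can be sketched in a few lines. This is a simplified illustration and not the implementation used in the study; `smote_sketch`, its parameters, and the sample points are assumptions.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (squared Euclidean distance).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

# Hypothetical minority-class samples with two features each.
minority_samples = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_points = smote_sketch(minority_samples, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays within the region the minority class already occupies.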
Code Smell Detection Using Ensemble Machine Learning Algorithms
(MDPI, 2022) Seema Dewangan; Rajwant Singh Rao; Alok Mishra; Manjari Gupta
Code smells are the result of not following software engineering principles during software development, especially in the design and coding phases. They lead to low maintainability. Code smell detection can help evaluate the quality of software and its maintainability. Many machine learning algorithms are being used to detect code smells. In this study, we applied five ensemble machine learning and two deep learning algorithms to detect code smells. Four code smell datasets were analyzed: the Data class, the God class, the Feature-envy, and the Long-method datasets. In previous works, machine learning and stacking ensemble learning algorithms were applied to these datasets and the results found were acceptable, but there is scope for improvement. A class balancing technique (SMOTE) was applied to handle the class imbalance problem in the datasets. The Chi-square feature selection technique was applied to select the more relevant features in each dataset. All five algorithms obtained the highest accuracy, 100%, for the Long-method dataset with the different selected sets of metrics, and the poorest accuracy, 91.45%, was achieved by the Max voting method for the Feature-envy dataset for the selected twelve sets of metrics. © 2022 by the authors.
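Max voting, one of the ensemble methods named in the abstract, combines the predictions of several classifiers by taking the most frequent label for each sample. The sketch below assumes three hypothetical models that have already produced binary smelly/clean labels; the function name and data are illustrative only.

```python
from collections import Counter

def max_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    for votes in zip(*predictions_per_model):
        # Pick the label that the largest number of models predicted.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three classifiers on four samples.
model_outputs = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
final_labels = max_vote(model_outputs)
```

Using an odd number of models with binary labels avoids ties, which this sketch does not otherwise resolve in any principled way.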
Severity Classification of Code Smells Using Machine-Learning Methods
(Springer, 2023) Seema Dewangan; Rajwant Singh Rao; Sripriya Roy Chowdhuri; Manjari Gupta
Code smell detection can be very useful for minimizing maintenance costs and improving software quality. Code smells help developers, programmers, and researchers subjectively interpret design defects in different ways. Code smell instances can vary in size, intensity, or severity, which deserves attention because these affect software quality accordingly. Therefore, this study aims to detect the severity of code smells from code smell datasets. The severity of code smells is significant for reporting code smell detection performance, as it permits refactoring efforts to be prioritized. Code smell severity also describes the extent of effort required during software maintenance. In our work, we have considered four code smell severity datasets: data class, god class, feature envy, and long method. This paper uses four machine-learning and three ensemble learning approaches to identify the severity of code smells. To improve the models' performance, we used the fivefold cross-validation method, a Chi-square-based feature selection algorithm, and parameter optimization techniques. We applied two parameter optimization techniques, namely grid search and random search, and compared their accuracy. The XG Boost model obtained an accuracy of 99.12% using the Chi-square-based feature selection technique for the long method code smell dataset. The results show that ensemble learning performs better than the machine-learning approaches for severity detection of code smells. © 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
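The difference between the two parameter optimization techniques in the abstract can be shown with a small sketch: grid search scores every combination on a fixed grid, while random search scores a fixed number of randomly sampled combinations. The scoring function `toy_score`, the parameter names `depth` and `rate`, and all ranges are hypothetical stand-ins for a real cross-validated model score.

```python
import itertools
import random

def toy_score(depth, rate):
    """Hypothetical stand-in for cross-validated accuracy; peaks at depth=6, rate=0.1."""
    return 1.0 - 0.01 * (depth - 6) ** 2 - (rate - 0.1) ** 2

def grid_search(depths, rates):
    # Exhaustively score every combination on the grid.
    return max(itertools.product(depths, rates), key=lambda p: toy_score(*p))

def random_search(depth_range, rate_range, n_trials, seed=0):
    rng = random.Random(seed)
    # Score n_trials randomly sampled combinations.
    trials = [(rng.randint(*depth_range), rng.uniform(*rate_range)) for _ in range(n_trials)]
    return max(trials, key=lambda p: toy_score(*p))

best_grid = grid_search(depths=[2, 4, 6, 8], rates=[0.01, 0.1, 0.3])
best_random = random_search(depth_range=(2, 8), rate_range=(0.01, 0.3), n_trials=20)
```

Grid search finds the optimum here only because the grid happens to contain it; random search trades that guarantee for a cost that does not grow with the number of grid points per parameter.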