Ensemble Learning and Class Imbalance Mitigation Techniques in Software Fault Prediction
Keywords:
Software Fault Prediction, SMOTE, BorderlineSMOTE, ADASYN, Stacking Ensemble, Neural Network, Multilayer Perceptron, Class Imbalance, Software Quality
Abstract
Software fault prediction is essential for improving software reliability and quality, because it identifies potentially faulty modules at early stages of development. Software defect data is inherently imbalanced: non-faulty instances far outnumber faulty ones, which makes the faulty instances difficult to predict accurately. To address this problem, this study combines several oversampling strategies (the Synthetic Minority Oversampling Technique, SMOTE; BorderlineSMOTE; and Adaptive Synthetic Sampling, ADASYN) to generate synthetic minority samples. These techniques balance the dataset, with particular emphasis on borderline and hard-to-learn instances, which improves model performance.
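To make the oversampling idea concrete, the following is a minimal sketch of the basic SMOTE interpolation step in plain Python. It is a simplified illustration, not the implementation used in this study: the function name `smote_sketch`, its parameters, and the choice to pick base samples at random are assumptions made for the example. BorderlineSMOTE and ADASYN differ mainly in how they choose which minority samples to interpolate from, and libraries such as imbalanced-learn provide ready-made versions of all three.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority
    class. All names here are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_new):
        # Pick a minority sample to start from.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Find its k nearest minority-class neighbours (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        z = minority[rng.choice(neighbours)]
        # Place the new point at a random position on the segment x -> z.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, z)])
    return synthetic
```

For instance, calling `smote_sketch(faulty_rows, n_new=6, k=3)` on a hypothetical list `faulty_rows` of faulty-module feature vectors returns six new vectors, each lying on a line segment between two existing faulty samples.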
We propose a robust ensemble approach that integrates several machine learning models through stacking. The base classifiers are Random Forest, Extra Trees, LightGBM, and CatBoost. Their outputs are used to train a Multilayer Perceptron (MLP) neural network, which acts as the meta-classifier. This stacking ensemble exploits the strengths of the individual learners and uses the neural network to learn complex feature interactions. The MLP has two layers with 32 and 16 neurons, uses ReLU activation, and was trained with the Adam optimizer for 300 epochs.
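The architecture described above could be assembled roughly as follows with scikit-learn's `StackingClassifier`. This is a sketch under stated assumptions rather than the authors' code: the function name `build_stacking_model`, the tree counts, the 5-fold internal cross-validation, the use of class probabilities as meta-features, and the reading of "32 and 16 neurons" as two hidden layers are all choices made for illustration. LightGBM and CatBoost are optional third-party packages, so the sketch adds them only when they can be imported.

```python
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.neural_network import MLPClassifier


def build_stacking_model(include_boosting=True, n_estimators=100, random_state=0):
    """Stacking ensemble: tree-based base learners with an MLP meta-classifier."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=n_estimators,
                                      random_state=random_state)),
        ("et", ExtraTreesClassifier(n_estimators=n_estimators,
                                    random_state=random_state)),
    ]
    if include_boosting:
        # Optional dependencies: skipped silently if not installed.
        try:
            from lightgbm import LGBMClassifier
            base_learners.append(
                ("lgbm", LGBMClassifier(random_state=random_state, verbose=-1)))
        except ImportError:
            pass
        try:
            from catboost import CatBoostClassifier
            base_learners.append(
                ("cat", CatBoostClassifier(random_seed=random_state, verbose=0)))
        except ImportError:
            pass

    # Assumption: 32 and 16 are hidden-layer widths. With the Adam solver,
    # max_iter is an upper bound on the number of training epochs.
    meta = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                         solver="adam", max_iter=300,
                         random_state=random_state)

    # The meta-classifier is trained on out-of-fold base-model probabilities.
    return StackingClassifier(estimators=base_learners, final_estimator=meta,
                              cv=5, stack_method="predict_proba")
```

A typical use would be `model = build_stacking_model()` followed by `model.fit(X_train, y_train)`, where `X_train` and `y_train` are hypothetical names for the oversampled training data. Oversampling should be applied to the training split only, so that the test data keeps its original class distribution.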
We evaluated the model on benchmark datasets such as PC1, JM1, and KC2. Experimental results show that the proposed NN Meta Stacking model is considerably more accurate than traditional classifiers when it is combined with the improved oversampling techniques. The highest accuracy was obtained on the PC1 dataset, which indicates that the model is robust. Our results indicate that advanced oversampling, used in conjunction with deep ensemble learning, effectively improves fault prediction in software systems.











