A novel oversampling and feature selection hybrid algorithm for imbalanced data classification

Feng, Fang and Li, Kuan-Ching and Yang, Erfu and Zhou, Qingguo and Han, Lihong and Hussain, Amir and Cai, Mingjiang (2023) A novel oversampling and feature selection hybrid algorithm for imbalanced data classification. Multimedia Tools and Applications, 82 (3). 3231–3267. ISSN 1380-7501 (https://doi.org/10.1007/s11042-022-13240-0)

Text. Filename: Feng_etal_MTA_2022_A_novel_oversampling_and_feature_selection_hybrid_algorithm_for_imbalanced_data_classification.pdf
Accepted Author Manuscript
License: Strathprints license 1.0


Abstract

Traditional classification approaches tend to produce classifier bias on imbalanced data sets, resulting in poor classification performance for minority classes. In particular, imbalanced data are common in financial fraud, network intrusion, and fault detection, where the recognition rate of minority classes is more important than the classification performance of majority classes. Therefore, there is a pressing need to develop efficient algorithms to solve the class imbalance problem. To this end, this article presents a novel hybrid algorithm, Negative Binary General (NBG), to improve the performance of imbalanced classification by combining an oversampling algorithm and a feature selection algorithm. A novel oversampling algorithm, Negative-positive Synthetic Minority Oversampling Technique (NPSMOTE), improves the practicability of sample generation, while the Binary Ant Lion Optimizer (BALO) algorithm extracts the most significant features to improve the classification performance. Simulation experiments carried out using seven benchmark imbalanced data sets demonstrate that the proposed NBG algorithm significantly outperforms nine other existing and six recently published algorithms in classifying imbalanced small-sample data sets.
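
For readers unfamiliar with SMOTE-style oversampling, the general idea that NPSMOTE builds on can be sketched as follows. This is a minimal illustration of the classic SMOTE interpolation step only; it is not the authors' NPSMOTE, whose details are not given in the abstract. The function name `smote_like_oversample`, its parameters, and the toy data are all assumptions made for this example.

```python
import random
from math import dist


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by linear interpolation.

    Hypothetical helper illustrating classic SMOTE, not the paper's NPSMOTE.
    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data (assumed for illustration): 4 minority vs 12 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(5.0, 5.0)] * 12

# Generate enough synthetic points to match the majority class size.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

In a hybrid pipeline of the kind the abstract describes, a feature selection step would then be applied alongside the rebalanced data before training the classifier; that step is omitted here because the abstract does not specify how BALO is configured.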