Evolutionary data analysis for the class imbalance problem
Title:
Evolutionary data analysis for the class imbalance problem
Author:
Khoshgoftaar, Taghi M.; Seliya, Naeem; Drown, Dennis J.
Appeared in:
Intelligent data analysis
Paging:
Volume 14 (2010), no. 1, pages 69-88
Year:
2010-02-04
Contents:
Class imbalance, where the classes in a dataset are not represented equally, is a common occurrence in machine learning. Classification models built on such datasets are often impractical because most machine learning algorithms tend to perform poorly on minority class instances. We present an evolutionary computing-based data sampling approach as an effective solution to the class imbalance problem. The genetic algorithm-based approach, Evolutionary Sampling, works as a majority undersampling technique in which instances from the majority class are selectively removed. This preserves the relative integrity of the majority class while leaving the original minority class intact. Our research prototype, eVann, also implements genetic algorithm-based optimization of modeling parameters for the machine learning algorithms considered in our study. An extensive empirical investigation on four real-world datasets compares the proposed approach with existing data sampling techniques that target the class imbalance problem. Our results demonstrate that Evolutionary Sampling, both with and without learner optimization, performs better than the other data sampling techniques in the comparison. The detailed coverage of our case studies in this paper is intended to support empirical replication.
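To make the idea of genetic algorithm-based majority undersampling concrete, the following is a minimal sketch and not the eVann implementation. The abstract does not specify the chromosome encoding, fitness function, or genetic operators, so everything here is an assumption chosen for illustration: one bit per majority instance (1 = keep), a nearest-centroid classifier scored by the geometric mean of per-class recall as the fitness, tournament selection, one-point crossover, bit-flip mutation, and two-member elitism. The names `gmean_fitness` and `evolutionary_undersample` are hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def gmean_fitness(mask, majority, minority):
    """Assumed fitness: geometric mean of per-class recall of a
    nearest-centroid classifier trained on the kept majority instances
    plus ALL minority instances, evaluated on the full data."""
    kept = [x for x, keep in zip(majority, mask) if keep]
    if not kept:  # removing every majority instance is not a valid sample
        return 0.0
    c_maj, c_min = centroid(kept), centroid(minority)

    def is_minority(x):
        return math.dist(x, c_min) < math.dist(x, c_maj)

    tpr = sum(is_minority(x) for x in minority) / len(minority)
    tnr = sum(not is_minority(x) for x in majority) / len(majority)
    return math.sqrt(tpr * tnr)


def evolutionary_undersample(majority, minority, pop_size=20, generations=30,
                             mutation_rate=0.05, seed=0):
    """Search for a keep/remove mask over the majority class.
    The minority class is never altered. Returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(majority)

    def fit(mask):
        return gmean_fitness(mask, majority, minority)

    # Seed the population with the unsampled data (all ones) plus random masks,
    # so that, with elitism, the result is never worse than no sampling.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fit, reverse=True)
        next_gen = ranked[:2]  # elitism: carry over the two best masks
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = max(rng.sample(population, 3), key=fit)
            p2 = max(rng.sample(population, 3), key=fit)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fit)
    return best, fit(best)


if __name__ == "__main__":
    # Synthetic two-dimensional toy data, for demonstration only.
    rng = random.Random(1)
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
    minority = [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(8)]
    mask, score = evolutionary_undersample(majority, minority)
    print(sum(mask), "of", len(mask), "majority instances kept;",
          "G-mean =", round(score, 3))
```

In a realistic setting the fitness would more plausibly be computed with the target learner and a held-out or cross-validated performance measure; the nearest-centroid classifier is used here only to keep the sketch dependency-free.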