Digital Library
 
 
  Detecting noisy instances with the rule-based classification model
 
 
Authors: Taghi M. Khoshgoftaar, Naeem Seliya, Kehan Gao
Appeared in: Intelligent Data Analysis
Paging: Volume 9 (2005), no. 4, pages 347-364
Date: 2005-08-29
Contents: The performance of a classification model is invariably affected by the characteristics of the measurement data on which it is built. If the quality of the data is generally poor, the classification model will perform poorly. The number of noisy instances in a given dataset is a good reflection of the quality of the data. Detecting and removing noisy instances improves the quality of the data and, consequently, the performance of the classification model. This study presents an attractive and user-friendly approach for detecting data noise based on Boolean rules generated from the measurement data. The approach is simple and replicable: it analyzes the rules to detect mislabeled instances in the training dataset. Such instances are treated as data noise and are removed to obtain a clean dataset. A case study of a software measurement dataset with known noisy instances is used to demonstrate the effectiveness of our approach. The dataset is obtained from a NASA software project developed for real-time predictions based on simulations. It is empirically demonstrated that the proposed approach is extremely effective in detecting noise in the dataset; in fact, it detected 100% of the known noisy instances. The proposed approach is compared with noise filtering based on five classification filters and with an ensemble filter of five classifiers. We also demonstrate that the proposed approach shows excellent promise in detecting noisy instances in six independent, real-world software measurement datasets whose noisy instances are unknown.
Publisher: IOS Press
Source file: Elektronische Wetenschappelijke Tijdschriften (Electronic Scientific Journals)
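The abstract names an ensemble filter of five classifiers as a baseline but does not describe how it votes. As a rough illustration only, the Python sketch below shows a commonly used majority-vote filter and its stricter consensus-vote variant. An instance is flagged as noisy when more than half (majority) or all (consensus) of the base classifiers disagree with its recorded label. This is not the authors' Boolean-rule method. The function name ensemble_filter, the two voting schemes, the toy labels, and the assumption that each classifier's predictions come from cross-validation are all hypothetical choices made for this sketch.

```python
from typing import List, Sequence


def ensemble_filter(labels: Sequence[int],
                    predictions: Sequence[Sequence[int]],
                    scheme: str = "majority") -> List[int]:
    """Return the indices of instances flagged as noisy (hypothetical helper).

    labels: the recorded class label of each training instance.
    predictions: one list per base classifier holding its predicted label
        for every instance (assumed to come from cross-validation, so no
        classifier predicts on data it was trained on).
    scheme: "majority" flags an instance when more than half of the
        classifiers misclassify it; "consensus" requires all of them to.
    """
    n_classifiers = len(predictions)
    if n_classifiers == 0:
        raise ValueError("need at least one classifier")
    if any(len(p) != len(labels) for p in predictions):
        raise ValueError("each prediction list must match the number of labels")

    noisy = []
    for i, label in enumerate(labels):
        # Count how many base classifiers disagree with the recorded label.
        errors = sum(1 for p in predictions if p[i] != label)
        if scheme == "majority":
            flagged = errors > n_classifiers / 2
        elif scheme == "consensus":
            flagged = errors == n_classifiers
        else:
            raise ValueError(f"unknown scheme: {scheme!r}")
        if flagged:
            noisy.append(i)
    return noisy


# Toy example (made-up data): 5 instances, 5 base classifiers.
labels = [0, 1, 1, 0, 1]
predictions = [
    [0, 1, 0, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 1, 1],
]
print(ensemble_filter(labels, predictions))               # [2, 3]
print(ensemble_filter(labels, predictions, "consensus"))  # [3]
```

Majority voting removes more suspect instances at the risk of discarding some correctly labeled but hard cases. Consensus voting is more conservative and keeps an instance unless every classifier rejects its label.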
 
 

 
 Koninklijke Bibliotheek - National Library of the Netherlands