Title:
The application of non-parametric techniques to solve classification problems in complex data sets in veterinary epidemiology – An example
Authors:
Stärk, Katharina D.C.; Pfeiffer, Dirk U.
Published in:
Intelligent data analysis
Pagination:
Volume 3 (2013), no. 1, pages 23-35
Year:
2013-06-14
Abstract:
Statistical classification problems are very common in veterinary epidemiology. Traditionally, parametric techniques such as logistic regression or discriminant analysis are used to analyse data sets that contain several classes of observations. However, characteristics of the data set such as high dimensionality, multicollinearity and non-homogeneity can make a data set unsuitable for parametric techniques. In this article, classification tree algorithms (ID3, C4.5, CHAID, CART) and artificial neural networks are suggested as non-parametric alternatives. Their application is illustrated using a field data set containing pig farms with 3 levels of respiratory disease prevalence. The performance of non-parametric classification algorithms is compared with results from multinomial logistic regression. None of the algorithms was significantly better than the others. The proportions of correctly classified farms were between 84% and 96%. However, the data set was small (86 observations), which created technical problems when using the artificial neural networks and multinomial logistic regression. The choice of statistical technique should therefore be based on the objectives of the study and the data set under consideration. Classification trees are well-suited for exploratory data analysis. They are easy to apply and worth considering.
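Illustration:
The abstract names ID3 among the classification tree algorithms. ID3 chooses each split by information gain, that is, the reduction in class entropy after partitioning on an attribute. The following is a minimal sketch of that criterion only, not the authors' implementation. The farm records, the attribute names (`herd_size`, `ventilation`) and their values are hypothetical and are not taken from the study's data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the resulting subsets.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy farms with 3 prevalence levels (NOT data from the study).
farms = [
    {"herd_size": "large", "ventilation": "poor", "prevalence": "high"},
    {"herd_size": "small", "ventilation": "poor", "prevalence": "high"},
    {"herd_size": "large", "ventilation": "fair", "prevalence": "medium"},
    {"herd_size": "small", "ventilation": "fair", "prevalence": "medium"},
    {"herd_size": "large", "ventilation": "good", "prevalence": "low"},
    {"herd_size": "small", "ventilation": "good", "prevalence": "low"},
]

# Score each candidate attribute and pick the one ID3 would split on first.
gains = {a: information_gain(farms, a, "prevalence") for a in ("herd_size", "ventilation")}
best = max(gains, key=gains.get)
print(best, gains)
```

In this toy set the hypothetical `ventilation` attribute separates the three prevalence classes completely, while `herd_size` carries no information about them, so an ID3-style tree would split on `ventilation` first. C4.5, CHAID and CART use different split criteria (gain ratio, chi-squared tests and, typically, Gini impurity, respectively).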