Automatic Induction of Rule Based Text Categorization
Title:
Automatic Induction of Rule Based Text Categorization
Author:
D. Maghesh Kumar
Published in:
International journal of computer science and information technology
Pagination:
Volume 2 (2010), no. 6, pages 163-172
Year:
2010
Abstract:
The automated categorization of texts into predefined categories has witnessed booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community, the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. This paper describes a novel method for the automatic induction of rule-based text classifiers. The method supports a hypothesis language of the form "if T1 or … or Tn occurs in document d, and none of Tn+1, …, Tn+m occurs in d, then classify d under category c," where each Ti is a conjunction of terms. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. Issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation, are discussed in detail.
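The rule form quoted in the abstract can be illustrated with a minimal Python sketch. It only shows how one such rule might be applied to a document, not how the paper induces rules. The names `Rule`, `matches`, and `classify`, the whitespace tokenization, and the example terms are all assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical representation: each Ti is a conjunction of terms,
# modeled here as a frozenset of words that must all be present.
Conjunction = FrozenSet[str]


@dataclass(frozen=True)
class Rule:
    category: str
    positive: Tuple[Conjunction, ...]  # T1 .. Tn: at least one must occur
    negative: Tuple[Conjunction, ...]  # Tn+1 .. Tn+m: none may occur

    def matches(self, document_terms) -> bool:
        terms = set(document_terms)
        # A conjunction "occurs" when all of its terms are in the document.
        has_positive = any(t <= terms for t in self.positive)
        has_negative = any(t <= terms for t in self.negative)
        return has_positive and not has_negative


def classify(rule: Rule, text: str) -> Optional[str]:
    # Assumption: naive lowercase whitespace tokenization.
    tokens = text.lower().split()
    return rule.category if rule.matches(tokens) else None


# Illustrative rule with made-up terms.
rule = Rule(
    category="sports",
    positive=(frozenset({"football"}), frozenset({"world", "cup"})),
    negative=(frozenset({"stock", "market"}),),
)
```

With this rule, a document containing both "world" and "cup" is assigned to "sports", while a document that mentions "football" together with "stock" and "market" is rejected because a negative conjunction occurs.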
Publisher:
Academy & Industry Research Collaboration Center (AIRCC) (provided by DOAJ)