Digitale Bibliotheek
Sluiten Bladeren door artikelen uit een tijdschrift
 
<< vorige   
     Tijdschrift beschrijving
       Alle jaargangen van het bijbehorende tijdschrift
         Alle afleveringen van het bijbehorende jaargang
           Alle artikelen van de bijbehorende aflevering
                                       Details van artikel 7 van 7 gevonden artikelen
 
 
  WRAPPER INFERENCE FOR AMBIGUOUS WEB PAGES
 
 
Titel: WRAPPER INFERENCE FOR AMBIGUOUS WEB PAGES
Auteur: Crescenzi, Valter
Merialdo, Paolo
Verschenen in: Applied artificial intelligence
Paginering: Jaargang 22 (2008) nr. 1-2 pagina's 21-52
Jaar: 2008-01
Inhoud: Several studies have concentrated on the generation of wrappers for web data sources. As wrappers can be easily described as grammars, the grammatical inference heritage could play a significant role in this research field. Recent results have identified a new subclass of regular languages, called prefix mark-up languages, that nicely abstract the structures usually found in HTML pages of large web sites. This class has been proven to be identifiable in the limit, and a PTIME unsupervised learning algorithm has been previously developed. Unfortunately, many real-life web pages do not fall in this class of languages. In this article we analyze the roots of the problem and we propose a technique to transform pages in order to bring them into the class of prefix mark-up languages. In this way, we have a practical solution without renouncing to the formal background defined within the grammatical inference framework. We report on some experiments that we have conducted on real-life web pages to evaluate the approach; the results of this activity demonstrate the effectiveness of the presented techniques.
Uitgever: Taylor & Francis
Bronbestand: Elektronische Wetenschappelijke Tijdschriften
 
 

                             Details van artikel 7 van 7 gevonden artikelen
 
<< vorige   
 
 Koninklijke Bibliotheek - Nationale Bibliotheek van Nederland