Digitale Bibliotheek
Sluiten Bladeren door artikelen uit een tijdschrift
 
<< vorige    volgende >>
     Tijdschrift beschrijving
       Alle jaargangen van het bijbehorende tijdschrift
         Alle afleveringen van het bijbehorende jaargang
           Alle artikelen van de bijbehorende aflevering
                                       Details van artikel 2 van 5 gevonden artikelen
 
 
  Efficient String Matching in Huffman Compressed Texts
 
 
Titel: Efficient String Matching in Huffman Compressed Texts
Auteur: Kimmo Fredriksson
Jorma Tarhio
Verschenen in: Fundamenta informaticae
Paginering: Jaargang 63 (2004) nr. 1 pagina's 1-16
Jaar: 2004-11-30
Inhoud: We present an efficient algorithm for scanning Huffman compressed texts. The algorithm parses the compressed text in O(n{[log2σ]/b}) time, where n is the size of the compressed text in bytes, σ is the size of the alphabet, and b is a user specified parameter. The method uses a variable size super-alphabet, with an average size of O({b/[H log2σ]}) characters, where H is the entropy of the text. Each super-character is processed in O(1) time. The algorithm uses O(2b) space and O(b2b) preprocessing time. The method can be easily augmented by auxiliary functions, which can e.g. decompress the text or perform pattern matching in the compressed text. We give three example functions: decoding the text in average time O(n{[log2σ]/[Hw]}), where w is the number of bits in a machine word; an Aho-Corasick dictionary matching algorithm, which works in time O(n{[log2σ]/b}+t), where t is the number of occurrences reported; and a shift-or string matching algorithm that works in time O(n{[log2σ]/b}[(m+s-1)/w]+t), where m is the length of the pattern and s depends on the encoding. The Aho-Corasick algorithm uses an automaton with variable length moves, i.e. it processes variable number of states at each step. The shift-or algorithm makes variable length shifts, effectively also processing variable number of states at each step. The number of states processed in O(1) time is O(b/[H log2σ]). The method can be applied to several other algorithms as well. Finally, we apply the methods to natural language taking the words (vocabulary) as the alphabet. This improves the compression ratio and allows more complex search problems to be solved efficiently. We conclude with some experimental results.
Uitgever: IOS Press
Bronbestand: Elektronische Wetenschappelijke Tijdschriften
 
 

                             Details van artikel 2 van 5 gevonden artikelen
 
<< vorige    volgende >>
 
 Koninklijke Bibliotheek - Nationale Bibliotheek van Nederland