Digitale Bibliotheek
Sluiten Bladeren door artikelen uit een tijdschrift
<< vorige    volgende >>
     Tijdschrift beschrijving
       Alle jaargangen van het bijbehorende tijdschrift
         Alle afleveringen van het bijbehorende jaargang
           Alle artikelen van de bijbehorende aflevering
                                       Details van artikel 9 van 11 gevonden artikelen
  Nomenclature-Based Data Retrieval without Prior Annotation: Facilitating Biomedical Data Integration with Fast Doublet Matching
Titel: Nomenclature-Based Data Retrieval without Prior Annotation: Facilitating Biomedical Data Integration with Fast Doublet Matching
Auteur: Jules J. Berman
Verschenen in: In silico biology
Paginering: Jaargang 5 (2005) nr. 3 pagina's 313-322
Jaar: 2005-08-04
Inhoud: Assigning nomenclature codes to biomedical data is an arduous, expensive and error-prone task. Data records are coded to to provide a common representation of contained concepts, allowing facile retrieval of records via a standard terminology. In the medical field, cancer registrars, nurses, pathologists, and private clinicians all understand the importance of annotating medical records with vocabularies that codify the names of diseases, procedures, billing categories, etc. Molecular biologists need codified medical records so that they can discover or validate relationships between experimental data and clinical data. This paper introduces a new approach to retrieving data records without prior coding. The approach achieves the same result as a search over pre-coded records. It retrieves all records that contain any terms that are synonymous with a user's query-term. A recently described fast algorithm (the doublet method) permits quick iterative searches over every synonym for any term from any nomenclature occurring in a dataset of any size. As a demonstration, a 105+ Megabyte corpus of Pubmed abstracts was searched for medical terms. Query terms were matched against either of two vocabularies and expanded as an array of equivalent search items. A single search term may have over one hundred nomenclature synonyms, all of which were searched against the full database. Iterative searches of a list of concept-equivalent terms involves many more operations than a single search over pre-annotated concept codes. Nonetheless, the doublet method achieved fast query response times (0.05 seconds using Snomed and 5 seconds using the Developmental Lineage Classification of Neoplasms, on a computer with a 2.89 GHz processor). Pre-annotated datasets lose their value when the chosen vocabulary is replaced by a different vocabulary or by a different version of the same vocabulary. The doublet method can employ any version of any vocabulary with no pre-annotation. In many instances, the enormous effort and expense associated with data annotation can be eliminated by on-the-fly doublet matching. The algorithm for nomenclature-based database searches using the doublet method is described. Perl scripts for implementing the algorithm and testing execution speed are provided as open source documents available from the Association for Pathology Informatics (
Uitgever: IOS Press
Bronbestand: Elektronische Wetenschappelijke Tijdschriften

                             Details van artikel 9 van 11 gevonden artikelen
<< vorige    volgende >>
 Koninklijke Bibliotheek - Nationale Bibliotheek van Nederland