Semantic Relationship Operators

Semantic relationship operators define the types of mapping which are supported by the lexical scanner. Mapping is defined by expressions of the form:
Detection of certain kinds of semantic relationships could trigger special help functions by the search engine. For example, detection of SP (Spanish) relations could automatically add a bilingual help message to the output in the correct language.
As logical primitives these semantic operators allow for the design of a simple substitution matrix, or the collection of related primitives into semantic packages or frames. For example, the input phrase "PCA pumps" might trigger the following package of primitive relations:
{found="PCA pump*";

Packages could contain from 1 to n relations in any order. The precedence of operator evaluation must be considered, because the order in which packages fire could determine which other packages are evaluated afterwards. While optimal solutions would probably require optimal sequential ordering, the practical problems associated with indexing large vocabularies against randomly-occurring text make it virtually impossible to formulate a perfect ordering strategy.

Ease of maintenance argues for a more relaxed sequencing approach in which packages can fire in virtually any order and still provide some marginal utility to the next search step, even if their relative firing order is suboptimal. In other words, it is better to design a semantic database which is robust from the point of view of maintenance than to design one in which the sequencing of packages requires extensive analysis. It should be possible to add new packages without regard to the presence of other packages and still expect a positive incremental return on the search strategy from each new package.
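The relaxed sequencing idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the package layout, the operator names `SYN` and `BROADER`, the expansion terms, and the function `fire_packages` are hypothetical and are not the scanner's actual operator set or data. The sketch matches every package against the original input phrase only and merges the results into a set, which is one simple way to make firing order irrelevant.

```python
from fnmatch import fnmatchcase

# Hypothetical packages: a trigger pattern plus a list of (operator, term)
# relations. "SYN" and "BROADER" are placeholder operator names.
PACKAGES = [
    {
        "found": "pca pump*",
        "relations": [
            ("SYN", "patient-controlled analgesia"),
            ("BROADER", "infusion pumps"),
        ],
    },
    {
        "found": "*analgesi*",
        "relations": [("BROADER", "pain management")],
    },
]


def fire_packages(phrase, packages):
    """Return the search terms contributed by every package whose trigger
    pattern matches the input phrase (case-insensitive, '*' wildcard)."""
    text = phrase.lower()
    terms = {text}  # the original phrase is always kept as a search term
    for package in packages:
        # Each package is tested against the original phrase only, so the
        # result does not depend on the order in which packages are visited.
        if fnmatchcase(text, package["found"]):
            terms.update(term for _operator, term in package["relations"])
    return terms


if __name__ == "__main__":
    print(sorted(fire_packages("PCA pumps", PACKAGES)))
```

Under this design a new package can be appended to the list without analysing the existing ones. The cost is that packages cannot chain off each other's output; a scanner that allows chaining would reintroduce the ordering questions discussed above.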
Semantic relations versus grammar analysis

Natural language comprehension efforts may use techniques based on simple string recognition, or on deeper analysis of grammatical relations. The lexical scanner will use the former method. Efforts to develop grammatical scanners have generally proved unsatisfactory due to the underlying complexity of human language. Because the domain of palliative care is relatively well-defined, simple phrase recognition methods can give immediate improvements in search results without the complexity of grammatical methods. This approach is also consistent with experience based on expert system construction, which has shown that it is more useful to encode domain-specific concepts in a limited, artificial computer form than to attempt to have the computer master a poorly-bounded domain.

The SPECIALIST lexicon

The UMLS SPECIALIST lexicon offers an important alternative for semantic parsing. It is provided in three formats: a unit record format, a relational table format, and Abstract Syntax Notation One (ASN.1) format. The information associated with each lexical entry includes a unique identifier, a base form, a syntactic category code, certain agreement information, complementation information if relevant, and various other properties relevant to the particular lexical entry.

The unit record format is a frame structure consisting of slots and fillers. The slots are the basic lexical attributes, and the fillers express the possible values of those attributes for that particular lexical item. The record for "anaesthetic" given below illustrates some of the features of the lexical unit record:
{base=anaesthetic

The base form "anaesthetic" and its spelling_variant "anesthetic" determine a lexical record consisting of a noun and an adjective entry. The variants= slot contains a code indicating the inflectional morphology of the entry; the filler reg in the noun entry indicates that the noun "anaesthetic" is a count noun which undergoes regular English plural formation ("anaesthetics"); inv in the variants= slot of the adjective entry indicates that the adjective "anesthetic" does not form a comparative or superlative. The position= slot indicates that the adjective "anaesthetic" is attributive and appears after color adjectives in the normal adjective order.

Data for lexical entries is represented in ten relational tables. The lexicon relational format is not fully normalized. By design, there is duplication of data among different relations and within certain relations. NLM notes that developers will need to decide the extent to which this redundancy should be retained, reduced, or increased for their applications. Among other tables, there are separate tables for agreement and inflection information, complementation patterns, spelling variants, and abbreviations and acronyms and their fully expanded forms.

The EMBASE thesaurus (EMTREE Codes)

EMBASE is a comprehensive bibliographic database covering the worldwide literature on biomedical and pharmaceutical fields. It is produced by Excerpta Medica, a division of one of the world's largest medical publishers, Elsevier Science Publishers B.V. of Amsterdam. Excerpta Medica was originally founded by a group of physicians as Excerpta Medica Foundation in 1946 to promote the flow of medical information. In 1972, this organization joined the Elsevier Group. About 300,000 records are added annually. More than 60% of the records contain abstracts from over 4,500 journals worldwide. CAS Registry Numbers (R) are present in the file from 1988 to the present.
EMBASE contains two different types of records due to a major change in indexing policy in 1988. The records from 1988 to date may have slightly different displays and search data than those in the early portion of the file. Linked terms (similar to MEDLINE MeSH subheadings) are only present in records from 1988 to the present. However, the entire file may be searched with the EMTREE codes (similar to MEDLINE Tree Numbers), together with a comprehensive online thesaurus.