The effects of query complexity, expansion and structure on retrieval performance in probabilistic text retrieval
Kekäläinen, Jaana (1999)
Tampere University Press
Informaatiotutkimus - Information Studies
The effects of query complexity, expansion and structure on retrieval performance, measured as precision and recall, in probabilistic text retrieval were tested. Complexity refers to the number of search facets, or intersecting concepts, in a query. Facets were divided into major and minor facets on the basis of their importance with respect to the corresponding request. Two complexity levels were tested: high-complexity queries used all search facets identified from the requests, whereas low-complexity queries were formulated with major facets only. Query expansion was based on a thesaurus, from which the expansion keys were elicited for queries. There were five expansion types: (1) the first query version was an unexpanded, original query with one search key for each search concept (original search concepts) elicited from the test thesaurus; (2) the synonyms of the original search keys were added to the original query; (3) search keys representing the narrower concepts of the original search concepts were added to the original query; (4) search keys representing the associative concepts of the original search concepts were added to the original query; (5) all previous expansion keys were cumulatively added to the original query. Query structure refers to the syntactic structure of a query expression, marked with query operators and parentheses. The structure of queries was either weak (queries with no differentiated relations between search keys, except weights) or strong (different relationships between search keys). More precisely, strong query structures were based on facets or intersecting concepts. Altogether, five weak and eight strong structure types were tested. The test involved 30 test requests, all of which were formulated into 110 queries representing different structure, expansion and complexity combinations. The test database was a text database of 53,893 newspaper articles. The test was run in InQuery, a probabilistic text retrieval system.
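The expansion types and the weak/strong distinction described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the thesaurus entries are invented, each facet is simplified to a single concept, and the operator spellings (`#sum`, `#and`, `#syn`) follow common InQuery-style notation rather than the exact query formulations used in the study.

```python
# Hypothetical thesaurus: the terms are invented, not taken from the test thesaurus.
THESAURUS = {
    "car": {"syn": ["automobile"], "narrower": ["taxi"], "assoc": ["traffic"]},
    "tax": {"syn": ["levy"], "narrower": ["vat"], "assoc": ["budget"]},
}


def expand(concept, expansion="none"):
    """Return the search keys of one concept under one expansion type.

    expansion is one of: "none", "syn", "narrower", "assoc", "full".
    "full" adds all three kinds of expansion keys cumulatively.
    """
    entry = THESAURUS[concept]
    keys = [concept]
    if expansion in ("syn", "full"):
        keys += entry["syn"]
    if expansion in ("narrower", "full"):
        keys += entry["narrower"]
    if expansion in ("assoc", "full"):
        keys += entry["assoc"]
    return keys


def weak_query(concepts, expansion="none"):
    """Weak structure: one flat list of keys, no relations between them."""
    keys = [key for concept in concepts for key in expand(concept, expansion)]
    return "#sum(" + " ".join(keys) + ")"


def strong_query(concepts, expansion="none", facet_op="#syn", query_op="#and"):
    """Strong structure: keys grouped by facet, facets combined by query_op.

    Simplification: each concept is treated as its own facet.
    """
    facets = [
        f"{facet_op}({' '.join(expand(concept, expansion))})"
        for concept in concepts
    ]
    return f"{query_op}({' '.join(facets)})"


if __name__ == "__main__":
    print(weak_query(["car", "tax"], "syn"))
    print(strong_query(["car", "tax"], "full"))
```

Dropping a concept from the `concepts` list corresponds loosely to moving from high to low complexity, assuming the dropped concept belongs to a minor facet.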
The test revealed that when the queries were unexpanded, there were no great differences between structure types, irrespective of the complexity level. When queries were expanded, the performance of the weakly structured queries dropped, but the performance of the best strongly structured queries improved. The differences in performance between complexity levels varied across expansion types, but overall these differences were minor. The best performance was achieved with a combination of a facet structure, high complexity, and the largest expansion. However, not all strong structures performed well with expansion. The operator combining search keys within a facet was more decisive than the operator combining facets. The typical interpretation given to the OR operator in partial match retrieval proved to be too permissive, and thus performance decreased when queries were expanded. The best result was achieved by treating all search keys within a facet as instances of one search key, i.e., using the SYN operator as a 'facet operator'.
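One way to see how a probabilistic OR could become too permissive under expansion is sketched below in Python. The abstract does not give any formulas, so the combination rules here are assumptions: they are the textbook inference network definitions (product for AND, complement of the product of complements for OR, arithmetic mean for SUM). The default belief of 0.4 for a key that does not occur in a document is likewise an assumed value.

```python
from math import prod

# Assumed belief assigned to a search key that is absent from a document.
DEFAULT_BELIEF = 0.4


def bel_and(beliefs):
    """Probabilistic AND: product of the component beliefs."""
    return prod(beliefs)


def bel_or(beliefs):
    """Probabilistic OR: 1 minus the product of the complements."""
    return 1 - prod(1 - b for b in beliefs)


def bel_sum(beliefs):
    """SUM: arithmetic mean of the component beliefs."""
    return sum(beliefs) / len(beliefs)


def or_belief_without_match(n_keys):
    """OR belief of a facet for a document containing none of its n_keys keys."""
    return bel_or([DEFAULT_BELIEF] * n_keys)


if __name__ == "__main__":
    # The facet belief of a non-matching document rises as keys are added.
    for n in (1, 2, 4, 8):
        print(n, round(or_belief_without_match(n), 4))
```

Under these assumptions, a document that matches none of the keys in a facet still receives a higher facet belief each time an expansion key is added to an OR group. If the whole facet is instead treated as a single search key, as with SYN, a non-matching document would presumably stay at the default belief regardless of how many keys the facet contains.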
- Doctoral dissertations