Reports & Other Publications of the APLab
Effective automation of the information retrieval (IR) task has long been an active area of research, leading to sophisticated retrieval models. With many IR schemes available, researchers have begun to investigate whether combining the results of different IR schemes improves performance. The IR literature reports many successful data fusion experiments, but also experiments in which the same fusion rules failed to improve performance. What is needed is a theory that predicts, a priori, when data fusion methods should be used. We categorize the theoretical justifications of data fusion into two approaches, examine their implications, analyze some of the unsuccessful data fusion experiments, and propose two conditions for effective data fusion: (1) the condition of efficacy and (2) the condition of dissimilarity. We have developed a mathematical measure (Pair-out-of-order) of inter-scheme dissimilarity, and have developed algorithms and computer programs to implement our ideas. We report on a pilot test using the output lists of all IR schemes that participated in the Routing task of TREC 4. Our results indicate that efficacy and inter-scheme dissimilarity are good predictors of the effectiveness of data fusion. In addition, we find that a model using the ratio of the efficacies of two schemes can improve our ability to predict fusion effectiveness.
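The abstract names the Pair-out-of-order measure but does not define it. As an illustration only, the sketch below assumes one plausible reading: the fraction of document pairs, among the documents retrieved by both schemes, that the two ranked lists place in opposite order (a normalized Kendall tau distance). The function name `pair_out_of_order` and the document identifiers are hypothetical, and the report's actual definition may differ.

```python
from itertools import combinations


def pair_out_of_order(ranking_a, ranking_b):
    """Fraction of shared document pairs that two rankings order differently.

    Assumed definition (not taken from the report): 0.0 means the two
    schemes rank their shared documents identically, 1.0 means one
    ranking is the exact reverse of the other.
    """
    # Map each document to its rank position in each list.
    pos_a = {doc: i for i, doc in enumerate(ranking_a)}
    pos_b = {doc: i for i, doc in enumerate(ranking_b)}

    # Only documents retrieved by both schemes can be compared.
    shared = [doc for doc in ranking_a if doc in pos_b]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0

    # A pair is out of order when the two schemes disagree on which
    # of the two documents should be ranked higher.
    discordant = sum(
        1
        for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)


if __name__ == "__main__":
    scheme_1 = ["d1", "d2", "d3", "d4"]
    scheme_2 = ["d2", "d1", "d3", "d4"]
    print(pair_out_of_order(scheme_1, scheme_2))
```

Under this assumed definition, a value near 0 suggests the two schemes rank documents almost identically, so the condition of dissimilarity would not be met; larger values indicate more disagreement between the schemes.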