Relevance Feedback and Personalization:

A Language Modeling Perspective


W. B. Croft

Computer Science Department

University of Massachusetts

Amherst, MA 01003-4610






From some perspectives, personalization has been studied in information retrieval for some time. If the goal of personalization is to improve the effectiveness of information access by adapting to individual users’ needs, then techniques such as relevance feedback and filtering would certainly be considered to support personalization. There has also been considerable research done, mostly in the 1980s, on user modeling for information retrieval. This research had essentially the same goal as current research on personalization, which is to build a model of a user’s interests and preferences over time. Filtering systems, too, emphasize learning a user’s interest profile (or profiles) over time. Relevance feedback, on the other hand, has the goal of learning a model of the user’s interest in a single search session. This is short-term personalization. Starting with the same query, two users could end up with very different documents or answers depending on their feedback. Put another way, the system uses the user’s feedback to learn the specific context they have in mind for the initial query.


Query expansion or local feedback techniques are also related to personalization of context. Based on the user’s query and the document corpus, possible contexts for the query are inferred and used to suggest additional query terms. Although “true” personalization would make use of a long-term profile or user model to choose contexts, and therefore additional query terms, for each individual user, we believe that the representation of context and how it can be inferred is a basic component of any approach to content personalization. By studying these issues in the short-term (a single session), we may learn how to handle context effectively in the long-term (over multiple sessions).


Relevance feedback, despite its long history in information retrieval research, has not been successfully adapted in Web-based search engines. The closest feature found in some search engines is “find more documents like this”. Query expansion techniques have been used in a number of systems to suggest additional search terms, with limited success. There are a number of reasons for the apparent failure of relevance feedback in current systems. The primary one that is usually mentioned is the difficulty of getting users to provide the relevance information. Simply providing “relevant” and “not relevant” buttons in the interface does not seem to provide enough incentive for the user. For this reason, a number of researchers are investigating techniques to infer relevance through passive measures such as time spent browsing a page or number of links followed from a page. Even if the relevance information was provided, however, there is a significant problem with using the current feedback techniques. With full text and limited relevance information, the relevance feedback techniques developed in the 70’s and 80’s are simply not as reliable as the experiments with collections of abstracts had indicated. In other words, identifying the correct context is not simple. Experiments have shown that if a user can indicate relevant sections or even phrases in a document, relevance feedback is more accurate. This, however, seems to imply that we will need more input from the users rather than less.


In summary then, relevance feedback and query expansion is a personalization technique that attempts to infer a user’s context from the query and additional feedback. The successful application of this technique is not easy and involves both sophisticated interface design and good algorithms for inferring context. In this paper, we focus on the algorithms for inferring context. In particular, we describe how a language modeling approach to the problem can lead to new perspectives and potentially more effective algorithms.


Relevance Feedback and Language Models


There are a number of formal ways of describing relevance feedback, beginning with the notion of an “optimal query” used in the SMART system (Salton, 1968). The optimal query is defined as a vector obtained by taking the difference between the relevant and non-relevant sets of documents, also represented as vectors. This query vector can be shown to be the “best”, under some assumptions, for distinguishing relevant and non-relevant documents. This relevance feedback approach evolved to a query modification process where the old query is modified by a weighted average of the identified relevant documents and, in some versions of the algorithm, the identified non-relevant documents. The revised query is then used to produce a new document ranking. Another common way of describing relevance feedback is to a Bayesian classification model of retrieval (Van Rijsbergen, 1979). In this approach, identified relevant documents are used to estimate the characteristics (probabilities of occurrences of words) of the relevant class of documents for a query. The corpus is used to estimate probabilities for the non-relevant class. The revised estimates are used to produce a new document ranking based on the probability of belonging to the relevant class. Both of these approaches can be viewed as applications of different machine learning techniques to the problem of identifying relevant documents based on training data. There have been a number of other recent experiments with machine learning techniques, but these have not in general shown significant improvements over the approaches already described (for example, Schapire et al, 1998). Any of these techniques are always more effective when applied to document filtering rather than relevance feedback, because in that application there is significantly more training data.


The language model approach to feedback does not at first appear to lend itself to relevance feedback. In the basic approach, first suggested by Ponte and Croft (1998), each document is represented by a document language model. The query is treated as a sample of text from a language model, and the documents are ranked according to the probability that the document language model could generate the query text. This simple model produces surprisingly good retrieval results, and the model has been extended in a variety of ways. For example, documents have been modeled as mixtures of topic models (Hoffman, 1999) and translation probabilities have been introduced to deal with synonymy and cross-lingual retrieval (Berger and Lafferty, 1999).


In terms of relevance feedback, the basic approach has been to modify the initial query using words from top-ranked (for query expansion) or identified relevant documents. Ponte (2000) simply adds some additional words to the query based on the log ratio of the probability of occurrence in the model for relevant documents to the probability in the whole collection. The model for relevant documents was taken as the sum of the individual document models. In Miller et al (1999), words from relevant documents were added to the query and probabilities in their model adjusted by training over queries. Although both of these approaches produced good results, they are not very satisfactory models from the point of view of describing and defining query contexts and user models, which are central to personalization.


In order to better capture the important processes behind relevance feedback and query expansion, we believe it is important to view the query as a sample of text from a model of the information need. That is, documents are generated from document language models associated with authors and queries are generated by information need language models associated with individual users. This raises the interesting possibility that users, like documents, could be represented as a mixture of topic language models, but we do not pursue that further here. From this perspective, the task of the relevance feedback or query expansion component of the system is to infer the language model associated with the query. Given this query model, retrieval could then be done either by ranking the documents according to their probability of being generated by the query model, or the query model and the document models could be directly compared and documents ranked by the similarity of these models (using a measure such as Kullback-Leibler divergence). From the language model perspective, inferring the query language model is what is meant by inferring the context of the query. Give the query model, we can predict good suggestions for additional search terms and produce better results by personalizing the search. The problem, of course, is that we still have to estimate the query model from very limited data. This is discussed in the next section.


Inferring Language Models


To infer the query model from either top-ranked documents (in the case of query expansion) or identified relevant documents (in the case of relevance feedback) requires an approach where document text can be viewed as samples from the query model. In a recent paper, Lavrenko and Croft (2001) show how the standard probabilistic model of retrieval can be described in language modeling terms. Documents are ranked by the ratio P(D|R)/P(D|G) where P(D|R) is the probability of generating the document text given the relevance (query) model and G is a global or corpus model. They then describe a technique for constructing a query model with no relevance data. This technique uses the document models associated with the top-ranked documents to estimate the probability P(w|Q) which is the probability of observing the word w from a model that is known to have generated Q. These probabilities are used to smooth the query model, and the computation is very similar to successful, ad-hoc query expansion techniques such as LCA (Xu and Croft, 2000). When used with relevance feedback, this technique will be able to more accurately estimate the query model from little training data. While this does not solve the usability issues associated with feedback, it does provide more stable, predictable, and effective outcomes when the user does provide relevance data.


Quantifying Ambiguity


Many of the queries presented to an information retrieval system are ambiguous. That is, the context of the query is not clear. An often-used example of such a query in Web search is “java”, where the context could be programming languages, Indonesia, or coffee (or possibly others). Given such a query, the only way the ambiguity can be resolved is to ask the user for clarification in some form. Since there will be many thousands of documents on Java programming and many fewer on Indonesia, relying on relevance feedback to resolve the problem is unrealistic. Instead, a better method would be to first determine if a query is ambiguous, then ask the user specific questions to resolve the ambiguity. How to accomplish this goal is another focus of our research. Even if we could resolve this domain ambiguity, we would still have to deal with task ambiguity (e.g. do you want to buy Java, travel to Java, learn about Java, etc.) but we do not discuss this issue further here.


The first part of addressing query ambiguity is to define it and quantify it. We are currently developing a language model framework to do this. One simple approach would be to assume that there are a fixed number of topic models for a collection and use a clustering technique to “discover” them. The ambiguity of a query could then be directly related to the number of topic models that could generate the query with high probability. The problem with this approach is that it is not clear how many topic models there should be, or how many is appropriate for a given query. The query “middle east” is less ambiguous than “java” at a high level of abstraction, but there are many subtopics related to the Middle East in a news corpus so the context of the query could still be viewed as poorly defined. We are instead developing an entropy-based approach where we measure how well query words predict other words, and how words are grouped according to how they predict each other. The data for this analysis is a corpus language model based on co-occurrence of words. This approach was initially developed for summarization (Lawrie and Croft, 2001) and we are currently testing its application to query ambiguity.


After quantifying the ambiguity of the query, we must then decide how to resolve it. If the query is sufficiently ambiguous (according to our measure), we should be able to identify the most probable contexts (word associations, language models). Given those contexts, we plan to select sample sentences that are representative of them. This means the sentences have high probabilities of generation in those contexts. The sentences would be shown to the user for clarification. This process can be viewed as a generalization of a KWIC index.


By using a language model view of ambiguity, we hope to resolve the context of the query quickly and with minimal user input. The end result of this process will be more accurate search results.




This paper describes some of the ongoing research in the Mongrel project, which is a collaboration between Professor Nick Belkin at Rutgers University, and Professors Allan and Croft at the University of Massachusetts. A description of the project can be found at The paper is also related to research done in collaboration with Professor John Lafferty at Carnegie Mellon University. This research is supported in part by the Library of Congress and Department of Commerce under cooperative agreement number EEC-9209623 and in part by NSF Grant number ISS-9907018. Any opinions, findings and conclusions or recommendations expressed in this material are the author(s) and do not necessarily reflect those of the sponsor.




A. Berger and J. Lafferty, “Information retrieval as statistical translation”, Proceedings of ACM SIGIR 99, 222-229, 1999.


T. Hoffman, “Probabilistic latent semantic indexing”, Proceedings of ACM SIGIR 99, 50-57, 1999.


V. Lavrenko and W. B.Croft,  “Relevance-based language models”, submitted to ACM SIGIR 2001.


D. Lawrie and W.B. Croft, “Selecting terms for hierarchical document summaries”, submitted to ACM 2001.


D. Miller, T. Leek and R. Schwartz, “A Hidden Markov Model information retrieval system”, Proceedings of ACM SIGIR 99, 1999.


J. Ponte and W.B. Croft, “A language modeling approach to information retrieval”, Proceedings of ACM 98, 275-28, 1998.


J. Ponte, “Language models for relevance feedback”, in Advances in Information Retrieval, ed. W.B. Croft, 73-96, 2000.


C.J. Van Rijsbergen, Information Retrieval, Butterworths, 1979.


G. Salton, Automatic Information Organization and Retrieval, McGraw-Hill, 1968.


R. Schapire, Y. Singer and A. Singhal, “Boosting and Rocchio applied to text filtering”, Proceedings of ACM SIGIR 98, 1998.


 J. Xu and W.B. Croft,  "Improving the Effectiveness of Information Retrieval with Local Context Analysis", ACM Transactions on Information Systems, 18(1), 79-112, (2000).