Recommendation Systems: Helping People Find What They Don't Know

 

Nicholas J. Belkin

School of Communication, Information & Library Studies

Rutgers University

4 Huntington Street

New Brunswick NJ 08901-1071 USA

nick@belkin.rutgers.edu

Imagine you are performing a task while interacting with a service hosted on the Internet, or with a mobile phone service that uses automated speech recognition. What if, during your interaction with this service, a machine recommended how you could better perform your current task? An important problem in personalization is understanding how a machine could help an individual user by making such recommendations.

When people engage in information-seeking behavior, it's usually because they are hoping to resolve some problem, or achieve some goal, for which their current state of knowledge is inadequate. This suggests that they don't know what it is that might be useful for them, and therefore may not be able to specify the salient characteristics of potentially useful information objects.

Unfortunately, typical information systems require their users to specify what they want the system to retrieve. Furthermore, people using large-scale information systems are typically unfamiliar with the underlying operations of the systems, the vocabularies the systems use to describe the information objects in their databases, and even the nature of the databases themselves. This situation suggests that it might be appropriate for some part of the information system to recommend courses of action to information seekers, which could help them to understand their problems better and to use the resources of the system more effectively. This is the general problem that our research group at Rutgers has been addressing over the last several years (see, e.g., Belkin et al., 1995; Belkin, 1999).

One specific aspect of the difficulties that people face in interacting with information systems is the problem of choosing the "right" words to represent their information problems. In the typical information system, which assumes a model of information seeking called "specified searching", the user is asked to generate a query, which is understood to be a specification of what s/he wants to have retrieved. For the system to search for and find appropriate responses, the query must be couched in terms that match the way the information objects are represented in the system. Whether such representation is based on the actual words used in the information objects themselves (so-called keyword representation), or on a controlled vocabulary representing the domain or the database (so-called conceptual representation), the problem for the user is the same: how to guess which words to use for the query, so that they both adequately represent the person’s problem and are the same as those used by the system in its representation. In information retrieval research and practice, it is generally understood that accomplishing these two goals is a multi-stage, interactive process of initial query formulation, which allows the user to enter into interaction with the system, and subsequent iterations of query reformulation, based upon the results of the interaction (cf. Efthimiadis, 199x; Spink & Losee, 199x). This is an extremely difficult problem, since it is difficult for people to specify that which they don’t know; there are many words which can be used to express the same ideas; predicting how another will talk about a topic is uncertain at best; and what another finds important, and worthy of representation, cannot be readily predicted. Consider, for instance, the person who wishes to find obituary information about some group of well-known Americans. In a system that relies on the words in the text for representation, using the term "obituary" in the query will not be useful, since that word is itself never used in the text of an obituary. However, words or phrases such as "died", "yesterday" (or any of the days of the week), "mourned by", and "survived by" are commonly used in obituaries. It will be the rare user who will understand these characteristics of newspaper obituaries and be able to make use of them in an initial query, or even in query reformulation. Similar arguments hold for the representation of "well-known" and "American". How can a system help its user to overcome such problems?

In the mid-1960s, John Rocchio (Rocchio, 197x) suggested a technique for addressing this problem called relevance feedback. He saw the problem in the following terms. For the reasons that have been discussed above, a user is unlikely to begin an interaction with the "ideal" query (that is, that query which will best specify what is to be searched for and retrieved). Furthermore, because the user is unlikely to understand the complexities of representation and matching within an information retrieval system, that person will be unlikely to engage in effective query reformulation. However, we can assume that the user will be able to recognize, and indicate whether a retrieved information object is relevant or not to the problem. Rocchio suggested that the system could use the characteristics (that is, word frequencies and distributions) of the information objects which had been judged relevant or not relevant in order to modify (reformulate) the original query, until the query eventually became "ideal", separating relevant from non-relevant objects in the best possible way. The user’s role in this interaction is merely to indicate relevance or non-relevance of a retrieved object; the query reformulation takes place internal to the system, and the user’s only knowledge of that process is through the list of objects retrieved as a result of the reformulated query. We can characterize this type of interaction as system-controlled with respect to term recommendation, but indicating relevance or non-relevance gives the user some measure of influence on query reformulation through her/his interaction with the system results.
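The reformulation step described above can be sketched as follows. This is a minimal illustration, not Rocchio's original formulation or any particular system's implementation: it assumes raw term frequencies as the document representation, and the function names and the weights `alpha`, `beta` and `gamma` are illustrative assumptions that do not come from this paper.

```python
from collections import Counter


def term_vector(text):
    """Represent a text by raw term frequencies (a deliberately simple assumption)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average a list of term-frequency vectors; an empty list gives an empty vector."""
    if not vectors:
        return {}
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def reformulate(query_text, relevant_docs, non_relevant_docs,
                alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward judged-relevant documents and away from non-relevant ones.

    alpha, beta and gamma are illustrative weights, not values from the source.
    """
    query = term_vector(query_text)
    rel = centroid([term_vector(d) for d in relevant_docs])
    non = centroid([term_vector(d) for d in non_relevant_docs])
    new_query = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0)
                  + beta * rel.get(term, 0)
                  - gamma * non.get(term, 0))
        if weight > 0:  # drop terms whose weight is zero or negative
            new_query[term] = weight
    return new_query


# The user only marks documents as relevant or not; the system reweights the query.
new_query = reformulate(
    "obituary americans",
    relevant_docs=["senator died yesterday survived by wife",
                   "actor died monday mourned by fans"],
    non_relevant_docs=["obituary page editor hired yesterday"],
)
```

In this toy run, "died" enters the reformulated query although the user never typed it, while "obituary" loses weight because it appeared only in the document judged non-relevant.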

An alternative approach to system support for query reformulation is for the system to show the user new terms that it judges might be useful for query reformulation, given the terms used in the original query and/or the documents retrieved by the original query. These terms can be identified through their empirical relationships to the query terms, for instance co-occurrence with the query terms in a document, or co-occurrence in similar contexts in the database. It is the user’s task in such systems to examine the suggested terms, and to reformulate the query manually given the information provided by the system. Such techniques are typically known as term suggestion devices, and can be thought of as user-controlled, at least to the extent that the user controls just how the query is reformulated. But in this case, the actual terms that are suggested do not depend upon the user’s response to the system’s results.
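A term suggestion device based on document-level co-occurrence might look like the sketch below. It is only one simple way to realize the idea, under stated assumptions: the function name `suggest_terms`, whitespace tokenization, the optional stopword set and the ranking by document count are all illustrative choices, not details of any system discussed here.

```python
from collections import Counter


def suggest_terms(query_terms, documents, top_n=5, stopwords=frozenset()):
    """Suggest terms that co-occur with any query term in the same document.

    Candidates are ranked by the number of matching documents they appear in;
    ties are broken alphabetically so the output is deterministic.
    """
    query = {t.lower() for t in query_terms}
    counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        if words & query:  # the document contains at least one query term
            counts.update(words - query - stopwords)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


docs = [
    "senator died yesterday survived by two sons",
    "actor died monday survived by his wife",
    "council votes on budget yesterday",
]

# The system only displays candidates; the user decides which ones to add.
suggestions = suggest_terms(["died"], docs, top_n=2, stopwords={"by"})
```

Note that the suggestions depend only on the query and the collection. No relevance judgment from the user is involved, which is what distinguishes this approach from relevance feedback.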

At Rutgers, we have been investigating support for query reformulation (that is, recommendation by the system of how a query might be better put) both with respect to relevance feedback versus term recommendation, and with respect to user knowledge and control of such support. One of our early results (Koenemann, 1996) showed that relevance feedback worked well in an interactive information retrieval environment, but also that it worked better both with increased user knowledge of how it worked and with increased user control over its suggestions. That is, a version of relevance feedback in which the user was informed of the basic algorithms used in query reformulation, and in which the terms which the system would use to reformulate the query based on the user’s relevance judgments were presented to the user for selection (that is, as a term suggestion device), performed consistently better than one in which the user knew only that marking documents relevant would help the system to find more documents like those. Perhaps more importantly, the subjects in the experiment preferred the former to the latter by a wide margin, because they felt they had control and knowledge of the query reformulation process. This led us to the conclusion that explicit term suggestion is a better way to provide system support for query reformulation than automatic, behind-the-scenes query reformulation. More recently, we have compared our version of relevance feedback as a term suggestion device (in which the user controls the suggested terms through marking documents relevant) with a version of term suggestion in which the user has no control over which terms are suggested (Belkin et al., 2000). In both systems, users had some knowledge of how the suggested terms were chosen. The primary difference between the two was that in the relevance feedback-based system, users had to make decisions about whether a document was relevant before they were offered any suggested terms, whereas in the uncontrolled term suggestion system such terms were displayed to them at the same time as the results of their query. Our results indicate that users were willing to give up the control they gained over suggested terms through explicit relevance feedback, in favor of the reduced effort on their part (that is, not having to make both relevance and term selection decisions) in the uncontrolled term suggestion system.

What can we make of these results? It seems that user control over system recommendation for query reformulation is important to users with respect to their main task, a good query reformulation. But control (and therefore, better understanding) of what terms are actually suggested, a subsidiary task, is not very important. Rather, having to engage in the subsidiary task distracts them from what they actually need to be doing. These conclusions need to be understood with several caveats, however. First, it does seem to be necessary that users have some understanding of how the suggested terms are determined, in order to be comfortable and effective in using them. Also, the terms that are suggested need to be perceived as related to the context of the search. Strange or unexpected terms made the subjects uncomfortable, and distracted them from query reformulation and from the search task. These conditions can be interpreted as meaning that in order to accept and use the system recommendations effectively, users need to have some trust in the system with respect to the suggested terms, and then need to exert control over the system with respect to which of the terms they think will be useful. Trust with respect to the task which was not perceived as salient allowed them to accept the recommendation without question. But with respect to the task which was clearly salient, the users were not willing to give up their autonomy to the system. These results have clear implications for how recommender systems should operate in general.

The work just described is concerned with offering support to users of information systems who are engaged in one specific kind of information seeking activity: specified searching. Of course, people engage in many other kinds of interactions with information, for instance browsing, evaluating, using, and learning, both within a single information seeking episode and across episodes. At Rutgers, and in collaboration with colleagues elsewhere, we are engaged in a long-term program of research on how best to offer support to people in a variety of different information seeking behaviors (see, e.g., Belkin et al., 1995; Belkin, 1996). Query formulation and reformulation is just one problem that people face in one or more of such activities. Understanding the contents of databases, learning about effective vocabularies, and being able to evaluate the relevance of an information object quickly and accurately are other important problems that people face in their information seeking, for which system recommendations could offer useful support. As we have addressed various of these problems, we have seen results similar to those which we found in our query reformulation studies: with sufficient reason to trust the system recommendations, users are willing to give up some measure of control, accepting suggestions while maintaining control over how they are applied. We are attempting to apply these results in the design of cooperative, collaborative, dialogue-based information systems, in which users and the rest of the system each have their own roles and responsibilities, offering and accepting suggestions from one another, as appropriate.

References.

Belkin, N.J. (1996)

Belkin, N.J. (1999)

Belkin, N.J., Cool, C., Stein, A., Thiel, U. (1995)

Belkin, N.J., Cool, C., Jeng, J., Kelly, D., Lin, S.J., Park, S.Y., Savage-Knepshield, P., Sikora, C. (2000)

Efthimiadis, E. (199x)

Koenemann, J. (1996)

Rocchio, J. (197x)

Spink, A. & Losee, R. (199x)