Nick’s contribution to AntWeb


1         Background

Crucial to the success of the AntWeb project is the system’s ability to derive a description of both the quest and the pheromone, so that a quest description can be compared with the set of pheromone descriptions available at a particular AA site. Eliciting either a person’s information problem (in AntWeb terms, a description of the person’s quest) or what a person has found of value in the course of a quest is notoriously difficult (e.g. Belkin, 1980; references). In the former case, it is well known that people searching for information often have substantial difficulty specifying what they need; in the latter, it is well known that people are reluctant to evaluate explicitly the usefulness or relevance of documents to their quests. This aspect of the proposal addresses these two problems in two complementary ways: through inference from previous or current information-seeking and information-using behaviors; and through the design of interfaces that make it very easy for searchers to indicate both a description of their quest and the value of documents with respect to that quest.

There has been a substantial amount of research on inferring a user’s interest in a document (i.e. pheromone) from implicit sources of evidence, such as previous or current behavior (Kelly & Teevan, 2003). Behaviors that have been investigated include reading (or dwell, or display) time, retention behaviors such as saving or bookmarking a page, click-through, scrolling, and annotating. Although some studies have found reasonably strong and regular relationships between some of these behaviors and interest in, preference for, or usefulness of a document, others have shown that a number of contextual factors can affect such relationships. In particular, it has been suggested that, in order to identify interest properly from these behaviors, it may be necessary to interpret them with respect to the individual person engaging in them, and to characteristics such as the task that led to the information seeking (Kelly, 2003).
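The suggestion that behaviors should be interpreted relative to the individual and the task can be illustrated with a minimal sketch. It assumes, hypothetically, that each logged observation is a (user, task, document, dwell time) tuple, and it predicts interest when a document’s dwell time is well above that same user’s mean for that same task, rather than applying one global cutoff. The function name `contextual_interest` and the default `z_threshold` of 0.5 are illustrative choices, not values taken from the MONGREL studies.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical log record: (user, task, doc_id, dwell_seconds).
Observation = tuple[str, str, str, float]


def contextual_interest(observations: list[Observation],
                        z_threshold: float = 0.5) -> dict[tuple[str, str, str], bool]:
    """Predict interest from dwell time relative to each (user, task) baseline.

    A document is predicted "interesting" when its dwell time is at least
    z_threshold population standard deviations above the mean dwell time
    of the same user on the same task.
    """
    # Group dwell times by (user, task) to get a per-context baseline.
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for user, task, _doc, dwell in observations:
        groups[(user, task)].append(dwell)

    predictions: dict[tuple[str, str, str], bool] = {}
    for user, task, doc, dwell in observations:
        times = groups[(user, task)]
        mu, sigma = mean(times), pstdev(times)
        # With no variation in the baseline, nothing stands out.
        z = 0.0 if sigma == 0 else (dwell - mu) / sigma
        predictions[(user, task, doc)] = z >= z_threshold
    return predictions
```

For example, if Alice (writing) spends 120, 30, and 30 seconds on three documents and Bob (travel planning) spends 20, 8, and 8 seconds, Bob’s 20-second visit is predicted interesting while Alice’s 30-second visits are not, because each is judged against its own baseline.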

There has also been some research on relating behaviors or other indirect evidence to descriptions of a person’s quest. It has been suggested that the other activities a person is engaged in when a quest is initiated could provide clues about the nature of the quest. For instance, if the person is writing software and initiates a quest whose initial description is “java”, it might be reasonable to infer that the quest concerns the programming language, and therefore that the quest description might be extended with vocabulary associated with the language. If the person is engaged in travel-related activities when initiating a quest with the same term, it might be reasonable to infer that the quest concerns the island of Java, and to modify the description accordingly. If a person is working on a particular document when a quest is initiated, it might be reasonable to use that document, and other documents related to it (e.g. in the same directory), as the basis for an extended description of the quest. Another type of behavior that could serve as a basis for quest description is the record of documents a person has encountered in the course of a quest; those identified as interesting to the person (by some technique, perhaps based on other behaviors) could be used to describe the quest further, roughly in the manner of relevance feedback. More specifically still, there may be indications of which particular parts of documents interest a person, such as annotations or the text immediately surrounding the original quest description; in such cases, it might be appropriate to extend the quest representation using this evidence. Although some work has been done on several of these types of evidence, results to date are preliminary and scattered, and no strong regularities have yet been identified.
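The relevance-feedback idea above, extending an initial description such as “java” with vocabulary from documents associated with the person’s current activity, can be sketched with a simplified, positive-only Rocchio update, q′ = α·q + β·(centroid of context documents). Rocchio is one standard technique chosen here for illustration, not a method the proposal commits to; the whitespace tokenizer, the tiny stopword list, and the parameter values (α = 1.0, β = 0.75, k = 3 added terms) are all assumptions.

```python
from collections import Counter

# Illustrative stopword list; a real system would use a fuller list and stemming.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def term_vector(text: str) -> Counter:
    """Lower-cased whitespace tokens, minus stopwords, as raw term counts."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def expand_quest(quest: str, context_docs: list[str],
                 alpha: float = 1.0, beta: float = 0.75, k: int = 3) -> list[str]:
    """Positive-only Rocchio expansion of a quest description.

    Each context document (e.g. the file being edited, or pages judged
    interesting) is length-normalized, the normalized vectors are averaged
    into a centroid, and the quest vector is combined with it as
    alpha * quest + beta * centroid. Returns the original quest terms
    followed by the k highest-weighted new terms (ties broken alphabetically).
    """
    q = term_vector(quest)
    centroid: Counter = Counter()
    for doc in context_docs:
        vec = term_vector(doc)
        total = sum(vec.values())
        if total == 0:
            continue  # skip documents that contain only stopwords
        for term, count in vec.items():
            centroid[term] += count / total / len(context_docs)

    expanded: Counter = Counter()
    for term, weight in q.items():
        expanded[term] += alpha * weight
    for term, weight in centroid.items():
        expanded[term] += beta * weight

    ranked = sorted(expanded.items(), key=lambda kv: (-kv[1], kv[0]))
    new_terms = [term for term, _ in ranked if term not in q]
    return list(q) + new_terms[:k]
```

With a software-development context such as `["class method compiler java", "compiler bytecode class"]`, `expand_quest("java", ...)` returns `["java", "class", "compiler", "bytecode"]`; with a travel context such as `["island volcano beach java", "beach hotel island"]` it returns `["java", "beach", "island", "hotel"]`, mirroring the disambiguation example in the text.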

Implicit sources of evidence about useful documents and quest topics are inherently limited: they can never provide more than a basis for a guess about the issue at hand. If the system with which the person interacts can be designed so that it is easy for the person to indicate explicitly which documents are useful, or what the topic of the quest is, then AntWeb would be on much firmer ground in accomplishing its goals. Unfortunately, these activities are, in general, orthogonal to the person’s real goal, which is to find useful information, and people are therefore reluctant to engage in them. There might, however, be ways either to make this sort of specification very easy, or to make it very clear that doing so would bring the person a high pay-off. Appropriate interaction design could help achieve both. But doing so will almost certainly require moving away from the standard browser-plus-list-of-links style of interface and system design toward one that supports much more direct interaction between user and information objects, and more opportunity for the user to manipulate those objects. One possibility is to support multi-modal interaction, in which speech and gesture replace or complement other forms of user interaction with the system. Speech, in particular, holds the promise of allowing the user to express much more about the quest than is usual in a standard GUI.

Note that the suggestions outlined here imply some form of strictly private, client-side logging. The results of this logging would be summarized and then either associated with the server-side database, as an indication of document usefulness with respect to a specific quest description, or used to construct a quest description for matching against the quest database.
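The division of labor implied here, with raw events kept privately on the client and only a summary sent to the server, might look like the following sketch. The class name `ClientLog`, the event types, and the linear weighting of events into a single usefulness score per (quest, document) pair are all hypothetical; choosing and validating such weights is exactly what the experiments described in Section 2 would need to establish.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ClientLog:
    """Hypothetical private, client-side event log.

    Raw events stay on the client; only the output of summarize() would be
    associated with the server-side quest/pheromone database.
    """
    # Each event is (quest_id, doc_id, event_type, value).
    events: list[tuple[str, str, str, float]] = field(default_factory=list)

    def record(self, quest_id: str, doc_id: str, event_type: str,
               value: float = 1.0) -> None:
        """Append one raw behavioral event to the private log."""
        self.events.append((quest_id, doc_id, event_type, value))

    def summarize(self, weights: dict[str, float]) -> dict[tuple[str, str], float]:
        """Collapse raw events into one usefulness score per (quest, document).

        Each event contributes weights[event_type] * value; event types
        without a weight contribute nothing to the score.
        """
        scores: dict[tuple[str, str], float] = defaultdict(float)
        for quest_id, doc_id, event_type, value in self.events:
            scores[(quest_id, doc_id)] += weights.get(event_type, 0.0) * value
        return dict(scores)
```

For instance, with hypothetical weights `{"dwell_minutes": 1.0, "bookmark": 2.0}`, two minutes of reading plus a bookmark on one document summarizes to a score of 4.0 for that (quest, document) pair.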

2         What will be done

Building largely on results from the NSF-funded MONGREL project, which investigated the use of implicit sources of evidence both for detecting useful documents and for identifying and describing topics, we will:

1.     Further analyze data collected during the MONGREL project on what the subjects in Kelly’s study were doing at the time of their searching, in order to develop methods for identifying topics based on the user’s context at the time of searching. This analysis is likely to take a language-modeling perspective, developing models of user-interest topics from the contents of documents associated with the search.

2.     Run experiments with the AntWeb system that collect data similar to those collected in Kelly’s MONGREL study, in order to better understand which implicit evidence from previous and current behaviors can serve as an indicator of document usefulness, and especially how contextual factors affect the accuracy with which usefulness is predicted.

3.     Use results from, and participation in, the TREC HARD track to test the effectiveness of various methods for taking account of different contextual factors in tailoring the representation of quests to the specific requirements and characteristics of the individual person.

4.     Using results from the MONGREL Speak to Me experiment, which investigated the effect of spoken language and gesture input on the quantity and quality of quest descriptions, devise and evaluate an interface to the AntWeb system that incorporates at least some speech input. Evaluation will be in terms of the degree to which such an interface encourages users to provide better quest descriptions than those elicited in the standard GUI, where “better” means resulting in more accurate matching of quests and more accurate prediction of useful documents.

5.     Collaborate with others in the team on the development and evaluation of interaction models and interface techniques that support direct user interaction with information objects, leading to explicit indication by the user of document preference.

6.     Collaborate with others in the team on the development of quest and document representation and retrieval techniques, based on the results of the research described above and perhaps drawing on the MONGREL project’s research on language-modeling techniques for these purposes.

7.     Collaborate with other members of the team in implementing the results of our research in the AntWeb prototype, in evaluating that prototype, and in its subsequent redesign.
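Items 1 and 6 anticipate a language-modeling perspective. As one possible reading, not a design the proposal specifies, the sketch below builds a user-interest model from documents associated with a search, smooths it against a background model (Jelinek-Mercer interpolation, with an assumed λ = 0.8), and ranks candidate pheromone or quest descriptions by per-word cross-entropy under that model. The function names `unigram` and `rank_descriptions`, and the whitespace tokenizer, are illustrative assumptions.

```python
import math
from collections import Counter


def unigram(texts: list[str]) -> dict[str, float]:
    """Maximum-likelihood unigram model over lower-cased whitespace tokens."""
    counts = Counter(w for text in texts for w in text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in counts.items()}


def rank_descriptions(context_docs: list[str], descriptions: list[str],
                      lam: float = 0.8) -> list[str]:
    """Rank candidate descriptions by fit to a user-interest language model.

    The interest model is P(w|U) = lam * P_ML(w|context) + (1 - lam) * P(w|B)
    (Jelinek-Mercer smoothing), where the background B is built from all the
    text supplied, so every description word has non-zero probability when
    lam < 1. Lower cross-entropy means a closer match to the user's interests.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1) so unseen words keep non-zero probability")
    context = unigram(context_docs)
    background = unigram(context_docs + descriptions)

    def prob(word: str) -> float:
        return lam * context.get(word, 0.0) + (1 - lam) * background[word]

    def cross_entropy(desc: str) -> float:
        words = desc.lower().split()
        if not words:
            return math.inf  # empty descriptions rank last
        return -sum(math.log(prob(w)) for w in words) / len(words)

    return sorted(descriptions, key=cross_entropy)
```

With a programming context such as `["java class compiler method", "compiler bytecode class"]`, the description “java compiler class tutorial” ranks ahead of “java island beach travel”. Interpolating with a background model is what keeps words the user has not yet encountered from receiving zero probability.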