Understanding and Supporting Multiple Information Seeking Strategies,
a TIPSTER Phase III Research Project
School of Communication, Information and Library Studies
Rutgers University
Contract No. MDA904-96-C-1297
Nicholas J. Belkin, PI; José Pérez Carballo, Co-PI
12-MONTH REPORT
10 October 1997
1. General Progress Report
In the first year of this project (now referred to as RTP3) we have worked on Project Tasks 1 and 2, as specified in our proposal. These tasks are, roughly, to:
1. Identify and classify the range of information seeking behaviors of a group of people engaged in intensive knowledge work, and to propose information retrieval (IR) system functionalities appropriate for supporting several of those behaviors;
2. Install and test an object-oriented framework/toolkit for implementing IR system functionalities.
In the next two sections of this report, we describe in detail the progress, results and current status of these two tasks; here we give a very general overview of our work this year, and of the project personnel and infrastructure.
1.1 Project overview
The first quarter of the project was devoted primarily to setting up the infrastructure, and investigating possibilities for the study group for task 1. During this period we also completed our analysis and presentation for the Rutgers TREC-5 Interactive Track project (Belkin et al., 1997; see Appendix B for a copy of this paper). Although the TREC-5 work was begun before RTP3 started, it contributes to RTP3 in that we have analyzed the data to identify and characterize different information seeking behaviors in a specific browsing-type task. The results have informed our work on both tasks 1 and 3 of RTP3.
In the second quarter, we began serious work on Task 2, installing ObjectStore and FIRE, the basic software for our IR framework, and also installing various versions of InQuery, to use as an IR testbed. During this period, we also initiated arrangements with Michael Crandall, External Systems Requirements Librarian at Boeing Technical Libraries in Seattle, to recruit a sample of volunteer knowledge workers from Boeing managers, engineers and technical staff, to be our task 1 study group. During the second quarter we began constructing and testing the data collection instruments we would be using for task 1. During this period, we also began work on the TREC-6 Interactive Track, which was quite important for RTP3 in terms of developing and testing methodologies for evaluating interactive IR systems. We were one of only three groups who participated in the TREC-6 Interactive Track Pre-experiment, which was an initial test of the methodology which had been proposed for comparing interactive systems installed at different locations, and used by different people. We then went on, during this, and the third and fourth quarters, to design and conduct our TREC-6 Interactive Track experiment. This experiment was designed to be a preliminary test of some specific functionalities for supporting one particular information seeking behavior which we expected to be relevant to RTP3: discovering what topics are discussed in a specific database.
In the third and fourth quarters we completed installation and testing of FIRE and ObjectStore, and also of various versions of InQuery. We also began work in the fourth quarter on specifying particular functionalities that would likely be required in our task 3 work, and to investigate how they could be implemented in both FIRE and InQuery. During this period we ran a series of pilot tests of our data collection instruments and methods for task 1, using volunteer engineers, scientists and managers from local research industries such as Lucent Technologies and Novartis Pharmaceuticals as subjects. These resulted in a fairly radical change in our methods from those which we had originally proposed. We finalized our sample of subjects for task 1 by the end of the third quarter, and in early July a group of investigators went to Seattle to interview the subjects. The audiotapes of the interviews were transcribed during July and August, and the content analysis of these data was begun as the transcripts were available. By the end of the first 12 months, we had established a content analysis scheme, and a preliminary classification of the tasks, activities and information seeking behaviors of our sample.
In general, our work on Task 1 was put somewhat behind schedule for a variety of reasons, most of which had to do with the difficulty of finding an appropriate group of subjects to study. In particular, we had gone quite a long way toward arranging to study analysts at NSA through arrangements with our then COTRs. Unfortunately, when these COTRs were reassigned from our project, we could no longer plan to study these analysts. Although our final study group was identified only at about six months into the project, rather than the projected two months, we feel that we have nevertheless made substantial progress toward our second Task 1 milestone, the description and classification of information seeking strategies. Although coding of the transcripts is not yet complete, the initial coding has led to a quite detailed description, and also to a classification scheme which we believe will remain the basis for our further work. Although the classification scheme will not be finalized until about two months into the second year of the project, we feel that we are well-positioned to meet the milestones for tasks 3 and 4.
We have met the second milestone for task 2, in that the basic systems are installed and are robust, although installation of the basic systems (the first milestone) took somewhat longer than anticipated (primarily because of problems of ordering and delivery, and of software incompatibilities).
The general project theory, plan, structure, and some preliminary results have been reported in a number of venues during this year. The relevant papers (or presentations, when there was no publication) are in Appendix B.
1.2 Project infrastructure
During the first year of RTP3 we set up our experimental and administrative facility. This consisted of purchasing a SUN Ultra workstation, a Dell 200 MHz MMX workstation, and some related furniture, all from budgeted contract funds, and establishing a usability-type lab in which to carry out our searching experiments. The lab facility consists of a room in which experimental subjects do searching, an observation room (with a 1-way mirror between the two rooms), and a variety of video and audio recording, monitoring, transcribing and editing equipment. It was established and outfitted with funds provided by Rutgers University central administration, and by the School of Communication, Information & Library Studies. This facility was tested in the TREC-6 project, and will be used for the experiments in Task 4. We purchased ObjectStore, the object-oriented database management system required by FIRE, the IR development framework that we obtained from Ubilab, the Information Technology Research Laboratory of the Union Bank of Switzerland. We entered into an agreement with Ubilab under which they provide us with FIRE, and with some support, in exchange for our providing them with the interface designs that we will develop during the project. This arrangement has been working quite well.
1.3 Project personnel
During the first quarter of the year, Nicholas Belkin, the PI, was away on a Fulbright in Finland. He was able to maintain close contact with the project through electronic means, and also through meetings with the other investigators at various conferences. Since mid-November 1996, he has been back at Rutgers, working on both tasks 1 and 2 as stipulated in the proposal. José Pérez Carballo, the Co-PI, has been devoting somewhat more than the 12 1/2% time stipulated in the proposal to the project, primarily because programming issues were a concern at various times.
Two Graduate Assistants, Hong Xie and Shinjeng Lin, worked full-time (i.e. 15 hours/week) on the project throughout the year, Xie primarily on task 1 and Lin primarily on task 2. In addition to this budgeted student assistance, four other PhD students worked on the project at various times, supported by other funds. Pamela Savage, Soyeon Park, and Soo Young Rieh participated throughout the year on task 1, and also in supporting our participation in TREC-6. Cynthia Sikora joined the project in February, primarily in support of TREC-6 work related to the project. Colleen Cool, who began work on the project as a Graduate Assistant in the Fall Semester of 1996, continued work on the project as a consultant (about one day/week) for the rest of the year, after leaving Rutgers to join the faculty at Queens College, New York. We also hired three transcribers during the final two months of the year, to transcribe the audiotapes of our interviews with the knowledge workers whom we studied in task 1. Four Masters students in the Department of Library and Information Studies also joined the project team during the Spring and Summer semesters in support of our TREC-6 involvement.
2. Task 1 Results
2.1 Goals
The goal of Task 1 was to identify, describe and classify a range of information seeking strategies in a group of knowledge-intensive workers. This goal was to be achieved in such a way that we would be able to establish relationships between the tasks and intentions of these workers, the information resources they interacted with, and their information seeking strategies. Furthermore, we required that we be able to predict what IR functionalities might be appropriate for supporting different information seeking strategies, and that we be able to identify patterns or sequences of information seeking strategies associated with different behaviors, intentions, resources and tasks, if such existed.
2.2 Subjects
We began by identifying an appropriate group of subjects. By appropriate, we meant in particular that they be people whose everyday work requires substantial use of information, but who are not professional information workers themselves. Furthermore, we wanted to have a group with fairly wide representation of types of uses of information, and with varying degrees of urgency of information need. Finally, we needed to be able to observe, or otherwise collect data about, these people in their ordinary working environments. An ideal group for us would have been practicing intelligence analysts; unfortunately, we were unable to arrange to study such a group. We investigated several other possibilities, including scientists, engineers, and managers in technical and research industries of various kinds. This type of group was of interest to us because its members typically work in information-intensive environments, with tasks which often have strong temporal constraints, and which often involve problem solving or production of an intellectual artifact.
Constraints on selection of the subject group included problems of: confidentiality; agreement of parent organization; numbers of subjects available; type of work in which subjects were involved; possibilities for observation of subjects. After investigating a variety of possibilities, we entered into an agreement with the Boeing Aircraft Company, Seattle, which was suggested and mediated by Michael Crandall, External Systems Requirements Librarian at Boeing Technical Libraries. The terms of the agreement were that Crandall would recruit a set of potential subjects at Boeing, from which we would solicit volunteers to take part in the study, in return for which we would provide Boeing with the results of the study, and with the opportunity to use any system design or interfaces developed as a result of the study. As a result of this arrangement, we were able to recruit a group of 14 engineers, managers and technical staff in various divisions of the Boeing Company in and around Seattle as our study group. The characteristics of this group are specified in our TIPSTER III 12-Month Workshop presentation, attached as Appendix A.
Five members of the RTP3 research staff collected data from the study group during the first two weeks of July 1997 (see section 2.4 Data collection, below).
2.3 Development of methodology
Our initial data collection plan was to directly observe each of our subjects during the course of a working day, making notes on their activities, the resources with which they engaged, and their various behaviors. At the end of the day, we would interview each subject about each activity in which they engaged, and then about those tasks, resources and activities in which they normally engage, but which were not accomplished during that day.
During the first two quarters of the project year, we developed preliminary observation and interview instruments based on this original plan. The specific details of the instruments could not be specified, however, until we had settled upon a study group. Once this was accomplished, in February 1997, we were able to specify the data collection instruments sufficiently to begin pilot testing. The pilot testing went through three iterations, using volunteer subjects from local technical and industrial research organizations whose tasks, responsibilities and general information activities were judged to be roughly similar to those of our study group. The results of the pilot study were that we changed our initial methodological plan substantially. The general method on which we finally settled is specified in section 2.4 Data collection (see Appendix A for more detail on the reasons for the changes and the eventual method).
2.4 Data collection
In April 1997, we began setting up appointments with the people at Boeing who had agreed to be participants in the study. Their constraints, and ours, meant that we could not begin to collect data until the first two weeks of July 1997. Data collection for each subject proceeded as follows (Data collection instruments, including instructions to the investigator, are in Appendix A):
1. On the day prior to the observation, each subject was contacted to confirm the appointment, and was asked at that time to bring a job description to the appointment the next morning.
2. At the beginning of the subject’s work day, an investigator met with the subject, to have the consent form signed, to explain the project and the nature of the subject’s participation, and to give the subject the Activity Notes form which s/he was asked to complete during the day, with an explanation of what kind of data we expected them to enter on that form. The investigator collected the job description, and left.
3. During the course of the day, each subject indicated on the Activity Notes form the specific activities, and reasons for or intentions behind those activities that s/he engaged in. During this period, the investigator who would be interviewing the subject later went over the job description in order better to structure the eventual interviews.
4. Approximately two hours before the end of each subject’s working day, one or more investigators would return to the subject’s place of work, in order to administer a set of questionnaires and interviews.
5. Each subject first completed a questionnaire about her/his general work experience and use of software, and about her/his use of, and satisfaction with a variety of information resources.
6. While the subject was filling out the questionnaire, the interviewer(s) made a diagram of the subject’s workplace including type and location of various information access devices and information resources (not all interviews were conducted in the actual workspace of the subject, however).
7. On completion of the questionnaire, the investigator(s) initiated an interview about the tasks that the subject worked on that day, and about the activities in which they engaged. This interview, audiotaped for subsequent transcription and analysis, asked questions about each activity entered on the Activity Notes form, focusing particularly on the communication and information behaviors in which the subject engaged.
8. When all of the day’s activities had been discussed, the interview then shifted to activities that the subject might have engaged in in support of the day’s tasks, but did not during that day. The interview then shifted to discussion of the tasks which are part of the subject’s work responsibility, but in which they did not engage during that day. For each task, the subject was asked to describe the activities and resources with which they normally engaged in order to accomplish that task.
9. At the close of the interview, subjects were asked to comment upon their information activities in general, their satisfaction with the resources and systems available to them, and on what kinds of resources, functionalities and support systems they felt would be helpful to them.
2.5 Data analysis
The interviews with subjects ranged in length from 1 1/4 to 2 hours. The audiotapes of these interviews were transcribed, and the transcripts were marked by line number. The transcripts were then subjected to detailed content analysis, in order to identify the tasks in which the subjects engage, the resources and activities they use to support those tasks, the reasons for using those specific resources and activities, the information behaviors or information interactions in which they engage within the resources or activities, and the intentions underlying the behaviors or interactions. The method by which the content-analytic scheme was developed, and through which the eventual categories were identified, was as follows (both an early and the current version of the classification are presented in Appendix A, along with examples of the use of the schema in encoding the transcripts):
1. Nine different members of the research team each attempted a separate initial content analysis of a single transcript (that transcript being the same for all members), each using whatever codes and categories seemed best to that person. The encoding of the transcript was supported by the subject’s job description and Activity Notes form.
2. The results of the different encodings of the transcript were presented at group meetings of the research team, with explanations of how and why each code was applied. Differences between the different encodings were resolved through group consensus. This procedure was repeated for several weeks, until a relatively stable set of codes and rules was agreed upon.
3. The research team was then split into four groups of two researchers each, and each group was assigned a different transcript to be analyzed. The instructions to each group were to attempt to code the transcripts according to the schema which had been developed, but to be careful not to force difficult cases into existing codes, rather trying to develop new ones.
4. Over a series of group meetings, the results of each encoding of each transcript were discussed, as was done for the original single encoded transcript. This procedure resulted in a new schema for classifying information interactions and information behaviors, and the identification of a number of new category types for encoding the episodes.
5. Each transcript was then reanalyzed according to the new schema, using each activity/resource episode within each transcript as the unit of analysis, and encoding each such episode according to all of the facets of the coding scheme.
6. On completion of encoding of all of the transcripts, an analysis will be made of the associations of the various categories with one another, and of sequences of activities and associated intentions and behaviors.
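As an illustration of the kind of analysis planned in step 6, the following is a minimal sketch in Python, not the project's actual tooling. It assumes that each activity/resource episode from step 5 can be recorded with one value per facet. The facet names (a subset of those in the coding scheme), the code labels, and the example episodes are all invented for illustration and are not taken from the study transcripts or from the classification in Appendix A.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Episode:
    """One activity/resource episode, the unit of analysis in step 5.

    The facets here are a hypothetical subset of the coding scheme.
    """
    subject: str
    task: str
    resource: str
    behavior: str
    intention: str

    def codes(self):
        # Facet-qualified codes, so equal labels in different facets stay distinct.
        return (
            ("task", self.task),
            ("resource", self.resource),
            ("behavior", self.behavior),
            ("intention", self.intention),
        )


def cooccurrence(episodes):
    """Count how often each pair of facet codes is applied to the same episode."""
    counts = Counter()
    for ep in episodes:
        # Sorting gives every pair a single canonical order.
        for a, b in combinations(sorted(ep.codes()), 2):
            counts[(a, b)] += 1
    return counts


def behavior_bigrams(episodes):
    """Count consecutive behavior pairs within each subject's episode sequence."""
    by_subject = {}
    for ep in episodes:  # input order is taken to be transcript order
        by_subject.setdefault(ep.subject, []).append(ep.behavior)
    counts = Counter()
    for behaviors in by_subject.values():
        counts.update(zip(behaviors, behaviors[1:]))
    return counts


# Invented example data, for illustration only.
EPISODES = [
    Episode("S01", "write report", "colleague", "ask", "locate"),
    Episode("S01", "write report", "intranet", "browse", "learn"),
    Episode("S01", "write report", "intranet", "search", "locate"),
    Episode("S02", "review design", "intranet", "browse", "learn"),
]

pairs = cooccurrence(EPISODES)
sequences = behavior_bigrams(EPISODES)
```

With counts of this form, associations between categories can be read from `pairs` and simple sequences of behaviors from `sequences`; any statistical treatment of those counts is left open here.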
2.6 Results
The results for Task 1 (presented in Appendix A) are: the scheme for encoding the transcripts; the current classification scheme for describing and characterizing information behaviors; and, preliminary characterization of a set of common information seeking strategies as combinations of values on a small number of dimensions, and the associated functionalities necessary for supporting those strategies.
3. Task 2 Results
The results for Task 2 are that the milestones as specified have been met. Both FIRE and ObjectStore have been installed, and are working robustly enough to begin implementing separate IR system functionalities within the framework. In addition, several different interfaces to the InQuery IR engine have been constructed, to support work in the TREC-6 Interactive Track, to test different functionalities which we intend to implement within Task 3, and to begin work on a common-look interface for the Task 3 systems.
There is some question whether the capabilities of ObjectStore will be sufficient for us to use it in our interactive experiments in Task 3. The problem is that it has been used only for rather small databases, and it is still unclear whether it will be fast enough, in its current implementation, to support databases of the size that we will be using for Tasks 3 and 4. We are currently working on this problem, but in case it cannot be resolved in time for our first set of experiments in Task 4, we are in parallel using the basic InQuery capabilities for indexing and retrieval to develop separate systems which have the particular combinations of representation and comparison techniques we predict will be useful for supporting different information seeking strategies. This work will also include producing different interfaces for the different systems. In addition, we are beginning to install the TIPSTER-compliant modules which have been developed at Logicon and NMSU, to see if we can use these to construct the IR systems which we will be testing in Task 4.
The initial specification of systems to be developed in Task 3, and tested in Task 4 has now begun. We have identified, in Task 1, the following general information seeking intentions / strategies for which we will test different IR system support:
These four intentions/strategies will be the basic ones on which we will test the different combinations of IR functionalities that will be implemented in our systems in Task 3; we anticipate that they might be modified or added to as a result of the experiments in Task 4.
4. Finances
Expenditures for RTP3 followed the projected plan in the original budget rather closely, in terms of when funds were expended, for what, and how much things cost. There were a few instances of shifting funds from one category of expenditure to another, in particular because we neglected to budget for required software (i.e. ObjectStore), and because it turned out not to be necessary to pay the subjects participating in Task 1.
There were some significant problems in the accounting and invoicing procedures for RTP3 during this year. Invoices were regularly sent (monthly) from our accounting office starting in January 1997. However, it appears that these invoices for some reason did not reach the appropriate accounting authority, and therefore remained unpaid until September 1997. At that time, the situation (that no funds had been expended by the contracting agency in support of our contract for the entire year) was made known to us by the Maryland Procurement Office. Eventually, an agreement was made with the current accounting authority, ONR Draper, that all of the invoices that had been submitted be cumulated and submitted as one, and paid in a single payment. This has now been done, and expenses on RTP3 through July 1997 have been submitted and paid. The total amount billed and paid as of 31 July 1997 is $152,563. Since the budgeted total up to September 1997 is $199,978, it appears that RTP3 will be on budget for its first 12 months.
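The arithmetic behind the on-budget statement can be sketched as follows. The two dollar figures come from the paragraph above. The 10-month billing period is an assumption (a project year running from October 1996 through September 1997, so that billing through 31 July covers ten months), and the straight-line projection is only a rough illustration, not the project's accounting method.

```python
# Figures stated in the report.
BILLED_THROUGH_JULY = 152_563   # billed and paid as of 31 July 1997, in dollars
BUDGET_THROUGH_SEPT = 199_978   # budgeted total up to September 1997, in dollars

# ASSUMPTION: the project year runs October 1996 through September 1997,
# so billing through July covers 10 of 12 months.
MONTHS_BILLED = 10
MONTHS_IN_YEAR = 12

# Budget still available for the final two months of the year.
remaining = BUDGET_THROUGH_SEPT - BILLED_THROUGH_JULY

# Straight-line projection of the average monthly spend to a full year.
monthly_rate = BILLED_THROUGH_JULY / MONTHS_BILLED
projected_total = monthly_rate * MONTHS_IN_YEAR

on_budget = projected_total <= BUDGET_THROUGH_SEPT
```

Under these assumptions the projected 12-month total stays below the budgeted total, which is consistent with the statement that RTP3 will be on budget for its first 12 months.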
APPENDIX A:
RTP3 TIPSTER III 12-MONTH WORKSHOP PRESENTATION
APPENDIX B:
DOCUMENTS AND PRESENTATIONS RELATED TO RTP3
PREPARED OR PRESENTED DURING YEAR 1