INFORMATION RETRIEVAL
17:610:551
Spring 2002
Gheorghe Muresan
SCHEDULE AND ASSIGNED READINGS (expect revisions)
Each week, please read at least two chapters or articles from the list provided. Depending on your background, your interests, the project you have in mind, and the time available, you are also encouraged to read the other items on the list, to read the recommended background material, or to implement some of the algorithms we discuss.
Lecture 1 – Jan 24: Introduction and overview of the course.
Lecture 2 – Jan 31: The goals of IR. IR problems, the IR situation, and IR systems.
Readings: Hersh, Chapters 1 and 2. In Sparck Jones & Willett, "Overall
introduction", and Chapter Two, "Introduction". Belew, R. K. (2000), Chapter 1,
“Overview”. Belkin, N.J. (1980) Anomalous states of knowledge as a basis for
information retrieval. Canadian Journal of Information Science, v. 5:
133-143. Also: Belkin & Vickery (1985) Chapters 1 and 2; Ingwersen (1992),
Chapter 3; the introductory chapters to any of: Lancaster (1978); Lancaster
& Warner (1993); Meadow (1992); van Rijsbergen (1979),
chapter 1: Introduction; Salton & McGill (1983).
Lecture 3 – Feb 7: Fundamental concepts in IR. Information, meaning, aboutness, relevance.
Readings: In Sparck Jones &
Willett, from Chapter 3: the "Introduction". Belkin, N.J. (1978)
Information concepts for information science. Journal of Documentation,
v. 34, no.1: 55-85. Hutchins, W.J. (1978) The concept of "aboutness"
in subject indexing. Aslib Proceedings, vol. 30: 172-181 (Also in Sparck
Jones & Willett, pp. 93-97). Saracevic, T. (1975) Relevance: a review of
and a framework for the thinking on the topic. Journal of the American
Society for Information Science, vol. 26: 321-343 (Also in Sparck Jones
& Willett, pp. 143-165).
Lecture 4 – Feb 14: Actors and processes in IR systems. What do we want from Information Retrieval?
Readings: Belkin, N.J. (1993) Interaction with texts: Information retrieval as
information-seeking behavior. In: Information Retrieval ’93: Von der
Modellierung zur Anwendung. Konstanz: Universitaetsverlag Konstanz, 55-66.
Croft, W.B. (1995) What
do people want from information retrieval? D-Lib Magazine, November.
In Kowalski and Maybury: Chapter 2 “Information Retrieval System Capabilities”.
Belkin N.J. & Croft, W.B. (1992) Information
filtering and information retrieval: Two sides of the same coin? Communications
of the ACM, v. 35 no. 12: 29-38.
Lecture 5 – Feb 21: Document and query representation. Manual vs. automatic indexing.
Compulsory readings: Hersh, Chapter 5: “Indexing”. J. D. Anderson & J. Perez-Carballo, “The
nature of indexing: how humans and machines analyze messages and texts for
retrieval. Part I: Research, and the nature of human indexing; Part II:
Machine indexing, and the allocation of human versus machine effort”,
Information Processing and Management, vol. 37 (2001), pp. 231-254 and 255-277.
Other readings: In Sparck Jones & Willett, from Chapter
6, the "Introduction" (especially the section on Indexing). Foskett,
D.J. (1980) Thesaurus. In A. Kent, J. Lancour & J.E. Daily, eds., Encyclopedia
of Library and Information Science, v. 30, pp. 416-462. New York: Marcel
Dekker (Also in Sparck Jones & Willett, pp. 111-134).
Lecture 6 – Feb 28: Automatic indexing. Lexical analysis. Weighting. Data structures.
Compulsory readings: van Rijsbergen (1979), Chapter 2: “Automatic text analysis”. Also review “Automatic indexing” from last week.
Other readings: Hersh, Chapter 8: “Lexical-statistical systems”. Belew, R. K. (2000), Chapter 2,
“Extracting lexical features”. In Sparck Jones & Willett, from Chapter 6,
the "Introduction" (especially the section on Indexing). Salton, G.
& Buckley, C. (1988) “Term weighting approaches in automatic text
retrieval”, Information Processing and Management, vol. 24: 513-523
(Also in Sparck Jones & Willett, pp. 323-328). Robertson, S. E. and Sparck
Jones, K. (1997), “Simple, proven approaches to text retrieval”, University of Cambridge
Computer Laboratory Technical Report no. 356, 1994 (updated 1996, 1997).
For stemming code or a demo, see Martin Porter’s site.
Presentations:
Eakins, J. P. and Graham, M. E. “Content-based Image Retrieval: A Report to the JISC Technology Applications Programme” – Stacy Adduci.
Mikheev, Andrei “Document Centered Approach to Text Normalization”, SIGIR 2000, Athens – Craig Willard.
Homework!
Lecture 7 – Mar 7: Models of IR. Interaction models. Indexing models. Relevance feedback.
Readings: In Sparck Jones & Willett, from Chapter 5, the "Introduction".
Cooper, W.S. (1988) Getting beyond Boole. Information Processing and Management,
vol. 24: 243-248 (Also in Sparck Jones & Willett, pp. 265-267). Robertson,
S.E. (1977) The probability ranking principle in IR. Journal of Documentation,
vol. 33: 294-304 (Also in Sparck Jones & Willett, pp. 281-286). Salton, G.,
Wong, A. & Yang, C.S. (1975) “A vector
space model for automatic indexing”, Communications of the ACM, vol 18:
613-620. Also in Sparck Jones and Willett, pp. 273-280. Saracevic, T. (1996).
Interactive models in information retrieval (IR): Progress, problems, proposal.
In Proceedings of the 1996 ASIS Annual Meeting. Medford, NJ: Learned
Information. Turtle, H. & Croft, W.B. (1990) “Inference
networks for document retrieval”, SIGIR 1990, New York: ACM, 1-24.
Readings proposed for presentation:
Rajashekar, T. B. and Croft, W. B. “Combining Automatic and Manual Index Representations in Probabilistic Retrieval”, JASIS, 1995.
Campbell, I. “Supporting Information Needs by Ostensive Definition in an Adaptive Information Space”, MIRO’95.
Presentation:
Bates, Marcia J. “The Design of Browsing and Berrypicking Techniques for the Online Search Interface”, Online Review 13 (October 1989): 407-424 – Sharon Kaye.
Lecture 8 – Mar 14: User interfaces for IR systems. Part I: Interaction models.
Compulsory readings: Chapter 10: “User
Interfaces and Visualization” by Marti Hearst in “Modern Information
Retrieval”.
Recommended readings: Journal of the American Society for Information
Science, vol. 43, issue 2, 1992, special issue on Human-Computer
Interface: “Introduction and Overview” by Lunin and Harman; “Interfaces for end-user information seeking” by Gary Marchionini; “User-friendly
systems instead of user-friendly front-ends” by Donna Harman; “Intelligent
information retrieval: An introduction” by Susan Gauch; “Models for
hypertext” by Mark F. Frisse and Steve B. Cousins. Also: Muresan, G. and Harper, D. J. “Document
Clustering and Language Models for System-Mediated Information Access”, ECDL’01, Darmstadt,
p. 438-449.
Presentations:
Bates, M. (1990) “Where should
the person stop and the information search interface start?” Information
Processing and Management, v 26(5): 575-591 – Cheryl Milburn.
O’Day, V. L. and Jeffries, R. “Orienteering in an information landscape: how information seekers get from here to there”, InterCHI’93, Amsterdam – Tamara Richman.
Hendry, D. G. and Harper, D. J. “An informal information-seeking environment”, JASIS 48 (11), 1997 – Roman Santillan.
Spring break!
Lecture 9 – Mar 28: User interfaces for IR systems. Part II: Tools and techniques. Information Visualization. Structure. Categorization vs. clustering.
Readings: Shneiderman, Ben, chapter
“Information Search and Visualization” in “Designing the User Interface”, 3rd
ed., 1997 (and associated webpage);
Belkin, N.J., Marchetti, P.-G., Cool, C. (1993) BRAQUE: Design of an interface
to support user interaction in information retrieval. Information Processing
and Management, 29 (3): 325-344; Chalmers, M. and Chitson, P. “Bead:
Exploration in information visualization”, SIGIR’92, Copenhagen, p.
330-337; Nowell, L.T., France, R.K., Hix, D., Heath, L.S., Fox, E.A. (1996)
“Visualizing search results: Some alternatives to query-document similarity”,
SIGIR’96, Zurich, p. 67-75; Williamson, C., Shneiderman, B. (1992) “The
Dynamic HomeFinder: Evaluating dynamic queries in a real-estate information
exploration system”, SIGIR’92, Copenhagen, p. 338-346; Lin, Xia “Map
displays for information retrieval”, JASIS, 48(1), 1997, p. 40-54.
Further readings on HCI:
Preece, J., Rogers, Y. and Sharp, H. (2002) – “Interaction Design – Beyond Human-Computer Interaction” (and associated webpage).
Further readings on Information Visualization (IV):
Spence, R. (2000) – “Information Visualization”, ISBN: 0201596261; Chen, C. (1999) – “Information Visualisation
and Virtual Environments”, ISBN: 1852331364; Card, S. K., Mackinlay, J. D. and Shneiderman, B. (1999)
– “Readings in Information Visualization: Using Vision to Think”, ISBN: 1558605339. Also,
University of Maryland’s HCI Lab website, and InfoViz, a repository for IV.
Readings proposed for presentation:
Korfhage,
Robert R. “To see, or not
to see - is
Cutting, D. R., Pedersen, J. O., Karger, D. and Tukey, J. W. “Scatter/Gather: A cluster-based approach to browsing large document collections”, SIGIR’92, Copenhagen, p. 318-329.
Presentations:
Marchionini, Gary “Interfaces for end-user information seeking”, JASIS, 43(2), 1992 – Minsoo Park.
Lecture 10 – Apr 4: Evaluation of IR systems. Experimental vs. operational IR systems.
Readings: Hersh, chapter 3: “System evaluation”,
and chapter 7: “Evaluation”. In Baeza-Yates & Ribeiro-Neto “Modern Information
Retrieval”, chapter 3: “Retrieval Evaluation”. In Sparck Jones &
Willett, from Chapter 4, the "Introduction" and the articles by
Saracevic, et al., Lancaster, and Harman. Su, L. (1992) Evaluation measures for
interactive information retrieval. Information Processing and Management,
28(4): 503-516; Harman, Donna “Overview of
the first TREC conference”, SIGIR’93, Pittsburgh.
In JASIS, 47(1), January 1996, Special Issue: Evaluation
of Information Retrieval: Tague-Sutcliffe, J. M. – “Some
perspectives on the evaluation of information retrieval systems”; Blair, D.
C. – “STAIRS redux: Thoughts on the STAIRS evaluation, ten years after”; Hersh, W. et
al. – “A task-oriented approach to information retrieval evaluation”; Ellis, D. – “The dilemma
of measurement in information retrieval research”; Beaulieu, M. et al. – “Evaluating
interactive systems in TREC”.
In Information Processing and Management, 31 (3), May-June 1995, Special
issue: TREC: Harman, D. – “Overview of the Second Text Retrieval
Conference (TREC-2)”; Sparck Jones, K. – “Reflections
on TREC”; Robertson, S. E. et al. – “Large Test
Collection Experiments on an Operational, Interactive System: Okapi at TREC”;
Belkin, N. et al. – “Combining the
Evidence of Multiple Query Representations for Information Retrieval”.
In Information Processing and Management, 36 (1), January 2000, Special
issue: TREC: Harman, D. – “Overview of the Sixth Text REtrieval
Conference (TREC-6)”; Sparck Jones, K. – “Further
reflections on TREC”; Robertson, S. E. et al. – “Experimentation
as a way of life: Okapi at TREC”.
The Text Retrieval Conference (TREC) webpage.
Presentations:
Brajnik, G., Mizzaro, S., Tasso, C. and Venuti, F. “Strategic Help in User Interfaces for Information Retrieval”, JASIST, 53(5), 2002, p. 343-358 – Tina Marie Doody.
Saracevic, T. “Evaluation of Evaluation in Information Retrieval”, SIGIR’95, Seattle – Dana Knauff.
Lecture 11 – Apr 11: Evaluation of interactive IR systems. IR evaluation in context.
Readings: Hersh, Chapters 3, 7. In Sparck Jones & Willett, from Chapter 4,
the "Introduction" and the articles by Saracevic, et al., Lancaster,
and Harman. Su, L. (1992) Evaluation measures for interactive information
retrieval. Information Processing and Management, 28(4): 503-516;
Borlund, P. and Ingwersen, P. (1997) “The
development of a method for the evaluation of interactive information retrieval
systems”, Journal of Documentation, 53(3).
In Information Processing and Management, 37 (3), May 2001, Special
issue: Interactive TREC: Hersh, W. and Over, P. – “Interactivity at the Text Retrieval
Conference (TREC)”; Over, P. – “The TREC
interactive track: an annotated bibliography”; Hersh, W. et al. – “Challenging conventional assumptions of
automated information retrieval with real users: Boolean searching and batch
retrieval evaluations”; Belkin, N. et al. – “Iterative
exploration, design and evaluation of support for query reformulation in
interactive information retrieval”; Allan, J. et al. – “Evaluating
combinations of ranked lists and visualizations of inter-document similarity”;
Wu, M. et al. – “Using
clustering and classification approaches in interactive retrieval”; Larson,
R. R. – “TREC interactive with Cheshire II”; Bodner, R. C. et al. – “The impact of
text browsing on text retrieval performance”; Yang, K. – “Passage feedback
with IRIS”.
Belkin et al. “Rutgers' TREC 2001 Interactive Track Experience”, at TREC 2001.
Preece, J., Rogers, Y. and Sharp, H. (2002) – “Interaction Design – Beyond Human-Computer Interaction” (and associated webpage) – chapters on Evaluation.
Hull, D. “Using Statistical Testing in the Evaluation of Retrieval Experiments”, SIGIR’93; Wilcox, R. R. “Statistics for Social Sciences” or any other statistics textbook; also, a statistics textbook online.
Presentations:
Borlund, P. “Experimental Components for the evaluation of interactive information retrieval systems”, Journal of Documentation, Vol. 56, no. 1, 2000, 71-90 – Christine Bates.
Reid, J. “A Task-Oriented Non-Interactive Evaluation Methodology for Information Retrieval Systems”, Information Retrieval, 2(1), Feb 2000 – Melissa Roll.
PROJECT TOPICS DUE.
Lecture 12 – Apr 18: Structure. Classification. Clustering.
Readings: van Rijsbergen (1979), Chapter 3:
“Automatic classification”; In Sparck Jones & Willett, from Chapter 6 the
article by Griffiths, Luckhurst & Willett; from Chapter 8, the article by
Hayes, Knecht and Cellio and the article by Rau; Leuski, Anton "Evaluating
Document Clustering for Interactive Information Retrieval", CIKM'01,
33-40; Hearst,
Marti “The Use of Categories and Clusters in
Information Access Interfaces”, in Natural Language Information Retrieval, Strzalkowski
(ed.), Kluwer Academic Publishers, 1999; Sanderson, M. and Croft, W. B. “Deriving
concept hierarchies from text”, SIGIR 1999, Berkeley; Tombros, A., Villa,
R. and Van Rijsbergen, C. J. (2002) “The
effectiveness of query-specific hierarchic clustering in information retrieval”,
Information Processing and Management, 38(4); Yang, Yiming “An Evaluation of
Statistical Approaches to Text Categorization”, Information Retrieval 1,
1999, p. 69-90.
Presentations:
Hearst, M. A. and Pedersen, J. O. “Reexamining the cluster hypothesis: scatter/gather on retrieval results”, SIGIR’96, Zurich, p. 76-84 – Mary Ellen Valverde.
Kural, Y., Robertson, S. and Jones, S. “Deciphering cluster representations”, Information Processing and Management, 37, 2001, p. 593-601 – Brendan Banks.
Lecture 13 – Apr 25: IR on the Web.
Readings: See Journal of the American
Society for Information Science and Technology, 53(2), 2002 - Special issue on Web
research; Almind, T. C. and Ingwersen, P. (1997) “Informetric Analysis on the World Wide Web: Methodological Approaches to
Webometrics”, Journal of Documentation, 53(4); Chu, H. and
Rosenthal, M (1996) “Search Engines for the World Wide
Web: A Comparative Study and Evaluation Methodology”, Proceedings of
ASIS’96.
“The Internet: Bringing Order from Chaos”, special report in Scientific American, March 1997.
“PageRank: Bringing Order to the Web”, the model behind Google.
Presentations:
Spink, Amanda (2002) “A user-centered approach to evaluating human interaction with Web search engines: an exploratory study”, Information Processing and Management, 38(3) – Shilpa Shanbhag.
Ellis, D., Ford, N. and Furner, J. (1998) “In search of the unknown user: indexing, hypertext and the World Wide Web”, Journal of Documentation, 54(1) – Jinyoung Park.
Lecture 14 – May 2: Current research and future directions for IR systems. Multimedia
IR. Collaborative systems. Recommender systems. User modeling. Document
summarization. Information extraction.
Course evaluation.
Readings: Hersh, Chapter 9: “Linguistic Systems”; In Sparck Jones & Willett,
from Chapter 8, the "Introduction" and any other article there that
looks interesting; Chapter 9. SIGIR’99 Workshop on Recommender
Systems, UC Berkeley; Set of articles on Recommender Systems in Communications
of the ACM, 40 (3), March 1997 – leading article: Resnick, P. and Varian,
H. R. “Recommender Systems”. Xie, H. “Patterns
between Interactive Intentions and Information-Seeking Strategies”, Information
Processing and Management, 38, 2002; Chalmers, Matthew “Paths
and Contextually Specific Recommendations”, DELOS Workshop, 2001;
Pazzani, M. and Billsus, D. “Learning and
Revising User Profiles: The Identification of Interesting Web Sites”, Machine
Learning 27, 1997, p. 313-331.
Presentation:
Gaizauskas, R. and Wilks, Y. (1998) “Information Extraction: Beyond Document Retrieval”, Journal of Documentation, 54(1) – Fran Pfeffer.
Lecture 15 – May 9: Discussion/presentation of final projects.
PROJECTS DUE.
Mon – May 13: Grades due.