| - Week - | - Topics / Activities - | - Resources - (Readings before class) | - Students' responsibilities - (During and/or after class) |
|---|---|---|---|
* 1 * Tue, |
Introduction and overview of the course. | Get familiar with the course website. Set up your course website on eden. |
|
* 2 * Tue, |
Introduction to IR. Information vs. data retrieval. What do we want from IR? Introduction to evaluation. |
Readings: vR, ch.1: "Introduction"; RB, ch.1: "Overview"; SJW, ch.1, "Overall introduction", ch.2: "Introduction"; Croft, W.B. (1995) "What do people want from information retrieval?", D-Lib Magazine, November; Belkin N.J. & Croft, W.B. (1992) "Information filtering and information retrieval: Two sides of the same coin?" Communications of the ACM, v. 35 no. 12: 29-38. |
|
* 3 * Tue, |
IR concepts. Aboutness. Relevance. Rationalist vs. empiricist approaches (AI vs. Stats). Ron Gorda and Sunny Yoon's presentation: Design decisions for IRS; automatic vs. manual/intellectual systems. |
Readings: Paul Debraski's presentation: Saracevic, T. (1975) "Relevance: a review of and a framework for the thinking on the topic", Journal of the American Society for Information Science, vol. 26: 321-343 (SJW, 143-165). |
|
* 4 * Tue, |
Document and query representation. Manual vs. automatic indexing. | Readings: Sarah Jewell's presentation: Furnas, G.W., Landauer, T. K., Gomez, L. M., Dumais, S. T. (1987) "The vocabulary problem in human-system communication", Communications of the ACM, 30(11), 964-971. |
Look at an example of a document collection, a stopword list, an indexed collection and an inverted file. |
* 5 * Tue, |
Automatic indexing. Lexical analysis. Weighting. Data structures. | Readings: For stemming code or a demo, see Martin Porter’s site. Thomas Tamburin's presentation: Mikheev, Andrei “Document Centered Approach to Text Normalization”, SIGIR 2000, Athens. |
|
* 6 * Tue, Stephen Robertson's presentation on Probabilistic Models A student's investigation of the math in the vector space model |
Models of IR. Interaction models. Indexing models. Language models. Topic models. User models. Relevance feedback. |
Note. The list below offers good coverage of the fundamental models in IR, for those interested in the history and development of IR. You are not required to read them (some are actually hard to read!); it is sufficient that you have an idea of what each model implies. SJW, intro to ch.5 ("Models") and ch.6 ("Techniques"). Cooper, W.S. "Getting beyond Boole", Information Processing and Management, vol. 24: 243-248 (also in SJW, 265-267); van Rijsbergen, C. J. "A new theoretical framework for information retrieval", SIGIR'86; Robertson, S.E. "The probability ranking principle in IR", Journal of Documentation, vol 33: 294-304, 1977 (also in SJW, 281-286). Salton, G., Wong, A. & Yang, C.S. (1975) “A vector space model for automatic indexing”, Communications of the ACM, vol 18: 613-620 (also in SJW, 273-280); N. J. Belkin, R. N. Oddy, and H. M. Brooks "ASK for information retrieval: Part I. Background and theory", Journal of Documentation, 38(2):61-71, 1982; Saracevic, T. "Interactive models in information retrieval (IR): Progress, problems, proposal", in Proceedings of the 1996 ASIS Annual Meeting, Medford, NJ. Turtle, H. & Croft, W.B. (1990) “Inference networks for document retrieval”, SIGIR 1990, New York: ACM, 1-24; Ponte, J. and Croft, W.B. "A Language Modeling Approach to Information Retrieval", SIGIR'98; Lavrenko, V. "Language models", tutorial at SIGIR 2003. Robertson, S. and Sparck Jones, K. "Simple, proven approaches to text retrieval", Technical Report TR356, Cambridge University Computer Laboratory, 1997; Salton, G. and Buckley, C. "Term-weighting approaches in automatic text retrieval", Information Processing and Management, 24(5):513-523, 1988. Cinthia Levy's presentation: Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., Harshman R. (1990) "Indexing by Latent Semantic Analysis", JASIS, 41(6), 391-407 (a less mathematical alternative to Deerwester's original LSA paper is: Landauer, T. K., Foltz, P. W., Laham, D. "An Introduction to Latent Semantic Analysis", Discourse Processes, 25, 259-284, 1998). Stephen Donweber's presentation: Bates, Marcia J. “The Design of Browsing and Berrypicking Techniques for the Online Search Interface”, Online Review 13 (October 1989): 407-424. |
Homework. |
* 7 * Tue, |
Information Retrieval as interaction. Evaluation of interactive systems. Intro to Statistics for IR. Case study: Rutgers at TREC, Interactive Track (here's the presentation from Interactive TREC 2002). |
Readings: Any book or online tutorial on Statistics (concentrate on hypothesis testing, t-tests, ANOVA, Chi-square, correlation, ...); Hull, D. (1993) "Using statistical testing in the evaluation of retrieval experiments", SIGIR'93; Buckley, C. and Voorhees E.M. (2000) "Evaluating evaluation measure stability", SIGIR'00; Voorhees E.M. and Buckley, C. (2002) "The effect of topic set size on retrieval experiment error", SIGIR'02. TREC, Rutgers' recent work at TREC (see Muresan's publications and the TREC 2003 webpage), and previous work. The whole 37 (3), May 2001 issue of Information Processing and Management, focussing on Interactive TREC, would be interesting for people who choose to do an evaluation for the final project. |
Here are some data from our participation in the interactive TREC 2003 (user data, search data, results poster, paper written with CSIRO). Play with SPSS and try to reproduce (some of) the results. |
* 8 * Tue, |
User interfaces for IR systems. Part I: Interaction models. |
Readings: Journal of the American Society for Information Science, vol. 43, issue 2, 1992, special issue on Human-Computer Interface: “Introduction and Overview” by Lunin and Harman, “Interfaces for end-user information seeking” by Gary Marchionini, “User-friendly systems instead of user-friendly front-ends” by Donna Harman, “Intelligent information retrieval: An introduction” by Susan Gauch, “Models for hypertext” by Mark F. Frisse and Steve B. Cousins; Muresan, G. and Harper, D. J. “Document Clustering and Language Models for System-Mediated Information Access”, ECDL’01, Darmstadt, p. 438-449. Student presentations: O’Day, V. L. and Jeffries, R. “Orienteering in an information landscape: how information seekers get from here to there”, InterCHI’93, Amsterdam. Brajnik, G., Mizzaro, S., Tasso, C. and Venuti, F. “Strategic Help in User Interfaces for Information Retrieval”, JASIST, 53(5), 2002, p. 343-358. Gretchen Schwarz' presentation: Campbell, I. “Supporting Information Needs by Ostensive Definition in an Adaptive Information Space”, MIRO’95, or "The Ostensive Model of Developing Information Needs", CoLIS, Copenhagen, 1996. |
|
* 9 * Tue, |
Spring break. |
* 10 * Tue, |
User interfaces for IR systems. Part II: Tools and techniques. Information Visualization. |
Readings: Shneiderman, Ben, ch. “Information Search and Visualization” in “Designing the User Interface”, 3rd ed., 1997 (see webpage); Belkin, N.J., Marchetti, P.-G., Cool, C. (1993) “BRAQUE: Design of an interface to support user interaction in information retrieval”, Information Processing and Management, 29 (3): 325-344; Chalmers, M. and Chitson, P. “Bead: Exploration in information visualization”, SIGIR’92, p. 330-337; Nowell, L.T., France, R.K., Hix, D., Heath, L.S., Fox, E.A. (1996) “Visualizing search results: Some alternatives to query-document similarity”, SIGIR’96, p. 67-75; Williamson, C., Shneiderman, B. (1992) “The Dynamic HomeFinder: Evaluating dynamic queries in a real-estate information exploration system”, SIGIR’92, p. 338-346; Lin, Xia “Map displays for information retrieval”, JASIS, 48(1), 1997, p. 40-54; George Robertson (2000) "The Task Gallery: a 3D window manager", SIGCHI. Further readings on Human Computer Interaction (HCI): Further readings on Information Visualization (IV): Student presentations: Tony Joachim's presentation: Gary Marchionini, “Interfaces for end-user information seeking”, JASIS, 43(2), 1992. Alyssa Katz's presentation: Cutting, D. R., Pedersen, J. O., Karger, D. and Tukey, J. W. “Scatter/Gather: A cluster-based approach to browsing large document collections”, SIGIR’92, p. 318-329; Hearst, M. and Karadi, C. "Cat-a-Cone: An Interactive Interface for Specifying Searches and Viewing Retrieval Results using a Large Category Hierarchy", SIGIR'97, Philadelphia, PA; Chen, M., Hearst, M., Hong, J., and Lin, J. "Cha-Cha: A System for Organizing Intranet Search Results" in the Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems (USITS), Boulder, CO, 1999. |
|
* 11 * Tue, |
Evaluation of IR systems. Experimental vs operational IR systems. Evaluation of interactive IR systems. IR evaluation in context. |
Readings: In JASIS, 47(1), January 1996, Special Issue: Evaluation of Information Retrieval: Tague-Sutcliffe, J. M. – “Some perspectives on the evaluation of information retrieval systems”; Blair, D. C. – “STAIRS redux: Thoughts on the STAIRS evaluation, ten years after”; Hersh, W. et al. – “A task-oriented approach to information retrieval evaluation”; Ellis, D. – “The dilemma of measurement in information retrieval research”; Beaulieu, M. et al. – “Evaluating interactive systems in TREC”. In Information Processing and Management, 31 (3), May-June 1995, Special issue: TREC: Harman, D. – “Overview of the Second Text Retrieval Conference (TREC-2)”; Sparck Jones, K. – “Reflections on TREC”; Robertson, S. E. et al. – “Large Test Collection Experiments on an Operational, Interactive System: Okapi at TREC”; Belkin, N. et al. – “Combining the Evidence of Multiple Query Representations for Information Retrieval”. In Information Processing and Management, 36 (1), January 2000, Special issue: TREC: Harman, D. – “Overview of the Sixth Text REtrieval Conference (TREC-6)”; Sparck Jones, K. – “Further reflections on TREC”; Robertson, S. E. et al. – “Experimentation as a way of life: Okapi at TREC”. Borlund, P. and Ingwersen, P. (1997) “The development of a method for the evaluation of interactive information retrieval systems”, Journal of Documentation, 53(3). In Information Processing and Management, 37 (3), May 2001, Special issue: Interactive TREC: Hersh, W. and Over, P. – “Interactivity at the Text Retrieval Conference (TREC)”; Over, P. – “The TREC interactive track: an annotated bibliography”; Hersh et al. – “Challenging conventional assumptions of automated information retrieval with real users: Boolean searching and batch retrieval evaluations”; Belkin, N. et al. – “Iterative exploration, design and evaluation of support for query reformulation in interactive information retrieval”; Allan, J. et al. – “Evaluating combinations of ranked lists and visualizations of inter-document similarity”; Wu, M. et al. – “Using clustering and classification approaches in interactive retrieval”; Larson, R. R. – “TREC interactive with Cheshire II”; Bodner, R. C. et al. – “The impact of text browsing on text retrieval performance”; Yang, K. – “Passage feedback with IRIS”. The Text Retrieval Conference (TREC) webpage. In TREC2001, Belkin et al. “Rutgers' TREC 2001 Interactive Track Experience”. Jakob Nielsen's Alertbox, March 1, 2004: "Risks of Quantitative Studies". Student presentations: Alyssa Katz's presentation: Reid, J. “A Task-Oriented Non-Interactive Evaluation Methodology for Information Retrieval Systems”, Information Retrieval, 2(1), Feb 2000. |
|
* 12 * Tue, |
(More Evaluation, if needed) Structure. Document and query structure. Links. Categorization vs. clustering. Filtering. XML & INEX. |
Readings. Student presentations: Hearst, M. A. and Pedersen, J. O. “Reexamining the cluster hypothesis: scatter/gather on retrieval results”, SIGIR’96, Zurich, p. 76-84; Y. Kural, S. Robertson and S. Jones, "Deciphering cluster representations", Information Processing and Management 37, 593-601 (2001). |
|
* 13 * Tue, |
IR on the Web. | Readings: Hao Chen and Susan Dumais (2000) “Bringing Order from Chaos: automatically categorizing search results”, SIGCHI; Susan Dumais and Hao Chen (2000) "Hierarchical classification of Web content", SIGIR; Susan Dumais, Edward Cutrell, Hao Chen (2001) "Optimizing search by showing results in context", SIGCHI; Ed H. Chi, Peter Pirolli, James Pitkow (2000), "The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site", SIGCHI. Larry Page, Sergey Brin et al. “PageRank: Bringing Order to the Web”, Stanford Uni. report (the model behind Google). Krishna Bharat and Monika R. Henzinger (1998) "Improved algorithms for topic distillation in a hyperlinked environment", SIGIR. Search Engine Watch. Potential student presentations: Ellis, D., Ford, N. and Furner, J. (1998) “In search of the unknown user: indexing, hypertext and the World Wide Web”, Journal of Documentation, 54(1). |
|
* 14 * Tue, Slides in HTML and PDF |
Advanced topics: Natural Language Processing in IR |
Guest lecture - Nina Wacholder: "NLP in IR" Readings: |
|
* 15 * Tue, |
Advanced topics: Multimedia IR (image, video, music, ...). Information extraction. Collaborative systems. Recommender systems. Personalization and user modeling. AI and IR. Agents. |
Advanced topics presented by students (final project): Readings: |
|
* 16 * Tue, |
Project presentations. | Alyssa Katz: "Recommender systems" Paul Debraski: "Relevance feedback" Yibu Chen Stephen Donweber Dawn Filan Tony Joachim Gretchen Schwarz Albena Stoyanova-Tzankova |
|
| Thu, May 06 | Grades due |