Schedule for 17:610:551

This schedule is subject to alterations.

(Legend for the readings: MBK = Meadow, Boyce and Kraft, vR = van Rijsbergen, RB = Rik Belew, SJW = Sparck-Jones and Willett)

- Week - - Topics / Activities - - Resources (readings before class) - - Students' responsibilities (during and/or after class) -

* 1 *

Tue,
Jan 20

Slides in HTML and PDF

Introduction and overview of the course.  

Get familiar with the course website. Set up your course website on eden.
Send me email with your details (use students.xml template).

* 2 *

Tue,
Jan 27

Slides in HTML and PDF

Introduction to IR. Information vs data retrieval.

What do we want from IR? Introduction to evaluation.

Readings:
vR, ch.1: "Introduction"; RB, ch.1: "Overview"; SJW, ch.1, "Overall introduction", ch.2: "Introduction"; Croft, W.B. (1995) "What do people want from information retrieval?", D-Lib Magazine, November; Belkin N.J. & Croft, W.B. (1992) "Information filtering and information retrieval: Two sides of the same coin?" Communications of the ACM, v. 35 no. 12: 29-38.
 

* 3 *

Tue,
Feb 03

Slides in HTML and PDF

IR concepts. Aboutness. Relevance.

Rationalist vs. empiricist approaches (AI vs. Stats)

Ron Gorda and Sunny Yoon's presentation: Design decisions for IRS; automatic vs. manual/intellectual systems.

Readings:
SJW, ch.3: "Introduction". Belkin, N.J. (1978) Information concepts for information science. Journal of Documentation, v. 34, no.1: 55-85. Hutchins, W.J. (1978) The concept of "aboutness" in subject indexing. Aslib Proceedings, vol. 30: 172-181 (SJW, 93-97).

Paul Debraski's presentation: Saracevic, T. (1975) "Relevance: a review of and a framework for the thinking on the topic", Journal of the American Society for Information Science, vol. 26: 321-343 (SJW, 143-165).

 

* 4 *

Tue,
Feb 10

Slides in HTML and PDF

Document and query representation. Manual vs. automatic indexing.

Readings:
J. D. Anderson & J. Perez-Carballo, “The nature of indexing: how humans and machines analyze messages and texts for retrieval. Part I: Research, and the nature of human indexing; Part II: Machine indexing, and the allocation of human versus machine effort”, Information Processing and Management, vol. 37 (2001), 231-254, 255-277.

Sarah Jewell's presentation: Furnas, G.W., Landauer, T. K., Gomez, L. M., Dumais, S. T. (1987) "The vocabulary problem in human-system communication", Communications of the ACM, 30(11), 964-971.

Look at an example of a document collection, a stopword list, an indexed collection and an inverted file.
Formulate a few boolean queries and figure out the result of a boolean search.
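
As a starting point for this exercise, here is a minimal sketch in Python of how an inverted file supports boolean search. Everything in it is an assumption made for illustration: the three-document collection, the very short stopword list, and the function names are invented and are not the course's example files.

```python
import re
from collections import defaultdict

# Hypothetical toy collection; IDs and texts are invented for illustration.
DOCS = {
    1: "Information retrieval is the science of searching for documents.",
    2: "Data retrieval returns exact matches from a database.",
    3: "Indexing documents by hand is slow; automatic indexing is fast.",
}

# A tiny, assumed stopword list (real lists are much longer).
STOPWORDS = {"is", "the", "of", "for", "a", "from", "by"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_inverted_file(docs):
    """Map each index term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


INDEX = build_inverted_file(DOCS)


def postings(term):
    """Return the set of document IDs for a term (empty set if unseen)."""
    return INDEX.get(term, set())


# Boolean operators map directly onto set operations.
q_and = postings("retrieval") & postings("documents")  # retrieval AND documents
q_or = postings("indexing") | postings("database")     # indexing OR database
q_not = postings("retrieval") - postings("data")       # retrieval AND NOT data

print(sorted(q_and), sorted(q_or), sorted(q_not))
```

With this toy collection the three queries return documents 1, then 2 and 3, then 1. Working the same queries by hand against the posting lists is a good check that you understand the result of a boolean search.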

* 5 *

Tue,
Feb 17

Slides in HTML and PDF

Automatic indexing. Lexical analysis. Weighting. Data structures.

Readings:
vR, ch.2: “Automatic text analysis”; RB, ch.2: “Extracting lexical features”; SJW, ch.6, "Introduction" (esp. the section on Indexing). Salton, G. & Buckley, C. (1988) “Term weighting approaches in automatic text retrieval”, Information Processing and Management, vol. 24: 513-523 (SJW, pp. 323-328). Robertson, S. E. and Sparck Jones, K. (1997), “Simple, proven approaches to text retrieval”, University of Cambridge Computer Laboratory Technical Report no. 356, 1994 (updated 1996,1997).

For stemming code or a demo, see Martin Porter’s site.

Thomas Tamburin's presentation: Mikheev, Andrei “Document Centered Approach to Text Normalization”, SIGIR 2000, Athens.

 

* 6 *

Tue,
Feb 24

Slides in HTML and PDF

Stephen Robertson's presentation on Probabilistic Models

A student's investigation of the math in the vector space model

Models of IR.

Interaction models. Indexing models. Language models. Topic models. User models.

Relevance feedback.

Note. The list below offers good coverage of the fundamental models in IR, for those interested in the history and development of IR. You are not required to read them (some are actually hard to read!); it is sufficient that you have an idea of what each model implies.

SJW, intro to ch.5 ("Models") and ch.6 ("Techniques"). Cooper, W.S. "Getting beyond Boole", Information Processing and Management, vol. 24: 243-248 (also in SJW, 265-267); van Rijsbergen, C. J. "A new theoretical framework for information retrieval", SIGIR'86; Robertson, S.E. "The probability ranking principle in IR", Journal of Documentation, vol 33: 294-304, 1977 (also in SJW, 281-286). Salton, G., Wong, A. & Yang, C.S. (1975) “A vector space model for automatic indexing”, Communications of the ACM, vol 18: 613-620 (also in SJW, 273-280); N. J. Belkin, R. N. Oddy, and H. M. Brooks "ASK for information retrieval: Part I. Background and theory", Journal of Documentation, 38(2): 61-71, 1982; Saracevic, T. "Interactive models in information retrieval (IR): Progress, problems, proposal", in Proceedings of the 1996 ASIS Annual Meeting, Medford, NJ. Turtle, H. & Croft, W.B. (1990) “Inference networks for document retrieval”, SIGIR 1990, New York: ACM, 1-24; Ponte, J. and Croft, W.B. "A Language Modeling Approach to Information Retrieval", SIGIR'98; Lavrenko, V. "Language models", tutorial at SIGIR 2003.

Robertson, S. and Sparck Jones, K. "Simple, proven approaches to text retrieval", Technical Report TR356, Cambridge University Computer Laboratory, 1997; Salton, G. and Buckley, C. "Term-weighting approaches in automatic text retrieval", Information Processing and Management, 24(5):513-523, 1988.

Cinthia Levy's presentation: Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., Harshman R. (1990) "Indexing by Latent Semantic Analysis", JASIST, 41(6), 391-407 (a less mathematical alternative to Deerwester's original LSA paper is: Landauer, T. K., Foltz, P. W., Laham, D. "An Introduction to Latent Semantic Analysis", Discourse Processes, 25,259-284, 1998).

Stephen Donweber's presentation: Bates, Marcia J. “The Design of Browsing and Berrypicking Techniques for the Online Search Interface”, Online Review 13 (October 1989): 407-424.

Homework.

* 7 *

Tue,
Mar 02

Slides in HTML and PDF

Information Retrieval as interaction. Evaluation of interactive systems.

Intro to Statistics for IR.

Case study: Rutgers at TREC, Interactive Track (here's the presentation from Interactive TREC 2002).

Readings:
Preece et al, "Interaction design" and/or complementary website, esp. chapters on evaluation.

Any book or online tutorial on Statistics (concentrate on hypothesis testing, t-tests, ANOVA, Chi-square, correlation, ...); Hull, D. (1993) "Using statistical testing in the evaluation of retrieval experiments", SIGIR'93; Buckley, C. and Voorhees E.M. (2000) "Evaluating evaluation measure stability", SIGIR'00; Voorhees E.M. and Buckley, C. (2002) "The effect of topic set size on retrieval experiment error", SIGIR'02.

TREC; Rutgers' recent work at TREC (see Muresan's publications and the TREC 2003 webpage) and previous work. The whole issue 37 (3), May 2001, of Information Processing and Management, which focuses on Interactive TREC, would be interesting for people who choose to do an evaluation for the final project.

Here are some data from our participation in the interactive TREC 2003 (user data, search data, results poster, paper written with CSIRO). Play with SPSS and try to reproduce (some of) the results.

* 8 *

Tue,
Mar 09

Slides in HTML and PDF

User interfaces for IR systems. Part I: Interaction models.

Readings:
Chapter 10: “User Interfaces and Visualization” by Marti Hearst in “Modern Information Retrieval”.

Journal of the American Society for Information Science, vol. 43, issue 2, 1992, special issue on Human-Computer Interface: “Introduction and Overview” by Lunin and Harman, “Interfaces for end-user information seeking” by Gary Marchionini, “User-friendly systems instead of user-friendly front-ends” by Donna Harman, “Intelligent information retrieval: An introduction” by Susan Gauch, “Models for hypertext” by Mark F. Frisse and Steve B. Cousins; Muresan, G. and Harper, D. J. “Document Clustering and Language Models for System-Mediated Information Access”, ECDL’01, Darmstadt, p. 438-449.

Student presentations:
Albena Stoyanova-Tzankova's presentation: Bates, M. (1990) “Where should the person stop and the information search interface start?” Information Processing and Management, v 26(5): 575-591.

O’Day, V. L. and Jeffries, R. “Orienteering in an information landscape: how information seekers get from here to there”, InterCHI’93, Amsterdam.

Brajnik, G., Mizzaro, S., Tasso, C. and Venuti, F. “Strategic Help in User Interfaces for Information Retrieval”, JASIST, 53(5), 2002, p. 343-358.

Gretchen Schwarz' presentation: Campbell, I. “Supporting Information Needs by Ostensive Definition in an Adaptive Information Space”, MIRO’95, or "The Ostensive Model of Developing Information Needs", CoLIS, Copenhagen, 1996.

 

* 9 *

Tue,
Mar 16

Spring break.    

* 10 *

Tue,
Mar 23

Slides in HTML and PDF

User interfaces for IR systems. Part II: Tools and techniques.

Information Visualization.

Readings:
Shneiderman, Ben, ch. “Information Search and Visualization” in “Designing the User Interface”, 3rd ed., 1997 (see webpage); Belkin, N.J., Marchetti, P.-G., Cool, C. (1993) “BRAQUE: Design of an interface to support user interaction in information retrieval”, Information Processing and Management, 29 (3): 325-344; Chalmers, M. and Chitson, P. “Bead: Exploration in information visualization”, SIGIR’92, p. 330-337; Nowell, L.T., France, R.K., Hix, D., Heath, L.S., Fox, E.A. (1996) “Visualizing search results: Some alternatives to query-document similarity”, SIGIR’96, p. 67-75; Williamson, C., Shneiderman, B. (1992) “The Dynamic HomeFinder: Evaluating dynamic queries in a real-estate information exploration system”, SIGIR’92, p. 338-346; Lin, Xia “Map displays for information retrieval”, JASIS, 48(1), 1997, p. 40-54; George Robertson (2000) "The Task Gallery: a 3D window manager", SIGCHI.

Further readings on Human Computer Interaction (HCI):
Preece, J., Rogers, Y. and Sharp, H. (2002) – “Interaction Design – Beyond Human-Computer Interaction” (and associated webpage).

Further readings on Information Visualization (IV):
Spence, R. (2000) – “Information Visualization”, ISBN: 0201596261; Chen, C. (1999) – “Information Visualisation and Virtual Environments”, ISBN: 1852331364; Card, S. K., Mackinlay, J. D. and Shneiderman, B. (1999) – “Readings in Information Visualization: Using Vision to Think”, ISBN: 1558605339. Also, University of Maryland’s HCI Lab website, and InfoViz, a repository for IV.

Student presentations:
Yibu Chen's presentation: Korfhage, Robert R. “To see, or not to see - is that the query?”, SIGIR’91, p. 134-141;

Tony Joachim's presentation: Gary Marchionini, “Interfaces for end-user information seeking”, JASIS, 43(2), 1992.

Alyssa Katz's presentation: Cutting, D. R., Pedersen, J. O., Karger, D. and Tukey, J. W. “Scatter/Gather: A cluster-based approach to browsing large document collections”, SIGIR’92, p. 318-329;

Hearst, M. and Karadi, C. "Cat-a-Cone: An Interactive Interface for Specifying Searches and Viewing Retrieval Results using a Large Category Hierarchy", SIGIR'97, Philadelphia, PA;

Chen, M., Hearst, M., Hong, J., and Lin, J. "Cha-Cha: A System for Organizing Intranet Search Results" in the Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems (USITS), Boulder, CO, 1999.

 

* 11 *

Tue,
Mar 30

Slides in HTML and PDF

Stephen Robertson's lecture

Evaluation of IR systems. Experimental vs operational IR systems.

Evaluation of interactive IR systems. IR evaluation in context.

Readings:
In Baeza-Yates & Ribeiro-Neto “Modern Information Retrieval”, ch.3: “Retrieval Evaluation”; in RB, ch.4: "Assessing the Retrieval".
In Sparck Jones & Willett, from Chapter 4, the "Introduction" and the articles by Saracevic, et al., Lancaster, and Harman;
Su, L. (1992) "Evaluation measures for interactive information retrieval", Information Processing and Management, 28(4): 503-516;
Harman, Donna “Overview of the first TREC conference”, SIGIR’93, Pittsburgh.

In JASIS, 47(1), January 1996, Special Issue: Evaluation of Information Retrieval :- Tague-Sutcliffe, J. M. – “Some perspectives on the evaluation of information retrieval systems”, Blair, D. C. – “STAIRS redux: Thoughts on the STAIRS evaluation, ten years after”, Hersh, W. et al. – “A task-oriented approach to information retrieval evaluation”; Ellis, D. – “The dilemma of measurement in information retrieval research”; Beaulieu, M. et al. – “Evaluating interactive systems in TREC”.

In Information Processing and Management, 31 (3), May-June 1995, Special issue: TREC :- Harman, D. - “Overview of the Second Text Retrieval Conference (TREC-2)”; Sparck Jones, K. – “Reflections on TREC”; Robertson, S. E. et al. – “Large Test Collection Experiments on an Operational, Interactive System: Okapi at TREC”; Belkin, N. et al. – “Combining the Evidence of Multiple Query Representations for Information Retrieval”.

In Information Processing and Management, 36 (1), January 2000, Special issue: TREC :- Harman, D. - “Overview of the Sixth Text REtrieval Conference (TREC-6)”; Sparck Jones, K. – “Further reflections on TREC”; Robertson, S. E. et al. – “Experimentation as a way of life: Okapi at TREC”.

Borlund, P. and Ingwersen, P. (1997) “The development of a method for the evaluation of interactive information retrieval systems”, Journal of Documentation, 53(3).

In Information Processing and Management, 37 (3), May 2001, Special issue: Interactive TREC :- Hersh, W. and Over, P. - “Interactivity at the Text Retrieval Conference (TREC)”; Over, P. - “The TREC interactive track: an annotated bibliography”; Hersh et al. – “Challenging conventional assumptions of automated information retrieval with real users: Boolean searching and batch retrieval evaluations”; Belkin, N. et al. – “Iterative exploration, design and evaluation of support for query reformulation in interactive information retrieval”; Allan, J. et al. – “Evaluating combinations of ranked lists and visualizations of inter-document similarity”; Wu, M. et al. – “Using clustering and classification approaches in interactive retrieval”; Larson, R. R. - “TREC interactive with Cheshire II”; Bodner, R. C. et al. – “The impact of text browsing on text retrieval performance”; Yang, K. - “Passage feedback with IRIS”.

The Text Retrieval Conference (TREC) webpage. In TREC2001, Belkin et al. “Rutgers' TREC 2001 Interactive Track Experience”.

Jakob Nielsen's Alertbox, March 1, 2004: "Risks of Quantitative Studies"

Student presentations:
Samantha Francisco-Deutch's presentation: Saracevic, T. “Evaluation of Evaluation in Information Retrieval”, SIGIR’95.

Alyssa Katz's presentation: Reid, J. “A Task-Oriented Non-Interactive Evaluation Methodology for Information Retrieval Systems”, Information Retrieval, 2(1), Feb 2000.

 

* 12 *

Tue,
Apr 06

Slides in HTML and PDF

(More Evaluation, if needed)

Structure. Document and query structure. Links. Categorization vs. clustering. Filtering. XML & INEX.

Readings:
vR, ch.3: “Automatic classification”;
SJW, from ch.6 the article by Griffiths, Luckhurst & Willett; from ch.8, the article by Hayes, Knecht and Cellio and the article by Rau;
Leuski, Anton "Evaluating Document Clustering for Interactive Information Retrieval", CIKM'01, 33-40;
Hearst, Marti “The Use of Categories and Clusters in Information Access Interfaces”, in Natural Language Information Retrieval, Strzalkowski (ed.), Kluwer Academic Publishers, 1999;
Sanderson, M. and Croft, W. B. “Deriving concept hierarchies from text”, SIGIR 1999, Berkeley; Hideo Joho et al (2002) "Hierarchical presentation of expansion terms", Proceedings of the 2002 ACM symposium on Applied Computing;
Tombros, A., Villa, R. and Van Rijsbergen, C. J. (2002) “The effectiveness of query-specific hierarchic clustering in information retrieval”, Information Processing and Management, 38(4);
Yang, Yiming “An Evaluation of Statistical Approaches to Text Categorization”, Information Retrieval 1, 1999, p.69-90;
Fabrizio Sebastiani (2002) "Machine learning in automated text categorization", ACM Computing Surveys (CSUR), 34(1):1-47.

Student presentations:
Dawn Filan's presentation: Borlund, P. “Experimental Components for the evaluation of interactive information retrieval systems”, Journal of Documentation, Vol. 56, no. 1, 2000, 71-90.

Hearst, M. A. and Pedersen, J. O. “Reexamining the cluster hypothesis: scatter/gather on retrieval results”, SIGIR’96, Zurich, p. 76-84

Y. Kural, S. Robertson and S. Jones, "Deciphering cluster representations", Information Processing and Management 37, 593-601 (2001).

 

* 13 *

Tue,
Apr 13

Slides in HTML and PDF

IR on the Web.

Readings:
See Journal of the American Society for Information Science and Technology, 53(2), 2002 - Special issue on Web research;
Almind, T. C. and Ingwersen, P. (1997) “Informetric Analysis on the World Wide Web: Methodological Approaches to Webometrics”, Journal of Documentation, 53(4);
Chu, H. and Rosenthal, M (1996) “Search Engines for the World Wide Web: A Comparative Study and Evaluation Methodology”, Proceedings of ASIS’96;
Mei Kobayashi and Koichi Takeda (2000) "Information retrieval on the web", ACM Computing Surveys (CSUR), 32(2); Arvind Arasu et al (2001) "Searching the Web", ACM Transactions on Internet Technology (TOIT), 1(1): 2-43.

Hao Chen and Susan Dumais (2000) “Bringing Order from Chaos: automatically categorizing search results”, SIGCHI; Susan Dumais and Hao Chen (2000) "Hierarchical classification of Web content", SIGIR, Susan Dumais, Edward Cutrell, Hao Chen (2001) "Optimizing search by showing results in context ", SIGCHI; Ed H. Chi, Peter Pirolli, James Pitkow (2000), "The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site", SIGCHI.

Larry Page, Sergey Brin et al “PageRank: Bringing Order to the Web”, Stanford University report (the model behind Google).

Krishna Bharat and Monika R. Henzinger (1998) "Improved algorithms for topic distillation in a hyperlinked environment", SIGIR.

Search Engine Watch
How Internet Search Engines Work
Pew report: Internet and American life

Potential student presentations:
Spink, Amanda (2002) “A user-centered approach to evaluating human interaction with Web search engines: an exploratory study”, Information Processing and Management, 38(3).

Ellis, D., Ford, N. and Furner, J. (1998) “In search of the unknown user: indexing, hypertext and the World Wide Web”, Journal of Documentation, 54(1).

 

* 14 *

Tue,
Apr 20

Slides in HTML and PDF

Advanced topics:

Natural Language Processing in IR

Document summarization (tutorial slides in HTML and PDF)

Guest lecture - Nina Wacholder: "NLP in IR"

Readings:
Robert Krovetz and W. Bruce Croft (1992) "Lexical ambiguity and information retrieval", ACM Transactions on Information Systems, 10(2):115-141;
Smeaton, A. F. (1997) "Using NLP or NLP Resources for Information Retrieval Tasks" In Strzalkowski, T., editor, Natural Language Information Retrieval. Kluwer Academic Publishers;
T. Strzalkowski, L. Guthrie, J. Karlgren, J. Leistensnider, F. Lin, J. Perez-Carballo, T. Straszheim, J. Wang and J. Wilding (1996) "Natural Language Information Retrieval: TREC-5 Report" In D. K. Harman, editor, Proceedings of the Fifth Text REtrieval Conference (TREC-5);
Voorhees, EM, "Natural Language Processing and Information Retrieval" in Pazienza, M.T. (ed.), Information Extraction: Towards Scalable, Adaptable Systems, New York: Springer, 1999, pp. 32-48.

 

* 15 *

Tue,
Apr 27

Advanced topics: Multimedia IR (image, video, music, ...). Information extraction.
Collaborative systems. Recommender systems. Personalization and user modeling. AI and IR. Agents.

Advanced topics presented by students (final project):
Ron Gorda: "Image retrieval"
Cinthia Levy: "Looking at ARTHUR: an analysis of music and audio retrieval in the waveform domain"
Samantha Francisco-Deutch: "Information Extraction"
Sunny Yoon: "Speech Retrieval"
Sarah T. Jewell: "Personalization and user modeling"
Thomas Tamburin: "The Digital Library of Human Movement"

Readings:
Eakins, J. P. and Graham, M. E. "Content-based Image Retrieval: A Report to the JISC Technology Applications Programme";
ACM Transactions on Information Systems (TOIS), vol 22, no.1, January 2004 - Joseph A. Konstan: "Introduction to recommender systems: Algorithms and Evaluation"; J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl: "Evaluating collaborative filtering recommender systems";
Susan Dumais et al (2003) "Stuff I've seen: a system for personal information retrieval and re-use", SIGIR, 72-79.

 

* 16 *

Tue,
May 04

Project presentations:
Alyssa Katz: "Recommender systems"
Paul Debraski: "Relevance feedback"
Yibu Chen
Stephen Donweber
Dawn Filan
Tony Joachim
Gretchen Schwarz
Albena Stoyanova-Tzankova
 
Thu, May 06 Grades due