Schedule for 16:194:614

This schedule is subject to change.

 
- Week - - Topics / Activities - - Students' responsibilities (during and/or after class) -
Textbook IR: general topics

* 1 *

Tue,
Jan 24

Slides in HTML and PDF

Introduction and overview of the course.

Get familiar with the course website. Set up your course website on scils.
Send me email with your details (use students.xml template).

Play with Model.xls to solve the homework. If you need to, do some Excel practice.

* 2 *

Tue,
Jan 31

Slides in HTML and PDF

Introduction to IR. Information vs. data retrieval.

What do we want from IR? Introduction to evaluation.

 

* 3 *

Tue,
Feb 07

Slides in HTML and PDF

IR concepts. Aboutness. Relevance.

Rationalist vs. empiricist approaches (AI vs. Stats)

Design decisions for IRS; automatic vs. manual/intellectual systems.

Parsing lab work. (Please install Python on your laptops before class. Recommended: ActiveState.)

* 4 *

Tue,
Feb 14

Slides in HTML and PDF

Indexing.

Document and query representation. Manual vs. automatic indexing.

WebClusterLite lab work. (Please install WebClusterLite onto your laptops before class.)

Look at an example of a document collection, a stopword list, an indexed collection and an inverted file.
Formulate a few boolean queries and figure out the result of a boolean search.
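
As a warm-up for this exercise, here is a minimal Python sketch of an inverted file and a few boolean queries. The three documents, the stopword list, and all function names are made up for illustration; they are not the course collection and do not reflect WebClusterLite's file formats.

```python
import re

# Hypothetical toy collection and stopword list (assumptions, not course data).
DOCS = {
    1: "Information retrieval is the science of searching for documents.",
    2: "Data retrieval returns exact matches from a database.",
    3: "Indexing a document collection speeds up searching.",
}
STOPWORDS = {"a", "for", "from", "is", "of", "the", "up"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_inverted_file(docs):
    """Map each index term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


INDEX = build_inverted_file(DOCS)
ALL_IDS = set(DOCS)


def postings(term):
    """Return the posting set for a term (empty if the term is not indexed)."""
    return INDEX.get(term.lower(), set())


# Boolean operators map directly onto set operations on posting sets.
and_result = postings("retrieval") & postings("searching")          # retrieval AND searching
or_result = postings("database") | postings("indexing")             # database OR indexing
not_result = postings("retrieval") & (ALL_IDS - postings("data"))   # retrieval AND NOT data

print(sorted(and_result))
print(sorted(or_result))
print(sorted(not_result))
```

Work out the three result sets by hand from the inverted file first, then run the sketch to check your answers.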

* 5 *

Tue,
Feb 21

Slides in HTML and PDF


Automatic indexing. Lexical analysis. Weighting. Data structures.

Lab work.

Homework (to be graded).

* 6 *

Tue,
Feb 28

Slides in HTML and PDF

Models of IR.

Ranking / relevance estimation models.
Vector space model. Probabilistic model. Language models.

 

WebClusterLite homework.

See Lavrenko's tutorial on Language Models.

* 7 *

Tue,
Mar 07

Slides in HTML and PDF

Models of IR.

Interaction models. Information Retrieval as interaction.

Evaluation of interactive systems.

Lemur lab work. Lemur homework.

* 8 *

Tue,
Mar 14

Spring break, no class.

 

* 9 *

Tue,
Mar 21

Slides in HTML and PDF

Evaluation of IR systems.
Measures of performance.

Introduction to Statistics and Hypothesis Testing (HTML and PDF).

Lab work / homework - Performance evaluation.

* 10 *

Tue,
Mar 28

Evaluation of IR systems.

TREC
Muresan: Intro, Interactive
Beckett: Noisy Text track
Gass: Ad-hoc track
Nicholson: Cross language
Noyes: Very large collections
Rutter: Interactive track

Optional homework - Analysis of Rutgers TREC 2002 data.

* 11 *

Tue,
Apr 04

Evaluation of IR systems.

TREC
Kirkyla: Question answering
Fang: Routing and Filtering
Muresan: HARD

Lab work / homework - TREC-type evaluation.

* 12 *

Tue,
Apr 11

Slides in HTML and PDF

User interfaces and Information Visualization for IR. Part I: Interaction models.

Mediated IR (HTML and PDF).

Gass: Designing user interfaces
Rutter: Marcia Bates' "Where should the person stop and the search interface start?"

Proposed Lemur-based term project(s)

* 13 *

Tue,
Apr 18

User interfaces and Information Visualization for IR. Part II: Tools and techniques.

Kirkyla: Supporting multitasking in browser design
Nicholson: O'Day's "Orienteering ..."
Rutter: Brajnik's "Strategic help ..."
Fang: White's "Evaluating implicit feedback ..."
Beckett & Noyes: Search user interfaces

 
Advanced IR: current research topics

* 14 *

Tue,
Apr 25

Slides in HTML and PDF

Web IR

Structure. Clustering vs. classification.

Gass: The Semantic Web.
Fang: RSS.

Blogs and wikis.

Informetrics and IR.

Collaborative and recommender systems.

Personalization and user modeling.

Topic modeling.

Implicit vs. explicit feedback.

See Ravi Kumar's tutorial on Internet Search.

* 15 *

Tue,
May 02

Slides in HTML and PDF

AI and IR.

Machine learning and data mining for IR.

Natural language processing for IR.

Document summarization.

Information extraction.

Cross-language IR.

Multimedia IR (image, video, music, ...).

IR for structured documents. INEX.

(Also see a tutorial on ML.)

* 16 *

Tue,
May 09

Student presentations of term projects.

Nicholson: Informetrics