INFORMATION RETRIEVAL
17:610:551
Spring 2002
Gheorghe Muresan
READINGS
Weekly readings will be assigned for each topic, usually from the 'essential
textbooks', but also from journal articles, conference proceedings, and other
books. Most of the additional readings will be distributed in class (students
will be asked to reimburse SCILS for the photocopying charges). In addition to
these readings, students are required to read at least one, and preferably
more, of the other standard texts on information retrieval.
Essential textbooks
(First edition (1995) is out of
print, but the draft of the second edition (due 2002) is available.)
Good general introductory text to all of information retrieval, well
written and at an appropriate level for this course. The examples are
from health care applications, but the book covers general principles and
pays particular attention to the problem of designing good evaluation
experiments.
This book won the ASIS Best
Information Science Book Award for 1998. It is a general overview of IR, and
offers good references.
Recommended readings
A collection of fundamental
research papers in information retrieval, plus very good introductory material
to the book as a whole, and to each section, by the editors. A basic resource
for understanding information retrieval.
This is a very good place to look for papers on which to report to the
class.
Although more targeted at a
Computer Science audience, it does a good job of introducing IR concepts and
principles.
Although it has a slight Computer Science slant, the text does not go into the
details of data structures and algorithms but instead addresses the concepts,
principles, and mathematical models underlying Information Retrieval. My only
gripe is that the chapters have different authors, so the level of detail and
the notation vary a lot, and there is some overlap between chapters.
A fairly recent book that seems to
cover most aspects of building IR systems well.
A collection of lectures on IR
given at the Third European Summer-School, ESSIR 2000.
Not a text on IR as a whole, but
useful for several of its chapters. Provides a general review of the topic.
A very technical book on
data structures and algorithms for IR. Much of the code (in C) is available at
ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/irbook/. Recommended if you want
to build an IR system. The chapters are written by various authors, so the
level of detail and the notation vary widely.
A good overview of this topic, with
useful chapters on theoretical and experimental approaches to IR.
This is essentially the third
edition of Lancaster (1978). It is expanded in several respects, but does not
do a very good job on experimental and advanced IR systems. It is somewhat
superior to the previous edition because it is more up-to-date.
· Meadow, C.
(1992) Text information retrieval systems. San Diego: Academic Press.
A good text on the topic.
A standard, good text on the
topic, somewhat technically and theoretically oriented. Parts are quite mathematical.
Available free at http://www.dcs.gla.ac.uk/Keith/Preface.html.
· Salton, G. (1989) Automatic text processing: The
transformation, analysis and retrieval of information by computer. Reading,
MA: Addison-Wesley.
Has some useful sections on automatic
IR systems, and integrates IR within an overall text-processing framework.
· Salton, G. & McGill, M. (1983) Introduction to modern
information retrieval. New York: McGraw-Hill.
Quite old now, but still a
standard text for IR, heavily focused on technical issues of
representation and retrieval techniques. It introduces many of the ideas that
are otherwise only to be found in the papers and technical reports of Salton's
group at Cornell.
· Sparck Jones, K. ed. (1981) Information retrieval experiment.
London: Butterworths.
The classic work on this topic.
Very good chapters on different aspects of experimentation in IR, by very good
people.
A very general introduction to the
topic, which is best treated as a means of becoming familiar with IR and its
terminology (see also Lancaster & Warner, 1993).
A somewhat dated, but conceptually quite good text on
traditional commercial IR systems.
Background reading
Information Retrieval is not an isolated field, but relies on knowledge
and research results from different domains such as Human-Computer Interaction,
Artificial Intelligence, Computational Linguistics, Statistics and Probability
Theory.
You are encouraged to investigate at least the domains
relevant to your project.
The list below will grow over time.
· Preece, Rogers and Sharp (2002) Interaction design -
Beyond human-computer interaction, ISBN: 0471492787, Wiley.
An IR system is an interactive
system, so a book on the design and evaluation of interactive systems is
essential background reading.
Also see the complementary website.
· Manning, Christopher D. and Schütze, Hinrich (2001) Foundations of statistical
natural language processing, Cambridge, MA: MIT Press.
It explains
the mathematical foundations of the statistical approach to Information
Retrieval.
Bibliographic Note
There are several sources, primary, secondary and tertiary, with which people concerned with IR should be familiar, and which they should at least look through as they appear. The most important of them are listed below.
Journals
ACM Transactions on Information Systems. This is a standard journal for substantial, archival work in IR, with a computer science emphasis. Experiment and research in IR. Quarterly.
CD-ROM and Online Review (formerly Online Review). Best available journal dealing with operational interactive IR systems. At least some of the articles are reviewed.
Information Processing and Management. A standard international journal, with significant work in IR in every issue. Experiment and research in IR. Bi-monthly.
Information Retrieval. A new journal, edited by Paul Kantor and Stephen Robertson, whose stated aim is to publish high-quality technical work in IR. Quarterly.
Journal of the American Society for Information Science. A good, standard publication source for information science, with much good work in IR. Experiment and research in IR. About 14 issues/year.
Journal of Documentation. Published by Aslib. Long one of the standard and most important journals in IR, which avoids the US bias of JASIS. Experiment and research in IR. Quarterly.
Program. Published by Aslib. Good source for advanced work in operational IR systems.
Conferences
ACM SIGIR International Conference on Research and Development in Information Retrieval (SIGIR 'nn). This is the standard place for publication of research papers with a computer science orientation toward IR. In recent years, it has been accepting an increasing number of papers on information seeking and on human-computer interaction in IR. Very strictly refereed.
Annual Conference of the American Society for Information Science. The quality of the papers in this meeting is highly variable, but it is always worth looking at. It is just about the only place where person-oriented research in IR is reported.
Digital Libraries 'nn. The Nth ACM Conference on Digital Libraries. The first of this series was in 1996. Although the standard of papers is variable, this meeting is developing into the standard source for papers on digital libraries (which often means papers on IR). There is some emphasis on reporting on systems and prototypes, which makes it different from the SIGIR Conferences.
IEEE ADL 'nn. IEEE Forum on Research and Technology Advances in Digital Libraries. This is another conference on digital libraries, but with fewer high-quality papers than the ACM DL 'nn series. It focuses more on policy and economic issues, and also on database and other technical issues. Some IR content.
National Online Meeting. Most papers in these proceedings are not terribly high-level, but there are always a few of interest. This is probably the best conference to report on new work in operational IR systems.
TREC-n. Proceedings of the nth Text REtrieval Conference. The eighth in this series will be published later this spring. Although not refereed, it has become a standard place for publication of high-quality IR evaluation papers. The most important new results in computer-oriented IR are now first published in this forum. The entire set of proceedings is available on-line, at http://trec.nist.gov/
Reviews and Indexes
Annual Review of Information Science and Technology. The standard review source in the field.
Perspectives on...: Journal of the American Society for Information Science. An irregular series of grouped articles on special topics within the Journal. A number of articles on some topic of current interest are put together by a special editor for that topic.
Progress in Documentation: Journal of Documentation. An irregular series of review articles published in this journal. Very high quality.
Trends in ...: Information Processing and Management. A good, irregular series of review articles.
Information Science Abstracts. Good indexing service, with a rather pronounced USA bias.
Library and Information Science Abstracts. Good indexing service, with better international coverage than ISA.
Other
The home
page for ACM SIGIR has a great deal of information on it, with links to
many other resources in information retrieval.
Online resource: http://www.searchenginewatch.com/