Much effort and research have gone into solving the problem of evaluating information retrieval systems.
However, it is probably fair to say that most people active in the field of information storage and retrieval still feel that the problem is far from solved.
One may get an idea of the extent of the effort by looking at the numerous survey articles that have been published on the topic (see the regular chapter in the Annual Review on evaluation).
Nevertheless, new approaches to evaluation are constantly being published (e.g. Cooper; Jardine and van Rijsbergen; Heine).
In a book of this nature it will be impossible to cover all the work done to date on evaluation.
Instead I shall attempt to explicate the conventional, most commonly used method of evaluation, followed by a survey of the more promising attempts to improve on the older methods of evaluation.
To put the problem of evaluation in perspective let me pose three questions: (1) Why evaluate? (2) What to evaluate? (3) How to evaluate? The answers to these questions pretty well cover the whole field of evaluation.
There is much controversy about each, and although I do not wish to add to that controversy, I shall attempt an answer to each one in turn.
The answer to the first question is mainly a social and economic one.
The social part is fairly intangible, but mainly relates to the desire to put a measure on the benefits (or disadvantages) to be got from information retrieval systems.
I use 'benefit' here in a much wider sense than just the benefit accruing due to acquisition of relevant documents.
For example, what benefit will users obtain (or what harm will be done) by replacing the traditional sources of information by a fully