Problems of measurement have arisen in physics, psychology, and, more
recently, the social sciences.
Clarification of these problems has
been sought with the help of the theory of measurement.
I shall attempt to do the same for information retrieval.
My purpose is
to construct a framework, based on the mathematical theory of
measurement, within which measures of effectiveness for retrieval
systems can be derived.
The basic mathematical notions underlying the
measurement ideas will be introduced, but for their deeper
understanding the reader is referred to the excellent book by Krantz et al.
It would be fair to say
that the theory developed there is applied here.
Also of interest are
the books by Ellis and Lieberman.
The problems of measurement in information retrieval differ from those encountered in the physical sciences in one important respect.
In the physical sciences there is usually an empirical ordering of the quantities we wish to measure.
For example, we can establish empirically by means of a scale which masses are equal, and which are greater or less than others.
Such a situation does not hold in information retrieval.
In the case of the measurement of effectiveness by precision and recall, there is no absolute sense in which one can say that one particular pair of precision-recall values is better or worse than some other pair, or, for that matter, that they are comparable at all.
However, to leave it at that is to admit defeat.
There is no reason why we cannot postulate a particular ordering, or, to put it more mildly, why we cannot show that a certain model for the measurement of effectiveness has acceptable properties.
The immediate consequence of proceeding in this fashion is that each property ascribed to the model may be challenged.
The only defence one has against this is that:
(1) all properties ascribed are consistent;
(2) they bring out into the open all the assumptions made in measuring effectiveness;
(3) each property has an acceptable interpretation;
(4) the model leads to a plausible measure of effectiveness.
It is as well to point out here that it does not lead to a unique measure, but it does show that certain classes of measures can be regarded as being equivalent.
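The point about comparability can be made concrete with a small sketch. It assumes, purely for illustration, that a precision-recall pair is represented as a tuple of two numbers, and it contrasts a componentwise comparison (which leaves some pairs incomparable) with a postulated ordering. The names `dominance` and `postulated_score`, and the weighted arithmetic mean with parameter `alpha`, are hypothetical choices made for this example; they are not the measure derived from the model discussed here.

```python
from typing import Optional, Tuple

# A (precision, recall) pair; this representation is an assumption of the sketch.
PR = Tuple[float, float]


def dominance(a: PR, b: PR) -> Optional[int]:
    """Componentwise comparison of two precision-recall pairs.

    Returns 1 if a is at least as good as b on both components (and differs),
    -1 if b is at least as good as a on both, 0 if they are equal,
    and None if neither holds, i.e. the pairs are incomparable.
    """
    if a == b:
        return 0
    if a[0] >= b[0] and a[1] >= b[1]:
        return 1
    if a[0] <= b[0] and a[1] <= b[1]:
        return -1
    return None


def postulated_score(pr: PR, alpha: float = 0.5) -> float:
    """A hypothetical postulated ordering: weighted mean of precision and recall.

    alpha is the weight given to precision; the choice of this formula is an
    illustrative assumption and is open to exactly the kind of challenge
    described in the text.
    """
    precision, recall = pr
    return alpha * precision + (1.0 - alpha) * recall


a: PR = (0.8, 0.3)  # higher precision, lower recall
b: PR = (0.5, 0.7)  # lower precision, higher recall

# No empirical ordering: neither pair is at least as good on both components.
print(dominance(a, b))

# Postulating an ordering makes the pairs comparable, but the verdict
# depends on the assumption made (here, the weight alpha).
print(postulated_score(a) < postulated_score(b))
print(postulated_score(a, alpha=0.9) > postulated_score(b, alpha=0.9))
```

With equal weights the second pair is preferred, whereas a weight of 0.9 on precision reverses the preference. This is why each property ascribed to a model of effectiveness must be stated openly and given an acceptable interpretation.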
* The next three sections are substantially the same as those appearing in my paper: 'Foundations of evaluation', Journal of Documentation, 30, 365-373 (1974).
They have been included with the kind permission of the Managing Editor of Aslib.