TY - JOUR

T1 - On probabilistic notions of precision as a function of recall

AU - Bollmann, Peter

AU - Raghavan, Vijay V.

AU - Jung, Gwang S.

AU - Shu, Lih C.

PY - 1992

Y1 - 1992

AB - Two problems that arise when recall and precision are used to evaluate information retrieval systems are due to the weak ordering of the documents generated by the system and to evaluation with multiple queries. Although several alternative stopping criteria are available, our emphasis in this paper is on defining precision when recall is used as the stopping criterion. A number of different probabilistic notions of precision for handling the problem of weak ordering have been proposed in the past, including PRECALL, probability of relevance given retrieval (PRR), and expected precision (EP). Recently Raghavan et al. provided a comparative analysis of PRECALL, PRR, and EP. They showed that previous usages of PRECALL for dealing with the problems of weak ordering and interpolation, which involved the application of a ceiling operation, are inconsistent, and that the results obtained are not easy to interpret. Consequently, they introduced an interpolation scheme, termed intuitive interpolation, that leads to consistent and meaningful averaging of the results given by PRR over multiple queries. A simple way of calculating PRR was also given. However, a comparable analysis of precision defined as EP has not been provided. Furthermore, although several alternative ways of defining precision in a probabilistic sense are available, there is no theoretical basis for deciding which alternative to use in a specific situation. This paper first investigates an efficient way of calculating EP and an interpolation scheme for averaging EP, both consistent with the intuitive interpolation scheme proposed for PRR. In addition, PRECALL with intuitive interpolation is termed R-B Precision and is shown to be interpretable as the limiting value of PRR and EP. From this result, PRR and EP are shown to be attractive for presenting experimental results in a descriptive sense. In contrast, in situations where experimental tests are intended for predictive use, R-B Precision is shown to be the better choice.

UR - http://www.scopus.com/inward/record.url?scp=1542621957&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1542621957&partnerID=8YFLogxK

U2 - 10.1016/0306-4573(92)90077-D

DO - 10.1016/0306-4573(92)90077-D

M3 - Article

AN - SCOPUS:1542621957

SN - 0306-4573

VL - 28

SP - 291

EP - 315

JO - Information Processing and Management

JF - Information Processing and Management

IS - 3

ER -