%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% sample.tex
%% example to show how latex works
%% X. Meng
%% oct-27-2004
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\documentclass{llncs}
\documentclass[twocolumn]{article}
\input epsf
%%\setlength{\textwidth}{6.8in}
%%\setlength{\textheight}{9.2in}
%%\setlength{\hoffset}{-0.8in}
%%\setlength{\voffset}{-0.8in}
\setlength{\textheight}{9.0in}
\setlength{\columnsep}{0.375in}
\setlength{\textwidth}{6.5in}
%%% Preset settings
%\setlength{\footheight}{0.0in}  %%% not recognized 2002-09-19
\setlength{\topmargin}{-0.0625in}
\setlength{\headheight}{0.0in}
\setlength{\headsep}{0.0in}
\setlength{\oddsidemargin}{0.0in}
\setlength{\parindent}{1pc}

\begin{document}
%
\title{HERE IS A MULTIPLE LINE\\
TITLE}

\author{
Xiannong Meng\\
Department of Computer Science\\
Bucknell University\\
Lewisburg, PA 17837, U.S.A.\\
Email: xmeng@bucknell.edu
\and
second author\\
Department of Computer Science\\
Some University\\
Some city, ST 12345, U.S.A.\\
}

\date{}
\thispagestyle{empty}
%\setcounter{page}{1}
\maketitle

%\begin{abstract}
\begin{center}{\bf Abstract}\end{center}
{\small
There are measures that improve over the precision-recall pair, such as
\emph{expected search length} (ESL) and \emph{average search length} (ASL).
These measures, however, are not intuitive and are difficult to compute.
In this paper we propose \emph{RankPower} as an alternative measure for
evaluating the effectiveness of IR systems such as web search engines.
\emph{RankPower} takes both the rank and the number of relevant documents
into consideration. \emph{RankPower} is bounded below as the number of
relevant documents approaches infinity. Thus comparisons among different
systems using \emph{RankPower} become intuitive and easy.
}
\noindent
{\bf Keywords}: information retrieval, user preference, relevance
feedback, adaptive algorithm, term weighting, web search
%\end{abstract}
\vspace{.3in}
%\newpage

\section{Introduction\label{sec:intro}}

\emph{Recall} and \emph{precision} are the two best-known traditional
measures of the quality of an information retrieval
system\,\cite{yates:99,salton:89}. Assume that the whole document
collection is \emph{S} and that the collection of documents in \emph{S}
relevant to a given query \emph{q} is \emph{R}. When the query \emph{q}
is issued, an information retrieval (IR) system (or specifically, a web
search engine) returns the collection $S_q$, among which $R_q$ is the set
of relevant documents. Then
\[ Precision = \frac{|R_q|}{|S_q|} ~~~~~~ Recall = \frac{|R_q|}{|R|} \]
That is, \emph{precision} measures how many of the returned documents are
relevant to the query, and \emph{recall} measures how many of all the
relevant documents are returned. We will use \emph{P} to denote the
precision measure and \emph{R} to denote the recall measure for the rest
of the paper. The goal of an IR system is to have high precision, meaning
that a high percentage of the returned documents are relevant, and high
recall, meaning that a high percentage of the relevant documents are
returned. In a closed lab testing environment where different algorithms
can be studied, the complete document collection is known, and the test
queries and the set of documents relevant to each query are identified by
experts. Under these conditions, precision and recall are very good
measures of the degree of success of an IR system. However, since
precision and recall are two separate measures, it is not always
convenient to compare different IR systems using two separate values. To
amend this deficiency, many \emph{single value} measures have been
proposed and studied\,\cite{korfhage:97}.
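As a concrete illustration of the two definitions (a sketch of ours, not part of the paper; the function name and document-id sets are hypothetical), precision and recall can be computed directly from the returned set $S_q$ and the relevant set $R$:

```python
def precision_recall(returned, relevant):
    """Compute precision = |R_q|/|S_q| and recall = |R_q|/|R|.

    returned: the set S_q of document ids returned for a query
    relevant: the set R of all documents relevant to the query
    """
    r_q = returned & relevant  # R_q: relevant documents among those returned
    precision = len(r_q) / len(returned)
    recall = len(r_q) / len(relevant)
    return precision, recall

# Hypothetical example: 4 documents returned, 2 of them relevant,
# out of 5 relevant documents overall.
p, r = precision_recall({1, 2, 3, 4}, {2, 4, 6, 8, 10})
# p == 0.5, r == 0.4
```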
The \emph{harmonic mean} of precision and recall is defined as two
divided by the sum of the inverse of precision and the inverse of
recall. For the \emph{harmonic mean} to be high, both precision and
recall have to be high.
\[harmonic~mean = \frac{2}{\frac{1}{P} + \frac{1}{R}} \]
Variations of the basic \emph{harmonic mean} measure include the
E-measure, among others,
\[E~measure = \frac{(1+\beta^2)}{\frac{\beta^2}{R} + \frac{1}{P}} \]
where $\beta$ controls the trade-off between precision and recall. If
$\beta$ is one, precision and recall are equally weighted and the
measure reduces to the \emph{harmonic mean}; if $\beta$ is greater than
one, recall is weighed more; otherwise precision is weighed more.

A closed-form expression of the optimal \emph{RankPower} can be found,
so that comparisons of different web information retrieval systems can
be made easily. The \emph{RankPower} measure reaches its optimal value
when all returned documents are relevant.

The rest of the paper is organized as follows. Section \ref{sec:lit}
reviews various measures of the effectiveness of IR systems, including
web systems. The definition, derivation, and rationale of the
\emph{RankPower} measure are presented in Section \ref{sec:core}.
Section \ref{sec:result} contains some numerical results and
comparisons among different measures, followed by concluding remarks in
Section \ref{sec:conclude}.

\section{Example of Citing References\label{sec:lit}}

The classic measures of user-oriented performance of an IR system are
\emph{precision} and \emph{recall}, which can be traced back to the
1960's\,\cite{cleverdon:66, treu:67}. The ideal goal is a high
precision rate as well as a high recall rate. Several other measures
are related to precision and recall. The \emph{average precision at
seen relevant documents}\,\cite{yates:99} takes the average of the
precision values at each relevant document seen so far to give a single
value measure.
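The single-value measures discussed above can be sketched in a few lines of Python (an illustration of ours, with hypothetical function names; it follows the formulas as given, where the E-measure at $\beta = 1$ reduces to the harmonic mean):

```python
def harmonic_mean(p, r):
    """Harmonic mean of precision P and recall R: 2 / (1/P + 1/R)."""
    return 2.0 / (1.0 / p + 1.0 / r)

def e_measure(p, r, beta):
    """(1 + beta^2) / (beta^2/R + 1/P); equals harmonic_mean at beta = 1.

    Larger beta weighs recall more heavily; smaller beta weighs precision.
    """
    return (1.0 + beta ** 2) / (beta ** 2 / r + 1.0 / p)

# With p = 0.9 and r = 0.1, a large beta pulls the value toward recall:
low = e_measure(0.9, 0.1, 2.0)   # close to r
high = e_measure(0.9, 0.1, 0.5)  # closer to p
```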
\section{Equations\label{sec:core}}

The following notation will be used throughout the discussion.
\begin{enumerate}
\item For a given query, a set of \emph{N} documents is returned.
\item Among the \emph{N} returned documents, $R_N$ denotes the set of
documents that are relevant to the query, $|R_N| \le N$.
\item Each of the relevant documents in $R_N$ occupies one of the
\emph{N} places in the returned list. The place (rank) of the $i$-th
relevant document is denoted $L_i$, $1 \le L_i \le N$.
\item The number of relevant documents in the returned set, $|R_N|$, is
denoted as $C_N$.
\end{enumerate}
Note that we are considering practical retrieval systems such as web
search engines. In this situation the size of the total document set is
unknown, nor do we know the exact relevant document set from the
complete document collection for any given query. A natural
single-value measure is the \emph{average rank} of the relevant
documents, $R_{avg}(N) = \frac{\sum_{i=1}^{C_N} L_i}{C_N}$. By itself,
however, the average rank grows without bound as more documents are
returned:
\[ \lim_{N \rightarrow \infty} R_{avg} (N) \rightarrow \infty \]
In an ideal case where every single returned document is relevant, the
average rank has the following form.
\[ R_{avg} = \frac{\sum_{i=1}^{C_N} L_i} {C_N} = \frac{\sum_{i=1}^{N} i} {N}
= \frac{\frac{N(N+1)}{2}}{N} = \frac{N+1}{2}\]
We now introduce a new measure, \emph{RankPower}, to tackle the above
issues. The basic idea is that the average rank captures only the
average position over a set; it does not reward retrieving more
relevant documents. We would like to see a small average rank and a
large relevant-document count. Combining these two factors, we define
\emph{RankPower} as follows.
\begin{equation}
RankPower = \frac{R_{avg}} {C_N} = \frac{\sum_{i=1}^{C_N} L_i} {{C_N}^2}
\end{equation}

An example of a table. Precision-Recall: we use the traditional 11
recall intervals. The results are listed in Table \ref{tbl:rp}.
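A minimal sketch (ours, not part of the paper) of computing \emph{RankPower} from the ranks $L_i$ of the relevant documents; the function name is hypothetical:

```python
def rank_power(relevant_ranks):
    """RankPower = R_avg / C_N = (sum of ranks of relevant docs) / C_N^2.

    relevant_ranks: 1-based positions L_i of the relevant documents in
    the returned list. Smaller RankPower is better (small average rank,
    many relevant documents).
    """
    c_n = len(relevant_ranks)
    if c_n == 0:
        return float("inf")  # no relevant documents returned
    return sum(relevant_ranks) / c_n ** 2

# Ideal case: all of the first N = 10 returned documents are relevant,
# so RankPower = (N + 1) / (2N) = 0.55.
ideal = rank_power(list(range(1, 11)))
```

Following the ideal-case derivation above, the optimal value $(N+1)/(2N)$ approaches $1/2$ as $N$ grows, which is the sense in which \emph{RankPower} is bounded below.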
\[Recall = \frac{|R_q|}{|R|} \]
\[Precision = \frac{|R_q|}{|S_q|} \]

\begin{table}
\caption{Example of Recall and Precision\label{tbl:rp}}
\begin{tabular}{|l|l|l|l|l|l|}
\hline
Place & 1 & 2 & 3 & 4 & 5 \\ \hline
Recall & 0.1 & 0.2 & 0.3 & 0.4 & 0.5 \\ \hline
Precision & 0.50 & 0.40 & 0.43 & 0.00 & 0.00 \\ \hline\hline
Place & 6 & 7 & 8 & 9 & 10 \\ \hline
Recall & 0.6 & 0.7 & 0.8 & 0.9 & 1.0 \\ \hline
Precision & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\ \hline
\end{tabular}
\end{table}

\section{Figure Example\label{sec:result}}

Some more results.

\begin{figure}[ht]
\centering
\epsfysize=1.8in
\vspace*{0in}\hspace*{0in}\epsfbox{pawsc-arch.eps}
\caption{Architecture of PAWS-cluster\label{fig:arch}}
\end{figure}

\section{Conclusion\label{sec:conclude}}

Some conclusions.

\vspace*{5mm}
\noindent{\bf URLs} referenced in the paper:

[AW] $<$http://www.alltheweb.com/$>$

[GG] $<$http://www.google.com/$>$

%\bibliographystyle{chenBST}
%\bibliography{chenBib}
\bibliographystyle{plain}
%%\bibliography{chenBib}

\begin{thebibliography}{10}

\bibitem{yates:99}
R.~Baeza-Yates and B.~Ribeiro-Neto, \emph{Modern Information Retrieval},
Addison Wesley, 1999.

\bibitem{cleverdon:66}
C.W.~Cleverdon, J.~Mills, and E.M.~Keen, Factors Determining the
Performance of Indexing Systems, Volume 1 -- Design, Aslib Cranfield
Research Project, Cranfield, England, 1966.

\bibitem{cooper:68}
W.S.~Cooper, Expected search length: A single measure of retrieval
effectiveness based on weak ordering action of retrieval systems,
\emph{Journal of the American Society for Information Science}, 19(1),
30-41, 1968.

\bibitem{korfhage:97}
R.R.~Korfhage, \emph{Information Storage and Retrieval}, John Wiley \&
Sons, 1997.

\bibitem{rijsbergen:74}
C.~van~Rijsbergen, Foundation of evaluation, \emph{Journal of
Documentation}, 30(4), 365-373, 1974.

\bibitem{losee:98}
R.M.~Losee, \emph{Text Retrieval and Filtering: Analytic Models of
Performance}, Kluwer Publisher, Boston, 1998.
\bibitem{losee:99}
R.M.~Losee, Measuring Search Engine Quality and Query Difficulty:
Ranking with Target and Freestyle, \emph{Journal of the American
Society for Information Science}, 50(10), 882-889, 1999.

\bibitem{losee:00}
R.M.~Losee, When Information Retrieval Measures Agree about the
Relative Quality of Document Rankings, \emph{Journal of the American
Society for Information Science}, 51(9), 834-840, 2000.

\bibitem{meng:03sub}
X.~Meng and Z.~Chen, MARS: Multiplicative Adaptive Refinement Web
Search, submitted for publication, 2003.

\bibitem{salton:89}
G.~Salton, \emph{Automatic Text Processing}, Addison Wesley, 1989.

\bibitem{shaw:86}
W.M.~Shaw~Jr., On the foundation of evaluation, \emph{Journal of the
American Society for Information Science}, 37(5), 346-348, 1986.

\bibitem{treu:67}
S.~Treu, Testing and Evaluation -- Literature Review, \emph{Electronic
Handling of Information: Testing and Evaluation}, A.~Kent,
O.E.~Taulbee, J.~Belzer, and G.D.~Goldstein, editors, Thompson Book
Co., Washington, D.C., 1967, 71-88.

\end{thebibliography}

\end{document}