%%% My own LaTeX settings are commented out in favor of the PDPTA-2001 format
%%% \documentstyle[11pt,cprog,html]{article}
%%% \setlength{\textheight}{9.6in}
%%% \setlength{\voffset}{-1.4in}
%%% \setlength{\textwidth}{7.2in}
%%% \setlength{\hoffset}{-1.2in}
%%% %\parskip 2ex
%%% \renewcommand{\baselinestretch}{1.2}

%% PDPTA-2001 format from www.ashland.edu/~iajwa/conferences

\documentstyle[twocolumn,11pt, html]{article}
\def\htlink{\htmladdnormallink}   %%%% I added it for my convenience

\pagestyle{empty}                         %%%% No page Numbering
 
\setlength{\textheight}{9.0in}
\setlength{\columnsep}{0.375in}
\setlength{\textwidth}{6.5in}              %%% Preset settings
\setlength{\footheight}{0.0in}
\setlength{\topmargin}{-0.0625in}
\setlength{\headheight}{0.0in}
\setlength{\headsep}{0.0in}
\setlength{\oddsidemargin}{0.0in}
\setlength{\parindent}{1pc}

%\title{The Architecture of Yarrow: A Real-Time Intelligent Meta-Search Engine}
%\author{Xiannong Meng\& Zhixiang Chen \\
%Department of Computer Science\\
%University of Texas - Pan American\\
%Edinburg, TX 78539-2999\\
%Contacting and presenting author: Xiannong Meng\\
%meng@cs.panam.edu \\
%Phone: (956) 316-7062 \\
%Fax: (956) 384-5099
%}
%\date{February 22, 2001}

\title{The Architecture of Yarrow: A Real-Time Intelligent Meta-Search Engine}
\author{
Xiannong Meng\\
Department of Computer Science\\
University of Texas - Pan American\\
Edinburg, TX 78539-2999, U.S.A.\\
\and
Zhixiang Chen\\
Department of Computer Science\\
University of Texas - Pan American\\
Edinburg, TX 78539-2999, U.S.A.\\
}
\date{}

\input epsf
\begin{document}
\maketitle

%\begin{abstract}
\noindent
{\bf Abstract}
{\small\em In this paper we present the architecture of Yarrow~[a]\footnote{{\em
Yarrow} is the name of a family of small plants and happens to be the
name of the street where the two authors
live.}, an intelligent web meta-search engine. Yarrow takes a user
query and automatically sends it to a number of major search
engines. As the results come back from these search engines, Yarrow
processes them with a practically efficient on-line learning
algorithm before displaying the re-ranked results to the
user. Users can then interactively refine the search
results presented by Yarrow, which dynamically promotes or demotes
individual results until the user locates a satisfactory set of pages.
}

\vspace{0.5cm}

\noindent
{\it Keywords:}
{\small meta-search engine, World Wide Web, Internet
application, adaptive learning}

%\end{abstract}

\section{Introduction\label{sec:intro}}
The web provides access to a vast amount of information. According to a
recent study~\cite{lawrence99a}, there are an estimated 800 million pages
on the web, and finding information among them in a reasonable amount of
time is very difficult. General-purpose search engines such as
AltaVista~[g], Yahoo!~[n], and NorthernLight~[m] do help. But with the
exponential growth in the size of the web, the coverage of general-purpose
search engines has been decreasing: no engine indexes more than about
16\% of the estimated size of the publicly indexable web~\cite{lawrence99a}.
Two approaches have recently been taken in response to this difficulty.
One is the development of {\em meta-search
engines}, which forward user queries to multiple search engines at the
same time in order to increase coverage, in the hope of {\em
including} what the user wants in a short list of top-ranked results.
Examples of such meta-search engines include MetaCrawler~[b],
Inference Find~[c], and Dogpile~[d]. The other approach is the
development of {\em topic-specific} search engines that specialize in
particular topics, ranging from vacation guides~[e] to children's
health~[f]. General-purpose search engines still cover large amounts of
information, even though their percentage of coverage is decreasing, but
users have a hard time efficiently locating what they want. The first
generation of meta-search engines addressed the problem of decreasing
coverage by simultaneously querying multiple general-purpose engines, but
to a certain extent these meta-search engines inherit the problem of
{\em information overload}: it is difficult for users to pin down the
specific information for which they are searching. Specialized
search engines typically contain much more accurate and narrowly focused
information. However, it is not easy for a novice user to know which
specialized engine to use and where to find it.
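
The query-forwarding step just described can be sketched in a few lines
of Python. This is a minimal illustration under our own assumptions, not
Yarrow's implementation: the function \verb|fan_out| and the stub engines
are hypothetical, each stub stands in for a wrapper that would issue an
HTTP request and parse an engine's result page, and an engine that fails
simply contributes no results.

{\scriptsize
\begin{verbatim}
from concurrent.futures import ThreadPoolExecutor


def fan_out(query, engines, timeout=10.0):
    """Send query to all engines in parallel.

    engines maps a name to a function that
    takes the query and returns a list of
    URLs. The result maps each name to its
    list, or to [] if that engine failed or
    missed the timeout.
    """
    results = {}
    n = max(1, len(engines))
    with ThreadPoolExecutor(n) as pool:
        futs = {name: pool.submit(fn, query)
                for name, fn in engines.items()}
        for name, fut in futs.items():
            try:
                hits = fut.result(timeout=timeout)
            except Exception:
                hits = []  # failed or timed out
            results[name] = hits
    return results


def stub(host):
    # Hypothetical stand-in for an engine wrapper.
    return lambda q: [f"http://{host}/{q}/{i}"
                      for i in range(3)]


def broken(q):
    raise OSError("engine unavailable")


engines = {"A": stub("a.example"),
           "B": stub("b.example"),
           "C": broken}
res = fan_out("yarrow", engines)
print(len(res["A"]), res["C"])  # 3 []
\end{verbatim}
}

The per-engine \verb|timeout| is applied while results are collected, and
the \verb|with| block still waits for any straggling thread on exit, so a
deployed system would want a stricter overall deadline.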

Meta-search engines may be classified into two categories: {\em shallow 
meta-search engines} and {\em deep meta-search engines}. A shallow meta-search 
engine simply echoes the search results of one or several general-purpose 
search engines; it may collate, filter, or sort them, but such processing
is very limited. A deep meta-search engine uses the search results of the 
general-purpose search engines as its initial search space and
adaptively learns from the user's feedback
to improve on the search performance and 
relevance accuracy of those engines. 
It may use clustering, filtering, and other methods to support
its adaptive learning process. 
From an engineering point of view, a meta-search 
engine is usually lightweight: it requires neither complicated
data structures, nor a large database, nor a large amount of memory.
It can, and should, concentrate on the intelligent processing of
the search results returned by general-purpose search engines. 
Recent research on web communities~\cite{kleinberg99,gibson98,chakrabarti98}
has used a short list of hits
returned by a search engine as a starting set for further expansion.
There has also been considerable effort in applying machine learning to
web-search-related applications, for example,
locating scientific articles and user profiling~\cite{kurt98,kurt99,lawrence99},
and focused crawling~\cite{rennie99}.
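
The limited collating that a shallow meta-search engine performs can be
sketched as follows, assuming each engine returns an ordered list of
URLs. The function \verb|collate| is hypothetical, and its scoring rule
(a sum of reciprocal ranks) and trailing-slash normalization are
illustrative choices rather than the method of any engine named above.
A deep meta-search engine would go further and treat the merged list as
the starting search space for learning from the user's feedback.

{\scriptsize
\begin{verbatim}
from collections import defaultdict


def collate(result_lists):
    """Merge ranked URL lists from several
    engines into a single ranked list.

    Duplicate URLs (ignoring a trailing '/')
    are merged; each URL scores the sum of
    1/rank over the engines that returned it.
    Ties keep first-seen order.
    """
    score = defaultdict(float)
    for hits in result_lists:
        for rank, url in enumerate(hits, 1):
            score[url.rstrip("/")] += 1.0 / rank
    return sorted(score, key=score.get,
                  reverse=True)


a = ["http://x.org/", "http://y.org",
     "http://z.org"]
b = ["http://y.org", "http://z.org/"]
print(collate([a, b]))
# ['http://y.org', 'http://x.org', 'http://z.org']
\end{verbatim}
}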

This paper presents Yarrow~[a], a second-generation meta-search engine
that is an intelligent, deep meta-search engine. Yarrow can currently
query eight of the most popular general-purpose search engines, and it
performs document parsing, indexing, and learning in real time on the
client side.
Its predominant feature is that, unlike existing meta-search engines,
which lack adaptive learning, Yarrow is equipped with an on-line
learning algorithm, TW2 (Tailored Winnow2)~\cite{chenyarrow},
%{chenquery,chenwebsail}
which helps the user find the desired documents through user feedback.
In~\cite{chenquery} %,chenwebsail} 
we designed TW2 as a version of Winnow2~\cite{littlestone88} tailored
to web search.
When used to learn a disjunction of at most $k$ relevant keywords,  
TW2 has surprisingly small mistake bounds that are independent of the
dimensionality of the keyword space, i.e., of the total number of
indexing keywords.
TW2 has also been used successfully in the learning components 
of our other projects~\cite{chenwebsail,chenfeatures}. 
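
To make the promote/demote behavior concrete, the sketch below implements
the multiplicative update of the standard Winnow2
algorithm~\cite{littlestone88} over keyword sets and uses the learned
weights to re-rank pages. It is a minimal illustration under stated
assumptions, not TW2 itself: the tailoring that distinguishes TW2 is
described in~\cite{chenquery} and is not reproduced here, and the class
name, the threshold \verb|theta|, and the update factor \verb|alpha| are
our own illustrative choices.

{\scriptsize
\begin{verbatim}
class Winnow2:
    """Winnow2-style learner over keyword sets.

    A page is a set of keywords (attribute
    x_i = 1 iff keyword i occurs). Unseen
    keywords have weight 1.0. This is plain
    Winnow2, not the tailored TW2.
    """

    def __init__(self, alpha=2.0, theta=8.0):
        self.alpha = alpha  # update factor > 1
        self.theta = theta  # decision threshold
        self.w = {}         # keyword -> weight

    def score(self, page):
        w = self.w
        return sum(w.get(k, 1.0) for k in page)

    def predict(self, page):
        return self.score(page) >= self.theta

    def update(self, page, relevant):
        """Apply one user judgment. Returns True
        if the learner made a mistake."""
        if self.predict(page) == relevant:
            return False
        if relevant:        # missed a good page
            f = self.alpha          # promote
        else:               # accepted a bad page
            f = 1.0 / self.alpha    # demote
        for k in page:
            self.w[k] = self.w.get(k, 1.0) * f
        return True

    def rerank(self, pages):
        """pages maps URL -> keyword set;
        returns URLs by decreasing score."""
        def by_score(url):
            return self.score(pages[url])
        return sorted(pages, key=by_score,
                      reverse=True)


learner = Winnow2(alpha=2.0, theta=4.0)
pages = {"u1": {"yarrow", "plant", "garden"},
         "u2": {"meta", "search", "engine"},
         "u3": {"search", "engine", "learning"}}
learner.update(pages["u3"], True)   # promote
learner.update(pages["u2"], False)  # demote
print(learner.rerank(pages))  # ['u3', 'u1', 'u2']
\end{verbatim}
}

In this toy run, marking \verb|u3| relevant is a mistake (its score, 3,
is below the threshold 4), so its keywords are promoted; marking
\verb|u2| irrelevant is also a mistake (its score has risen to 5), so its
keywords are demoted, and the re-ranked order becomes u3, u1, u2. Only the
weights of keywords occurring in a mistaken page change, so each update
touches a handful of entries however many keywords have been indexed.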


......

\section{Concluding Remarks\label{sec:concl}}

This paper describes the architecture of Yarrow, an intelligent
meta-search engine, together with some of its implementation details.
From an engineering point of view, deep meta-search is feasible because
a meta-search engine is usually lightweight and requires neither a large
database nor a large amount of memory. Yarrow is a first-step
attempt to build a deep meta-search engine. It is powered by an
efficient learning algorithm 
and is equipped with document parsing and indexing functions.
It adaptively learns from the user's feedback  
to search for the desired documents. In the future we plan to
implement Yarrow on a cluster of computers and to add
personalized features.\\

{\noindent{\large\bf URLs Used in the Paper}}\\
\noindent [a] Yarrow $<$\htlink{http://www.cs.panam.edu/\~{}\\
chen/WebSearch/Yarrow.html}
{http://www.cs.panam.edu/\~{}chen/WebSearch/Yarrow.html}$>$

\noindent [b] MetaCrawler \\ $<$\htlink{http://www.metacrawler.com}
{http://www.metacrawler.com}$>$ 

\noindent [c] Inference Find $<$\htlink{http://www.infind.com}
{http://www.infind.com}$>$

\noindent [d] Dogpile $<$\htlink{http://www.dogpile.com}
{http://www.dogpile.com}$>$

\noindent [e] VacationSpot.com \\
$<$\htlink{http://www.vacationspot.com}
{http://www.vacationspot.com}$>$

\noindent [f] KidsHealth $<$\htlink{http://www.kidshealth.com}
{http://www.kidshealth.com}$>$

\noindent [g] AltaVista $<$\htlink{http://www.altavista.com}
{http://www.altavista.com}$>$

\noindent [h] Excite $<$\htlink{http://www.excite.com}
{http://www.excite.com}$>$

\noindent [i] GoTo $<$\htlink{http://www.goto.com}
{http://www.goto.com}$>$

\noindent [j] HotBot $<$\htlink{http://www.hotbot.com}
{http://www.hotbot.com}$>$

\noindent [k] InfoSeek $<$\htlink{http://www.infoseek.com}
{http://www.infoseek.com}$>$

\noindent [l] Lycos $<$\htlink{http://www.lycos.com}
{http://www.lycos.com}$>$

\noindent [m] NorthernLight\\  $<$\htlink{http://www.northernlight.com}
{http://www.northernlight.com}$>$

\noindent [n] Yahoo! $<$\htlink{http://www.yahoo.com}
{http://www.yahoo.com}$>$
\bibliographystyle{plain}
\bibliography{/home/accounts/facultystaff/x/xmeng/lib/tex/web}

\end{document}