CSCI 335
Fall 2006
Review Questions For Exam One
The exam questions will be based on the concepts covered by these review
questions. You are allowed to bring one information sheet. You will need
to do some computational work, so a calculator would be helpful but not
absolutely necessary.
- Overview of IR system
- What is the typical IR task? (Or: what does an IR system usually
do?)
- What are the common relevance judgments? (e.g. on the
subject, being timely ...)
- What are the common problems with keyword search alone?
- What are the other major areas related to IR?
- Web search system
- What are the major components of a typical Web search system?
(A Web search system is an application of IR.)
- How does the HTTP protocol work as far as the flow of
information is concerned? (we may not need to memorize specifics
of detailed commands, but we should have a general idea).
- What does the GET command do in HTTP?
- What does the POST command do in HTTP? In the case of POST, what do
Content-Type and Content-Length mean?
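To make the GET/POST questions concrete, here is a minimal sketch that only builds the raw text of an HTTP/1.1 request (it sends nothing over the network). The function names `build_get` and `build_post`, the host `www.example.com`, and the form field `q` are hypothetical placeholders. The point to notice is that GET carries no body, while POST carries a body whose encoding is described by Content-Type and whose size in bytes is given by Content-Length.

```python
from urllib.parse import urlencode


def build_get(host: str, path: str) -> str:
    # GET asks the server to return the resource named by `path`;
    # any parameters travel in the URL's query string, and there is no body.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"          # required in HTTP/1.1
        "Connection: close\r\n"
        "\r\n"                       # a blank line ends the headers
    )


def build_post(host: str, path: str, form: dict[str, str]) -> str:
    # POST sends data to the server in the message body.
    body = urlencode(form)           # e.g. {"q": "a b"} -> "q=a+b"
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        # Content-Type: how the body is encoded (here, an HTML form encoding).
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # Content-Length: size of the body in bytes, so the server knows
        # where the body ends.
        f"Content-Length: {len(body.encode('utf-8'))}\r\n"
        "Connection: close\r\n"
        "\r\n"
        f"{body}"
    )


print(build_post("www.example.com", "/search", {"q": "information retrieval"}))
```

For the form above the body is `q=information+retrieval`, so the request carries `Content-Length: 23`.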
- IR models
- What is a retrieval model?
- What are the different types of retrieval tasks?
- What are the common steps an IR system would go through to
build an index system from a set of raw documents? What does each step
do?
- How is a document represented in the vector model?
- How is a query represented in the vector model?
- What is a term frequency?
- What is a document frequency?
- What is the inverse document frequency for a given
document frequency?
- Why do we choose the product of term frequency and inverse
document frequency as a common measure of weight for a term in a
document?
- How do we compute tf-idf from a given set of statistics?
- Why do we measure the similarity between a query and a document?
- What are the two common similarity measures we discussed in
the lectures?
- How is an inner product (or dot product)
similarity computed?
- What are the problem(s) with the inner product similarity
measure?
- How is a cosine similarity computed?
- What is the main advantage of the cosine similarity measure?
- When a query is received, how does the vector space model IR
system find a set of documents that are relevant to the query?
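The vector-model questions above (tf, df, idf, tf-idf weights, inner product, cosine, ranking) can be tied together in one small sketch. It assumes raw term counts for tf and idf = log10(N / df); the base and any smoothing vary between textbooks, so check the convention used in the lectures. The helper names (`tf_idf_vectors`, `inner_product`, `cosine`, `rank`) are hypothetical, and documents are pre-tokenized lists of terms.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    n = len(docs)
    # df: number of documents that contain each term at least once
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed convention: idf = log10(N / df); a term in every document gets idf 0
    idf = {term: math.log10(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        vectors.append({t: f * idf[t] for t, f in tf.items()})
    return vectors, idf


def inner_product(a: dict[str, float], b: dict[str, float]) -> float:
    # Sum of weight products over shared terms (missing terms count as 0)
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Inner product divided by the product of the vector lengths
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return inner_product(a, b) / (norm_a * norm_b)


def rank(query: list[str], docs: list[list[str]]) -> list[tuple[int, float]]:
    vectors, idf = tf_idf_vectors(docs)
    # Weight the query the same way; terms unseen in the collection get weight 0
    q = {t: f * idf.get(t, 0.0) for t, f in Counter(query).items()}
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


docs = [["web", "search", "engine"], ["web", "page"], ["search", "query", "search"]]
print(rank(["search", "engine"], docs))  # document 0 ranks first
```

The usual complaint about the inner product shows up directly here: scaling a document vector by 2 doubles its inner product with the query, while its cosine score stays the same, so cosine does not simply reward longer documents.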
- Basic text processing and indexing
- What is the basic idea of Porter's algorithm?
- What is the basic structure of an inverted index? What information
does an inverted index typically contain?
- How do we build an inverted index from a list of words and the set
of documents in which these words appear?
- Given a query and the inverted index for a collection of
documents, how does an IR system respond to the query? (inverted
index retrieval algorithm)
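A minimal sketch of building an inverted index and answering a query with it, assuming each postings list stores (document id, term frequency) pairs and that scores are accumulated as a tf-idf inner product. The names `build_index` and `retrieve` are hypothetical. A full cosine version would also divide each accumulated score by a precomputed document vector length.

```python
import math
from collections import Counter, defaultdict


def build_index(docs: dict[str, list[str]]) -> dict[str, list[tuple[str, int]]]:
    """Map each term to its postings list of (doc_id, term frequency)."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for doc_id in sorted(docs):  # visit documents in id order so postings stay sorted
        for term, freq in Counter(docs[doc_id]).items():
            index[term].append((doc_id, freq))
    return dict(index)


def retrieve(query: list[str], index: dict[str, list[tuple[str, int]]],
             n_docs: int) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)  # one accumulator per document
    for term, q_freq in Counter(query).items():
        postings = index.get(term)
        if not postings:
            continue  # a term absent from the collection contributes nothing
        idf = math.log10(n_docs / len(postings))  # df = length of the postings list
        for doc_id, d_freq in postings:
            scores[doc_id] += (q_freq * idf) * (d_freq * idf)
    # Only documents sharing at least one query term are ever touched
    return sorted(scores.items(), key=lambda s: s[1], reverse=True)


docs = {"d1": ["cat", "sat", "mat"], "d2": ["cat", "cat", "hat"], "d3": ["dog", "sat"]}
index = build_index(docs)
print(index["cat"])                         # [('d1', 1), ('d2', 2)]
print(retrieve(["cat"], index, len(docs)))  # d2 scores above d1
```

The design point is that the system only walks the postings lists of the query terms, rather than comparing the query against every document in the collection.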
- Performance evaluation
- What does an evaluation system do here?
- What are the major difficulties in evaluating an IR system?
- What is precision?
- What is recall?
- What is the ideal (the best) case for a precision-recall
measure?
- How do we compute precision and recall if we know which documents
are relevant and which ones are retrieved?
- How do we draw a precision/recall diagram?
- What is the F-measure? Why is it useful?
- What is the E-measure? Why is it useful? How do we use the beta
parameter? What does it mean?
- What is the fall-out rate? What does it mean?
- How do we compute the ESL (expected search length) for a given set
of retrieved documents?
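The evaluation measures above can be checked by hand with a short sketch like the following. It assumes the E-measure form E = 1 - (1 + beta^2) P R / (beta^2 P + R), so beta = 1 gives E = 1 - F, and in this form a larger beta gives recall more weight. Fall-out is taken as retrieved non-relevant documents over all non-relevant documents in the collection. For ESL, the sketch handles only a strict ranking with no ties, where the expected search length reduces to the number of non-relevant documents examined before the desired number of relevant ones is found. All function names are hypothetical.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(p: float, r: float) -> float:
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def e_measure(p: float, r: float, beta: float = 1.0) -> float:
    # Assumed form: E = 1 - (1 + beta^2) P R / (beta^2 P + R); beta = 1 gives 1 - F
    denom = beta ** 2 * p + r
    return 1.0 - (1 + beta ** 2) * p * r / denom if denom > 0 else 1.0


def fallout(retrieved: set, relevant: set, collection_size: int) -> float:
    # Fraction of the collection's non-relevant documents that were retrieved
    non_relevant = collection_size - len(relevant)
    return len(retrieved - relevant) / non_relevant if non_relevant else 0.0


def pr_points(ranking: list, relevant: set) -> list[tuple[float, float]]:
    # (recall, precision) after each relevant document: the points of a P/R diagram
    points, hits = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / k))
    return points


def search_length(ranking: list, relevant: set, wanted: int):
    # Non-relevant documents seen before `wanted` relevant ones (strict ranking only)
    non_rel = hits = 0
    for doc in ranking:
        if doc in relevant:
            hits += 1
            if hits == wanted:
                return non_rel
        else:
            non_rel += 1
    return None  # the ranking holds fewer than `wanted` relevant documents


ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
p, r = precision_recall(set(ranking), relevant)
print(p, r, f_measure(p, r))  # P = 0.4, R = 2/3, F = 0.5
```

In this example two of the five retrieved documents are relevant, and two of the three relevant documents were found, which gives F = 0.5.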
- Relevance feedback
- What is the basic idea of relevance feedback?
- How does automatic feedback work?
- How does manual feedback work?
- What is the basic idea of Rocchio's vector space relevance
feedback? What do the parameters alpha, beta, and gamma mean?
- How does WordNet-based query expansion work?
- How does thesaurus-based query expansion work?
- In terms of term similarity analysis, what does global analysis
mean? What does local analysis mean?
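A minimal sketch of Rocchio's update on sparse term-weight vectors, using the common form q' = alpha * q + beta * (centroid of relevant docs) - gamma * (centroid of non-relevant docs). The default values alpha = 1.0, beta = 0.75, and gamma = 0.15 are an assumption taken from commonly quoted textbook settings, not from the lectures. Clipping negative weights to zero is also a common convention rather than part of the formula itself.

```python
from collections import defaultdict


def rocchio(query: dict[str, float],
            relevant_docs: list[dict[str, float]],
            nonrelevant_docs: list[dict[str, float]],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> dict[str, float]:
    new_q: dict[str, float] = defaultdict(float)
    # alpha: how much of the original query to keep
    for t, w in query.items():
        new_q[t] += alpha * w
    # beta pulls the query toward the relevant centroid; gamma pushes it
    # away from the non-relevant centroid
    for docs, coeff in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        for doc in docs:
            for t, w in doc.items():
                new_q[t] += coeff * w / len(docs)
    # Assumed convention: drop terms whose weight ends up zero or negative
    return {t: w for t, w in new_q.items() if w > 0}


q = {"jaguar": 1.0}
rel = [{"jaguar": 1.0, "car": 1.0}, {"car": 1.0, "engine": 1.0}]
nonrel = [{"jaguar": 1.0, "cat": 1.0}]
print(rocchio(q, rel, nonrel))  # "car" and "engine" are added; "cat" is dropped
```

For automatic (pseudo-relevance) feedback, one would typically pass the top-ranked k documents as `relevant_docs` without asking the user. For manual feedback, the user's own judgments fill both lists.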