Review one for CSCI 335

CSCI 335
Fall 2006
Review Questions For Exam One

The exam questions will be based on the concepts covered by these review questions. You are allowed to bring one information sheet. You will need to do some computational work so a calculator would be helpful, but not absolutely necessary.

Overview of IR system
1. What is the typical IR task? (or what does an IR system do usually?)
2. What are the common relevance judgments? (e.g. on the subject, being timely ...)
3. What are the common problems with keyword search alone?
4. What are the other major areas related to IR?
Web search system
1. What are the major components of a typical Web search system? (A Web search system is an application of IR.)
2. How does the HTTP protocol work as far as the flow of information is concerned? (You do not need to memorize the details of specific commands, but you should have a general idea.)
3. What does the GET command do in HTTP?
4. What does the POST command do in HTTP? In the case of POST, what do Content-Type and Content-Length mean?
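
As a study aid for the GET and POST questions above, here is a minimal sketch that builds the text of the two kinds of request. The function names, the host, and the path are hypothetical, and the form-encoded Content-Type is just one common choice; it only illustrates that a GET request has no body, while a POST request carries a body described by Content-Type and Content-Length.

```python
from urllib.parse import urlencode

def build_get(host, path):
    # A GET request asks the server for a resource and carries no body.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"

def build_post(host, path, form):
    # A POST request sends data to the server in the message body.
    body = urlencode(form)
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        # Content-Type tells the server how the body is encoded.
        "Content-Type: application/x-www-form-urlencoded",
        # Content-Length is the size of the body in bytes.
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # A blank line separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body

if __name__ == "__main__":
    print(build_get("example.com", "/search?q=ir"))
    print(build_post("example.com", "/search", {"q": "information retrieval"}))
```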
IR models
1. What is a retrieval model?
2. What are the different types of retrieval tasks?
3. What are the common steps an IR system would go through to build an index system from a set of raw documents? What does each step do?
4. How is a document represented in the vector model?
5. How is a query represented in the vector model?
6. What is a term frequency?
7. What is a document frequency?
8. What is the inverse document frequency for a given document frequency?
9. Why do we choose the product of term frequency and inverse document frequency as a common measure of weight for a term in a document?
10. How do you compute tf-idf from a given set of statistics?
11. Why do we measure the similarity?
12. What are the two common similarity measures we discussed in the lectures?
13. How is an inner product (or dot product) similarity computed?
14. What are the problems with the inner product similarity measure?
15. How is a cosine similarity computed?
16. What is the main advantage of the cosine similarity measure?
17. When a query is received, how does the vector space model IR system find a set of documents that are relevant to the query?
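
For practicing the computational questions above, the following is a minimal sketch of tf-idf weighting and the two similarity measures. It assumes raw term counts for tf and log2(N / df) for idf, which is only one common formulation; the version used in the lectures may differ (for example, normalized tf or a different logarithm base). All function names are illustrative.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency (an assumption, see above)
        vectors.append({t: tf[t] * math.log2(n_docs / df[t]) for t in tf})
    return vectors

def inner_product(u, v):
    # Sum of products of the weights of terms shared by both vectors.
    return sum(w * v[t] for t, w in u.items() if t in v)

def cosine(u, v):
    # Inner product divided by the product of the vector lengths,
    # so long documents are not favored just for being long.
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return inner_product(u, v) / (norm_u * norm_v)
```

A query can be treated as one more short document and weighted the same way; ranking then means sorting the documents by their similarity to the query vector.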
Basic text processing and indexing
1. What is the basic idea of Porter's algorithm?
2. What is the basic structure of an inverted index? What information does an inverted index typically contain?
3. How do you build an inverted index from a list of words and the set of documents in which these words appear?
4. Given a query and the inverted index for a collection of documents, how does an IR system respond to the query? (inverted index retrieval algorithm)
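
The inverted index questions above can be explored with a small sketch like the one below. It assumes each posting stores only a document id and a term frequency, and it scores documents by summing the frequencies of matching query terms; both are simplifications for illustration, not necessarily the structure or retrieval algorithm presented in the lectures.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: {doc_id: list of tokens}. Returns {term: {doc_id: term frequency}}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term in tokens:
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def retrieve(index, query_terms):
    """Look up each query term's postings and accumulate a score per document."""
    scores = defaultdict(int)
    for term in query_terms:
        # Only documents that contain the term are ever touched.
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```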
Performance evaluation
1. What does an evaluation system do here?
2. What are the major difficulties in evaluating an IR system?
3. What is precision?
4. What is recall?
5. What is the ideal (the best) case for a precision-recall measure?
6. How do you compute the precision and recall measures if you know which documents are relevant and which ones are retrieved?
7. How do you draw a precision/recall diagram?
8. What is the F-measure? Why is it useful?
9. What is the E-measure? Why is it useful? How is the beta parameter used, and what does it mean?
10. What is the fall-out rate? What does it mean?
11. How do you compute ESL for a given set of retrieved documents?
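
To check hand calculations for the set-based measures above, here is a minimal sketch. It assumes the usual set definitions of precision, recall, and fall-out, the weighted harmonic mean form of the F-measure, and E = 1 - F_beta; the course may parameterize beta differently, so treat the formulas as assumptions to compare against your notes. The function and key names are illustrative.

```python
def evaluate(relevant, retrieved, collection_size, beta=1.0):
    """Set-based evaluation of one query. Inputs are collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Fall-out: fraction of the non-relevant documents that were retrieved.
    non_relevant = collection_size - len(relevant)
    fallout = (len(retrieved) - hits) / non_relevant if non_relevant else 0.0
    # F with beta = 1 is the plain harmonic mean of precision and recall.
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "fallout": fallout,
        "f": f_beta,
        "e": 1.0 - f_beta,  # assumed relation between E and F
    }
```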
Relevance feedback
1. What is the basic idea of relevance feedback?
2. How does automatic feedback work?
3. How does manual feedback work?
4. What is the basic idea of Rocchio's vector space relevance feedback? What are the meanings of the parameters alpha, beta, and gamma?
5. How does WordNet-based query expansion work?
6. How does thesaurus-based query expansion work?
7. In terms of term similarity analysis, what does global analysis mean? What does local analysis mean?
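
For the Rocchio question above, the following is a minimal sketch of the standard update: the new query is alpha times the original query, plus beta times the centroid of the relevant documents, minus gamma times the centroid of the non-relevant documents. The default parameter values and the choice to drop terms whose weight becomes non-positive are assumptions made for illustration, not values taken from the course.

```python
def rocchio(query, relevant_docs, non_relevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """All vectors are {term: weight} dicts. Returns the modified query vector."""
    terms = set(query)
    for doc in relevant_docs + non_relevant_docs:
        terms |= set(doc)

    def centroid_weight(docs, term):
        # Average weight of the term over a set of documents.
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    new_query = {}
    for term in terms:
        weight = (alpha * query.get(term, 0.0)
                  + beta * centroid_weight(relevant_docs, term)
                  - gamma * centroid_weight(non_relevant_docs, term))
        if weight > 0:  # assumption: discard non-positive weights
            new_query[term] = weight
    return new_query
```

Raising beta pulls the query toward the documents judged relevant, raising gamma pushes it away from those judged non-relevant, and alpha controls how much of the original query is kept.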
