CSCI 335: Web Information Retrieval
Fall 2006--Xiannong Meng
This is the on-line courseware for CSCI 335: Web Information Retrieval,
Fall 2006. The web pages are
constantly evolving. Please re-visit us often. If you have any
comments or suggestions, please
send mail to me. Thank you very much.
Syllabus
Textbook and Other References
Other reference books include
- An Introduction to Information Retrieval by Christopher D.
Manning, Prabhakar Raghavan, and Hinrich Schutze, August 2006.
- Information Retrieval, by Van Rijsbergen, available on-line at
www.dcs.gla.ac.uk/Keith/Preface.html.
- Automatic Text Processing by Gerard Salton,
Addison-Wesley, 1989.
- Finding Out About: A Cognitive Perspective on Search Engine
Technology and the WWW, by Richard K. Belew, Cambridge University
Press, 2001.
- Internet Agents: Spiders, Wanderers, Brokers, and Bots, by
Fah-Chun Cheong, New Riders Publishers, 1996.
- Data Mining: Concepts and Techniques, by Jiawei Han and
Micheline Kamber, Morgan Kaufmann, 2001.
- Data Mining Methods for Knowledge Discovery, by Krzysztof Cios,
Witold Pedrycz, Roman Swiniarski, Kluwer, 1998.
- Information Storage and Retrieval, by Robert
R. Korfhage, John Wiley & Sons, 1997.
Lecture Notes
Most of these
notes are adapted, with some revisions, from other professors' courses.
They are based on
Professor Raymond Mooney's Intelligent Information Retrieval and Web Search
course at UT Austin, Professor David
Yarowsky's Information Retrieval and Web Agents course at Johns Hopkins
University, and Professor David Grossman's
course at IIT.
- Introduction to
Information Retrieval PowerPoint,
PDF and
PDF for
printing
- Web Search -- Introduction PowerPoint,
PDF and
PDF for
printing
- Web Interface PowerPoint,
PDF and
PDF for
printing
- Information Retrieval Models PowerPoint,
PDF and
PDF for
printing
- Basic Text Processing and Indexing PowerPoint,
PDF and
PDF for
printing
- Case study: AltaVista based on notes from the original DEC
presentation at
http://gatekeeper.dec.com/pub/DEC/SRC/publications/sites/talk/ last
accessed September 12, 2006.
- Case study: Google PowerPoint,
PDF and
PDF for
printing
- Performance evaluation PowerPoint,
PDF and
PDF for
printing
- Query operations PowerPoint,
PDF and
PDF for
printing
- MARS presentation PowerPoint,
PDF and
PDF for
printing
- Link analysis PowerPoint,
PDF and
PDF for
printing
- Web crawling PowerPoint,
PDF and
PDF for
printing
- Basic database concepts, information based on
Silberschatz, Korth,
and Sudarshan's database book
- Text Properties PowerPoint,
PDF and
PDF for
printing
- Text Clustering PowerPoint,
PDF and
PDF for
printing
- Text Categorization PowerPoint,
PDF and
PDF for
printing
- The peer evaluation form used for all projects
- An exercise to develop
team guidelines for your team.
Read the Team Work Guidelines for
reference.
Assigned: Friday, August 25th
Due: Wednesday, August 30th
- Programming Project
Part One.
Assigned: Monday, August 28th
Due: Wednesday, September 13th
- Programming Project
Part Two.
Assigned: Wednesday, September 13th
Due: Wednesday, September 27th
Various implementations of Porter's algorithm
- Programming Project
Part Three.
Assigned: Monday, October 2nd
Due: Monday, October 23rd
- Research paper.
Assigned: Monday October 23, 2006
Proposal due: Monday October 30, 2006
Information update due: Monday November 6, 2006
Paper due: Monday November 20, 2006
Presentation: November 27, 29, and December 1, 2006
- Programming Project
Part Four.
Assigned: Monday, October 23rd
Due: Monday, November 13th
In-Class Work and Suggested Solutions
Written Homework Assignments
- Homework One Due: September 15th, 2006,
assigned September 8th, 2006
- Homework Two Due: September 29th, 2006,
assigned September 22nd, 2006
- Homework Three Due: October 6th, 2006,
assigned September 29th, 2006
- Homework Four Due: October 20th, 2006,
assigned October 13th, 2006, proposed solution.
Reviews For Exams
Important research papers
- The Google architecture paper, Sergey Brin and Lawrence Page,
"The Anatomy of a Large-Scale
Hypertextual Web Search Engine", 7th International WWW Conference, Brisbane,
Australia, 14-18 April 1998.
- David Gibson, Jon Kleinberg, and Prabhakar Raghavan, "Inferring Web
Communities from Link Topology", Proceedings of the 9th ACM
Conference on Hypertext and Hypermedia, 1998.
- Mei Kobayashi and Koichi Takeda, "Information Retrieval on the Web", ACM
Computing Surveys, 32(2), pp. 144-173, 2000.
- Raymond Kosala and Hendrik Blockeel,
"Web Mining Research: A survey", SIGKDD Explorations, July 2000,
2(1), pp. 1-15.
- Z. Wu, W. Meng, C. Yu, and Z. Li,
"Towards a highly-scalable and
effective metasearch engine", WWW10, 2001.
- Sergey Melnik, Sriram Raghavan, Beverly Yang, Hector Garcia-Molina,
"Building a distributed full-text index for the Web", WWW10, 2001.
- Soumen Chakrabarti, Byron E. Dom, David Gibson, Jon Kleinberg,
Ravi Kumar,
Prabhakar Raghavan, Sridhar Rajagopalan and Andrew Tomkins,
"Mining the Link Structure of the World Wide Web", WWW10, 2001.
- R. Srikant and Y. Yang,
"Mining Web logs to improve Website organization", WWW10, 2001.
- L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan,
G. Wolfman, and E. Ruppin,
"Placing search in context: The concept revisited", ACM
Transactions on Information Systems, 20(1), pages
116-131, January 2002.
- D. Haines and W. Bruce Croft, "Relevance Feedback and Inference
Networks", Proceedings of SIGIR 93, pages 2-11, 1993.
- A more extensive reading list can be found at this link.
How To Do A Literature Search
Research by
Subject Page of Bucknell Library Web site
Critically Evaluating the Quality of a Research Source
How to Distinguish
Scholarly Works from Popular Press
How To Write a Research Paper
Resources on Oral
Presentations
An Example of a Simple LaTeX
File
An Example of a LaTeX File That
Uses BibTeX
Note that the BibTeX source file can be found at
~xmeng/lib/tex/web.bib
on the UNIX file system.
Some Resource Links:
Academic Responsibility
Students are expected to read and abide by the principles clearly
explained in the Student
Handbook. Under no circumstances should any student submit work
that is not of his or her authorship. If a deadline is tight, or
impossible, before getting desperate, talk to your instructor. It is
better to be late than dishonest. Remember that your instructor's main
goal is to give you nothing but the best opportunities to learn.
The Computer Science department also has an
Academic Responsibility
policy posted on the department website under student information. Please read this policy carefully.
Your instructor will make every effort to explain in detail the
collaboration
policy for each specific assignment. Before you start your work, make
sure to
read and understand this policy. Should any questions arise, contact
your
instructor immediately to have them clarified.
This page is created and maintained by Xiannong Meng.
Please send comments to xmeng@bucknell.edu