Dictionaries, inverted files, postings; Term weighting; Similarity, ranking and the vector space model; String processing 1: Wild cards, stemming, and spelling; String processing: String search; Relevance feedback and query refinement; Latent semantic indexing; Probabilistic information retrieval; Evaluation of retrieval effectiveness; Web crawling; Architecture of information retrieval systems; Links and anchor text; Spam and advertising; Interfaces for browsing and searching; Metadata; Classification and categorization.
Text Understanding - This segment of the course will focus on additional language processing steps for template filling and information extraction from retrieved documents, including reference resolution, sense tagging and summarization. Emphasis will be placed on recent, primarily statistical methods.
Web Agents and WWW Applications - The final segment of the course will explore current issues in information retrieval and data mining on the World Wide Web. It will focus on case studies of web agents, spiders, robots and search engines, exploring both their practical implementation and the economic and legal issues surrounding their use. These are among the hot technologies of the 21st century!