Review Questions For Exam Two
The exam questions will be based on the concepts covered by these review
questions. You are allowed to bring one information sheet. You will need
to do some computational work, so a calculator would be helpful but is not
absolutely necessary.
- Link Analysis
  - Basic idea: citation analysis
    - Impact factor
    - Bibliographic coupling
    - Co-citation
  - Citation vs. web link: differences and similarities
  - Authorities
  - Hubs
  - Relation between authorities and hubs:
    - Hubs point to lots of authorities
    - Authorities are pointed to by lots of hubs
  - The HITS algorithm
  - Applications of the HITS algorithm: e.g. finding similar
    pages using link structure
  - PageRank
    - Similarities and differences between PageRank and
      Authorities/Hubs
    - PageRank algorithm
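The link-analysis measures above can be tried by hand on a tiny graph. The Python sketch below is illustrative only: the four-page graph `GRAPH` and the helper names (`cocitation`, `coupling`, `hits`, `pagerank`) are hypothetical; HITS is run directly on the whole graph rather than on a query-specific base set; scores are normalized to unit length each round; and the damping factor `d = 0.85` and the fixed iteration counts are assumed common choices, not values taken from the course.

```python
import math

# Hypothetical link graph: each page maps to the pages it links to.
GRAPH = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}


def all_pages(graph):
    return sorted(set(graph) | {q for links in graph.values() for q in links})


def cocitation(graph, p, q):
    # Co-citation: number of pages that link to both p and q.
    return sum(1 for links in graph.values() if p in links and q in links)


def coupling(graph, p, q):
    # Bibliographic coupling: number of pages that both p and q link to.
    return len(set(graph.get(p, [])) & set(graph.get(q, [])))


def normalize(scores):
    # Scale to unit Euclidean length so the scores do not grow without bound.
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(graph, iterations=50):
    pages = all_pages(graph)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by many good hubs.
        auth = normalize({p: sum(hub[q] for q in pages if p in graph.get(q, []))
                          for p in pages})
        # A good hub points to many good authorities.
        hub = normalize({p: sum(auth[q] for q in graph.get(p, [])) for p in pages})
    return hub, auth


def pagerank(graph, d=0.85, iterations=100):
    pages = all_pages(graph)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random-jump term: every page receives (1 - d) / n.
        new = {p: (1 - d) / n for p in pages}
        for p in pages:
            out = graph.get(p, [])
            targets = out if out else pages  # dangling page: spread its rank over all pages
            for q in targets:
                new[q] += d * pr[p] / len(targets)
        pr = new
    return pr


hub, auth = hits(GRAPH)
pr = pagerank(GRAPH)
print(max(auth, key=auth.get), max(hub, key=hub.get))          # C A
print(sorted(pr, key=pr.get, reverse=True))                    # ['C', 'A', 'B', 'D']
print(cocitation(GRAPH, "B", "C"), coupling(GRAPH, "B", "D"))  # 1 1
```

On this graph C, which is linked from three pages, comes out as both the top authority and the top PageRank page, while A, which links to the two best authorities, is the top hub. This illustrates one contrast between the two methods: PageRank assigns each page a single score, whereas HITS assigns each page two mutually reinforcing scores (hub and authority).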
- Web Crawling and Data Gathering
  - General algorithm of crawling
  - Search strategies
    - Breadth-first search
    - Depth-first search
    - Focused search (a priority queue ordered by, e.g., topic,
      directory, or site)
  - Issues
    - Avoid page duplication
    - Expand relative URLs to absolute (complete) URLs
    - Anchor text indexing
    - Robot exclusion protocol: robots.txt for a site and
      meta tags for individual pages
    - Good crawling behavior: identify yourself; don't visit
      a site continuously; follow the robot exclusion protocol
  - General URL syntax
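A breadth-first crawl loop that touches several of the issues listed can be sketched with the Python standard library. This is a sketch under stated assumptions: the in-memory `WEB` dictionary stands in for real HTTP fetches, and `ROBOTS_TXT`, `LinkExtractor`, `crawl`, and the agent name `ExamBot` are hypothetical. Duplicate fetches are avoided with a `seen` set, relative links are expanded with `urljoin`, anchor text is recorded under the page it points to, and robots.txt rules are checked with `urllib.robotparser`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web" standing in for real HTTP fetches.
WEB = {
    "http://example.com/": '<a href="a.html">Page A</a> <a href="/private/x.html">Secret</a>',
    "http://example.com/a.html": '<a href="b.html#top">Page B</a> <a href="/">Home</a>',
    "http://example.com/b.html": '<a href="http://example.com/a.html">A again</a>',
    "http://example.com/private/x.html": '<a href="/">Home</a>',
}
# Hypothetical robots.txt for the site: every agent is barred from /private/.
ROBOTS_TXT = ["User-agent: *", "Disallow: /private/"]


class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def crawl(seed, agent="ExamBot", max_pages=10):
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT)
    robots.modified()  # record the rules as fetched so can_fetch() applies them
    frontier = deque([seed])  # FIFO queue -> breadth-first order
    seen = {seed}             # every URL is queued at most once (no duplicate fetches)
    order, anchors = [], {}
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        if url not in WEB or not robots.can_fetch(agent, url):
            continue  # unreachable page, or excluded by robots.txt
        order.append(url)
        parser = LinkExtractor()
        parser.feed(WEB[url])
        for href, text in parser.links:
            # Expand the relative URL to an absolute one and drop any #fragment.
            absolute, _ = urldefrag(urljoin(url, href))
            # Index the anchor text under the page it points to.
            anchors.setdefault(absolute, []).append(text)
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return order, anchors


order, anchors = crawl("http://example.com/")
print(order)
print(anchors["http://example.com/a.html"])  # ['Page A', 'A again']

# General URL syntax: scheme://host[:port]/path[?query][#fragment]
parts = urlsplit("http://www.example.com:8080/dir/page.html?q=web#sec2")
print(parts.scheme, parts.hostname, parts.port, parts.path, parts.query, parts.fragment)
```

Replacing `popleft()` with `pop()` turns the frontier into a stack (a depth-first-style order), and a `heapq` priority queue keyed on a topic or site score would give a focused crawl. A real crawler would also send its agent name in the User-Agent header and pause between requests to the same site.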
- Database and the web
  - What is a relational database?
  - Basic components of a relational database (tables,
    operations on these tables, user interface)
  - Physical models, logical models, views
  - Keys in a table (primary key, secondary keys)
  - Basic operations: selection, projection, union, set
    difference; create a table, insert into a table,
    update a table, delete (drop) a table, delete records from a table
  - How is a query represented in a DB?
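The basic operations can be tried in SQLite through Python's built-in `sqlite3` module. The tables `student` and `ta` and their rows are made-up examples; `id` serves as the primary key, and the `UNIQUE` column `email` is used here to stand in for a secondary (alternate) key. In relational-algebra terms, a query such as `SELECT name FROM student WHERE major = 'CS'` can be read as a projection applied to a selection, π_name(σ_major='CS'(student)), which is one way to think about how a query is represented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE TABLE: id is the primary key; email is UNIQUE (an alternate key).
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
            "major TEXT, email TEXT UNIQUE)")
cur.execute("CREATE TABLE ta (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# INSERT INTO a table.
cur.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (1, "Ann", "CS", "ann@example.edu"),
    (2, "Bob", "Math", "bob@example.edu"),
    (3, "Cy", "CS", "cy@example.edu"),
])
cur.executemany("INSERT INTO ta VALUES (?, ?)", [(3, "Cy"), (4, "Dee")])

# UPDATE a table.
cur.execute("UPDATE student SET major = ? WHERE id = ?", ("Physics", 2))

# Selection: keep only the rows that satisfy a condition.
selected = cur.execute("SELECT * FROM student WHERE major = ? ORDER BY id", ("CS",)).fetchall()

# Projection: keep only some columns (DISTINCT drops duplicate rows, as in relational algebra).
projected = cur.execute("SELECT DISTINCT major FROM student ORDER BY major").fetchall()

# Union and set difference of two union-compatible results.
united = cur.execute("SELECT name FROM student UNION SELECT name FROM ta ORDER BY 1").fetchall()
difference = cur.execute("SELECT name FROM student EXCEPT SELECT name FROM ta ORDER BY 1").fetchall()

# DELETE records from a table, then DROP (delete) a whole table.
cur.execute("DELETE FROM student WHERE id = ?", (3,))
remaining = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
cur.execute("DROP TABLE ta")
tables = cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
conn.commit()
conn.close()

print(projected)   # [('CS',), ('Physics',)]
print(united)      # [('Ann',), ('Bob',), ('Cy',), ('Dee',)]
print(difference)  # [('Ann',), ('Bob',)]
```

UNION and EXCEPT both remove duplicate rows, matching the set semantics of the relational operators; SQL's UNION ALL would keep duplicates instead.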