Assignment 3

  1. Loader: From at least 4 different starting Wikipedia sites, recursively collect a total of at least 400 Wikipedia sites by following links (ignore Wikipedia navigation links). Store the edges persistently, along with similarity-based distance metrics to serve as edge weights (possibly just in a serialized file).
  2. Application: Write a program (either GUI or web-based) that recreates the graph from step 1 and reports the number of disjoint sets (based on one of the roots) as a connectivity check. Allow a user to select any two sites, and display the shortest (with respect to weights) path between them. The path can be shown as a series of sites, indicating at each step the links not taken, but you are encouraged to also display paths graphically.
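
The two graph computations in step 2 can be sketched as follows. This is a minimal illustration, not a required design: the class name WikiGraph and its methods are hypothetical, edges are assumed to be undirected with non-negative weights, union-find is one possible way to count disjoint sets, and Dijkstra's algorithm is one possible way to find the shortest weighted path.

```java
import java.util.*;

// Hypothetical in-memory graph; names and structure are assumptions for illustration.
class WikiGraph {
    // Adjacency map: site title -> (neighbor title -> edge weight).
    private final Map<String, Map<String, Double>> adj = new HashMap<>();

    // Adds an undirected edge (assumption: links are treated as two-way).
    void addEdge(String a, String b, double weight) {
        adj.computeIfAbsent(a, k -> new HashMap<>()).put(b, weight);
        adj.computeIfAbsent(b, k -> new HashMap<>()).put(a, weight);
    }

    // Counts connected components with union-find.
    int countDisjointSets() {
        Map<String, String> parent = new HashMap<>();
        for (String v : adj.keySet()) parent.put(v, v);
        int sets = parent.size();
        for (Map.Entry<String, Map<String, Double>> e : adj.entrySet()) {
            for (String nb : e.getValue().keySet()) {
                String ra = find(parent, e.getKey());
                String rb = find(parent, nb);
                if (!ra.equals(rb)) {   // merging two sets leaves one fewer
                    parent.put(ra, rb);
                    sets--;
                }
            }
        }
        return sets;
    }

    // Finds the representative of v, compressing the path on the way back.
    private static String find(Map<String, String> parent, String v) {
        String root = v;
        while (!parent.get(root).equals(root)) root = parent.get(root);
        while (!parent.get(v).equals(root)) {
            String next = parent.get(v);
            parent.put(v, root);
            v = next;
        }
        return root;
    }

    // Dijkstra's algorithm; returns the sites on the path, or an empty list if unreachable.
    List<String> shortestPath(String src, String dst) {
        Map<String, Double> dist = new HashMap<>();
        Map<String, String> prev = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> pq =
            new PriorityQueue<>((x, y) -> Double.compare(x.getValue(), y.getValue()));
        dist.put(src, 0.0);
        pq.add(new AbstractMap.SimpleEntry<>(src, 0.0));
        while (!pq.isEmpty()) {
            Map.Entry<String, Double> cur = pq.poll();
            String u = cur.getKey();
            // Skip queue entries made stale by a later, shorter distance.
            if (cur.getValue() > dist.getOrDefault(u, Double.POSITIVE_INFINITY)) continue;
            if (u.equals(dst)) break;
            for (Map.Entry<String, Double> e
                    : adj.getOrDefault(u, Collections.emptyMap()).entrySet()) {
                double nd = cur.getValue() + e.getValue();
                if (nd < dist.getOrDefault(e.getKey(), Double.POSITIVE_INFINITY)) {
                    dist.put(e.getKey(), nd);
                    prev.put(e.getKey(), u);
                    pq.add(new AbstractMap.SimpleEntry<>(e.getKey(), nd));
                }
            }
        }
        if (!dist.containsKey(dst)) return Collections.emptyList();
        LinkedList<String> path = new LinkedList<>();
        for (String v = dst; v != null; v = prev.get(v)) path.addFirst(v);
        return path;
    }
}
```

The "links not taken" at each step of a reported path could be read from the same adjacency map: for each site on the path, list its neighbors other than the next site on the path.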

Doug Lea