Assignment 3

  1. Extend Assignment 2 to record links from each of at least 1000 businesses to its four geographically closest neighbors. Compute distances with the Haversine formula, based on longitudes and latitudes. Because some of the Yelp records are geographically clustered, you might pick, say, every 100th record from the original data set to avoid all being from one city or state. As a connectivity check, report the number of disjoint sets (from arbitrary roots). Store the graph persistently (possibly just in a Serialized file).
  2. Write a program (either GUI or web-based) that uses the graph from step 1, allows a user to select two nodes, and displays the shortest path between them (weighted using any similarity metric other than geographical distance), using the connected sets from step 1 to determine reachability. Minimally, the path can be indicated by a series of sites, but you are encouraged to also graphically display the links not taken for each site along the path.
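
As a starting point for step 1, the distance calculation, neighbor selection, and connectivity check might be sketched as below. This is only one possible shape, not a required design: the class and method names (GraphSketch, haversineKm, nearestNeighbors, buildSets, countSets, sameSet) are hypothetical, the Earth radius of 6371 km is the usual mean value, the neighbor search is brute force, and links are treated as undirected when forming sets. The coordinates in main are made-up test points, not Yelp data.

```java
import java.util.Arrays;
import java.util.Comparator;

public class GraphSketch {
    // Mean Earth radius in kilometers (an assumed constant for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in km between two points given in degrees. */
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** For each point {lat, lon}, the indices of its k closest other points (brute force). */
    public static int[][] nearestNeighbors(double[][] latLon, int k) {
        int n = latLon.length;
        int[][] result = new int[n][];
        for (int i = 0; i < n; i++) {
            final int src = i;
            Integer[] others = new Integer[n - 1];
            for (int j = 0, m = 0; j < n; j++) {
                if (j != i) others[m++] = j;
            }
            // Sort the other points by distance from point src.
            Arrays.sort(others, Comparator.comparingDouble((Integer j) ->
                haversineKm(latLon[src][0], latLon[src][1], latLon[j][0], latLon[j][1])));
            int count = Math.min(k, others.length);
            result[i] = new int[count];
            for (int m = 0; m < count; m++) result[i][m] = others[m];
        }
        return result;
    }

    /** Union-find over the links, treated as undirected; returns the parent array. */
    public static int[] buildSets(int[][] neighbors) {
        int[] parent = new int[neighbors.length];
        for (int i = 0; i < parent.length; i++) parent[i] = i;
        for (int i = 0; i < neighbors.length; i++) {
            for (int j : neighbors[i]) {
                int a = find(parent, i), b = find(parent, j);
                if (a != b) parent[a] = b; // merge the two sets
            }
        }
        return parent;
    }

    /** Root of x's set, with path halving. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /** Number of disjoint sets: one per node that is its own root. */
    public static int countSets(int[] parent) {
        int sets = 0;
        for (int i = 0; i < parent.length; i++) {
            if (parent[i] == i) sets++;
        }
        return sets;
    }

    /** Reachability test for step 2: two nodes are connected iff they share a root. */
    public static boolean sameSet(int[] parent, int a, int b) {
        return find(parent, a) == find(parent, b);
    }

    public static void main(String[] args) {
        // Two made-up clusters of three points each, far apart; k = 2 for this tiny demo.
        double[][] pts = {
            {36.10, -115.10}, {36.11, -115.11}, {36.12, -115.12},
            {40.40, -80.00}, {40.41, -80.01}, {40.42, -80.02}
        };
        int[] parent = buildSets(nearestNeighbors(pts, 2));
        System.out.println("disjoint sets: " + countSets(parent));
    }
}
```

For the assignment itself, k would be 4 and the points would come from the sampled records. The same parent array can back the reachability check in step 2 before any shortest-path search is attempted.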

Doug Lea