Assignment 2

This extends Assignment 1 using persistent data structures and additional similarity metrics. It requires two programs.

Loader

For each of at least 10,000 businesses, create a file-based representation containing everything needed for your similarity metric. You can use Java serialization, with the file name the same as the business ID. (If any two businesses have the same name, you can discard one of them.)
Create a persistent block-based file-based B-Tree or Hash Table mapping the business name to its business ID file name.
Traverse this map to pre-categorize (and somehow store) records into 5 to 10 clusters using k-means, k-medoids, or a similar metric as discussed in class and outlined in the course notes.

Application

Extend Assignment 1 to display a category (cluster) and most similar key from the above data structures.