Assignment 2

This extends Assignment 1 using persistent data structures and additional similarity metrics. It requires two programs.

Loader

For each of at least 200 sites (URLs), create a persistent file containing its word frequencies and any other information needed for similarity comparisons.
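One way to produce the per-site files is sketched below. It assumes the page text has already been fetched and extracted (as in Assignment 1). It also assumes JSON as the on-disk format and a hypothetical naming scheme (the SHA-1 hash of the URL). Neither is required by the assignment. If the course expects a particular persistent structure, substitute it for the JSON file.

```python
import hashlib
import json
import re
from collections import Counter
from pathlib import Path


def word_frequencies(text):
    """Count lowercase alphabetic words in a page's extracted text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def site_path(directory, url):
    """Map a URL to a stable file name (hypothetical scheme: SHA-1 of the URL)."""
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".json"
    return Path(directory) / name


def save_site(directory, url, text):
    """Write the URL and its word frequencies to that site's file; return the path."""
    path = site_path(directory, url)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"url": url, "freq": dict(word_frequencies(text))}
    path.write_text(json.dumps(record), encoding="utf-8")
    return path


def load_site(path):
    """Read back a record written by save_site as (url, Counter)."""
    record = json.loads(Path(path).read_text(encoding="utf-8"))
    return record["url"], Counter(record["freq"])
```

Because the file name is derived from the URL, the Loader can be rerun without creating duplicates, and the Application can locate a site's file directly from its URL.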
Associate one file with each site. Traverse these files to pre-categorize the records into 5 to 10 clusters using k-means, k-medoids, or a similar clustering method, as discussed in class and outlined in the course notes, and store the resulting clusters persistently (in any form you choose).
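Below is a minimal sketch of the clustering step. It uses k-medoids, one of the options named above, with cosine distance over the word-frequency vectors. It assumes the vectors have already been loaded into a dict `{key: Counter}`. It also assumes a hypothetical `clusters.json` file as the stored result. The medoid variant is used here because each cluster is then represented by a real site and any distance function can be used. k-means would instead need averaged centroid vectors.

```python
import json
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two sparse word-frequency vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def distance(a, b):
    """Cosine distance: 0 for identical directions, 1 for no shared words."""
    return 1.0 - cosine_similarity(a, b)


def assign(vectors, medoids):
    """Map every key to its nearest medoid; a medoid always maps to itself."""
    result = {}
    for key in sorted(vectors):
        if key in medoids:
            result[key] = key
        else:
            result[key] = min(medoids, key=lambda m: distance(vectors[key], vectors[m]))
    return result


def k_medoids(vectors, k, max_iter=100, seed=0):
    """Alternating k-medoids over {key: Counter}; returns (medoids, assignment)."""
    medoids = random.Random(seed).sample(sorted(vectors), k)
    for _ in range(max_iter):
        assignment = assign(vectors, medoids)
        # Group keys by the medoid they were assigned to.
        clusters = {m: [] for m in medoids}
        for key, m in assignment.items():
            clusters[m].append(key)
        # Update step: each cluster's new medoid is the member with the
        # smallest total distance to the other members of that cluster.
        new_medoids = [
            min(members, key=lambda c: sum(distance(vectors[c], vectors[o]) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(vectors, medoids)


def save_clusters(path, medoids, assignment):
    """Persist the clustering (hypothetical JSON layout) for the Application."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"medoids": medoids, "assignment": assignment}, f)
```

With 200 or more sites and k between 5 and 10, the pairwise distance work per iteration is small. Trying a few different `seed` values and keeping the result with the lowest total distance is a common way to reduce sensitivity to the initial medoids.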

Application

Adapt Assignment 1 to display the category (cluster) and the most similar key, using the persistent data structures built by the Loader.
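The lookup the Application performs could look like the sketch below. It assumes the "key" is a site's URL and that the word-frequency files and the stored cluster assignment have already been read back into dicts. Cosine similarity is used here for illustration; any metric from Assignment 1 would fit the same loop. The optional `exclude` parameter is a hypothetical convenience so that a site already in the data set does not match itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse word-frequency vectors (dicts or Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, vectors, assignment, exclude=None):
    """Return (key, cluster, similarity) of the stored site closest to `query`.

    vectors:    {key: word-frequency dict} read from the Loader's files
    assignment: {key: cluster label} read from the stored clustering
    exclude:    optional key to skip, e.g. the query site itself
    """
    best_key, best_sim = None, -1.0
    for key, vec in vectors.items():
        if key == exclude:
            continue
        sim = cosine_similarity(query, vec)
        if sim > best_sim:
            best_key, best_sim = key, sim
    if best_key is None:
        raise ValueError("no stored sites to compare against")
    return best_key, assignment[best_key], best_sim
```

Because the clusters were computed ahead of time by the Loader, the Application only reads the stored assignment and does not recluster on each query.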