This assignment extends Assignment 1 with persistent data structures
and additional similarity metrics. It requires two programs.
- For each of at least 10,000 businesses, create a
file-based structured representation containing
everything needed for your similarity metric.
- Create a persistent block-based extensible hash table
that maps businesses to their representation file names.
You may (and are encouraged to) maintain a buffer cache
to speed up IO.
- Traverse this map to pre-categorize records into 5 to 10
clusters using k-means, k-medoids, or a similar clustering
method as discussed in class and outlined in the course
notes, and store the resulting cluster assignments in a
form of your choosing.
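
The directory-doubling and bucket-splitting logic behind the hash table
described above can be sketched as follows. This is only an in-memory
illustration, not the required persistent implementation: the names
ExtendibleHashTable, Bucket, stable_hash, and BUCKET_CAPACITY are
hypothetical, the capacity of 4 entries is an arbitrary assumption, and
each Bucket stands in for what would be one fixed-size block in a file.

```python
import hashlib

BUCKET_CAPACITY = 4  # assumed entries per block; a real limit would come from the block size


def stable_hash(key: str) -> int:
    """Hash that is identical across runs (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class Bucket:
    """Stand-in for one on-disk block (hypothetical; kept in memory here)."""

    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.items = {}  # business key -> representation file name


class ExtendibleHashTable:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key: str) -> int:
        # The low global_depth bits of the hash select the directory entry.
        return stable_hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key: str):
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key: str, value: str) -> None:
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.global_depth:
            # Double the directory; the new half points at the same buckets as the old half.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Re-point the directory entries whose newly significant bit is set.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the entries between the old bucket and its sibling.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            target = sibling if stable_hash(k) & high_bit else bucket
            target.items[k] = v
```

In a persistent version, one plausible design is to write each bucket
to a fixed offset in a single file and save the directory separately,
with the buffer cache holding recently used blocks in memory.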
Extend Assignment 1 to display the category (cluster) and the most
similar key, both retrieved from the above data structures.
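
As a rough sketch of the clustering and lookup steps, the code below
runs a simple k-medoids loop over small feature vectors and then reports
a record's cluster and its closest neighbor. Everything here is an
assumption made for illustration: the function names, the Euclidean
distance, the naive seeding from the first k keys, and the choice to
search for the most similar key only within the same cluster. The actual
similarity metric is whatever was used in Assignment 1.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(records, medoids):
    """Map each key to the index of its nearest medoid."""
    return {
        key: min(range(len(medoids)), key=lambda i: distance(vec, records[medoids[i]]))
        for key, vec in records.items()
    }


def k_medoids(records, k, max_rounds=20):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = sorted(records)[:k]  # naive deterministic seeding, for illustration only
    for _ in range(max_rounds):
        labels = assign(records, medoids)
        new_medoids = []
        for i in range(k):
            members = [key for key, c in labels.items() if c == i]
            if not members:
                new_medoids.append(medoids[i])  # keep the old medoid for an empty cluster
                continue
            # The medoid is the member with the smallest total distance to the others.
            best = min(
                members,
                key=lambda m: sum(distance(records[m], records[o]) for o in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(records, medoids)


def most_similar(records, labels, query_key):
    """Return (cluster index, closest other key in the same cluster, or None)."""
    cluster = labels[query_key]
    candidates = [k for k, c in labels.items() if c == cluster and k != query_key]
    if not candidates:
        return cluster, None
    return cluster, min(candidates, key=lambda k: distance(records[query_key], records[k]))
```

In the assignment itself, the records would be read through the hash
table rather than held in a dictionary, and the stored cluster
assignments would be loaded instead of recomputed on each query.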