Assignment 2
This assignment extends Assignment 1 with persistent data structures
and additional similarity metrics. It requires two programs:
- Loader
  - For each of at least 200 sites (URLs), create a
    persistent file containing its word frequencies and any
    other similarity information.
  - Associate a file with each site. Traverse through these
    files to pre-categorize (and somehow store) records into
    5 to 10 clusters using k-means, k-medoids, or a similar
    clustering method as discussed in class and outlined in
    the course notes.
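The Loader steps above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the required design: it
assumes one JSON file per site, short site keys that are safe to use as
file names, cosine distance as the similarity measure, and a naive
k-medoids start that takes the first k keys. All function names
(word_frequencies, save_record, load_records, k_medoids) are
hypothetical, and fetching the page text is left out.

```python
import json
import math
from collections import Counter
from pathlib import Path


def word_frequencies(text):
    """Count lowercase alphabetic words in a page's text."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return Counter(cleaned.split())


def save_record(directory, key, freqs):
    """Persist one site's frequencies as its own JSON file (assumed layout)."""
    path = Path(directory) / f"{key}.json"
    path.write_text(json.dumps(freqs))
    return path


def load_records(directory):
    """Traverse the per-site files and return {key: frequency dict}."""
    paths = sorted(Path(directory).glob("*.json"))
    return {p.stem: json.loads(p.read_text()) for p in paths}


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse frequency vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def assign(records, medoids):
    """Map every key to the medoid key nearest to it."""
    return {key: min(medoids, key=lambda m: cosine_distance(records[key], records[m]))
            for key in records}


def k_medoids(records, k, max_rounds=20):
    """Cluster records around k medoids; returns {key: medoid key}."""
    medoids = sorted(records)[:k]  # naive start; real code might randomize
    for _ in range(max_rounds):
        assignment = assign(records, medoids)
        new_medoids = []
        for m in medoids:
            # Keep the old medoid if its cluster happens to be empty.
            members = [key for key, owner in assignment.items() if owner == m] or [m]
            # The new medoid minimizes total distance to its cluster's members.
            new_medoids.append(min(members, key=lambda c: sum(
                cosine_distance(records[c], records[o]) for o in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(records, medoids)
```

The returned mapping from site key to medoid key is one possible way to
"somehow store" the categories; it could be written to its own file
alongside the per-site records.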
- Application
  - Adapt Assignment 1 to display the category (cluster) and the
    most similar key from the above data structures.
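A minimal sketch of the Application lookup, assuming the Loader's output
has already been read back into two in-memory tables: records (site key
to word frequencies) and clusters (site key to cluster label). It also
assumes the Assignment 1 code already produces a word-frequency table
for the user's page and that cosine similarity is the chosen metric. The
names most_similar and describe are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(query, records):
    """Return (key, similarity) for the stored record closest to the query."""
    scored = ((key, cosine_similarity(query, freqs)) for key, freqs in records.items())
    return max(scored, key=lambda pair: pair[1])


def describe(query, records, clusters):
    """Build the line the application would display for a query page."""
    key, score = most_similar(query, records)
    return f"cluster {clusters[key]}: most similar site is {key} (similarity {score:.2f})"
```

With 200 or more sites a linear scan like this is still cheap; comparing
the query only against members of its nearest cluster is a possible
refinement.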
Doug Lea