CSC365 Assignment 1
Requirements
This assignment asks you to create a web page categorization
program.
- The program reads 10 web pages from each of two or more
predefined categories. You can define the categories
yourself. (Examples: "star trek fan sites", "java developer
blogs", "information about south american rodents"). The URLs for
these web pages should be maintained in a control file that is
read when the program starts.
- For each category, the program maintains frequencies of
words appearing in the web pages. (You can add any other collected
information as well if you like.)
- The user can enter any other URL, and the program decides
which category it best belongs in, using a similarity metric of
your choosing, and also recommends the most closely matching
of the known pages.
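As a rough illustration of the frequency and similarity requirements above, here is a minimal Java sketch (assuming Java 8 or later). It counts words in a HashMap and compares two frequency maps with cosine similarity, which is only one possible metric; the choice is yours. The names WordCategorizer, wordFrequencies, cosineSimilarity and bestMatch are hypothetical and not part of the assignment.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; all names here are illustrative assumptions.
final class WordCategorizer {

    // Count how often each lowercase word occurs in the given text.
    static Map<String, Integer> wordFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity of two frequency maps; 0.0 if either map is empty.
    static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double norms = norm(a) * norm(b);
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    // Euclidean length of a frequency map treated as a vector.
    private static double norm(Map<String, Integer> m) {
        double sum = 0.0;
        for (int v : m.values()) {
            sum += (double) v * v;
        }
        return Math.sqrt(sum);
    }

    // Return the key of the candidate most similar to the query,
    // or null if there are no candidates.
    static String bestMatch(Map<String, Integer> query,
                            Map<String, Map<String, Integer>> candidates) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Integer>> e : candidates.entrySet()) {
            double score = cosineSimilarity(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

The same bestMatch call could serve both tasks: pass frequency maps keyed by category name to pick a category, then maps keyed by URL to recommend the closest known page.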
The presentation details are up to you.
The implementation restrictions are:
- Use the java.util collections for all data structures. Read
through the Collections tutorial first. Do NOT implement any
collections yourself; use only the supplied ones.
- Use Swing components for the GUI. Read through the relevant
parts of the Swing tutorial first.
- Use Java networking components for accessing web pages. See the
Java networking with URLs tutorial.
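To make the networking and control-file pieces concrete, here is a minimal Java sketch (assuming Java 8 or later) that uses only java.net and java.util classes. The control-file layout is an assumption made for this example: one "URL category name" pair per line, with blank lines and lines starting with # ignored. The names PageLoader, parseControlFile, fetch and stripTags are hypothetical, the page is assumed to be UTF-8, and the tag stripping is deliberately crude.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names and file format are assumptions.
final class PageLoader {

    // Group URLs by category from lines of the assumed form
    // "<url> <category name>". Malformed lines are skipped.
    static Map<String, List<String>> parseControlFile(List<String> lines) {
        Map<String, List<String>> urlsByCategory = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s+", 2); // URL first, rest is category
            if (parts.length < 2) {
                continue;
            }
            urlsByCategory.computeIfAbsent(parts[1], k -> new ArrayList<>())
                          .add(parts[0]);
        }
        return urlsByCategory;
    }

    // Download the raw text of a page (assumes UTF-8 content).
    static String fetch(String address) throws IOException {
        URL url = URI.create(address).toURL();
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    // Crude tag removal: replaces anything between < and > with a space.
    static String stripTags(String html) {
        return html.replaceAll("(?s)<[^>]*>", " ");
    }
}
```

In a Swing program, a call such as fetch would normally run off the event dispatch thread so that a slow download does not freeze the GUI.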
Test your program thoroughly before submitting, and arrange a demo within
48 hours of submitting. (Demoing before submitting is strongly encouraged.)
Doug Lea
Last modified: Wed Sep 6 20:39:53 EDT 2000