overview


The program, currently over 14,000 lines of Java 1.1 code, has the following major parts:

* Ferrets
* Ferret advisors
* Filters
* Filter advisors
* Page features
* Fingerprint functions
* Page caches
* Mapmakers
* The page database
* Generalizers
* The user interface


The ferrets roam the web gathering pages likely to interest the user, using simple fingerprint functions to test whether a particular page will be interesting. Such pages are stored in a page cache for later filtering. The filters then use more sophisticated and stringent fingerprint functions to decide whether the user is actually likely to be interested in each cached page.
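The two-stage screening described above can be sketched as follows. This is only an illustration, written in present-day Java rather than Java 1.1. The `Fingerprint` interface, the keyword-based scoring, and the thresholds are assumptions made for the example and are not taken from the program itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoStageSketch {
    /** Hypothetical interface: maps page text to a score; higher means more interesting. */
    interface Fingerprint {
        double score(String pageText);
    }

    /** Keeps the pages whose fingerprint score reaches the threshold. */
    static List<String> screen(List<String> pages, Fingerprint fp, double threshold) {
        List<String> kept = new ArrayList<String>();
        for (String page : pages) {
            if (fp.score(page) >= threshold) {
                kept.add(page);
            }
        }
        return kept;
    }

    /** Cheap ferret-side test (assumed): 1.0 if the lowercase keyword appears at all. */
    static Fingerprint containsKeyword(final String keyword) {
        return new Fingerprint() {
            public double score(String pageText) {
                return pageText.toLowerCase().contains(keyword) ? 1.0 : 0.0;
            }
        };
    }

    /** Stricter filter-side test (assumed): counts occurrences of a non-empty lowercase keyword. */
    static Fingerprint countKeyword(final String keyword) {
        return new Fingerprint() {
            public double score(String pageText) {
                String text = pageText.toLowerCase();
                int count = 0;
                int at = text.indexOf(keyword);
                while (at >= 0) {
                    count++;
                    at = text.indexOf(keyword, at + keyword.length());
                }
                return count;
            }
        };
    }

    public static void main(String[] args) {
        List<String> web = Arrays.asList(
                "Java applets and Java beans",
                "A note on java coffee",
                "Gardening tips");
        // Stage 1: the ferret fills the page cache using the cheap test.
        List<String> cache = screen(web, containsKeyword("java"), 1.0);
        // Stage 2: the filter re-examines only the cached pages with the stricter test.
        List<String> approved = screen(cache, countKeyword("java"), 2.0);
        System.out.println(cache.size() + " cached, " + approved.size() + " approved");
    }
}
```

The point of the split is that the expensive test runs only on pages that have already passed the cheap one.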

Both ferrets and filters have their own advisors, who construct the various fingerprints that determine the goodness of a page. Each fingerprint is built around a particular interest area of the user, so a page may fail one test yet pass another. Consequently, each filter looks at every cached page, and if any filter approves a page, it is passed to the mapmakers even if no other filter approves it.

Along with each approved page, the filters also tell the mapmakers which filter approved the page, and how much it approved the page. This approval is the numeric value assigned to a page when passing it through the filter's fingerprint function (the higher the value the higher the approval). The approval rating is later used in a feedback loop (see below).
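As a rough sketch of this hand-off, the code below lets every filter examine a page and emits one record per approving filter, carrying the filter's name and its approval value. The class names (`Filter`, `Approval`), the word-fraction fingerprint, and the thresholds are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ApprovalSketch {
    /** One filter's verdict on one page: who approved it, and by how much. */
    static final class Approval {
        final String pageUrl;
        final String filterName;
        final double value;

        Approval(String pageUrl, String filterName, double value) {
            this.pageUrl = pageUrl;
            this.filterName = filterName;
            this.value = value;
        }
    }

    /** Hypothetical filter: a named fingerprint function plus an approval threshold. */
    static abstract class Filter {
        final String name;
        final double threshold;

        Filter(String name, double threshold) {
            this.name = name;
            this.threshold = threshold;
        }

        abstract double fingerprint(String pageText);
    }

    /** Every filter sees the page; each approving filter contributes one Approval. */
    static List<Approval> review(String pageUrl, String pageText, List<Filter> filters) {
        List<Approval> approvals = new ArrayList<Approval>();
        for (Filter f : filters) {
            double value = f.fingerprint(pageText);
            if (value >= f.threshold) {
                approvals.add(new Approval(pageUrl, f.name, value));
            }
        }
        return approvals;
    }

    /** Toy fingerprint (assumed): the fraction of words equal to a lowercase topic word. */
    static Filter topicFilter(String name, final String topic, double threshold) {
        return new Filter(name, threshold) {
            double fingerprint(String pageText) {
                String[] words = pageText.toLowerCase().split("\\s+");
                int hits = 0;
                for (String w : words) {
                    if (w.equals(topic)) {
                        hits++;
                    }
                }
                return (double) hits / words.length;
            }
        };
    }

    public static void main(String[] args) {
        List<Filter> filters = new ArrayList<Filter>();
        filters.add(topicFilter("chess", "chess", 0.3));
        filters.add(topicFilter("sailing", "sailing", 0.3));
        List<Approval> approvals =
                review("http://example.org/a", "chess openings and chess endgames", filters);
        // A non-empty list means the page goes on to the mapmakers.
        for (Approval a : approvals) {
            System.out.println(a.pageUrl + " approved by " + a.filterName + " at " + a.value);
        }
    }
}
```

Keeping the approving filter and its value with the page is what allows the feedback step to credit or discredit the right filter later.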

After a filter approves a page, the mapmakers analyze the page's contents and pass the page plus its discovered attributes to the database. Inside the database, the page is analyzed even further, this time with respect to the other pages currently in the database. The mapmakers compare the new page to the other pages to determine which pages the new page is most similar to.

The database then creates a Document for each accepted page. A Document contains a reference to the page, the page's attributes (as determined by the mapmakers), a list of references to other pages that this new page is most similar to, and a list of the corresponding similarity values for those similar pages. All this information is stored in the database, which then becomes the sole information source for the user interface.
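The shape of a Document, as described above, might look roughly like the sketch below. Only the four pieces of information come from the description; the field names, the use of word sets as attributes, and the Jaccard overlap used as a similarity measure are stand-ins, since the text does not say how similarity is actually computed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DocumentSketch {
    /** Page reference, attributes, and parallel lists of similar pages and their similarities. */
    static final class Document {
        final String pageUrl;
        final Set<String> attributes;
        final List<String> similarPages = new ArrayList<String>();
        final List<Double> similarities = new ArrayList<Double>();

        Document(String pageUrl, Set<String> attributes) {
            this.pageUrl = pageUrl;
            this.attributes = attributes;
        }
    }

    /** Stand-in similarity (assumed): shared attributes divided by all attributes. */
    static double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<String>(a);
        common.retainAll(b);
        Set<String> all = new HashSet<String>(a);
        all.addAll(b);
        return (double) common.size() / all.size();
    }

    /** Builds a Document for a new page, links it to its k most similar neighbours, and stores it. */
    static Document accept(String url, Set<String> attrs, List<Document> database, int k) {
        Document doc = new Document(url, attrs);
        List<Document> candidates = new ArrayList<Document>(database);
        for (int i = 0; i < k && !candidates.isEmpty(); i++) {
            Document best = candidates.get(0);
            for (Document c : candidates) {
                if (similarity(attrs, c.attributes) > similarity(attrs, best.attributes)) {
                    best = c;
                }
            }
            candidates.remove(best);
            doc.similarPages.add(best.pageUrl);
            doc.similarities.add(similarity(attrs, best.attributes));
        }
        database.add(doc);
        return doc;
    }

    public static void main(String[] args) {
        List<Document> database = new ArrayList<Document>();
        accept("a", new HashSet<String>(Arrays.asList("chess", "openings")), database, 1);
        accept("b", new HashSet<String>(Arrays.asList("sailing", "knots")), database, 1);
        Document c = accept("c", new HashSet<String>(Arrays.asList("chess", "endgames")), database, 1);
        System.out.println(c.pageUrl + " is most similar to " + c.similarPages);
    }
}
```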

All programs in the above page pipeline run as separate applications, independent of the user interface. Thus they can gather and analyze new pages at any convenient time, regardless of whether the user has an instance of the interface running at present.

When the user starts a new session, the user interface displays a space of icons, with each icon representing a page stored in the database. The user interface also monitors every movement the user makes and records each one in a UserAction, a simple class that holds the basics of an action. Several subclasses of UserAction capture information about specific user events.

The user's actions give the system information that helps it learn what the user's interests and preferences are. The user interface passes UserActions to the generalizers, who use the information to update the database to better reflect the user's interests. For example, when the user deletes or moves a page icon, the system deduces that it has made an error (in the first case in accepting the page, and in the second in deciding where the page should be placed).
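One way to picture the UserAction hierarchy and the deductions a generalizer draws from it is sketched below. Only the name `UserAction` comes from the description; the subclasses `DeleteAction` and `MoveAction`, their fields, and the `Lesson` enum are hypothetical names invented for this example.

```java
class UserActionSketch {
    /** What the system concludes from an action (hypothetical categories). */
    enum Lesson { NONE, ACCEPTANCE_ERROR, PLACEMENT_ERROR }

    /** The basics of an action: which page it touched and when. */
    static class UserAction {
        final String pageUrl;
        final long timeMillis;

        UserAction(String pageUrl, long timeMillis) {
            this.pageUrl = pageUrl;
            this.timeMillis = timeMillis;
        }

        /** A plain action teaches the system nothing in particular. */
        Lesson lesson() {
            return Lesson.NONE;
        }
    }

    /** Deleting an icon suggests the page should not have been accepted. */
    static class DeleteAction extends UserAction {
        DeleteAction(String pageUrl, long timeMillis) {
            super(pageUrl, timeMillis);
        }

        @Override
        Lesson lesson() {
            return Lesson.ACCEPTANCE_ERROR;
        }
    }

    /** Moving an icon suggests the page was placed in the wrong spot. */
    static class MoveAction extends UserAction {
        final int newX;
        final int newY;

        MoveAction(String pageUrl, long timeMillis, int newX, int newY) {
            super(pageUrl, timeMillis);
            this.newX = newX;
            this.newY = newY;
        }

        @Override
        Lesson lesson() {
            return Lesson.PLACEMENT_ERROR;
        }
    }

    public static void main(String[] args) {
        UserAction[] session = {
            new DeleteAction("http://example.org/a", 1000L),
            new MoveAction("http://example.org/b", 2000L, 40, 12)
        };
        // A generalizer would consume these actions and update the database accordingly.
        for (UserAction action : session) {
            System.out.println(action.pageUrl + " -> " + action.lesson());
        }
    }
}
```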

If the user dislikes a page, the filter that approved it is discredited in proportion to how much it approved the page, and so are the pages the advisor used to develop that particular fingerprint in the first place. That makes it less likely that the same (discredited) pages will be used to generate fingerprints in future. Further, the filter that approved the page may have its fingerprint function replaced. Similar feedback applies if the user approves a page.
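The proportional feedback could be sketched as below. The update rule (credit changes by a fixed rate times the approval value), the starting credit of 1.0, and the replacement threshold are all assumptions made for illustration; the description states only that the adjustment is proportional to the approval.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FeedbackSketch {
    static final double RATE = 0.5;           // assumed learning rate
    static final double START_CREDIT = 1.0;   // assumed initial credit
    static final double REPLACE_BELOW = 0.5;  // assumed cut-off for replacing a fingerprint

    final Map<String, Double> filterCredit = new HashMap<String, Double>();
    final Map<String, Double> pageCredit = new HashMap<String, Double>();

    /** Returns the stored credit, or the starting credit if none is recorded yet. */
    static double creditOf(Map<String, Double> credits, String key) {
        Double c = credits.get(key);
        return c == null ? START_CREDIT : c.doubleValue();
    }

    /**
     * Credits or discredits the approving filter and the pages its fingerprint
     * was built from, in proportion to the approval value.
     */
    void feedback(String filterName, List<String> sourcePages, double approval, boolean liked) {
        double delta = RATE * approval * (liked ? 1.0 : -1.0);
        filterCredit.put(filterName, creditOf(filterCredit, filterName) + delta);
        for (String page : sourcePages) {
            pageCredit.put(page, creditOf(pageCredit, page) + delta);
        }
    }

    /** A badly discredited filter becomes a candidate for a new fingerprint function. */
    boolean shouldReplaceFingerprint(String filterName) {
        return creditOf(filterCredit, filterName) < REPLACE_BELOW;
    }

    public static void main(String[] args) {
        FeedbackSketch fb = new FeedbackSketch();
        // The user disliked a page that the "chess" filter had approved strongly.
        fb.feedback("chess", Arrays.asList("seed1", "seed2"), 0.8, false);
        System.out.println("chess credit: " + creditOf(fb.filterCredit, "chess"));
        System.out.println("replace fingerprint? " + fb.shouldReplaceFingerprint("chess"));
    }
}
```

Under this rule, a filter that approved a disliked page strongly loses more credit than one that approved it only marginally.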

If the user deletes a page, the generalizers will not remove it from the database but will add large negative weights to it. The advisors use these weights when constructing fingerprints, which are then passed to the ferrets and filters. Also, such pages will no longer be passed to the user interface to be displayed.
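A minimal sketch of this soft deletion follows. The `Entry` class, the `hidden` flag, and the size of the penalty are hypothetical; the description says only that the page stays in the database with large negative weights and is no longer shown.

```java
import java.util.ArrayList;
import java.util.List;

class DeletionSketch {
    static final double DELETE_PENALTY = -1000.0; // assumed "large negative weight"

    /** A stored page with the weight the advisors consult when building fingerprints. */
    static final class Entry {
        final String url;
        double weight;
        boolean hidden;

        Entry(String url) {
            this.url = url;
        }
    }

    /** The page stays in the database, but is heavily penalized and hidden from the interface. */
    static void delete(Entry entry) {
        entry.weight += DELETE_PENALTY;
        entry.hidden = true;
    }

    /** Only pages that have not been deleted are passed to the user interface. */
    static List<String> visible(List<Entry> database) {
        List<String> urls = new ArrayList<String>();
        for (Entry e : database) {
            if (!e.hidden) {
                urls.add(e.url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<Entry> database = new ArrayList<Entry>();
        Entry a = new Entry("http://example.org/a");
        Entry b = new Entry("http://example.org/b");
        database.add(a);
        database.add(b);
        delete(a);
        System.out.println("stored: " + database.size() + ", shown: " + visible(database));
    }
}
```

Retaining the deleted page is what lets the advisors learn from it: its negative weight steers future fingerprints away from similar pages.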

The program forms one big circular page pipeline, with messages passing as objects from one actor to another. The main focus is the database, which dynamically rearranges the approved pages to better serve the user.

Several packages implement the above functionality. Here are the areas and the packages that help define each one:


Area                    Package(s)
Page caches             cache
Fingerprint functions   fingerprint
Page features           feature
Ferrets                 ferret
Filters                 filter
Ferret advisors         advisor
Filter advisors         advisor
User interface          gui
Mapmakers               mapmaker, parser, notifier
Page database           webpageDB
All packages            tools


