filter package

Package Contract

The Filter package determines which HTML pages are finally present in the user's space of pages. It applies exhaustive suitability tests to the HTML pages fetched and approved by the Ferrets to decide whether each page should be kept. The suitability tests are based on a fingerprint function, a mathematical function representing the attributes that an HTML page of interest to the user should possess.

The fingerprint function is applied to each HTML page approved by the Ferret; if its result is greater than the defined threshold, the page is considered approved by the Filter. Approved pages are passed on to the Analyzer, which analyzes each page to determine all its attributes. The fingerprint functions are maintained in the FilterFingerprintPool. They are generated by the Filter Advisor based on its knowledge of the user's current interests and the user's behavior when browsing the Internet. This package consists of three classes: Filter, FilterFingerprintPool and Annotations.

Package-Level CRCs


Collaborator classes in cache: FerretApprovedPagesCache
Collaborator classes in fingerprint: FingerprintID, FingerprintFunction, FilterFingerprintFunction
Collaborator classes in parser: PageAttribute, AddFilterAnnotations
Collaborator classes in notifier: Notifier
Collaborator classes in advisor: FilterAdvisor


Class-Level CRCs

The filter package contains the following classes:

* Filter
* FilterFingerprintPool
* Annotations

Class Filter

* Responsibilities:
Applies further exhaustive suitability tests on the pages and passes on the web pages satisfying these tests to the Analyzer.
* Collaborators:
* Within Package:
Class FilterFingerprintPool
Class Annotations
* With other packages:
Class FerretApprovedPagesCache
Class FingerprintID
Class FingerprintFunction
Class FilterFingerprintFunction
Class AddFilterAnnotations
Interface Notifier

* Flow of Control:
- The FerretApprovedPagesCache notifies the Filter when a new PageAttribute object is present for a web page approved by the Ferret. The Filter then gets a reference to that PageAttribute object from the FerretApprovedPagesCache. PageAttribute is a class encapsulating all the attributes of an HTML document in the space of pages. The FerretApprovedPagesCache is a datastore of the PageAttributes of all HTML pages approved by the Ferret.
- The Filter then selects the appropriate Filter Fingerprint function from the FilterFingerprintPool to apply to this page. If the result of applying the fingerprint function is greater than the threshold for that function, the page is considered approved by the Filter.
- The Filter then adds Annotations to the PageAttribute object for this page. These hold useful information about the Filter Fingerprint function that approved the page, such as the fingerprintID, the threshold, and the seed site used to derive the fingerprint function. This is done via the AddFilterAnnotations class of the parser. The information is later used by the FilterAdvisor to determine the reliability of the fingerprint function.
- Finally, the Filter calls the notify() method of the Notifier for the Analyzer to inform the Analyzer of the arrival of a new PageAttribute for an HTML page approved by the Filter. The Filter is provided with a reference to the Notifier and the FilterFingerprintPool on creation.

* Variables and Methods:
public Filter(FilterFingerprintPool filterFingerprintPool, Notifier notifierForAnalyzer)
The constructor gets a reference to the fingerprint pool and the notifier for the Analyzer.
public void processNewPageArrival(PageAttribute ferretApprovedPageAttribute)
This method is signaled by the put() method of the FerretApprovedPagesCache whenever a new page is approved by the Ferrets. The filter, on getting the PageAttribute reference to this new page, calls its applyFingerprint() method.
private void applyFingerprint(PageAttribute incomingPageAttribute)
This method selects from the FilterFingerprintPool the fingerprint functions suitable for the incoming web page, whose PageAttribute reference has been passed by the Filter's processNewPageArrival() method. The method that decides whether a filter is suitable for a particular page belongs to the FilterFingerprintFunction class. Once the suitable FilterFingerprint functions have been selected, each of them is applied to the web page to check whether the page is good enough to be stored. If so, the PageAttribute of this page is passed as the argument to the notify() method of the Notifier for the Analyzer.
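The flow of control and the two methods above can be sketched as follows. This is a minimal illustration, not the actual implementation: every collaborator type is a hypothetical stand-in with assumed signatures, the pool is reduced to a Vector, annotations are reduced to a list of approving fingerprint IDs (the real Filter goes through AddFilterAnnotations), and calling notify() once per approved page is an assumption, since the text leaves open whether it is called once per page or once per approving function.

```java
import java.util.Enumeration;
import java.util.Vector;

// Hypothetical stand-ins for collaborators; the real signatures may differ.
class FingerprintID {
    final String value;
    FingerprintID(String value) { this.value = value; }
}

class PageAttribute {
    final String url;
    // Simplification: records only the IDs of the functions that approved the page.
    final Vector<FingerprintID> approvedBy = new Vector<>();
    PageAttribute(String url) { this.url = url; }
}

interface Notifier {
    void notify(PageAttribute approvedPage);
}

interface FilterFingerprintFunction {
    FingerprintID getFingerprintID();
    boolean isFilterSuitable(PageAttribute page); // is this function applicable to the page?
    boolean isPageGood(PageAttribute page);       // assumed: evaluated value > threshold
}

// Reduced pool: just enough for the Filter to enumerate functions.
class FilterFingerprintPool {
    private final Vector<FilterFingerprintFunction> functions = new Vector<>();
    public void insert(FilterFingerprintFunction f) { functions.add(f); }
    public Enumeration<FilterFingerprintFunction> getFingerprintFunctions() {
        return functions.elements();
    }
}

class Filter {
    private final FilterFingerprintPool filterFingerprintPool;
    private final Notifier notifierForAnalyzer;

    public Filter(FilterFingerprintPool filterFingerprintPool, Notifier notifierForAnalyzer) {
        this.filterFingerprintPool = filterFingerprintPool;
        this.notifierForAnalyzer = notifierForAnalyzer;
    }

    // Called by the cache whenever a Ferret-approved page arrives.
    public void processNewPageArrival(PageAttribute ferretApprovedPageAttribute) {
        applyFingerprint(ferretApprovedPageAttribute);
    }

    private void applyFingerprint(PageAttribute incomingPageAttribute) {
        boolean approved = false;
        Enumeration<FilterFingerprintFunction> e = filterFingerprintPool.getFingerprintFunctions();
        while (e.hasMoreElements()) {
            FilterFingerprintFunction f = e.nextElement();
            // Apply only suitable functions; annotate the page for each one that approves it.
            if (f.isFilterSuitable(incomingPageAttribute) && f.isPageGood(incomingPageAttribute)) {
                incomingPageAttribute.approvedBy.add(f.getFingerprintID());
                approved = true;
            }
        }
        // Assumption: the Analyzer is notified once per approved page.
        if (approved) {
            notifierForAnalyzer.notify(incomingPageAttribute);
        }
    }
}
```

Pages that no suitable function approves are simply dropped in this sketch; the Analyzer never hears about them.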

Class FilterFingerprintPool

* Responsibilities:
This class acts as a datastore for all the FilterFingerprintFunctions. It contains a Hashtable keyed by FingerprintID whose values are objects of type FilterFingerprintFunction. The Filter Advisor is responsible for inserting new FilterFingerprintFunction entries into this Pool based on its understanding of the user's current interests. The Purger (when it is implemented) will be responsible for purging fingerprint functions that are no longer going to be used. The Filter calls methods of this class to get an Enumeration of the Filter Fingerprint Functions in the Pool. When the Filter gets a new web page that has been approved by the Ferrets, it runs through this Enumeration to determine the suitable fingerprint functions for testing the web page.
* Collaborators:
* Within Package:
Class Filter
* With other packages:
Class FilterAdvisor
* Variables and Methods:
public void insert(FilterFingerprintFunction fingerprintFunction)
This method is called by the FilterAdvisor to add new FilterFingerprint functions into the pool.
public FilterFingerprintFunction remove(FilterFingerprintFunction fingerprint)
This method is called by the Filter Advisor to remove any FilterFingerprint functions that will no longer be needed. Once the Purger is implemented, this method will be called by the Purger instead.
public Enumeration getFingerprintFunctions()
This returns an Enumeration of all the FilterFingerprint functions present in the pool. When the Filter gets a new Ferret-approved page, it runs through this Enumeration to check which functions are suitable for testing the page.
public FilterFingerprintFunction getFingerprintFunction(FingerprintID fingerprintID)
This method retrieves a fingerprint function given the fingerprint ID.
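The four pool methods above map closely onto java.util.Hashtable, as the following sketch suggests. It is illustrative only: FingerprintID and FilterFingerprintFunction are hypothetical stand-ins (the real classes live in the fingerprint package), and the sketch assumes that FingerprintID defines equals() and hashCode() so that it can serve as a key, and that a function exposes its ID through a getFingerprintID() accessor.

```java
import java.util.Enumeration;
import java.util.Hashtable;

// Hypothetical stand-in: value-based equality so it can be used as a Hashtable key.
final class FingerprintID {
    private final String value;
    FingerprintID(String value) { this.value = value; }
    @Override public boolean equals(Object o) {
        return o instanceof FingerprintID && ((FingerprintID) o).value.equals(value);
    }
    @Override public int hashCode() { return value.hashCode(); }
}

// Hypothetical stand-in: only the ID accessor matters for the pool.
interface FilterFingerprintFunction {
    FingerprintID getFingerprintID();
}

class FilterFingerprintPool {
    private final Hashtable<FingerprintID, FilterFingerprintFunction> pool = new Hashtable<>();

    // Called by the FilterAdvisor; an entry with the same ID is replaced.
    public void insert(FilterFingerprintFunction fingerprintFunction) {
        pool.put(fingerprintFunction.getFingerprintID(), fingerprintFunction);
    }

    // Returns the removed function, or null if its ID was not in the pool.
    public FilterFingerprintFunction remove(FilterFingerprintFunction fingerprint) {
        return pool.remove(fingerprint.getFingerprintID());
    }

    // The Filter walks this Enumeration for every Ferret-approved page.
    public Enumeration<FilterFingerprintFunction> getFingerprintFunctions() {
        return pool.elements();
    }

    // Returns null when no function is stored under the given ID.
    public FilterFingerprintFunction getFingerprintFunction(FingerprintID fingerprintID) {
        return pool.get(fingerprintID);
    }
}
```

Hashtable synchronizes each individual call, but an Enumeration obtained from it is not a snapshot, so if the Filter Advisor may modify the pool while the Filter is enumerating it, some additional coordination would likely be needed.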

Class Annotations

* Responsibilities:
This class encapsulates the information annotated by the Filter for each Filter Fingerprint Function that considers a Web page suitable for storing in the user's space of pages. A Vector of Annotations is a field of the PageAttribute class which encapsulates all the attributes that belong to a Web page stored in the space of pages.
The Annotations consist of a fingerprint ID and vectors of threshold values, evaluated values and seed sites. Each item in these vectors corresponds to a particular feature of a fingerprint function.
The threshold for a feature is the limiting value that determines whether a web page has that particular feature. Any web page having this feature should have an evaluated value greater than that feature's threshold when the fingerprint function is applied to the page. The seed site is the URL of the web page from which the feature's threshold was calculated. The Annotations are used by the Filter Advisor to calculate the reliability of the Filter Fingerprint Functions.
* Collaborators:
* Within Package:
Class Filter
* With other packages:
Class FilterAdvisor
* Variables and Methods:
All methods of this class are accessor methods which return the private variable values of this class.
public FingerprintID getfingerprintID() //NOTE: THIS NAME IS NON-STANDARD
public Vector getThresholds()
public Vector getEvaluatedValues()
public Vector getseedSites() //NOTE: THIS NAME IS NON-STANDARD
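A possible shape for this class is sketched below, keeping the accessor names exactly as listed above. Several details are assumptions: the element types (Double for thresholds and evaluated values, String for seed-site URLs), the constructor, the FingerprintID stand-in, and the hasFeature() helper, which is not part of the original class and only illustrates the per-feature comparison described under Responsibilities.

```java
import java.util.Vector;

// Hypothetical stand-in for the fingerprint package's FingerprintID.
class FingerprintID {
    final String value;
    FingerprintID(String value) { this.value = value; }
}

class Annotations {
    private final FingerprintID fingerprintID;
    private final Vector<Double> thresholds;      // one entry per feature
    private final Vector<Double> evaluatedValues; // one entry per feature
    private final Vector<String> seedSites;       // one entry per feature

    // Assumed constructor; the three vectors are parallel, indexed by feature.
    Annotations(FingerprintID fingerprintID, Vector<Double> thresholds,
                Vector<Double> evaluatedValues, Vector<String> seedSites) {
        if (thresholds.size() != evaluatedValues.size() || thresholds.size() != seedSites.size()) {
            throw new IllegalArgumentException("every vector needs one entry per feature");
        }
        this.fingerprintID = fingerprintID;
        this.thresholds = thresholds;
        this.evaluatedValues = evaluatedValues;
        this.seedSites = seedSites;
    }

    // Accessor names kept as in the document, including the non-standard ones.
    public FingerprintID getfingerprintID() { return fingerprintID; }
    public Vector<Double> getThresholds() { return thresholds; }
    public Vector<Double> getEvaluatedValues() { return evaluatedValues; }
    public Vector<String> getseedSites() { return seedSites; }

    // Hypothetical helper: a page has feature i when its evaluated value
    // is greater than that feature's threshold.
    public boolean hasFeature(int i) {
        return evaluatedValues.get(i) > thresholds.get(i);
    }
}
```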

Message Interactions

Message Interactions with other packages
* with cache
The put() method of the FerretApprovedPagesCache class calls the processNewPageArrival() method of the Filter to inform it of the arrival of a new HTML page approved by the Ferret.
FerretApprovedPagesCache calls Filter.processNewPageArrival()
* with advisor
The FilterAdvisor calls the insert() or remove() methods of the FilterFingerprintPool to insert or remove FilterFingerprintFunctions.
public void insert(FilterFingerprintFunction fingerprint)
public FilterFingerprintFunction remove(FilterFingerprintFunction fingerprint)
* with notifier
The Filter calls the Notifier's notify() method to inform the Analyzer of a page that it has approved.
* with parser
The Filter instantiates the AddFilterAnnotations class of the parser package. This adds the Vector of Annotations made by the Filter to the PageAttribute object of the HTML page that the Filter acted on.
Message Interactions between classes in the package
* Filter and FilterFingerprintFunction.
Class Filter's applyFingerprint() method calls the isFilterSuitable() method of the FilterFingerprintFunction.
Class Filter's applyFingerprint() calls the isPageGood() method of the FilterFingerprintFunction.