
Relationship to Other Work

Calvin is an intelligent agent that learns about a user's context and makes suggestions based on that information. Perhaps the best-known agent in this class is the Microsoft Office Assistant [12]. Using a Bayesian network, the Office Assistant infers what a user's goals might be from the user's behavior and makes suggestions based on those inferences. However, the Office Assistant has a very specific model of how the underlying application works, and its suggestions deal with how to use the software. Thus, this agent's understanding of context is tailored to its special-purpose task. The advantage of this approach is the ability to give very specific advice, but that advantage is gained at the cost of generality. (See [11] for comments on the potential difficulty of applying a problem-specific notion of context to form a broader theory.) Calvin, by contrast, could potentially make suggestions based on the documents loaded into any application. Although it does not claim a formal theory of context, its architecture is easily transferred across document access applications.

Margin Notes [16] is another agent that suggests previously accessed documents. Unlike Calvin, which indexes documents as they are accessed at runtime, this system pre-indexes documents and email stored on the user's computer. Then, while the user is accessing information on the WWW, the web pages the user views are automatically annotated with references to related pre-indexed files. Margin Notes uses TFIDF for pre-indexing documents, so its understanding of context is based purely on the content of a set of documents and not on how the user accesses them. Other approaches also index documents in isolation from the context in which they were accessed [10].
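To illustrate what indexing "purely on the content of a set of documents" means, the following is a minimal sketch of one common TFIDF weighting (term frequency times the log of inverse document frequency). It is not the Margin Notes implementation; the function name `tfidf_index`, the whitespace tokenizer, and the sample documents are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_index(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Weight = (term count / document length) * log(N / document frequency).
    This is one common TFIDF variant, chosen here only for illustration.
    """
    # Naive whitespace tokenization; a real indexer would do more.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    index = []
    for tokens in tokenized:
        tf = Counter(tokens)
        index.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return index


# Illustrative documents, not taken from the paper.
docs = [
    "genetic algorithms search a population",
    "neural networks learn weights",
    "genetic programming evolves a program",
]
index = tfidf_index(docs)
```

Note that the weights depend only on the text of the document set. Nothing in the computation records when, or in what order, the user opened each document, which is the limitation the paragraph above describes.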

Watson [6] observes the use of standard software tools, such as browsers and word processors, and generates queries to WWW search engines for context-relevant information. Watson, like Calvin, uses a vector representation of the context. However, it focuses on information about the immediate task context, rather than information about the user's task sequence. Its method of document parsing helps it find significant words without a larger corpus, in a spirit similar to WordSieve, and its automated use of WWW search engines gives it an advantage over both Calvin and Margin Notes in that it can suggest documents that the system's users have not yet seen. It should be noted, however, that as WordSieve generates document indices for multiple users, those indices can enable cross-user retrievals, providing an individual user with new documents expected to be of interest in the current context.

Other techniques, such as data mining with rule learning, have been employed to record user profiles [1]. Rule learning methods would be well-suited to finding the kinds of terms that WordSieve learns. However, rule learning usually requires a database of user activity to analyze in order to find the rules, while WordSieve works incrementally in real time.

Finally, the idea of learning user profiles for making suggestions has been applied to areas other than text document indexing and retrieval, such as recommending television programs [9]. The PTV system builds a user profile from information such as explicit descriptions of preferences and the user's viewing habits. This system also makes use of the preferences of other users who have similar profiles. One design challenge facing such systems is that the actual contents of the television programs are opaque to the system, unlike the contents of text documents for systems such as WordSieve.


Travis Bauer
2002-01-25