Background


We draw much of the motivation for this work from one author's recent experience managing the emerging technology needs of the personal online services division of a large company. Preliminary market research indicated that the technical improvements our online offerings most needed were smarter search and personalization. The two initial web sites offered medical information and community information, respectively. One challenge for the medical information site was its extensive medical vocabulary, which typical users of the site did not understand well. Thus, we believed that users would be better served if we provided background information about medicine that could be used to organize the site as well as to expand or refine queries. Challenges for the community site included the breadth of its information and the many ways in which the same information can be referred to. An organization scheme and synonym-based searching were particularly important for this site. Additionally, these sites needed to run with standard commercial tools on a mainstream platform. Thus, any solution needed to integrate easily with standard search technology, such as the Verity search tools [6].
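To make the idea of synonym-based query expansion concrete, the following is a minimal sketch in Python. It is an illustration only, not the mechanism used on the sites described above: the SYNONYMS table, its entries, and the expand_query function are all hypothetical names introduced here, and a real system would draw its terms from a maintained body of background knowledge.

```python
# Hypothetical background knowledge: each lay term maps to equivalent terms.
# The entries are illustrative assumptions, not data from the original sites.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
    "daycare": ["child care"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the normalized query followed by any known equivalent terms."""
    key = query.strip().lower()
    expanded = [key]
    for term, alternatives in synonyms.items():
        group = [term] + alternatives
        # Expand in both directions: lay term to technical term and back.
        if key in (g.lower() for g in group):
            for g in group:
                if g.lower() not in expanded:
                    expanded.append(g.lower())
    return expanded


if __name__ == "__main__":
    print(expand_query("Heart Attack"))
    print(expand_query("hypertension"))
```

The expanded list could then be passed to a conventional search engine as alternative terms, so that a user who types a lay term also retrieves pages that use only the technical vocabulary.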

We also looked for trends in web sites and found one common need among some of the more ambitious emerging services. Many sites (e.g., CitySearch[10] and Quintillion[16]) maintain information in either a commercial or proprietary database and then generate web pages dynamically. This makes the data much easier to manipulate, with benefits such as simple generation of multiple views, compressed data storage, some capacity for type integrity checking, and, potentially, reasoning capabilities. From a site manager's perspective, this can provide the ability to update the look, feel, and function of a service with less effort and fewer errors while supporting automatic (or semi-automatic) global updates. We would want any search functionality to work with web sites that generate ephemeral web pages, which exist only in response to a user query, and also with content that is stored in forms such as databases and knowledge bases.

We considered user needs and found, of course, that users desired search functionality with human-like understanding. Beyond this ambitious need, users unfamiliar with a site require some information about the subject areas the site covers. Users may also benefit from structured support for site browsing beyond the usual menu structure, site maps, and simple text searches. Therefore, we would like a search environment to support some organization and presentation of the general content of the site. Finally, users can benefit from tools that help them formulate more accurate queries. This becomes even more important when access or speed issues are present, since helping users make fewer inappropriate queries also means fewer requests to the server. We use background knowledge in support of all these goals: content introduction, structured browsing, and query refinement.

Additionally, users may want to search data in multiple content sources, so searches that can access multiple sources transparently are desirable. Further, it may be useful to limit the content sources searched. Thus, simple search modules should allow users to specify which sources to use; more advanced search functions will make informed choices for the user.
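As a rough illustration of this requirement, the sketch below searches several content sources through one call and lets the caller optionally restrict the search to named sources. Everything in it is assumed for the example: the SOURCES catalog, its contents, and the search function are hypothetical, and simple substring matching stands in for a real search engine.

```python
# Hypothetical content sources, each a named list of documents.
SOURCES = {
    "events": ["Farmers market on Saturday", "School board meeting"],
    "health": ["Flu clinic hours", "Farmers market nutrition tips"],
}


def search(query, sources=None, catalog=SOURCES):
    """Search every source by default, or only the sources the user names."""
    names = list(catalog) if sources is None else sources
    needle = query.lower()
    hits = []
    for name in names:
        # Unknown source names are skipped rather than treated as errors.
        for doc in catalog.get(name, []):
            if needle in doc.lower():
                hits.append((name, doc))
    return hits


if __name__ == "__main__":
    print(search("farmers market"))
    print(search("farmers market", sources=["health"]))
```

A more advanced module, as suggested above, would replace the user-supplied sources argument with a choice informed by background knowledge about what each source contains.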

Using a structured search language to find information on a web site has clear benefits. However, learning and remembering a search language places a burden on the novice or infrequent user. We would like to give such users the benefits of structured searches without requiring them to learn Boolean terminology.
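One way to hide Boolean syntax from the user is to collect terms through simple form fields and assemble the structured query behind the scenes. The sketch below assumes three hypothetical fields ("all of", "any of", and "none of") and emits a generic AND/OR/NOT string; the function name and the output syntax are illustrative assumptions and do not reproduce the query language of any particular product.

```python
def build_boolean_query(all_of=(), any_of=(), none_of=()):
    """Assemble a generic Boolean query string from plain lists of terms."""

    def quote(term):
        # Multi-word terms are quoted so they are treated as phrases.
        return f'"{term}"' if " " in term else term

    parts = []
    if all_of:
        parts.append(" AND ".join(quote(t) for t in all_of))
    if any_of:
        parts.append("(" + " OR ".join(quote(t) for t in any_of) + ")")
    query = " AND ".join(parts)
    for term in none_of:
        negated = f"NOT {quote(term)}"
        query = f"{query} AND {negated}" if query else negated
    return query


if __name__ == "__main__":
    print(build_boolean_query(
        all_of=["clinic"],
        any_of=["flu", "influenza"],
        none_of=["veterinary"],
    ))
```

With this design the user fills in word lists and never sees the operators, while the search engine still receives a fully structured query.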

Another trend in web sites is that content generation is distributed. Community sites and event calendars, for example, are likely to have more correct and up-to-date information if responsibility for the content lies with the local organization rather than with a central repository. As a result, the users and content providers will be the people who know the domain best, and there may in fact be no domain expert on a central community support staff. We would therefore like any search that uses background knowledge to collect that knowledge in a way that draws on the domain knowledge in the community rather than only the domain knowledge of a central repository.