During the gopher phase of the project, many Veronica and Jughead searches were performed on the keywords society, association, union, federation and academy. A few variants of these words in other major European languages were occasionally tried, but with much less success.
As is well known, searching for resources of a substantial nature using Veronica is best done by restricting the searches to directories, or alternatively by using Jughead searches. Otherwise the searcher is likely to be overwhelmed by flimsy news reports or digests of discussion groups that have been archived in gopherspace under descriptive titles. In the Scholarly Societies Project, the problem was compounded by the different senses in which each of our five keywords could be taken (especially "society").
For the gopher phase of the Project, we also monitored "Gophers added lately..." from Washington and Lee University. It quickly became evident, however, that a considerable number of interesting and useful resources were not being registered quickly, if at all, since Veronica and Jughead searches tended to uncover many resources that did not appear in "Gophers added lately...".
In the WWW phase, we also relied on listings of newly registered (or announced) resources, principally "NCSA - What's New". As in the gopher phase, it became clear that many useful and interesting resources were not being registered or announced expeditiously, if at all.
Consequently we relied heavily on subject searches using various WWW search engines. In the following, we shall indicate some of our experiences with these search engines.
WebCrawler. In the beginning, much use was made of WebCrawler. Although many useful resources were found in this way, the searches were very frustrating, for three reasons.
First, until late 1994, WebCrawler limited the total number of items retrieved to about 50. (In late 1994, however, this was modified so that the user had more options.)
Second, WebCrawler indexed all significant words on the entire top-level page of each resource. This caused numerous hits of no relevance. Project staff would have preferred some control over the retrieval vs relevance ratio: for example, being able to limit the search to the titles of resources. An alternative would be to have the relevant passages of the top-level page displayed to give some context, so that it would not always be necessary to link to a page to determine why it was retrieved.
Third, WebCrawler appeared to use an elaborate set of hidden synonyms and truncations to increase retrieval. Here again Project staff would have preferred more control over the retrieval vs relevance ratio: for example, being able to disable synonym and truncation matching.
CUI WWW Index. The CUI WWW Index is less comprehensive than WebCrawler, since the former is constructed only from several announcement listings, such as "NCSA - What's New". As noted above, many interesting and useful resources are thereby missed. On the other hand, Project staff found this index much better for our purposes than WebCrawler, since it had a better balance between retrieval and relevance. This is because only a brief (and highly relevant) announcement is indexed, rather than the entire top-level page. Furthermore, the complete text of the announcement is displayed as part of the search result, so that the user has immediate context, and often can discard an inappropriate item without even linking to it.
Lycos. In late 1994, Project staff became aware of another problem with WebCrawler: it had not been updated since September. So we began to use a similar search engine, Lycos, instead. Like WebCrawler, Lycos was not restricted to resources that had been registered or announced. But like the CUI WWW Index, Lycos displayed contextual information, so that it was often clear why a resource had been retrieved. Like the modified WebCrawler, Lycos offered the user options in the number of entries retrieved. Unlike WebCrawler, Lycos was very up to date.
Critique of Search Engines. It is interesting to note that none of the above search engines incorporated important features that had long ago become commonplace in CD-ROM search software: control over which fields of a record are searched, and control over which synonyms are used in the matching. Put another way, these search engines appear to have been constructed in disregard of one of the most basic precepts of information science: giving the user techniques for controlling the retrieval vs relevance ratio.
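
The two controls named above can be illustrated with a minimal sketch in Python. It is not a reconstruction of how WebCrawler, Lycos or any CD-ROM product actually worked; the Resource record, the search function, the sample records and the synonym table are all hypothetical, invented only to show how a choice of fields and an optional synonym list change what is retrieved.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A hypothetical indexed record with two searchable fields."""
    title: str
    body: str


def tokenize(text):
    """Split text into lowercase words, dropping surrounding punctuation."""
    return {word.strip('.,;:"()').lower() for word in text.split()} - {""}


def search(resources, keyword, fields=("title", "body"), synonyms=None):
    """Return titles of resources matching the keyword.

    fields   -- which record fields to search (field control)
    synonyms -- optional dict mapping a keyword to extra terms;
                None means exact keyword matching only (synonym control)
    """
    terms = {keyword.lower()}
    if synonyms:
        terms |= {s.lower() for s in synonyms.get(keyword.lower(), ())}
    hits = []
    for res in resources:
        words = set()
        for field in fields:
            words |= tokenize(getattr(res, field))
        if terms & words:
            hits.append(res.title)
    return hits


# Invented sample data, for illustration only.
RESOURCES = [
    Resource("Royal Society of Chemistry", "Information for members and journals."),
    Resource("Campus News Digest", "Notes on society, culture and sport this week."),
    Resource("Modern Language Association", "A scholarly organization for language teachers."),
]

# Hypothetical synonym table that the user may choose to apply or not.
SYNONYMS = {"society": ["association"]}

# Full-page search: the news digest is retrieved because of one word in its body.
print(search(RESOURCES, "society"))
# Title-only search: fewer items retrieved, but each is more likely relevant.
print(search(RESOURCES, "society", fields=("title",)))
# Title-only search with synonyms switched on by the user.
print(search(RESOURCES, "society", fields=("title",), synonyms=SYNONYMS))
```

In this sketch the searcher, rather than the engine, decides whether to trade relevance for retrieval: restricting the fields narrows the result set, and supplying a synonym table widens it again in a visible, controllable way.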