Re[3]: URLs

Tony Barry (tony@INFO.ANU.EDU.AU)
Fri, 9 Feb 1996 11:41:15 +1100

Message-Id: <199602090047.SAA04338@library.wustl.edu>
To: Multiple recipients of list WEBCAT-L <WEBCAT-L@WUVMD.WUSTL.EDU>

At 01:56 96/02/09, M. Jessie Barczak wrote:
>    Do you know if this Harvest can also do a robotic search and retrieve?
> We are looking into tools that will crawl the Web and find documents or
> URLs to documents that we would like to include in our homepage.

Harvest is a cooperative mechanism that decouples index collection from searching, so no single site has to index the whole world.

At the indexing level, a site can control what indexing information goes into the system. This is called the "Gatherer" level, and there is a potential role for librarians here.

The "Gatherers" pass their information to "Brokers", which are the databases that are actually searched. The Brokers in turn may pass the indexing information they receive on to other brokers, which need only keep the indexing information covering their own field of interest. Indexing information spreads from broker to broker in this way. A broker can then potentially provide a service covering only a local group of sites, sites in a given discipline, sites at a particular level of quality, or a mixture of these.
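To make the Gatherer to Broker to Broker flow concrete, here is a minimal Python sketch. It is not Harvest code and does not reproduce Harvest's record format or protocol: the class names, the record fields, the example hostnames and the idea of a broker's "field of interest" as a set of subject keywords are all hypothetical, chosen only to illustrate the flow described above.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical model of the Gatherer/Broker flow; not Harvest's actual API.
@dataclass(frozen=True)
class Record:
    """Index record a Gatherer produces for one document (fields are illustrative)."""
    url: str
    title: str
    subjects: frozenset


class Gatherer:
    """Runs for a publishing site; the site decides what goes into each record."""

    def __init__(self, site: str):
        self.site = site
        self.records: list[Record] = []

    def add(self, path: str, title: str, subjects: set[str]) -> None:
        self.records.append(Record(f"http://{self.site}{path}", title, frozenset(subjects)))


class Broker:
    """Searchable database keeping only records that match its field of interest."""

    def __init__(self, name: str, interests: set[str] | None = None):
        self.name = name
        self.interests = interests  # None means "keep everything"
        self.index: dict[str, Record] = {}
        self.downstream: list[Broker] = []

    def wants(self, rec: Record) -> bool:
        return self.interests is None or bool(rec.subjects & self.interests)

    def receive(self, records) -> None:
        # Keep only new records in this broker's field of interest.
        kept = [r for r in records if self.wants(r) and r.url not in self.index]
        for r in kept:
            self.index[r.url] = r
        # Forward only the newly kept records, so broker loops terminate.
        for b in self.downstream:
            b.receive(kept)

    def collect(self, gatherer: Gatherer) -> None:
        self.receive(gatherer.records)

    def search(self, word: str) -> list[str]:
        word = word.lower()
        return sorted(r.url for r in self.index.values() if word in r.title.lower())


# Usage with hypothetical sites and subjects.
site_a = Gatherer("site-a.example.edu")
site_a.add("/webcat/", "Cataloguing Internet resources", {"libraries", "cataloguing"})
site_a.add("/physics/", "Plasma physics preprints", {"physics"})

general = Broker("general")                    # keeps everything it is sent
libraries = Broker("libraries", {"libraries"})  # subject-specific broker
general.downstream.append(libraries)

general.collect(site_a)
print(len(general.index), libraries.search("catalogu"))
# prints: 2 ['http://site-a.example.edu/webcat/']
```

In this sketch each broker forwards only what it has newly kept, so a chain of increasingly specialised brokers narrows the index at each step; a real deployment might instead forward everything it receives and leave all filtering to the receiving broker.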

In this way, both the collection of indexing information and its interrogation are distributed. Publishers can improve indexing quality in an organised way, and selective searching services can be set up.

The main search engines all take a brute-force, centralised approach that will fail as the net grows, either through sheer size or through falling precision in their indexes. The Harvest architecture provides an alternative. This technology is attracting a lot of interest in Australia.

More information can be found at http://harvest.cs.colorado.edu/

Tony

__________________________________________________________________________
Tony Barry
Centre for Networked Information and Publishing
& also Centre for Networked Access to Scholarly Information
Australian National University Library
Canberra A.C.T. 0200, AUSTRALIA
URL:http://snazzy.anu.edu.au/People/TonyB.html
fone +61 6 249 4632
phax +61 6 279 8120
mailto:Tony.Barry@library.anu.edu.au