Re: WEB catalogs that handle MARC records

Andrzej Kowalski (mailto:andrzej@DINGO.COM)
Wed, 20 Mar 1996 09:00:39 -0800

Message-Id: <199603201706.LAA15256@library.wustl.edu>
Date:         Wed, 20 Mar 1996 09:00:39 -0800
From: Andrzej Kowalski <mailto:andrzej@DINGO.COM>
Subject:      Re: WEB catalogs that handle MARC records
To: Multiple recipients of list WEBCAT-L <mailto:WEBCAT-L@WUVMD.WUSTL.EDU>

As a software vendor, my response to this question will be targeted towards
my own products, but I will try to be as objective as possible in the
circumstances.

At 12:02 03/19/96 EST, you wrote:
>The Vermont State Archives is interested in putting its catalog on
>the Secretary of States WEB site. The Deputy Secretary of State
>wants more information on the following subject:
>
> 1. Is there software (shareware, freeware, commercial) available
>for converting MARC records into HTML (2.0) markup so that they can
>be accessed through the web?

This may not be what you want to do. I assume you have some records in MARC format that you wish to make searchable over the Web? If so, what you are probably looking for is a database engine into which you can load the records, plus Web forms and templates which can be used to query and display the data over the Internet. Converting the records to HTML beforehand would be unnecessary. Any database engine on the Web worth its salt should generate the HTML on the fly.
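As a rough illustration of what "on the fly" means here, the following Python sketch builds the HTML for one record at request time instead of storing pre-converted pages. The record layout, the `LABELS` table and the `render_record` function are all hypothetical and greatly simplified (real MARC records carry indicators and subfields); this is not how any particular product does it.

```python
from html import escape

# Hypothetical, simplified record: MARC tag -> value.
# Real MARC fields also have indicators and subfields.
RECORD = {
    "100": "Doe, Jane",
    "245": "Town meeting minutes & accounts",
    "260": "1850",
}

# Display labels for the tags shown to the user (an assumption for this sketch).
LABELS = {"100": "Author", "245": "Title", "260": "Published"}


def render_record(record):
    """Build an HTML definition list for one record at request time."""
    rows = []
    for tag in sorted(record):
        label = LABELS.get(tag, tag)  # fall back to the raw tag number
        rows.append("<dt>%s</dt><dd>%s</dd>" % (escape(label), escape(record[tag])))
    return "<dl>" + "".join(rows) + "</dl>"


print(render_record(RECORD))
```

Because the markup is produced from the stored fields each time, a change to the display template applies to every record without any re-conversion.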

> 2. What type of search engines (shareware, freeware, commercial)
>are available to allow catalog searches?

Numerous, including our own KE Texpress object oriented database engine. Be aware that most search engines on the Web are primarily designed to index and query _free_text_, not structured fields such as those in MARC records. After all, the Web is primarily a textual medium consisting of pages of free flowing text. A smaller number of search engines, like KE Texpress, are designed to handle collections with structured (catalogue) fields. Now the opportunity for a big plug: very few engines, if any, allow efficient indexed searching across both free text and structured data within the same database schema, KE Texpress being a notable exception. It has been our experience that archive collections, as opposed to library collections, tend to consist of a combination of fielded and free text data.
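To make the distinction between fielded and free text searching concrete, here is a minimal Python sketch of a single index that answers both kinds of query. It is a toy inverted index, not a description of any product's internals; the sample records, the field names and the choice of `notes` as the only free text field are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical archive records mixing structured fields and a free text note.
RECORDS = {
    1: {"creator": "Doe, Jane", "date": "1850", "notes": "Letters about the town meeting"},
    2: {"creator": "Roe, John", "date": "1850", "notes": "Survey maps and field notes"},
    3: {"creator": "Doe, Jane", "date": "1862", "notes": "Maps of the county"},
}
FREE_TEXT_FIELDS = {"notes"}  # assumption: everything else is matched as a whole value


def tokenize(text):
    """Split free text into lower-case words, dropping simple punctuation."""
    words = (w.strip(".,;:").lower() for w in text.split())
    return [w for w in words if w]


def build_index(records):
    """Map (field, term) -> set of record ids, for both kinds of field."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for field, value in rec.items():
            if field in FREE_TEXT_FIELDS:
                for word in tokenize(value):
                    index[(field, word)].add(rec_id)
            else:
                index[(field, value.lower())].add(rec_id)
    return index


def search(index, fields=None, words=()):
    """Return ids matching every field constraint and every free text word."""
    postings = [index.get((f, v.lower()), set()) for f, v in (fields or {}).items()]
    for word in words:
        hits = set()
        for field in FREE_TEXT_FIELDS:
            hits |= index.get((field, word.lower()), set())
        postings.append(hits)
    if not postings:
        return set()
    return set.intersection(*postings)


INDEX = build_index(RECORDS)
print(search(INDEX, fields={"creator": "Doe, Jane"}, words=["maps"]))
```

A pure free text engine could find the word "maps" but could not restrict the creator field to an exact value; a purely fielded one would have the opposite limitation.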

>
> 3. What are the pitfalls to watch out for?

Always try to match the type of data you wish to publish to the capabilities of the search engine. For example, we have seen numerous sites try to shoehorn catalogue-type data into the free text search engine paradigm. Not only do they not get the granularity of searching they need, but they generally have to convert their records to HTML first.

With commercial products, look closely at the licensing scheme for the WWW component of the database scheme. Some database vendors have realised that with a stateless environment like the WWW, customers can in many cases make do with much smaller licences than on a traditional stateful LAN, e.g. a 10 concurrent user Web licence versus a 100 concurrent user LAN-based licence. Pricing schemes may therefore be designed to make you buy a large licence instead of utilizing the economies that a stateless connection environment offers. For example, with KE Texpress, one of our clients has a Web site that gets up to 200,000 hits and 45,000 database requests per day. They manage this with a 15 concurrent user licence. In a conventional environment, they might require a 100+ concurrent user licence.
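A back-of-envelope estimate shows why a stateless front end can get by on so few concurrent licences. The Python sketch below applies Little's law to the 45,000 requests per day quoted above. The 2 second hold time per request and the peak factor of 5 are assumptions for illustration only, not measurements from that site.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_concurrency(requests_per_day, seconds_per_request):
    """Little's law: mean requests in progress = arrival rate x time in system."""
    arrival_rate = requests_per_day / SECONDS_PER_DAY  # requests per second
    return arrival_rate * seconds_per_request


REQUESTS_PER_DAY = 45_000   # figure quoted in the post
SECONDS_PER_REQUEST = 2.0   # assumption: how long one request holds a licence
PEAK_FACTOR = 5.0           # assumption: busiest period vs. the daily average

average = average_concurrency(REQUESTS_PER_DAY, SECONDS_PER_REQUEST)
peak = average * PEAK_FACTOR

print("average requests in progress: %.2f" % average)
print("estimated peak requests in progress: %.2f" % peak)
```

Under those assumptions the estimated peak is around five requests in progress at once, comfortably inside a 15 concurrent user licence. A LAN user who logs in and stays connected holds a licence for hours whether or not a query is running, which is where the much larger figure comes from.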

Look closely at the layer of software that is used to interact between the database engine and the WWW forms. What language is it written in? Perl is OK for post-processing forms or manipulating data streams, but you do not want it to be used for the actual communication between httpd and the database engine. How long has the software layer been available? Is it supported directly by the database company or is it unsupported free/shareware? For example, KE Texpress' Texhtml module is written in C, has been available for over 2 years, is fully supported and runs on all major Unix platforms.

How much programming work will be required? We have seen a particular major database player whose WWW implementation requires you to laboriously read in every table row, process every field in a program and output a document. Some of the code was kept in stored procedures, which required you to query the database once to get the code and then again to get the data. Very time consuming.

Look closely at how the software handles the issue of connecting to the database engine. The most time consuming stage of a database query is often the opening of the connection to the database engine. In a conventional LAN environment, a user may connect once and then stay connected all day. On the WWW, each page request or database search may involve opening and closing a connection. KE Texpress handles this by maintaining a number of permanently open connections to the database server through its API.
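The general technique of keeping connections open and reusing them is usually called connection pooling. The Python sketch below shows the idea only; it is not the KE Texpress API. `FakeConnection` is a hypothetical stand-in for an expensive database connection, and `ConnectionPool` is a made-up helper for this example.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a database connection that is costly to open."""

    opened = 0  # counts how many connections have ever been opened

    def __init__(self):
        FakeConnection.opened += 1

    def query(self, text):
        return "results for %s" % text


class ConnectionPool:
    """Open a fixed number of connections once and lend them out per request."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


pool = ConnectionPool(size=3)

# Five Web requests are served without opening any further connections.
for term in ["maps", "deeds", "letters", "wills", "surveys"]:
    with pool.connection() as conn:
        conn.query(term)

print("connections opened:", FakeConnection.opened)
```

The cost of opening a connection is paid when the pool starts, not on every page request, and the pool size also caps how many requests can be using the database at the same moment.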

What type of database auditing is available? Do you want to capture stats over and above those provided by httpd, e.g. number of matching records, query terms etc.? KE Texpress provides full auditing of Web transactions.
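For a sense of what such auditing might record beyond the httpd access log, here is a minimal Python sketch that wraps a search function and notes the query terms and the number of matching records. The `audited_search` wrapper, the in-memory `AUDIT_LOG` list and the toy search function are all hypothetical; a real system would write to a file or table.

```python
import time

AUDIT_LOG = []  # assumption: kept in memory only for this sketch


def audited_search(search_fn, terms):
    """Run a search and record what was asked and how many records matched."""
    started = time.time()
    matches = search_fn(terms)
    AUDIT_LOG.append({
        "timestamp": started,
        "terms": list(terms),
        "match_count": len(matches),
        "elapsed_seconds": time.time() - started,
    })
    return matches


def toy_search(terms):
    """Hypothetical search: titles containing every query term as a word."""
    titles = ["Town meeting minutes", "County survey maps", "Town survey"]
    wanted = [t.lower() for t in terms]
    return [t for t in titles if all(w in t.lower().split() for w in wanted)]


audited_search(toy_search, ["town"])
audited_search(toy_search, ["survey", "maps"])

for entry in AUDIT_LOG:
    print(entry["terms"], entry["match_count"])
```

None of this information appears in a standard httpd log, which sees only the URL requested and the size of the response.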

I believe the bottom line is that there are a lot of brand new products, vapour ware, smoke and mirrors and good, stable products all available today.

>
> 4. What is the best design approach to take to ensure a robust
>working catalog?

Choose the right tool set.

>
>Thank you for any information you can provide.
>
> Christie Carter
> Vermont State Archives
> mailto:ccarter@sec.state.vt.us

Hope this helps,

Andrzej Kowalski

################
Andrzej Kowalski               Vancouver, BC, Canada
Dingo Software Systems Inc.    Tel: (604) 877-1960
mailto:andrzej@dingo.com       Fax: (604) 877-1961
http://www.dingo.com

Fast and flexible data retrieval from data collections of all sizes in-house
and on the WWW with the KE Texpress object oriented database.
###############