Message-Id: <199502022340.RAA25645@library.wustl.edu>
Date: Thu, 2 Feb 1995 16:38:37 -0500
From: "David M. Seaman" <dms8f@ETEXT.LIB.VIRGINIA.EDU>
Subject: SGML Text Embedded in Image Files
To: Multiple recipients of list IMAGELIB
SGML Text Embedded in Image Files

For close to three years now the Electronic Text Center has been producing SGML texts (tagged to the TEI Guidelines). The advent of the World Wide Web has meant that we can provide on-line access to HTML versions -- in our case generated from the TEI copy "on the fly" (more about this in a subsequent posting).
Almost as soon as we had the TEI-to-HTML conversions running, a problem dawned on us. A growing number of our electronic texts have book illustrations and other book-related images along with the tagged ASCII text, and these images carried no obvious attribution. The solution to the problem of unlabelled book illustrations wandering free from their texts presented itself quite readily: the user who downloads an image file of a book illustration or manuscript page needs to have delivered along with it a copy of the bibliographical header -- a catalog record, a finding aid, and a description of the production of the electronic text -- that is at the top of every TEI text.
We now achieve this by burying a version of the TEI header into the binary code of the image itself. The user who saves an image from a text on our etext server now gets -- in Trojan Horse fashion -- a tagged full-text record of the creation of that image as part of the single image file they save. If a user has an image tool that permits the viewing of text comments in the image file (I use XV, the X Windows viewer with XMosaic) then both image and header can be seen simultaneously, but any program that lets you see the contents of a file is sufficient to read the text.
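As a rough illustration of the idea, the sketch below places header text in a JPEG file as a comment (COM) marker segment and reads it back. This is only an assumption about one workable mechanism: the procedure actually used at the Electronic Text Center is not described here, and the function names `embed_header` and `read_headers` are hypothetical.

```python
import struct

SOI = b"\xff\xd8"      # start-of-image marker
COM = b"\xff\xfe"      # comment marker
MAX_PAYLOAD = 65533    # the 16-bit length field counts its own two bytes


def _after_app_segments(data):
    """Return the offset just past SOI and any APPn segments that follow it."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while (pos + 4 <= len(data) and data[pos] == 0xFF
           and 0xE0 <= data[pos + 1] <= 0xEF):
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return pos


def embed_header(data, header_text):
    """Return a copy of the JPEG bytes with header_text stored as comments.

    Text longer than one segment can hold is split across several segments.
    Assumes the header is plain ASCII, as in the original posting.
    """
    payload = header_text.encode("ascii")
    chunks = [payload[i:i + MAX_PAYLOAD]
              for i in range(0, len(payload), MAX_PAYLOAD)]
    segments = b"".join(COM + struct.pack(">H", len(c) + 2) + c for c in chunks)
    pos = _after_app_segments(data)
    return data[:pos] + segments + data[pos:]


def read_headers(data):
    """Collect the text of every comment segment found before the scan data."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    found = []
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):   # start of scan or end of image
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xFE:
            found.append(data[pos + 4:pos + 2 + length].decode("ascii"))
        pos += 2 + length
    return found
```

Because the header is stored as uncompressed ASCII inside the file, it stays readable in any tool that displays raw file contents, which matches the behaviour described above.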
The text that goes into the image file does not have to be the TEI header, of course, but a version of the TEI header is the obvious choice as it already exists for the written text. There are long-term advantages to making this "text in the image file" contain clearly delimited fields: when we have software that can search (rather than simply view) the text contained in image files then suddenly we have collections of images that are searchable by data field and keyword. Even now, by keeping a copy of the image header separate, one can have a searchable SGML text database hypertextually linked to the images it describes.
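To make "searchable by data field and keyword" a little more concrete, here is a minimal sketch of a field search over header copies that are kept separately from the images. The function names, the file names, and the use of a regular expression instead of a real SGML parser are all assumptions made for illustration; it only handles simple, non-nested elements such as `<title>` or `<author>`.

```python
import re


def field_values(header, tag):
    """Return the text of every <tag>...</tag> element in a header string.

    A regular expression stands in for an SGML parser here, so this only
    suits simple elements that do not contain another element of the same name.
    """
    pattern = re.compile(
        r"<%s(?:\s[^>]*)?>(.*?)</%s>" % (re.escape(tag), re.escape(tag)),
        re.DOTALL | re.IGNORECASE,
    )
    # Collapse runs of whitespace so line breaks inside a field do not matter.
    return [" ".join(value.split()) for value in pattern.findall(header)]


def search(headers, tag, keyword):
    """Return the names of images whose given field contains the keyword.

    headers maps an image file name to the text of its header.
    """
    needle = keyword.lower()
    return sorted(
        name
        for name, header in headers.items()
        if any(needle in value.lower() for value in field_values(header, tag))
    )


if __name__ == "__main__":
    # Hypothetical file names and a cut-down header, for illustration only.
    collection = {
        "dove-page1.jpg": "<imageHeader><titlStmt><title>Lady Freedom "
                          "among us</title><author>Rita Dove</author>"
                          "</titlStmt></imageHeader>",
        "sample-plate.jpg": "<imageHeader><titlStmt><title>Sample plate"
                            "</title></titlStmt></imageHeader>",
    }
    print(search(collection, "title", "freedom"))
```

Restricting the match to a named field is what distinguishes this from a plain keyword search: a query for "Dove" in the author field would not return images that merely mention the name elsewhere in the header.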
I'm hoping that the practice of burying SGML-tagged ASCII data in the code of an image file will become commonplace in the electronic data communities, and would hope to see libraries, museums, and grant-giving agencies lead the way in instituting this process. The information in the header does not have to be exhaustive -- at worst a simple "name of item/place of creation" attribution would help to curb the growing problem of unidentified network data that gets free from its home site.
The Electronic Text Center has been creating TEI-derived image headers and burying them into images for more than six months now, and can safely say that it is relatively little work on top of the other data creation processes. We are writing up the procedure we use to aid others who wish to do the same -- presentations of the idea at conferences and site visits since the summer have met generally with approval, and I would like to hear what this list thinks of the practice.
Examples of web-accessible JPEG files that contain text headers can be seen in the following:
The illustrations in Rita Dove's "Lady Freedom Among Us" (the University of Virginia's four-millionth volume):
http://www.lib.virginia.edu/etext/fourmill.html
The illustrations in the University of Virginia section of Michael Plunkett's Afro-American Sources in Virginia: A Guide to Manuscripts:
http://www.virginia.edu/~press/
The illustrations in the following items in the British Poetry archive -- Carroll, Polwhele, Tennyson -- at
http://www.lib.virginia.edu/etext/britpo/britpo.html
Next week we will announce a web-accessible collection of hundreds of modern English texts, many of them illustrated with images including tagged headers.
I am appending here a sample TEI header for one of the pages of the Rita Dove poem that is our four-millionth volume. In this case I have included the text that appears on the page as well -- a stanza from Dove's poem "Lady freedom among us":
********************************************************************
My article "Campus Publishing in Standardized Electronic Formats -- HTML
and TEI" in _Filling the Pipeline and Paying the Piper: Proceedings of the
Fourth Symposium_ (ARL Publications, 1995) contains a longer illustrated
account of this topic. E-mail arlhq@cni.org for ordering information.
********************************************************************
David Seaman, Coordinator          804-924-3230 (phone)
Electronic Text Center             804-924-1431 (fax)
Alderman Library                   email: etext@virginia.edu
University of Virginia             http://www.lib.virginia.edu/etext/ETC.html
Charlottesville, Virginia 22903
*********************************************************************
<imageHeader>
<fileDesc>
  <titlStmt>
    <title>Lady Freedom among us</title>
    <resp><role>Illustrator</role> <name>Claire van Vliet</name></resp>
    <resp><role>Creation of digital image</role>
      <name>University of Virginia Library Electronic Text Center</name></resp>
  </titlStmt>
  <pubStmt><resp><name>University of Virginia Library</name> <role></role></resp>
    <address>Charlottesville, Va.</address>
    <idno type="ETC">Modern English, DovLady</idno>
    <date>1994</date>
  </pubStmt>
  <srcDesc>
    <biblFull>
      <titlStmt><title>Lady Freedom among us</title>
        <author>Rita Dove</author></titlStmt>
      <pubStmt>
        <resp><role>publisher</role><name>Janus Press</name></resp>
        <address>West Burke, Vermont</address>
        <date>[1994], c1993</date>
      </pubStmt>
      <noteStmt><note></note></noteStmt>
    </biblFull>
  </srcDesc>
</fileDesc>
<encDesc>
  <projDesc><p>Prepared by David Seaman for the University of Virginia Library Electronic Text Center</p></projDesc>
  <editDecl>
    <p>This image exists as an archived TIFF image, one or more JPEG versions for general use, and a thumbnail GIF.</p>
  </editDecl></encDesc>
<profDesc>
  <txtClass><keywords>24-bit color; 300 dpi</keywords></txtClass>
</profDesc>
<revDesc>
  <change><date></date><resp><name></name><role></role></resp></change>
</revDesc>
</imageHeader>
<text id=DovLady>
<lg type="stanza">
  <l>don't lower your eyes </l>
  <l>or stare straight ahead to where </l>
  <l>you think you ought to be going </l>
</lg>
</text>