Message-Id: <199501062001.OAA00389@library.wustl.edu>
Date: Fri, 6 Jan 1995 14:58:46 -0500
From: Lou Sharpe <lsha@LOC.GOV>
Subject: Library of Congress Questionnaire
To: Multiple recipients of list IMAGELIB

Library of Congress Preservation Office
Digital Imaging for Manuscript Preservation
A Survey of the Field
NOTE: If you return this questionnaire, please do so directly to the contact person listed at the end of this message, not as a reply to the mailing list from which you received it. Replies are requested by 31 January 1995. Partial submissions will be gladly accepted. Please note the offer of useful utilities (carrots!) down in the INSTRUCTIONS section.
PURPOSE
The Library of Congress wishes to learn about existing practice among archives, libraries and the wider commercial marketplace for digital imaging of documents, especially the types of documents found in manuscript collections. This information, together with technical studies undertaken by the Library, will be used to develop approaches for future digital conversion efforts.
GENERAL BACKGROUND
As the Library of Congress continues to develop its capabilities for providing computerized access to its collections, it must address a wide array of issues and identify and test a broad range of tools and techniques, especially those that will ensure digital imaging can be used successfully within the institution's preservation programs.
In order to address some of those preservation issues in the particular case of manuscript and document collections, the Library has engaged Picture Elements, Inc., to carry out one or more surveys in order to determine and/or identify:
- The most appropriate image formats for manuscript conversion projects
- Available software and hardware tools for image enhancement
- Available software and hardware tools for efficient throughput
This survey is a part of that effort. A later demonstration conversion project is also planned.
DETAILED BACKGROUND
The focus of this demonstration project will be on documents consisting of unbound, separate handwritten or typed sheets of 8.5 inch by 14 inch or smaller paper -- what might be considered to be typical manuscript documents.
A key issue for the Library is finding the most judicious balance between conserving precious original documents -- protecting them from damage -- and achieving a reasonably rapid rate of conversion. The outcomes of this project are expected to assist the Library in designing models for further conversion applications for the Library's collections.
The Library foresees the need for at least two types of images that reproduce typical manuscript-collection documents. One image, proposed for consideration as a potential digital preservation-quality image, will have high quality and offer a faithful copy of the original.
The Library also seeks to create smaller-sized images in addition to the preservation-quality image. These will be used in end-user retrieval systems, especially those accessed via computer networks, including the Internet. Smaller-sized or access-quality images can be more easily handled in such systems. The Library would like to identify a practical level of quality that, although less faithful than the preservation-quality image, offers high legibility and good service to researchers.
INSTRUCTIONS
Please complete this questionnaire if your organization has, or is planning, a project involving preservation of manuscript or other primarily textual documents using digital imaging.
It may be that you have no such project, but have opinions or policies on this topic. Or, you may find a questionnaire format confining. You may not have time to address the entire set of questions. In these cases, please feel free to provide any information with a form and content you feel appropriate, using the questionnaire as a guide to our issues of interest. Comments may be inserted in-line into the questionnaire or attached. When presented with a list, multiple answers will often be appropriate.
You may reply in any format by contacting Lou Sharpe of Picture Elements, Inc. directly at lsha@loc.gov or lsharpe@netcom.com, by phone at 202-543-7495, or by fax at 202-543-6767.
Please complete the GENERAL QUESTIONS section below. Then proceed to special questions for ARCHIVISTS and special questions for TECHNOLOGISTS. You need not answer every question; feel free to offer some replies in both sections.
In exchange for your returned questionnaire, we would like to offer you two useful public domain utilities for checking the format of image files. TIFFLOOK dumps TIFF files and JPEGINFO dumps JPEG Interchange Format (JPG) or JPEG File Interchange Format (JFIF) files. Please indicate your desire for further information on these utilities on your returned questionnaire.
GENERAL QUESTIONS
-----------------

1. Contact Information

   Name           _____________________________________
   Organization   _____________________________________
   Address        _____________________________________
                  _____________________________________
                  _____________________________________
   Email address  _____________________________________
   Phone          _____________________________________
2. Name of Project or Department _____________________________________
3. Nature of Your Organization

   archive ___
   library ___
   commercial company ___
   government agency ___
   other (please specify) _____________________________________
4. Materials Being Digitized
4a. physical form

   loose pages ___
   bound volumes ___
   other (please specify) _____________________________________

4b. content types

   typewritten ___
   handwritten ___
   engravings ___
   lithographs ___
   other (please specify) _____________________________________

ARCHIVIST OR CURATORIAL QUESTIONS
---------------------------------
5. Do you create digital images of
   manuscript papers ___
   printed matter ___
   handwritten items ___
   other types of documents ___ (please specify)? _________________________________
6. Do you consider these copies to be for
   access ___
   preservation surrogate ___
   document delivery ___
   republication ___
   transcriptions ___
   optical character recognition ___
   a mix of the above ___
   other (please specify)? _____________________________
7. What is the total number of images scanned in your project to date?
8. If you create digital images with preservation as a goal, do you discard or retain the original paper item?
   discard ___
   retain ___
   other (please specify)? _____________________________________
9. Regarding microfilming of the items being digitized, do you
   microfilm in parallel ___
   scan from microfilm ___
   output digital image to an electron beam film recorder ___
   other (please specify)? _____________________________
10. What are your approaches to retrieval?
   catalog ___
   non-bibliographic database ___
   directory ___
   register ___
   SGML-tagged register ___
   searchable full texts ___
   other (please specify) _____________________________
11. Do you use the image file header content for searching and retrieval?
12. Do you use any special approaches to protect or authenticate images?
   encryption ___
   authentication ___
   watermarks ___
   hidden watermarks ___
   other (please specify) _____________________________________
13. Regarding the rapid and efficient capture of images:
Do you use a sheet-feed or other device? ___

Do you use a book-edge or other special scanner? ___
What is the approximate number of images that your conversion facility can capture per hour or day? _____
How many staff and scanners provide this total throughput? _____
14. Have any of your documents been damaged during the capture process? Please give details, if possible.
15. Do you capture any items while they are sleeved in Mylar?
16. Have you been forced into workarounds by special problems, for example:
   - thin paper bleedthrough requiring special image processing,
   - image quality problems forcing transcription?

TECHNICAL IMAGING QUESTIONS
---------------------------

17. Do you create more than one type of digital image, e.g., a preservation image and an access image? Why? How do they differ in terms of the three sections below (image characteristics, compression techniques, file formats)?
IMAGE CHARACTERISTICS USED
18. Please provide technical information on the image types you create, including:
   - spatial resolution as delivered (dots per inch or millimeter)
   - actual optical resolution (dots per inch or millimeter)
   - tonal-depth resolution (number of shades or colors or bits per pixel)
In this regard, are you aware of whether the scanning subsystem converts from the actual optical resolution to the delivered resolution? Do you know what technique is used for this process (for example pixel replication/deletion or linear interpolation)?
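For respondents unsure which technique their scanning subsystem uses, the difference between the two examples named above can be sketched on a single row of gray values. This is only an illustration under stated assumptions: the function names are hypothetical, the row is a plain Python list, and real scanner firmware may use other methods entirely.

```python
def resample_nearest(row, new_len):
    """Pixel replication/deletion: each output pixel copies one input pixel."""
    old_len = len(row)
    # Integer scaling picks a source index; pixels repeat when enlarging
    # and are skipped when reducing.
    return [row[min(old_len - 1, (i * old_len) // new_len)]
            for i in range(new_len)]


def resample_linear(row, new_len):
    """Linear interpolation: each output pixel blends its two neighbors."""
    old_len = len(row)
    if new_len == 1 or old_len == 1:
        return [row[0]] * new_len
    out = []
    for i in range(new_len):
        # Map the output position onto the input row's coordinate range.
        pos = i * (old_len - 1) / (new_len - 1)
        left = int(pos)
        right = min(left + 1, old_len - 1)
        frac = pos - left
        out.append(round(row[left] * (1 - frac) + row[right] * frac))
    return out


if __name__ == "__main__":
    optical = [0, 100, 200]              # hypothetical optical samples
    print(resample_nearest(optical, 6))  # replicated pixels
    print(resample_linear(optical, 5))   # blended pixels
```

Replication and deletion preserve the original sample values but can produce jagged strokes, while interpolation introduces intermediate gray values that were never captured optically.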
COMPRESSION TECHNIQUES USED
19. Please indicate the compression techniques used.
   CCITT T.6/Group 4 ___
   CCITT T.4/Group 3 ___
   JPEG ___
   JBIG ___
   LZW ___
   other (please specify) _____________________________________
FILE FORMATS USED
20. Please indicate the image file formats used.
   TIFF version 6.0 ___
   TIFF version 5.0 ___
   ODA or ANSI/AIIM MS-53 ___
   JPEG Interchange Format ___
   JFIF ___
   other (please specify) _____________________________________
FILE HEADER OR TRAILER INFORMATION FIELDS
21. For the named file formats, what header fields or tags are used? Please provide a list, giving the tag or element number. Can you provide a text dump of one of your files? Do they conform with any identifiable subsets of file formats (e.g. TIFF Class B or RFC 1314)?
22. Do you place identifying information in the header, e.g., a code number for the image, the name of your organization, or a title or subject term?
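If you have no tool at hand for producing such a text dump, a minimal sketch of listing the tags in the first image file directory (IFD) of a TIFF file follows. It is not the TIFFLOOK utility; the function name `list_tiff_tags` is hypothetical, only a handful of well-known tag numbers are named, and only inline SHORT and LONG values with a count of one are decoded.

```python
import struct

# A few baseline TIFF tag numbers; anything else is reported as "unknown".
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample",
             259: "Compression", 270: "ImageDescription",
             282: "XResolution", 283: "YResolution"}


def list_tiff_tags(data):
    """Return (tag, name, type, count, value) for each entry in the first IFD."""
    order = data[:2]
    if order == b"II":
        endian = "<"      # little-endian ("Intel") byte order
    elif order == b"MM":
        endian = ">"      # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: bad magic number")
    (entry_count,) = struct.unpack(endian + "H",
                                   data[ifd_offset:ifd_offset + 2])
    tags = []
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i      # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack(endian + "HHI",
                                          data[start:start + 8])
        raw = data[start + 8:start + 12]     # value or offset field
        value = None                         # left as None if not inline
        if count == 1 and ftype == 3:        # one SHORT, stored inline
            (value,) = struct.unpack(endian + "H", raw[:2])
        elif count == 1 and ftype == 4:      # one LONG, stored inline
            (value,) = struct.unpack(endian + "I", raw)
        tags.append((tag, TAG_NAMES.get(tag, "unknown"), ftype, count, value))
    return tags


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:   # path supplied by the user
        for entry in list_tiff_tags(handle.read()):
            print(entry)
```

A listing of this kind, giving tag numbers and values, is the sort of information requested in question 21.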
SCANNERS USED
23. What primary scanner was used for capture (manufacturer and model)? Was it modified or customized in any fashion?
24. Was more than one scanner type used? Are any documents routed to a specialized scanner having different capture characteristics?
SPECIAL IMAGE PROCESSING USED
25. What image processing or image enhancement approaches have you found helpful?
26. Do you apply de-skewing or border cropping techniques to your images?
27. Does your approach result in bitonal (one-bit-per-pixel) images?
28. Does your system employ special forms of thresholding, density control, or contrast and brightness management?
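To make the terminology in questions 27 and 28 concrete, the simplest form of thresholding can be sketched as follows. This is an illustration only: the function names and the default threshold of 128 are assumptions, gray values are taken to run from 0 (black) to 255 (white), and production systems commonly use more elaborate, locally adaptive methods.

```python
def to_bitonal(gray_rows, threshold=128):
    """Fixed global threshold: 1 marks ink (dark), 0 marks paper (light)."""
    return [[1 if pixel < threshold else 0 for pixel in row]
            for row in gray_rows]


def midrange_threshold(gray_rows):
    """Pick a threshold halfway between the darkest and lightest pixel."""
    values = [pixel for row in gray_rows for pixel in row]
    return (min(values) + max(values)) // 2


if __name__ == "__main__":
    page = [[30, 220], [40, 210]]          # hypothetical gray samples
    level = midrange_threshold(page)
    print(level, to_bitonal(page, level))
```

A fixed threshold works poorly on faded ink or stained paper, which is why the question asks about special forms of thresholding.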
DATABASE ISSUES
29. How do you link images to the retrieval tool? Do you link directly by pathname/filename, use a look-up table, use an identifier in the header, or some other approach?
30. What file or directory naming conventions do you use? Are these techniques used to link images to other records?
31. What indexing means is used to link documents and image files to bibliographic records or other search tools?
32. Do you do more or less indexing work for materials being scanned as compared to traditional materials?
ACCESS ISSUES
33. What are the intended uses of your images? Are they for preservation only, for screen access over local area networks, for wide area network access, or for local printing?
STORAGE
34. Does your institution have an approach for the preservation of the digital data represented by the images? Please provide a brief statement.
35. Is there a policy on migration of data to newer media as time progresses?
36. Is there a policy on the monitoring of error correction rates or for random sampling of seldom used collections?
37. What media are used?
38. What is the average image size? If both preservation and access images are stored, please indicate the average size for each type.
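When reporting sizes, it may help to compare stored file sizes against the uncompressed size of the same image. A rough calculation is sketched below; the function name is hypothetical, the 300 dots per inch figure is only an example value, and the result ignores file headers and any per-row padding.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw image size in bytes for a page scanned at a given dpi."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8     # round up to whole bytes


if __name__ == "__main__":
    # An 8.5 x 14 inch sheet at an assumed 300 dpi, at three tonal depths.
    for bits in (1, 8, 24):
        print(bits, "bits per pixel:", uncompressed_bytes(8.5, 14, 300, bits))
```

Dividing the uncompressed figure by the stored file size gives the effective compression ratio for each image type.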
STANDARDS
39. To what extent are standards issues key to your approach to digital imaging? Do you believe de facto or de jure standards should be used?
QUALITY ASSURANCE
40. What level of quality assurance is used?
41. Is visual inspection used? On what percentage of scans?
42. Is automatic quality assurance used?
DOCUMENT PREPARATION
43. How do you prepare documents for scanning?
44. Do you separate different material types or keep them together in the workflow?
45. Are any special steps taken in the physical preparation for scanning, such as
   disbinding ___
   guillotining ___
   fastener removal ___
   other (please specify)? _____________________________________
46. How much time does each of these steps consume?
FUTURES
47. What breakthroughs in imaging technology would help you most?
   good microfilm scanner ___
   high-end book scanner ___
   face-up book cradle ___
   page turning device ___
   high-speed input subsystem ___
   automatic quality assurance ___
   preservation file format ___
   other (please specify) ____________________________
CONTACT INFORMATION
For further information, to convey answers verbally, or to discuss any of the questions in more detail, please contact Picture Elements directly.
Louis H. Sharpe, II
Picture Elements, Inc.
Box 75760
Washington, DC 20013
202-543-7495
202-543-6767 fax

lsha@loc.gov or lsharpe@netcom.com