Handout on Computer Storage Requirements by Document Type (fwd)

Judi Zidar (mailto:jzidar@NAL.USDA.GOV)
Thu, 28 Aug 1997 14:57:35 -0400

Message-Id: <199708281857.LAA32564@dns.ccit.arizona.edu>
Date:         Thu, 28 Aug 1997 14:57:35 -0400
From: Judi Zidar <mailto:jzidar@NAL.USDA.GOV>
Subject:      Handout on Computer Storage Requirements by Document Type (fwd)
To: mailto:IMAGELIB@LISTSERV.ARIZONA.EDU

---------- Forwarded message ----------
Date: Mon, 25 Aug 1997 22:22:35 -0700
From: Steve Gilheany <mailto:SteveGilheany@WORLDNET.ATT.NET>
Subject: 3 Day UCLA Extension Document Imaging Course Handout on Computer
         Storage Requirements by Document Type

3 Day UCLA Extension Document Imaging Course Handout on Computer Storage Requirements by Document Type

To the people who requested that I post this, thanks for your interest.

Trademarks are the property of their respective holders. No warranty of any type is expressed or implied.

Please see the note at the end for cross listing of this posting.

If you receive questions on the accuracy of these estimates or have questions yourself, please email me.

These listings, along with Microsoft Word, Excel, and PowerPoint files, will be posted at www.ArchiveBuilders.com after September 3, 1997 under Course Notes. All of the information for this one-page handout is here, except the formatting information.

I also have a magnetic disk storage price list based on IBM projections that should be good for the next twenty years. I will post it if people are interested.

[N.B. these estimates will help you size your system. After you have scanned in from 1 to 10 percent of your documents, you will know quite precisely how your documents match these estimates and you can apply a conversion factor. For example, if your images are ten percent smaller than these estimates, on average, multiply your storage estimates by 90 percent. Because storage costs are a small part of overall conversion costs, these slight variations are generally not a problem in planning.]
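The conversion factor described in the note above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample figures (1,000 pages totalling 45,000,000 Bytes) are made up for the example, and the 50 KByte default is the handout's average for a scanned letter size page.

```python
# Sketch of the conversion-factor idea from the note above.
# Function names and sample numbers are hypothetical, chosen for illustration.

HANDOUT_BYTES_PER_PAGE = 50_000  # handout average: 1 scanned page = 50 KBytes


def conversion_factor(sample_total_bytes, sample_pages,
                      estimated_bytes_per_page=HANDOUT_BYTES_PER_PAGE):
    """Ratio of the measured average page size to the estimated page size."""
    measured_average = sample_total_bytes / sample_pages
    return measured_average / estimated_bytes_per_page


def adjusted_estimate(original_estimate_bytes, factor):
    """Scale a storage estimate by the measured conversion factor."""
    return original_estimate_bytes * factor


if __name__ == "__main__":
    # Hypothetical sample: 1,000 pages scanned, 45,000,000 Bytes on disk,
    # i.e. images that are ten percent smaller than the handout average.
    factor = conversion_factor(45_000_000, 1_000)
    print("conversion factor:", factor)
    # One file cabinet was estimated at 500 MBytes; apply the factor to it.
    print("adjusted Bytes per file cabinet:", adjusted_estimate(500_000_000, factor))
```

A sample of 1 to 10 percent of the collection, as suggested above, should give a stable factor; a much smaller sample may not be representative.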

1 scanned page (8 1/2 by 11 inches) (CCITT G4 compressed) = 50 KiloBytes (KByte) (on average)

1 file cabinet (4 drawer) (10,000 pages on average) = 500 MegaBytes (MByte) = 1 CD ROM

2 file cabinets = 1,000 MBytes = 1 GigaByte (GByte); 10 file cabinets = 1 DVD (see below)

2,000 file cabinets = 1,000 GBytes = 1 TeraByte (TByte); 2,000 file cabinets = 200 DVDs

1 banker's box (2,500 pages) = 1 file drawer = 2 linear feet of files = 125 MBytes

8 banker's boxes = 16 linear feet = 1 GByte; 8,000 boxes = 16,000 linear feet = 1 TByte

1 roll of 16 mm microfilm (100 ft) = 2,500 letter size images = 1 banker's box = 125 MBytes

1 roll of 35 mm microfilm (100 ft) = 5,000 letter size images (or letter size image equivalents) = 250 MBytes

1 microfiche (average) = 100 letter size images; 200 fiche = 20,000 images = 1 GByte
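For sizing a system, the rules of thumb above can be collected into a small calculator. The following Python sketch uses only the handout's averages (50 KBytes per page, 10,000 pages per file cabinet, and so on) and decimal units (1,000 MBytes = 1 GByte), as the handout does. The dictionary keys and function names are invented for the example.

```python
# Sketch of a storage-sizing calculator built from the handout's averages.
# Names are hypothetical; all page counts come from the lines above.

PAGE_BYTES = 50_000  # 1 scanned page (CCITT G4 compressed), on average

PAGES_PER_UNIT = {
    "file_cabinet": 10_000,        # 4 drawer cabinet
    "bankers_box": 2_500,          # = 1 file drawer = 2 linear feet
    "microfilm_16mm_roll": 2_500,  # 100 ft roll
    "microfilm_35mm_roll": 5_000,  # 100 ft roll, letter size equivalents
    "microfiche": 100,             # average fiche
}


def storage_bytes(holdings):
    """Estimate storage for a dict such as {"file_cabinet": 2}."""
    pages = sum(PAGES_PER_UNIT[unit] * count for unit, count in holdings.items())
    return pages * PAGE_BYTES


def format_bytes(n):
    """Format a Byte count using the handout's decimal units."""
    for unit, size in (("TByte", 10**12), ("GByte", 10**9),
                       ("MByte", 10**6), ("KByte", 10**3)):
        if n >= size:
            return f"{n / size:g} {unit}"
    return f"{n} Bytes"


if __name__ == "__main__":
    print(format_bytes(storage_bytes({"file_cabinet": 2})))
    print(format_bytes(storage_bytes({"bankers_box": 8})))
    print(format_bytes(storage_bytes({"microfiche": 200})))
    # A mixed collection (hypothetical quantities).
    print(format_bytes(storage_bytes({"file_cabinet": 30, "microfilm_16mm_roll": 40})))
```

As the note that follows points out for microfiche, counts of fiche or rolls are only a proxy; where the number of images is known, multiply the image count by the page size directly.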

[N.B. In many record series, microfiche contain only a few images because each fiche represents a single record in the series. In this case filming breaks on record boundaries, rather than being continuous. To a lesser extent this is also true for roll film. In these cases, the amount of storage required depends on the number of images on the film, not the number of microfiche or the number of rolls of film.]

Scanned aperture card images require the same storage as the document or drawing in the aperture would require at its physical, one-to-one, full-size, un-microphotographed size.

1 E size drawing (48 inches by 36 inches) = 16 letter size pages (8 1/2 by 11 inches);

[D size = 8 pages; C size = 4 pages; B size = 2 pages; A size = 1 page.

Old E size 48 x 36 in.; new E size 44 x 34 in. (A0 is the equivalent ISO European size designation for E size); D size (A1) 34 x 22; C size (A2) 22 x 17; B size (A3) 11 x 17; A size (A4) 8 1/2 x 11.

F size 28 x 40. Roll sizes: G size 11 x 22 1/2 to 11 x 90; H size 28 x 44 to 28 x 143; J size 34 x 56 to 34 x 176; K size 40 x 56 to 40 x 143 in.

For newspapers, a double truck (center fold) full broadsheet is 24 x 36 inches, equivalent to an old D size drawing.]
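The page equivalents for drawings can be checked by comparing sheet areas. The Python sketch below does this for the sizes listed above. It assumes, for illustration only, that compressed storage grows in proportion to sheet area; real drawings may compress better or worse than office pages. The names are invented for the example.

```python
# Sketch: letter-size page equivalents of drawing sheets, by area.
# Assumption (not from the handout): storage scales with sheet area.

LETTER_AREA = 8.5 * 11   # square inches
PAGE_BYTES = 50_000      # handout average for one scanned letter size page

DRAWING_SIZES_IN = {
    "A": (8.5, 11),
    "B": (11, 17),
    "C": (22, 17),
    "D": (34, 22),
    "E (new)": (44, 34),
    "E (old)": (48, 36),
}


def letter_page_equivalents(width_in, height_in):
    """Sheet area expressed in letter size pages."""
    return (width_in * height_in) / LETTER_AREA


def drawing_bytes(width_in, height_in):
    """Estimated storage, assuming storage is proportional to area."""
    return letter_page_equivalents(width_in, height_in) * PAGE_BYTES


if __name__ == "__main__":
    for name, (w, h) in DRAWING_SIZES_IN.items():
        pages = letter_page_equivalents(w, h)
        print(f"{name}: {pages:.2f} pages, about {drawing_bytes(w, h) / 1000:.0f} KBytes")
```

By area, the 16 page figure matches the new E size (44 x 34) exactly; the old E size (48 x 36) works out to about 18.5 letter size pages.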

1 hour compressed color video = 2 GBytes (DVD, MPEG 2) (image quality dependent)

1 hour audio = 10 MBytes (dictation, answering machine) to 500 MBytes (a CD holds 74 minutes of music)

1 color picture = 10 KBytes (thumbnail) to 5 MBytes (for each of 100 photos on a 500 MByte photo CD)

[N.B. The size of the compressed file for a scanned photograph depends on the resolution (DPI: Dots Per Inch) and the detail (information) in the photograph. The detail in a photograph is dependent on the size of the negative and the quality of the film and the camera and lens (It is not related to the print size unless the print is smaller than the negative). The resolution of the scan should be chosen to match the detail of the photograph. For most cameras, films, and formats 35 mm and smaller, the 5 MByte Photo CD format (3,072 by 2,048 pixels) captures all the information in the image. N.B. this is in dots per image rather than dots per inch.]
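The distinction between dots per image and dots per inch can be made concrete with a little arithmetic. In the Python sketch below, the 3,072 by 2,048 pixel count and the 5 MByte file size are from the note above; the 24 bits per pixel and the 36 mm frame width of a 35 mm negative are assumptions added for the example, and the function names are invented.

```python
# Sketch: what a fixed pixel count means in Bytes and in dots per inch.
# Assumptions for illustration: 24 bits per pixel (3 Bytes of color),
# and a 35 mm frame that is 36 mm wide.

PHOTO_CD_WIDTH_PX = 3072
PHOTO_CD_HEIGHT_PX = 2048
MM_PER_INCH = 25.4


def uncompressed_photo_bytes(width_px, height_px, bits_per_pixel=24):
    """Raw size of an image before any compression."""
    return width_px * height_px * bits_per_pixel // 8


def effective_dpi(width_px, width_inches):
    """Dots per inch when width_px pixels are spread over width_inches."""
    return width_px / width_inches


if __name__ == "__main__":
    raw = uncompressed_photo_bytes(PHOTO_CD_WIDTH_PX, PHOTO_CD_HEIGHT_PX)
    print("pixels per image:", PHOTO_CD_WIDTH_PX * PHOTO_CD_HEIGHT_PX)
    print("uncompressed Bytes:", raw)
    print("ratio to a 5 MByte file:", round(raw / 5_000_000, 2))
    # The same pixels measured against the negative and against a 6 inch print.
    print("DPI on the negative:", round(effective_dpi(PHOTO_CD_WIDTH_PX, 36 / MM_PER_INCH)))
    print("DPI on a 6 inch wide print:", effective_dpi(PHOTO_CD_WIDTH_PX, 6))
```

The pixel count is fixed per image, so the dots-per-inch figure changes with whatever physical width the image is measured against.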

1 Chest X-ray = 1 MegaByte (14 x 17 inches), 150 DPI (Dots Per Inch), 12 bits (compressed)

[(12 bits per pixel provides 4,096 shades of grey) (wavelet compression, lossless mode, has FDA 510(k) approval) / (150 DPI, 12 bit images recommended by American College of Radiology for primary reads) / 14 x 17 Chest X-ray = 200 KiloBytes (for secondary reads: wavelet compression, lossy mode, has FDA 510(k) approval)]
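As a rough check on the X-ray figures, the uncompressed size of a 14 x 17 inch image at 150 DPI and 12 bits per pixel can be worked out directly. The Python sketch below assumes the 12 bit pixels are packed with no padding (many file formats store them in 16 bits instead), and the compression ratios it prints are simply those implied by the handout's 1 MByte and 200 KByte figures, not measurements. The function name is invented for the example.

```python
# Sketch: uncompressed size of a scanned film and the compression ratios
# implied by the handout's figures. Assumes packed 12-bit pixels.

def uncompressed_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size in Bytes for a scan of the given physical size."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel / 8


if __name__ == "__main__":
    raw = uncompressed_image_bytes(14, 17, 150, 12)
    print("shades of grey at 12 bits:", 2 ** 12)
    print("uncompressed Bytes:", raw)
    # Ratios implied by the handout: 1 MByte (primary), 200 KBytes (secondary).
    print("implied ratio, primary read:", round(raw / 1_000_000, 1))
    print("implied ratio, secondary read:", round(raw / 200_000, 1))
```

If your own images come out larger or smaller, apply a conversion factor as described near the top of this handout.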

1 Byte (B) (common usage) = 8 bits (b) = 1 character; 1 Unicode character (16 bit encoding) = 16 bits = 2 Bytes

[1,000 Bytes =~ (~ about) 1 KiloByte; 1,000 KBytes =~ 1 MegaByte; 1,000 MBytes =~ 1 GigaByte; 1,000 GBytes =~ 1 TeraByte; 1,000 TBytes =~ 1 PetaByte; 1,000 PBytes =~ 1 ExaByte]

Modem = 33 Kbit per second = 2 pages per minute (~$30.00 per month for a standard phone line)

ISDN (1 voice channel) = 56 Kbit per second = 5 pages per minute (~$50.00 per month)(ISDN charge)

T1 (24 voice channels) = 1.544 Mbit per second = 3 pages per second (~$1,000.00 per month)

Ethernet (CSMA/CD) = 1 Mbit per second (effective) or 10 Mbit per second (nominal) = 2 pages per second

OC3 ATM (Asynchronous Transfer Mode) = 155 Mbit per second = 300 pages per second

OC192 (SONET optical fiber) = 10 Gbit per second = 20,000 pages (2 file cabinets) per second

Optical carrier frequency = 400 THz (40,000 cycles used for every OC192 bit transmitted)

[N.B. Spelling out Byte and bit whenever used considerably reduces confusion as files stored as Bytes are transmitted as bits.]
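The pages-per-second figures above follow from dividing the line rate in bits by the size of a page in bits (50 KBytes x 8). The Python sketch below computes the nominal values, with no allowance for protocol overhead, which is presumably why the handout's figures are rounded down and somewhat lower. The names are invented for the example; the rates are those listed above.

```python
# Sketch: nominal page throughput for the links listed above.
# No protocol overhead is modelled; real throughput will be lower.

PAGE_BITS = 50_000 * 8  # one 50 KByte scanned page, expressed in bits

LINK_BITS_PER_SECOND = {
    "Modem": 33_000,
    "ISDN (1 voice channel)": 56_000,
    "T1": 1_544_000,
    "Ethernet (effective)": 1_000_000,
    "OC3 ATM": 155_000_000,
    "OC192": 10_000_000_000,
}


def pages_per_second(bits_per_second):
    """Nominal number of scanned pages moved per second."""
    return bits_per_second / PAGE_BITS


def transfer_seconds(pages, bits_per_second):
    """Nominal time to move a number of scanned pages."""
    return pages * PAGE_BITS / bits_per_second


def carrier_cycles_per_bit(carrier_hz, bits_per_second):
    """Optical carrier cycles available for each transmitted bit."""
    return carrier_hz / bits_per_second


if __name__ == "__main__":
    for name, rate in LINK_BITS_PER_SECOND.items():
        print(f"{name}: {pages_per_second(rate):.3g} pages per second")
    # One file cabinet (10,000 pages) over a T1 line, in minutes.
    print(round(transfer_seconds(10_000, 1_544_000) / 60, 1), "minutes")
    print(carrier_cycles_per_bit(400e12, 10e9), "cycles per OC192 bit")
```

Keeping the factor of 8 explicit in one constant is one way to avoid the Byte/bit confusion mentioned in the note above.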

1 DVD (Digital Video Disk) (same physical size as a CD ROM) = 7.4 GByte (WORM)

[(WORM: Write Once, Read Many) (2 sided, 1 layer per side); = 5.2 GByte RAM or RW (overwrite, rewrite) (2 sided, 1 layer per side); = 17 GBytes (ROM) (2 sided, 2 layers per side). Multimedia: 5 channel (theater quality surround sound) (5.1, Dolby AC-3) / 96 kHz / 24 bit audio, 8 languages, 32 subtitles, and about 135 minutes (long enough to accommodate 94% of all movies) of high quality (720 horizontal lines) video on each of 4 layers. The file format is ISO 13346 UDF (Universal Disk Format), which harmonizes all CD recording standards including ISO 9660. Available in 1996. A future technology, 3rd generation blue lasers [sort of a blue light special], should yield a 40 GByte ROM for HDTV.]
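To see how the disc capacities translate into disc counts, divide the collection size by the capacity and round up. The Python sketch below uses the capacities quoted in this handout (500 MBytes per CD ROM as used in the file cabinet estimate, and the three DVD figures above); the names are invented for the example.

```python
# Sketch: number of discs needed for a collection, by disc type.
# Capacities are the handout's figures, in decimal Bytes.

DISC_CAPACITY_BYTES = {
    "CD ROM": 500_000_000,
    "DVD RAM/RW": 5_200_000_000,
    "DVD WORM": 7_400_000_000,
    "DVD ROM": 17_000_000_000,
}


def discs_needed(total_bytes, capacity_bytes):
    """Round up: a partly filled disc still counts as a disc."""
    return -(-total_bytes // capacity_bytes)


if __name__ == "__main__":
    one_terabyte = 10**12  # 2,000 file cabinets, per the estimates above
    for disc, capacity in DISC_CAPACITY_BYTES.items():
        print(f"{disc}: {discs_needed(one_terabyte, capacity)} discs per TByte")
```

The earlier rule of thumb, 10 file cabinets = 1 DVD, corresponds to about 5 GBytes per disc, which is close to the 5.2 GByte rewritable figure; the WORM and ROM capacities need fewer discs.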

1 pulp tree (loblolly pine) = 1/10th cord of wood = 10,000 pages = 1 File Cabinet = 4 banker's boxes = 1/2 GByte

[1 lumber tree (20 inch diameter, 110 ft tall, 50 years old) = 1 cord; 10 pulp trees (8 in. dia., 50 ft tall, 20 yrs old) = 1 cord; 1 cord = 4 x 4 x 8 ft = 128 cubic ft (75 cubic feet of wood)]

1 word processor or OCR'ed (Optical Character Recognition) page = 5 KBytes (all pages listed above are scanned pages)

1 compressed page of COLD (Computer Output to Laser Disk) or COOL (Computer Output On-Line) = 1 KByte

Minimum commercial scanning cost for backfile conversion (more than 1 million pages) ~ 5 cents per page
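The per-page figures for scanned, word processor/OCR, and COLD pages, together with the minimum scanning cost, can be combined into a quick backfile estimate. The Python sketch below is an illustration only: 5 cents per page is the handout's minimum for jobs of more than 1 million pages, actual bids may be higher, and the names are invented for the example.

```python
# Sketch: backfile conversion cost and storage by page format.
# All per-page figures are the handout's averages or minimums.

BYTES_PER_PAGE = {
    "scanned (CCITT G4)": 50_000,
    "word processor or OCR": 5_000,
    "COLD/COOL (compressed)": 1_000,
}

MIN_SCAN_COST_CENTS = 5  # minimum commercial rate, more than 1 million pages


def conversion_cost_dollars(pages, cents_per_page=MIN_SCAN_COST_CENTS):
    """Scanning cost in dollars at a flat per-page rate."""
    return pages * cents_per_page / 100


def storage_by_format(pages):
    """Bytes needed to hold the same page count in each format."""
    return {fmt: pages * size for fmt, size in BYTES_PER_PAGE.items()}


if __name__ == "__main__":
    pages = 1_000_000  # 100 file cabinets at 10,000 pages each
    print("minimum scanning cost: $", conversion_cost_dollars(pages))
    for fmt, total in storage_by_format(pages).items():
        print(f"{fmt}: {total / 10**9:g} GBytes")
```

The tenfold and fiftyfold differences between formats are why it matters whether a page count refers to scanned images or to text.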

Search by:

Database entry/Unique identifier
Full text/Fuzzy search
Nested folders/Aliased folders
Concept/Thesaurus search
Document structure (SGML)
Hyperlink traversal/annotation
Email{ed} link/Workflow link
Card catalog/Finding aid
Sequential search/Date scanned
Log of reading history/Date entered
Bibliography/Citation counts
ActiveX (Object Link) link
Spatial/Temporal coordinates (GIS)
Internet agents/Popularity chart
Time Code (SMPTE)/GPS orientation
Image Matching/Image Analysis
Thumb Print/Physiological ID
Combination of any or all of the above

The raster image is the image of record: (OCR'ed/vectorized images constitute re-authoring/re-engineering) Rev 30

http://www.ArchiveBuilders.com mailto:SteveGilheany@ArchiveBuilders.com 1147 Manhattan Avenue, Suite 322, Manhattan Beach, CA 90266 Tel: (310) 937-7000 Fax: (310) 937-7001

The above is one of the one-page handouts for the following course:

UCLA Extension will present a three day class on Document Imaging and Document Management in Downtown Los Angeles at the World Trade Center, next to the Westin Bonaventure Hotel (800) 228-3000 (213) 624-1000. The dates are September 25, 26, and 27, Thursday, Friday, and Saturday. To accommodate fly-in students, the class meets from 1 PM to 9 PM on Thursday and Friday, and from 9 AM to 5 PM on Saturday. The fee is $375.00. This course is for managers who have been assigned to specify, install, or manage a document imaging system. Students will learn about the technology of scanning, importing, transmitting, storing, protecting, locating, retrieving, viewing, and printing documents. Image and document formats, multimedia, rich text, GIS (Geographic Information Systems), CAD (Computer Aided Design), and image enabled databases will be discussed. The course also covers the integration of the DVD, DirecTV, DirecPC, Cable, Telephony, the Internet, and the PC. UCLA Extension registration is (310) 825-9971. Ask for course X 814.14, registration number B4004. For information, please contact the instructor at mailto:SteveGilheany@ArchiveBuilders.com or (310) 937-7000. Instructor: Steve Gilheany, BA CS, MBA, MLS Specialization in Information Science, CDIA (Certified Document Imaging System Architect), Sr. Systems Engineer, Archive Builders.

I also have a Cliff's Notes version of the course that fits on one page, which I will post if people are interested. (see above)

The following is offered to reduce duplication: This posting has been cross listed on the following lists: ALA-LITA-L, Archives, DigLib, DigLib-ns, DPRA, ERECS-L, PACS-L, RecMgmt, and SLA-DITA. If you can suggest other lists whose readers might be interested in the topic, please let me know and I will subscribe to those lists and post this message to them. If you can post it more easily than I can, please let me know and I will ask one person to post it to each list. (I have not been successful at finding and subscribing to ImageLib.)

Steve Gilheany Tel: (310) 937-4757 Fax: (310) 937-4758 mailto:SteveGilheany@worldnet.att.net