Message-Id: <199708281857.LAA32564@dns.ccit.arizona.edu>
Date: Thu, 28 Aug 1997 14:57:35 -0400
From: Judi Zidar <mailto:jzidar@NAL.USDA.GOV>
Subject: Handout on Computer Storage Requirements by Document Type (fwd)
To: mailto:IMAGELIB@LISTSERV.ARIZONA.EDU
---------- Forwarded message ----------
Date: Mon, 25 Aug 1997 22:22:35 -0700
From: Steve Gilheany <mailto:SteveGilheany@WORLDNET.ATT.NET>
Subject: 3 Day UCLA Extension Document Imaging Course Handout on Computer
Storage Requirements by Document Type
3 Day UCLA Extension Document Imaging Course Handout on Computer Storage
Requirements by Document Type
To the people who requested that I post this, thanks for your interest.
Trademarks are the property of their respective holders. No warranty of
any type is expressed or implied.
Please see the note at the end for cross listing of this posting.
If you receive questions on the accuracy of these estimates or have
questions yourself, please email me.
These listings, along with Microsoft Word, Excel, and PowerPoint files,
will be posted at www.ArchiveBuilders.com after September 3, 1997 under
Course Notes. All of the information for this one page handout is here,
except the formatting information.
I also have a magnetic disk storage price list based on IBM projections
that should be good for the next twenty years. I will post it if people
are interested.
[N.B. These estimates will help you size your system. After you have
scanned in from 1 to 10 percent of your documents, you will know quite
precisely how your documents match these estimates and you can apply a
conversion factor. For example, if your images are ten percent smaller
than these estimates, on average, multiply your storage estimates by 90
percent. Because storage costs are a small part of overall conversion
costs, these slight variations are generally not a problem in planning.]
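The conversion factor described in the note above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample figures (45 KBytes actual against 50 KBytes estimated) are hypothetical and are not part of the handout.

```python
def calibrated_estimate(baseline_bytes, sample_actual_bytes, sample_estimated_bytes):
    """Scale a baseline storage estimate by the ratio seen in a scanned sample."""
    factor = sample_actual_bytes / sample_estimated_bytes
    return baseline_bytes * factor

# Hypothetical sample: pages averaged 45 KBytes instead of the estimated 50.
baseline = 500_000_000  # one file cabinet at the handout's 500 MBytes
adjusted = calibrated_estimate(baseline,
                               sample_actual_bytes=45_000,
                               sample_estimated_bytes=50_000)
print(f"{adjusted / 1e6:.0f} MBytes")
```

Images ten percent smaller than estimated give a factor of 0.9, so the cabinet shrinks from 500 to about 450 MBytes.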
1 scanned page (8 1/2 by 11 inches) (CCITT G4 compressed) = 50 KiloBytes
(KByte) (on average)
1 file cabinet (4 drawer) (10,000 pages on average) = 500 MegaBytes (MByte)
= 1 CD ROM
2 file cabinets = 1,000 MBytes = 1 GigaByte (GByte); 10 file cabinets = 1
DVD (see below)
2,000 file cabinets = 1,000 GBytes = 1 TeraByte (TByte); 2,000 file
cabinets = 200 DVDs
1 banker's box (2,500 pages) = 1 file drawer = 2 linear feet of files = 125
MBytes
8 banker's boxes = 16 linear feet = 1 GByte; 8,000 boxes = 16,000 linear
feet = 1 TByte
1 roll of 16 mm microfilm (100 ft) = 2,500 letter size images = 1 banker's
box = 125 MBytes
1 roll of 35 mm microfilm (100 ft) = 5,000 letter size images (or letter
size image equivalents) = 250 MBytes
1 microfiche (average) = 100 letter size images; 200 fiche = 20,000 images
= 1 GByte
[N.B. In many record series, microfiche contain only a few images because
each fiche represents a single record in the series. In this case filming
breaks on record boundaries, rather than being continuous. To a lesser
extent this is also true for roll film. In these cases, the amount of
storage required depends on the number of images on the film, not the
number of microfiche or the number of rolls of film.]
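For readers who want to apply the paper and film figures above, here is a minimal Python sketch. The dictionary keys, the function name, and the optional images_per_unit parameter are my own labels for illustration; the page counts and the 50 KByte average are the handout's.

```python
# Averages taken from the handout; real collections will vary.
BYTES_PER_SCANNED_PAGE = 50_000  # letter size, CCITT G4 compressed
PAGES_PER = {
    "file_cabinet": 10_000,
    "bankers_box": 2_500,
    "microfilm_16mm_roll": 2_500,
    "microfilm_35mm_roll": 5_000,
    "microfiche": 100,
}

def storage_bytes(unit, count, images_per_unit=None):
    """Estimate storage for `count` units of paper or film.

    Pass images_per_unit when filming breaks on record boundaries, so the
    actual image count is used instead of the handout's average.
    """
    images = PAGES_PER[unit] if images_per_unit is None else images_per_unit
    return count * images * BYTES_PER_SCANNED_PAGE

print(storage_bytes("file_cabinet", 2) / 1e9, "GBytes")
print(storage_bytes("microfiche", 200) / 1e9, "GBytes")
# Hypothetical record series with only 6 images per fiche:
print(storage_bytes("microfiche", 200, images_per_unit=6) / 1e6, "MBytes")
```

The last line shows why the image count matters more than the fiche count: the same 200 fiche need about 60 MBytes rather than 1 GByte.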
Scanned aperture card images require the same storage as the document or
drawing in the aperture would require at its physical, one-to-one,
full-size, un-microphotographed size.
1 E size drawing (48 inches by 36 inches) = 16 letter size pages (8 1/2 by
11 inches);
[D size = 8 pages; C size = 4 pages; B size = 2 pages; A size = 1 page.
Old E size is 48 x 36 in.; new E size is 44 x 34 in. (A0 is the equivalent
ISO (European) size nomenclature for E size); D size (A1) 34 x 22; C size
(A2) 22 x 17; B size (A3) 11 x 17; A size (A4) 8½ x 11.
F size is 28 x 40. Roll sizes: G size 11 x 22½ to 11 x 90; H size 28 x 44
to 28 x 143; J size 34 x 56 to 34 x 176; K size 40 x 56 to 40 x 143 in.
For newspapers, a double truck (center fold) full broadsheet is 24 x 36
inches, equivalent to an old D size drawing.]
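The page equivalents above follow from sheet area. The Python sketch below checks them against the new drawing sizes and attaches a storage figure. It assumes that storage scales with area at the same resolution and compression, which the handout implies but does not state; the names are illustrative only.

```python
LETTER_AREA = 8.5 * 11           # square inches
BYTES_PER_SCANNED_PAGE = 50_000  # handout average for a letter size page

# New drawing sizes in inches, as listed in the handout.
DRAWING_SIZES = {
    "A": (8.5, 11),
    "B": (11, 17),
    "C": (22, 17),
    "D": (34, 22),
    "E": (44, 34),
}

def letter_page_equivalents(width_in, height_in):
    """Sheet area expressed in letter size pages."""
    return (width_in * height_in) / LETTER_AREA

for name, (w, h) in DRAWING_SIZES.items():
    pages = letter_page_equivalents(w, h)
    kbytes = pages * BYTES_PER_SCANNED_PAGE / 1000
    print(f"{name} size: {pages:g} pages, about {kbytes:.0f} KBytes")
```

By area the new E size (44 x 34) comes to exactly 16 letter pages, while the old 48 x 36 sheet is closer to 18.5, so the 16 page figure is best read as a round number for either.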
1 hour compressed color video = 2 GBytes (DVD, MPEG 2) (image quality
dependent)
1 hour audio = 10 MBytes (dictation, answering machine) to 500 MBytes (a CD
holds 74 minutes of music)
1 color picture = 10 KBytes (thumbnail) to 5 MBytes (for each of 100 photos
on a 500 MByte photo CD)
[N.B. The size of the compressed file for a scanned photograph depends on
the resolution (DPI: Dots Per Inch) and the detail (information) in the
photograph. The detail in a photograph is dependent on the size of the
negative and the quality of the film and the camera and lens (It is not
related to the print size unless the print is smaller than the negative).
The resolution of the scan should be chosen to match the detail of the
photograph. For most cameras, films, and formats 35 mm and smaller, the 5
MByte Photo CD format (3,072 by 2,048 pixels) captures all the information
in the image. N.B. this is in dots per image rather than dots per inch.]
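To make the "dots per image" point concrete, this short Python sketch converts the Photo CD pixel count into an effective scan resolution on a 35 mm negative. The 36 x 24 mm frame is the standard 35 mm still format; the 24 bit color depth used for the raw size is my assumption, not a figure from the handout.

```python
PHOTO_CD_PIXELS = (3072, 2048)
FRAME_35MM_MM = (36, 24)   # standard 35 mm still frame
MM_PER_INCH = 25.4

def scan_dpi(pixels, length_mm):
    """Effective dots per inch when `pixels` span `length_mm` of film."""
    return pixels / (length_mm / MM_PER_INCH)

dpi_wide = scan_dpi(PHOTO_CD_PIXELS[0], FRAME_35MM_MM[0])
dpi_high = scan_dpi(PHOTO_CD_PIXELS[1], FRAME_35MM_MM[1])
print(f"about {dpi_wide:.0f} x {dpi_high:.0f} DPI on the negative")

# Uncompressed size, assuming 3 Bytes (24 bits) per pixel.
raw_bytes = PHOTO_CD_PIXELS[0] * PHOTO_CD_PIXELS[1] * 3
print(f"{raw_bytes / 1e6:.1f} MBytes before compression")
```

The same pixel grid applied to an 8 x 10 print would be a far lower DPI, which is why the format is specified per image.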
1 Chest X-ray = 1 MegaByte (14 x 17 inches), 150 DPI (Dots Per Inch), 12
bits (compressed)
[(12 bits per pixel provides 4,096 shades of grey) (wavelet compression,
lossless mode, has FDA 510(k) approval) / (150 DPI, 12 bit images
recommended by American College of Radiology for primary reads) / 14 x 17
Chest X-ray = 200 KiloBytes (for secondary reads: wavelet compression,
lossy mode, has FDA 510(k) approval)]
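As a rough check on the X-ray figures, the following Python sketch computes the uncompressed size of a 14 x 17 inch image at 150 DPI and 12 bits, and the compression ratios implied by the 1 MByte and 200 KByte figures. It assumes pixels are packed at exactly 12 bits each; systems that store each pixel in a 16 bit word would start from a larger raw size.

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size, assuming pixels are bit-packed with no padding."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8

raw = raw_image_bytes(14, 17, dpi=150, bits_per_pixel=12)
print(f"raw: {raw / 1e6:.1f} MBytes")

# Ratios implied by the handout's compressed sizes.
print(f"primary read (1 MByte): about {raw / 1_000_000:.0f}:1")
print(f"secondary read (200 KBytes): about {raw / 200_000:.0f}:1")
```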
1 Byte (B) (common usage) = 8 bits (b) = 1 character; 1 Unicode character =
16 bits = 2 Bytes
[1,000 Bytes =~ (~ about) 1 KiloByte; 1,000 KBytes =~ 1 MegaByte; 1,000
MBytes =~ 1 GigaByte; 1,000 GBytes =~ 1 TeraByte; 1,000 TBytes =~ 1
PetaByte; 1,000 PBytes =~ 1 ExaByte]
Modem = 33 Kbit per second = 2 pages per minute (~$30.00 per month for a
standard phone line)
ISDN (1 voice channel) = 56 Kbit per second = 5 pages per minute (~$50.00
per month)(ISDN charge)
T1 (24 voice channels) = 1.544 Mbit per second = 3 pages per second
(~$1,000.00 per month)
Ethernet (CSMA/CD) = 1 Mbit per second (effective) or 10 Mbit per second
(nominal) = 2 pages per second
OC3 ATM (Asynchronous Transfer Mode) = 155 Mbit per second = 300 pages per
second
OC192 (SONET optical fiber) = 10 Gbit per second = 20,000 pages (2 file
cabinets) per second
Optical carrier frequency = 400 THz (40,000 cycles used for every OC192 bit
transmitted)
[N.B. Spelling out Byte and bit whenever used considerably reduces
confusion as files stored as Bytes are transmitted as bits.]
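The Byte versus bit distinction is easiest to see by working the transmission figures through. The Python sketch below computes raw pages per second for the line rates listed above, using the 50 KByte page. These are theoretical maxima with no protocol overhead. The handout's rounded figures are lower, presumably to leave headroom, though the handout does not say so. The names are illustrative.

```python
BITS_PER_BYTE = 8
BYTES_PER_SCANNED_PAGE = 50_000

LINE_RATES_BITS_PER_SECOND = {
    "Modem": 33_000,
    "ISDN (1 channel)": 56_000,
    "T1": 1_544_000,
    "Ethernet (effective)": 1_000_000,
    "OC3 ATM": 155_000_000,
    "OC192": 10_000_000_000,
}

def pages_per_second(bits_per_second):
    """Raw page rate: files are sized in Bytes but lines are rated in bits."""
    return bits_per_second / (BYTES_PER_SCANNED_PAGE * BITS_PER_BYTE)

for name, rate in LINE_RATES_BITS_PER_SECOND.items():
    pps = pages_per_second(rate)
    print(f"{name}: {pps:,.2f} pages/second ({pps * 60:,.1f} pages/minute)")

# Optical carrier cycles available per OC192 bit.
cycles_per_bit = 400e12 / LINE_RATES_BITS_PER_SECOND["OC192"]
print(f"{cycles_per_bit:,.0f} cycles per bit")
```

Forgetting the factor of 8 would overstate every rate eightfold, which is the confusion the note warns about.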
1 DVD (Digital Video Disk) (same physical size as a CD ROM) = 7.4 GByte
(WORM)
[(WORM: Write Once, Read Many) (2 sided, 1 layer per side); = 5.2 GByte RAM
or RW (overwrite, rewrite) (2 sided, 1 layer per side); = 17 GBytes (ROM)
(2 sided, 2 layers per side). Multimedia: 5 channel (theater quality
surround sound) (5.1, Dolby AC-3) / 96 kHz / 24 bit audio, 8 languages, 32
subtitles, and about 135 minutes (long enough to accommodate 94% of all
movies) of high quality (720 horizontal lines) video on each of 4 layers.
The file format is ISO 13346 UDF (Universal Disk Format) which harmonizes
all CD recording standards including ISO 9660. Available in 1996. A
future technology, 3rd generation blue lasers [sort of a blue light
special], should yield a 40 GByte ROM for HDTV.]
1 pulp tree (loblolly pine) = 1/10th cord of wood = 10,000 pages = 1 File
Cabinet = 4 banker's boxes = 1/2 GByte
[1 lumber tree (20 inch diameter, 110 ft tall, 50 years old) = 1 cord; 10
pulp trees (8 in. dia., 50 ft tall, 20 yrs old) = 1 cord; 1 cord = 4 x 4 x
8 ft = 128 cubic ft (75 cubic feet of wood)]
1 word processor or OCR'ed (Optical Character Recognition) page = 5 KBytes
(all pages listed above are scanned pages)
1 compressed page of COLD (Computer Output to Laser Disk) or COOL (Computer
Output On-Line) = 1 KByte
Minimum commercial scanning cost for backfile conversion (more than 1
million pages) ~ 5 cents per page
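To compare the three page types and the scanning cost side by side, here is a small Python sketch for a backfile of a given size. The per page sizes and the 5 cent figure are the handout's; the function and key names are made up for illustration, and the cost is a minimum, so treat the result as a floor.

```python
BYTES_PER_PAGE = {
    "scanned": 50_000,
    "word_processor_or_ocr": 5_000,
    "cold_or_cool": 1_000,
}
SCAN_COST_CENTS_PER_PAGE = 5  # minimum commercial rate for large backfiles

def backfile_summary(pages):
    """Return (storage in Bytes per page type, minimum scanning cost in dollars)."""
    storage = {kind: pages * size for kind, size in BYTES_PER_PAGE.items()}
    cost_dollars = pages * SCAN_COST_CENTS_PER_PAGE / 100
    return storage, cost_dollars

storage, cost = backfile_summary(1_000_000)
for kind, size in storage.items():
    print(f"{kind}: {size / 1e9:g} GBytes")
print(f"minimum scanning cost: ${cost:,.0f}")
```

For one million pages this gives 50 GBytes scanned, 5 GBytes as text, 1 GByte as COLD, and at least $50,000 to scan.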
Search by:
Database entry/Unique identifier
Full text/Fuzzy search
Nested folders/Aliased folders
Concept/Thesaurus search
Document structure (SGML)
Hyperlink traversal/annotation
Email{ed}link/Workflow link
Card catalog/Finding aid
Sequential search/Date scanned
Log of reading history/Date entered
Bibliography/Citation counts
ActiveX (Object Link) link
Spatial /Temporal coordinates (GIS)
Internet agents/Popularity chart
Time Code (SMPTE)/GPS orientation
Image Matching / Image Analysis
Thumb Print / Physiological ID
Combination of any or all of the above
The raster image is the image of record: (OCR'ed/vectorized images
constitute re-authoring/re-engineering)
Rev 30
http://www.ArchiveBuilders.com mailto:SteveGilheany@ArchiveBuilders.com
1147 Manhattan Avenue, Suite 322, Manhattan Beach, CA 90266
Tel: (310) 937-7000 Fax: (310) 937-7001
The above is one of the one-page handouts for the following course:
UCLA Extension will present a three day class on Document Imaging and
Document Management in Downtown Los Angeles at the World Trade Center, next
to the Westin Bonaventure Hotel (800) 228-3000 (213) 624-1000. The dates
are September 25, 26, and 27, Thursday, Friday, and Saturday. To
accommodate fly-in students, the class meets from 1 PM to 9 PM on Thursday
and Friday, and from 9 AM to 5:00 PM on Saturday. The fee is $375.00.
This course is for managers who have been assigned to specify, install or
manage a document imaging system. Students will learn about the technology
of scanning, importing, transmitting, storing, protecting, locating,
retrieving, viewing, and printing documents. Image and document formats,
multimedia, rich text, GIS (Geographic Information Systems), CAD (Computer
Aided Design), and image enabled databases will be discussed. The course
also covers the integration of the DVD, DirecTV, DirecPC, Cable, Telephony,
the Internet and PC. UCLA Extension registration is (310) 825-9971. Ask
for course X 814.14, registration number B4004. For information, please
contact the instructor at mailto:SteveGilheany@ArchiveBuilders.com or (310)
937-7000. Instructor: Steve Gilheany, BA CS, MBA, MLS Specialization in
Information Science, CDIA (Certified Document Imaging System Architect),
Sr. Systems Engineer, Archive Builders.
I also have a Cliff's Notes version of the course that fits on one page,
which I will post if people are interested. (see above)
The following is offered to reduce duplication: This posting has been cross
listed on the following lists: ALA-LITA-L, Archives, DigLib, DigLib-ns,
DPRA, ERECS-L, PACS-L, RecMgmt, and SLA-DITA. If you can suggest other
lists that might have readers that are interested in the topic, please let
me know and I will subscribe to those lists and post this message to those
lists. If you can post it more easily than I can, please let me know and I
will ask one person to post it to each list. (I have not been successful
at finding and subscribing to ImageLib.)
Steve Gilheany
Tel: (310) 937-4757 Fax: (310) 937-4758
mailto:SteveGilheany@worldnet.att.net