[Coco] Resolution, size and usability

Dennis Bathory-Kitsz dennis-ix at maltedmedia.com
Fri May 22 13:09:33 EDT 2009


Hi all,

I really hope this discussion is helpful as people work on archiving 
what they have, and as they continue to make it available in the most 
effective way -- for everybody. There was only one dismissive 
comment, and it certainly doesn't reflect what most of us have been saying.

You know I rarely participate here, and only watch in the background 
to make sure things are working smoothly.

This time, though, the discussion has been important to me because 
since 1995 I have been a consultant in online accessibility. There 
are lots of barriers to access, and in a case such as ours, an 
effective compromise between accuracy ('real' archiving) and usability 
('library' archiving) is always welcome. Existing paper documents 
pose an enormous challenge in this transitional phase between 
low-speed, high-speed and the promised future internet (not to mention 
the limits of processing power and disk space, and the time needed for 
hands-on re-archiving to long-term storage media).

Jeff's comments are interesting.

At 12:26 PM 5/22/2009, Jeff Teunissen <deek at d2dc.net> wrote:
>Except for choosing the white and black points and maybe the scanner's
>gamma curve, all of that stuff is ALMOST completely useless these
>days. A monochrome page of a given resolution compresses to pretty-much
>exactly the same size as a full-color page, and is often larger than
>the color scan because it can't be compressed as well (too much
>"redundant" information has been thrown away).

However one prefers to get rid of noise or whiten the background, I'm 
all for that. I wanted to put it to the test and, beyond that, I was 
curious about the compression differences Jeff mentioned. So I just 
did a few test pages from the same source with a small quantity of 
large text on a noisy page. (TIFF used here is uncompressed.)

Format order below is TIFF, ZIP, PDF and ZIPPED PDF (percentages are 
relative to the noisy 24-bit color TIFF):

Noisy page
                   TIFF            ZIP             PDF             ZIPPED PDF
24-bit color       35363K (100%)   15910K (45%)    13122K (37%)    13133K (37%)
8-bit grayscale    11784K (33%)     9774K (28%)     9789K (28%)     9780K (28%)
4-bit grayscale     5895K (17%)     4389K (12%)     7892K (22%)     7885K (22%)

Cleaned (whitened) page
                   TIFF            ZIP             PDF             ZIPPED PDF
24-bit color       22557K (64%)     1521K (4%)       522K (1.5%)     398K (1%)
8-bit grayscale    11783K (33%)      942K (2.5%)     938K (2.5%)     935K (2.5%)
4-bit grayscale     5895K (17%)      535K (1.5%)     478K (1.5%)     356K (1%)
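
For anyone who wants to check the percentages, here is a minimal Python sketch that recomputes them. It assumes every percentage is taken against the noisy 24-bit color TIFF (35363K), which is what the figures appear to be based on. The names BASELINE_K, SIZES_K and percent_of_baseline are made up for this illustration.

```python
# Sizes in K as reported in the table above; order is TIFF, ZIP, PDF, zipped PDF.
BASELINE_K = 35363  # noisy 24-bit color TIFF, assumed to be the 100% reference

SIZES_K = {
    ("noisy", "24-bit color"): (35363, 15910, 13122, 13133),
    ("noisy", "8-bit grayscale"): (11784, 9774, 9789, 9780),
    ("noisy", "4-bit grayscale"): (5895, 4389, 7892, 7885),
    ("cleaned", "24-bit color"): (22557, 1521, 522, 398),
    ("cleaned", "8-bit grayscale"): (11783, 942, 938, 935),
    ("cleaned", "4-bit grayscale"): (5895, 535, 478, 356),
}


def percent_of_baseline(size_k, baseline_k=BASELINE_K):
    """Return size_k as a percentage of the baseline, to one decimal place."""
    return round(100.0 * size_k / baseline_k, 1)


for (page, depth), sizes in SIZES_K.items():
    cells = ", ".join(f"{s}K ({percent_of_baseline(s)}%)" for s in sizes)
    print(f"{page:8} {depth:16} {cells}")
```

The one-decimal results land a little off the rounded figures in the post in a few places (for example 2.7% where the post says 2.5%), so the posted numbers look to be rounded to the nearest half percent.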

As I'd mentioned in my earlier post, cleaning up the noise is the 
most important factor in the compressible size. The color information 
on a monochrome page turns out to be irrelevant -- thanks to Jeff for 
teaching me that although the color TIFFs are larger, the cleaned PDFs 
come out at virtually the same size in 24-bit color (522K) as in 4-bit 
grayscale (478K).

If you look at many of the hundreds of documents scanned on our 
maltedMedia site and others, you'll find that the handling of page 
background noise is greatly responsible for ballooning document size.
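
To see why background noise matters so much, here is a rough Python sketch, not the test I ran above. It builds a synthetic 8-bit grayscale "page" (white background with a few dark bars standing in for text), once clean and once with random background speckle, and compresses both with zlib's Deflate, the method ZIP files normally use. The page size, the noise level and the helper names make_page and deflated_size are all assumptions made for the illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 200  # hypothetical page size in pixels, 8-bit grayscale


def make_page(noise_level, seed=1):
    """Build a synthetic page: white background, a few dark 'text' bars,
    and random background speckle of up to noise_level gray steps."""
    rng = random.Random(seed)
    rows = []
    for y in range(HEIGHT):
        row = bytearray()
        for x in range(WIDTH):
            if y % 40 in (10, 11, 12) and 20 <= x < 180:
                row.append(0)  # dark "text" stroke
            else:
                # noise_level 0 leaves a pure white (255) background
                row.append(255 - rng.randint(0, noise_level))
        rows.append(bytes(row))
    return b"".join(rows)


def deflated_size(data):
    """Size in bytes after Deflate compression at the highest level."""
    return len(zlib.compress(data, 9))


clean = make_page(noise_level=0)
noisy = make_page(noise_level=30)
print("raw bytes:", len(clean), len(noisy))
print("deflated: ", deflated_size(clean), deflated_size(noisy))
```

Both pages are the same size uncompressed, but the speckled one should compress far worse, because the random variation in the background leaves the compressor almost nothing repetitive to work with.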

My point is that documents can be archived for speedy downloads or 
for accurate results. At my home (and in Bill's case) the documents 
are archived for accuracy. The 'look' of the scanned page is exactly 
the same as the 'look' of the paper document. That is a good thing -- 
with one exception, and that is the burden on the person downloading.

(Archiving visual accuracy is another topic, one of great concern to 
those of us with artistic requirements.)

>Throwing away page content by reducing bit depth is MUCH worse for
>the content than lossy compression, because at least the lossy
>compression of J2K (or even JFIF) drops bits that the human brain
>doesn't readily notice, while reducing bit depth loses bits across
>the whole range at arbitrary cutoff points.

Yes. Given the choice, though, I'll take lower bit depth. My aging 
eyes find the fuzziness of lossy compression harder going because I 
tend to magnify documents to read them. I find the 4-bit grayscale 
easier overall than the artifacts of lossy compression.
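
For the curious, here is a small Python sketch of what dropping from 8-bit to 4-bit grayscale does to the pixel values. It assumes plain truncation (keeping the top four bits); actual scanner software may dither or pick its thresholds differently. The function names are made up for the illustration.

```python
def to_4bit(value):
    """Map an 8-bit gray level (0-255) to a 4-bit level (0-15)
    by dropping the low four bits."""
    return value >> 4


def to_8bit(level):
    """Expand a 4-bit level back to 8 bits; 15 * 17 == 255, so white stays white."""
    return level * 17


def pack_4bit(pixels):
    """Pack pairs of 4-bit levels into single bytes (an odd tail is padded with 0)."""
    levels = [to_4bit(p) for p in pixels]
    if len(levels) % 2:
        levels.append(0)
    return bytes((levels[i] << 4) | levels[i + 1] for i in range(0, len(levels), 2))


row = [255, 250, 240, 239, 128, 127, 16, 0]
print([to_4bit(p) for p in row])           # 4-bit levels
print([to_8bit(to_4bit(p)) for p in row])  # what the reader sees afterward
print(len(bytes(row)), "bytes ->", len(pack_4bit(row)), "bytes")
```

Every run of 16 neighboring gray levels collapses to a single value, which is the hard cutoff Jeff describes, and two pixels fit in each byte, which fits the uncompressed TIFF halving from 11784K to 5895K.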

Dennis










