[Coco] Re: [Color Computer] Requesting .pdf help...

Mon Jul 18 11:19:53 EDT 2005

I have written scripts that use ImageMagick to eliminate yellowing and
page clutter by doing the following:

1. I create a second copy of the scanned image as a monochrome image with
the black-threshold set to reduce as much "noise" as possible while
retaining most of the "good" data of the original scan.
2. I use the monochrome image as a mask against the original image by making
the black areas transparent and overlaying the mask on top of the original
scan.  This cleans up a lot of the "noise" in the image (paper grain,
yellowing, speckling etc).
3. I clean up the edges, reduce the colors in the image to 256, reduce the
bit depth to 8, and create an LZW-compressed .tif as the final result.
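
The three steps above can be sketched in plain Python on an in-memory pixel grid. This is not the actual ImageMagick script: the list-of-rows image format, the function names (make_mask, apply_mask, clean) and the default threshold of 128 are hypothetical. Step 3's reduction to 256 colors is replaced here by a fixed 3-3-2 bit palette rather than ImageMagick's adaptive quantization, and edge cleanup and LZW/TIFF output are left out.

```python
# Hypothetical pixel format: an image is a list of rows, each row a list of
# (r, g, b) tuples with channels in 0..255.
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

WHITE: Pixel = (255, 255, 255)


def luminance(p: Pixel) -> float:
    """Approximate perceived brightness (ITU-R BT.601 weights)."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def make_mask(img: Image, black_threshold: float) -> list[list[bool]]:
    """Step 1: monochrome copy. True marks 'ink' (dark enough to keep)."""
    return [[luminance(p) < black_threshold for p in row] for row in img]


def apply_mask(img: Image, mask: list[list[bool]]) -> Image:
    """Step 2: where the mask is white (not ink), paint the page pure white;
    where it is black (ink, i.e. 'transparent'), the original shows through."""
    return [
        [p if ink else WHITE for p, ink in zip(row, mrow)]
        for row, mrow in zip(img, mask)
    ]


def quantize_332(p: Pixel) -> int:
    """Step 3 stand-in: map a pixel to one of 256 fixed colors
    (3 bits red, 3 bits green, 2 bits blue), giving an 8-bit index."""
    r, g, b = p
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


def clean(img: Image, black_threshold: float = 128) -> list[list[int]]:
    """Run steps 1-3 and return 8-bit palette indices."""
    cleaned = apply_mask(img, make_mask(img, black_threshold))
    return [[quantize_332(p) for p in row] for row in cleaned]
```

For example, a yellowed-paper pixel such as (240, 230, 180) falls above the threshold and comes out as index 255 (white), while a dark ink pixel such as (20, 20, 30) is kept and maps to index 0.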

An image processed with the above "clean" script looks very nice as a .tif,
but pretty grainy with tons of artifacts when compressed to an acceptable
size using Adobe's Acrobat v6.  I know the .pdf standard allows for a lot
more options than are presented in the Acrobat GUI.  Perhaps converting the
.tif files to postscript and processing them using ghostscript would be a
better option, but this will not help with the OCR aspect.  I can use
ghostscript's ps2ascii.ps filter to pull hidden text (including glyph
location on the images) but I have no idea how to import the info back in.

Regards,
Michael Harwood  

-----Original Message-----
From: coco-bounces at maltedmedia.com [mailto:coco-bounces at maltedmedia.com] On
Behalf Of James Diffendaffer
Sent: Monday, July 18, 2005 8:56 AM
To: ColorComputer at yahoogroups.com
Subject: [Coco] Re: [Color Computer] Requesting .pdf help...

The reason you have to shrink the image down is that your scan has large
areas of color that look the same to the human eye but are composed of
thousands of pixels differing slightly in color.

Since DjVu approximates the image (lossy), this isn't a problem.
PDF is trying to store every one of those pixels exactly (lossless).
If Acrobat has some sort of option that eliminates this, you'll have better
luck.
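
The point about near-identical pixels can be checked with Python's standard zlib module, which implements Deflate, the same algorithm behind PDF's FlateDecode filter (which filters Acrobat actually chooses for a given scan is not assumed here). The sketch compresses a flat off-white patch and the same patch with slight random grain; the 256x256 size, the gray level and the +/-3 jitter are arbitrary choices for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256
rng = random.Random(1)  # fixed seed so the run is repeatable

# A flat patch: every pixel is exactly the same shade of off-white.
flat = bytes([235] * (WIDTH * HEIGHT))

# A "grainy" patch: the same shade, but each pixel jitters by a few levels,
# which the eye barely notices.
grainy = bytes(235 + rng.randint(-3, 3) for _ in range(WIDTH * HEIGHT))

flat_size = len(zlib.compress(flat, 9))
grainy_size = len(zlib.compress(grainy, 9))

print(f"flat:   {len(flat)} bytes -> {flat_size} bytes")
print(f"grainy: {len(grainy)} bytes -> {grainy_size} bytes")
```

Both patches look like the same paper, but a lossless coder has to record every jitter in the grainy one, so it comes out many times larger than the flat one.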

If you want to stick with the large file but get better compression you
could pass the images through some sort of image processing that eliminates
the isolated pixels.  It would probably make both programs compress better.
Smoothing filters do that.  It's how they remove moles and freckles from
cover models so they appear to have perfect skin.  But it causes
indiscriminate fuzzing of the image, and those photos are edited by hand...
not automated.  I'm not sure what would be a good alternative.
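
One common automated way to drop isolated pixels is a 3x3 median filter (ImageMagick offers related operations such as -despeckle and -median). This is a swapped-in illustration of the technique, not something either poster described, and the grayscale list-of-rows format is hypothetical. Unlike a blur, a median keeps strokes at least two pixels wide, but it still erases one-pixel lines and rounds off corners, so the fuzzing trade-off does not go away entirely.

```python
from statistics import median

Gray = list[list[int]]  # hypothetical format: rows of 0..255 gray levels


def median3x3(img: Gray) -> Gray:
    """Replace each pixel with the median of its 3x3 neighborhood.
    Pixels on the border use whichever neighbors exist."""
    h, w = len(img), len(img[0])
    out: Gray = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            # Even-sized border windows give a fractional median; truncate.
            row.append(int(median(window)))
        out.append(row)
    return out
```

A lone black speck on a white page disappears, while the interior of a stroke three pixels wide stays black.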

The large scale document archival systems I've worked with all expect B&W
source images (order forms, membership applications, etc...) so I'm not sure
what would work well for color.
