[Coco] Scanning documents

Sun Feb 25 11:39:07 EST 2007

Bob Devries wrote:
> I use Adobe Acrobat V6.0 Professional in conjunction with my CanoScan 5200F
> scanner to create PDFs of the Colour Computer Documentation that I have
> uploaded to The Maltedmedia FTP site.
>
> I believe it does a very good job, and creates relatively small files, 
> provided I don't want to be able to search for text in the file. For 
> example, the latest one I uploaded, OS-9 Pascal.pdf, is 169 pages, which 
> comes out at 1,396,855 bytes, and 1,404,891 bytes if I include bookmarks for 
> all the chapters.
>
> If, however, I run the file through the OCR section of Acrobat, it increases
> the file size to 4,501,469 bytes.
>
> I uploaded the second type (with bookmarks, without OCR) to Maltedmedia.
> The text is very clear; I scan at 300 DPI, in black and white with text
> enhancement turned on. I scan the front and rear covers in colour at the
> same DPI setting.
>
> If anyone particularly wants the OCR'd version, I can make it available, if 
> someone wants to host it.
>
> I noticed that someone else has been scanning the manuals, and saving the 
> pages separately as TIFF files (which are themselves huge), and then making 
> a PDF file from that. This makes for very large PDF files, unfortunately.
>
> Any comments for improvements are always welcome. I am by no means an expert 
> at this; I'm learning as I go along.
> --
> Regards, Bob Devries, Dalby, Queensland, Australia
>
> Isaiah 50:4 The sovereign Lord has given me
> the capacity to be his spokesman,
> so that I know how to help the weary.
>
> website: http://www.home.gil.com.au/~bdevasl
> my blog: http://bdevries.invigorated.org/
>
How on earth can Acrobat make OCR'd documents more than 3x larger?
Bizarre. When I've scanned roleplaying game books I've just used an OCR
program like OmniPage Pro, pasted it all into OpenOffice.org Writer,
taught the spell checker all kinds of Tolkien words, done some
proofreading, pasted images in, and exported a PDF from OOo.  It has
meant a fair amount of editing work, which I suspect is unavoidable if
you want accurate results, but the file sizes have invariably been far
smaller than non-OCR versions I've obtained off the Internet.

Maybe Acrobat is producing a PDF that somehow overlays the text on top
of the original scanned image.  Imagine copying text from a PDF and
pasting it into a new document only to discover that the text version
contains OCR errors and isn't at all what you see on the screen!
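
For a rough sense of scale, the byte counts Bob quoted can be checked
with a few lines of Python.  This is only a sketch: the variable names
are mine, and it assumes the OCR'd file was built from the bookmarked
version, which the post does not actually say.

```python
# Byte counts and page count quoted in Bob's message.
PAGES = 169
SCAN_ONLY = 1_396_855        # image-only PDF
WITH_BOOKMARKS = 1_404_891   # image-only PDF plus chapter bookmarks
WITH_OCR = 4_501_469         # after running Acrobat's OCR


def per_page(total_bytes: int, pages: int = PAGES) -> float:
    """Average number of bytes per page."""
    return total_bytes / pages


# Cost of adding the chapter bookmarks.
bookmark_overhead = WITH_BOOKMARKS - SCAN_ONLY

# How many times larger the OCR'd file is than the plain scan.
ocr_ratio = WITH_OCR / SCAN_ONLY

# Assumption: OCR was applied to the bookmarked file, so the growth
# is measured against that one.
ocr_extra_per_page = per_page(WITH_OCR - WITH_BOOKMARKS)

if __name__ == "__main__":
    print(f"plain scan:     {per_page(SCAN_ONLY):8.0f} bytes/page")
    print(f"OCR'd version:  {per_page(WITH_OCR):8.0f} bytes/page")
    print(f"bookmarks cost: {bookmark_overhead} bytes in total")
    print(f"OCR ratio:      {ocr_ratio:.2f}x")
    print(f"OCR adds about: {ocr_extra_per_page:8.0f} bytes/page")
```

The OCR pass works out to roughly 18 KB of extra data per page, which
seems like more than the recognized text alone would need.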

JCE