[Coco] Re: [Color Computer] Requesting .pdf help...

Gene Heskett gene.heskett at verizon.net
Mon Jul 18 10:19:26 EDT 2005


On Monday 18 July 2005 08:51, Michael Wayne Harwood wrote:
>test2.pdf is a multipage .pdf file with 10 test pages.  To get this
>quality I had to resize the image to 100ppi (782x1082 in this case)
> before I published to .pdf.  A source image at 100ppi does not go
> through most OCR engines very well.

Agreed, this won't OCR; it's too fuzzy.  OTOH, kghostview only shows
the first page (the left border lists 10 pages, but it will only
display page 1; all the others are blank), so I saved the file from
kghostview and looked at it with ar7, and then I could see the rest
of it.  Considering the material, 10 pages that total up to 1.5 megs,
or about 150k a page on average, I'd say you've come close to hitting
your target.  The text as is seems legible enough.  You could go
with it and forget the OCR part, maybe?
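
For what it's worth, the per-page arithmetic can be checked with a
few lines of Python.  This is only a sketch: it assumes "1.5 megs"
means 1.5 million bytes, takes the 160kb-per-page ceiling from
Michael's message below, and the function name is made up for
illustration.

```python
# Rough per-page size check using the figures quoted in this thread.
TOTAL_BYTES = 1.5e6   # "1.5 megs", taken as 1.5 million bytes (assumption)
PAGES = 10            # pages in test2.pdf
TARGET_KB = 160       # per-page ceiling stated in Michael's message


def average_page_kb(total_bytes: float, pages: int) -> float:
    """Return the average page size in kB (1 kB = 1000 bytes here)."""
    return total_bytes / pages / 1000


avg = average_page_kb(TOTAL_BYTES, PAGES)
print(f"average {avg:.0f} kB per page, target {TARGET_KB} kB, "
      f"under target: {avg <= TARGET_KB}")
```

If "megs" is read as 1024*1024 bytes instead, the average comes out a
little higher but still under the 160kb ceiling.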

WRT the blank pages, this is not something I've observed kghostview 
doing before.  Ever.  So I have NDI where to point the finger.

>Let me clarify my intentions a bit...   A lot has been said about
> the pros and cons of the .djvu format vs the .pdf format, and I am
> not trying to re-open this debate.  However I am finding that I am
> consistently getting a higher quality with the .djvu format that has
> extras like searchable text, while I am not able to figure out how
> to get the same level of quality of features in the .pdf format.
>
>What I would like is for someone to assist me in finding a way to
> pass a 2550x3300 image through an OCR engine with the end product
> being a .pdf with searchable text (not perfect, but useful) that
> ends up with an average size of 160kb or less per page.  All of
> this needs to be done with either open sourced or freeware tools
> rather than a $400 publishing or OCR package.
>
>If I am not able to figure out how to make .pdf jump through the
> same hoops I can get .djvu to jump through I am going to publish
> exclusively to .djvu rather than have two formats that vary in
> quality and features.
>
>Let me reiterate that I am NOT trying to re-open the .djvu vs .pdf
> debate - please focus your replies on advice and practical examples
> of how to accomplish these goals in the .pdf format.
>
>Regards,
>Michael Harwood
>
>> A cover page is not a very good test page.  I mean that's a great
>> .pdf, I blew it up to 800% before the effects of the DCT could be
>> seen on screen here, but it's also 1.5 megs because it's the cover
>> of the rag and all image, in full color, but there is nothing
>> there that would make OCR'ing it worthwhile.  To see what it will
>> do, take a random page from inside the magazine that's >60% plain
>> text.  Page 2 of a multipage article would be a good test page.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.35% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.


