[Coco] Re: [Color Computer] Requesting .pdf help...

Gene Heskett gene.heskett at verizon.net
Mon Jul 18 09:05:41 EDT 2005


On Monday 18 July 2005 07:11, Michael Wayne Harwood wrote:
>I know that this is a bit offtopic in general, but it ties to the
> Rainbow project so I figure it's not too far off base.
>
>I am writing this to ask for help in generating .pdf documents from
> hires images that have a resolution of 2550x3300 at 8bit color (8
> 1/2 x 11 at 300ppi).  As I have been working on this project I have
> come up with a variety of ways to decrease the size of the finished
> documents without seriously affecting quality.  My goal is a .pdf
> document whose size averages out to 160kb or less per page so that
> I can meet my goal of having the entire Rainbow collection on a
> single disc.
>
>I have been pretty unimpressed with the quality of .pdf documents
> I have managed to create using a source image as described above.
> My best results have come from resizing the image to 1/3 the size
> (100ppi), but this leaves little room to pass the image through an
> OCR engine prior to the PDF generation.  My goals are:
>
>1) Size must average out to 160kb or less per page
>2) Visual quality must be at least as good as
>http://www.musicheadproductions.org/test2.pdf
>3) It would be nice if a "one pass OCR" could be used to make the
> .pdf searchable.
>
>Can anyone help me?
>
>Regards,
>Michael Harwood
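
For scale, here is a back-of-envelope sketch of what the numbers in the quoted message imply. It assumes "8bit color" means one byte per pixel and that "160kb" means 160 * 1024 bytes; both are my reading, not something the message states, and the function names are made up for illustration.

```python
# Rough size arithmetic for one scanned page, using the figures quoted above.
# Assumptions: one byte per pixel, and 1 KB = 1024 bytes.
WIDTH_PX, HEIGHT_PX = 2550, 3300      # 8.5 x 11 inches at 300 ppi
TARGET_BYTES = 160 * 1024             # per-page budget from the goals list


def raw_page_bytes(width, height, bytes_per_pixel=1):
    """Uncompressed size of a page image in bytes."""
    return width * height * bytes_per_pixel


def required_ratio(raw_bytes, target_bytes=TARGET_BYTES):
    """How many times smaller the stored page must be than the raw image."""
    return raw_bytes / target_bytes


full_res = raw_page_bytes(WIDTH_PX, HEIGHT_PX)            # 300 ppi page
third_res = raw_page_bytes(WIDTH_PX // 3, HEIGHT_PX // 3)  # 100 ppi page

print(f"300 ppi: {full_res} bytes raw, needs about {required_ratio(full_res):.0f}:1")
print(f"100 ppi: {third_res} bytes raw, needs about {required_ratio(third_res):.1f}:1")
```

Under those assumptions a full-resolution page has to shrink by roughly 50 to 1 to fit the budget, while the 100ppi version only needs a bit under 6 to 1, which is consistent with the resized pages being the ones that looked best.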

While the obvious thing is to scan at a high enough resolution that 
the OCR works well, and then edit the result to fix any miscues, that 
leaves you running the OCR over the same scan that holds the 
graphics.  The OCR isn't going to be very happy while trying to read 
the gfx images.

The gimp can be used to blank fill the image areas of the scan, 
reducing the OCR's confusion considerably, and thereby speeding it 
up, probably by an order of magnitude.  Save the image areas first as 
separate picture files unless you want to scan the page again.  Once 
the text has been recovered, OOo can recompose the page by 
re-inserting the pictures you saved before blanking them out in the 
gimp.  It can then be told to output the .pdf, whose size will be 
determined mostly by the size of the graphics, since text isn't 
usually more than 2500 bytes per page.  Decent photo style gfx in 
.jpeg format may eat over 2 megs for a full page, but the stuff I get 
in the mail from my kids at 20k a snapshot isn't fit to print either, 
so expect to use a little more space where the gfx are.
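
To make the save-then-blank step concrete, here is a minimal sketch in plain Python. It is not what the gimp or OOo do internally; it treats a page as a list of bytearray rows (one byte per pixel, grayscale), which is an assumption for illustration, and cut_and_blank and paste are hypothetical helper names. A real run would use an imaging tool on the actual scan files.

```python
WHITE = 255  # fill value for blanked areas, assuming 8-bit grayscale


def cut_and_blank(page, box, fill=WHITE):
    """Copy the pixels inside box out of page, then overwrite them with fill.

    page is a list of bytearray rows; box is (left, top, right, bottom)
    with right and bottom exclusive. Returns the cut-out region as a new
    list of bytearray rows so it can be saved and re-inserted later.
    """
    left, top, right, bottom = box
    saved = []
    for row in page[top:bottom]:
        piece = bytearray(row[left:right])            # keep the graphic
        row[left:right] = bytes([fill]) * len(piece)  # blank it for the OCR
        saved.append(piece)
    return saved


def paste(page, region, left, top):
    """Put a saved region back at (left, top); stands in for recomposing."""
    for dy, piece in enumerate(region):
        page[top + dy][left:left + len(piece)] = piece


# Tiny 4-row, 6-column "page" of black pixels with a graphic at columns 1-3.
page = [bytearray(6) for _ in range(4)]
graphic = cut_and_blank(page, (1, 1, 4, 3))
# page rows 1 and 2 now hold white in columns 1-3; graphic holds the originals.
paste(page, graphic, 1, 1)
```

The point of returning the cut-out region is the one made above: keep the graphic before blanking it, so the page never has to be scanned a second time.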

None of this seems amenable to scripting and automatic processing 
though, and you will need some of that unless you want to spend the 
next 2 or more years of your life on this. 


-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.35% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.


Brought to you by the 6809, the 6803 and their cousins! 