[Coco] Re: Rainbow on Disc - OCR
John R. Hogerhuis
jhoger at pobox.com
Fri Jun 10 12:34:19 EDT 2005
On Fri, 2005-06-10 at 09:25 -0600, Michael Wayne Harwood wrote:
> John,
>
> You make some excellent points! Would you be willing to lead the charge
> in investigating and organizing what would be required to move forward
> with this? I think that before we start scanning magazines en masse we
> should look into the minimal requirements needed for a successful OCR project.
>
Yes I will, if we have a non-squishy OK from Lonnie. From his point of
view, the concept is this:
With each PDF on the disk, there will be a similarly named ASCII text
file. This text file will have the raw ASCII text that a computer
scanned from Rainbow, edited by proofreaders. The purpose is to
be able to do a text search through Rainbow to find articles and even
advertisements (you'd be surprised how often this comes up). For each
program listing, this file may be broken down further into a set of program
text files labeled with the volume/issue/listing name & number.
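
As a rough illustration of the search this layout would allow, here is a
minimal Python sketch. It assumes each PDF sits next to a text file with the
same base name and a .txt extension; that naming rule, the function name
search_archive, and the directory layout are all hypothetical and not
something agreed on in this thread.

```python
from pathlib import Path


def search_archive(root, term):
    """Return (pdf_name, line_number, line) for each line containing term.

    Assumes every PDF has a sibling text file with the same stem and a
    .txt suffix (an assumed convention, not a settled one).
    """
    needle = term.lower()
    hits = []
    for pdf in sorted(Path(root).rglob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        if not txt.is_file():
            continue  # no proofread OCR text for this PDF yet
        # The text is expected to be plain ASCII; replace anything else
        # rather than aborting the whole search on one bad byte.
        text = txt.read_text(encoding="ascii", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((pdf.name, number, line.strip()))
    return hits
```

Calling search_archive with the disk's root directory and a phrase returns
the PDF to open and the line in its text file where the phrase appears, so
the text file acts only as an index into the scanned pages.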
-- John.