[Color Computer][Coco] Rainbow on Disc - OCR

John R. Hogerhuis jhoger at pobox.com
Sat Jun 11 00:04:31 EDT 2005


Well, I've determined that there are sufficient Open Source tools to
generate just about any kind of PDF. However, doing the "text under
image" thing will require precise data about where each character is in
the image. Naturally this information is known to the OCR program, but
I'm not sure the Transym OCR engine outputs this kind of detail. If it
does, we could conceivably use that to generate text under image.
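
To make the position requirement concrete, here is a rough sketch of
turning per-word positions into PDF text operators drawn in invisible
mode (text rendering mode 3), which is what would sit under the page
image. The WordBox record, the pixel conventions, the 300 dpi default
and the /F1 font name are all assumptions for illustration; none of
this is the output format of Transym or any other OCR engine, and the
page would still need a font resource named /F1 and the image itself.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    # Hypothetical record for one recognized word (not a real OCR format).
    text: str       # the recognized word
    left: int       # pixels from the left edge of the scan
    baseline: int   # pixels from the TOP edge of the scan to the baseline
    height: int     # approximate glyph height in pixels


def escape_pdf_string(s: str) -> str:
    # Backslash and parentheses must be escaped in a PDF literal string.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_ops(words, page_height_px, dpi=300):
    """Build a PDF content-stream fragment that places each word invisibly.

    Scans measure from the top-left in pixels; PDF measures from the
    bottom-left in points (1/72 inch), so y is flipped and scaled.
    """
    scale = 72.0 / dpi
    ops = ["BT", "3 Tr"]  # begin text; mode 3 = neither fill nor stroke
    for w in words:
        x = w.left * scale
        y = (page_height_px - w.baseline) * scale
        size = w.height * scale
        ops.append(f"/F1 {size:.2f} Tf")             # assumed font name /F1
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")    # move to word position
        ops.append(f"({escape_pdf_string(w.text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


if __name__ == "__main__":
    sample = [WordBox("RAINBOW", left=300, baseline=600, height=50)]
    print(hidden_text_ops(sample, page_height_px=3300))
```

Without the left/baseline numbers there is nothing to feed into the Tm
line, which is why the OCR engine's output format matters so much here.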

PDF generation libraries like Panda seem pretty powerful.

However, I think this would be too much work. If it's a priority to
do the text under image thing, I'd say we could buy a few copies of a
previous rev of ABBYY, as Neil suggests, and use them to do the OCR and
PDF generation simultaneously. I'd guess that when one person gets done
with OCR they can send their copy of ABBYY on to the next person doing
scanning work. When we're done with the project we can sell the unneeded
copies of ABBYY back on eBay or keep them.

Alternatively, to keep us scalable, I'd suggest just going with the
flat text file approach. Not as exciting as "select box" grabbing of
text, but certainly workable, cheap and efficient.

So one way or another I think OCR can happen; the question is just how
integrated we want to make the text.

-- John.






