[Coco] OT: Archiving books

John R. Hogerhuis jhoger at pobox.com
Wed Jul 7 12:05:59 EDT 2004


On Wed, 2004-07-07 at 08:34, Bootstrap Bill wrote:
> What is the best way to convert books to PDF files? I'd like to create PDF
> files that resemble the original books as much as possible, but with
> searchable text, not just a JPG scan of the original pages.
> 
> I'd like something that can work fairly fast with little input from the
> user, since I will be scanning several thousand pages, possibly more.

There is no such animal.

You can come close though:

Chop off the binding
Feed the pages into a sheet-fed document scanner
Scan all pages into the computer
OCR all pages
Go through each OCRed page and fix all the mistakes; there will be a
lot.
Get all the illustrations/graphics inserted the way you want
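
The proofreading step is the slow one, so it may help to have a script
point out the likeliest OCR misreads first. A rough sketch in Python;
the function name and both heuristics are assumptions made up for
illustration, not part of any OCR package, and a human still has to
read every page:

```python
# Hypothetical helper: flag tokens that often indicate OCR misreads.
# The heuristics below are illustrative assumptions, not a complete checker.

ODD_CHARS = set("|~^`\\")  # characters that rarely appear in ordinary book text


def looks_suspect(word):
    """True if the word mixes letters with digits or contains an odd character."""
    has_alpha = any(ch.isalpha() for ch in word)
    has_digit = any(ch.isdigit() for ch in word)
    has_odd = any(ch in ODD_CHARS for ch in word)
    return (has_alpha and has_digit) or has_odd


def flag_suspects(page_text):
    """Return (line_number, word) pairs worth a human look."""
    suspects = []
    for lineno, line in enumerate(page_text.splitlines(), start=1):
        for token in line.split():
            # Drop surrounding punctuation so "word." and "word" are treated alike.
            word = token.strip(".,;:!?\"'()")
            if word and looks_suspect(word):
                suspects.append((lineno, word))
    return suspects


if __name__ == "__main__":
    sample = "The f1rst page was\nscanned c|early enough."
    for lineno, word in flag_suspects(sample):
        print(lineno, word)
```

This will also flag legitimate strings such as model numbers (WP-2, for
example), so treat the output as a reading list, not a list of errors.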

If you want *nice* formatting you may want to convert the text/graphics
to LaTeX format and generate your final PDFs or whatever from that. This
is what my team is doing with Thinking Forth.
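
If you go the LaTeX route, the proofread text has to be escaped before
LaTeX will accept it. A minimal sketch in Python of what that
conversion might look like; the function names and the one-figure-per-
page layout are assumptions for illustration, not a description of the
Thinking Forth tooling:

```python
# Characters LaTeX treats specially, mapped to safe replacements.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape special characters one at a time, so nothing is escaped twice."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def page_to_latex(text, figure=None):
    """Turn one proofread page into a LaTeX fragment.

    Blank lines separate paragraphs. `figure` is an optional image
    filename (hypothetical); using it requires the graphicx package
    in the document preamble.
    """
    # Rejoin hard-wrapped lines within each paragraph into a single line.
    paragraphs = [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]
    parts = [escape_latex(p) for p in paragraphs]
    if figure:
        parts.append(r"\includegraphics[width=\linewidth]{%s}" % figure)
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(page_to_latex("Cost: 50% of R&D\nbudget.\n\nNext paragraph.", figure="fig1.png"))
```

The fragment still needs to go inside a document with a preamble, and
hyphenated line breaks from the scan would need handling separately.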

The second easiest thing is probably scanning straight to PDF and
forgetting about searchable text. I did that with all free tools
(OpenOffice + XSane is good enough to generate PDFs), but it wasn't the
easiest process ever. I archived the Tandy WP-2 manual, if you see it
around...

-- John.



