[Coco] [Color Computer] .djvu vs .pdf: example files posted

Michael Wayne Harwood michael at musicheadproductions.org
Thu Jul 14 07:48:00 EDT 2005


You are correct that bigger images create more fertile ground for
excellent quality OCR, but generating documents that can pass through an
OCR application is not my first priority.  Due to licensing restrictions,
the first revision of the end product needs to have as close to 100% of
the desired features as possible, since a second revision or "upgrade"
would require paying the $36 in license fees again.  This is the priority
list I have:

1. Document files (.pdf or .djvu) with page scans
2. Searchable and linked indexes
3. Data (Rainbow on Disk and Rainbow on Tape)
4. Raw text from "one pass" OCR
5. Searchable documents (.pdf or .djvu)

The Lizard Tech OCR does an excellent job - it uses the ABBYY engine, I
believe.  I think it matches word bounding boxes much better than the
OCR packages I have seen that produce .pdf output, but I am somewhat of a
perfectionist: I want the bounding boxes to be as picture perfect as
possible.  If by some off chance the community were willing to live with
.djvu as "the" end format, then the $400 investment in Lizard Tech
software would be worth it.

Also a correction - in the .txt attachments I sent out I mistakenly
labeled the .txt files as July instead of August...  Rest assured that I
will do a MUCH better job of proof-reading before the final product gets
sent out!!

;)

Regards,
Michael Harwood


> I honestly can't see any difference in the graphics between the three
> copies.  test1.djvu clearly has better quality text - but then that's what
> you'd expect after running it through OCR.  Without an OCR'd copy on PDF,
> we can't make a proper comparison for test1, but I suspect that the
> differences would likely be similarly negligible.
>
> Since you say you're planning on using image files instead of OCR (unless
> I totally misread you), then it seems the PDF slightly edges out DJVU.
> But then, I don't think a filesize difference of only ~5% is really
> enough to force a decision.  Although, won't the images have to be much
> larger in order to do OCR later?  (Sorry if that's already been resolved,
> I haven't really been following these threads too closely.)
>
> (On a point unrelated to quality...  The DJVU viewer for Windows comes in
> a single, self-contained executable weighing in at a mere 484 kB, which
> I find *VERY* appealing.  The only minor problem there was actually
> locating the downloadable, but I have it now, so I shouldn't have to mess
> around with that again.  :) )
>
> --Rob
>
>



