[Coco] [Color Computer] .djvu vs .pdf: example files posted

Michael Wayne Harwood michael at musicheadproductions.org
Wed Jul 13 13:00:47 EDT 2005


The OCR was generated using LizardTech's commercial product.  There are
currently no open source "out of the box" methods for OCR generation to the
DjVu format.

The reason that test1 has OCR and test2 does not is that the source for
test1 was a high-resolution file that worked well as OCR input, while the test2
source images were resized to 1/3 of their size to promote better generation
of both .pdf and .djvu.  Interestingly enough, both the .pdf and .djvu end
products from the higher-resolution file were worse than the results after I
resized the source images.

I am playing around with GOCR a little right now (lunch break) and seeing
whether the OCR output is worth the trouble.  If the results are OK and I
can get bounding box information as an output, I might be able to automate an
OCR process that will produce a file I can import into the DjVu files.
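For anyone curious what that import step could look like, here is a minimal sketch. It assumes the OCR tool can report each word as (text, left, top, right, bottom) in pixels measured from the top-left corner, already grouped into lines. That input layout and the function names are hypothetical, not GOCR's actual output format. It also assumes DjVuLibre's djvused hidden-text s-expressions (page, line and word zones, with the origin at the bottom-left of the page), so double-check against the djvused documentation before relying on it.

```python
from typing import List, Tuple

# Hypothetical OCR output: (text, left, top, right, bottom) in pixels,
# measured from the TOP-left corner as OCR tools commonly report boxes.
Word = Tuple[str, int, int, int, int]


def quote(text: str) -> str:
    """Escape backslashes and double quotes for an s-expression string."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def hidden_text(lines: List[List[Word]], width: int, height: int) -> str:
    """Build a page -> line -> word hidden-text s-expression for one page."""
    out = [f"(page 0 0 {width} {height}"]
    for line in lines:
        if not line:
            continue
        # Assumed DjVu convention: origin at the BOTTOM-left, so flip y.
        boxes = [(left, height - bottom, right, height - top)
                 for _, left, top, right, bottom in line]
        # The line zone is the union of its word boxes.
        x0 = min(b[0] for b in boxes)
        y0 = min(b[1] for b in boxes)
        x1 = max(b[2] for b in boxes)
        y1 = max(b[3] for b in boxes)
        out.append(f" (line {x0} {y0} {x1} {y1}")
        for (text, *_), (wx0, wy0, wx1, wy1) in zip(line, boxes):
            out.append(f"  (word {wx0} {wy0} {wx1} {wy1} {quote(text)})")
        out[-1] += ")"  # close the line zone
    out[-1] += ")"  # close the page zone
    return "\n".join(out)


if __name__ == "__main__":
    page = [[("COLOR", 100, 50, 300, 110), ("COMPUTER", 320, 50, 600, 110)]]
    print(hidden_text(page, 850, 1100))
```

The output could be saved to a text file and loaded into one page with something like `djvused test1.djvu -e 'select 1; set-txt page1.txt' -s`. Flipping the y axis is the main gotcha, because most OCR tools count rows from the top of the image.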

Keep the comments coming - good stuff...

Regards,
Michael Harwood


-----Original Message-----
From: coco-bounces at maltedmedia.com [mailto:coco-bounces at maltedmedia.com] On
Behalf Of Mark Anderson
Sent: Wednesday, July 13, 2005 10:23 AM
To: coco
Subject: [Coco] [Color Computer] .djvu vs .pdf: example files posted

Michael, no question --- djvu.  Others seem to be very PDF-happy (and with
good reason, since PDF is the STANDARD), but I like programs to open
immediately when I click on a file.  The DjVu web browser plug-in I have
opens immediately on a double click of a djvu file.  With PDF, there is that
annoying wait.  And my laptop fan kicks on more with PDFs.  Call me picky
and anal, but that's the way I am. But wait...there's more, REAL practical
reasons to follow...

The OCR of test1.djvu is simply amazing. I can actually select text on
the August 1989 COVER!!  Why is the OCR disabled in test2.djvu?  I admit
test2.djvu is slightly cleaner, but test1 is acceptable, especially with
the OCR included!  I was amazed that I can select the text on the cover.
Did the djvu conversion software do that OCR process automatically?  If so,
how cool is that.......

Michael, ironically, I uploaded to your FTP a much better version of that
Aug. 1989 cover weeks ago.  Do you still have it?  Did I send it to you in
an unacceptable format?  Let me know.

You guys have to admit that the file size of test1.djvu with the OCR is mighty
attractive.  It's crisp and colorful enough for me with OCR.  Artifacts in
DjVu are not as annoying as PDF artifacts at low compression values.  Also,
the DjVu haters need to realize that there is an all black-and-white mode you
can select.  Right-click on the DjVu file while it is open and select "display".
There are four modes: color, b&w, background and foreground.  Play with those
while viewing the test1.djvu sample Michael provided on your machine.
Test1.djvu is the only one it works with.  AMAZING.  How does DjVu do that?
I'm able to look at the cover WITHOUT the text, I'm also able to look at
the pages without text, and there's the beautiful all black-and-white mode on top of
it all.  All up to me, the end user viewing the teeny weeny 502KB DjVu file.

Thus endeth the sermon.

Mark
rammdesign at msn.com
ittybittybeadco at msn.com

--
Coco mailing list
Coco at maltedmedia.com
http://five.pairlist.net/mailman/listinfo/coco