[Coco] Rainbow archives in DjVu

Jeff Teunissen deek at d2dc.net
Tue Mar 17 03:00:43 EDT 2009


Bob Devries wrote:
> The question in my mind is:
> 
> Do I need to download yet another file viewer to be able to read these 
> files? I've never heard of this file format before.

You would, yes, but this is one viewer you're likely to be using a lot in the
future; now that there's a free software (GPL) version of the viewer and
libraries, the format is being used for all kinds of things.

For example, Google are using it in their project to digitize all the world's
books, and the Internet Archive (<http://www.archive.org/>) are using it to
store public-domain printed works of all kinds, mostly because of the huge
advantages DjVu has over other formats when it comes to scanned texts. The
technology is used to put out many "magazine on disk" collections, like
Rolling Stone's. Mike Haaland's abortive "Rainbow on Disk" project was also
going to use the (semi-proprietary at the time) format.

In particular, PDF is especially lousy for scans. It's great for stuff that's
made of text, but when you're starting out with a picture of a page, PDF might
as well just be a somewhat worse replacement for a .zip file. DjVu lets you do
a lot more.

DjVu lets you split up a page into multiple layers and add invisible text
blocks and hyper-links to what is basically a picture, so you can do nifty
stuff like search for a word or sentence in a scanned document without
changing its form. You can also add links from the table of contents to
the page an article begins on, from one page to another (so you can continue
reading an article that has ads in the middle of it), or from one issue to
another (the indexes in the anniversary issues could link directly to the
articles they reference), all without converting the whole shebang out of the
format we knew and loved. And since DjVu has web browser plug-ins and Java
viewer applets, someone could set up a Web site where people could browse the
whole collection without downloading any huge files. After all, if a full page
is only 200 kilobytes, it may just use less bandwidth that way.
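
To put rough numbers on that bandwidth claim, here's a back-of-the-envelope
sketch in Python. The figures are assumptions, not measurements: decimal units
(1 KB = 1000 bytes), a 200 MB PDF scan per issue, and a DjVu issue at one
tenth of that. The function name is made up for illustration.

```python
# Assumed sizes (decimal units); none of these are measured values.
PAGE_BYTES = 200 * 1000            # one DjVu page served on its own
PDF_ISSUE_BYTES = 200 * 1000 ** 2  # a whole issue as a PDF scan
DJVU_ISSUE_BYTES = PDF_ISSUE_BYTES // 10  # assumed one tenth of the PDF


def break_even_pages(whole_file_bytes, page_bytes=PAGE_BYTES):
    """Number of single-page views whose total transfer equals the whole file."""
    return whole_file_bytes // page_bytes


print(break_even_pages(PDF_ISSUE_BYTES))   # pages viewed before matching the PDF download
print(break_even_pages(DJVU_ISSUE_BYTES))  # pages viewed before matching the DjVu download
```

Under those assumptions, anyone who reads fewer pages than the break-even
count moves less data by browsing page by page than by downloading the file.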

I'll be doing a lot of the work anyway, because I can't in good conscience
keep those giant 200+MB Rainbow scans around. Especially when I can have
almost the same quality in a tenth of the HDD space, with even less time and
RAM needed to display them -- where PDF takes 10 seconds, DjView takes half a
second. My only real question is whether or not anyone else wants them too. :)



