[Coco] Test OCR

Luis Fernández luis46coco at hotmail.com
Thu Apr 28 14:29:37 EDT 2011


OK, but
the file size is 6.6 MB, not 20 or 40,
and it is automatic.

It is without retouching; I did nothing to it by hand (only the first 20 pages).

The text pages come out perfect.

What I am producing now, slowly, is very similar to the original; the biggest problem now is the lines and the black backgrounds.

But that aside,

if I do it with Rainbow, on magazines not processed by Tim, the OCR is almost perfect.

Maybe Tim could simply place the text, and the CoCo people who help would have less work.

Statistically, an article would have 5 errors per page without corrections, and it can process multiple DIIA magazines, I think.

Please see page 83 of Hot ...... 1983-Dec retouch.pdf Optimizer & OCR.

It is almost perfect; you just need to align the blocks and proofread it. This text would be useful to Tim, no?

> Date: Thu, 28 Apr 2011 02:13:45 -0400
> From: aawolfe at gmail.com
> To: coco at maltedmedia.com
> Subject: Re: [Coco] Test OCR
> 
> 2011/4/28 Luis Fernández <luis46coco at hotmail.com>:
> >
> 
> > Aaron, please listen to me. I want to help. Please compare with cococoding; I'm not competing, I just want the best for the CoCo community, and I made an advance interlock
> >
> 
> No problem.
> 
> I think Tim's project is the best way to preserve the Rainbow and
> other mags like the Hot Coco in the long term.  I think it is the only
> way to accurately translate the printed material into digital text.
> Unfortunately, it is a huge amount of work, as you are finding out
> with your experiments.  Tim's project helps tremendously by splitting
> this work up and letting everyone easily spend a little time helping.
> I think it's great.
> 
> CoCoCoding is a different approach, with different goals and different
> results :)  The processing I do on the mags is automated, which puts
> it in the realm of what a single person can do with a little spare
> time here and there.  The OCR results are much inferior to Tim's
> project, with most OCR being so poor that you cannot copy and paste
> text without manually editing it.  The goal was never to provide what
> Tim's project provides (perfect digital recreation).   Instead, I
> wanted to reduce the file sizes for what we already had and improve
> the human readability by using ClearScan.  The settings I used were
> chosen with these goals in mind, not to produce 100% accurate OCR.
> The OCR is simply good enough that when combined with the Google Docs
> engine running the cococoding site, you can do reasonably accurate
> queries for topics and phrases and see results from lots of mags and
> documentation.  It isn't perfect, but it can often answer a question
> quickly, sort of like Google itself.
> 
> When Tim's rainbow project is complete, there will probably be no need
> for the processed files.  Instead, you can generate perfect PDFs (or
> any other format) from the content stored in Tim's database.
> CoCoCoding is a temporary attempt to improve things until then.
> 
> I hope you can work with Tim and combine efforts, since you both seem
> to want the same results.  I will continue to help on Tim's cocomag
> site and encourage everyone to consider spending a few minutes now and
> then helping out there.
> 
> -Aaron
> 
> --
> Coco mailing list
> Coco at maltedmedia.com
> http://five.pairlist.net/mailman/listinfo/coco

