[Coco] RainbowArchive . The Rainbow Archive Project

Thu Jun 16 12:29:26 EDT 2005

I would be just as interested in the product if OCR was left out. Half
the fun of receiving Rainbow was keying in the listings :)

Rod

On 6/16/05, Michael Wayne Harwood <michael at musicheadproductions.org> wrote:
> Here is a reply from Lonnie when I asked him for his thoughts in general
> about the project, and if he would be able to provide any insight that
> might be helpful...
> 
>   It depends on how you view the project. The main thing is to get it done
> in a reasonable period of time. Every time you add an element to it, you
> not only increase cost (whether it is volunteers' cost in time or actual
> cost in money for having to buy something or pay someone) but you
> increase the time it takes to have the finished product in hand.
> 
>   Let's just apply it to the OCR question. Certainly, having the whole
> thing in machine-readable format would be great. But remember, when the
> people who sell OmniPro say they have a 99 percent accuracy rate, that
> means that one letter in every 100 is wrong. You not only have to fix
> it, you have to find it first. May the Good Lord help the poor soul who
> would have to proof those pages and pages of listings. We NEVER
> rekeyboarded the listings, we just printed them out from the tapes or
> disks we required the author send us. One character wrong out of every
> 100 in a DATA statement? I am betting the proofing process would double
> the time in getting this project out. Is that acceptable? Is it that
> important to that many people?
> 
> This reply and the constraints surrounding licensing led me to make the
> statement that we should have one person step up and lead the OCR charge
> and come up with an actual baseline we can use to determine how much work
> will need to be done, how feasible it is, how much time would be needed,
> etc.  Once that has been established we would have the data to determine
> whether it's worth it.  A lot of people have been discussing OCR, and some have
> said that without it the product would not be very attractive, so I
> thought that I would immediately have a person step up and say "I'll prove
> this will be workable", but so far that hasn't happened on this list.
> 
> If OCR is important enough someone should step up and volunteer to execute
> a feasibility study and document the process and time it will take for all
> aspects of the project.  There would be two main deliverables:
> 
> 1. Text files of the entire text OCR'd
> 2. A searchable PDF file
> 
> I'll ask again - are there any takers willing to do the study?
> 
> Regards,
> Michael Harwood
> 
> > Sure, something can be done. But constraints are constraints. They will
> > either affect the quality or the cost or timeliness of delivery of the
> > product.
> >
> > Anyway, this does mean that we can't do the kind of OCR that's been
> > described so far. The only way I think that can work cost effectively is
> > if we distribute the work of doing corrections, say one issue for a
> > given person.
> >
> > At least assuming we want a good quality OCR, anyway. If it's good
> > enough to distribute a machine's first approximation of the text, then I
> > think OCR is doable, otherwise, I'm thinking it isn't.
> >
> > Here's the way OCR work on volunteer projects is usually done: one
> > person scans in the work. Then fragments are given to various people to
> > provide corrections. Usually it's a lot of people, since it's a lot of
> > work. In general though that's not a problem for the copyright holder
> > since no one has the whole work... everyone interested enough to
> > volunteer is going to come back and license a copy at the end of the
> > day. Perhaps Lonnie's concerns are more the legal issues of having a lot
> > of claims after the fact on the work. But there's no reason that stuff
> > can't be handled completely by the legal agreement.
> >
> > -- John.
> 
> 
> 
> --
> "The best place to be is here,
>  the best time to be is now."
>              -- Bill and Ted
> 
> 
> --
> Coco mailing list
> Coco at maltedmedia.com
> http://five.pairlist.net/mailman/listinfo/coco
> 
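
For anyone who wants to put rough numbers on the "one letter in every
100" point quoted above, here is a small sketch. It assumes OCR errors
are independent per character, which is a simplification, and the
60-character line length and 100-line listing are made-up figures for
illustration only, not measurements from any Rainbow issue.

```python
def expected_errors(num_chars: int, accuracy: float) -> float:
    """Expected count of misread characters, assuming independent errors."""
    return num_chars * (1.0 - accuracy)


def prob_line_clean(line_length: int, accuracy: float) -> float:
    """Chance that every character on one line is read correctly."""
    return accuracy ** line_length


if __name__ == "__main__":
    accuracy = 0.99      # the vendor figure quoted in the thread
    line_length = 60     # hypothetical average listing line length
    num_lines = 100      # hypothetical listing size

    total_chars = line_length * num_lines
    print("expected wrong characters:", expected_errors(total_chars, accuracy))
    print("chance a single line is clean:", prob_line_clean(line_length, accuracy))
    print("expected lines needing a fix:",
          num_lines * (1.0 - prob_line_clean(line_length, accuracy)))
```

Under those assumptions, close to half the lines in a listing would
contain at least one bad character, which is the proofing burden Lonnie
is describing.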

-- 
                                   _
ASCII ribbon campaign (  )
  - against HTML email  X
                    & vCards / \