[Coco] 10 years of Delphi posts (1985-1995)

Roger Taylor operator at coco3.com
Sat Oct 3 23:43:56 EDT 2009


At 10:14 PM 10/3/2009, you wrote:
>If you want a second set of eyes or help on the messages, feel free 
>to send me the Delphi message files and I can try to relink the 
>messages back into coherent threads.


I think the debate is whether to use subject matching or ID linking 
to reduce the "look" of broken threads.  ID linking is the most 
accurate method because all replies definitely get linked to their 
true parents, but then you see broken threads wherever a parent is 
missing.  With subject matching you're filtering out all but the 
original subject text (and obviously trimming it and using 
case-insensitive comparison) and joining all like subjects.  This 
has the side effect of merging threads that might have the same 
subject line but aren't really related.  I've toyed with all methods 
so far.  The Delphi archives definitely already have thousands of 
messages missing, but then we're talking about almost 23,000 
messages and I'm not even done yet with Delphi.  I'd say that at 
least 5% of the messages point to a missing parent.  I'm not sure 
what kind of help would be best unless we create prefab messages 
for the missing ones.  That's a lot of fake messages to create.
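
As a rough sketch only (not the code actually used on the archive), 
the two approaches could be combined in Python: link by ID when the 
parent exists, and fall back to the most recent message with the same 
filtered subject when it doesn't, as in the backwards search described 
in the quoted message below.  The dict keys ('id', 'parent', 
'subject', 'date'), the function names, and the "(Re: #####)" subject 
pattern are all assumptions made for illustration; the real Delphi 
captures may be laid out differently.

```python
import re
from datetime import datetime, timedelta

# Assumed reply markers: a "(Re: 12345)" tag anywhere, or leading "Re:" prefixes.
RE_TAG = re.compile(r"\(\s*re:\s*\d+\s*\)|^(?:\s*re:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply markers, collapse whitespace, and lowercase the subject."""
    return " ".join(RE_TAG.sub(" ", subject).split()).lower()


def link_threads(messages, window=timedelta(days=7)):
    """Return {message id: parent id, or None for a new post}.

    messages: iterable of dicts with the hypothetical keys
    'id', 'parent' (None if the message is not a reply), 'subject', 'date'.
    """
    ordered = sorted(messages, key=lambda m: m["date"])
    known = {m["id"] for m in ordered}
    last_seen = {}  # normalized subject -> (id, date) of the latest message
    parents = {}
    for m in ordered:
        key = normalize_subject(m["subject"])
        parent = m["parent"]
        if parent is not None and parent not in known:
            # Parent is missing from the archive: search backwards by subject.
            prev = last_seen.get(key)
            if prev is not None and m["date"] - prev[1] <= window:
                parent = prev[0]
            else:
                parent = None  # nothing recent enough, treat as a new post
        parents[m["id"]] = parent
        last_seen[key] = (m["id"], m["date"])
    return parents
```

Because the subject fallback only runs when the parent ID is missing, 
replies with a surviving parent keep their exact link, and unrelated 
threads that share a subject line can only get merged in the 
missing-parent case.  The one-week window is just a parameter here 
and would need tuning against the real data.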







>--------------------------------------------------
>From: "Roger Taylor" <operator at coco3.com>
>Sent: Saturday, October 03, 2009 3:35 PM
>To: "CoCoList for Color Computer Enthusiasts" <coco at maltedmedia.com>
>Subject: Re: [Coco] 10 years of Delphi posts (1985-1995)
>
>>At 02:12 AM 10/3/2009, you wrote:
>>>I was looking over the Delphi messages and noticed how fragmented 
>>>the threads are as well. I doubt what you're suggesting would be 
>>>much help because the messages referenced in the Re: tags don't 
>>>exist. In other words, look at the Iron Forest messages and you'll 
>>>see 52635 re: 52438 in one thread and 52594 re: 52578 with a reply 
>>>52813 re: 52594. The problem here is that (as far as I can tell), 
>>>neither 52438 nor 52578 exist in the archive (presumably because 
>>>they were deleted from Delphi before the messages were captured).
>>>
>>>I wonder if it might be better to thread the messages by subject? 
>>>That would probably be the easiest thing to do given the number of 
>>>messages missing from the archive. Another possibility might be to 
>>>try to rebuild the tree by creating dummy placeholders for the 
>>>missing messages, but that would probably require some amount of 
>>>fuzzy matching since I'm sure some threads have intermediate 
>>>linking messages missing as well. In other words, A->deleted->C is 
>>>likely fairly trivial to relink into a coherent thread, but 
>>>A->deleted->deleted->D is a little trickier.
>>
>>I might try using a backwards search for the parent message if the 
>>parent ID doesn't actually exist. This is where I can just filter 
>>the subject line, removing the (Re: #####) part and looking for the 
>>most recent same subject, and assuming it's the parent.  If no 
>>equal subject is found over, say, a week's time, then the message 
>>would become a New Post.
>>
>>--
>>~ Roger Taylor
>>
>>
>>
>>--
>>Coco mailing list
>>Coco at maltedmedia.com
>>http://five.pairlist.net/mailman/listinfo/coco

-- 
~ Roger Taylor




