[Coco] disk files and program speed

Mon Nov 23 18:39:33 EST 2009

I have come to the conclusion that fast and disk files do not belong in the same sentence together. "fast disk search" is an oxymoron.

I was going over the post from jdiffendaffer concerning searches. I knew that I was getting the entire record before comparing the fields, and thought, "What if I only get the fields that are to be compared? Surely getting 1 byte and 1 other field for comparison is faster than getting the whole record. Then, if a match is found, I can seek to the correct position and update the reference count." Well, the extra seeks involved in getting 2 fields of a record from a disk file slowed the program down alot. Decoding createmsg went from 2:10 to 2:49.

It looks like the only way to speed the searches and sorts up is to find a way to deal with them in memory, at least the parts that need to be compared. I would use arrays, but there is a fundamental problem. Differing sizes of arrays. One procedure may contain 300 variable references, while another contains over 2000. One may contain no line references while another contains over 200. And the literals? Decode (before the string optimizations) contained 549 unique literal references, and over 1000 total. I wish there was a way to dynamically add to the elements of an array, but there isn't. Because of this, I'd have to establish arrays large enough to handle the largest number of references that could be contained in a procedure.

I have calculated that, even if the arrays only contain the fields for comparison, the variable reference array would have to be 600 minimum elements, each element composed of 1 byte and 1 integer, for a total of 1800 bytes. The line reference array would be structured similarly, but wouldn't need more than 300 elements, since I have yet to see a procedure with more than 150 line references, but that may not be the max I could run into. So, that's another 900 bytes. The literals references would need 2 bytes, 1 integer, 1 real and a string of 80 per element, and (based on decode) I would need at least 600 elements. That's 89 bytes times 600 elements, for a total of 53,400 bytes.

Add all three together, and you get a total size, just for the arrays, of 56,100 bytes of data space. I don't think OS9 or NitrOS9 will allow data of that size in memory.

You may ask, why do you need to count all those literals anyway? I knew, back when I wrote DCom, that I needed to find a way to correlate variables passed as parameters to the procedures they are passed to. While this is not necessary to the idea of reproducing the source code of any one procedure, it would help with figuring out how your procedures were related.

Named subroutines are the easy ones. The procedure name exists in the subroutine section of the VDT. If you used a string variable to name the procedure to be called, all you get is a reference to a string variable. Without knowing what the name of the procedure is, it's impossible to match parameters between them.

In addition, counting the literals can help show you where you can optimize the literals and reduce code size. While I still haven't gotten to the point where I can say literal X contains the name of the procedure called in RUN statement Y, I am getting closer to that mark.

I can probably put sorting the literals (and dealing with that association) in a different program, but I still have to be able to identify them, and doing it while going through the first pass through the instruction code is the best time. But maybe, for the above reasons, it isn't. Maybe I should put literals on the shelf for now and concentrate on getting decode to correctly identify everything else.

Wayne