[Coco] the final stage of optimization

Aaron Wolfe aawolfe at gmail.com
Sat Nov 21 16:22:01 EST 2009


If you can fit the entire file in a ram disk, that would be an easy
way to speed things up.  The other approach I suggested, keeping just
the keys in ram, is useful when you cannot keep all of the data in ram
for whatever reason.

Basically, if you are sorting or looking for matching records based on
a key, then you only need to keep that key in ram, along with a
pointer that tells the program where to find the rest of the record,
usually a specific offset in a disk file.  This way you can do very
fast sorts and searches while only using the ram taken by the field(s)
you sort or search on, instead of the entire record.  Your memory
structure does not have to mirror the disk structure until you want it
to, so you can do the sorting entirely in ram and only write the
results out to disk at the end.  In other words, you know in advance
where each record needs to be moved to and from, and can do it all in
one pass over the disk file.
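
To make that concrete, here's a rough sketch in C (your program is
Basic09, so take this purely as an illustration of the idea, not
something to type in).  The 64-byte records, 16-byte key, and the
"records.dat" file name are all made up for the example:

/* Rough sketch of a key+offset index.  The 64-byte records, 16-byte
 * key, and "records.dat" file name are assumptions for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RECLEN 64              /* bytes per record on disk (assumed)   */
#define KEYLEN 16              /* bytes of each record used as the key */

struct entry {
    char key[KEYLEN];          /* copy of the key field, kept in ram   */
    long offset;               /* where the full record lives on disk  */
};

static int cmp(const void *a, const void *b)
{
    return memcmp(((const struct entry *)a)->key,
                  ((const struct entry *)b)->key, KEYLEN);
}

int main(void)
{
    FILE *fp = fopen("records.dat", "rb");
    struct entry *idx = NULL;
    char rec[RECLEN];
    long n = 0, i;

    if (fp == NULL)
        return 1;

    /* One pass over the file: keep only key + offset in ram. */
    while (fread(rec, RECLEN, 1, fp) == 1) {
        idx = realloc(idx, (n + 1) * sizeof *idx);
        memcpy(idx[n].key, rec, KEYLEN);
        idx[n].offset = n * (long)RECLEN;
        n++;
    }

    /* Sort the small index in ram, not the records on disk. */
    qsort(idx, n, sizeof *idx, cmp);

    /* idx[i].offset now says where the i-th record in sorted order
     * sits in the file; seek there only when you actually need it. */
    for (i = 0; i < n; i++) {
        fseek(fp, idx[i].offset, SEEK_SET);
        fread(rec, RECLEN, 1, fp);
        /* ... write rec to the output file, process it, etc. ... */
    }

    free(idx);
    fclose(fp);
    return 0;
}

The index costs roughly 20 bytes per record instead of the full 64,
the qsort happens entirely in ram, and the disk only gets touched to
build the index and to pull the records back out in sorted order.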

How you store the keys and pointers can have a big effect on how fast
you can do sorts, inserts, and so on.  As others have mentioned, there
are lots of different strategies.  A simple linked list can be quite
fast for inserts and searching in small sets of data.  For larger sets
you can look at btrees and hashes.  These are just more efficient ways
of implementing a list, taking advantage of clever structure to make
searching, sorting, inserting, or whatever else you need more
efficient.  They all have advantages and disadvantages depending on
what you are trying to optimize the code to do.
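
As a tiny example of the simplest option, a sorted linked list of keys
might look something like this in C (again just a sketch, with a
16-byte key assumed):

/* A sorted singly linked list of key/offset pairs.  Fine for small
 * sets; insert and search are both linear scans in the worst case,
 * which is exactly what btrees and hash tables improve on.
 * The 16-byte key size is an assumption. */
#include <stdlib.h>
#include <string.h>

#define KEYLEN 16

struct node {
    char key[KEYLEN];      /* key field kept in ram                */
    long offset;           /* position of the full record on disk  */
    struct node *next;
};

/* Insert a key in sorted position; returns the (possibly new) head. */
struct node *insert(struct node *head, const char *key, long offset)
{
    struct node *n = malloc(sizeof *n);
    struct node **pp = &head;

    memcpy(n->key, key, KEYLEN);
    n->offset = offset;

    /* Walk until we hit the first node that sorts after the new key. */
    while (*pp != NULL && memcmp((*pp)->key, key, KEYLEN) < 0)
        pp = &(*pp)->next;

    n->next = *pp;
    *pp = n;
    return head;
}

/* Linear search for an exact key match; NULL if not found. */
struct node *find(struct node *head, const char *key)
{
    while (head != NULL && memcmp(head->key, key, KEYLEN) != 0)
        head = head->next;
    return head;
}

Insert and search here are both straight scans, which is fine for a
few hundred keys; a btree or hash replaces that scan with something
much cheaper once the set gets big.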

Googling for "linked list", "binary tree", "hash table", etc. will
probably turn up some good resources on these.  I have a great book on
data structures somewhere; if I can find the title I will let you know.

-Aaron


On 11/20/09, Wayne Campbell <asa.rand at yahoo.com> wrote:
> It would be nice to keep the data in ram. I have been toying with the idea of trying to use GFX2's buffer commands to see if I can use them to store the records in memory and work on them there. Then I would only need to create the file when it was ready to be saved. I'm really uneducated in terms of things like indexing and linked-lists and the like. If you wouldn't mind, could you explain the index with keys and record positions idea? It sounds interesting.
>
> Wayne
>
>
>
>
> ________________________________
> From: Aaron Wolfe <aawolfe at gmail.com>
> To: CoCoList for Color Computer Enthusiasts <coco at maltedmedia.com>
> Sent: Fri, November 20, 2009 8:46:09 AM
> Subject: Re: [Coco] the final stage of optimization
>
> Hi, I don't know your project well enough to be sure, but this seems
> like a situation where keeping the data in ram, or at least an index
> with the keys and record positions, would be worth the space.  Have
> you considered something like that?
>
> -Aaron
>
>
> On Fri, Nov 20, 2009 at 2:33 AM, Wayne Campbell <asa.rand at yahoo.com> wrote:
> > I have been aware that the one thing that is slowing decode down more than anything else is having to use disk files to store, search and sort records. This email concerns searches and sorts.
> >
> > I have never really gotten a good understanding of how search and sort algorithms are designed, and how they operate, beyond the least complex. When I wrote DCom, I wrote sort procedures that were recursive. They worked, but I never really understood if they were working efficiently or not. I do know that the sorting operations were faster after they were modified.
> >
> > In decode, I am using a simple search algorithm. Basically, it starts with the first record, and continues reading each succeeding record until either a match is found or the last record is read. The number of records that get added to the file depends on the I-Code being decoded. The more records in the file, the longer the search takes.
> >
> > Three files are created, variables reference, line reference, and literal reference. Decoding createmsg (one of Ron's modules, about 2K in size), the decode takes about 20 minutes. Decoding RConfig (another of Ron's modules, about 16K in size) takes about 40 minutes. Decoding decode (20K) takes 2 hours.
> >
> > I have the same problem with my sort. I'm using a sifter-style sort (for lack of a better term). It starts with the first record, proceeds to the last record, then performs the sort in reverse, starting with the last record and proceeding to the first. Each pass of the routine performs both of these operations. It works, and it was much faster than the top-to-bottom method I first used, which was based on the search routine. However, it takes as long to sort the literals records as it does to decode the instruction section of the I-Code module.
> >
> > There has to be a better way. Can someone help me understand how to make my search and sort algorithms faster?
> >
> > Wayne