[Coco] Optimizing BASIC (was: Expanding 4K Coco Mem)

Allen Huffman alsplace at pobox.com
Fri Jan 2 10:02:46 EST 2015


WARNING: Long post about BASIC stuff (I will be taking this and making it an article on my site, I think)....

> On Jan 2, 2015, at 12:26 AM, Arthur Flexser <flexser at fiu.edu> wrote:
> 
> A little googling identified the utility I was referring to as "The
> Stripper" from Eigen Systems.  Apparently, there was also a similar program
> by Bob van der Poel called "Packer".  Maybe one or the other is available
> online someplace.
> These should be useful for those finding tight memory for Basic programs in
> the 4K contest.

Art, I recall "The Stripper" standing out because of its name, but I never say it. I did, however, get a cool repack program from Carl England. I used it on one of my programs at Sub-Etha Software. Soon I will get to it and post more about it. I do not know how it compares with The Stripper or Packer, but beyond true optimization of the code, it did about all you could do. Here are the tricks he used which produce the smallest and fastest version of the same program, but render it (potentially) un-editable and hard to read.

NOTE: Code can be written for size or speed. Carl's program optimized for size which in some cases made it faster since BASIC had less to parse, but as you will see, some optimizations might increase execution time making it slower. Maybe.

1. Remove all spaces. The less bytes to parse, the smaller and faster it will run. It has to be smart enough to know when spaces are required, such as "FORA=1TOQ STEP1" when a variable needs s space after it before another keyword.

2. Pack/combine lines where possible. Any line called by a GOTO had to remain at the start, by following lines would be combined up to the max size of a BASIC line. Only when it had to be broken by logic (ELSE, etc.) would it not be combined. For example:

10 FOR I=1 TO 100
20 GOSUB 50
30 NEXT I
40 END
50 REM PRINT SOMETHING
60 PRINT "HELLO WORLD!"
70 RETURN

...would end up as:

10 FORI=1TO100:GOSUB60:NEXTI:END
60 PRINT"HELLO WORLD!":RETURN

3. Remove all REMs. Obvious. I *think* his program would adjust a GOTO/GOSUB that went to a REM line to go to the next line after it, and remove the REM:

10 GOSUB 1000
20 END
1000 REM PRINT SOMETHING
1010 PRINT "HELLO"
1020 RETURN

...would become like:

10 GOSUB1010:END
1010 PRINT"HELLO":RETURN

NOTE: Even without packing, I learned not to GOTO or GOSUB to a REM since it required the interpreter to parse through the line. I would do things like:

10 GOSUB 1010
...
1000 REM MY ROUTINE
1010 PRINT "HERE IS WHERE IT STARTS"
...

Every thing you do like that saves a bit of parsing time since it doesn't have to start parsing 1000 and look up a token then realize it can ignore the rest of the line.

Not bad so far

But that's not all!

The BASIC input buffer is some maximum number of bytes (250?). When you type a line to max, it stops you from typing more. When you hit ENTER, it is tokenized and could be much smaller. For instance, "PRINT" turns in to a one-byte token instead of the five characters. Carl's program would combine and pack the tokens up to the max line size. Thus, it created lines that BASIC could run which were IMPOSSIBLE to enter on the keyboard. If I recall, you could LIST and they would print (?) but if you did an EDIT you were done.

5. Renumber by 1. GOSUB2000 would take up one bye for the token, then four bytes for the line number. If you renumber by 1s, it might make that function be GOSUB582 and save a byte. Multiply that by every use of GOTO/GOSUB/ON GOTO/etc. and you save a bit.

Even without one of these compressors, try a before/after just doing a RENUM 1,1,1. I recently posted an article with a short (27 line) word wrap routine in BASIC. Doing this renum saved 7 bytes.

6. Removing NEXT variables. I am not sure if this is just something I knew, or if his program also did it. If you use FOR/NEXT and it is used normally, you don't need the NEXT variables like this:

10 FOR D=0 TO 3 'DRIVES
20 FOR T=0 TO 34 'TRACKS
30 FOR S=1 TO 18 'SECTORS
40 DSKI$ D,T,S,S1$,S2$
50 NEXT S
60 NEXT T
70 NEXT D

The NEXTs could also be

50 NEXT S,T,D

Or

50 NEXT:NEXT:NEXT

Ignoring spaces, "NEXT:NEXT:NEXT" takes up one less byte than "NEXTS,T,D" (NEXT is turned in to a token, so it is TOKEN COLON TOKEN COLON TOKEN versus TOKEN BYTEVAR COMMA BYTEVAR COMMA BYTEBAR".

This takes up less memory, but there is a speed difference:

10 T=TIMER
20 FOR I=1 TO 10000
30 REM DO NOTHING
40 NEXT I
50 PRINT TIMER-T

Run this and it shows something in the 1181 range (speed varies based on other things BASIC does; pound on keys while it runs and it takes more time, for example). Change the "NEXT I" to "NEXT" and it shows something in the range of 1025. The more FOR variables, the more the savings, too-

10 TM=TIMER
20 FOR D=0 TO 3
30 FOR T=0 TO 34
40 FOR S=1 TO 18
50 REM DO SOMETHING
60 NEXT S
70 NEXT T
80 NEXT D
90 PRINT TIMER-TM

...shows around 345. Changing it to:

60 NEXTS,T,D

...changes it to around 336 - slightly smaller and faster since it is not parsing three different lines.

60 NEXT:NEXT:NEXT

...changes it to around 293! One byte smaller and slightly faster.

AND THERE'S MORE! But this post is already too long.

What I think would be cool is an actual BASIC OPTMIZER that could do some smart things to programs, much like we do with the C language optimizers for asm and C. For example, noticing stupid things:

A$ = B$+"."+"."+C$

to

A$ = B$+".."+C$

And removing NEXT variables, or doing minor restructuring based on what it knows about the interpreter.

More to come...

		-- Allen



More information about the Coco mailing list