[Coco] Need some advice

Gene Heskett gheskett at wdtv.com
Sun May 22 06:25:12 EDT 2016

On Sunday 22 May 2016 00:56:45 William Astle wrote:

> Even so, hand written 6809 code is often slightly better than the
> gcc6809 output and occasionally substantially better. That's not
> surprising given the general simplicity of the 6809 itself. It also
> supports your reluctance to wager that the compiler will always be
> better.

My experience is 100% with the Microware compiler, and the 6809's lack of 
a barrel shifter is the biggest speed killer because the compiler writes 
a left or right shift as a loop, shifting one bit each time thru the 
loop.  So 
back in the day when rzsz was the best way to guarantee error free 
transmission, I ran the compiler to the output of c.comp and then looked 
at the assembly it generated.  Any time I came across one of those loops 
for 8 or more shifts, it got replaced by a TFR A,B;CLRA if it was a 
right shift, or a TFR B,A;CLRB if it was a left shift. And the crc 
routines were loaded with those cases.  That alone took rzsz from around 
430 CPS (running on a 6309) to a bit over 575.  Then Byte magazine had 
an article showing how crc's could be done even faster with a lookup 
table to do a simple add, claiming it was faster.  So that got 
incorporated into rzsz-3.3.6, and on a 6309, I consistently got 730 cps.
FWIW rzsz does this crc a byte at a time.  I never tried to make the table 
lookup method run on 256 bytes at a time, but my tests outside of rzsz 
showed a more than tripled speed for that function alone due to the 
elimination of all the calls and returns the one byte at a time gave.  
But that would have required major surgery to rzsz that I didn't feel I 
was capable of. Whether that might have gotten rzsz up to 9600 baud I'm 
not sure. As it is, I think I got the best practical speed I could get 
out of that code base. 
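
For readers who want to see the ideas above side by side, here is a minimal C sketch. It is not the rzsz source: the function names are made up for illustration, and the polynomial 0x1021 with a zero starting value (the common XMODEM-style CRC-16) is an assumption. crc16_bitwise shifts one bit per loop pass, crc16_update does one table lookup per byte (combined with XOR in this sketch), and crc16_block runs the same lookup across a whole buffer so the per-byte call and return go away.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed polynomial: x^16 + x^12 + x^5 + 1 (0x1021), processed MSB first. */
#define CRC16_POLY 0x1021u

static uint16_t crc_table[256];

/* Reference version: one single-bit shift per loop pass, 8 passes per byte. */
uint16_t crc16_bitwise(uint16_t crc, const uint8_t *buf, size_t len)
{
    while (len--) {
        crc ^= (uint16_t)(*buf++ << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ CRC16_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Fill the 256-entry table by running the bitwise version on each byte value. */
void crc16_init(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t byte = (uint8_t)v;
        crc_table[v] = crc16_bitwise(0, &byte, 1);
    }
}

/* Table lookup, one byte per call. */
uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    uint8_t  hi = (uint8_t)(crc >> 8);    /* right shift by 8: a byte move, like TFR A,B;CLRA */
    uint16_t lo = (uint16_t)(crc << 8);   /* left shift by 8: a byte move, like TFR B,A;CLRB */
    return (uint16_t)(crc_table[hi ^ byte] ^ lo);
}

/* Same lookup over a whole buffer: no call and return per byte. */
uint16_t crc16_block(uint16_t crc, const uint8_t *buf, size_t len)
{
    while (len--)
        crc = (uint16_t)(crc_table[(crc >> 8) ^ *buf++] ^ (crc << 8));
    return crc;
}
```

Call crc16_init() once before using crc16_update or crc16_block. With these assumed parameters, all three routines give 0x31C3 for the nine ASCII characters 123456789.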

I was hoping that somewhere in the 6309, there was a 16 bit barrel 
shifter hidden but no one ever found it.  Sniff.

Another point was that the compiler used SEX commands like they were 
Orville's popcorn.  One could go thru the assembly code to see whether 
each one was needed, or whether the extended data was just thrown away. 
By that criterion, 90% of them could be removed, and that got a couple 
percentage points of speed.
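
As a rough illustration of when a sign extension is dead weight, here is a small C sketch. The function names are hypothetical, and it only models what the 6809 SEX instruction does (copy bit 7 of B into every bit of A); it is not actual compiler output. When the 16-bit result is cut back to 8 bits before it is stored, the extended high byte never reaches the result, so that SEX could be dropped. When the full 16 bits are kept, it is needed.

```c
#include <stdint.h>

/* Model of SEX: widen a signed 8-bit value (B) to 16 bits (D). */
int16_t sex(int8_t b)
{
    return (int16_t)b;
}

/* Widen both operands, add, then keep only the low byte. */
uint8_t add_with_sex(int8_t a, int8_t b)
{
    int16_t wide = (int16_t)(sex(a) + sex(b));
    return (uint8_t)wide;              /* high byte is thrown away here */
}

/* Same low byte computed without any widening. */
uint8_t add_without_sex(int8_t a, int8_t b)
{
    return (uint8_t)((uint8_t)a + (uint8_t)b);
}

/* Here the high byte is kept, so the sign extension is required. */
int16_t widen_and_keep(int8_t a)
{
    return sex(a);
}
```

The first two functions return the same byte for every pair of inputs, which is the case where the extension can be removed by hand.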

Along the line, the compiler showed me a few tricks I have since used in 
my own assembly efforts, and I squeezed as much extra speed as I could 
out of it by my hand optimizations.  Now the pc versions have extended 
the block size to 8 kilobytes before they do a crc check.  That of 
course falls face down in its morning bowl of oatmeal, so you MUST limit 
the pc's "window" to 256 bytes, and because the driver code has bit 
rotted and lost the ability to swap the ready signals, the pc and the 
coco must be held to 4800 baud max because the cts handshaking is broken 
and cannot use the "7 wire" protocol for flow control.

It was fun for me at the time, but that time was 20 years ago.  Now I am 
in the fading years, trying to remember what I had for breakfast, or 
even if I had breakfast. :(

Cheers, Gene Heskett
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>
