[Coco] Need some advice
Gene Heskett
gheskett at wdtv.com
Sun May 22 06:25:12 EDT 2016
On Sunday 22 May 2016 00:56:45 William Astle wrote:
> Even so, hand written 6809 code is often slightly better than the
> gcc6809 output and occasionally substantially better. That's not
> surprising given the general simplicity of the 6809 itself. It also
> supports your reluctance to wager that the compiler will always be
> better.
My experience is 100% with the Microware compiler, and the 6809's lack
of a barrel shifter is the biggest speed killer because the compiler
writes a left or right shift as a loop, shifting one bit each time thru
the loop. So
back in the day when rzsz was the best way to guarantee error free
transmission, I ran the compiler to the output of c.comp and then looked
at the assembly it generated. Any time I came across one of those loops
for 8 or more shifts, it got replaced by a TFR A,B;CLRA if it was a
right shift, or a TFR B,A;CLRB if it was a left shift. And the crc
routines were loaded with those cases. That alone took rzsz from around
430 CPS (running on a 6309) to a bit over 575. Then Byte magazine had
an article showing how crc's could be done even faster with a lookup
table to do a simple add, claiming it was faster. So that got
incorporated into rzsz-3.3.6, and on a 6309, I consistently got 730 cps.
FWIW rzsz does this crc a byte at a time. I never tried to make the table
lookup method run on 256 bytes at a time, but my tests outside of rzsz
showed a more than tripled speed for that function alone due to the
elimination of all the calls and returns the one byte at a time gave.
But that would have required major surgery to rzsz that I didn't feel I
was capable of. Whether that might have gotten rzsz up to 9600 baud I'm
not sure; as it is, I think I got the best practical speed I could get
out of that code base.
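
For anyone who has not seen the two crc styles side by side, here is a
minimal sketch in C. It is not the rzsz code. It assumes the 16 bit crc
with polynomial 0x1021 and a zero starting value, the one commonly
associated with XMODEM/ZMODEM, and all function names are made up for
illustration. The table version takes a whole buffer per call, which is
the "block at a time" idea described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed polynomial: 0x1021 (CRC-16 as used with XMODEM), initial value 0. */
#define CRC_POLY 0x1021u

/* Advance a 16 bit crc by one bit position. */
static uint16_t crc_step(uint16_t crc)
{
    return (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ CRC_POLY)
                           : (uint16_t)(crc << 1);
}

/* Bit-at-a-time crc: eight shift/test steps for every byte. */
static uint16_t crc16_bitwise(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = crc_step(crc);
    }
    return crc;
}

/* 256 entry table, one precomputed crc per possible byte value. */
static uint16_t crc_table[256];

/* Must be called once before crc16_table_block(). */
static void crc16_init_table(void)
{
    for (int n = 0; n < 256; n++) {
        uint16_t crc = (uint16_t)(n << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = crc_step(crc);
        crc_table[n] = crc;
    }
}

/* Table-driven crc over a whole buffer: one lookup and one XOR per
 * byte, and only one call and return per block. */
static uint16_t crc16_table_block(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)((crc << 8) ^ crc_table[(crc >> 8) ^ buf[i]]);
    return crc;
}
```

Both functions should agree on any input; for the nine ASCII bytes
"123456789" this variant gives 0x31C3.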
I was hoping that somewhere in the 6309, there was a 16 bit barrel
shifter hidden but no one ever found it. Sniff.
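
Without a barrel shifter, the shift-by-8 case is the one that can be had
for free, since it is just a byte move. A rough model in C, treating D
as the pair A:B with A as the high byte. The type and function names are
hypothetical, and it assumes unsigned values, since a signed right shift
would have to fill A with the sign instead of clearing it.

```c
#include <stdint.h>

/* Model of the 6809 D register: A is the high byte, B the low byte. */
typedef struct { uint8_t a; uint8_t b; } reg_d;

static reg_d d_from(uint16_t v)
{
    reg_d d = { (uint8_t)(v >> 8), (uint8_t)(v & 0xFFu) };
    return d;
}

static uint16_t d_value(reg_d d)
{
    return (uint16_t)((d.a << 8) | d.b);
}

/* What a one-bit-per-pass shift loop computes. */
static uint16_t shr_loop(uint16_t v, unsigned count)
{
    while (count--)
        v >>= 1;
    return v;
}

static uint16_t shl_loop(uint16_t v, unsigned count)
{
    while (count--)
        v = (uint16_t)(v << 1);
    return v;
}

/* Right shift by 8 as a byte move: TFR A,B ; CLRA */
static reg_d shr8(reg_d d)
{
    d.b = d.a;
    d.a = 0;
    return d;
}

/* Left shift by 8 as a byte move: TFR B,A ; CLRB */
static reg_d shl8(reg_d d)
{
    d.a = d.b;
    d.b = 0;
    return d;
}
```

For a shift count above 8, the remaining count minus 8 would still have
to be done one bit at a time after the byte move.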
Another point was that the compiler used SEX commands like they were
Orville's popcorn. One could go thru the assembly code to see if each
one was needed, or if the extended data was just thrown away. By that
criterion, 90% of them could be removed, and that got a couple
percentage points of speed.
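
A small C illustration of why so many of those sign extensions are dead
weight. This is only a sketch of the general idea, not compiler output,
and the function names are invented. When the result is stored back into
a single byte, the extended high byte never reaches the answer.

```c
#include <stdint.h>

/* Models SEX: sign-extend an 8 bit value into 16 bits. */
static int16_t sex(int8_t b)
{
    return (int16_t)b;
}

/* Both operands are sign-extended to 16 bits, the add is done wide,
 * then only the low byte of the sum is kept. */
static uint8_t add_low_byte_extended(int8_t x, int8_t y)
{
    int16_t wx = sex(x);
    int16_t wy = sex(y);
    return (uint8_t)(wx + wy);
}

/* Same result without any sign extension: the high bytes cannot
 * affect the low byte of a sum. */
static uint8_t add_low_byte_plain(int8_t x, int8_t y)
{
    return (uint8_t)((uint8_t)x + (uint8_t)y);
}

/* A case where the extension IS needed: the full 16 bit value is used. */
static int is_negative_wide(int8_t x)
{
    int16_t wide = sex(x);
    return wide < 0;
}
```

The two add functions agree for every pair of inputs, which is the kind
of check that tells you a given SEX can go.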
Along the line, the compiler showed me a few tricks I have since used in
my own assembly efforts, and I squeezed as much extra speed as I could
out of it by my hand optimizations. Now the pc versions have extended
the block size to 8 kilobytes before they do a crc check. That of
course falls face down in its morning bowl of oatmeal, so you MUST limit
the pc's "window" to 256 bytes, and because the driver code has bit
rotted and lost the ability to swap the ready signals, the pc and the
coco must be held to 4800 baud max because the cts handshaking is broken
and cannot use the "7 wire" protocol for flow control.
It was fun for me at the time, but that time was 20 years ago. Now I am
in the fading years, trying to remember what I had for breakfast, or
even if I had breakfast. :(
Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>