[Coco] CoCo bitbanger does 115200 bps!

Boisy Pitre boisy at boisypitre.com
Wed Apr 5 09:30:05 EDT 2006


On Apr 4, 2006, at 11:41 PM, George's Coco Address wrote:

> I'm a bit ignorant about just how fast the coco can process data.  
> So, please be kind to me when I present my thoughts on this..
>
> If we were to process 115,200 bits per second through the bitbanger  
> port, how many clock cycles and instructions per bit, would this  
> require?
> I pushed this into my calculator and found that each bit from the  
> bitbanger port lasts about fifteen and a half clock cycles when the  
> CoCo is running at 1.78 MHz. Machine code instructions take  
> several clock cycles. Don't ask me how many, but I know it isn't  
> one instruction per cycle.

Your calculations are correct, as is your observation about cycle  
count: every instruction on the CoCo takes at least 2 clock cycles,  
unless you're running a 6309 in native mode where some instructions  
are just 1 cycle.
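
As a quick sanity check on that arithmetic, here's a short Python sketch (assuming the commonly cited 1.7897725 MHz double-speed clock of the CoCo 3):

```python
# Per-bit cycle budget when bit-banging 115,200 bps on a CoCo 3.
# 1.7897725 MHz is the commonly cited double-speed clock rate; treat
# the exact figure as an assumption here.
CPU_HZ = 1_789_772.5
BAUD = 115_200

cycles_per_bit = CPU_HZ / BAUD
print(f"{cycles_per_bit:.2f} cycles per bit")                # ~15.54
print(f"{cycles_per_bit * 10:.0f} cycles per 10-bit frame")  # start + 8 data + stop
```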

The code below could be optimized for size, and there are certainly  
variations on how this could be done, but it amply demonstrates how a  
CoCo 3 could receive a byte from a sender at 115,200 bps (cycle  
counts for each instruction are in parentheses):

* Condition code mask bits for orcc (CC register: FIRQ = $40, IRQ = $10)
IRQ      equ   $10
FIRQ     equ   $40

* Mask interrupts
          orcc  #IRQ+FIRQ   (3)
* Wait for an incoming start bit
WaiStart  lda   >$FF20+2    (5) get byte that holds bit banger bit
          lsra              (2) shift data bit 0 into carry
          bcs   WaiStart    (3) branch back while bit 0 is still 1 (line idle)

          clra              (2)
          nop               (2)

* Bit 0
          ldb   >$FF20+2    (5) get PIA byte into B...
          lsrb              (2) shift bit 0 into carry...
          rora              (2) and roll Carry into hi bit of A
          nop               (2)
          nop               (2)
          nop               (2)

* Bit 1
          ldb   >$FF20+2    (5) get PIA byte into B...
          lsrb              (2) shift bit 0 into carry...
          rora              (2) and roll Carry into hi bit of A
          nop               (2)
          nop               (2)
          nop               (2)

* Bit 2
          ldb   >$FF20+2    (5) get PIA byte into B...
          lsrb              (2) shift bit 0 into carry...
          rora              (2) and roll Carry into hi bit of A
          nop               (2)
          nop               (2)
          nop               (2)

* Bit 3
          ldb   >$FF20+2    (5) get PIA byte into B...
          lsrb              (2) shift bit 0 into carry...
          rora              (2) and roll Carry into hi bit of A
          nop               (2)
          nop               (2)
          nop               (2)

* Bit 4
          ldb   >$FF20+2    (5) get PIA byte into B...
          lsrb              (2) shift bit 0 into carry...
          rora              (2) and roll Carry into hi bit of A
          nop               (2)
          nop               (2) additional NOP to catch up on the 4 x .5 cycles accrued so far
          nop               (2)
          nop               (2)

* Bit 5
          ldb   >$FF20+2    (5) get PIA byte into B...
          lsrb              (2) shift bit 0 into carry...
          rora              (2) and roll Carry into hi bit of A
          nop               (2)
          nop               (2)
          nop               (2)

* Bit 6
          ldb   >$FF20+2    (5) get PIA byte into B...
          lsrb              (2) shift bit 0 into carry...
          rora              (2) and roll Carry into hi bit of A
          nop               (2)
          nop               (2)
          nop               (2)

* Bit 7
          ldb   >$FF20+2    (5) get PIA byte into B...
          lsrb              (2) shift bit 0 into carry...
          rora              (2) and roll Carry into hi bit of A
          rts               (5)

The goal is to perform the nine reads of >$FF20+2 (the initial "lda"  
plus the eight "ldb" instructions) as close to the middle of each bit  
cell as possible.  The loop that waits for the start bit is as small  
as I can possibly imagine and is 10 cycles in length.  The bit is  
actually sampled by the lda instruction around cycle 4.  The worst  
case, therefore, is that the start bit arrives at cycle 5 (right  
after we sampled), so we wait 10 more cycles to see it plus 6 more  
cycles to clear the final cycle of the lda and the lsra/bcs  
instructions, for a total worst case of 16 cycles.  The best case is  
that the bit arrives right when we check, meaning we stay in the loop  
for only 6 cycles.  That gives the average case: (16 + 6)/2 = 11  
cycles.  To get up to the magic 15.5, I've added 4 more cycles (the  
clra and a nop) after the bcs.  Additional tweaking by adding or  
removing nops may be necessary.
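
The best/worst/average arithmetic can be restated as a tiny model (the cycle figures are the ones given in the text):

```python
# Model of the start-bit detection timing described above, in CPU cycles.
# The polling loop is lda (5) + lsra (2) + bcs (3) = 10 cycles, and the
# lda actually samples the port around its 4th cycle.
BEST = 6        # start bit arrives just as we sample: only the loop's tail runs
WORST = 16      # it arrives right after the sample: one extra trip round the loop
POST_LOOP = 4   # clra (2) + nop (2), executed once the bcs falls through

average = (BEST + WORST) / 2
print(average)                # 11.0 cycles average detection latency
print(average + POST_LOOP)    # 15.0 -- just under the ~15.5-cycle bit time
```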

Things get a little more interesting if we want to implement a  
timeout.  Additional code in the WaiStart portion would be necessary,  
adding to the best/worst/average-case timings.  Also, this code  
doesn't take into account whatever code would normally 'bsr' into  
this routine to get a byte.  There are about 18 additional cycles  
that we can use up after the 'rts', since we aren't waiting around  
for the stop bit.  That's enough time to store the data byte in A  
somewhere and call WaiStart again to receive the next byte (assuming  
the sending ACIA is going at full speed).
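
Where that "about 18 cycles" of slack comes from can be sketched as follows (assuming the last data bit is sampled mid-cell and the stop bit is simply skipped):

```python
# Slack available after the rts, in CPU cycles (sketch). The final ldb
# samples bit 7 mid-cell, so roughly half of bit 7 plus the entire stop
# bit elapse before the next start bit can begin; the rts (5) eats some
# of that. The ~15.54-cycle bit time is an assumption from the clock rate.
BIT_TIME = 1_789_772.5 / 115_200   # ~15.54 cycles per bit
RTS = 5

slack = BIT_TIME / 2 + BIT_TIME - RTS   # half of bit 7 + stop bit - rts
print(f"{slack:.1f} cycles free after the rts")   # ~18.3
```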

Code to send out a byte at 115,200 bps from the CoCo 3 would look  
similar, except that the CoCo would be in control of the timing at  
that point, not the other end.  So while timing is critical there  
too, there is no start bit to detect, since the CoCo generates it.

I believe this is close to the approach Roger took with his code.   
Roger?

> That being asked, does a "stock" coco at 1.78mhz have time to do  
> this and still be able to store data onto a storage device? Or is  
> this just displaying it to the VDG generator?

There's enough time to grab a byte and store it.  The code is  
idealized, however.  In a normal situation you want some type of  
timeout, so that if the start bit isn't detected after a certain  
amount of time, the routine exits.  This is especially true under  
OS-9; it would not be nice to just sit and wait forever.  Adding  
code at WaiStart to handle a timeout would push the average case  
computed above outside of the critical 15.5-cycle budget.  It  
becomes a hit-or-miss proposition at that point.

> Can this be useable?

It depends upon the design of the protocol.  The challenge I had when  
developing DriveWire was balancing speed with reliability.  At  
57,600 bps, errors can occur occasionally; that's why DriveWire uses  
a checksum on larger data packets to provide some degree of data  
integrity.
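
The post doesn't spell out DriveWire's actual checksum algorithm, but the idea can be illustrated with a plain additive checksum over a packet (everything here is illustrative, not the real DriveWire format):

```python
def simple_checksum(packet: bytes) -> int:
    """16-bit additive checksum over a data packet -- purely illustrative;
    the actual DriveWire checksum algorithm is not specified in this post."""
    return sum(packet) & 0xFFFF

# Sender appends the checksum to each packet; the receiver recomputes it
# and requests a resend on mismatch.
sector = bytes(range(256))
print(hex(simple_checksum(sector)))   # 0x7f80
```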

> If so, I bet when other folks jump in and expand on it, this could  
> be a windfall for OS-9.

When I developed DriveWire some three years ago, I analyzed  
communications behavior at 115,200 on the bitbanger with code similar  
to the above.  Given the requirements that I placed on the desired  
product, going that fast was just too problematic.  For a disk access  
product like DriveWire, reliability must trump speed or else data  
corruption will become a reality.

For a DLOAD-type protocol like the one Roger has talked about,  
115,200 bps could be workable.  With DLOAD, no timeout is necessary,  
so timeout code is not an issue.  A simple protocol could be  
developed in which the sender sends out bursts of data of a  
predetermined size that the CoCo then consumes.  Some form of  
bi-directional communication would need to be hashed out, and I  
think that's the next challenge.
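
One way such burst framing could look, sketched in Python (all names, the block size, and the 1-byte checksum are hypothetical choices, not part of any real protocol):

```python
# Hypothetical burst framing for a DLOAD-style transfer: the sender splits
# the payload into fixed-size blocks and appends a 1-byte additive checksum
# to each; the receiver verifies each block and would ACK/NAK over the same
# line. BLOCK_SIZE and the checksum are illustrative assumptions.
BLOCK_SIZE = 128

def frame_bursts(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield fixed-size blocks, each followed by a 1-byte additive checksum."""
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        yield block + bytes([sum(block) & 0xFF])

payload = bytes(300)
frames = list(frame_bursts(payload))
print(len(frames), [len(f) for f in frames])   # 3 [129, 129, 45]
```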

Boisy


