[Coco] CoCo bitbanger does 115200 bps !
L. Curtis Boyle
curtisboyle at sasktel.net
Wed Apr 5 18:22:40 EDT 2006
Could you give yourself more cycles by preloading the DP register with
$FF, and then doing ldb <$22 when reading the bits? Would that give you
enough extra time to either bump it up in speed, or add a little code for
reliability testing?
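
Something like this, maybe (untested, and cycle counts from memory):

 lda #$FF (2) point the direct page at $FFxx
 tfr a,dp (6) one-time setup before the receive code
* ...then each sample becomes
 ldb <$22 (4) same PIA read, one cycle cheaper than ldb >$FF20+2

That would buy a cycle back on each of the nine reads, though you'd have
to restore DP before handing control back to the caller.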
On Wed, 05 Apr 2006 07:30:05 -0600, Boisy Pitre <boisy at boisypitre.com>
wrote:
> On Apr 4, 2006, at 11:41 PM, George's Coco Address wrote:
>
>> I'm a bit ignorant about just how fast the CoCo can process data, so
>> please be kind to me when I present my thoughts on this...
>>
>> If we were to process 115,200 bits per second through the bitbanger
>> port, how many clock cycles and instructions per bit would this
>> require?
>> I pushed this into my calculator and found that each bit from the
>> bitbanger port takes about fifteen and a half clock cycles when the
>> CoCo is running at 1.78 MHz. Machine code instructions take several
>> clock cycles each. Don't ask me how many, but I know it isn't one
>> instruction per cycle.
>
> Your calculations are correct, as is your observation about cycle counts:
> every instruction on the CoCo takes at least 2 clock cycles, unless
> you're running a 6309 in native mode, where some instructions take just
> 1 cycle.
>
> The code below could be optimized for size, and there are most certainly
> variations of how this could be done, but it amply demonstrates how a
> CoCo 3 could receive a byte from a sender at 115,200 bps (cycle counts
> for each instruction are in parentheses):
>
> * Mask interrupts
> orcc #IRQ+FIRQ (3)
> * Wait for an incoming start bit
> WaiStart lda >$FF20+2 (5) get byte that holds bit banger bit
> lsra (2) shift data bit 0 into carry
> bcs WaiStart (3) loop while bit 0 is still 1 (line idle)
>
> clra (2)
> nop (2)
>
> * Bit 0
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 1
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 2
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 3
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 4
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2) additional NOP to soak up the 4 x 0.5 cycles of drift so far
> nop (2)
> nop (2)
>
> * Bit 5
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 6
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 7
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> rts (5)
>
> The goal is to perform each of the nine reads of $FF20+2 (the lda plus
> the eight ldb's) as close to the middle of its bit cell as possible.
> The loop that waits for the start bit is as small as I can possibly
> imagine and is 10 cycles in length. The bit is actually sampled by the
> lda instruction around cycle 4. Therefore, the worst case is that the
> start bit arrives at cycle 5 (right after we sampled), so we have to
> wait 10 cycles to see it and 6 more cycles to finish out the lda and
> the lsra/bcs instructions, for a total worst case of 16 cycles. The
> best case is that the bit arrives right when we check, meaning we stay
> in the loop for only 6 cycles. That gives an average case of
> (16 + 6)/2 = 11 cycles. To get up to the magic 15.5, I've added two
> 2-cycle instructions (the clra and a nop) after the bcs. Additional
> tweaking by adding or removing nops may be necessary.
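> 
> For reference, the bit-cell budget (assuming the stock 1.7897725 MHz
> high-speed clock) is 1,789,772 / 115,200 = about 15.5 cycles per bit,
> and each per-bit block above is ldb (5) + lsrb (2) + rora (2) + three
> nops (6) = 15 cycles, with the extra nop at bit 4 soaking up the
> accumulated half cycles.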
>
> Things get a little more interesting if we want to implement a timeout.
> Additional code in the WaiStart portion would be necessary, adding to
> the best/worst/average case timings. Also, this code doesn't take into
> account any additional code that would normally 'bsr' into this code to
> get a byte. There are about 18 additional cycles that we can use up
> after the 'rts' since we aren't waiting around for the stop bit. We've
> got enough time to store the data byte in A somewhere and call WaiStart
> again to recover the next byte (assuming the sending ACIA is going at
> full speed).
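> 
> Something along these lines, for instance (a rough sketch, not tested;
> having X point at a buffer and Y hold the byte count is just my own
> convention here):
> 
> RcvLoop bsr WaiStart (7) grab the next byte into A
> sta ,x+ (6) stash it in the buffer
> leay -1,y (5) one less byte to go
> bne RcvLoop (3) then wait for the next start bit
> 
> That's 21 cycles of overhead counting the bsr, so it would need a
> little shaving (or a cooperative sender) to fit comfortably in the
> available slack.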
>
> Code to send out a byte at 115,200 from the CoCo 3 would look similar,
> except that the CoCo would be in control of the timing at that point,
> not the other end. So while timing is just as critical there, there is
> no start bit to detect, since the CoCo itself generates it.
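> 
> A rough, untested sketch of the transmit side (I'm assuming here that
> a 1 in bit 1 of $FF20 is the mark/idle level, and I'm ignoring the
> fact that storing B this way also zeroes the DAC bits):
> 
> * A = byte to send
> SndByte orcc #IRQ+FIRQ (3) mask interrupts, as on the receive side
> clrb (2)
> stb >$FF20 (5) start bit: drive the line to space
> nop (2) pad the start bit out toward 15.5 cycles
> 
> * Bit 0 (repeat for bits 1-7, with an extra nop every couple of
> * bits to soak up the half cycles, just like the receive code)
> clrb (2)
> lsra (2) next data bit into carry
> rolb (2) carry into bit 0 of B...
> lslb (2) ...and up to bit 1, the serial-out line
> stb >$FF20 (5) 15 cycles after the previous stb
> nop (2)
> 
> * Stop bit
> nop (2)
> nop (2)
> nop (2) pad the last data bit out to a full cell
> ldb #$02 (2)
> stb >$FF20 (5) back to mark
> rts (5)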
>
> I believe this is close to the approach Roger took with his code. Roger?
>
>> That being asked, does a "stock" CoCo at 1.78 MHz have time to do this
>> and still be able to store data onto a storage device? Or is this just
>> displaying it on the VDG?
>
> There's enough time to grab a byte and store it. The code is
> idealized, however. What is desirable in a normal situation is some
> type of timeout, so that if the start bit isn't detected after a
> certain amount of time, the routine is exited. This is especially true
> under OS-9; it would not be nice to just sit and wait forever. Adding
> code at WaiStart to handle a timeout would push the average case
> computed above outside of the critical 15.5 cycles. It becomes a
> hit-or-miss proposition at that point.
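> 
> A sketch of what that would look like (Poll, GotIt and TIMEOUT are
> made-up names, and this is not tested):
> 
> WaiStart ldx #TIMEOUT (3) how many polls before we give up
> Poll lda >$FF20+2 (5) get byte that holds bit banger bit
> lsra (2) bit 0 into carry
> bcc GotIt (3) start bit seen -- GotIt would be the clra above
> leax -1,x (5) count down the timeout
> bne Poll (3) keep polling
> * ...timed out: unmask interrupts and return an error
> 
> The polling loop grows from 10 to 18 cycles, longer than a whole bit
> cell, so hitting the middle of bit 0 becomes a matter of luck.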
>
>> Can this be useable?
>
> It depends upon the design of the protocol. The challenge I had when
> developing DriveWire was to balance speed with reliability. At 57,600
> bps, errors can occur occasionally; that's why DriveWire uses a
> checksum on larger data packets to provide some degree of data
> integrity.
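> 
> (The checksum itself doesn't need to be anything exotic; a running
> 16-bit sum over the packet catches most mangled bytes. Roughly, and
> not DriveWire's actual code -- X points at the packet, Y holds its
> length:
> 
> ChkSum ldd #0 (3) clear the running sum
> ChkLoop addb ,x+ (6) add the next packet byte
> adca #0 (2) fold the carry into the high byte
> leay -1,y (5) one less byte to go
> bne ChkLoop (3)
> 
> D then holds the sum to compare against the one the sender computed.)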
>
>> If so, I bet when other folks jump in and expand on it, this could be
>> a windfall for OS-9.
>
> When I developed DriveWire some three years ago, I analyzed
> communications behavior at 115,200 on the bitbanger with code similar to
> the above. Given the requirements that I placed on the desired product,
> going that fast was just too problematic. For a disk access product
> like DriveWire, reliability must trump speed or else data corruption
> will become a reality.
>
> For a DLOAD-type protocol like the one Roger has talked about, 115,200
> bps could be workable. With DLOAD, no timeout is necessary, so timeout
> code is not an issue. A simple protocol could be developed to allow the
> sender to send out bursts of data of a predetermined size that the CoCo
> would then consume. Some form of bi-directional communication would
> need to be hashed out, and I think that's the next challenge.
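> 
> In rough strokes, and purely hypothetical (GetBlk, SndByte, RcvBurst
> and BLKSIZE are all made-up names):
> 
> GetBlk lda #BLKSIZE tell the sender how many bytes we want
> bsr SndByte using a transmit routine like the one above
> ldx #Buffer
> ldy #BLKSIZE
> bsr RcvBurst pull the burst in with a loop like the earlier one
> * ...then verify the checksum, ACK or NAK, and ask for the next block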
>
> Boisy
>
--
L. Curtis Boyle