[Coco] CoCo bitbanger does 115200 bps !
L. Curtis Boyle
curtisboyle at sasktel.net
Wed Apr 5 18:22:40 EDT 2006
Could you give yourself more cycles by preloading the DP register with
$FF, and then doing ldb <$22 when reading the bits? Would that give you
enough extra time to either bump it up in speed, or add a little code for
reliability testing?
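
Something like this, maybe (untested, and cycle counts from memory):

 lda #$FF (2) point the direct page at $FFxx
 tfr a,dp (6) one-time setup before the receive code
* ...then each sample becomes
 ldb <$22 (4) same PIA read, one cycle cheaper than ldb >$FF20+2

That would buy a cycle back on each of the nine reads, though you'd have
to restore DP before handing control back to the caller.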
On Wed, 05 Apr 2006 07:30:05 -0600, Boisy Pitre <boisy at boisypitre.com>
wrote:
> On Apr 4, 2006, at 11:41 PM, George's Coco Address wrote:
>
>> I'm a bit ignorant about just how fast the CoCo can process data, so
>> please be kind to me when I present my thoughts on this...
>>
>> If we were to process 115,200 bits per second through the bitbanger
>> port, how many clock cycles and instructions per bit would this
>> require?
>> I pushed this into my calculator and found that each bit from the
>> bitbanger port takes about fifteen and a half clock cycles when the
>> CoCo is running at 1.78 MHz. Machine code instructions take several
>> clock cycles each. Don't ask me how many, but I know it isn't one
>> instruction per cycle.
>
> Your calculations are correct, as is your observation about cycle counts:
> every instruction on the CoCo takes at least 2 clock cycles, unless
> you're running a 6309 in native mode, where some instructions take just
> 1 cycle.
>
> The code below could be optimized for size, and there are most certainly
> variations of how this could be done, but it amply demonstrates how a
> CoCo 3 could receive a byte from a sender at 115,200 bps (cycle counts
> for each instruction are in parentheses):
>
> * Mask interrupts
> orcc #IRQ+FIRQ (3)
> * Wait for an incoming start bit
> WaiStart lda >$FF20+2 (5) get byte that holds bit banger bit
> lsra (2) shift data bit 0 into carry
> bcs WaiStart (3) loop while bit 0 is still 1 (line idle)
>
> clra (2)
> nop (2)
>
> * Bit 0
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 1
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 2
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 3
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 4
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2) additional NOP to soak up the 4 x 0.5 cycles of drift so far
> nop (2)
> nop (2)
>
> * Bit 5
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 6
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> nop (2)
> nop (2)
> nop (2)
>
> * Bit 7
> ldb >$FF20+2 (5) get PIA byte into B...
> lsrb (2) shift bit 0 into carry...
> rora (2) and roll Carry into hi bit of A
> rts (5)
>
> The goal is to perform each of the nine reads of $FF20+2 (the lda plus
> the eight ldb's) as close to the middle of its bit cell as possible.
> The loop that waits for the start bit is as small as I can possibly
> imagine and is 10 cycles in length. The bit is actually sampled by the
> lda instruction around cycle 4. Therefore, the worst case is that the
> start bit arrives at cycle 5 (right after we sampled), so we have to
> wait 10 cycles to see it and 6 more cycles to finish out the lda and
> the lsra/bcs instructions, for a total worst case of 16 cycles. The
> best case is that the bit arrives right when we check, meaning we stay
> in the loop for only 6 cycles. That gives an average case of
> (16 + 6)/2 = 11 cycles. To get up to the magic 15.5, I've added two
> 2-cycle instructions (the clra and a nop) after the bcs. Additional
> tweaking by adding or removing nops may be necessary.
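> 
> For reference, the bit-cell budget (assuming the stock 1.7897725 MHz
> high-speed clock) is 1,789,772 / 115,200 = about 15.5 cycles per bit,
> and each per-bit block above is ldb (5) + lsrb (2) + rora (2) + three
> nops (6) = 15 cycles, with the extra nop at bit 4 soaking up the
> accumulated half cycles.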
>
> Things get a little more interesting if we want to implement a timeout.
> Additional code in the WaiStart portion would be necessary, adding to
> the best/worst/average case timings. Also, this code doesn't take into
> account any additional code that would normally 'bsr' into this code to
> get a byte. There are about 18 additional cycles that we can use up
> after the 'rts' since we aren't waiting around for the stop bit. We've
> got enough time to store the data byte in A somewhere and call WaiStart
> again to recover the next byte (assuming the sending ACIA is going at
> full speed).
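> 
> Something along these lines, for instance (a rough sketch, not tested;
> having X point at a buffer and Y hold the byte count is just my own
> convention here):
> 
> RcvLoop bsr WaiStart (7) grab the next byte into A
> sta ,x+ (6) stash it in the buffer
> leay -1,y (5) one less byte to go
> bne RcvLoop (3) then wait for the next start bit
> 
> That's 21 cycles of overhead counting the bsr, so it would need a
> little shaving (or a cooperative sender) to fit comfortably in the
> available slack.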
>
> Code to send out a byte at 115,200 from the CoCo 3 would look similar,
> except that the CoCo would be in control of the timing at that point,
> not the other end. So while timing is just as critical there, there is
> no start bit to detect, since the CoCo itself generates it.
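> 
> A rough, untested sketch of the transmit side (I'm assuming here that
> a 1 in bit 1 of $FF20 is the mark/idle level, and I'm ignoring the
> fact that storing B this way also zeroes the DAC bits):
> 
> * A = byte to send
> SndByte orcc #IRQ+FIRQ (3) mask interrupts, as on the receive side
> clrb (2)
> stb >$FF20 (5) start bit: drive the line to space
> nop (2) pad the start bit out toward 15.5 cycles
> 
> * Bit 0 (repeat for bits 1-7, with an extra nop every couple of
> * bits to soak up the half cycles, just like the receive code)
> clrb (2)
> lsra (2) next data bit into carry
> rolb (2) carry into bit 0 of B...
> lslb (2) ...and up to bit 1, the serial-out line
> stb >$FF20 (5) 15 cycles after the previous stb
> nop (2)
> 
> * Stop bit
> nop (2)
> nop (2)
> nop (2) pad the last data bit out to a full cell
> ldb #$02 (2)
> stb >$FF20 (5) back to mark
> rts (5)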
>
> I believe this is close to the approach Roger took with his code. Roger?
>
>> That being asked, does a "stock" CoCo at 1.78 MHz have time to do this
>> and still be able to store data onto a storage device? Or is this just
>> displaying it on the VDG?
>
> There's enough time to grab a byte and store it. The code is
> idealized, however. What is desirable in a normal situation is some
> type of timeout, so that if the start bit isn't detected after a
> certain amount of time, the routine is exited. This is especially true
> under OS-9; it would not be nice to just sit and wait forever. Adding
> code at WaiStart to handle a timeout would push the average case
> computed above outside of the critical 15.5 cycles. It becomes a
> hit-or-miss proposition at that point.
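> 
> A sketch of what that would look like (Poll, GotIt and TIMEOUT are
> made-up names, and this is not tested):
> 
> WaiStart ldx #TIMEOUT (3) how many polls before we give up
> Poll lda >$FF20+2 (5) get byte that holds bit banger bit
> lsra (2) bit 0 into carry
> bcc GotIt (3) start bit seen -- GotIt would be the clra above
> leax -1,x (5) count down the timeout
> bne Poll (3) keep polling
> * ...timed out: unmask interrupts and return an error
> 
> The polling loop grows from 10 to 18 cycles, longer than a whole bit
> cell, so hitting the middle of bit 0 becomes a matter of luck.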
>
>> Can this be useable?
>
> It depends upon the design of the protocol. The challenge I had when
> developing DriveWire was to balance speed with reliability. At 57,600
> bps, errors can occur occasionally; that's why DriveWire uses a
> checksum on larger data packets to provide some degree of data
> integrity.
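> 
> (The checksum itself doesn't need to be anything exotic; a running
> 16-bit sum over the packet catches most mangled bytes. Roughly, and
> not DriveWire's actual code -- X points at the packet, Y holds its
> length:
> 
> ChkSum ldd #0 (3) clear the running sum
> ChkLoop addb ,x+ (6) add the next packet byte
> adca #0 (2) fold the carry into the high byte
> leay -1,y (5) one less byte to go
> bne ChkLoop (3)
> 
> D then holds the sum to compare against the one the sender computed.)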
>
>> If so, I bet when other folks jump in and expand on it, this could be
>> a windfall for OS-9.
>
> When I developed DriveWire some three years ago, I analyzed
> communications behavior at 115,200 on the bitbanger with code similar to
> the above. Given the requirements that I placed on the desired product,
> going that fast was just too problematic. For a disk access product
> like DriveWire, reliability must trump speed or else data corruption
> will become a reality.
>
> For a DLOAD-type protocol like the one Roger has talked about, 115,200
> bps could be workable. With DLOAD, no timeout is necessary, so timeout
> code is not an issue. A simple protocol could be developed to allow the
> sender to send out bursts of data of a predetermined size that the CoCo
> would then consume. Some form of bi-directional communication would
> need to be hashed out, and I think that's the next challenge.
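> 
> In rough strokes, and purely hypothetical (GetBlk, SndByte, RcvBurst
> and BLKSIZE are all made-up names):
> 
> GetBlk lda #BLKSIZE tell the sender how many bytes we want
> bsr SndByte using a transmit routine like the one above
> ldx #Buffer
> ldy #BLKSIZE
> bsr RcvBurst pull the burst in with a loop like the earlier one
> * ...then verify the checksum, ACK or NAK, and ask for the next block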
>
> Boisy
>
--
L. Curtis Boyle