[Coco] Mod10 Suggestions
wmikrut72 at gmail.com
Sat Feb 18 14:03:41 EST 2017
You are right -- I looked at it closer.
One thing I need to do is reverse the order of operations.
The LSLA is performed first.
First I need to store the byte and LSLA the next byte.
Otherwise if I flip it from left to right:
it works perfectly.
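
To make the order of operations concrete, here is a minimal sketch in Python rather than 6809 assembly. The function name luhn_sum_mod10 is hypothetical and not from anyone's posted code. Walking right to left, the rightmost digit is added as is, and the digit to its left is the one that gets doubled (minus 9 if the double is 10 or more).

```python
def luhn_sum_mod10(digits):
    """Return the mod 10 check sum for a list of decimal digits (0 means it checks out)."""
    total = 0
    # Walk right to left; position 0 is the rightmost digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:      # every second digit from the right is doubled
            digit *= 2
            if digit >= 10:        # sub 9 if the result is 10 or more
                digit -= 9
        total += digit             # the other digits are stored/added unchanged
    return total % 10


if __name__ == "__main__":
    number = [int(c) for c in "79927398713"]
    print(luhn_sum_mod10(number))
```

Doubling the rightmost digit first instead shifts the whole pattern by one digit, which is the problem described above.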
On Sat, Feb 18, 2017 at 11:35 AM, William Astle <lost at l-w.ca> wrote:
> Take a closer look. It only does the LSLA on every other digit. It does
> *two* digits per loop, just like Brett's version.
> You can easily pretend all numbers are 16 digits by right justifying the
> numbers in your buffer and padding with zeros.
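
A quick check of why the padding is safe, sketched in Python with hypothetical helper names (pad_to_16, luhn_pairs_mod10). It assumes the number has at most 16 digits. Leading zeros add nothing to the sum whether or not they get doubled, and since the walk starts at the right, the real digits keep the same doubling pattern. The loop takes two digits per pass, so a 16-digit buffer is always 8 passes.

```python
def pad_to_16(digits):
    """Right justify the digits in a 16-entry buffer, padding the left with zeros."""
    return [0] * (16 - len(digits)) + list(digits)


def luhn_pairs_mod10(buffer16):
    """Process a 16-digit buffer two digits per pass, right to left (8 passes)."""
    total = 0
    for i in range(15, 0, -2):
        total += buffer16[i]            # right digit of the pair: added as is
        doubled = buffer16[i - 1] * 2   # left digit of the pair: doubled
        if doubled >= 10:
            doubled -= 9
        total += doubled
    return total % 10


if __name__ == "__main__":
    thirteen = [int(c) for c in "4222222222222"]   # a 13-digit number
    print(luhn_pairs_mod10(pad_to_16(thirteen)))
```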
> On 2017-02-18 10:06 AM, William Mikrut wrote:
>> I like how this works from right to left.
>> The only issue is the LSLA on every number.
>> The algo is to double every other number, starting with the right most
>> digit, and sub 9 if the result is 10 or more.
>> Now if the number is always 16 digits, Brett's 16 bit word seems the
>> easiest way to go.
>> If the number is 13 digits long the 16 bit word method won't work, but I
>> am happy to pretend all numbers are 16 digits!
>> I am going to try to include a couple things you showed me into Brett's 16
>> bit chunk method and try a slightly different routine!
>> On Sat, Feb 18, 2017 at 10:22 AM, William Astle <lost at l-w.ca> wrote:
>> On 2017-02-18 12:43 AM, msmcdoug wrote:
>>>> Actually I'm surprised no one has suggested BCD arithmetic on the result
>>>> to eliminate divide by 10 loop
>>> BCD would certainly give a predictable overall cycle count. It would
>>> require a significantly different approach, though. The only register you
>>> can use for BCD arithmetic is A and DAA is only useful after ADDA or ADCA.
>>> I had thought about using BCD but had initially dismissed it due to
>>> possible complexity. However, upon reflection, the extra cycles to use it
>>> would probably be less than the average cycle time of the modulus loop
>>> combined or checking for digit overflow during the loop.
>>> I think you could use code that looks something like the following which
>>> is based off Mr. Mikrut's most recent posted code. (warning: mailer code™
>>> follows so it may have errors)
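>>>         ORG $1200
>>> CCD     RMB 16            ; 16 digits, one per byte
>>> RESULT  RMB 1             ; sum mod 10
>>> START   LEAX CCD+16,PCR   ; point just past the last digit
>>>         LDB #8            ; 8 passes, two digits per pass
>>>         CLRA              ; running BCD sum starts at zero
>>> LOOP    PSHS A            ; save the running sum
>>>         LDA ,-X           ; fetch the next digit, right to left
>>>         LSLA              ; double it
>>>         CMPA #10
>>>         BLO LOOP2
>>>         SUBA #9           ; 10 or more: subtract 9
>>> LOOP2   ADDA ,S+          ; add the saved sum back and pop it
>>>         DAA               ; decimal adjust after the add
>>>         ADDA ,-X          ; add the next digit undoubled
>>>         DAA
>>>         DECB
>>>         BNE LOOP
>>>         ANDA #$0F         ; keep only the low BCD digit
>>>         STA RESULT,PCR
>>> ENDPGM  RTS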
>>> I'm using the stack for a temporary storage location instead of something
>>> PCR relative for code size reasons. You could use the "RESULT" variable as
>>> the temporary to eliminate stack usage. That would probably be slightly
>>> faster at the expense of two more code bytes. This is one of those
>>> size/speed trade-offs.
>>> DAA has to be used after every addition and only applies to A. Using BCD
>>> means we can eliminate the mod 10 loop and just mask off the upper digit
>>> (BCD stores two decimal digits in a byte). That gives a constant time for
>>> the "mod 10" result and also only takes 2 bytes (and 2 cycles).
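
The BCD idea can be checked with a rough Python model. The function names are hypothetical, and bcd_add is only a simplified stand-in for ADDA followed by DAA: it assumes the accumulator already holds valid packed BCD and the value added is 0 to 9. It does not model the real carry and half-carry flags. The sum can pass 99 and wrap, but that does not disturb the ones digit, so masking with $0F still gives the sum mod 10.

```python
def bcd_add(acc, digit):
    """Add a value 0..9 to a packed-BCD byte and decimal adjust the result."""
    low = (acc & 0x0F) + digit      # ones digit before adjustment (0..18)
    result = acc + digit
    if low > 9:
        result += 0x06              # push the ones digit back into 0..9
    if result >= 0xA0:
        result += 0x60              # tens digit overflowed past 9
    return result & 0xFF            # the accumulator is only 8 bits wide


def mod10_via_bcd(values):
    """Accumulate values in packed BCD, then mask off the upper digit."""
    acc = 0
    for value in values:
        acc = bcd_add(acc, value)
    return acc & 0x0F               # same idea as ANDA #$0F


if __name__ == "__main__":
    print(mod10_via_bcd([9] * 16))
```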
>>> I have also eliminated the STATUS variable and just store the result. You
>>> can test RESULT for non-zero trivially so there's no need for a separate
>>> STATUS value.
>>> By my calculation, this version is 32 bytes, requires 1 byte of stack
>>> space, 17 bytes of data space, and runs in a maximum of 351 cycles (and a
>>> minimum of 336 cycles if none of the doubled digits goes above 9). For
>>> analysis, I've assumed 8 bit offsets for the PCR references. 16 bit offsets
>>> in PCR mode are quite a bit more expensive (4 extra cycles and 1 extra byte).
>>> Coco mailing list
>>> Coco at maltedmedia.com