[Coco] Mod10 Suggestions

Sat Feb 18 20:41:36 EST 2017

That's pretty well optimized!  Have you ever considered the difference between optimizing for size and optimizing for speed?  For instance, if you weren't constrained for size but knew you were going to process a list of jillions of CC numbers, would you write it differently?
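
As a rough illustration of the speed side of that trade-off, here is a minimal Python sketch (a model, not 6809 code) that replaces the compare/branch/subtract step with a 10-entry lookup table. The names DOUBLED and mod10_table are made up for this example. On the 6809 a table like this would cost 10 extra bytes and would probably buy a shorter, branch-free doubling step.

```python
# Doubled-and-folded value for each digit 0-9: 2*d, minus 9 when 2*d >= 10.
DOUBLED = (0, 2, 4, 6, 8, 1, 3, 5, 7, 9)


def mod10_table(digits):
    """Return the check sum mod 10 for a list of digit values (0 means valid).

    Digits are walked right to left; the rightmost digit is added as-is
    and every second digit after it goes through the lookup table.
    """
    total = 0
    for i, d in enumerate(reversed(digits)):
        total += DOUBLED[d] if i % 2 else d
    return total % 10
```

For a long list of numbers the table is built once and reused, so its cost is spread over every number processed.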

Dave Philipsen

> On Feb 18, 2017, at 5:06 PM, William Mikrut <wmikrut72 at gmail.com> wrote:
> 
> Some slight reordering of the code and it works perfectly!
> 48 bytes total, less 17 for storage -- 31 program bytes to get the job done.
> 
> My original code was 61 program bytes... down to half the size, and it does
> the exact same thing.
> Absolutely amazing!
> 
> 
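>         ORG  $1200
> CCD     RMB  16           ; 16 digits, one per byte, values 0-9
> RESULT  RMB  1            ; checksum mod 10; 0 means valid
> 
> START   LEAX CCD+16,PCR   ; point just past the last digit
>         CLRA              ; running BCD sum
>         LDB  #8           ; 8 pairs of digits
> 
> LOOP    ADDA ,-X          ; add the undoubled digit
>         DAA               ; keep the sum in BCD
>         PSHS A            ; save the sum
>         LDA  ,-X          ; fetch the digit to be doubled
>         LSLA              ; double it
>         CMPA #10
>         BLO  LOOP2        ; below 10: use as is
>         SUBA #9           ; 10 or more: subtract 9
> LOOP2   ADDA ,S+          ; add the saved sum back in
>         DAA
>         DECB
>         BNE  LOOP
> 
>         ANDA #$0F         ; low BCD digit is the sum mod 10
>         STA  RESULT,PCR
> ENDPGM  RTS
>         END  START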
> 
>> On Sat, Feb 18, 2017 at 1:03 PM, William Mikrut <wmikrut72 at gmail.com> wrote:
>> 
>> You are right -- I looked at it closer.
>> One thing I need to do is reverse the order of operations.
>> 
>> The LSLA is performed first.
>> First I need to store the byte and LSLA the next byte.
>> 
>> Otherwise if I flip it from left to right:
>> (LEAX CCD,PCR
>> ...
>> LDA ,X+
>> ...
>> ADDA ,X+)
>> 
>> it works perfectly.
>> 
>> 
>>> On Sat, Feb 18, 2017 at 11:35 AM, William Astle <lost at l-w.ca> wrote:
>>> 
>>> Take a closer look. It only does the LSLA on every other digit. It does
>>> *two* digits  per loop, just like Brett's version.
>>> 
>>> You can easily pretend all numbers are 16 digits by right justifying the
>>> numbers in your buffer and padding with zeros.
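
A minimal Python sketch of that padding step, assuming the number arrives as a decimal string; to_buffer and its width parameter are hypothetical names, not anything from the posted code.

```python
def to_buffer(number, width=16):
    """Right-justify a decimal string in a fixed-width list of digit values."""
    if len(number) > width:
        raise ValueError("number is longer than the buffer")
    # rjust pads on the left, so the rightmost digit stays in the last slot.
    return [int(ch) for ch in number.rjust(width, "0")]
```

Leading zeros add nothing to the sum (doubling 0 still gives 0), and because the digits stay right-justified each one keeps its position counted from the right, so a shorter number checks the same way once padded.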
>>> 
>>> 
>>>> On 2017-02-18 10:06 AM, William Mikrut wrote:
>>>> 
>>>> I like how this works from right to left.
>>>> The only issue is the LSLA on every number.
>>>> 
>>>> The algo is to double every other number, starting with the right most
>>>> digit, and sub 9 if the result is 10 or more.
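
For reference, a minimal Python sketch of the check as it is usually stated (the Luhn formula): the rightmost digit is the check digit and is added undoubled, and doubling starts with the digit to its left. The function name luhn_mod10 is made up for this example.

```python
def luhn_mod10(number):
    """Return the Luhn sum mod 10 of a decimal string (0 means it passes)."""
    total = 0
    double = False                 # the rightmost digit is not doubled
    for ch in reversed(number):
        d = int(ch)
        if double:
            d *= 2
            if d >= 10:            # same test as CMPA #10 / SUBA #9
                d -= 9
        total += d
        double = not double        # alternate on every digit
    return total % 10
```

For a 16-digit number, starting the doubling at the second digit from the right is the same as doubling the leftmost digit and every other digit after it.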
>>>> 
>>>> Now if the number is always 16 digits, Brett's 16 bit word seems the
>>>> easiest way to go.
>>>> If the number is 13 digits long the 16 bit word method won't work, but I am
>>>> happy to pretend all numbers are 16 digits!
>>>> 
>>>> I am going to try to include a couple things you showed me into Brett's 16
>>>> bit chunk method and try a slightly different routine!
>>>> 
>>>> 
>>>> On Sat, Feb 18, 2017 at 10:22 AM, William Astle <lost at l-w.ca> wrote:
>>>> 
>>>>> On 2017-02-18 12:43 AM, msmcdoug wrote:
>>>>> 
>>>>>> Actually I'm surprised no one has suggested BCD arithmetic on the result
>>>>>> to eliminate the divide-by-10 loop
>>>>>> 
>>>>>> 
>>>>> BCD would certainly give a predictable overall cycle count. It would
>>>>> require a significantly different approach, though. The only register you
>>>>> can use for BCD arithmetic is A and DAA is only useful after ADDA or ADCA.
>>>>> 
>>>>> I had thought about using BCD but had initially dismissed it due to
>>>>> possible complexity. However, upon reflection, the extra cycles to use BCD
>>>>> would probably be fewer than the average cycle time of the modulus loop
>>>>> or of checking for digit overflow during the loop.
>>>>> 
>>>>> I think you could use code that looks something like the following, which
>>>>> is based off Mr. Mikrut's most recent posted code. (warning: mailer code™
>>>>> follows so it may have errors)
>>>>> 
>>>>>        ORG $1200
>>>>> CCD     RMB 16
>>>>> RESULT  RMB 1
>>>>> START   LEAX CCD+16,PCR
>>>>>        CLRA
>>>>>        LDB #8
>>>>> LOOP    PSHS A
>>>>>        LDA ,-X
>>>>>        LSLA
>>>>>        CMPA #10
>>>>>        BLO LOOP2
>>>>>        SUBA #9
>>>>> LOOP2   ADDA ,S+
>>>>>        DAA
>>>>>        ADDA ,-X
>>>>>        DAA
>>>>>        DECB
>>>>>        BNE LOOP
>>>>>        ANDA #$0F
>>>>>        STA RESULT,PCR
>>>>> ENDPGM  RTS
>>>>> 
>>>>> I'm using the stack for a temporary storage location instead of something
>>>>> PCR relative for code size reasons. You could use the "RESULT" variable for
>>>>> the temporary to eliminate stack usage. That would probably be slightly
>>>>> faster at the expense of two more code bytes. This is one of those
>>>>> size/speed trade-offs.
>>>>> 
>>>>> DAA has to be used after every addition and only applies to A. Using BCD
>>>>> means we can eliminate the mod 10 loop and just mask off the upper digit
>>>>> (BCD stores two decimal digits in a byte). That gives a constant time for
>>>>> the "mod 10" result and also only takes 2 bytes (and 2 cycles).
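
To make the DAA-and-mask idea concrete, here is a small Python model (not 6809 code) of ADDA followed by DAA. It assumes A holds a valid packed-BCD byte and the operand is a single digit 0-9; the function names are made up for this sketch. The running total of 16 digits can exceed 99, so the tens digit may wrap, but that does not matter because only the low nibble is kept.

```python
def adda_daa(a, operand):
    """Model ADDA followed by DAA on a packed-BCD accumulator.

    `a` is a packed BCD byte (0x00-0x99), `operand` a single digit 0-9.
    The carry out of the tens digit is dropped.
    """
    raw = a + operand
    half_carry = ((a & 0x0F) + (operand & 0x0F)) > 0x0F
    carry = raw > 0xFF
    raw &= 0xFF
    fix = 0
    # Low nibble needs +6 after a half carry or when it is not a valid digit.
    if half_carry or (raw & 0x0F) > 9:
        fix |= 0x06
    # High nibble needs +6 on carry, when invalid, or when the low fix will carry into a 9.
    if carry or (raw >> 4) > 9 or ((raw >> 4) > 8 and (raw & 0x0F) > 9):
        fix |= 0x60
    return (raw + fix) & 0xFF


def bcd_sum_mod10(values):
    """Add digit values 0-9 into a BCD accumulator and return the units digit."""
    a = 0                          # CLRA
    for v in values:
        a = adda_daa(a, v)         # ADDA ... / DAA
    return a & 0x0F                # ANDA #$0F
```

The final mask stands in for the divide-by-10 loop: the low nibble of the BCD total is already the sum mod 10.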
>>>>> 
>>>>> I have also eliminated the STATUS variable and just store the result. You
>>>>> can test RESULT for non-zero trivially so there's no need for a separate
>>>>> STATUS value.
>>>>> 
>>>>> By my calculation, this version is 32 bytes, requires 1 byte of stack
>>>>> space, 17 bytes of data space, and runs in a maximum of 351 cycles (and a
>>>>> minimum of 336 cycles if none of the doubled digits goes above 9). For this
>>>>> analysis, I've assumed 8 bit offsets for the PCR references. 16 bit offsets
>>>>> in PCR mode are quite a bit more expensive (4 extra cycles and 1 extra
>>>>> byte).
>>>>> 
>>>>> 
>>>>> --
>>>>> Coco mailing list
>>>>> Coco at maltedmedia.com
>>>>> https://pairlist5.pair.net/mailman/listinfo/coco