[Coco] 6309 MULD real and emulators

Gene Heskett gheskett at shentel.net
Mon Oct 28 03:57:51 EDT 2019


On Monday 28 October 2019 02:41:33 Walter Zambotti wrote:

> And I managed to get muld to perform the variable (0-31+) 32-bit
> shift.
>
> The function takes about 197 cycles regardless of number of bits. 
> Except for the special case of 16 bits where it takes 99 cycles.
>
One thing I always do when building a C function that involves bit 
twiddling is check how far the shift goes, because the existing C 
library does it one bit at a time in a loop.  So I stopped the compile 
at the output of c.pass2 to inspect the generated code. After editing in 
the differences, the compile was resumed to generate the final binary.

I always checked the how-far count and, if it was over 8, subtracted 8, 
then did a tfr a,b and cleared a if shifting to the right, or a tfr b,a 
and cleared b if shifting to the left.
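
A minimal sketch in modern C of the byte-move shortcut described above, 
modelling the D register as a 16-bit value with a as the high byte and b 
as the low byte. The function names are hypothetical, the shifts are 
unsigned, and taking the shortcut for a count of exactly 8 as well as for 
counts over 8 is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: one bit per pass, as the library routine is described doing. */
static uint16_t shl16_loop(uint16_t v, unsigned n)
{
    while (n-- > 0)
        v = (uint16_t)(v << 1);
    return v;
}

static uint16_t shr16_loop(uint16_t v, unsigned n)
{
    while (n-- > 0)
        v = (uint16_t)(v >> 1);
    return v;
}

/* Left shift: for counts of 8 or more, move the low byte into the high
 * byte and clear the low byte (roughly tfr b,a then clrb), then loop
 * over whatever count is left. */
static uint16_t shl16_fast(uint16_t v, unsigned n)
{
    if (n >= 16)
        return 0;
    if (n >= 8) {
        v = (uint16_t)((v & 0x00FFu) << 8);
        n -= 8;
    }
    return shl16_loop(v, n);
}

/* Right shift: move the high byte into the low byte and clear the high
 * byte (roughly tfr a,b then clra), then loop over the remainder. */
static uint16_t shr16_fast(uint16_t v, unsigned n)
{
    if (n >= 16)
        return 0;
    if (n >= 8) {
        v = (uint16_t)(v >> 8);
        n -= 8;
    }
    return shr16_loop(v, n);
}

/* Hypothetical self-check: the shortcut must agree with the plain loop
 * for every 16-bit value and for counts 0 through 19. */
static void shift16_self_check(void)
{
    unsigned v, n;

    for (v = 0; v <= 0xFFFFu; v++) {
        for (n = 0; n < 20; n++) {
            assert(shl16_fast((uint16_t)v, n) == shl16_loop((uint16_t)v, n));
            assert(shr16_fast((uint16_t)v, n) == shr16_loop((uint16_t)v, n));
        }
    }
}
```

Calling shift16_self_check() from a test program compares the two 
versions exhaustively; the worst case drops from 15 loop passes to 7 plus 
one byte move.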

This made the bit shifts a lot faster while still being 100% correct, 
and I see no reason that it couldn't be applied to regs e & f to 
accomplish exactly the same thing for a 32 bit operation.
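
As a rough illustration of carrying the same trick over to 32 bits, the 
sketch below treats the value as the register pair d:w (w being e:f) and 
peels off a 16-bit word move and then an 8-bit byte move before looping. 
It is modern C with hypothetical function names, it only covers the left 
shift, and it is not taken from any existing library.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: one bit per pass. */
static uint32_t shl32_loop(uint32_t v, unsigned n)
{
    while (n-- > 0)
        v <<= 1;
    return v;
}

/* Left shift with a word move for counts of 16 or more and a byte move
 * for a remaining count of 8 or more; at most 7 single-bit passes are
 * left for the loop. */
static uint32_t shl32_fast(uint32_t v, unsigned n)
{
    if (n >= 32)
        return 0;
    if (n >= 16) {
        /* low word becomes high word, low word cleared */
        v = (v & 0x0000FFFFu) << 16;
        n -= 16;
    }
    if (n >= 8) {
        /* every byte moves up one place, bottom byte cleared */
        v = (v & 0x00FFFFFFu) << 8;
        n -= 8;
    }
    return shl32_loop(v, n);
}

/* Hypothetical self-check over a handful of sample values. */
static void shift32_self_check(void)
{
    static const uint32_t samples[] = {
        0x00000000u, 0x00000001u, 0x00008000u, 0x0000FFFFu,
        0x00018000u, 0x12345678u, 0x80000001u, 0xFFFFFFFFu
    };
    unsigned i, n;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (n = 0; n < 40; n++)
            assert(shl32_fast(samples[i], n) == shl32_loop(samples[i], n));
}
```

The order matters only for readability: both moves are exact shifts by a 
multiple of 8, so doing the word move first simply keeps the byte move to 
a single step.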

The last version of rzsz-3.36 I built, which you may have, was hand 
optimized this way, gaining around 100 cps in speed. 

How that might compare to what you are doing here, I've no clue. However,
reading this more carefully, it looks pretty good.
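
For readers following along without a 6309, here is a rough C model of 
the multiply-table idea in the quoted routine: shift each 16-bit half by 
multiplying it with a power of two, then undo the effect of the multiply 
being signed. muld_model(), multtab[] and shl32_muld() are hypothetical 
names for this sketch, it assumes two's complement conversion to int16_t, 
and it is a model of the arithmetic only, not a translation of the 
assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 16x16 -> 32 multiply, as MULD is described in this thread.
 * Assumes the usual two's complement conversion to int16_t. */
static uint32_t muld_model(uint16_t d, uint16_t m)
{
    int32_t p = (int32_t)(int16_t)d * (int32_t)(int16_t)m;
    return (uint32_t)p;
}

/* Powers of two, one per shift count 0..15. */
static const uint16_t multtab[16] = {
    0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020, 0x0040, 0x0080,
    0x0100, 0x0200, 0x0400, 0x0800, 0x1000, 0x2000, 0x4000, 0x8000
};

static uint32_t shl32_muld(uint32_t val, unsigned shft)
{
    uint16_t hi = (uint16_t)(val >> 16);
    uint16_t lo = (uint16_t)(val & 0xFFFFu);
    uint16_t m, top, carry;
    uint32_t p;

    if (shft >= 32)
        return 0;
    if (shft >= 16) {           /* whole-word move first */
        hi = lo;
        lo = 0;
    }
    shft &= 15;
    if (shft == 0)
        return ((uint32_t)hi << 16) | lo;

    m = multtab[shft];
    top = (uint16_t)muld_model(hi, m);  /* low word of hi * m */
    p = muld_model(lo, m);
    carry = (uint16_t)(p >> 16);        /* bits shifted out of lo */

    /* A signed product differs from the unsigned one only in its high
     * word: add m back if lo looked negative, and add lo back if m
     * looked negative (m == 0x8000, i.e. a shift of 15). */
    if (lo & 0x8000u)
        carry = (uint16_t)(carry + m);
    if (m & 0x8000u)
        carry = (uint16_t)(carry + lo);

    return ((uint32_t)(uint16_t)(top | carry) << 16) | (uint16_t)p;
}

/* Hypothetical self-check against the plain C shift. */
static void shl32_muld_self_check(void)
{
    static const uint32_t samples[] = {
        0x00000001u, 0x00008000u, 0x0000FFFFu, 0x00018000u,
        0x12345678u, 0x80000001u, 0xFFFFFFFFu
    };
    unsigned i, s;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (s = 0; s < 34; s++)
            assert(shl32_muld(samples[i], s) ==
                   (s >= 32 ? 0u : samples[i] << s));
}
```

Values with bit 15 of the low word set, such as $8000 shifted by 1, are 
the interesting test cases, since that is where the sign correction 
changes the result.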

> It conforms to the OS9 C ABI stack and result passing convention.
>
> /*#include <stdio.h>*/
>
> unsigned long shfl32(val, shft)
> unsigned long val;
> short shft;
> {
>   return val<<shft;
> }
>
> unsigned long shfl32a(val, shft)
> unsigned long val;
> short shft;
> {
> #asm
> * 10,s pointer to long result
> * 4,s 4 byte value
> * 8,s 2 byte shift
> * d = shift amount
> * x = pointer to result
>   ldx 10,s
>   ldd 8,s
> * if shift amount is greater than 31 then
> * just return zero
>   cmpd #32
>   blt _10x
>   ldq #0
>   stq 4,s
>   bra _13x
> * if shift amount is 16 or greater then
> * move bottom word of value into top word
> * and clear bottom word
> _10x
>   cmpb #16
>   blt _1x
>   ldu 6,s
>   stu 4,s
>   clr 6,s
>   clr 7,s
> _1x
> * setup pointer u and offset e into mult table _2x
>   leau _2x,pc
>   andb #15
> * if there is no shift value just return value
>   beq _13x
>   aslb * need to double shift to use as word table offset
>   stb 8,s     * save double shft
>   tfr b,e
> * shift top word q = val.word.high * multtab[shft]
>   ldd 4,s
>   muld e,u
>   stw ,x * result.word.high = low word of mult
> * shift bottom word q = val.word.low * multtab[shft]
>   lde 8,s     * reload double shft
>   ldd 6,s
>   muld e,u
>   stw 2,x     * result.word.low = low word of mult
> * The high word of mult needs to be corrected for sign
> * if val.word.low is negative then muld will return negated results
> * and need to un negate it
>   lde 8,s     * reload double shift
>   tst 6,s     * test top byte of val.word.low for negative
>   bge _11x
>   addd e,u    * add the multtab[shft] again to top word
> _11x
> * if multtab[shft] is negative (shft is 15 or shft<<1 is 30)
> * also need to un negate result
>   cmpe #30
>   bne _12x
>   addd 6,s    * add val.word.low to top word
> _12x
> * combine top and bottom and save top half of result
>   ord ,x
>   std ,x
>   bra _14x
> * this is only reached if the result is in value (let result = value)
> _13x
>   ldq 4,s     * load value
>   stq ,x      * result = value
> _14x
>   puls u,pc
> _2x fdb $01,$02,$04,$08,$10,$20,$40,$80,$0100,$0200,$0400,$0800
>    fdb $1000,$2000,$4000,$8000
> #endasm
> }
>
> unsigned long val, val1, val2;
> short shft;
>
> int main(argc, argv)
> int argc;
> char *argv[];
> {
>   /*long val, val1, val2;
>   short shft;*/
>   unsigned long dummy = 0;
>   /*long (*shftstfunc)(long, short);*/
>
>   pflinit();
>
>   sscanf(argv[1], "%D", &val);
>   shft = (short)atoi(argv[2]);
>   /* val = 1; shft = 1;*/
>   printf("%lx %d\n", val, shft);
>   val1 = shfl32(val, shft);
>   val2 = shfl32a(val, shft);
>   printf("%lx\n", val1);
>   printf("%lx\n", val2);
>   return 0;
>   /*
>   shft = (short)atoi(argv[1]);
>
>   if(shft == 1)
>   {
>      printf("shfl32\n");
>      shftstfunc = shfl32;
>   }
>   else
>   {
>      printf("shfl32a\n");
>      shftstfunc = shfl32a;
>   }
>   */
>
>   for(val = 1 ; val < 1000000 ; val++)
>   {
>     for(shft = 0 ; shft < 32 ; shft++)
>     {
>       /*printf("%lx ", shfl32(val, shft));*/
>       /*printf("%lx\n", shfl32a(val, shft));*/
>       val1 = 0 + shfl32(val, shft);
>       val2 = 0 + shfl32a(val, shft);
>       /*printf("%lx ", val1);*/
>       /*printf("%lx\n", val2);*/
>       if (val1 != val2)
>       {
>         printf("%lx %d = %lx %lx\n", val, shft, val1, val2);
>       }
>     }
>   }
> }
>
>
> -----Original Message-----
> From: Coco [mailto:coco-bounces at maltedmedia.com] On Behalf Of Walter
> Zambotti Sent: Friday, 25 October 2019 2:04 PM
> To: 'CoCoList for Color Computer Enthusiasts' <coco at maltedmedia.com>
> Subject: Re: [Coco] 6309 MULD real and emulators
>
> Robert
>
> On OVCC it has already been corrected in version 1.1.
>
> In my recent 6309 emulator rewrite in X86 assembly I added all the
> missing ops and corrected some other 6309 ops that I thought were not
> correct.
>
> I also did this in the C version.  The C version should be backwards
> portable to VCC with very little effort.
>
> Walter
>
> Here is the OVCC muld C code
>
> void Muld_M(void)
> { //118F Phase 5 6309
> 	Q_REG =  (signed short)D_REG * (signed short)IMMADDRESS(PC_REG);
> 	cc[C] = 0;
> 	cc[Z] = ZTEST(Q_REG);
> 	cc[V] = 0;
> 	cc[N] = NTEST32(Q_REG);
> 	PC_REG+=2;
> 	CycleCounter+=28;
> }
>
> -----Original Message-----
> From: Coco [mailto:coco-bounces at maltedmedia.com] On Behalf Of Robert
> Gault Sent: Friday, 25 October 2019 10:47 AM
> To: CoCoList for Color Computer Enthusiasts <coco at maltedmedia.com>
> Subject: [Coco] 6309 MULD real and emulators
>
> There was a question posted about the 6309 opcode MULD. That is a
> multiplication of the content of regD with Immediate, Direct,
> Extended, or Indexed numbers. What makes it different from the opcode
> MUL is that MULD is a signed multiplication.
>
> However, be warned that while for a real 6309, and the MAME/MESS
> emulator MULD is signed, it is unsigned with VCC v2.0.1. VCC should be
> corrected! ex.
>   real 6309
>   ldd #$8001
>   muld #$8001
>   regQ = $3FFF0001
>
>   VCC
>   ldd #$8001
>   muld #$8001
>   regQ = $40010001     Correct if the multiplication was unsigned.
>
> You can get the same $3FFF0001 answer with real 6309
>   ldd #$7FFF
>   muld #$7FFF
> regQ = $3FFF0001
>
> Now since $10000-$7FFF=$8001 the above signed math makes sense as
> $8001=-$7FFF.
>
> Robert
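
The figures in Robert's example can be reproduced on any machine with a 
few lines of C. This is only an arithmetic check with hypothetical helper 
names, assuming two's complement conversion to int16_t; it says nothing 
about how any particular emulator implements MULD.

```c
#include <assert.h>
#include <stdint.h>

/* 16x16 -> 32 multiply treating both operands as signed. */
static uint32_t mul16_signed(uint16_t a, uint16_t b)
{
    int32_t p = (int32_t)(int16_t)a * (int32_t)(int16_t)b;
    return (uint32_t)p;
}

/* 16x16 -> 32 multiply treating both operands as unsigned. */
static uint32_t mul16_unsigned(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Hypothetical check of the three results quoted in the thread. */
static void muld_example_check(void)
{
    /* $8001 is -$7FFF when read as signed, so the square is positive. */
    assert(mul16_signed(0x8001u, 0x8001u) == 0x3FFF0001u);
    assert(mul16_signed(0x7FFFu, 0x7FFFu) == 0x3FFF0001u);
    /* Read as unsigned, $8001 squared gives the larger value. */
    assert(mul16_unsigned(0x8001u, 0x8001u) == 0x40010001u);
}
```

The low words of the two products agree ($0001 in both cases); only the 
high word depends on whether the operands are read as signed.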
>
> --
> Coco mailing list
> Coco at maltedmedia.com
> https://pairlist5.pair.net/mailman/listinfo/coco
>


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>

