[Coco] CCASM, CASM, MAMOU, EDTASM

Sun May 20 14:47:11 EDT 2007

I wanted to say a few things about the recent discussion of the odd 
,pcr and ,pc behaviors with the different available assemblers and 
cross-assemblers out there.

I did serious testing in the addressing mode department with my CCASM 
cross assembler with a goal to produce tighter code where EDTASM and 
other assemblers lacked the ability to do so.  I assembled the same 
sample source using various cross assemblers and although they all 
ran fine, CCASM sometimes produced tighter code.

How is this possible?  There are several examples like:

LDA 0,X
LDA ,X
LDA <0,X
LDA >0,X
LDA <LABEL,X
LDA <LABEL,PCR
LDA LABEL,PCR
LDA >LABEL,PCR

... which obviously can be coded in slightly different ways that will 
yield an extra byte of code if no optimization is considered.  In my 
opinion, the < directive should attempt to create the smallest offset 
of a direct value or label value.  Some assemblers only produce an 
8-bit offset if you use <.  CCASM goes for the 5-bit offset mode 
first then steps up to 8, then 16 bits automatically based on the 
offset RANGE.  The offset ranges from negative to positive, and both 
far extremes must fit into the operand.
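
To make the stepping-up rule concrete, here is a minimal Python sketch. 
It is not CCASM's actual code; the function name is made up for 
illustration, and the only thing assumed is the 6809's signed offset 
fields: 5 bits (-16..+15), 8 bits (-128..+127) and 16 bits 
(-32768..+32767).

```python
def smallest_offset_bits(offset: int) -> int:
    """Return 5, 8 or 16: the narrowest signed field that can hold `offset`.

    Hypothetical helper, not taken from CCASM. Both extremes of each
    signed range are checked, so -16 still fits in 5 bits but +16 does not.
    """
    for bits in (5, 8, 16):
        lowest = -(1 << (bits - 1))       # e.g. -16 for 5 bits
        highest = (1 << (bits - 1)) - 1   # e.g. +15 for 5 bits
        if lowest <= offset <= highest:
            return bits
    raise ValueError(f"offset {offset} does not fit in 16 bits")
```

With this rule, an offset of 15 gets the 5-bit form, 16 needs the 8-bit 
form, and 128 needs the full 16 bits.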

If CCASM is adding or subtracting bytes from a direct-value offset, 
I'll have to check this out and see if it's actually a bug, but 
referenced labels will produce offsets based on how far they are from 
a certain byte of the complete instruction.  If a 5-bit or 8-bit 
offset is found to be possible in pass 1 or pass 2 of the source 
code, then the instruction size could change since the referenced 
label *could* now be 1 byte more or less away from the offset 
operand.  Since the operand IS the offset to the label, it seems 
almost impossible to do anything in pass 1 but assume 8 or 16 bits, 
so that pass 2 won't have to worry about phasing errors.  Now, if you 
force an operand size using < or > but CCASM attempts to use 5 or 8 
bits for <, then pass 2 has to play a part in this.  If you think 
this is confusing, try writing the code that deals with it!
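
The size/offset dependency for ,PCR operands can be sketched in a few 
lines of Python. This is only an illustration under stated 
assumptions, not how CCASM does it: the function name and parameters 
are hypothetical, the label address is assumed to be already known, 
and the offset is taken relative to the address of the NEXT 
instruction, which is how the 6809 computes PC-relative addresses. 
Note that ,PCR only has 8-bit and 16-bit forms; the 5-bit form applies 
to offsets from X, Y, U and S.

```python
def size_pcr_operand(instr_addr: int, target_addr: int, opcode_bytes: int = 1):
    """Return (instruction_size, offset) for a `LABEL,PCR` operand.

    Hypothetical helper. Tries the 8-bit offset first, then 16-bit.
    The instruction is opcode + postbyte + offset bytes, and the offset
    is measured from the end of the instruction, so choosing a wider
    offset also changes the offset value itself.
    """
    for offset_bytes in (1, 2):
        size = opcode_bytes + 1 + offset_bytes      # opcode + postbyte + offset
        offset = target_addr - (instr_addr + size)  # relative to next instruction
        bits = 8 * offset_bytes
        if -(1 << (bits - 1)) <= offset <= (1 << (bits - 1)) - 1:
            return size, offset
    raise ValueError("target is out of 16-bit range")
```

For an instruction at $1000, a label at $1082 is reachable with an 
8-bit offset of 127 (3 bytes total).  Move the label one byte further, 
to $1083, and the instruction grows to 4 bytes, yet the stored offset 
is still 127 because the instruction's own end moved too.  That moving 
target is what makes the pass 1 / pass 2 bookkeeping tricky.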

For an example, run the CCASM test source included with the Rainbow 
IDE and examine the listing to see if a 5 or 8-bit offset was 
produced.  Furthermore, modify the code to reference text messages 
that you can print to the VDG screen and see if the complete string 
is printed, proving that the offset was computed correctly.  Try 
placing the text above then below the printing code and then move it 
further away (up or down) to watch how the opcodes change.  If you 
find that there is an exact point at which the produced offset/operand is 
incorrect, I'd like to know this.  I think I tried this many times in 
my tests and came to the conclusion that the system was working.

Another note: assemblers on other platforms such as the PC all 
produce different code.  No two assemblers are alike.  Some optimize 
better, and some do not.  But if the resulting code runs and the 
program works, then there is no real problem.  CCASM tries to 
optimize some of the code so there is a claim in my docs that CCASM 
can produce tighter and faster code than EDTASM.  On a 1 MHz CoCo, 
that is the whole idea.

I also found better ways to code some of the 6309 
instructions.  Since there are different and varying forms of 
documentation and ideas of how the 6309 instructions should be 
represented (and that I don't agree with them all), I took it upon 
myself to do a few things my way.

One thing that some may not like (though I can always change it) is 
how I handled the block-copy instructions.  They are seriously 
confusing in the way they are represented, such as TFRP, TFRR, etc., 
which gives no clue to what is going to happen unless you find the 
docs and study them.  I realize most people try to skip the docs and 
just want things to work in the easiest way, so I created the explode, 
implode, copy- and copy+ instructions.

This may seem confusing as well, but since the instruction names 
describe what is happening, I thought it was cool... :)

implode r0,r1  takes a series of data bytes starting at the source 
address pointed to by (r0) and blasts it into address r1.  How many 
iterations?  The W register holds that value.  What use is there for 
this?  Maybe a DMA mode for some kind of hardware product like a HD 
controller perhaps?  PCM audio?  You decide.

explode r0,r1  takes the data byte pointed to by the source address 
(r0) and blasts it into a series of addresses starting at r1.  W 
holds the iterations.  What use is there for this?  Clearing screens 
and buffers perhaps.

copy+ r0,r1  copies the bytes starting at address r0 to the block 
starting at address r1, incrementing both addresses as it goes.  This 
is the preferred block copy instruction.  W holds the # of bytes to copy.

copy- r0,r1  does the same kind of block copy but in REVERSE.  What 
use is there for this?  Maybe not any, but if the data can be seen on 
the screen that you're copying then there *might* be a desire to make 
it appear in the direction of the raster beam, or in the opposite 
direction, but that's really far out there and may seem crazy.

Again, 'implode' blasts a range of data bytes into one address.
'explode' blasts the same data byte into a range of addresses.
'copy' does just what it implies as well and copies a chunk of memory 
into another location.
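
For anyone who wants to play with the semantics without a 6309 handy, 
the four forms can be modelled in Python as below.  This is just a 
sketch of the behavior described above, run against a bytearray 
standing in for memory; the function names are made up for 
illustration, and the `w` argument stands in for the W register's 
byte count.

```python
def block_move(mem, src, dst, count, src_step, dst_step):
    """Move `count` bytes one at a time from src to dst.

    Each pointer advances by its step (+1, -1 or 0) after every byte.
    Returns the final (src, dst) pointer values.
    """
    for _ in range(count):
        mem[dst] = mem[src]
        src += src_step
        dst += dst_step
    return src, dst


def copy_inc(mem, r0, r1, w):   # copy+ : both addresses increment
    return block_move(mem, r0, r1, w, +1, +1)


def copy_dec(mem, r0, r1, w):   # copy- : both addresses decrement
    return block_move(mem, r0, r1, w, -1, -1)


def implode(mem, r0, r1, w):    # a range of bytes into ONE address
    return block_move(mem, r0, r1, w, +1, 0)


def explode(mem, r0, r1, w):    # ONE byte into a range of addresses
    return block_move(mem, r0, r1, w, 0, +1)
```

For example, with mem = bytearray(b"ABCD" + bytes(4)), calling 
explode(mem, 0, 4, 4) fills mem[4:8] with four copies of "A", while 
copy_inc(mem, 0, 4, 4) would put "ABCD" there instead.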

-- 
Roger Taylor