[Coco] wanting to patch HPUT routine... (ping RG!)

Sun Aug 16 13:33:29 EDT 2009

----- Original Message ----
From: Robert Gault <robert.gault at worldnet.att.net>
To: CoCoList for Color Computer Enthusiasts <coco at maltedmedia.com>
Sent: Sunday, August 16, 2009 6:57:14 AM
Subject: Re: [Coco] wanting to patch HPUT routine... (ping RG!)

theother_bob wrote:
> I'm experimenting with speeding up the cursor drawing routine in Color FOG. I've found that while I can get faster results with HPUT, it actually looks a bit more "flickery" due to the process...
> 
> Original method was to HPUT a box erasing the old cursor, then HDRAW the cursor in the new location. HDRAW benchmarks @ 200 timer clicks/100 cursor drawing cycles (obviously dependent on complexity of cursor. I use three colors but kept it small and drawn efficiently.)
> 
> Modified routine uses HPUT to put cursor on the screen, but requires more overhead... one HBUFF with a 00/FF mask of the cursor, HPUT with AND option, followed by HPUT real cursor with OR option. Benchmarks at <150 timer clicks/100 double-HPUTs, but looks a little more flickery as a solid black cursor is placed and then the "colored in" cursor in a follow-up HPUT command.
> 
> My goal is to speed up the process to a single HPUT operation. Basically I want to support "transparency" in HPUT by ignoring pixels of palette 0.
> 
> One approach would be to intercept one of the options and make a new HPUT routine that compares pixels to 0, ignoring if 0, basically using PSET option if non 0. I'm thinking this would probably be a semi-complex subroutine compared to my next thought...
> 
> I suppose an easier way would be to intercept one of the options. Maybe HPUT the mask but hijack the exit from AND at $EF04 and then (without returning to Basic) increment the HBUFF number, set the OR option flag and re-run the same HPUT command (reset X,Y locations?). This would just require two HBUFFers, mask and cursor, to be defined adjacently.
> 
> Thoughts? Coding help?
> 
> TIA,
> Bob
> 

The ROM code for HPUT is fairly complex. Modifying it to include more options can only make it run slower.
The only thing I can vaguely remember is that keeping HPUT boundaries aligned exactly to byte boundaries lets the routine run faster, because converting pixels to bytes is quicker.

If the above is true, I'd suggest making sure that, for the particular HSCREEN in use, your HBUFF and HPUT rectangles are exact multiples of a byte. For example, HSCREEN2 is a 16-color mode with two pixels per byte, so your HPUT box should be x*2 pixels wide. Put another way, in HSCREEN2 you should find:
10 HPUT (0,0)-(6,5),1,OR    0,1,2,3,4,5,6    3.5 bytes wide
is slower than
10 HPUT (0,0)-(7,5),1,OR    0,1,2,3,4,5,6,7  4.0 bytes wide
even though the second is larger.
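
The byte arithmetic behind that comparison can be sketched as follows. This is only an illustration in Python, not CoCo code; it assumes HSCREEN2 packs two 4-bit pixels per byte, and the helper name bytes_spanned is made up for the example. It says nothing about how much time the ROM routine actually saves.

```python
PIXELS_PER_BYTE = 2  # assumption: HSCREEN2, 16 colors, 4 bits per pixel


def bytes_spanned(x1, x2, pixels_per_byte=PIXELS_PER_BYTE):
    """Return (width in bytes, True if both edges fall on byte boundaries).

    x1 and x2 are the inclusive left and right pixel columns of the box.
    """
    width_pixels = x2 - x1 + 1
    width_bytes = width_pixels / pixels_per_byte
    aligned = x1 % pixels_per_byte == 0 and (x2 + 1) % pixels_per_byte == 0
    return width_bytes, aligned


print(bytes_spanned(0, 6))  # the (0,0)-(6,5) box: 7 pixels wide
print(bytes_spanned(0, 7))  # the (0,0)-(7,5) box: 8 pixels wide
```

Note that a box of even width can still be misaligned if it starts on an odd column, for example bytes_spanned(1, 8), so both the width and the starting X would need to be even.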
>
I did try a brief test of the above commands, timed in a FOR/NEXT loop from 1 to 10000. I did not use TIMER but a real-time clock. Each loop took 20 seconds, with no discernible difference.
>
Want to post the exact code you ran tests on, so we can see how it works for us?

>

While I agree that the mod I'm thinking of would make the routine slower, my thinking is that an ML routine running *itself* in two passes would be faster than Basic running the ML routine twice. I do see that I need to be more careful in planning my sizes and locations. I've already thought of modifying the ML joystick routine to crop and scale the numbers... I can force it to even numbers while I'm at it.
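
For reference, the two approaches discussed in this thread (an AND with a mask followed by an OR of the cursor, versus a single pass that skips palette 0) should leave the same pixels on screen. Below is a rough Python model of that idea, assuming 4-bit pixels packed two per byte as in HSCREEN2. The function names are made up for illustration, and nothing here reflects how the ROM HPUT code is actually written.

```python
def mask_from_cursor(cursor):
    """Build the AND mask: nibble 0 where the cursor has color, F where it is palette 0."""
    mask = bytearray()
    for c in cursor:
        hi = 0x00 if c & 0xF0 else 0xF0
        lo = 0x00 if c & 0x0F else 0x0F
        mask.append(hi | lo)
    return bytes(mask)


def two_pass_put(background, mask, cursor):
    """Model of HPUT ...,AND with the mask followed by HPUT ...,OR with the cursor."""
    return bytes((b & m) | c for b, m, c in zip(background, mask, cursor))


def transparent_put(background, cursor):
    """Single pass: copy each cursor pixel unless it is palette 0."""
    out = bytearray()
    for b, c in zip(background, cursor):
        hi = c & 0xF0 if c & 0xF0 else b & 0xF0
        lo = c & 0x0F if c & 0x0F else b & 0x0F
        out.append(hi | lo)
    return bytes(out)


background = bytes([0x12, 0x34, 0x56])
cursor = bytes([0x0A, 0xB0, 0x00])  # palette 0 nibbles are "transparent"
mask = mask_from_cursor(cursor)

print(two_pass_put(background, mask, cursor).hex())
print(transparent_put(background, cursor).hex())
```

Both calls give the same bytes for this sample, which is the point: a single-pass routine would not change what ends up on screen, only how many times the screen area is written.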

Basically I'm using Color FOG as a testbench in Kiel's CC3 emulator. You have to run FOG.BAS first to get the GUI up and running. Press Q, Brk or X to quit the program (Pressing Q is quicker as it skips the "are you sure?" dialog.)
Line 128 is where the cursor background is saved and a new cursor drawn...

128 HGET(CX-6,CY)-(CX,CY+4),1:HDRAWD$(0)  

To benchmark this routine I do this...

128 TIMER=0:FORCX=1TO100:HGET(CX-6,CY)-(CX,CY+4),1:HDRAWD$(0):NEXT:TI=TIMER:K$="Q":GOTO131

The timer is saved in variable TI as soon as the loop is done. The last part simulates pressing Q to quit the program as soon as the loop is done... kind of important. As soon as I'm at a prompt I issue ?TI to see how long it took. I tried it without changing CX also, but I didn't try it with all odd vs all even values of CX. Repeated runs came out in the high 190s.

For the HPUT test, I had to create more HBUFFers and "populate" them, but you can skip that and just HPUT the same buffer just grabbed. You won't see anything this way, but the timing comes out the same, around 146.

128 TIMER=0:FORCX=10 TO 110:HGET(CX-6,CY)-(CX,CY+4),1:HPUT(CX-6,CY)-(CX,CY+4),1,AND:HPUT(CX-6,CY)-(CX,CY+4),1,OR:NEXT:TI=TIMER:K$="Q":GOTO131

Even though this line is bigger and more complex, it finishes the loop about 25% faster. I suspect that by fine-tuning HPUT I could get closer to a 50% improvement in cursor drawing speed, or maintain current speeds with a larger cursor.
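
To put those TIMER readings in more familiar units, here is a small Python calculation. It assumes TIMER advances 60 times per second (the NTSC rate; a PAL machine would differ), and it takes "high 190s" as 195 purely for the sake of the arithmetic. The function names are made up for the example.

```python
TICKS_PER_SECOND = 60  # assumption: NTSC machine, TIMER counts 60 per second


def ms_per_cycle(ticks, cycles=100):
    """Average milliseconds per cursor drawing cycle for a TIMER reading."""
    return ticks / cycles / TICKS_PER_SECOND * 1000


def percent_faster(old_ticks, new_ticks):
    """Percentage reduction in elapsed time going from old to new."""
    return (old_ticks - new_ticks) / old_ticks * 100


hdraw_ticks = 195  # assumption: stand-in for "high 190s"
hput_ticks = 146   # HGET + HPUT AND + HPUT OR loop

print(round(ms_per_cycle(hdraw_ticks), 1))                # 32.5
print(round(ms_per_cycle(hput_ticks), 1))                 # 24.3
print(round(percent_faster(hdraw_ticks, hput_ticks), 1))  # 25.1
```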

SO many ideas... so little time to try them all!
Bob