[Coco] Nitros9 & Mess

Mon Feb 2 14:25:24 EST 2004

Bill Nobel wrote:

> Through much debugging though I have discovered that the error is not 
> RBF in any way.  I have tested it fully on all RBF devices and found 
> that the bug only exists in the virtual drive system (DSK and VHD 
> devices).  The Ram drives do not have this problem.  I have tried to 
> make the ram drive fail, but to no avail.

Do you base this conclusion on the fact that RAM drives have no 
problem?  Or is there more to it?

When I was debugging through NitrOS-9, I found that the orders to write 
invalid data into the sectors came from RBF, and that cc3disk and the 
layers below it in MESS seemed to be faithfully carrying out those 
orders.  It is also possible that the true bug is elsewhere, and that it 
somehow causes RBF to write the invalid data.  On the other hand, I have 
no explanation as to why the RAM drives do not have the problem.

> I have traced each of these routines to find that these sources have 
> the file handling routines:
>
>  formats/coco_dsk.c               DSK (No trace of these functions 
> being used)
>  devices/basicdsk.c                 DSK handling (virtual only)
>  devices/coco_vhd.c               VHD handling
>  src/fileio.c                            Mame file handling (DSK and 
> VHD calls come here)
>  src/windows/fileio.c               Base OS file handling (called from 
> above)

There is actually a lot more to it, but this is a reasonable summary.  
The functions in formats/coco_dsk.c get called when a disk image is 
mounted, and are used for things such as decoding the disk geometry.

> Below you can see the error in action.  I started with a blank VHD and 
> built 6 zero length files, all is fine.  On the 7th build it needs to 
> expand the directory to another sector and this is what happens:
>
> * read original file descriptor of Directory
> vhd seek: 0000B5 0000B500
> read dump: 00FD00
> BF 00 00 04 02 01 16 19 02 00 00 01 00 00 00 00
> 00 00 B6 00 07 00 00 00 00 00 00 00 00 00 00 00
> .....(extra cut)
>
> * write file descriptor (file size expanded by 32 bytes)
> vhd seek: 0000B5 0000B500
> write dump: 00FD00
> BF 00 00 04 02 01 16 19 02 00 00 01 20 00 00 00
> 00 00 B6 00 07 00 00 00 00 00 00 00 00 00 00 00
> ....(extra cut)
>
> * read empty directory sector
> * This should be all 00's on a freshly formatted disk
> * Doing a check with a windows file editor, it is all 00's before 
> occurrence
> vhd seek: 0000B7 0000B700
> read dump: 00FD00
> 2E AE 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 B5
> AE 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 B5
> E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 BD
> ....(extra cut)
>
> * update and write directory entry (from this point the directory is 
> now trashed)
> vhd seek: 0000B7 0000B700
> write dump: 00FD00
> E7 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 C3
> AE 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 B5
> E1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 BD
> ....(extra cut)

Because I have not looked at this bug in a while and I am an utter 
neophyte when it comes to OS-9, I am not sure how to interpret these 
results.
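
One way to read the dumps mechanically is to decode them against the 
RBF on-disk structures.  The sketch below is only that, a sketch: it 
assumes the usual OS-9 RBF layout (file descriptor with a 4-byte 
big-endian size at offset $09 and a segment list of 5-byte entries at 
offset $10; 32-byte directory entries with the name in the first 29 
bytes, bit 7 set on the last character, and the 3-byte LSN of the file 
descriptor at the end).  The names rbf_fd_summary, rbf_decode_fd, 
rbf_decode_dirent and rbf_print_fd are made up for illustration and are 
not part of NitrOS-9 or MESS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed RBF file descriptor offsets (hypothetical names):
 *   0x00 attributes (bit 7 = directory)   0x08 link count
 *   0x09 file size, 4 bytes big-endian    0x10 first segment entry:
 *                                              3-byte LSN + 2-byte sector count */
typedef struct {
    uint8_t  attributes;
    int      is_directory;
    uint8_t  link_count;
    uint32_t size;            /* file size in bytes */
    uint32_t first_lsn;       /* start LSN of the first segment */
    uint16_t first_sectors;   /* sector count of the first segment */
} rbf_fd_summary;

static uint32_t be24(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}

/* Decode the interesting fields of a file descriptor sector. */
rbf_fd_summary rbf_decode_fd(const uint8_t *sector)
{
    rbf_fd_summary s;
    assert(sector != NULL);
    s.attributes    = sector[0x00];
    s.is_directory  = (sector[0x00] & 0x80) != 0;
    s.link_count    = sector[0x08];
    s.size          = ((uint32_t)sector[0x09] << 24) | be24(sector + 0x0A);
    s.first_lsn     = be24(sector + 0x10);
    s.first_sectors = (uint16_t)((sector[0x13] << 8) | sector[0x14]);
    return s;
}

/* Decode one 32-byte directory entry.  name_out needs room for 30 bytes.
 * Returns the LSN of the entry's file descriptor; an unused entry
 * yields an empty name. */
uint32_t rbf_decode_dirent(const uint8_t *entry, char *name_out)
{
    int i;
    assert(entry != NULL && name_out != NULL);
    for (i = 0; i < 29; i++) {
        uint8_t c = entry[i];
        if (c == 0)
            break;                      /* unused entry or end of name */
        name_out[i] = (char)(c & 0x7F); /* strip the end-of-name marker bit */
        if (c & 0x80) {                 /* bit 7 marks the last character */
            i++;
            break;
        }
    }
    name_out[i] = '\0';
    return be24(entry + 29);
}

/* Print a one-line summary of a decoded file descriptor. */
void rbf_print_fd(FILE *out, const rbf_fd_summary *s)
{
    fprintf(out, "attr=%02X dir=%d links=%u size=%lu seg0=LSN %06lX x %u\n",
            s->attributes, s->is_directory, (unsigned)s->link_count,
            (unsigned long)s->size, (unsigned long)s->first_lsn,
            (unsigned)s->first_sectors);
}
```

Under those assumptions, the written file descriptor decodes to a 
directory of size $120 whose first segment starts at LSN $B6 and spans 7 
sectors, and the first entry of the sector read from LSN $B7 decodes as 
".." pointing at LSN $B5.  The first entry of the subsequent write 
decodes as "g" pointing at LSN $C3.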

>  The only place I can see the bug existing (if any) is the 
> windows/fileio.c, as basicdsk.c and coco_vhd.c just pass the 
> parameters down the tree to these functions: osd_fseek(), osd_fread() 
> and osd_fwrite().  These functions manipulate the buffer/file to 
> update files.
>
>  Now the bad news,  RS-DOS uses these same routines for disk access.  
> RS-DOS has to my knowledge no problems.

I do not think that the bug exists in windows/fileio.c, simply because 
these functions are very low level and are shared by all of the other 
MESS and MAME drivers.  Not only does RS-DOS not have problems, but 
earlier NitrOS-9 versions do not have problems either.

When I was working on this four months ago, I noticed that RBF (when run 
under MESS) was issuing the orders to mangle the appropriate sectors, 
and MESS was simply faithfully executing these orders.  Obviously at 
some level this must be a bug in the emulation for the simple reason 
that the same code works just dandy on a real CoCo.  Have you tried 
using the MESS debugger to debug the emulated CoCo and stepped through 
RBF to try to observe this?

Another approach would be to test this against a VHD (so you can bypass 
all of the ugly FDC crud) and hack some debugging code into 
devices/coco_vhd.c.  This should show that the invalid writes seem to 
originate from the emulated CoCo.
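
As a rough idea of what such debugging code might look like, here is a 
minimal sketch of a trace helper that could be called just before each 
sector read or write.  It is not taken from devices/coco_vhd.c: the 
names vhd_trace_offset, vhd_trace_sector and vhd_sector_is_blank are 
hypothetical, and the 256-byte sector size is an assumption based on 
the LSN/offset pairs in the dumps above (LSN $B5 at offset $B500).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define VHD_SECTOR_SIZE 256u   /* assumed: one LSN = 256 bytes in the image */

/* Byte offset in the image file for a given logical sector number. */
uint32_t vhd_trace_offset(uint32_t lsn)
{
    return lsn * VHD_SECTOR_SIZE;
}

/* Returns 1 if every byte of the buffer is zero, else 0.  Useful for
 * flagging a read of a supposedly blank sector that comes back non-zero. */
int vhd_sector_is_blank(const uint8_t *buf, size_t len)
{
    size_t i;
    assert(buf != NULL);
    for (i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Log one transfer: the LSN, the file offset, and a hex dump with
 * 16 bytes per line.  op is a label such as "read" or "write". */
void vhd_trace_sector(FILE *log, const char *op, uint32_t lsn,
                      const uint8_t *buf, size_t len)
{
    size_t i;
    assert(log != NULL && op != NULL && buf != NULL);
    fprintf(log, "vhd seek: %06lX %08lX\n",
            (unsigned long)lsn, (unsigned long)vhd_trace_offset(lsn));
    fprintf(log, "%s dump:%s\n", op,
            vhd_sector_is_blank(buf, len) ? " (all zero)" : "");
    for (i = 0; i < len; i++)
        fprintf(log, "%02X%c", buf[i],
                (i % 16 == 15 || i + 1 == len) ? '\n' : ' ');
}
```

Calling something like vhd_trace_sector(stderr, "write", lsn, buffer, 
256) from the VHD write path, and the same with "read" from the read 
path, would give a log that can be compared directly against what RBF 
asked for in the debugger.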