[Coco] OT: Help with horrible bad NAS purchase, Linux EXT3 file system blindly used.

William Astle lost at l-w.ca
Sun Dec 6 03:42:37 EST 2009


Stephen H. Fischer wrote:
> Is the Mirror software part of Linux EXT3 or must it be Maxtor / 
> Seagate's code? The web interface surely is. My only knowledge of Linux 
> is that it is something like OS-9.

On all modern and most not-so-modern systems, the actual mirroring/RAID 
operations are completely independent of the file system. The 
RAID/mirroring subsystem presents an interface to the operating system 
that looks just like a regular block device.

File systems operate on top of block devices. In the case of a mirror 
configuration, it would be operating on top of the block device 
presented by the mirroring/RAID subsystem.

Thus, EXT3 probably never saw the error that the NAS device reported. In 
fact, it was probably never bothered by it at all. The mirroring 
subsystem would have handled all of that confusion transparently, 
assuming correct operation.
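
To make the layering concrete, here is a minimal toy sketch in Python. It is not how any particular NAS or the Linux kernel implements mirroring; the names Disk, Mirror, ToyFS and ReadError are all made up for illustration, and the "repair on read" behaviour is an assumption about what "correct operation" would look like. The point is only that the file system layer talks to something that looks like a single block device and never sees the read error on one drive.

```python
class ReadError(Exception):
    """Raised when a device cannot return a block."""


class Disk:
    """A toy block device: a fixed number of fixed-size blocks."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.unreadable = set()  # block numbers that fail on read

    def read_block(self, n):
        if n in self.unreadable:
            raise ReadError(f"cannot read block {n}")
        return self.blocks[n]

    def write_block(self, n, data):
        # Pad or truncate to exactly one block.
        self.blocks[n] = data.ljust(self.block_size, b"\0")[: self.block_size]
        self.unreadable.discard(n)  # a fresh write makes the block readable


class Mirror:
    """Two disks presented as one block device with the same interface."""

    def __init__(self, a, b):
        self.members = [a, b]
        self.block_size = a.block_size
        self.errors_handled = 0

    def read_block(self, n):
        for i, disk in enumerate(self.members):
            try:
                data = disk.read_block(n)
            except ReadError:
                self.errors_handled += 1
                continue
            # Every member before this one failed; rewrite it from the good copy.
            for bad in self.members[:i]:
                bad.write_block(n, data)
            return data
        raise ReadError(f"block {n} unreadable on all members")

    def write_block(self, n, data):
        for disk in self.members:
            disk.write_block(n, data)


class ToyFS:
    """Stand-in for a file system: it only knows about 'a block device'."""

    def __init__(self, device):
        self.device = device

    def store(self, n, text):
        self.device.write_block(n, text.encode())

    def load(self, n):
        return self.device.read_block(n).rstrip(b"\0").decode()


disk_a, disk_b = Disk(8), Disk(8)
mirror = Mirror(disk_a, disk_b)
fs = ToyFS(mirror)
fs.store(3, "hello")
disk_a.unreadable.add(3)  # simulate one drive failing to read the block
result = fs.load(3)       # the file system call still succeeds
```

ToyFS would work unchanged if handed a bare Disk instead of a Mirror, which is the sense in which the file system does not know what mirroring is. Only when both members fail on the same block does the error reach the file system layer.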

Now for a few comments on EXT3. I operate servers for a living. I have 
been using EXT3 on them for many years without issues. EXT3 is quite 
reliable. In almost every case when EXT3 went wrong, it was because the 
underlying storage medium (hard drive, raid array, etc.) failed. In 
every other case, it was bad memory in the server. EXT3 is not the best 
file system ever created but it works quite well. Of course, if the 
implementation is bad, all bets are off, but assuming a recent or even 
not-so-recent Linux kernel, the implementation is good.

As far as whether the mirroring is Maxtor/Seagate's gimmick, Linux doing 
it, or something else altogether, that is impossible to say without 
knowing how the NAS is built. It could be a hardware gimmick doing it 
(ideal) but unlikely for cost reasons. It could be some proprietary 
Seagate thing but that, too, is unlikely simply because it would be more 
expensive to do that. They may have Linux doing the mirror but that 
seems unlikely as well since mirroring in the Linux kernel does not 
handle ECC errors in a useful manner. Most likely, they're using some 
sort of "software raid card" which provides some assistance for the 
mirroring but the driver does most of the work. (The same possibilities 
apply if they're using a different OS inside the NAS, such as Windows or BSD.)

In any event, the mirroring bit is not part of EXT3. EXT3 doesn't even 
know what mirroring is so it is almost 100% certain that EXT3 has 
nothing to do with the problem.

Also, as has been said, current EXT3 implementations do not manage the 
bad block list automatically. It is usually managed via fsck and/or the 
badblocks program.

Most likely, in the NAS, no errors will percolate up to the EXT3 file 
system unless both drives fail simultaneously.

I'll close with a note about what could be causing problems that allow 
rebuilds to succeed:

One drive may be having trouble reading a block. If it gives up on a 
block and remaps it before it is written to, it will read as all zeroes. 
That will almost certainly corrupt data. If the NAS is doing things 
right, it will detect a CRC error and attempt to correct it. It seems 
this NAS is fairly naïve about it and simply rebuilds the mirror, which 
is a valid option. The same thing could happen if the drive eventually 
reads the sector but doesn't detect that it read it incorrectly. In that 
case, though, there will likely be a subsequent failure and you'll 
see additional "failures". One of these two is the most likely case. As 
disconcerting as it is, it is not necessarily a problem for it to happen 
once in a while - that is expected operation for most drives. If it 
happens regularly, then you likely have a drive that is "weak" and 
probably should not be trusted.
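
For anyone who wants to see the first scenario spelled out, here is a rough Python sketch. It assumes the mirroring layer keeps its own checksum per block, which is purely my assumption for illustration; I don't know where (or whether) this NAS keeps CRCs. The class CheckedMirror and its methods are hypothetical names. A remapped block comes back as zeroes on one member, the checksum mismatch identifies it as the bad copy, and it is rewritten from the good one.

```python
import zlib

BLOCK_SIZE = 16


def pad(data):
    """Pad or truncate data to exactly one block."""
    return data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]


class CheckedMirror:
    """Toy two-way mirror with one CRC32 per block (an assumed design)."""

    def __init__(self, num_blocks):
        zero = bytes(BLOCK_SIZE)
        self.copies = [[zero] * num_blocks, [zero] * num_blocks]
        self.crcs = [zlib.crc32(zero)] * num_blocks
        self.repairs = 0

    def write_block(self, n, data):
        block = pad(data)
        for copy in self.copies:
            copy[n] = block
        self.crcs[n] = zlib.crc32(block)

    def read_block(self, n):
        # Keep only the copies whose contents still match the stored CRC.
        good = [c[n] for c in self.copies if zlib.crc32(c[n]) == self.crcs[n]]
        if not good:
            raise IOError(f"block {n}: no copy matches its checksum")
        # Rewrite any member that disagrees with the good copy.
        for copy in self.copies:
            if copy[n] != good[0]:
                copy[n] = good[0]
                self.repairs += 1
        return good[0]

    def remap_as_zeroes(self, member, n):
        """Simulate a drive giving up on a block and remapping it."""
        self.copies[member][n] = bytes(BLOCK_SIZE)


m = CheckedMirror(4)
m.write_block(1, b"payload")
m.remap_as_zeroes(0, 1)   # member 0 now returns zeroes for block 1
data = m.read_block(1)    # mismatch detected, member 0 repaired from member 1
```

Without the checksum, the mirror would have two differing copies and no way to tell which one is right, which is one reason a naive implementation might simply rebuild the whole mirror from the member it still trusts.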

That's enough off-topic rambling. Hopefully it's useful to some folks.

-- 
William Astle
lost at l-w.ca



