linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* FYI: RAID5 unusably unstable through 2.6.14
@ 2006-01-17 19:35 Cynbe ru Taren
  2006-01-17 19:39 ` Benjamin LaHaise
                   ` (6 more replies)
  0 siblings, 7 replies; 44+ messages in thread
From: Cynbe ru Taren @ 2006-01-17 19:35 UTC (permalink / raw)
  To: linux-kernel


Just in case the RAID5 maintainers aren't aware of it:

The current Linux kernel RAID5 implementation is just
too fragile to be used for most of the applications
where it would be most useful.

In principle, RAID5 should allow construction of a
disk-based store which is considerably MORE reliable
than any individual drive.

In my experience, at least, using Linux RAID5 results
in a disk storage system which is considerably LESS
reliable than the underlying drives.

What happens repeatedly, at least in my experience over
a variety of boxes running a variety of 2.4 and 2.6
Linux kernel releases, is that any transient I/O problem
results in a critical mass of RAID5 drives being marked
'failed', at which point there is no longer any supported
way of retrieving the data on the RAID5 device, even
though the underlying drives are all fine, and the underlying
data on those drives almost certainly intact.

This has just happened to me for at least the sixth time,
this time in a brand new RAID5 consisting of 8 200G hotswap
SATA drives backing up the contents of about a dozen onsite
and offsite boxes via dirvish, which took me the better part
of December to get initialized and running, and now two weeks
later I'm back to square one.

I'm currently digging through the md kernel source code
trying to work out some ad-hoc recovery method, but this
level of flakiness just isn't acceptable on systems where
reliable mass storage is a must -- and when else would
one bother with RAID5?

I run a RAID1 mirrored boot and/or root partition on all
the boxes I run RAID5 on -- and lots more as well -- and
RAID1 -does- work as one would hope, providing a disk
store -more- reliable than the underlying drives.  A
Linux RAID1 system will ride out any sort of sequence
of hardware problems, and if the hardware is physically
capable of running at all, the RAID1 system will pop
right back like a cork coming out of white water.

I've NEVER had a RAID1 throw a temper trantrum and go
into apoptosis mode the way RAID5s do given the slightest
opportunity.

We need RAID5 to be equally resilient in the face of
real-world problems, people -- it isn't enough to
just be able to function under ideal lab conditions!

A design bug is -still- a bug, and -still- needs to
get fixed.

Something HAS to be done to make the RAID5 logic
MUCH more conservative about destroying RAID5
systems in response to a transient burst of I/O
errors, before it can in good conscience be declared
ready for production use -- or at MINIMUM to provide
a SUPPORTED way of restoring a butchered RAID5 to
last-known-good configuration or such once transient
hardware issues have been resolved.

There was a time when Unix filesystems disintegrated
on the slightest excuse, requiring guru-level inode
hand-editing to fix.  fsck basically ended that,
allowing any idiot to successfully maintain a unix
filesystem in the face of real-life problems like
power failures and kernel crashes.  Maybe we need
a mdfsck which can fix sick RAID5 subsystems?

In the meantime, IMHO Linux RAID5 should be prominently flagged
EXPERIMENTAL -- NONCRITICAL USE ONLY or some such, to avoid
building up ill-will and undeserved distrust of Linux
software quality generally.

Pending some quantum leap in Linux RAID5 resistance to
collapse, I'm switching to RAID1 everywhere:  Doubling
my diskspace hardware costs is a SMALL price to pay to
avoid weeks of system downtime and rebuild effort annually.
I like to spend my time writing open source, not
rebuilding servers. :)   (Yes, I could become an md
maintainer myself.  But only at the cost of defaulting
on pre-existing open source commitments.  We all have
full plates.)

Anyhow -- kudos to everyone involved:  I've been using
Unix since v7 on PDP-11, Irix since its 68020 days,
and Linux since booting off floppy was mandatory, and
in general I'm happy as a bug in a rug with the fleet
of Debian Linux boxes I manage, with uptimes often exceeding
a year, typically limited only by hardware or software
upgrades -- great work all around, everyone!

Life is Good!

 -- Cynbe




^ permalink raw reply	[flat|nested] 44+ messages in thread
* RE: FYI: RAID5 unusably unstable through 2.6.14
@ 2006-02-03 17:00 Salyzyn, Mark
  2006-02-03 17:39 ` Martin Drab
  2006-02-03 19:46 ` Phillip Susi
  0 siblings, 2 replies; 44+ messages in thread
From: Salyzyn, Mark @ 2006-02-03 17:00 UTC (permalink / raw)
  To: Martin Drab, Phillip Susi
  Cc: Bill Davidsen, Cynbe ru Taren, Linux Kernel Mailing List

Martin Drab [mailto:drab@kepler.fjfi.cvut.cz] sez:
> That may very well be true. I do not know what the Adaptec 
> BIOS does under the "Low-Level Format" option. Maybe someone from
Adaptec 
> would know that.

The drive is low level formatted. This resolved the problem you were
having.

> No, I don't think this was the case of a physically bad 
> sectors. I think it was just an inconsistency of the RAID controllers
metadata (or 
> something simillar) related to that particular array.

It was a case of a set of physically bad sectors in a non-redundant
formation resulting in a non-recoverable situation, from what I could
tell. Read failures do not take the array offline, write failures do.
Instead the adapter responds with a hardware failure to the read
responses. Writing the data would have re-assigned the bad blocks. (RAID
controllers do reassign media bad blocks automatically, but sets them as
inconsistent under some scenarios, requiring a write to mark them
consistent again. This is no different to how single drive media reacts
to faulty or corruption issues).

The bad sectors were localized only affecting the Linux partition, the
accesses were to directory or superblock nodes if memory serves. Another
system partition was unaffected because the errors were not localized to
it's area.

Besides low level formatting, there is not much anyone can do about this
issue except ask for a less catastrophic response from the Linux File
system drivers. I make no offer or suggestion regarding the changes that
would be necessary to support the system limping along when file system
data has been corrupted; UNIX policy in general is to walk away as
quickly as possible and do the least continuing damage.

Except this question: If a superblock can not be read in, what about the
backup copies? Could an fsck play games with backup copies to result in
a write to close inconsistencies?

-- Mark Salyzyn

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2006-02-09 17:06 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-17 19:35 FYI: RAID5 unusably unstable through 2.6.14 Cynbe ru Taren
2006-01-17 19:39 ` Benjamin LaHaise
2006-01-17 20:13   ` Martin Drab
2006-01-17 23:39     ` Michael Loftis
2006-01-18  2:30       ` Martin Drab
2006-02-02 20:33     ` Bill Davidsen
2006-02-03  0:57       ` Martin Drab
2006-02-03  1:13         ` Martin Drab
2006-02-03 15:41         ` Phillip Susi
2006-02-03 16:13           ` Martin Drab
2006-02-03 16:38             ` Phillip Susi
2006-02-03 17:22               ` Roger Heflin
2006-02-03 19:38                 ` Phillip Susi
2006-02-03 17:51             ` Martin Drab
2006-02-03 19:10               ` Roger Heflin
2006-02-03 19:12                 ` Martin Drab
2006-02-03 19:41                   ` Phillip Susi
2006-02-03 19:45                     ` Martin Drab
2006-01-17 19:56 ` Kyle Moffett
2006-01-17 19:58 ` David R
2006-01-17 20:00 ` Kyle Moffett
2006-01-17 23:27 ` Michael Loftis
2006-01-18  0:12   ` Kyle Moffett
2006-01-18 11:24     ` Erik Mouw
2006-01-18  0:21   ` Phillip Susi
2006-01-18  0:29     ` Michael Loftis
2006-01-18  2:10       ` Phillip Susi
2006-01-18  3:01         ` Michael Loftis
2006-01-18 16:49           ` Krzysztof Halasa
2006-01-18 16:47         ` Krzysztof Halasa
2006-02-02 22:10     ` Bill Davidsen
2006-02-08 21:58       ` Pavel Machek
2006-01-18 10:54 ` Helge Hafting
2006-01-18 16:15   ` Mark Lord
2006-01-18 17:32     ` Alan Cox
2006-01-19 15:59       ` Mark Lord
2006-01-19 16:25         ` Alan Cox
2006-02-08 14:46           ` Alan Cox
2006-01-18 23:37     ` Neil Brown
2006-01-19 15:53       ` Mark Lord
2006-01-19  0:13 ` Neil Brown
2006-02-03 17:00 Salyzyn, Mark
2006-02-03 17:39 ` Martin Drab
2006-02-03 19:46 ` Phillip Susi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).