From: keld@keldix.com
To: Rudy Zijlstra <rudy@grumpydevil.homelinux.org>
Cc: Dark Penguin <darkpenguin@yandex.ru>, linux-raid@vger.kernel.org
Subject: Re: Why not just return an error?
Date: Fri, 7 Oct 2016 11:30:05 +0200
Message-ID: <20161007093005.GB14682@www5.open-std.org>
In-Reply-To: <0d0109f4-2c03-484c-9f70-008a7a1a0d67@grumpydevil.homelinux.org>

On Fri, Oct 07, 2016 at 10:21:26AM +0200, Rudy Zijlstra wrote:
> 
> 
> Op 07-10-16 om 07:26 schreef keld@keldix.com:
> >On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
> >>Greetings!
> >>
> >>The more I read about md-raid, the more I notice its biggest
> >>problem: if you hit an error on a degraded RAID, it falls apart.
> >>Because of this, it is possible to lose a huge amount of data due to one
> >>tiny read error, which particularly makes raid5 the sword of Damocles.
> >>
> >>But one question keeps me increasingly frustrated. Yes, during its
> >>normal functioning, it totally makes sense to kick a faulty device out
> >>of an array. But if we're running a degraded array, and doing so will
> >>definitely result in massive data loss, why not just return a read error
> >>instead? Just add a little check: on error, if degraded -> then just
> >>return an error. I believe this is the dream of everyone who has ever
> >>dealt with RAIDs.
> >>
> >>With RAID, the first priority is keeping data safe. Yes, it's not an
> >>alternative to backups and all that, but still - if we hit an error on a
> >>degraded array, the array should scream and panic and send all kinds of
> >>warnings, but definitely NOT collapse and warrant a visit to the RAID
> >>recovery laboratory (or this mailing list). Imagine how much headache
> >>and lost hair would that relieve!..
> >>
> >>Now, I'm probably not the first one to think of such a bright idea. So
> >>there must be a very good reason why this is not possible; I don't think
> >>the problem is just that "the existing behaviour is preferred, and
> >>anyone who does not agree is an idiot". If not for enterprise use, then
> >>at least it would be very useful for the "home archive" scenario when
> >>"uptime" and "absence of errors" hold much less meaning than "losing one
> >>file and not all the data". So, why is this not possible?..
> >Likewise, when the first disk fails, one could mark it as being in an
> >error state and keep it running, and if one gets a read error, one could
> >get the data from the good disks.
> >
> >Often read errors can be remedied by writing data to the failing disk.
> >The good data could then be obtained from the good parts of the array.
> >
> >This behaviour could be optional and could even be set during operation.
> >
> >Best regards
> >keld
> 
> One big reason is human behaviour. And it is human behaviour that in the 
> end causes all the collapsed raids. I have lost count how often I have 
> seen requests for help once the raid had collapsed. But the earlier 
> signal, where the RAID had become degraded, was ignored. This means that 
> if you only give an error message and continue going you will -- most 
> likely in increasing rate -- have errors in the files. Very quickly it 
> will become impossible to state which file is correct and which is not. 
> Essentially you have lost at that point all information with NO ability 
> to recover. Unless you have a backup....
> 
> That is one of the big reasons the behaviour is as it is. RAID is 
> intended to guarantee the consistency and correctness of the stored 
> data. When this becomes impossible, the only way out is to clearly 
> signal this. Even a collapsed RAID has more consistent data (although 
> it takes effort to recover) than a corrupted RAID which would be the 
> result of your proposal. The corruption resulting from your proposal 
> above CANNOT be recovered.

I believe you are incorrect. As long as the parts of the array that are in error
are marked, we know which data is good. Of course some data may be unobtainable,
but this may well be just a few files, and the rest will be good. A much better result
than all data being lost!
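
To make the idea concrete, here is a minimal sketch in Python. It is a toy
model under my own assumptions, not md code: the names ToyDegradedArray,
ReadError, mark_bad and files_affected are made up for illustration. It only
shows that once bad sectors are recorded, reads of good sectors keep working,
reads of bad sectors fail individually, and the affected files can be listed.

```python
class ReadError(Exception):
    """Raised when a sector is marked bad and cannot be returned."""


class ToyDegradedArray:
    """Toy model (hypothetical, not md): records bad sectors instead of
    failing the whole array on the first read error."""

    def __init__(self, sectors):
        self.sectors = dict(sectors)  # sector number -> data
        self.bad = set()              # sectors known to be unreadable

    def mark_bad(self, sector):
        # Remember that this sector cannot be trusted.
        self.bad.add(sector)

    def read(self, sector):
        # Only the marked sector fails; everything else is still served.
        if sector in self.bad:
            raise ReadError(f"sector {sector} is marked bad")
        return self.sectors[sector]

    def write(self, sector, data):
        # Assumption: a successful rewrite makes the sector usable again.
        self.sectors[sector] = data
        self.bad.discard(sector)


def files_affected(file_map, bad):
    """Return the sorted names of files with at least one bad sector."""
    return sorted(name for name, secs in file_map.items() if bad & set(secs))
```

With a mapping from files to sectors, files_affected() gives the short list
of files to restore from backup, while everything else stays readable.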

Anyway, this could be an optional feature that can be enabled or left off.

Best regards
keld
