From: Wols Lists <antlists@youngman.org.uk>
To: Chris Murphy <lists@colorremedies.com>
Cc: Linux-RAID <linux-raid@vger.kernel.org>
Subject: Re: Fault tolerance with badblocks
Date: Wed, 10 May 2017 05:49:05 +0100
Message-ID: <59129BC1.2090005@youngman.org.uk>
In-Reply-To: <CAJCQCtR0y-g+8dKC2-fykmFnOPKMA5YQs-Hku67c79SMECQrEg@mail.gmail.com>

On 10/05/17 04:53, Chris Murphy wrote:
> On Tue, May 9, 2017 at 1:44 PM, Wols Lists <antlists@youngman.org.uk> wrote:
> 
>>> This is totally non-trivial, especially because it says raid6 cannot
>>> detect or correct more than one corruption, and ensuring that
>>> additional corruption isn't introduced in the rare case is even more
>>> non-trivial.
>>
>> And can I point out that that is just one person's opinion?
> 
> Right off the bat you ask a stupid question that contains the answer
> to your own stupid question. This is condescending and annoying, and
> it invites treating you with suspicion as a troll. But then you make
> it worse by saying it again:
> 
Sorry. But I thought we were talking about *Neil's* paper. My bad for
missing it.

>> A
>> well-informed, respected person, true, but it's still just opinion.
> 
> Except it is not just an opinion, it's a fact to any objective reader
> who isn't even a programmer, let alone if you know something about
> math and/or programming. Let's break down how totally stupid your
> position is.
> 

<snip ad hominems :-) >
> 
>> At the end of the day, md should never corrupt data by default. Which is
>> what it sounds like is happening at the moment, if it's assuming the
>> data sectors are correct and the parity is wrong. If one parity appears
>> correct then by all means rewrite the second ...
> 
> This is an obtuse and frankly malicious characterization. Scrubs don't
> happen by default. And the fact that scrub repair assumes the data
> strips are correct is well documented. If you don't like this
> assumption, don't use scrub repair. You can't say corruption happens by
> default unless you admit that there are UREs on a drive by default - of
> course that's absurd and makes no sense.
> 
Documenting bad behaviour doesn't turn it into good behaviour, though ...
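
Aside, for anyone following along: the distinction Chris is drawing is
between the "check" and "repair" scrub actions, which the admin triggers
through sysfs - nothing runs unless you ask for it. A rough Python sketch
of driving that interface (assuming a hypothetical array at /dev/md0 and
root privileges): "check" only counts mismatches, "repair" additionally
rewrites P and Q from whatever the data strips currently contain.

    # Sketch only: md's sysfs scrub interface, assuming an array at /dev/md0.
    import pathlib

    MD = pathlib.Path("/sys/block/md0/md")

    def scrub(action="check"):
        # "check" reads everything and counts mismatches;
        # "repair" also rewrites parity to match the data strips.
        (MD / "sync_action").write_text(action + "\n")

    def mismatches():
        # Mismatch count reported by the last check/repair pass.
        return int((MD / "mismatch_cnt").read_text())
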
>>
>> But the current setup, where it's quite happy to assume a single-drive
>> error and rewrite it if it's a parity drive, but it won't assume a
>> single-drive error and rewrite it if it's a data drive,
>> just seems totally wrong. Worse, in the latter case, it seems it
>> actively prevents fixing the problem by updating the parity and
>> (probably) corrupting the data.
> 
> The data is already corrupted by definition. No additional damage to
> data is done. What does happen is good P and Q are replaced by bad P
> and Q which match the already bad data.

Except, in my world, replacing good P & Q by bad P & Q *IS* doing
additional damage! We can identify and fix the bad data. So why don't
we? Throwing away good P & Q prevents us from doing that, and means we
can no longer recover the good data!
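
To put some flesh on that: below is a toy sketch (Python, emphatically
not md code - the chunk contents and helper names are made up for the
example) of the standard RAID-6 arithmetic over GF(2^8), showing that a
*single* corrupted data chunk can be located from the P and Q syndromes
and then put right. That is exactly the repair that becomes impossible
once P and Q have been rewritten to match the bad data.

    # Toy RAID-6 single-error locate-and-repair, byte-wise over GF(2^8).
    # Uses the usual RAID-6 polynomial 0x11d and generator g = {02}.

    def gf_mul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11d
            b >>= 1
        return r

    GF_EXP, GF_LOG = [0] * 512, [0] * 256
    x = 1
    for i in range(255):
        GF_EXP[i], GF_LOG[x] = x, i
        x = gf_mul(x, 2)
    for i in range(255, 512):
        GF_EXP[i] = GF_EXP[i - 255]

    def gf_div(a, b):
        return 0 if a == 0 else GF_EXP[(GF_LOG[a] - GF_LOG[b]) % 255]

    def raid6_pq(chunks):
        # P is the plain XOR of the data; Q weights chunk d by g^d.
        P, Q = bytearray(len(chunks[0])), bytearray(len(chunks[0]))
        for d, chunk in enumerate(chunks):
            for i, byte in enumerate(chunk):
                P[i] ^= byte
                Q[i] ^= gf_mul(GF_EXP[d], byte)
        return bytes(P), bytes(Q)

    def locate_and_repair(chunks, P, Q):
        # Assuming at most one data chunk is bad: find it and fix it.
        P2, Q2 = raid6_pq(chunks)
        sp = bytes(a ^ b for a, b in zip(P, P2))
        sq = bytes(a ^ b for a, b in zip(Q, Q2))
        where = set()
        for a, b in zip(sp, sq):
            if a == 0 and b == 0:
                continue                      # this byte is consistent
            if a == 0 or b == 0:
                return None                   # not a single-data-chunk error
            where.add(GF_LOG[gf_div(b, a)])   # z = log_g(Sq / Sp)
        if len(where) != 1:
            return None                       # more than one chunk implicated
        z = where.pop()
        return z, bytes(c ^ s for c, s in zip(chunks[z], sp))

    data = [b"hello world!", b"raid6 demo..", b"third chunk."]
    P, Q = raid6_pq(data)
    data[1] = b"raid6 dXmo.."                 # corrupt one chunk in place
    print(locate_and_repair(data, P, Q))      # -> (1, b'raid6 demo..')

Yes, a real implementation also has to worry about whether it's actually
a data strip, P, or Q that went bad, torn writes and so on - the sketch
only covers the single-bad-data-chunk case we're arguing about.
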
> 
> And nevertheless you have the very real problem that drives lie about
> having committed data to stable media. And they reorder writes,
> breaking the write order assumptions of things. And we have RMW
> happening on live arrays. And that means you have a real likelihood
> that you cannot absolutely determine with the available information
> why P and Q don't agree with the data; you're still making probability
> assumptions, and if that assumption is wrong any correction will
> introduce more corruption.
> 
> The only unambiguous way to do this has already been done and it's ZFS
> and Btrfs. And a big part of why they can do what they do is because
> they are copy on write. If you need to solve the problem of ambiguous
> data strip integrity in relation to P and Q, then use ZFS. It's
> production ready. If you are prepared to help test and improve things,
> then you can look into the Btrfs implementation.

So how come btrfs and ZFS can handle this, and md can't? Can't md use
the same techniques? (Seriously, I don't know the answer. But, like Nix,
when I feel I'm being fed the answer "we're not going to give you the
choice because we know better than you", I get cheesed off. If I get the
answer "we're snowed under, do it yourself" then that is normal and
acceptable.)
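
As I understand it - and this is only a sketch of the concept, nothing
like either filesystem's real on-disk format - the extra ingredient is a
checksum stored away from each block, so a read can tell *which* copy is
wrong rather than merely that something disagrees:

    # Concept sketch only - illustrative, not the real btrfs/ZFS layout.
    import zlib

    def write_block(data):
        # The checksum lives with the metadata, not next to the data itself.
        return {"data": data, "csum": zlib.crc32(data)}

    def read_block(block, rebuild_from_redundancy):
        if zlib.crc32(block["data"]) == block["csum"]:
            return block["data"]                  # primary copy verified good
        candidate = rebuild_from_redundancy()     # mirror copy, or parity rebuild
        if zlib.crc32(candidate) == block["csum"]:
            return candidate                      # redundant copy verified good
        raise IOError("both copies fail the checksum")

Being copy-on-write is the other half: data is never overwritten in
place, so a torn update can't silently mix old and new - at least that's
my reading of it.
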
> 
> Otherwise I'm sure md and LVM folks have a feature list that
> represents a few years of work as it is without yet another pile on.
> 
>>
>> Report the error, give the user the tools to fix it, and LET THEM sort
>> it out. Just like we do when we run fsck on a filesystem.
> 
> They're not at all comparable. One is a file system, the other a raid
> implementation; they have nothing in common.
> 
> 
And what are file systems and raid implementations? They are both data
store abstractions. They have everything in common.

Oh and by the way, now I've realised my mistake, I've taken a look at
the paper you mention. In particular, section 4. Yes it does say you
can't detect and correct multi-disk errors - but that's not what we're
asking for!

By implication, it seems to be saying LOUD AND CLEAR that you CAN detect
and correct a single-disk error. So why the blankety-blank won't md let
you do that!
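
For the record, the single-error case, restated loosely in my own
notation (section 4 has the careful version, and also explains why this
falls apart the moment more than one drive is wrong):

    P = \bigoplus_{i=0}^{n-1} D_i, \qquad
    Q = \bigoplus_{i=0}^{n-1} g^i \cdot D_i
    \qquad \text{over } \mathrm{GF}(2^8),\ g = \{02\}

If exactly one data drive z returns bad data D_z', the syndromes are

    S_P = P \oplus P' = D_z \oplus D_z', \qquad
    S_Q = Q \oplus Q' = g^z \cdot (D_z \oplus D_z')

so the bad drive and the original data fall straight out:

    z = \log_g\!\left(S_Q / S_P\right), \qquad D_z = D_z' \oplus S_P

One unknown drive, two equations - solvable. Two unknown drives - not,
which is all the paper is actually saying.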

Neil's point seems to be that it's a bad idea to do it automatically. I
get his logic. But to then actively prevent you doing it manually - this
is the paternalistic attitude that gets my goat.

Anyways, I've been thinking about this, and I've got a proposal (RFC?).
I haven't got time right now - I'm supposed to be at work - but I'll
write it up this evening. If the response is "we're snowed under - it
sounds a good idea but do it yourself", then so be it. But if the
response is "we don't want the sysadmin to have the choice", then expect
more flak from people like Nix and me.

(And the proposal involves giving sysadmins CHOICE. If they want to take
the hit, it's *their* decision, not a paternalistic choice forced on them.)

(Sorry to keep on about paternalism, but there is a sense that decisions
have been made, and they're not going to be reversed "because I say so".
I'm NOT getting a "you want it, you write it" vibe, and that's what gets
to me.)

Cheers,
Wol
