From: "Janos Toth F." <toth.f.janos@gmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Goffredo Baroncelli <kreijack@inwind.it>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC] Checksum of the parity
Date: Mon, 14 Aug 2017 01:40:08 +0200
Message-ID: <CANznX5EfXJ4eUh-bGXK0stPqiDfAnafFFw=_hxgxm2UiFpPeOQ@mail.gmail.com>
In-Reply-To: <CAJCQCtToFj4BowawgYPT-GiUnZPAXsjtuZO2=imcoyOZmaQzug@mail.gmail.com>

On Sun, Aug 13, 2017 at 8:45 PM, Chris Murphy <lists@colorremedies.com> wrote:
> Further, the error detection of corrupt reconstruction is why I say
> Btrfs is not subject *in practice* to the write hole problem. [2]
>
> [1]
> I haven't tested the raid6 normal read case where a stripe contains
> corrupt data strip and corrupt P strip, and Q strip is good. I expect
> instead of EIO, we get a reconstruction from Q, and then both data and
> P get fixed up, but I can't find it in comments or code.

Yes, that's what I would expect (which would theoretically make the
odds of a successful recovery better on RAID6, possibly "good enough"),
but I have no clue how that is actually handled right now (I guess the
current code isn't that thorough).
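The recovery path described in [1] (rebuild a corrupt data strip from
Q while P is also bad, then regenerate P) can be sketched as below. This
is a toy illustration, not the btrfs code. It assumes the GF(2^8)
arithmetic of the Linux raid6 library (polynomial 0x11d, generator 2)
and one checksum per data strip. Real btrfs checksums every data
sector, crc32c by default; zlib.crc32 stands in for it here. All helper
names are hypothetical.

```python
import zlib

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    # Every nonzero element satisfies a^255 == 1, so a^254 is its inverse.
    return gf_pow(a, 254)

def compute_pq(data):
    """P = XOR of the data strips; Q = XOR of g^i * D_i with g = 2."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, strip in enumerate(data):
        g = gf_pow(2, i)
        for k, byte in enumerate(strip):
            p[k] ^= byte
            q[k] ^= gf_mul(g, byte)
    return bytes(p), bytes(q)

def rebuild_from_q(data, q, bad):
    """Recover data[bad] from Q and the other data strips; P is not used."""
    acc = bytearray(q)
    for i, strip in enumerate(data):
        if i != bad:
            g = gf_pow(2, i)
            for k, byte in enumerate(strip):
                acc[k] ^= gf_mul(g, byte)
    inv = gf_inv(gf_pow(2, bad))  # acc now holds g^bad * D_bad
    return bytes(gf_mul(inv, byte) for byte in acc)

def repair_with_q(data, q, csums):
    """Rebuild at most one checksum-failing data strip from Q, then recompute P."""
    bad = [i for i, strip in enumerate(data) if zlib.crc32(strip) != csums[i]]
    if len(bad) > 1:
        raise ValueError("more than one bad data strip; Q alone cannot rebuild them")
    data = list(data)
    if bad:
        rebuilt = rebuild_from_q(data, q, bad[0])
        if zlib.crc32(rebuilt) != csums[bad[0]]:
            raise ValueError("reconstruction failed its checksum (Q may be bad too)")
        data[bad[0]] = rebuilt
    new_p, _ = compute_pq(data)  # P is never trusted; it is regenerated
    return data, new_p

original = [b"abcd", b"efgh", b"ijkl"]
p, q = compute_pq(original)
csums = [zlib.crc32(s) for s in original]
damaged = [original[0], b"XXXX", original[2]]  # strip 1 corrupt; bad P is never read
fixed, fixed_p = repair_with_q(damaged, q, csums)
assert fixed == original and fixed_p == p
```

The checksum check after reconstruction is what turns "Q happened to
be good" into something verifiable: if Q were also damaged, the rebuilt
strip would fail its checksum and the sketch raises instead of writing
back garbage.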

> [2]
> Is Btrfs subject to the write hole problem manifesting on disk? I'm
> not sure, sadly I don't read the code well enough. But if all Btrfs
> raid56 writes are full stripe CoW writes, and if the prescribed order
> guarantees still happen: data CoW to disk > metadata CoW to disk >
> superblock update, then I don't see how the write hole happens. Write
> hole requires: RMW of a stripe, which is a partial stripe overwrite,
> and a crash during the modification of the stripe making that stripe
> inconsistent as well as still pointed to by metadata.
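To make the write-hole condition in that quote concrete, here is a toy
RAID-5 (XOR parity) sketch with a hypothetical three-element stripe. An
in-place RMW rewrites element 1, the crash lands before the parity is
rewritten, and a later rebuild of element 0, which was never touched,
returns wrong bytes. Nothing here is btrfs code; it only shows the
failure mode, which a data checksum would then catch on read.

```python
from functools import reduce

def xor_bytes(*blocks):
    """Byte-wise XOR of equal-length blocks (RAID-5 parity)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def rmw_then_rebuild(crash_before_parity):
    """Overwrite element 1 in place, optionally crash, then rebuild element 0."""
    stripe = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_bytes(*stripe)          # the stripe starts out consistent
    stripe[1] = b"bbbb"                  # RMW step 1: new data reaches the disk
    if not crash_before_parity:
        parity = xor_bytes(*stripe)      # RMW step 2: parity is rewritten
    # The device holding element 0 fails; reconstruct it from the survivors.
    return xor_bytes(parity, stripe[1], stripe[2])

print(rmw_then_rebuild(crash_before_parity=False))  # b'AAAA'
print(rmw_then_rebuild(crash_before_parity=True))   # b'aaaa' (stale parity)
```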

I guess the problem is that the stripe size or the stripe element size
is (sort of) fixed (I'm not sure which one; I guess it's the latter, in
which case the actual stripe size depends on the number of devices) and
relatively big: much bigger than the usual 4k sector size, or even the
leaf size, which now defaults to 16k if I recall correctly (though I set
it to 4k myself). So partial stripe updates (RMW) are certainly possible
during general use.
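A rough sketch of that arithmetic: the data portion of a full stripe is
the element size times the number of data devices, and any write that
does not start and end on full-stripe boundaries leaves some stripe
partially covered, which is where RMW comes in. The 64 KiB element is
an assumption (it matches the fixed BTRFS_STRIPE_LEN in the kernel
sources, as far as I know), and the function names are hypothetical.

```python
def full_stripe_bytes(num_devices, parity_devices, element_bytes):
    """Bytes of data held by one full stripe (parity elements excluded)."""
    data_devices = num_devices - parity_devices
    if data_devices < 1:
        raise ValueError("need at least one data device")
    return data_devices * element_bytes

def partial_stripes(offset, length, full_stripe):
    """Indices of stripes that a write [offset, offset + length) covers only partially.

    Each such stripe would need a read-modify-write to keep parity in sync.
    """
    if length <= 0:
        return []
    end = offset + length
    touched = range(offset // full_stripe, (end - 1) // full_stripe + 1)
    return [s for s in touched
            if offset > s * full_stripe or end < (s + 1) * full_stripe]

KIB = 1024
stripe = full_stripe_bytes(6, 2, 64 * KIB)      # RAID-6 on 6 devices
print(stripe // KIB)                             # 256
print(partial_stripes(0, 16 * KIB, stripe))      # [0]: a single 16 KiB leaf
print(partial_stripes(0, 256 * KIB, stripe))     # []: exactly one full stripe
```

Whether btrfs ever rewrites such a partially covered stripe in place
while metadata still points into it is exactly the open question in [2];
the sketch only shows how easily a small write falls short of a full
stripe.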

This is why, a few months ago, I floated the idea of resurrecting that
old (but seemingly dead or stuck) project to make the stripe (element)
size configurable by the user. That would allow setting the stripe size
equal to the filesystem sector size on a limited number of setups (for
example, 5 or 6 HDDs with 512-byte physical sectors in RAID-5 or RAID-6,
respectively), which, as I understand it, would practically eliminate
the problem, at least on the filesystem side.

I am not sure whether the HDD's volatile write cache, or at least its
internal write reordering, would also have to be disabled for this to
really avoid inconsistencies between stripe elements. I can't recall
ever seeing a partially written sector (we would know, since sectors
are checksummed in place and a partially written one would read back as
unreadable), so I guess there is usually enough energy in some small
capacitor to finish the current sector after the power gets cut?
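One way to read the stripe-sizing idea above (full stripe = one
filesystem sector, possible only for some device counts), as a sketch
under my own assumptions: the element size is the 4 KiB sector size
divided by the number of data devices, and that element must still be
a whole number of physical sectors. The helper name is hypothetical.

```python
def element_for_sector_stripe(sector_bytes, num_devices, parity_devices,
                              physical_sector_bytes):
    """Element size that makes one full stripe hold exactly one fs sector.

    Returns None when no such element exists for this layout (assumption:
    the element must be a whole multiple of the drive's physical sector).
    """
    data_devices = num_devices - parity_devices
    if data_devices < 1 or sector_bytes % data_devices:
        return None
    element = sector_bytes // data_devices
    if element % physical_sector_bytes:
        return None
    return element

print(element_for_sector_stripe(4096, 5, 1, 512))    # 1024 (RAID-5, 4 data devices)
print(element_for_sector_stripe(4096, 6, 2, 512))    # 1024 (RAID-6, 4 data devices)
print(element_for_sector_stripe(4096, 5, 1, 4096))   # None: 4Kn drives are too coarse
print(element_for_sector_stripe(4096, 4, 1, 512))    # None: 4096 is not divisible by 3
```

With such a layout every sector-sized write is a full-stripe write, so
there is nothing to read-modify-write; whether the drives commit all
elements of that stripe together is the separate cache and reordering
question above.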


Thread overview: 8+ messages
2017-08-13 14:16 [RFC] Checksum of the parity Goffredo Baroncelli
2017-08-13 18:45 ` Chris Murphy
2017-08-13 23:40   ` Janos Toth F. [this message]
2017-08-14 14:12   ` Goffredo Baroncelli
2017-08-14 19:28     ` Chris Murphy
2017-08-14 20:18       ` Goffredo Baroncelli
2017-08-14 21:10         ` Chris Murphy
2017-08-14 13:23 ` Austin S. Hemmelgarn
