linux-btrfs.vger.kernel.org archive mirror
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Johannes Hirte <johannes.hirte@datenkhaos.de>
Cc: Justin Engwer <justin@mautobu.com>, linux-btrfs@vger.kernel.org
Subject: Re: I think he's dead, Jim
Date: Thu, 21 May 2020 02:20:43 -0400
Message-ID: <20200521062043.GE10769@hungrycats.org>
In-Reply-To: <20200520205319.GA26435@latitude>

On Wed, May 20, 2020 at 10:53:19PM +0200, Johannes Hirte wrote:
> On 2020 May 19, Zygo Blaxell wrote:
> > 
> > Corollary:  Never use space_cache=v1 with raid5 or raid6 data.
> > space_cache=v1 puts some metadata (free space cache) in data block
> > groups, so it violates the "never use raid5 or raid6 for metadata" rule.
> > space_cache=v2 eliminates this problem by storing the free space tree
> > in metadata block groups.
> > 
> 
> This should not be a real problem, as the space-cache can be discarded
> and rebuilt at any time. Or am I missing something?

Keep in mind that there are multiple reasons not to use space_cache=v1.
It is quite slow, especially on filesystems big enough that raid5 is in
play, even when it's not recovering from integrity failures.

The free space cache (v1) is stored in nodatacow inodes, which are also
nodatasum, so it has all the btrfs RAID data integrity problems of
nodatasum, plus the parity corruption and write hole issues of raid5.
The free space tree (v2) is stored in metadata, so it has csums to detect
data corruption and transid checks to detect dropped writes, and if you
are using raid1 metadata you also avoid the parity corruption bug in
btrfs's raid5/6 implementation and the write hole.  v2 is faster too,
especially at commit time.
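
To check which of the two a mounted filesystem is using, you can look at
its mount options.  The following is only a minimal sketch, assuming the
kernel reports v1 as a bare "space_cache" option and v2 as
"space_cache=v2" in /proc/mounts.  The function names are hypothetical and
exist only for this illustration.

```python
def classify_space_cache(mount_options: str) -> str:
    """Classify a comma-separated btrfs mount option string.

    Returns "v2", "v1", or "none" (no space cache option present).
    Assumption: v1 appears as "space_cache" and v2 as "space_cache=v2".
    """
    opts = mount_options.split(",")
    if "space_cache=v2" in opts:
        return "v2"
    if "space_cache" in opts or "space_cache=v1" in opts:
        return "v1"
    return "none"


def btrfs_space_cache_report(mounts_text: str) -> dict:
    """Map each btrfs mount point to its space cache classification.

    mounts_text is text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    report = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "btrfs":
            report[fields[1]] = classify_space_cache(fields[3])
    return report
```

Passing the contents of /proc/mounts to btrfs_space_cache_report() gives
one entry per btrfs mount point.  Anything reported as "v1" is a candidate
for conversion.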

The probability of undetected space_cache=v1 failure is low, but not zero.
In the event of failure, the filesystem should detect the error when it
tries to create new entries in the extent tree--they'll overlap existing
allocated blocks, and the filesystem will force itself read-only, so
there should be no permanent damage other than killing any application
that was writing to the disk at the time.
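
As a rough illustration of that detection step, here is a toy model in
Python.  It is not the btrfs extent tree or its on-disk format; the class
and method names are made up for this sketch.  It only shows the idea that
an allocation based on a bad free space cache collides with an extent that
is already recorded, and that the collision flips the filesystem read-only
instead of overwriting data.

```python
class ToyExtentTree:
    """Toy model of allocated extents (hypothetical, not real btrfs code)."""

    def __init__(self):
        self.extents = []        # list of (start, length) tuples
        self.read_only = False   # set once an inconsistency is detected

    def overlaps(self, start: int, length: int) -> bool:
        """True if [start, start+length) intersects any recorded extent."""
        end = start + length
        return any(start < s + l and s < end for s, l in self.extents)

    def allocate(self, start: int, length: int) -> bool:
        """Record a new extent; refuse and go read-only on overlap."""
        if self.read_only:
            return False
        if self.overlaps(start, length):
            # The free space cache claimed this range was free, but it
            # is already allocated: stop writing instead of clobbering it.
            self.read_only = True
            return False
        self.extents.append((start, length))
        return True
```

In this model, allocating (0, 4096) and then (4096, 4096) succeeds, while a
later attempt at (2048, 4096) fails and leaves the tree read-only, so every
allocation after that is refused as well.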

Come to think of it, though, the space_cache=v1 problems are not specific
to raid5.  You shouldn't use space_cache=v1 with raid1 or raid10 data
either, for the same reasons.

In the raid5/6 case it's a bit simpler: kernels that can't do
space_cache=v2 (4.4 and earlier) don't have working raid5 recovery either.
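
If you want to script that version check, something like the following
would do.  It is a sketch under the assumption that 4.5 is the first
kernel release with space_cache=v2 support (the counterpart of "4.4 and
earlier" above).  The function name is hypothetical, and distro kernels
with backports may not fit this simple comparison.

```python
import re


def kernel_supports_space_cache_v2(release: str) -> bool:
    """Return True if a kernel release string is 4.5 or later.

    release is a string such as platform.release() returns,
    e.g. "5.6.13-arch1-1".  Assumption: 4.5 is the cutoff.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (4, 5)
```

On a running system you could call it as
kernel_supports_space_cache_v2(platform.release()) after importing the
standard platform module.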

> -- 
> Regards,
>   Johannes Hirte
> 

