linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Chris Murphy <lists@colorremedies.com>, waxhead <waxhead@dirtcellar.net>
Cc: Stefan K <shadow_7@gmx.net>, Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs as / filesystem in RAID1
Date: Mon, 11 Feb 2019 07:17:42 -0500
Message-ID: <a8e00ae7-9e18-ba74-5521-a2db7b525e51@gmail.com>
In-Reply-To: <CAJCQCtQ-nLkOYE5ARk+rjT4JBxR6Atn1gU-+U8gAT0sb7Mduow@mail.gmail.com>

On 2019-02-10 13:34, Chris Murphy wrote:
> On Sat, Feb 9, 2019 at 5:13 AM waxhead <waxhead@dirtcellar.net> wrote:
> 
>> Understood, but that is not quite what I meant - let me rephrase...
>> If BTRFS still can't mount, why would it blindly accept a previously
>> non-existing disk to take part of the pool?!
> 
> It doesn't do it blindly. It only ever mounts when the user specifies
> the degraded mount option, which is not a default mount option.
> 
>> E.g. if you have "disk" A+B
>> and suddenly at one boot B is not there. Now you have only A and one
>> would think that A should register that B has been missing. Now on the
>> next boot you have AB , in which case B is likely to have diverged from
>> A since A has been mounted without B present - so even if both devices
>> are present why would btrfs blindly accept that both A+B are good to go
>> even if it should be perfectly possible to register in A that B was
>> gone. And if you have B without A it should be the same story right?
> 
> OK no, you haven't gone far enough to set up the split brain scenario
> where there is a partially legitimate complaint. Prior to split brain,
> it's entirely reasonable for Btrfs to mount *when you use the degraded
> mount option* - it does not blindly mount. And if you've ever done
> exactly what you wrote in the above paragraph, you'd see Btrfs
> *complains vociferously* about all the errors it's passively finding
> and fixing. If you want a more active method of getting device B
> caught up with A automatically - that's completely reasonable, and
> something people have been saying for some time, but it takes a design
> proposal, and code.
> 
> As for split brain scenario, it is only the user's manual intervention
> with multiple 'degraded' mount options (which again, is not the
> default) that caused the volume to arrive in such a state. Would it be
> wise to have some additional error checking? Sure. Someone would need
> to step up with a design and to do code work, same as any other
> feature. Maybe a rudimentary check would be comparing the timestamps
> for leaves or nodes ostensibly with the same transid, but in any case
> that doesn't just happen for free.
And even then it couldn't be made truly reliable, because data from old
transactions may be arbitrarily overwritten at any point once the next
transaction has committed (and is just plain gone if you're using the
`discard` mount option).
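
To make the split-brain problem concrete, here is a toy model of why comparing generation numbers alone is not enough. This is only a sketch, not Btrfs code: the `Device` class and the `commit` and `classify` helpers are made up for illustration, and the starting generation of 100 is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Toy stand-in for one member of a two-device RAID1 (hypothetical model)."""
    name: str
    generation: int  # last transaction committed while this device was present


def commit(*present: Device) -> None:
    """Commit one transaction; only devices present at commit time advance."""
    new_gen = max(d.generation for d in present) + 1
    for d in present:
        d.generation = new_gen


def classify(a: Device, b: Device) -> str:
    """Naive check that looks at generation numbers and nothing else."""
    if a.generation == b.generation:
        return "looks consistent"
    stale = a if a.generation < b.generation else b
    return f"{stale.name} is stale"


a, b = Device("A", 100), Device("B", 100)

# B goes missing; A is mounted degraded and commits twice.
commit(a)
commit(a)
print(classify(a, b))  # B is stale

# Split brain: B is then mounted degraded on its own and also commits twice.
commit(b)
commit(b)
print(classify(a, b))  # looks consistent, although A and B have diverged
```

In the first case the lagging generation on B is enough to spot it as stale. In the second case both devices report the same generation, so a check based on that number alone has nothing to go on.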
> 
> 
>>>> So what you are saying is that the generation number does not
>>>> represent a true frozen state of the filesystem at that point?
>>> It does _only_ for those devices which were present at the time of the
>>> commit that incremented it.
>>>
>> So in other words devices that are not present can easily be marked /
>> defined as such at a later time?
> 
> That isn't how it currently works. When stale device B is subsequently
> mounted (normally) along with device A, it's only passively fixed up.
> Part of the point of non-automatic degraded mounts that require user
> intervention is the lack of anything beyond simple error handling and
> fixups.
> 
>> Ok, not sure I still understand how/why systemd knows what devices are
>> part of btrfs (or md or lvm for that matter). I'll try to research this
>> a bit - thanks for the info!
> 
> It doesn't, not directly. It's from the previously mentioned udev
> rule. For md, the assembly, delays, and fall back to running degraded,
> are handled in dracut. But the reason why this is in udev is to
> prevent a mount failure just because one or more devices are delayed;
> basically it inserts a pause until the devices appear, and then
> systemd issues the mount command.
Last I knew, it was systemd itself doing the pause, because we provide
no real device whose appearance udev could wait on.
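
For anyone who wants to see what such a pause amounts to, here is a rough sketch of a polling loop around `btrfs device ready`. It is not what systemd or udev actually run. I am assuming the command exits 0 once all member devices are registered with the kernel and non-zero otherwise; the `wait_until_ready` helper, its defaults, and the device path in the usage note are made up for illustration.

```python
import subprocess
import time
from typing import Callable


def btrfs_ready(device: str) -> bool:
    """Return True if `btrfs device ready` reports the filesystem as complete.

    Assumption: exit status 0 means every member device has been scanned and
    registered; any other status means at least one is still missing.
    """
    result = subprocess.run(
        ["btrfs", "device", "ready", device],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until_ready(
    device: str,
    timeout: float = 90.0,   # arbitrary default for this sketch
    interval: float = 1.0,
    check: Callable[[str], bool] = btrfs_ready,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check(device)` until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check(device):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller would run something like `wait_until_ready("/dev/sda2")` and only issue the mount if it returns True. What to do on a timeout (fail, or retry with `degraded`) is a policy decision this sketch deliberately leaves out.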

