From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: problem with degraded boot and systemd
Date: Wed, 21 May 2014 00:03:08 +0000 (UTC)
Message-ID: <pan$6f6ec$5ba926ea$9860d25c$5d86cd20@cox.net>
In-Reply-To: 20140520222609.GD1756@carfax.org.uk

Hugo Mills posted on Tue, 20 May 2014 23:26:09 +0100 as excerpted:
> On Wed, May 21, 2014 at 12:00:24AM +0200, Goffredo Baroncelli wrote:
>> On 05/19/2014 02:54 AM, Chris Murphy wrote:
>>>
>>> It's insufficient to pass rootflags=degraded to get the system root
>>> to mount when a device is missing. It looks like when a device is
>>> missing, udev doesn't [...]
>>>
>>> This is the current udev rule:
>>>
>>> # cat /usr/lib/udev/rules.d/64-btrfs.rules
>>> # do not edit this file, it will be overwritten on update
>>>
>>> SUBSYSTEM!="block", GOTO="btrfs_end"
>>> ACTION=="remove", GOTO="btrfs_end"
>>> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
>>>
>>> # let the kernel know about this btrfs filesystem, and check if it is
>>> # complete
>>> IMPORT{builtin}="btrfs ready $devnode"
>>>
>>> # mark the device as not ready to be used by the system
>>> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
>>>
>>> LABEL="btrfs_end"
>>
>> The key is the line
>>
>> IMPORT{builtin}="btrfs ready $devnode"
>>
>> This line sets ID_BTRFS_READY=0 if a filesystem is not ready; otherwise
>> set ID_BTRFS_READY=1 [1].
>> The next line
>>
>> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
>>
>> sets SYSTEMD_READY=0 if the filesystem is not ready so the "plug" event
>> is not raised to systemd.
>>
>> This is my understanding.
Looks correct to me. =:^)
>>> How this works with raid:
>>>
>>> RAID assembly is separate from filesystem mount. The volume UUID
>>> isn't available until the RAID is successfully assembled.
>>>
>>> On at least Fedora (dracut) systems with the system root on an md
>>> device, the initramfs contains 30-parse-md.sh [with a sleep loop and
>>> a timeout]
>>
>>> The approximate Btrfs equivalent down the road would be a similar
>>> initrd script, or maybe a user space daemon, that causes btrfs device
>>> ready to confirm/deny all devices are present. And after x number of
>>> failures, then it's issue an equivalent to mdadm -R which right now
>>> we don't seem to have.
>>
>> I suggest to implement a mount.btrfs command, which waits all the
>> needed disks until a timeout expires. After this timeout it could try a
>> "degraded" mount until a second timeout. Only then it fails.
>>
>> Each time a device appear, the system may start mount.btrfs. Each
>> invocation has to test if there is another instance of mount.btrfs
>> related to the same filesystem; if so it ends, otherwise it follows the
>> above behavior.
>
> Don't we already have something approaching this functionality with
> btrfs device ready? (i.e. this is exactly what it was designed for).
Well, sort of.

btrfs device ready is used directly in the udev rule quoted above. And
in the non-degraded case it works as intended, checking if the filesystem
is complete and only letting the udev plug event complete when all
devices are available.

But this thread is about a degraded state mount, with devices missing.
In that case, the missing devices never appear so the plug event never
happens, so systemd will never mount the device, despite the fact that
degraded was specifically passed as an option, indicating that the admin
wants the mount to happen anyway.
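
To see whether a given member device is being held back this way, the
udev properties can be inspected. What follows is only a rough sketch:
the helper name is_held_back is made up for illustration, and it simply
looks for the SYSTEMD_READY=0 property that the rule quoted above sets.

```sh
#!/bin/sh
# Hypothetical helper (the name is an assumption, not an existing tool).
# Reads udev properties as KEY=value lines on stdin and exits 0 if the
# device has been marked SYSTEMD_READY=0, that is, held back from systemd.
is_held_back() {
    grep -q '^SYSTEMD_READY=0$'
}
```

Usage would be something like the following, with /dev/sdb standing in
for whichever btrfs member device is actually present:

    udevadm info --query=property --name=/dev/sdb | is_held_back \
        && echo "held back by the btrfs udev rule"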
In dracut[1] (on gentoo), the result is an eventual timeout waiting for
the rootfs to appear, and a drop to the initr* rescue shell prompt, where
an admin can manually mount using the degraded option and continue from
there.

I'd actually argue that's functioning as it should, since I see forced
manual intervention in order to mount degraded as a FEATURE, NOT A BUG.

But nevertheless, being able to effectively pass degraded either as
part of rootflags or in the fstab that dracut (and systemd in dracut)
use, such that a degraded mount could still be automated, could I suppose
be seen as a feature, to some.
To do that would require a script with a countdown and timeout. It would
first wait for the undegraded ready state (and thus mount). If the
timeout expired before all devices appeared, it would bypass the ready
test and plug the device anyway, letting mount try it. If the degraded
option was passed, the mount could then succeed; only if THAT failed
would it fall back to the emergency shell prompt.

Note that such a script wouldn't have to actually check for degraded in
the mount options. It would only need to fall back to plugging without
all devices once the complete-filesystem timeout triggered, since mount
would then take care of success/failure on its own based on whether the
degraded option was passed, just as it does if a mount is attempted on
an incomplete btrfs at other times.
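
Very roughly, and purely as an untested sketch of the idea rather than
anything dracut or systemd actually ships, the fallback logic might look
like this. The function names, the READY_CMD and MOUNT_CMD variables and
the one-second poll interval are all assumptions made for illustration;
the only real command relied on is btrfs device ready, which exits 0
once all devices of the filesystem are present.

```sh
#!/bin/sh
# Assumption: READY_CMD and MOUNT_CMD are overridable only so the logic
# can be exercised without real devices.
READY_CMD="${READY_CMD:-btrfs device ready}"

# wait_ready DEVICE TIMEOUT_SECONDS
# Polls once per second. Returns 0 as soon as the filesystem is complete,
# 1 if the timeout expires first.
wait_ready() {
    dev=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        if $READY_CMD "$dev" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# try_mount DEVICE MOUNTPOINT OPTIONS TIMEOUT_SECONDS
# Waits for a complete filesystem, then attempts the mount either way.
# It never inspects OPTIONS for "degraded": mount itself succeeds or
# fails depending on whether that option was passed.
try_mount() {
    if wait_ready "$1" "$4"; then
        echo "complete: mounting $1"
    else
        echo "timeout: attempting mount of $1 anyway"
    fi
    ${MOUNT_CMD:-mount} -t btrfs -o "$3" "$1" "$2"
}
```

A caller would then do something like
try_mount /dev/sda2 /sysroot "$rootflags" 30 || emergency_shell, where
the device, the mountpoint, the 30 seconds and emergency_shell are
placeholders for whatever the initr* in question really uses.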
---
[1] dracut: I use it here on gentoo as well, because my rootfs is a multi-
device btrfs and a kernel rootflags=device= line won't parse correctly,
apparently due to splitting at the wrong =, so I must use an initr*
despite my preference for a direct initr*-less boot, and I use dracut to
generate it.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman