From: Goffredo Baroncelli <kreijack@libero.it>
To: Chris Murphy <lists@colorremedies.com>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: problem with degraded boot and systemd
Date: Wed, 21 May 2014 00:00:24 +0200
Message-ID: <537BD078.7070504@libero.it>
In-Reply-To: <45D5C607-ED9D-49BB-BA60-CA2B0E94223D@colorremedies.com>
On 05/19/2014 02:54 AM, Chris Murphy wrote:
> Summary:
>
> It's insufficient to pass rootflags=degraded to get the system root
> to mount when a device is missing. It looks like when a device is
> missing, udev doesn't create the dev-disk-by-uuid linkage that then
> causes systemd to change the device state from dead to plugged. Only
> once plugged, will systemd attempt to mount the volume. This issue
> was brought up on systemd-devel under the subject "timed out waiting
> for device dev-disk-by\x2duuid" for those who want details.
>
[...]
>
> I think the key problem is either a limitation of udev, or a problem
> with the existing udev rule, that prevents the link creation for any
> remaining btrfs device. Or maybe it's intentional. But I'm not a udev
> expert. This is the current udev rule:
>
> # cat /usr/lib/udev/rules.d/64-btrfs.rules
> # do not edit this file, it will be overwritten on update
>
> SUBSYSTEM!="block", GOTO="btrfs_end"
> ACTION=="remove", GOTO="btrfs_end"
> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
>
> # let the kernel know about this btrfs filesystem, and check if it is complete
> IMPORT{builtin}="btrfs ready $devnode"
>
> # mark the device as not ready to be used by the system
> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
>
> LABEL="btrfs_end"
The key is the line

IMPORT{builtin}="btrfs ready $devnode"

This line sets ID_BTRFS_READY=0 if the filesystem is not ready (that is,
not complete); otherwise it sets ID_BTRFS_READY=1 [1].

The next line

ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

sets SYSTEMD_READY=0 if the filesystem is not ready, so the "plug" event
is not raised to systemd.

This is my understanding.
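
To make the chain of conditions concrete, here is a toy Python model of
the rule file quoted above. It is only a sketch: the function names
apply_btrfs_rule and is_plugged and the fs_complete flag are hypothetical
stand-ins, and real udev evaluates the rules itself. The assumption that a
device counts as plugged unless SYSTEMD_READY is "0" follows the
description given above.

```python
def apply_btrfs_rule(props, fs_complete):
    """Toy model of 64-btrfs.rules.

    props       -- dict of udev properties for one device event
    fs_complete -- hypothetical stand-in for the result of the
                   "btrfs ready $devnode" builtin
    Returns a copy of props with the keys the rule would set.
    """
    out = dict(props)
    # The three GOTO="btrfs_end" guards: skip anything that is not a
    # block device, is being removed, or does not carry a btrfs signature.
    if out.get("SUBSYSTEM") != "block":
        return out
    if out.get("ACTION") == "remove":
        return out
    if out.get("ID_FS_TYPE") != "btrfs":
        return out
    # IMPORT{builtin}="btrfs ready $devnode"
    out["ID_BTRFS_READY"] = "1" if fs_complete else "0"
    # ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
    if out["ID_BTRFS_READY"] == "0":
        out["SYSTEMD_READY"] = "0"
    return out


def is_plugged(props):
    """Assumption: the device is held back only when SYSTEMD_READY is "0"."""
    return props.get("SYSTEMD_READY") != "0"
```

With a device missing, fs_complete is False for every remaining member,
so none of them is ever reported as plugged and the mount is never
attempted, whatever rootflags says.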
> How this works with raid:
>
> RAID assembly is separate from filesystem mount. The volume UUID
> isn't available until the RAID is successfully assembled.
>
> On at least Fedora (dracut) systems with the system root on an md
> device, the initramfs contains 30-parse-md.sh which includes a loop
> to check for the volume UUID. If it's not found, the script sleeps
> for 0.5 seconds, and then looks for it again, up to 240 times. If
> it's still not found at attempt 240, then the script executes mdadm
> -R to forcibly run the array with fewer than all devices present
> (degraded assembly). Now the volume UUID exists, udevd creates the
> linkage, systemd picks this up and changes device state from dead to
> plugged, and then executes a normal mount command.
> The approximate Btrfs equivalent down the road would be a similar
> initrd script, or maybe a user space daemon, that causes btrfs device
> ready to confirm/deny all devices are present. And after x number of
> failures, then it would issue an equivalent to mdadm -R, which right now
> we don't seem to have.
I suggest implementing a mount.btrfs command that waits for all the
needed disks until a timeout expires. After this timeout it could try
a "degraded" mount until a second timeout expires. Only then would it
fail.

Each time a device appears, the system may start mount.btrfs. Each
invocation has to test whether another instance of mount.btrfs is
already handling the same filesystem; if so it exits, otherwise it
follows the above behavior.
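
The two-timeout behavior and the single-instance check could be sketched
roughly as follows in Python. Nothing here is an existing tool: the names
mount_with_fallback, acquire_instance_lock, all_present and try_mount, the
default timeouts, and the lock file location under /run are all
assumptions made for illustration. The probe and mount callables are
injected so the control flow can be read (and exercised) without touching
real devices.

```python
import fcntl
import os
import time


def acquire_instance_lock(fs_uuid, lock_dir="/run"):
    """Return an open file holding an exclusive lock for this filesystem,
    or None if another instance already holds it (hypothetical scheme)."""
    path = os.path.join(lock_dir, "mount.btrfs-%s.lock" % fs_uuid)
    fh = open(path, "w")
    try:
        # Non-blocking: fail immediately if someone else has the lock.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


def mount_with_fallback(all_present, try_mount, wait_s=30.0, degraded_s=30.0,
                        poll_s=0.5, clock=time.monotonic, sleep=time.sleep):
    """Return "normal", "degraded" or "failed".

    all_present()      -- hypothetical probe: are all member devices here?
    try_mount(degraded) -- hypothetical mount wrapper, returns True on success
    """
    # Phase 1: wait for every device, then mount normally.
    deadline = clock() + wait_s
    while clock() < deadline:
        if all_present() and try_mount(False):
            return "normal"
        sleep(poll_s)
    # Phase 2: keep trying a degraded mount until the second timeout.
    deadline = clock() + degraded_s
    while clock() < deadline:
        if try_mount(True):
            return "degraded"
        sleep(poll_s)
    return "failed"
```

An invocation triggered by a device event would first call
acquire_instance_lock() and exit quietly on None, so only the first
instance per filesystem runs the waiting loop.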
>
> That equivalent might be a decoupling of degraded as a mount option,
> such that the user space tool deals with degradedness. And the mount
>[...]
>
> Chris Murphy
G.Baroncelli
[1] http://lists.freedesktop.org/archives/systemd-commits/2012-September/002503.html
--
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5