From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from outrelay07.libero.it ([212.52.84.111]:57298 "EHLO
	outrelay07.libero.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750826AbaETV5V (ORCPT );
	Tue, 20 May 2014 17:57:21 -0400
Message-ID: <537BD078.7070504@libero.it>
Date: Wed, 21 May 2014 00:00:24 +0200
From: Goffredo Baroncelli
Reply-To: kreijack@inwind.it
MIME-Version: 1.0
To: Chris Murphy, Btrfs BTRFS
Subject: Re: problem with degraded boot and systemd
References: <45D5C607-ED9D-49BB-BA60-CA2B0E94223D@colorremedies.com>
In-Reply-To: <45D5C607-ED9D-49BB-BA60-CA2B0E94223D@colorremedies.com>
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 05/19/2014 02:54 AM, Chris Murphy wrote:
> Summary:
>
> It's insufficient to pass rootflags=degraded to get the system root
> to mount when a device is missing. It looks like when a device is
> missing, udev doesn't create the dev-disk-by-uuid linkage that then
> causes systemd to change the device state from dead to plugged. Only
> once plugged, will systemd attempt to mount the volume. This issue
> was brought up on systemd-devel under the subject "timed out waiting
> for device dev-disk-by\x2duuid" for those who want details.
>
[...]
>
> I think the key problem is either a limitation of udev, or a problem
> with the existing udev rule, that prevents the link creation for any
> remaining btrfs device. Or maybe it's intentional. But I'm not a udev
> expert.
> This is the current udev rule:
>
> # cat /usr/lib/udev/rules.d/64-btrfs.rules
> # do not edit this file, it will be overwritten on update
>
> SUBSYSTEM!="block", GOTO="btrfs_end"
> ACTION=="remove", GOTO="btrfs_end"
> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
>
> # let the kernel know about this btrfs filesystem, and check if it is complete
> IMPORT{builtin}="btrfs ready $devnode"
>
> # mark the device as not ready to be used by the system
> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
>
> LABEL="btrfs_end"

The key is the line

	IMPORT{builtin}="btrfs ready $devnode"

This line sets ID_BTRFS_READY=0 if the filesystem is not ready; otherwise
it sets ID_BTRFS_READY=1 [1]. The next line

	ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

sets SYSTEMD_READY=0 if the filesystem is not ready, so the "plug" event
is not raised to systemd. This is my understanding.

> How this works with raid:
>
> RAID assembly is separate from filesystem mount. The volume UUID
> isn't available until the RAID is successfully assembled.
>
> On at least Fedora (dracut) systems with the system root on an md
> device, the initramfs contains 30-parse-md.sh which includes a loop
> to check for the volume UUID. If it's not found, the script sleeps
> for 0.5 seconds, and then looks for it again, up to 240 times. If
> it's still not found at attempt 240, then the script executes mdadm
> -R to forcibly run the array with fewer than all devices present
> (degraded assembly). Now the volume UUID exists, udevd creates the
> linkage, systemd picks this up and changes device state from dead to
> plugged, and then executes a normal mount command.
> The approximate Btrfs equivalent down the road would be a similar
> initrd script, or maybe a user space daemon, that causes btrfs device
> ready to confirm/deny all devices are present. And after x number of
> failures, then it's issue an equivalent to mdadm -R which right now
> we don't seem to have.
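
To make the loop described above a bit more concrete, here is a rough
sketch of what a btrfs flavour of it might look like. It is only an
illustration, not an existing script: wait_for_devices is a name I made
up, the 240 x 0.5 s values are simply the ones from the md script
mentioned above, and I am assuming that "btrfs device ready" exits 0
only once all the member devices are known to the kernel. The fractional
sleep also assumes a sleep that accepts it (GNU coreutils and busybox do).

```shell
#!/bin/sh
# Sketch only: poll until all btrfs member devices are present, and
# report whether a normal or a degraded mount should be attempted.

# Hypothetical helper. Arguments: device, number of tries, delay between
# tries, and (optionally) the probe command to run against the device.
# Prints "normal" and returns 0 if the probe succeeds within the tries,
# otherwise prints "degraded" and returns 1.
wait_for_devices() {
    dev=$1
    tries=$2
    delay=$3
    probe=${4:-"btrfs device ready"}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Assumption: the probe exits 0 once every member device is known.
        if $probe "$dev" >/dev/null 2>&1; then
            echo normal
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo degraded
    return 1
}

# Possible use (device and mount point are placeholders):
#   if wait_for_devices /dev/sdX 240 0.5 >/dev/null; then
#       mount /dev/sdX /sysroot
#   else
#       mount -o degraded /dev/sdX /sysroot
#   fi
```

The probe is passed as an argument only so that the loop can be tried
without real devices; the decision to mount degraded stays with the
caller.
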

I suggest implementing a mount.btrfs command that waits for all the
needed disks until a timeout expires. After this timeout it could try a
"degraded" mount until a second timeout expires; only then does it fail.

Each time a device appears, the system may start mount.btrfs. Each
invocation has to check whether another instance of mount.btrfs is
already running for the same filesystem; if so it exits, otherwise it
follows the behavior described above.

>
> That equivalent might be a decoupling of degraded as a mount option,
> such that the user space tool deals with degradedness. And the mount
>[...]
>
> Chris Murphy

G.Baroncelli

[1] http://lists.freedesktop.org/archives/systemd-commits/2012-September/002503.html

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5