From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from plane.gmane.org ([80.91.229.3]:48001 "EHLO plane.gmane.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750737AbaEUAD0 (ORCPT ); Tue, 20 May 2014 20:03:26 -0400
Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1Wmu0A-0003JS-1M for linux-btrfs@vger.kernel.org; Wed, 21 May 2014 02:03:22 +0200
Received: from ip68-231-22-224.ph.ph.cox.net ([68.231.22.224]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 21 May 2014 02:03:22 +0200
Received: from 1i5t5.duncan by ip68-231-22-224.ph.ph.cox.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 21 May 2014 02:03:22 +0200
To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: problem with degraded boot and systemd
Date: Wed, 21 May 2014 00:03:08 +0000 (UTC)
Message-ID:
References: <45D5C607-ED9D-49BB-BA60-CA2B0E94223D@colorremedies.com> <537BD078.7070504@libero.it> <20140520222609.GD1756@carfax.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Hugo Mills posted on Tue, 20 May 2014 23:26:09 +0100 as excerpted:

> On Wed, May 21, 2014 at 12:00:24AM +0200, Goffredo Baroncelli wrote:
>> On 05/19/2014 02:54 AM, Chris Murphy wrote:
>>>
>>> It's insufficient to pass rootflags=degraded to get the system root
>>> to mount when a device is missing. It looks like when a device is
>>> missing, udev doesn't [...]
>>>
>>> This is the current udev rule:
>>>
>>> # cat /usr/lib/udev/rules.d/64-btrfs.rules
>>> # do not edit this file, it will be overwritten on update
>>>
>>> SUBSYSTEM!="block", GOTO="btrfs_end"
>>> ACTION=="remove", GOTO="btrfs_end"
>>> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
>>>
>>> # let the kernel know about this btrfs filesystem, and check if it is
>>> # complete
>>> IMPORT{builtin}="btrfs ready $devnode"
>>>
>>> # mark the device as not ready to be used by the system
>>> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
>>>
>>> LABEL="btrfs_end"
>>
>> The key is the line
>>
>> IMPORT{builtin}="btrfs ready $devnode"
>>
>> This line sets ID_BTRFS_READY=0 if a filesystem is not ready; otherwise
>> set ID_BTRFS_READY=1 [1].
>> The next line
>>
>> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
>>
>> sets SYSTEMD_READY=0 if the filesystem is not ready so the "plug" event
>> is not raised to systemd.
>>
>> This is my understanding.

Looks correct to me. =:^)

>>> How this works with raid:
>>>
>>> RAID assembly is separate from filesystem mount. The volume UUID
>>> isn't available until the RAID is successfully assembled.
>>>
>>> On at least Fedora (dracut) systems with the system root on an md
>>> device, the initramfs contains 30-parse-md.sh [with a sleep loop and
>>> a timeout]
>>
>>> The approximate Btrfs equivalent down the road would be a similar
>>> initrd script, or maybe a user space daemon, that causes btrfs device
>>> ready to confirm/deny all devices are present. And after x number of
>>> failures, then it's issue an equivalent to mdadm -R which right now
>>> we don't seem to have.
>>
>> I suggest to implement a mount.btrfs command, which waits all the
>> needed disks until a timeout expires. After this timeout it could try a
>> "degraded" mount until a second timeout. Only then it fails.
>>
>> Each time a device appear, the system may start mount.btrfs.
Each
>> invocation has to test if there is another instance of mount.btrfs
>> related to the same filesystem; if so it ends, otherwise it follows the
>> above behavior.
>
> Don't we already have something approaching this functionality with
> btrfs device ready? (i.e. this is exactly what it was designed for).

Well, sort of.  btrfs device ready is used directly in the udev rule
quoted above.  And in the non-degraded case it works as intended,
checking whether the filesystem is complete and only letting the udev
plug event complete when all devices are available.

But this thread is about a degraded state mount, with devices missing.
In that case, the missing devices never appear, so the plug event never
happens, so systemd will never mount the filesystem, despite the fact
that degraded was specifically passed as an option, indicating that the
admin wants the mount to happen anyway.

In dracut[1] (on gentoo), the result is an eventual timeout on rootfs
appearing and a kick to the initr* rescue shell prompt, where an admin
can manually mount using the degraded option and continue from there.

I'd actually argue that's functioning as it should, since I see forced
manual intervention in order to mount degraded as a FEATURE, NOT A BUG.
But nevertheless, being able to effectively pass degraded either as part
of rootflags or in the fstab that dracut (and systemd in dracut) use,
such that degraded-mount could still be automated, could I suppose be
seen as a feature, to some.

To do that would require a script with a countdown and timeout, first
for undegraded ready (and thus mount), then, if not all devices appear,
bypassing the ready test and plugging it anyway, to let mount try it if
the degraded option was passed, and only if THAT fails falling back to
the emergency shell prompt.
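
The countdown-then-fallback idea just described could be sketched
roughly as below.  This is only an illustration, assuming a POSIX shell
in the initr*; the function names, the READY_CMD/MOUNT_CMD variables and
the timeout values are made up for the example and are not part of
dracut or systemd.  btrfs device ready is the same check the udev rule
uses and, as I understand it, exits 0 once all devices are present.

```shell
#!/bin/sh
# Hypothetical sketch, not an existing dracut or systemd hook.
# All variable and function names here are illustrative assumptions.

READY_CMD="${READY_CMD:-btrfs device ready}"  # assumed: exits 0 when all devices are present
MOUNT_CMD="${MOUNT_CMD:-mount}"
READY_TIMEOUT="${READY_TIMEOUT:-30}"          # seconds to wait for a complete filesystem
POLL_INTERVAL="${POLL_INTERVAL:-1}"           # seconds between ready checks (must be > 0)

# wait_ready DEVICE
# Poll until the filesystem is complete or READY_TIMEOUT expires.
wait_ready() {
    waited=0
    while [ "$waited" -lt "$READY_TIMEOUT" ]; do
        if $READY_CMD "$1" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$POLL_INTERVAL"
        waited=$((waited + POLL_INTERVAL))
    done
    return 1
}

# mount_with_fallback DEVICE MOUNTPOINT [OPTIONS]
# Wait for all devices first; on timeout, attempt the mount anyway.
mount_with_fallback() {
    dev=$1
    mnt=$2
    opts=${3:-defaults}
    if ! wait_ready "$dev"; then
        echo "btrfs: devices still missing after ${READY_TIMEOUT}s, trying mount anyway" >&2
    fi
    # No check for "degraded" here: mount itself refuses an incomplete
    # filesystem unless that option was passed.
    $MOUNT_CMD -t btrfs -o "$opts" "$dev" "$mnt"
}
```

A nonzero return from mount_with_fallback would be the caller's cue to
drop to the emergency shell prompt, same as happens now.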
Note that such a script wouldn't have to actually check for degraded in
the mount options.  It would only need to fall back to plugging without
all devices if the wait-for-complete timeout triggered, since mount
would then take care of success/failure on its own based on whether the
degraded option was passed, just as it does if a mount is attempted on
an incomplete btrfs at other times.

---
[1] dracut: I use it here on gentoo as well, because my rootfs is a
multi-device btrfs and a kernel rootflags=device= line won't parse
correctly, apparently due to splitting at the wrong =, so I must use an
initr* despite my preference for a direct initr*-less boot, and I use
dracut to generate it.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman