* Extend BTRFS_IOC_DEVICES_READY for degraded RAID
@ 2015-01-05  9:46 Harald Hoyer
  2015-01-05 11:31 ` Lennart Poettering
  0 siblings, 1 reply; 6+ messages in thread

From: Harald Hoyer @ 2015-01-05 9:46 UTC (permalink / raw)
To: linux-btrfs, Kay Sievers, Lennart Poettering

We have BTRFS_IOC_DEVICES_READY to report whether all devices are present,
so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.

I think we need a third state here for a degraded RAID, which can be
mounted, but should be only after a certain timeout or kernel command line
parameter.

We also have to rethink how to handle the udev DB update for the change of
state: incomplete -> degraded -> complete
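[The udev rule in question is, assuming it matches the shape of systemd's
64-btrfs.rules at the time, roughly the following; treat the exact rule text
as an approximation rather than a verbatim copy:]

```
SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# The "btrfs ready" builtin issues BTRFS_IOC_DEVICES_READY for this device
IMPORT{builtin}="btrfs ready $devnode"

# Keep the device invisible to systemd until all member devices are present
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

LABEL="btrfs_end"
```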
* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05  9:46 Extend BTRFS_IOC_DEVICES_READY for degraded RAID Harald Hoyer
@ 2015-01-05 11:31 ` Lennart Poettering
  2015-01-05 12:08   ` Austin S Hemmelgarn
  2015-01-05 16:36   ` Goffredo Baroncelli
  0 siblings, 2 replies; 6+ messages in thread

From: Lennart Poettering @ 2015-01-05 11:31 UTC (permalink / raw)
To: Harald Hoyer; +Cc: linux-btrfs, Kay Sievers

On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:

> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
> present, so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.
>
> I think we need a third state here for a degraded RAID, which can be
> mounted, but should be only after a certain timeout or kernel command
> line parameter.
>
> We also have to rethink how to handle the udev DB update for the change
> of state: incomplete -> degraded -> complete

I am not convinced that automatically booting degraded arrays would be a
good idea. Instead, requiring one manual step before booting a degraded
array sounds OK to me.

Lennart

--
Lennart Poettering, Red Hat
* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05 11:31 ` Lennart Poettering
@ 2015-01-05 12:08   ` Austin S Hemmelgarn
  2015-01-05 16:36   ` Goffredo Baroncelli
  1 sibling, 0 replies; 6+ messages in thread

From: Austin S Hemmelgarn @ 2015-01-05 12:08 UTC (permalink / raw)
To: Lennart Poettering, Harald Hoyer; +Cc: linux-btrfs, Kay Sievers

On 2015-01-05 06:31, Lennart Poettering wrote:
> On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:
>
>> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
>> present, so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.
>>
>> I think we need a third state here for a degraded RAID, which can be
>> mounted, but should be only after a certain timeout or kernel command
>> line parameter.
>>
>> We also have to rethink how to handle the udev DB update for the change
>> of state: incomplete -> degraded -> complete
>
> I am not convinced that automatically booting degraded arrays would be a
> good idea. Instead, requiring one manual step before booting a degraded
> array sounds OK to me.
>
> Lennart

I can think of half a dozen use cases where it is better to automatically
mount degraded and send a notification that this happened than to refuse
to mount without manual intervention.
* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05 11:31 ` Lennart Poettering
@ 2015-01-05 16:36   ` Goffredo Baroncelli
  2015-01-05 17:02     ` Austin S Hemmelgarn
  1 sibling, 1 reply; 6+ messages in thread

From: Goffredo Baroncelli @ 2015-01-05 16:36 UTC (permalink / raw)
To: Lennart Poettering, Harald Hoyer
Cc: linux-btrfs, Kay Sievers, Chris Mason, David Sterba

On 2015-01-05 12:31, Lennart Poettering wrote:
> On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:
>
>> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
>> present, so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.
>>
>> I think we need a third state here for a degraded RAID, which can be
>> mounted, but should be only after a certain timeout or kernel command
>> line parameter.
>>
>> We also have to rethink how to handle the udev DB update for the change
>> of state: incomplete -> degraded -> complete
>
> I am not convinced that automatically booting degraded arrays would be a
> good idea. Instead, requiring one manual step before booting a degraded
> array sounds OK to me.

I think that a good use case is when the root filesystem is a RAID one.

However, I don't think that the current architecture is flexible enough to
perform this job:

- Mounting a RAID filesystem in degraded mode is good for some setups, but
  it is not the right solution for all: a configuration parameter to allow
  one behavior or the other is needed.
- Degraded mode should be allowed only if not all the devices have been
  discovered AND a timeout has expired. This timeout is another variable
  which (IMHO) should be configurable.
- There are different degrees of degraded mode: if the array is a RAID6,
  losing one device would be acceptable; losing two devices may be
  unacceptable. Again, there is no simple answer; a configurable policy is
  needed.
- Pay attention that the current architecture has some flaws: if a device
  disappears during device discovery, ID_BTRFS_READY returns OK even if a
  device is missing.

I proposed a mount.btrfs helper [1], which (IMHO) is a good place for this
kind of job. However, I have to point out that both Chris and David aren't
fully convinced of this solution. I hope that they can change their
opinion.

In conclusion, I see some use cases for allowing a degraded btrfs
filesystem to be mounted; however, I don't see enhancing the
BTRFS_IOC_DEVICES_READY ioctl alone as viable. More logic is required.

BR
G.Baroncelli

[1] http://www.spinics.net/lists/linux-btrfs/msg39706.html

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
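[The first two points — a configurable allow-degraded switch plus a
discovery timeout — could be sketched in userspace roughly as follows. This
is a hypothetical illustration, not systemd's or btrfs-progs' actual logic;
the `probe` callable is assumed to wrap whatever reports the device-ready
state:]

```python
import time

# Three states mirroring the proposed incomplete -> degraded -> complete
# progression (names invented for this sketch).
INCOMPLETE, DEGRADED, COMPLETE = range(3)

def decide_mount(probe, timeout, allow_degraded,
                 now=time.monotonic, sleep=time.sleep):
    """Poll probe() until the array is COMPLETE or the timeout expires.

    Returns the mount command to run, or None to give up.  now/sleep are
    injectable so the policy can be tested without waiting.
    """
    deadline = now() + timeout
    while True:
        state = probe()
        if state == COMPLETE:
            return "mount"
        if now() >= deadline:
            # Not all devices showed up in time; mount degraded only if
            # the administrator explicitly allowed it.
            if state == DEGRADED and allow_degraded:
                return "mount -o degraded"
            return None
        sleep(0.1)
```

Because the timeout and the allow-degraded flag are plain parameters, both
can come from a kernel command line option or a configuration file, which
is exactly the kind of per-setup flexibility argued for above.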
* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05 16:36 ` Goffredo Baroncelli
@ 2015-01-05 17:02   ` Austin S Hemmelgarn
  2015-01-05 17:57     ` Goffredo Baroncelli
  0 siblings, 1 reply; 6+ messages in thread

From: Austin S Hemmelgarn @ 2015-01-05 17:02 UTC (permalink / raw)
To: kreijack, Lennart Poettering, Harald Hoyer
Cc: linux-btrfs, Kay Sievers, Chris Mason, David Sterba

On 2015-01-05 11:36, Goffredo Baroncelli wrote:
> On 2015-01-05 12:31, Lennart Poettering wrote:
>> On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:
>>
>>> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
>>> present, so that a udev rule can set ID_BTRFS_READY and SYSTEMD_READY.
>>>
>>> I think we need a third state here for a degraded RAID, which can be
>>> mounted, but should be only after a certain timeout or kernel command
>>> line parameter.
>>>
>>> We also have to rethink how to handle the udev DB update for the
>>> change of state: incomplete -> degraded -> complete
>>
>> I am not convinced that automatically booting degraded arrays would be
>> a good idea. Instead, requiring one manual step before booting a
>> degraded array sounds OK to me.
>
> I think that a good use case is when the root filesystem is a RAID one.
>
> However, I don't think that the current architecture is flexible enough
> to perform this job:
> - Mounting a RAID filesystem in degraded mode is good for some setups,
>   but it is not the right solution for all: a configuration parameter
>   to allow one behavior or the other is needed.
> - Degraded mode should be allowed only if not all the devices have been
>   discovered AND a timeout has expired. This timeout is another
>   variable which (IMHO) should be configurable.

These first two points can be easily handled with some simple logic in
userspace without needing a mount helper.

> - There are different degrees of degraded mode: if the array is a
>   RAID6, losing one device would be acceptable; losing two devices may
>   be unacceptable. Again, there is no simple answer; a configurable
>   policy is needed.

This can be solved by providing two new return values for the
BTRFS_IOC_DEVICES_READY ioctl (instead of just one): one for arrays that
are in such a state that losing another disk will almost certainly cause
data loss (i.e., a RAID6 with two missing devices, or a btrfs
raid1/raid10 with one missing device), and one for arrays that
(theoretically) won't lose any data if one more device drops out (i.e., a
RAID6, or something with higher parity, with one missing disk), and then
providing a module parameter to allow forcing the kernel to report one or
the other.

> - Pay attention that the current architecture has some flaws: if a
>   device disappears during device discovery, ID_BTRFS_READY returns OK
>   even if a device is missing.

Point 4 would require some kind of continuous scanning/notification (and
therefore add more bulk, the lack of which is in my opinion one of the
biggest advantages of BTRFS over ZFS), and even then there will always be
the possibility that a device drops out between calling the ioctl and
mounting the filesystem.
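[The proposed distinction between "degraded but safe" and "degraded and at
risk" amounts to a small classification function. A minimal sketch, with
invented state names and deliberately simplified per-profile tolerance
numbers — real tolerance depends on the actual chunk layout:]

```python
# Maximum number of missing devices each profile can tolerate while still
# being mountable (simplified assumption for illustration).
TOLERATED = {"raid1": 1, "raid10": 1, "raid5": 1, "raid6": 2}

READY = "ready"                        # all devices present
DEGRADED_SAFE = "degraded-safe"        # can survive one more device loss
DEGRADED_AT_RISK = "degraded-at-risk"  # one more loss means data loss
NOT_READY = "not-ready"                # too many devices missing to mount

def classify(profile, missing):
    """Map (profile, number of missing devices) to one of the four states."""
    tolerated = TOLERATED.get(profile, 0)
    if missing == 0:
        return READY
    if missing > tolerated:
        return NOT_READY
    if missing == tolerated:
        return DEGRADED_AT_RISK
    return DEGRADED_SAFE
```

For example, a RAID6 with one missing device classifies as degraded-safe,
while the same array with two missing devices is degraded-at-risk — the two
extra return values Austin describes.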
* Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
  2015-01-05 17:02 ` Austin S Hemmelgarn
@ 2015-01-05 17:57   ` Goffredo Baroncelli
  0 siblings, 0 replies; 6+ messages in thread

From: Goffredo Baroncelli @ 2015-01-05 17:57 UTC (permalink / raw)
To: Austin S Hemmelgarn, Lennart Poettering, Harald Hoyer
Cc: linux-btrfs, Kay Sievers, Chris Mason, David Sterba

On 2015-01-05 18:02, Austin S Hemmelgarn wrote:
> On 2015-01-05 11:36, Goffredo Baroncelli wrote:
>> On 2015-01-05 12:31, Lennart Poettering wrote:
>>> On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:
>>>
>>>> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
>>>> present, so that a udev rule can set ID_BTRFS_READY and
>>>> SYSTEMD_READY.
>>>>
>>>> I think we need a third state here for a degraded RAID, which can
>>>> be mounted, but should be only after a certain timeout or kernel
>>>> command line parameter.
>>>>
>>>> We also have to rethink how to handle the udev DB update for the
>>>> change of state: incomplete -> degraded -> complete
>>>
>>> I am not convinced that automatically booting degraded arrays would
>>> be a good idea. Instead, requiring one manual step before booting a
>>> degraded array sounds OK to me.
>>
>> I think that a good use case is when the root filesystem is a RAID
>> one.
>>
>> However, I don't think that the current architecture is flexible
>> enough to perform this job:
>> - Mounting a RAID filesystem in degraded mode is good for some
>>   setups, but it is not the right solution for all: a configuration
>>   parameter to allow one behavior or the other is needed.
>> - Degraded mode should be allowed only if not all the devices have
>>   been discovered AND a timeout has expired. This timeout is another
>>   variable which (IMHO) should be configurable.
>
> These first two points can be easily handled with some simple logic in
> userspace without needing a mount helper.

If you implement it in a mount.btrfs helper, you have this logic
available for all cases, not only for mounting the root filesystem.

>> - There are different degrees of degraded mode: if the array is a
>>   RAID6, losing one device would be acceptable; losing two devices
>>   may be unacceptable. Again, there is no simple answer; a
>>   configurable policy is needed.
>
> This can be solved by providing two new return values for the
> BTRFS_IOC_DEVICES_READY ioctl (instead of just one): one for arrays
> that are in such a state that losing another disk will almost
> certainly cause data loss (i.e., a RAID6 with two missing devices, or
> a btrfs raid1/raid10 with one missing device), and one for arrays that
> (theoretically) won't lose any data if one more device drops out
> (i.e., a RAID6, or something with higher parity, with one missing
> disk)

This is a detail; the point is that this policy needs to be implemented
somewhere. I am suggesting not to spread this logic across too many
subsystems (kernel, systemd, udev, scripts...).

BTRFS couples a filesystem with a device manager. This exposes a lot of
new problems and options. I am suggesting creating a "tool" to manage
all these new problems/options. This tool is (of course) btrfs specific,
and I am convinced that a good place to start is a mount.btrfs helper.

> , and then providing a module parameter to allow forcing the kernel to
> report one or the other.

This policy should differ by mount point: if the machine is a remote
one, I can allow the root filesystem to be mounted even in degraded mode
to start some "recovery"; but a more conservative policy may be applied
to the other filesystems. This is one of the reasons to keep the policy
out of the kernel.

>> - Pay attention that the current architecture has some flaws: if a
>>   device disappears during device discovery, ID_BTRFS_READY returns
>>   OK even if a device is missing.
>
> Point 4 would require some kind of continuous scanning/notification
> (and therefore add more bulk, the lack of which is in my opinion one
> of the biggest advantages of BTRFS over ZFS), and even then there will
> always be the possibility that a device drops out between calling the
> ioctl and mounting the filesystem.

If you shorten the window, it becomes less likely to happen.

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
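[The per-mount-point policy Goffredo argues for could be expressed as a
small table that a mount helper consults before deciding whether to allow a
degraded mount. A minimal sketch — the file format, field names, and the
idea of a standalone policy file are all invented for illustration:]

```python
def parse_mount_policy(text):
    """Parse a tiny per-mount-point policy table, one line per mount point:

        <mountpoint> <allow_degraded: yes|no> <timeout_seconds>

    Returns {mountpoint: (allow_degraded, timeout_seconds)}.
    """
    policy = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        mountpoint, allow, timeout = line.split()
        policy[mountpoint] = (allow == "yes", float(timeout))
    return policy
```

With such a table, a remote machine could allow a degraded root mount for
recovery while keeping every other filesystem conservative, e.g.:

```python
policy = parse_mount_policy("""
# remote machine: allow a degraded root for recovery
/      yes 30
/home  no  0
""")
```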
Thread overview: 6+ messages
2015-01-05  9:46 Extend BTRFS_IOC_DEVICES_READY for degraded RAID Harald Hoyer
2015-01-05 11:31 ` Lennart Poettering
2015-01-05 12:08   ` Austin S Hemmelgarn
2015-01-05 16:36   ` Goffredo Baroncelli
2015-01-05 17:02     ` Austin S Hemmelgarn
2015-01-05 17:57       ` Goffredo Baroncelli