* 64-btrfs.rules and degraded boot
@ 2016-07-05 18:53 Chris Murphy
  2016-07-05 19:27 ` Kai Krakow
  2016-07-07 16:37 ` Goffredo Baroncelli
  0 siblings, 2 replies; 33+ messages in thread
From: Chris Murphy @ 2016-07-05 18:53 UTC (permalink / raw)
  To: Btrfs BTRFS

For some reason I thought it was possible to do degraded Btrfs boots by
removing root=UUID= in favor of a remaining good block device, e.g.
root=/dev/vda2, and then adding degraded to rootflags. But this doesn't
work on either CentOS 7.2 or Fedora Rawhide. What happens is systemd
waits for vda2 (or by UUID) indefinitely; it doesn't even try to mount
the volume.

I think it's due to the udev rule that's basically saying the device
isn't ready because not all of its devices are there.

[root@f24m ~]# cat /usr/lib/udev/rules.d/64-btrfs.rules
# do not edit this file, it will be overwritten on update

SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# let the kernel know about this btrfs filesystem, and check if it is complete
IMPORT{builtin}="btrfs ready $devnode"

# mark the device as not ready to be used by the system
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

LABEL="btrfs_end"

I am kinda confused about this "btrfs ready $devnode" portion. Isn't it
"btrfs device ready $devnode" if this is based on user space tools?

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 33+ messages in thread
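[Editor's note: despite the CLI-like syntax, "btrfs ready" in the rule names a udev *builtin*, not the btrfs-progs command; the builtin performs the same readiness check that `btrfs device ready` exposes in user space. A hedged sketch of what the rule's outcome amounts to — `check_ready` and the device path are illustrative, not part of any shipped tool:]

```shell
# Sketch only: approximates what 64-btrfs.rules decides, expressed via the
# user-space CLI. "btrfs device ready" exits 0 when the kernel has seen
# every member device of the filesystem, non-zero otherwise.
check_ready() {
  if btrfs device ready "$1"; then
    echo "ID_BTRFS_READY=1"   # complete: systemd considers the device ready
  else
    echo "ID_BTRFS_READY=0"   # incomplete: the rule sets SYSTEMD_READY=0
  fi
}
# e.g. check_ready /dev/vda2
```

So with one device of a multi-device filesystem missing, the rule pins SYSTEMD_READY=0 and systemd keeps waiting, which matches the behavior Chris describes.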
* Re: 64-btrfs.rules and degraded boot
  2016-07-05 18:53 64-btrfs.rules and degraded boot Chris Murphy
@ 2016-07-05 19:27 ` Kai Krakow
  2016-07-05 19:30   ` Chris Murphy
  2016-07-07 16:37 ` Goffredo Baroncelli
  1 sibling, 1 reply; 33+ messages in thread
From: Kai Krakow @ 2016-07-05 19:27 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 5 Jul 2016 12:53:02 -0600, Chris Murphy <lists@colorremedies.com> wrote:

> For some reason I thought it was possible to do degraded Btrfs boots
> by removing root=UUID= in favor of a remaining good block device, e.g.
> root=/dev/vda2, and then adding degraded to rootflags. But this
> doesn't work either on CentOS 7.2 or Fedora Rawhide. What happens is
> systemd waits for vda2 (or by UUID) indefinitely, it doesn't even try
> to mount the volume.
>
> I think it's due to the udev rule that's basically saying the device
> isn't ready because not all of its devices are there.
>
> [root@f24m ~]# cat /usr/lib/udev/rules.d/64-btrfs.rules
> # do not edit this file, it will be overwritten on update
>
> SUBSYSTEM!="block", GOTO="btrfs_end"
> ACTION=="remove", GOTO="btrfs_end"
> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
>
> # let the kernel know about this btrfs filesystem, and check if it is
> complete IMPORT{builtin}="btrfs ready $devnode"

This doesn't come from the user-space tools but from the udev builtins, I think:

# udevadm test-builtin btrfs

> # mark the device as not ready to be used by the system
> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
>
> LABEL="btrfs_end"
>
> I am kinda confused about this "btrfs ready $devnode" portion. Isn't
> it "btrfs device ready $devnode" if this is based on user space tools?

-- 
Regards,
Kai

Replies to list-only preferred.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-05 19:27 ` Kai Krakow @ 2016-07-05 19:30 ` Chris Murphy 2016-07-05 20:10 ` Chris Murphy 0 siblings, 1 reply; 33+ messages in thread From: Chris Murphy @ 2016-07-05 19:30 UTC (permalink / raw) To: Kai Krakow; +Cc: Btrfs BTRFS On Tue, Jul 5, 2016 at 1:27 PM, Kai Krakow <hurikhan77@gmail.com> wrote: > Am Tue, 5 Jul 2016 12:53:02 -0600 > schrieb Chris Murphy <lists@colorremedies.com>: > >> For some reason I thought it was possible to do degraded Btrfs boots >> by removing root=UUID= in favor of a remaining good block device, e.g. >> root=/dev/vda2, and then adding degraded to rootflags. But this >> doesn't work either on CentOS 7.2 or Fedora Rawhide. What happens is >> systemd waits for vda2 (or by UUID) indefinitely, it doesn't even try >> to mount the volume. >> >> I think it's due to the udev rule that's basically saying the device >> isn't ready because not all of its devices are there. >> >> [root@f24m ~]# cat /usr/lib/udev/rules.d/64-btrfs.rules >> # do not edit this file, it will be overwritten on update >> >> SUBSYSTEM!="block", GOTO="btrfs_end" >> ACTION=="remove", GOTO="btrfs_end" >> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end" >> >> # let the kernel know about this btrfs filesystem, and check if it is >> complete IMPORT{builtin}="btrfs ready $devnode" > > This doesn't come from the user-space tools but from the udev builtins, > I think: > > # udevadm test-builtin btrfs [root@f24m ~]# udevadm test-builtin btrfs calling: test-builtin syspath missing # dnf provides /usr/lib/udev/rules.d/64-btrfs.rules Last metadata expiration check: 1:17:58 ago on Tue Jul 5 12:11:07 2016. systemd-udev-229-8.fc24.x86_64 : Rule-based device node and kernel event manager Repo : @System -- Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
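[Editor's note: the "syspath missing" error above is simply udevadm complaining that `test-builtin` requires a device path argument in addition to the builtin name. A hedged sketch of the intended invocation, wrapped in a tiny guard — `run_builtin` and the vda2 path are illustrative, not an existing tool:]

```shell
# Sketch: udevadm test-builtin needs both a builtin name and a device
# syspath; guard against the bare invocation that fails above.
run_builtin() {
  if [ -z "$1" ]; then
    echo "syspath missing" >&2
    return 1
  fi
  udevadm test-builtin btrfs "$1"
}
# e.g. run_builtin /sys/class/block/vda2
```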
* Re: 64-btrfs.rules and degraded boot 2016-07-05 19:30 ` Chris Murphy @ 2016-07-05 20:10 ` Chris Murphy 2016-07-06 9:51 ` Andrei Borzenkov 0 siblings, 1 reply; 33+ messages in thread From: Chris Murphy @ 2016-07-05 20:10 UTC (permalink / raw) To: Chris Murphy; +Cc: Kai Krakow, Btrfs BTRFS I started a systemd-devel@ thread since that's where most udev stuff gets talked about. https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html -- Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot
  2016-07-05 20:10 ` Chris Murphy
@ 2016-07-06 9:51 ` Andrei Borzenkov
  2016-07-06 11:45   ` Austin S. Hemmelgarn
  2016-07-06 17:19   ` Chris Murphy
  0 siblings, 2 replies; 33+ messages in thread
From: Andrei Borzenkov @ 2016-07-06 9:51 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Kai Krakow, Btrfs BTRFS

On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com> wrote:
> I started a systemd-devel@ thread since that's where most udev stuff
> gets talked about.
>
> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>

Before discussing how to implement it in systemd, we need to decide what to implement. I.e.:

1) do you always want to mount the filesystem in degraded mode if not enough devices are present, or only if an explicit hint is given?

2) do you want degraded handling for the root filesystem only, or for other filesystems as well? Note that there could be more early-boot filesystems that absolutely need the same treatment (enter a separate /usr), and there are also normal filesystems that may need to be mounted even degraded.

3) can we query btrfs whether it is mountable in degraded mode? According to the documentation, "btrfs device ready" (which the udev builtin follows) checks "if it has ALL of it’s devices in cache for mounting". This is required for proper systemd ordering of services.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-06 9:51 ` Andrei Borzenkov @ 2016-07-06 11:45 ` Austin S. Hemmelgarn 2016-07-06 11:55 ` Andrei Borzenkov 2016-07-06 17:19 ` Chris Murphy 1 sibling, 1 reply; 33+ messages in thread From: Austin S. Hemmelgarn @ 2016-07-06 11:45 UTC (permalink / raw) To: Andrei Borzenkov, Chris Murphy; +Cc: Kai Krakow, Btrfs BTRFS On 2016-07-06 05:51, Andrei Borzenkov wrote: > On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com> wrote: >> I started a systemd-devel@ thread since that's where most udev stuff >> gets talked about. >> >> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html >> > > Before discussing how to implement it in systemd, we need to decide > what to implement. I.e. > > 1) do you always want to mount filesystem in degraded mode if not > enough devices are present or only if explicit hint is given? > 2) do you want to restrict degrade handling to root only or to other > filesystems as well? Note that there could be more early boot > filesystems that absolutely need same treatment (enters separate > /usr), and there are also normal filesystems that may need be mounted > even degraded. > 3) can we query btrfs whether it is mountable in degraded mode? > according to documentation, "btrfs device ready" (which udev builtin > follows) checks "if it has ALL of it’s devices in cache for mounting". > This is required for proper systemd ordering of services. To be entirely honest, if it were me, I'd want systemd to fsck off. If the kernel mount(2) call succeeds, then the filesystem was ready enough to mount, and if it doesn't, then it wasn't, end of story. The whole concept of trying to track in userspace something the kernel itself tracks and knows a whole lot more about is absolutely stupid. 
It makes some sense when dealing with LVM or MD, because that is potentially a security issue (someone could inject a bogus device node that you then mount instead of your desired target), but it makes no sense here, because there's no way to prevent the equivalent from happening in BTRFS. As far as the udev rules, I'm pretty certain that _we_ ship those with btrfs-progs, I have no idea why they're packaged with udev in CentOS (oh wait, I bet they package every single possible udev rule in that package just in case, don't they?). ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot
  2016-07-06 11:45 ` Austin S. Hemmelgarn
@ 2016-07-06 11:55 ` Andrei Borzenkov
  2016-07-06 12:14   ` Austin S. Hemmelgarn
  2016-07-06 12:49   ` Tomasz Torcz
  0 siblings, 2 replies; 33+ messages in thread
From: Andrei Borzenkov @ 2016-07-06 11:55 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> On 2016-07-06 05:51, Andrei Borzenkov wrote:
>>
>> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com>
>> wrote:
>>>
>>> I started a systemd-devel@ thread since that's where most udev stuff
>>> gets talked about.
>>>
>>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>>
>>
>> Before discussing how to implement it in systemd, we need to decide
>> what to implement. I.e.
>>
>> 1) do you always want to mount filesystem in degraded mode if not
>> enough devices are present or only if explicit hint is given?
>> 2) do you want to restrict degrade handling to root only or to other
>> filesystems as well? Note that there could be more early boot
>> filesystems that absolutely need same treatment (enters separate
>> /usr), and there are also normal filesystems that may need be mounted
>> even degraded.
>> 3) can we query btrfs whether it is mountable in degraded mode?
>> according to documentation, "btrfs device ready" (which udev builtin
>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>> This is required for proper systemd ordering of services.
>
> To be entirely honest, if it were me, I'd want systemd to fsck off. If the
> kernel mount(2) call succeeds, then the filesystem was ready enough to
> mount, and if it doesn't, then it wasn't, end of story.

How should user space know when to try mount? What is user space
supposed to do during boot if mount fails? Do you suggest

while true; do
mount /dev/foo && exit 0
done

as part of the startup sequence?
And note that nowhere is systemd involved so far.

> The whole concept
> of trying to track in userspace something the kernel itself tracks and knows
> a whole lot more about is absolutely stupid.

It need not be user space. If the kernel notifies user space when the filesystem is mountable, the problem is solved. It could be a udev event, netlink, whatever. Until the kernel does that, user space needs to either poll or somehow track it based on available events.

> It makes some sense when
> dealing with LVM or MD, because that is potentially a security issue
> (someone could inject a bogus device node that you then mount instead of
> your desired target),

I do not understand that at all. MD and LVM have exactly the same problem - they need to know when they can assemble the MD array/VG. I don't see what this has to do with security, sorry.

> but it makes no sense here, because there's no way to
> prevent the equivalent from happening in BTRFS.
>
> As far as the udev rules, I'm pretty certain that _we_ ship those with
> btrfs-progs,

No, you do not. You ship a rule to rename devices to be more "user-friendly". But the rule in question has always been part of udev.

> I have no idea why they're packaged with udev in CentOS (oh
> wait, I bet they package every single possible udev rule in that package
> just in case, don't they?).

^ permalink raw reply	[flat|nested] 33+ messages in thread
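[Editor's note: the polling that Andrei says user space is forced into can be made concrete. A hedged sketch of the kind of retry loop an initramfs might run, with a timeout and a degraded fallback — `wait_and_mount` and its arguments are illustrative, not an existing tool:]

```shell
# Sketch: retry the mount until it succeeds or a timeout expires, then fall
# back to a degraded mount as a last resort. Polling like this is exactly
# the workaround discussed above for the lack of a kernel notification.
wait_and_mount() {
  dev="$1"; mnt="$2"; timeout="${3:-30}"; elapsed=0
  while ! mount "$dev" "$mnt" 2>/dev/null; do
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      # not all member devices showed up in time; try degraded, then give up
      mount -o degraded "$dev" "$mnt"
      return $?
    fi
    sleep 1
  done
}
# e.g. wait_and_mount /dev/vda2 /sysroot 30
```

The loop only knows the answer by calling mount(2); it cannot distinguish "devices still appearing" from "devices permanently missing" except by letting the timeout expire, which is the core of the design problem discussed in this thread.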
* Re: 64-btrfs.rules and degraded boot 2016-07-06 11:55 ` Andrei Borzenkov @ 2016-07-06 12:14 ` Austin S. Hemmelgarn 2016-07-06 12:39 ` Andrei Borzenkov 2016-07-06 12:49 ` Tomasz Torcz 1 sibling, 1 reply; 33+ messages in thread From: Austin S. Hemmelgarn @ 2016-07-06 12:14 UTC (permalink / raw) To: Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS On 2016-07-06 07:55, Andrei Borzenkov wrote: > On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn > <ahferroin7@gmail.com> wrote: >> On 2016-07-06 05:51, Andrei Borzenkov wrote: >>> >>> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com> >>> wrote: >>>> >>>> I started a systemd-devel@ thread since that's where most udev stuff >>>> gets talked about. >>>> >>>> >>>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html >>>> >>> >>> Before discussing how to implement it in systemd, we need to decide >>> what to implement. I.e. >>> >>> 1) do you always want to mount filesystem in degraded mode if not >>> enough devices are present or only if explicit hint is given? >>> 2) do you want to restrict degrade handling to root only or to other >>> filesystems as well? Note that there could be more early boot >>> filesystems that absolutely need same treatment (enters separate >>> /usr), and there are also normal filesystems that may need be mounted >>> even degraded. >>> 3) can we query btrfs whether it is mountable in degraded mode? >>> according to documentation, "btrfs device ready" (which udev builtin >>> follows) checks "if it has ALL of it’s devices in cache for mounting". >>> This is required for proper systemd ordering of services. >> >> >> To be entirely honest, if it were me, I'd want systemd to fsck off. If the >> kernel mount(2) call succeeds, then the filesystem was ready enough to >> mount, and if it doesn't, then it wasn't, end of story. > > How should user space know when to try mount? What user space is > supposed to do during boot if mount fails? 
> Do you suggest
>
> while true; do
> mount /dev/foo && exit 0
> done
>
> as part of startup sequence? And note that nowhere is systemd involved so far.

Nowhere there, except if you have a filesystem in fstab (or a mount unit, which I hate for other reasons that I will not go into right now), and you mount it and systemd thinks the device isn't ready, it unmounts it _immediately_. In the case of boot, it's because of systemd thinking the device isn't ready that you can't mount degraded with a missing device. In the case of the root filesystem at least, the initramfs is expected to handle this, and most of them do poll in some way, or have other methods of determining this. I occasionally have issues with it with dracut without systemd, but that's due to a separate bug there involving the device mapper.

>
>> The whole concept
>> of trying to track in userspace something the kernel itself tracks and knows
>> a whole lot more about is absolutely stupid.
>
> It need not be user space. If kernel notifies user space when
> filesystem is mountable, problem solved. It could be udev event,
> netlink, whatever. Until kernel does it, user space need to either
> poll or somehow track it based on available events.

This I agree could be done better, but it absolutely should not be in userspace; the notification needs to come from the kernel, but that leads to the problem of knowing whether or not the FS can mount degraded, or only ro, or any number of other situations.

>
>> It makes some sense when
>> dealing with LVM or MD, because that is potentially a security issue
>> (someone could inject a bogus device node that you then mount instead of
>> your desired target),
>
> I do not understand it at all.
If you don't track whether or not the device is assembled, then someone could create an arbitrary device node with the same name and then get you to mount that, possibly causing all kinds of issues depending on any number of other factors. > >> but it makes no sense here, because there's no way to >> prevent the equivalent from happening in BTRFS. >> >> As far as the udev rules, I'm pretty certain that _we_ ship those with >> btrfs-progs, > > No, you do not. You ship rule to rename devices to be more > "user-friendly". But the rule in question has always been part of > udev. Ah, you're right, I was mistaken about this. > >> I have no idea why they're packaged with udev in CentOS (oh >> wait, I bet they package every single possible udev rule in that package >> just in case, don't they?). ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot
  2016-07-06 12:14 ` Austin S. Hemmelgarn
@ 2016-07-06 12:39 ` Andrei Borzenkov
  2016-07-06 12:48   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 33+ messages in thread
From: Andrei Borzenkov @ 2016-07-06 12:39 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

Sent from my iPhone

> On 6 July 2016, at 15:14, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
>
>> On 2016-07-06 07:55, Andrei Borzenkov wrote:
>> On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn
>> <ahferroin7@gmail.com> wrote:
>>> On 2016-07-06 05:51, Andrei Borzenkov wrote:
>>>>
>>>> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com>
>>>> wrote:
>>>>>
>>>>> I started a systemd-devel@ thread since that's where most udev stuff
>>>>> gets talked about.
>>>>>
>>>>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>>>
>>>> Before discussing how to implement it in systemd, we need to decide
>>>> what to implement. I.e.
>>>>
>>>> 1) do you always want to mount filesystem in degraded mode if not
>>>> enough devices are present or only if explicit hint is given?
>>>> 2) do you want to restrict degrade handling to root only or to other
>>>> filesystems as well? Note that there could be more early boot
>>>> filesystems that absolutely need same treatment (enters separate
>>>> /usr), and there are also normal filesystems that may need be mounted
>>>> even degraded.
>>>> 3) can we query btrfs whether it is mountable in degraded mode?
>>>> according to documentation, "btrfs device ready" (which udev builtin
>>>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>>>> This is required for proper systemd ordering of services.
>>>
>>> To be entirely honest, if it were me, I'd want systemd to fsck off.
>>
>> How should user space know when to try mount? What user space is
>> supposed to do during boot if mount fails? Do you suggest
>>
>> while true; do
>> mount /dev/foo && exit 0
>> done
>>
>> as part of startup sequence? And note that nowhere is systemd involved so far.
>
> Nowhere there, except if you have a filesystem in fstab (or a mount unit, which I hate for other reasons that I will not go into right now), and you mount it and systemd thinks the device isn't ready, it unmounts it _immediately_. In the case of boot, it's because of systemd thinking the device isn't ready that you can't mount degraded with a missing device. In the case of the root filesystem at least, the initramfs is expected to handle this, and most of them do poll in some way, or have other methods of determining this. I occasionally have issues with it with dracut without systemd, but that's due to a separate bug there involving the device mapper.

How does this systemd bashing answer my question - how does user space know when it can call mount at startup?

>>
>>> The whole concept
>>> of trying to track in userspace something the kernel itself tracks and knows
>>> a whole lot more about is absolutely stupid.
>>
>> It need not be user space. If kernel notifies user space when
>> filesystem is mountable, problem solved. It could be udev event,
>> netlink, whatever. Until kernel does it, user space need to either
>> poll or somehow track it based on available events.
>
> This I agree could be done better, but it absolutely should not be in userspace, the notification needs to come from the kernel, but that leads to the problem of knowing whether or not the FS can mount degraded, or only ro, or any number of other situations.

>>
>>> It makes some sense when
>>> dealing with LVM or MD, because that is potentially a security issue
>>> (someone could inject a bogus device node that you then mount instead of
>>> your desired target),
>>
>> I do not understand it at all.
MD and LVM has exactly the same problem >> - they need to know when they can assemble MD/VG. I miss what it has >> to do with security, sorry. > If you don't track whether or not the device is assembled, then someone could create an arbitrary device node with the same name and then get you to mount that, possibly causing all kinds of issues depending on any number of other factors. Device node is created as soon as array is seen for the first time. If you imply someone may replace it, what prevents doing it at any arbitrary time in the future? >> >>> but it makes no sense here, because there's no way to >>> prevent the equivalent from happening in BTRFS. >>> >>> As far as the udev rules, I'm pretty certain that _we_ ship those with >>> btrfs-progs, >> >> No, you do not. You ship rule to rename devices to be more >> "user-friendly". But the rule in question has always been part of >> udev. > Ah, you're right, I was mistaken about this. >> >>> I have no idea why they're packaged with udev in CentOS (oh >>> wait, I bet they package every single possible udev rule in that package >>> just in case, don't they?). > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot
  2016-07-06 12:39 ` Andrei Borzenkov
@ 2016-07-06 12:48 ` Austin S. Hemmelgarn
  2016-07-07 16:52   ` Goffredo Baroncelli
  0 siblings, 1 reply; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-06 12:48 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On 2016-07-06 08:39, Andrei Borzenkov wrote:
>
> Sent from my iPhone
>
>> On 6 July 2016, at 15:14, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
>>
>>> On 2016-07-06 07:55, Andrei Borzenkov wrote:
>>> On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>> On 2016-07-06 05:51, Andrei Borzenkov wrote:
>>>>>
>>>>> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com>
>>>>> wrote:
>>>>>>
>>>>>> I started a systemd-devel@ thread since that's where most udev stuff
>>>>>> gets talked about.
>>>>>>
>>>>>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>>>>
>>>>> Before discussing how to implement it in systemd, we need to decide
>>>>> what to implement. I.e.
>>>>>
>>>>> 1) do you always want to mount filesystem in degraded mode if not
>>>>> enough devices are present or only if explicit hint is given?
>>>>> 2) do you want to restrict degrade handling to root only or to other
>>>>> filesystems as well? Note that there could be more early boot
>>>>> filesystems that absolutely need same treatment (enters separate
>>>>> /usr), and there are also normal filesystems that may need be mounted
>>>>> even degraded.
>>>>> 3) can we query btrfs whether it is mountable in degraded mode?
>>>>> according to documentation, "btrfs device ready" (which udev builtin
>>>>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>>>>> This is required for proper systemd ordering of services.
>>>>
>>>> To be entirely honest, if it were me, I'd want systemd to fsck off.
>>>> If the
>>>> kernel mount(2) call succeeds, then the filesystem was ready enough to
>>>> mount, and if it doesn't, then it wasn't, end of story.
>>>
>>> How should user space know when to try mount? What user space is
>>> supposed to do during boot if mount fails? Do you suggest
>>>
>>> while true; do
>>> mount /dev/foo && exit 0
>>> done
>>>
>>> as part of startup sequence? And note that nowhere is systemd involved so far.
>>
>> Nowhere there, except if you have a filesystem in fstab (or a mount unit, which I hate for other reasons that I will not go into right now), and you mount it and systemd thinks the device isn't ready, it unmounts it _immediately_. In the case of boot, it's because of systemd thinking the device isn't ready that you can't mount degraded with a missing device. In the case of the root filesystem at least, the initramfs is expected to handle this, and most of them do poll in some way, or have other methods of determining this. I occasionally have issues with it with dracut without systemd, but that's due to a separate bug there involving the device mapper.
>
> How this systemd bashing answers my question - how user space knows when it can call mount at startup?

You mentioned that systemd wasn't involved, which is patently false if it's being used as your init system, and I was admittedly mostly responding to that.

Now, to answer the primary question which I forgot to answer: Userspace doesn't. Systemd doesn't either but assumes it does and checks in a flawed way. Dracut's polling loop assumes it does but sometimes fails in a different way. There is no way other than calling mount right now to know for sure if the mount will succeed, and that actually applies to a certain degree to any filesystem (because any number of things that are outside of even the kernel's control might happen while trying to mount the device).
>
>>
>>>
>>>> The whole concept
>>>> of trying to track in userspace something the kernel itself tracks and knows
>>>> a whole lot more about is absolutely stupid.
>>>
>>> It need not be user space. If kernel notifies user space when
>>> filesystem is mountable, problem solved. It could be udev event,
>>> netlink, whatever. Until kernel does it, user space need to either
>>> poll or somehow track it based on available events.
>>
>> This I agree could be done better, but it absolutely should not be in userspace, the notification needs to come from the kernel, but that leads to the problem of knowing whether or not the FS can mount degraded, or only ro, or any number of other situations.
>>>
>>>> It makes some sense when
>>>> dealing with LVM or MD, because that is potentially a security issue
>>>> (someone could inject a bogus device node that you then mount instead of
>>>> your desired target),
>>>
>>> I do not understand it at all. MD and LVM has exactly the same problem
>>> - they need to know when they can assemble MD/VG. I miss what it has
>>> to do with security, sorry.
>>
>> If you don't track whether or not the device is assembled, then someone could create an arbitrary device node with the same name and then get you to mount that, possibly causing all kinds of issues depending on any number of other factors.
>
> Device node is created as soon as array is seen for the first time. If you imply someone may replace it, what prevents doing it at any arbitrary time in the future?

It's still possible, but it's not as easy because replacing it after it's mounted would require a remount to have any effect. The most reliable time to do something like this is during boot before the mount. LVM and/or MD may or may not replace the node properly when they start (I don't have enough background on MD and haven't tested with LVM), but if that's after the fake node has already been mounted, then it won't help much, except for helping cover up the attack.
> >>> >>>> but it makes no sense here, because there's no way to >>>> prevent the equivalent from happening in BTRFS. >>>> >>>> As far as the udev rules, I'm pretty certain that _we_ ship those with >>>> btrfs-progs, >>> >>> No, you do not. You ship rule to rename devices to be more >>> "user-friendly". But the rule in question has always been part of >>> udev. >> Ah, you're right, I was mistaken about this. >>> >>>> I have no idea why they're packaged with udev in CentOS (oh >>>> wait, I bet they package every single possible udev rule in that package >>>> just in case, don't they?). >> ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-06 12:48 ` Austin S. Hemmelgarn @ 2016-07-07 16:52 ` Goffredo Baroncelli 2016-07-07 18:23 ` Austin S. Hemmelgarn 0 siblings, 1 reply; 33+ messages in thread From: Goffredo Baroncelli @ 2016-07-07 16:52 UTC (permalink / raw) To: Austin S. Hemmelgarn, Andrei Borzenkov Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS On 2016-07-06 14:48, Austin S. Hemmelgarn wrote: > On 2016-07-06 08:39, Andrei Borzenkov wrote: [....] >>>>> >>>>> To be entirely honest, if it were me, I'd want systemd to >>>>> fsck off. If the kernel mount(2) call succeeds, then the >>>>> filesystem was ready enough to mount, and if it doesn't, then >>>>> it wasn't, end of story. >>>> >>>> How should user space know when to try mount? What user space >>>> is supposed to do during boot if mount fails? Do you suggest >>>> >>>> while true; do mount /dev/foo && exit 0 done >>>> >>>> as part of startup sequence? And note that nowhere is systemd >>>> involved so far. >>> Nowhere there, except if you have a filesystem in fstab (or a >>> mount unit, which I hate for other reasons that I will not go >>> into right now), and you mount it and systemd thinks the device >>> isn't ready, it unmounts it _immediately_. In the case of boot, >>> it's because of systemd thinking the device isn't ready that you >>> can't mount degraded with a missing device. In the case of the >>> root filesystem at least, the initramfs is expected to handle >>> this, and most of them do poll in some way, or have other methods >>> of determining this. I occasionally have issues with it with >>> dracut without systemd, but that's due to a separate bug there >>> involving the device mapper. >>> >> >> How this systemd bashing answers my question - how user space knows >> when it can call mount at startup? > You mentioned that systemd wasn't involved, which is patently false > if it's being used as your init system, and I was admittedly mostly > responding to that. 
>
> Now, to answer the primary question which I forgot to answer:
> Userspace doesn't. Systemd doesn't either but assumes it does and
> checks in a flawed way. Dracut's polling loop assumes it does but
> sometimes fails in a different way. There is no way other than
> calling mount right now to know for sure if the mount will succeed,
> and that actually applies to a certain degree to any filesystem
> (because any number of things that are outside of even the kernel's
> control might happen while trying to mount the device).

I think there is no simple answer, and it may depend on context. In the past, I made a prototype of a mount helper for btrfs [1]; the aim was to:

1) get rid of the current btrfs volume discovery (udev triggering btrfs dev scan), which has a lot of strange corner cases (what happens when a device disappears?);

2) create a place where we can develop and define strategies to handle all (or most) of the cases of [partial] failure of a [multi-device] btrfs filesystem.

By default, my mount.btrfs waited for the devices needed by a filesystem, and mounted in degraded mode if not all devices appeared (depending on a switch); if a timeout was reached, an error was returned. It doesn't need any special udev rule, because it performs discovery of the devices using libuuid.

I think that mounting a filesystem and handling all the possible failure cases by relying on udev and the syntax of its rules is more a problem than a solution. Add to that that udev and its rules are developed in a different project, and the difficulties increase. I think that BTRFS, given its complexity and peculiarities, needs a dedicated tool like a mount helper. My mount.btrfs is not able to solve all the problems, but it might be a start for handling these issues.
BR G.Baroncelli [1] http://www.spinics.net/lists/linux-btrfs/msg28764.html -- gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-07 16:52 ` Goffredo Baroncelli @ 2016-07-07 18:23 ` Austin S. Hemmelgarn 2016-07-07 18:58 ` Chris Murphy 2016-07-07 19:41 ` Goffredo Baroncelli 0 siblings, 2 replies; 33+ messages in thread From: Austin S. Hemmelgarn @ 2016-07-07 18:23 UTC (permalink / raw) To: kreijack, Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS On 2016-07-07 12:52, Goffredo Baroncelli wrote: > On 2016-07-06 14:48, Austin S. Hemmelgarn wrote: >> On 2016-07-06 08:39, Andrei Borzenkov wrote: > [....] >>>>>> >>>>>> To be entirely honest, if it were me, I'd want systemd to >>>>>> fsck off. If the kernel mount(2) call succeeds, then the >>>>>> filesystem was ready enough to mount, and if it doesn't, then >>>>>> it wasn't, end of story. >>>>> >>>>> How should user space know when to try mount? What user space >>>>> is supposed to do during boot if mount fails? Do you suggest >>>>> >>>>> while true; do mount /dev/foo && exit 0 done >>>>> >>>>> as part of startup sequence? And note that nowhere is systemd >>>>> involved so far. >>>> Nowhere there, except if you have a filesystem in fstab (or a >>>> mount unit, which I hate for other reasons that I will not go >>>> into right now), and you mount it and systemd thinks the device >>>> isn't ready, it unmounts it _immediately_. In the case of boot, >>>> it's because of systemd thinking the device isn't ready that you >>>> can't mount degraded with a missing device. In the case of the >>>> root filesystem at least, the initramfs is expected to handle >>>> this, and most of them do poll in some way, or have other methods >>>> of determining this. I occasionally have issues with it with >>>> dracut without systemd, but that's due to a separate bug there >>>> involving the device mapper. >>>> >>> >>> How this systemd bashing answers my question - how user space knows >>> when it can call mount at startup? 
>> You mentioned that systemd wasn't involved, which is patently false >> if it's being used as your init system, and I was admittedly mostly >> responding to that. >> >> Now, to answer the primary question which I forgot to answer: >> Userspace doesn't. Systemd doesn't either but assumes it does and >> checks in a flawed way. Dracut's polling loop assumes it does but >> sometimes fails in a different way. There is no way other than >> calling mount right now to know for sure if the mount will succeed, >> and that actually applies to a certain degree to any filesystem >> (because any number of things that are outside of even the kernel's >> control might happen while trying to mount the device. > > I think that there is no a simple answer, and the answer may depend by context. > In the past, I made a prototype of a mount helper for btrfs [1]; the aim was to: > > 1) get rid of the actual btrfs volume discovery (udev which trigger btrfs dev scan) which has a lot of strange condition (what happens when a device disappear ?) > 2) create a place where we develop and define strategies to handle all (or most) of the case of [partial] failure of a [multi-device] btrfs filesystem > > By default, my mount.btrfs waited the needed devices for a filesystem, and mount in degraded mode if not all devices are appeared (depending by a switch); if a timeout is reached, and error is returned. > > It doesn't need any special udev rule, because it performs a discovery of the devices using libuuid. I think that mounting a filesystem and handling all the possibles case relaying of the udev and its syntax of the udev rules is more a problem than a solution. Adding that udev and the udev rules are developed in a different project, the difficulties increase. > > I think that BTRFS for its complexity and their peculiarities need a dedicated tool like a mount helper. > > My mount.btrfs is not able to solve all the problem, but might be a starts for handling the issues. 
FWIW, I've pretty much always been of the opinion that the device discovery belongs in a mount helper. The auto-discovery from udev (and more importantly, how the kernel handles being told about a device) is much of the reason that it's so inherently dangerous to do block level copies. There's obviously no way that can be changed now without breaking something, but that's on the really short list of things that I personally feel are worth breaking to fix a particularly dangerous pitfall. The recent discovery that device ready state is write-once when set just reinforces this in my opinion. Here's how I would picture the ideal situation: * A device is processed by udev. It detects that it's part of a BTRFS array, updates blkid and whatever else in userspace with this info, and then stops without telling the kernel. * The kernel tracks devices until the filesystem they are part of is unmounted, or a mount of that FS fails. * When the user goes to mount a BTRFS filesystem, they use a mount helper. 1. This helper queries udev/blkid/whatever to see which devices are part of an array. 2. Once the helper determines which devices are potentially in the requested FS, it checks the following things to ensure array integrity: - Does each device report the same number of component devices for the array? - Does the reported number match the number of devices found? - If a mount by UUID is requested, do all the labels match on each device? - If a mount by LABEL is requested, do all the UUIDs match on each device? - If a mount by path is requested, do all the component devices reported by that device have matching LABEL _and_ UUID? - Is any of the devices found already in use by another mount? 3. If any of the above checks fails, and the user has not specified an option to request a mount anyway, report the error and exit with non-zero status _before_ even talking to the kernel. 4. If only the second check fails (the check verifying the number of devices found), and it fails because the number found is less than required for a non-degraded mount, ignore that check if and only if the user specified -o degraded. 5. If any of the other checks fail, ignore them if and only if the user asks to ignore that specific check. 6. Otherwise, notify the kernel about the devices and call mount(2). * The mount helper parses its own set of special options similar to the bg/fg/retry options used by mount.nfs to allow for timeouts when mounting, as well as asynchronous mounts in the background. * btrfs device scan becomes a no-op * btrfs device ready uses the above logic minus step 6 to determine if a filesystem is probably ready. Such a situation would probably eliminate or at least reduce most of our current issues with device discovery, and provide much better error reporting and general flexibility. ^ permalink raw reply [flat|nested] 33+ messages in thread
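The integrity checklist in that proposal is mostly mechanical, so it can be sketched as a pure function over scanned metadata. A rough illustration, assuming a hypothetical scan layer that yields one dict per device; nothing here is a real btrfs-progs API, and the check names are invented:

```python
def check_array(devices, want_uuid=None, mounted=frozenset(),
                degraded_ok=False):
    """Run pre-mount integrity checks over scanned member devices.

    Each element of `devices` looks like
    {"node": "/dev/sda", "uuid": ..., "label": ..., "num_devices": ...}.
    Returns the names of failed checks; an empty list means it is
    safe to proceed and call mount(2).
    """
    errors = []
    counts = {d["num_devices"] for d in devices}
    if len(counts) != 1:
        errors.append("member-count-mismatch")  # devices disagree on size
    elif len(devices) < counts.pop() and not degraded_ok:
        errors.append("missing-devices")        # fewer than required found
    if want_uuid is not None and any(d["uuid"] != want_uuid for d in devices):
        errors.append("uuid-mismatch")
    if len({d["label"] for d in devices}) > 1:
        errors.append("label-mismatch")
    if any(d["node"] in mounted for d in devices):
        errors.append("device-busy")            # already used by another mount
    return errors
```

Here degraded_ok plays the role of -o degraded: it waives only the device-count check, never the consistency checks, matching the spirit of the proposal above.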
* Re: 64-btrfs.rules and degraded boot 2016-07-07 18:23 ` Austin S. Hemmelgarn @ 2016-07-07 18:58 ` Chris Murphy 2016-07-07 19:14 ` Chris Murphy ` (2 more replies) 2016-07-07 19:41 ` Goffredo Baroncelli 1 sibling, 3 replies; 33+ messages in thread From: Chris Murphy @ 2016-07-07 18:58 UTC (permalink / raw) To: Austin S. Hemmelgarn Cc: Goffredo Baroncelli, Andrei Borzenkov, Chris Murphy, Kai Krakow, Btrfs BTRFS On Thu, Jul 7, 2016 at 12:23 PM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote: > > Here's how I would picture the ideal situation: > * A device is processed by udev. It detects that it's part of a BTRFS > array, updates blkid and whatever else in userspace with this info, and then > stops without telling the kernel. > * The kernel tracks devices until the filesystem they are part of is > unmounted, or a mount of that FS fails. > * When the user goes to mount the a BTRFS filesystem, they use a mount > helper. > 1. This helper queries udev/blkid/whatever to see which devices are part > of an array. > 2. Once the helper determines which devices are potentially in the > requested FS, it checks the following things to ensure array integrity: > - Does each device report the same number of component devices for the > array? > - Does the reported number match the number of devices found? > - If a mount by UUID is requested, do all the labels match on each > device? > - If a mount by LABEL is requested, do all the UUID's match on each > device? > - If a mount by path is requested, do all the component devices reported > by that device have matching LABEL _and_ UUID? > - Is any of the devices found already in-use by another mount? > 4. If any of the above checks fails, and the user has not specified an > option to request a mount anyway, report the error and exit with non-zero > status _before_ even talking to the kernel. > 5. 
If only the second check fails (the check verifying the number of > devices found), and it fails because the number found is less than required > for a non-degraded mount, ignore that check if and only if the user > specified -o degraded. > 6. If any of the other checks fail, ignore them if and only if the user > asks to ignore that specific check. > 7. Otherwise, notify the kernel about the devices and call mount(2). > * The mount helper parses it's own set of special options similar to the > bg/fg/retry options used by mount.nfs to allow for timeouts when mounting, > as well as asynchronous mounts in the background. > * btrfs device scan becomes a no-op > * btrfs device ready uses the above logic minus step 7 to determine if a > filesystem is probably ready. > > Such a situation would probably eliminate or at least reduce most of our > current issues with device discovery, and provide much better error > reporting and general flexibility. It might be useful to see where ZFS and LVM work and fail in this regard. And also plan for D-Bus support to get state notifications up to something like storaged or other such user space management tools. Early on in Fedora there were many difficulties between systemd and LVM, so avoiding whatever that was about would be nice. Also, tangentially related, Fedora is replacing udisks2 with storaged. Storaged already has a Btrfs plug-in so there should be better awareness there. I get all kinds of damn strange behaviors in GNOME with Btrfs multiple device volumes: volume names appearing twice in the UI, unmounting one causes umount errors with the other. https://fedoraproject.org/wiki/Changes/Replace_UDisks2_by_Storaged http://storaged.org/ -- Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-07 18:58 ` Chris Murphy @ 2016-07-07 19:14 ` Chris Murphy 2016-07-07 19:59 ` Austin S. Hemmelgarn 2016-07-07 20:13 ` Goffredo Baroncelli 2 siblings, 0 replies; 33+ messages in thread From: Chris Murphy @ 2016-07-07 19:14 UTC (permalink / raw) To: Btrfs BTRFS More Btrfs udev issues, they involve making btrfs multiple device volumes via 'btrfs dev add' which then causes problems at boot time. https://bugzilla.opensuse.org/show_bug.cgi?id=912170 https://bugzilla.suse.com/show_bug.cgi?id=984516 The last part is amusing in that the proposed fix is going to end up in btrfs-progs. And so that's why: [chris@f24m ~]$ dnf provides /usr/lib/udev/rules.d/64-btrfs-dm.rules Last metadata expiration check: 1:18:18 ago on Thu Jul 7 11:54:20 2016. btrfs-progs-4.6-1.fc25.x86_64 : Userspace programs for btrfs Repo : @System [chris@f24m ~]$ dnf provides /usr/lib/udev/rules.d/64-btrfs.rules Last metadata expiration check: 1:18:30 ago on Thu Jul 7 11:54:20 2016. systemd-udev-229-8.fc24.x86_64 : Rule-based device node and kernel event manager Repo : @System Ha. So the btrfs rule is provided by udev upstream. The dm specific Btrfs rule is provided by Btrfs upstream. That's not confusing at all. Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-07 18:58 ` Chris Murphy 2016-07-07 19:14 ` Chris Murphy @ 2016-07-07 19:59 ` Austin S. Hemmelgarn 2016-07-07 20:20 ` Chris Murphy 2016-07-07 20:13 ` Goffredo Baroncelli 2 siblings, 1 reply; 33+ messages in thread From: Austin S. Hemmelgarn @ 2016-07-07 19:59 UTC (permalink / raw) To: Chris Murphy Cc: Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On 2016-07-07 14:58, Chris Murphy wrote: > On Thu, Jul 7, 2016 at 12:23 PM, Austin S. Hemmelgarn > <ahferroin7@gmail.com> wrote: > >> >> Here's how I would picture the ideal situation: >> * A device is processed by udev. It detects that it's part of a BTRFS >> array, updates blkid and whatever else in userspace with this info, and then >> stops without telling the kernel. >> * The kernel tracks devices until the filesystem they are part of is >> unmounted, or a mount of that FS fails. >> * When the user goes to mount the a BTRFS filesystem, they use a mount >> helper. >> 1. This helper queries udev/blkid/whatever to see which devices are part >> of an array. >> 2. Once the helper determines which devices are potentially in the >> requested FS, it checks the following things to ensure array integrity: >> - Does each device report the same number of component devices for the >> array? >> - Does the reported number match the number of devices found? >> - If a mount by UUID is requested, do all the labels match on each >> device? >> - If a mount by LABEL is requested, do all the UUID's match on each >> device? >> - If a mount by path is requested, do all the component devices reported >> by that device have matching LABEL _and_ UUID? >> - Is any of the devices found already in-use by another mount? >> 4. If any of the above checks fails, and the user has not specified an >> option to request a mount anyway, report the error and exit with non-zero >> status _before_ even talking to the kernel. >> 5. 
If only the second check fails (the check verifying the number of >> devices found), and it fails because the number found is less than required >> for a non-degraded mount, ignore that check if and only if the user >> specified -o degraded. >> 6. If any of the other checks fail, ignore them if and only if the user >> asks to ignore that specific check. >> 7. Otherwise, notify the kernel about the devices and call mount(2). >> * The mount helper parses it's own set of special options similar to the >> bg/fg/retry options used by mount.nfs to allow for timeouts when mounting, >> as well as asynchronous mounts in the background. >> * btrfs device scan becomes a no-op >> * btrfs device ready uses the above logic minus step 7 to determine if a >> filesystem is probably ready. >> >> Such a situation would probably eliminate or at least reduce most of our >> current issues with device discovery, and provide much better error >> reporting and general flexibility. > > It might be useful to see where ZFS and LVM work and fail in this > regard. And also plan for D-Bus support to get state notifications up > to something like storaged or other such user space management tools. > Early on in Fedora there were many difficulties between systemd and > LVM, so avoiding whatever that was about would be nice. D-Bus support needs to be optional, period. Not everybody uses D-Bus (I have dozens of systems that get by just fine without it, and know hundreds of other people who do as well), and even people who do don't always use every tool needed (on the one system I manage that does have it, the only things I need it for are Avahi, ConsoleKit, udev, and NetworkManager, and I'm getting pretty close to the point of getting rid of NM and CK and re-implementing or forking Avahi). You have to consider the fact that there are and always will be people who do not install a GUI on their system and want the absolute minimum of software installed. 
> > Also, tangentially related, Fedora is replacing udisks2 with storaged. > Storaged already has a Btrfs plug-in so there should be better > awareness there. I get all kinds of damn strange behaviors in GNOME > with Btrfs multiple device volumes: volume names appearing twice in > the UI, unmounting one causes umount errors with the other. > https://fedoraproject.org/wiki/Changes/Replace_UDisks2_by_Storaged > http://storaged.org/ Personally, I don't care what Fedora is doing, or even what GNOME (or any other DE for that matter, the only reason I use Xfce is because some things need a GUI (many of them unnecessarily), and that happens to be the DE I have the fewest complaints about) is doing. The only reason that things like GNOME Disks and such exist is because they're trying to imitate Windows and OS X, which is all well and good for a desktop, but is absolute crap for many server and embedded environments (Microsoft finally realized this, and Windows Server 2012 added the ability to install without a full desktop, which actually means that they have _more_ options than a number of Linux distributions (yes you can rip out the desktop on many distros if you want, but that takes an insane amount of effort most of the time, not to mention storage space)). Storaged also qualifies as something that _needs_ to be optional, especially because it appears to require systemd (and it falls into the same category as D-Bus of 'unnecessary bloat on many systems'). Adding a mandatory dependency on systemd _will_ split the community and severely piss off quite a few people (you will likely get some rather nasty looks from a number of senior kernel developers if you meet them in person). ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-07 19:59 ` Austin S. Hemmelgarn @ 2016-07-07 20:20 ` Chris Murphy 2016-07-08 12:24 ` Austin S. Hemmelgarn 0 siblings, 1 reply; 33+ messages in thread From: Chris Murphy @ 2016-07-07 20:20 UTC (permalink / raw) To: Austin S. Hemmelgarn Cc: Chris Murphy, Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On Thu, Jul 7, 2016 at 1:59 PM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote: > D-Bus support needs to be optional, period. Not everybody uses D-Bus (I > have dozens of systems that get by just fine without it, and know hundreds > of other people who do as well), and even people who do don't always use > every tool needed (on the one system I manage that does have it, the only > things I need it for are Avahi, ConsoleKit, udev, and NetworkManager, and > I'm getting pretty close to the point of getting rid of NM and CK and > re-implementing or forking Avahi). You have to consider the fact that there > are and always will be people who do not install a GUI on their system and > want the absolute minimum of software installed. That's fine, they can monitor kernel messages directly as their notification system. I'm concerned with people who don't ever look at kernel messages, you know, mortal users who have better things to do with a computer than that. It's important for most anyone to not have to wait for problems to manifest traumatically. > Personally, I don't care what Fedora is doing, or even what GNOME (or any > other DE for that matter, the only reason I use Xfce is because some things > need a GUI (many of them unnecessarily), and that happens to be the DE I > have the fewest complaints about) is doing. 
The only reason that things > like GNOME Disks and such exist is because they're trying to imitate Windows > and OS X, which is all well and good for a desktop, but is absolute crap for > many server and embedded environments (Microsoft finally realized this, and > Windows Server 2012 added the ability to install without a full desktop, > which actually means that they have _more_ options than a number of Linux > distributions (yes you can rip out the desktop on many distros if you want, > but that takes an insane amount of effort most of the time, not to mention > storage space)). I'm willing to bet dollars to donuts Xfce fans would love to know if one of their rootfs mirrors is spewing read errors, while smartd defers to the drive which says "hey no problems here". GNOME at least does report certain critical smart errors, but that still leaves something like 40% of drive failures happening without prior notice. > Storaged also qualifies as something that _needs_ to be optional, especially > because it appears to require systemd (and it falls into the same category > as D-Bus of 'unnecessary bloat on many systems'). Adding a mandatory > dependency on systemd _will_ split the community and severely piss off quite > a few people (you will likely get some rather nasty looks from a number of > senior kernel developers if you meet them in person). I just want things to work for users, defined as people who would like to stop depending on Windows and macOS for both server and desktop usage. I don't really care about ideological issues outside of that goal. -- Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-07 20:20 ` Chris Murphy @ 2016-07-08 12:24 ` Austin S. Hemmelgarn 2016-07-11 21:07 ` Chris Murphy 0 siblings, 1 reply; 33+ messages in thread From: Austin S. Hemmelgarn @ 2016-07-08 12:24 UTC (permalink / raw) To: Chris Murphy Cc: Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On 2016-07-07 16:20, Chris Murphy wrote: > On Thu, Jul 7, 2016 at 1:59 PM, Austin S. Hemmelgarn > <ahferroin7@gmail.com> wrote: > >> D-Bus support needs to be optional, period. Not everybody uses D-Bus (I >> have dozens of systems that get by just fine without it, and know hundreds >> of other people who do as well), and even people who do don't always use >> every tool needed (on the one system I manage that does have it, the only >> things I need it for are Avahi, ConsoleKit, udev, and NetworkManager, and >> I'm getting pretty close to the point of getting rid of NM and CK and >> re-implementing or forking Avahi). You have to consider the fact that there >> are and always will be people who do not install a GUI on their system and >> want the absolute minimum of software installed. > > That's fine, they can monitor kernel messages directly as their > notification system. I'm concerned with people who don't ever look at > kernel messages, you know, mortal users who have better things to do > with a computer than that. It's important for most anyone to not have > to wait for problems to manifest traumatically. My point is that they probably need btrfs-progs too. Take me for example: I don't use some fancy graphical tool to tell me when my disks are failing, but I don't scrape kernel logs either. I have things set up to monitor the disks directly (using btrfs-progs for the things it can check), and notify me via e-mail if there's an issue. Not supporting that use case at all would be like e2fsprogs adding a dependency on X11 and telling everyone who doesn't want to use X11 to just go implement their own tools. 
If that happened, e2fsprogs would get forked, the commit reverted in that fork, and most of the non-enterprise distros would probably switch pretty damn quick to the forked version. > > >> Personally, I don't care what Fedora is doing, or even what GNOME (or any >> other DE for that matter, the only reason I use Xfce is because some things >> need a GUI (many of them unnecessarily), and that happens to be the DE I >> have the fewest complaints about) is doing. The only reason that things >> like GNOME Disks and such exist is because they're trying to imitate Windows >> and OS X, which is all well and good for a desktop, but is absolute crap for >> many server and embedded environments (Microsoft finally realized this, and >> Windows Server 2012 added the ability to install without a full desktop, >> which actually means that they have _more_ options than a number of Linux >> distributions (yes you can rip out the desktop on many distros if you want, >> but that takes an insane amount of effort most of the time, not to mention >> storage space)). > > I'm willing to bet dollars to donuts Xfce fans would love to know if > one of their rootfs mirrors is spewing read errors, while smartd > defers to the drive which says "hey no problems here". GNOME at least > does report certain critical smart errors, but that still leaves > something like 40% of drive failures happening without prior notice. I'm not saying some specific users don't care, I'm saying that requiring people to have a specific software stack which may not work for their use case is a stupid choice for something as low level as this. Yes people want to know when something failed, but we shouldn't mandate _how_ they choose in a given system to check this. There need to be more choices than just a GUI tool and talking directly to the kernel. 
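The monitor-and-mail setup described above reduces to comparing per-device error counters between runs. A minimal sketch, with the stats source and the notifier injected as assumptions: in practice the source might parse 'btrfs device stats' output and the notifier might send an e-mail, but neither is modeled here.

```python
def check_error_counters(read_stats, last, notify):
    """Report error counters that increased since the previous run.

    read_stats() returns {"/dev/sda": {"read_io_errs": 0, ...}, ...};
    notify(device, counter, value) is called for each counter that
    grew.  Returns the new snapshot, to be saved for the next run.
    Both callables are hypothetical stand-ins, not btrfs-progs APIs.
    """
    current = read_stats()
    for dev, counters in current.items():
        for name, value in counters.items():
            if value > last.get(dev, {}).get(name, 0):
                notify(dev, name, value)  # e.g. queue an e-mail
    return current
```

Run from cron or a loop, persisting the returned snapshot between invocations, this gives exactly the DE-agnostic, D-Bus-free notification path argued for here.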
Looking at this another way, it is fully possible to implement something to do this in a DE agnostic manner _without depending on D-BUS_ using the tools as they are right now. An initial implementation would of course be inefficient, but until we get notifications _from the kernel_ about FS state, we have to poll regardless, which means that having D-Bus support would not help (and would probably just make things slower). > > >> Storaged also qualifies as something that _needs_ to be optional, especially >> because it appears to require systemd (and it falls into the same category >> as D-Bus of 'unnecessary bloat on many systems'). Adding a mandatory >> dependency on systemd _will_ split the community and severely piss off quite >> a few people (you will likely get some rather nasty looks from a number of >> senior kernel developers if you meet them in person). > > I just want things to work for users, defined as people who would like > to stop depending on Windows and macOS for both server and desktop > usage. I don't really care about ideological issues outside of that > goal. Making us hard depend on storaged would not help this goal. It's no different than the Microsoft and Apple approach of 'our way or not at all'. To clarify, I'm not trying to argue against adding support, I'm arguing against it being mandatory. A filesystem which requires specific system services to be running just for regular maintenance tasks is not a well designed filesystem. To be entirely honest, I'm not all that happy about the functional dependency on udev to have device discovery, but there's no point in me arguing about that... Just thinking aloud, but why not do a daemon that does the actual monitoring, and then provide an interface (at least a UNIX domain socket, and optionally a D-Bus endpoint) that other tools can use to query filesystem status. 
LVM already has a similar setup for monitoring DM-RAID volumes, snapshots, and thin storage pools, although it's designed as an event driven tool that does something when specific things happen (for example, possibly auto-extending snapshots when they start to get full). Other than the D-Bus support, I could probably write a basic piece of software to do this in Python in about a week of work (most of which would be figuring out the edge cases and making sure it works on both 2.7 and 3) that would provide similar functionality (with better configurability too) and could easily provide an interface to query filesystem status. ^ permalink raw reply [flat|nested] 33+ messages in thread
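The socket half of such a daemon is small. A minimal sketch that serves one JSON status document per connection; the socket path, the status layout, and the injected get_status() callable are all invented for the example, and a real daemon would loop forever and add access control:

```python
import json
import os
import socket

def serve_status(sock_path, get_status):
    """Answer one client on a UNIX domain socket with a JSON status.

    get_status() returns a plain dict describing the monitored
    filesystems; the helper sends it to the first client that
    connects, then cleans up the socket.
    """
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        srv.bind(sock_path)
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            conn.sendall(json.dumps(get_status()).encode())
    finally:
        srv.close()
        if os.path.exists(sock_path):
            os.unlink(sock_path)
```

A D-Bus endpoint could then be layered on top of the same get_status() callable for systems that want it, keeping the socket interface as the lowest common denominator.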
* Re: 64-btrfs.rules and degraded boot 2016-07-08 12:24 ` Austin S. Hemmelgarn @ 2016-07-11 21:07 ` Chris Murphy 2016-07-12 15:34 ` Austin S. Hemmelgarn 0 siblings, 1 reply; 33+ messages in thread From: Chris Murphy @ 2016-07-11 21:07 UTC (permalink / raw) To: Austin S. Hemmelgarn Cc: Chris Murphy, Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On Fri, Jul 8, 2016 at 6:24 AM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote: > To clarify, I'm not trying to argue against adding support, I'm arguing > against it being mandatory. By "D-Bus support" I did not mean to indicate mandating it, just that it would be one possible way to get some very basic state change messages to user space tools so we're doing the least amount of wheel reinvention as possible. > A filesystem which requires specific system > services to be running just for regular maintenance tasks is not a well > designed filesystem. To be entirely honest, I'm not all that happy about > the functional dependency on udev to have device discovery, but there's no > point in me arguing about that... Well everything else that came before it is effectively deprecated, so there's no going back. The way forward would be to get udev more granular state information about a Btrfs volume than 0 and 1. > > Just thinking aloud, but why not do a daemon that does the actual > monitoring, and then provide an interface (at least a UNIX domain socket, > and optionally a D-Bus endpoint) that other tools can use to query > filesystem status. LVM already has a similar setup for monitoring DM-RAID > volumes, snapshots, and thin storage pools, although it's designed as an > event driven tool that does something when specific things happen (for > example, possibly auto-extending snapshots when they start to get full). That would be consistent with mdadm --monitor and dmeventd, but it is yet another wheel reinvention at the lower level, which then also necessitates higher level things to adapt to that interface. 
It would be neat if there could be some unification and consistency. -- Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-11 21:07 ` Chris Murphy @ 2016-07-12 15:34 ` Austin S. Hemmelgarn 0 siblings, 0 replies; 33+ messages in thread From: Austin S. Hemmelgarn @ 2016-07-12 15:34 UTC (permalink / raw) To: Chris Murphy Cc: Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On 2016-07-11 17:07, Chris Murphy wrote: > On Fri, Jul 8, 2016 at 6:24 AM, Austin S. Hemmelgarn > <ahferroin7@gmail.com> wrote: > >> To clarify, I'm not trying to argue against adding support, I'm arguing >> against it being mandatory. > > By "D-Bus support" I did not mean to indicate mandating it, just that > it would be one possible way to get some very basic state change > messages to user space tools so we're doing the least amount of wheel > reinvention as possible. Minimizing the amount of work would be good, but I would not agree about D-Bus doing that. It's easy to debug socket based IPC, it's not easy to debug D-Bus based IPC. From a development perspective, I'd say we need to get something working with sockets first, and then worry about D-Bus once we have working infrastructure and abstraction for IPC. > > >> A filesystem which requires specific system >> services to be running just for regular maintenance tasks is not a well >> designed filesystem. To be entirely honest, I'm not all that happy about >> the functional dependency on udev to have device discovery, but there's no >> point in me arguing about that... > > Well everything else that came before it is effectively deprecated, so > there's no going back. The way forward would be to get udev more > granular state information about a Btrfs volume than 0 and 1. People still use other options, usually in embedded systems, but options do exist and are used. 
That said, I couldn't agree more about reporting more info about the state of the FS, but I still feel that scanning on device connection is not a good thing with the way things are currently designed in the kernel, not just the binary state reporting. > >> >> Just thinking aloud, but why not do a daemon that does the actual >> monitoring, and then provide an interface (at least a UNIX domain socket, >> and optionally a D-Bus endpoint) that other tools can use to query >> filesystem status. LVM already has a similar setup for monitoring DM-RAID >> volumes, snapshots, and thin storage pools, although it's designed as an >> event driven tool that does something when specific things happen (for >> example, possibly auto-extending snapshots when they start to get full). > > That would be consistent with mdadm --monitor and dmeventd, but it is > yet another wheel reinvention at the lower level, which then also > necessitates higher level things to adapt to that interface. It would > be neat if there could be some unification and consistency. > A consistent external API would be a good thing, but I'm not sure if unifying the internal design would be. Trying to unify handling in an external project would make things less reliable, not more reliable, because ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-07 18:58 ` Chris Murphy 2016-07-07 19:14 ` Chris Murphy 2016-07-07 19:59 ` Austin S. Hemmelgarn @ 2016-07-07 20:13 ` Goffredo Baroncelli 2 siblings, 0 replies; 33+ messages in thread From: Goffredo Baroncelli @ 2016-07-07 20:13 UTC (permalink / raw) To: Chris Murphy, Austin S. Hemmelgarn Cc: Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On 2016-07-07 20:58, Chris Murphy wrote: > I get all kinds of damn strange behaviors in GNOME > with Btrfs multiple device volumes: volume names appearing twice in > the UI, unmounting one causes umount errors with the other. > https://fedoraproject.org/wiki/Changes/Replace_UDisks2_by_Storaged > http://storaged.org/ Unfortunately BTRFS is a mess from this point of view. Some btrfs subcommands query the system by directly inspecting the data stored on the disks; others use the ioctl(2) syscall, which reports what the kernel thinks. Unfortunately, due to caching, these two sources of information can be out of sync. Often, when some command output doesn't convince me, I run a "sync"; repeating the command then gives better output ("btrfs fi show" is one of these commands). BR G.Baroncelli -- gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-07 18:23 ` Austin S. Hemmelgarn 2016-07-07 18:58 ` Chris Murphy @ 2016-07-07 19:41 ` Goffredo Baroncelli 1 sibling, 0 replies; 33+ messages in thread From: Goffredo Baroncelli @ 2016-07-07 19:41 UTC (permalink / raw) To: Austin S. Hemmelgarn, Andrei Borzenkov Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS On 2016-07-07 20:23, Austin S. Hemmelgarn wrote: [...] > FWIW, I've pretty much always been of the opinion that the device discovery belongs in a mount helper. The auto-discovery from udev (and more importantly, how the kernel handles being told about a device) is much of the reason that it's so inherently dangerous to do block level copies. There's obviously no way that can be changed now without breaking something, but that's on the really short list of things that I personally feel are worth breaking to fix a particularly dangerous pitfall. The recent discovery that device ready state is write-once when set just reinforces this in my opinion. > > Here's how I would picture the ideal situation: > * A device is processed by udev. It detects that it's part of a BTRFS array, updates blkid and whatever else in userspace with this info, and then stops without telling the kernel. > * The kernel tracks devices until the filesystem they are part of is unmounted, or a mount of that FS fails. > * When the user goes to mount a BTRFS filesystem, they use a mount helper. > 1. This helper queries udev/blkid/whatever to see which devices are part of an array. > 2. Once the helper determines which devices are potentially in the requested FS, it checks the following things to ensure array integrity: > - Does each device report the same number of component devices for the array? > - Does the reported number match the number of devices found? > - If a mount by UUID is requested, do all the labels match on each device? > - If a mount by LABEL is requested, do all the UUIDs match on each device? 
> - If a mount by path is requested, do all the component devices reported by that device have matching LABEL _and_ UUID? > - Is any of the devices found already in-use by another mount? ^^^^^^^^^^^^^^^^^ It is possible to mount the same device twice. I'd add my favorite: - is there a conflict of disk UUIDs (i.e. two different disks with the same UUID)? Anyway, step 2 has to loop until a timeout: i.e. if systemd asks to mount a filesystem when the first device appears, wait for all the devices to appear. > 4. If any of the above checks fails, and the user has not specified an option to request a mount anyway, report the error and exit with non-zero status _before_ even talking to the kernel. > 5. If only the second check fails (the check verifying the number of devices found), and it fails because the number found is less than required for a non-degraded mount, ignore that check if and only if the user specified -o degraded. > 6. If any of the other checks fail, ignore them if and only if the user asks to ignore that specific check. > 7. Otherwise, notify the kernel about the devices and call mount(2). > * The mount helper parses its own set of special options similar to the bg/fg/retry options used by mount.nfs to allow for timeouts when mounting, as well as asynchronous mounts in the background. > * btrfs device scan becomes a no-op > * btrfs device ready uses the above logic minus step 7 to determine if a filesystem is probably ready. > > Such a situation would probably eliminate or at least reduce most of our current issues with device discovery, and provide much better error reporting and general flexibility. > -- gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 ^ permalink raw reply [flat|nested] 33+ messages in thread
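A minimal sketch of the integrity checks described above, written as a shell function; the function name, its inputs, and the 0/1/2 return convention are illustrative assumptions, not part of any existing mount helper.

```shell
#!/bin/sh
# Hypothetical pre-mount consistency check for a multi-device filesystem.
# Returns 0 if the array looks complete and consistent, 1 if the devices
# contradict each other, 2 if devices are missing (degraded mount may work).
check_array() {
    reported="$1"   # device count recorded in each superblock
    found="$2"      # number of devices actually discovered
    labels="$3"     # whitespace-separated labels of the discovered devices
    # All discovered devices must carry the same label.
    uniq=$(printf '%s\n' $labels | sort -u | wc -l | tr -d ' ')
    [ "$uniq" -le 1 ] || return 1
    # More devices than the superblock claims is an inconsistency
    # (e.g. a stale block-level copy with a duplicated UUID).
    [ "$found" -le "$reported" ] || return 1
    # Fewer devices than recorded: only a degraded mount is possible.
    [ "$found" -eq "$reported" ] || return 2
    return 0
}

check_array 2 2 "data data" && echo "complete"
check_array 2 1 "data" || echo "incomplete: status $?"
check_array 2 2 "data other" || echo "inconsistent: status $?"
```

A real helper would feed this from blkid's cache rather than hard-coded arguments, and would only call mount(2) once the checks pass or the user explicitly overrides them.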
* Re: 64-btrfs.rules and degraded boot 2016-07-06 11:55 ` Andrei Borzenkov 2016-07-06 12:14 ` Austin S. Hemmelgarn @ 2016-07-06 12:49 ` Tomasz Torcz 1 sibling, 0 replies; 33+ messages in thread From: Tomasz Torcz @ 2016-07-06 12:49 UTC (permalink / raw) To: Btrfs BTRFS On Wed, Jul 06, 2016 at 02:55:37PM +0300, Andrei Borzenkov wrote: > On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn > <ahferroin7@gmail.com> wrote: > > On 2016-07-06 05:51, Andrei Borzenkov wrote: > >> > >> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com> > >> wrote: > >>> > >>> I started a systemd-devel@ thread since that's where most udev stuff > >>> gets talked about. > >>> > >>> > >>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html > >>> > >> > >> Before discussing how to implement it in systemd, we need to decide > >> what to implement. I.e. > >> > >> 1) do you always want to mount filesystem in degraded mode if not > >> enough devices are present or only if explicit hint is given? > >> 2) do you want to restrict degrade handling to root only or to other > >> filesystems as well? Note that there could be more early boot > >> filesystems that absolutely need same treatment (enters separate > >> /usr), and there are also normal filesystems that may need be mounted > >> even degraded. > >> 3) can we query btrfs whether it is mountable in degraded mode? > >> according to documentation, "btrfs device ready" (which udev builtin > >> follows) checks "if it has ALL of it’s devices in cache for mounting". > >> This is required for proper systemd ordering of services. > > > > > > To be entirely honest, if it were me, I'd want systemd to fsck off. If the > > kernel mount(2) call succeeds, then the filesystem was ready enough to > > mount, and if it doesn't, then it wasn't, end of story. > > How should user space know when to try mount? What user space is > supposed to do during boot if mount fails? 
Do you suggest > > while true; do > mount /dev/foo && exit 0 > done > > as part of the startup sequence? And note that nowhere is systemd involved so far. Getting rid of such loops was the original motivation for the ioctl: http://www.spinics.net/lists/linux-btrfs/msg17372.html Maybe the ioctl needs extending? Instead of returning 1/0, it could take a flag saying ”return 1 as soon as degraded mount is possible”? -- Tomasz Torcz Morality must always be based on practicality. xmpp: zdzichubg@chrome.pl -- Baron Vladimir Harkonnen ^ permalink raw reply [flat|nested] 33+ messages in thread
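The loop above, bounded by a timeout rather than spinning forever, might look like the following sketch; `retry_until` is an invented name and the 30-second figure is an arbitrary example, not anything systemd or btrfs-progs provides.

```shell
#!/bin/sh
# Retry a command once per second until it succeeds or a deadline passes.
retry_until() {
    deadline="$1"; shift
    waited=0
    while [ "$waited" -lt "$deadline" ]; do
        "$@" && return 0       # command succeeded: stop retrying
        sleep 1
        waited=$((waited + 1))
    done
    return 1                   # gave up: let the caller decide what to do
}

# Illustrative usage with placeholder device and mount point:
# retry_until 30 mount /dev/foo /mnt || echo "mount failed, dropping to shell"
```

The ioctl exists precisely so that initramfs code can wait on an event instead of burning a CPU in a loop like this; the sketch just shows what user space is otherwise forced to write.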
* Re: 64-btrfs.rules and degraded boot 2016-07-06 9:51 ` Andrei Borzenkov 2016-07-06 11:45 ` Austin S. Hemmelgarn @ 2016-07-06 17:19 ` Chris Murphy 2016-07-06 18:04 ` Austin S. Hemmelgarn 2016-07-06 18:24 ` Andrei Borzenkov 1 sibling, 2 replies; 33+ messages in thread From: Chris Murphy @ 2016-07-06 17:19 UTC (permalink / raw) To: Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS On Wed, Jul 6, 2016 at 3:51 AM, Andrei Borzenkov <arvidjaar@gmail.com> wrote: > On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com> wrote: >> I started a systemd-devel@ thread since that's where most udev stuff >> gets talked about. >> >> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html >> > > Before discussing how to implement it in systemd, we need to decide > what to implement. I.e. Fair. > 1) do you always want to mount filesystem in degraded mode if not > enough devices are present or only if explicit hint is given? Right now on Btrfs, it should be explicit. The faulty device concept, handling, and notification is not mature. It's not a good idea to silently mount degraded considering Btrfs does not actively catch up the devices that are behind the next time there's a normal mount. It only fixes things passively. So the user must opt into degraded mounts rather than opt out. The problem is the current udev rule is doing its own check for device availability. So the mount command with explicit hint doesn't even get attempted. > 2) do you want to restrict degrade handling to root only or to other > filesystems as well? Note that there could be more early boot > filesystems that absolutely need same treatment (enters separate > /usr), and there are also normal filesystems that may need be mounted > even degraded. I'm mainly concerned with rootfs. And I'm mainly concerned with a very simple 2 disk raid1. With a simple user opt in using rootflags=degraded, it should be possible to boot the system. Right now it's not possible. 
Maybe just deleting 64-btrfs.rules would fix this problem, I haven't tried it. > 3) can we query btrfs whether it is mountable in degraded mode? > according to documentation, "btrfs device ready" (which udev builtin > follows) checks "if it has ALL of it’s devices in cache for mounting". > This is required for proper systemd ordering of services. Where does udev builtin use btrfs itself? I see "btrfs ready $device" which is not a valid btrfs user space command. I never get any errors from "btrfs device ready" even when too many devices are missing. I don't know what it even does or if it's broken. This is a three device raid1 where I removed 2 devices and "btrfs device ready" does not complain, it always returns silent for me no matter what. It's been this way for years as far as I know. [root@f24s ~]# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert 1 VG Vwi-a-tz-- 50.00g thintastic 2.55 2 VG Vwi-a-tz-- 50.00g thintastic 4.00 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 thintastic VG twi-aotz-- 90.00g 5.05 2.92 [root@f24s ~]# btrfs fi show Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 Total devices 3 FS bytes used 2.26GiB devid 1 size 50.00GiB used 3.00GiB path /dev/mapper/VG-1 devid 2 size 50.00GiB used 2.01GiB path /dev/mapper/VG-2 devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 [root@f24s ~]# btrfs device ready /dev/mapper/VG-1 [root@f24s ~]# [root@f24s ~]# lvchange -an VG/1 [root@f24s ~]# lvchange -an VG/2 [root@f24s ~]# btrfs dev scan Scanning for Btrfs filesystems [root@f24s ~]# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert 1 VG Vwi---tz-- 50.00g thintastic 2 VG Vwi---tz-- 50.00g thintastic 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 thintastic VG twi-aotz-- 90.00g 5.05 2.92 [root@f24s ~]# btrfs fi show warning, device 2 is missing Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 Total devices 3 FS bytes used 2.26GiB devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 *** Some devices missing 
[root@f24s ~]# btrfs device ready /dev/mapper/VG-3 [root@f24s ~]# -- Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-06 17:19 ` Chris Murphy @ 2016-07-06 18:04 ` Austin S. Hemmelgarn 2016-07-06 18:23 ` Chris Murphy 2016-07-06 18:24 ` Andrei Borzenkov 1 sibling, 1 reply; 33+ messages in thread From: Austin S. Hemmelgarn @ 2016-07-06 18:04 UTC (permalink / raw) To: Chris Murphy, Andrei Borzenkov; +Cc: Kai Krakow, Btrfs BTRFS On 2016-07-06 13:19, Chris Murphy wrote: > On Wed, Jul 6, 2016 at 3:51 AM, Andrei Borzenkov <arvidjaar@gmail.com> wrote: >> 3) can we query btrfs whether it is mountable in degraded mode? >> according to documentation, "btrfs device ready" (which udev builtin >> follows) checks "if it has ALL of it’s devices in cache for mounting". >> This is required for proper systemd ordering of services. > > Where does udev builtin use btrfs itself? I see "btrfs ready $device" > which is not a valid btrfs user space command. > > I never get any errors from "btrfs device ready" even when too many > devices are missing. I don't know what it even does or if it's broken. > > This is a three device raid1 where I removed 2 devices and "btrfs > device ready" does not complain, it always returns silent for me no > matter what. It's been this way for years as far as I know. 
> > [root@f24s ~]# lvs > LV VG Attr LSize Pool Origin Data% Meta% Move > Log Cpy%Sync Convert > 1 VG Vwi-a-tz-- 50.00g thintastic 2.55 > 2 VG Vwi-a-tz-- 50.00g thintastic 4.00 > 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 > thintastic VG twi-aotz-- 90.00g 5.05 2.92 > [root@f24s ~]# btrfs fi show > Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 > Total devices 3 FS bytes used 2.26GiB > devid 1 size 50.00GiB used 3.00GiB path /dev/mapper/VG-1 > devid 2 size 50.00GiB used 2.01GiB path /dev/mapper/VG-2 > devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 > > [root@f24s ~]# btrfs device ready /dev/mapper/VG-1 > [root@f24s ~]# > [root@f24s ~]# lvchange -an VG/1 > [root@f24s ~]# lvchange -an VG/2 > [root@f24s ~]# btrfs dev scan > Scanning for Btrfs filesystems > [root@f24s ~]# lvs > LV VG Attr LSize Pool Origin Data% Meta% Move > Log Cpy%Sync Convert > 1 VG Vwi---tz-- 50.00g thintastic > 2 VG Vwi---tz-- 50.00g thintastic > 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 > thintastic VG twi-aotz-- 90.00g 5.05 2.92 > [root@f24s ~]# btrfs fi show > warning, device 2 is missing > Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 > Total devices 3 FS bytes used 2.26GiB > devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 > *** Some devices missing > > [root@f24s ~]# btrfs device ready /dev/mapper/VG-3 > [root@f24s ~]# You won't get any output from it regardless, you have to check the return code as it's intended to be a tool for scripts and such. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-06 18:04 ` Austin S. Hemmelgarn @ 2016-07-06 18:23 ` Chris Murphy 2016-07-06 18:29 ` Andrei Borzenkov 2016-07-06 19:17 ` Austin S. Hemmelgarn 0 siblings, 2 replies; 33+ messages in thread From: Chris Murphy @ 2016-07-06 18:23 UTC (permalink / raw) To: Austin S. Hemmelgarn Cc: Chris Murphy, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On Wed, Jul 6, 2016 at 12:04 PM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote: > On 2016-07-06 13:19, Chris Murphy wrote: >> >> On Wed, Jul 6, 2016 at 3:51 AM, Andrei Borzenkov <arvidjaar@gmail.com> >> wrote: >>> >>> 3) can we query btrfs whether it is mountable in degraded mode? >>> according to documentation, "btrfs device ready" (which udev builtin >>> follows) checks "if it has ALL of it’s devices in cache for mounting". >>> This is required for proper systemd ordering of services. >> >> >> Where does udev builtin use btrfs itself? I see "btrfs ready $device" >> which is not a valid btrfs user space command. >> >> I never get any errors from "btrfs device ready" even when too many >> devices are missing. I don't know what it even does or if it's broken. >> >> This is a three device raid1 where I removed 2 devices and "btrfs >> device ready" does not complain, it always returns silent for me no >> matter what. It's been this way for years as far as I know. 
>> >> [root@f24s ~]# lvs >> LV VG Attr LSize Pool Origin Data% Meta% Move >> Log Cpy%Sync Convert >> 1 VG Vwi-a-tz-- 50.00g thintastic 2.55 >> 2 VG Vwi-a-tz-- 50.00g thintastic 4.00 >> 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 >> thintastic VG twi-aotz-- 90.00g 5.05 2.92 >> [root@f24s ~]# btrfs fi show >> Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 >> Total devices 3 FS bytes used 2.26GiB >> devid 1 size 50.00GiB used 3.00GiB path /dev/mapper/VG-1 >> devid 2 size 50.00GiB used 2.01GiB path /dev/mapper/VG-2 >> devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 >> >> [root@f24s ~]# btrfs device ready /dev/mapper/VG-1 >> [root@f24s ~]# >> [root@f24s ~]# lvchange -an VG/1 >> [root@f24s ~]# lvchange -an VG/2 >> [root@f24s ~]# btrfs dev scan >> Scanning for Btrfs filesystems >> [root@f24s ~]# lvs >> LV VG Attr LSize Pool Origin Data% Meta% Move >> Log Cpy%Sync Convert >> 1 VG Vwi---tz-- 50.00g thintastic >> 2 VG Vwi---tz-- 50.00g thintastic >> 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 >> thintastic VG twi-aotz-- 90.00g 5.05 2.92 >> [root@f24s ~]# btrfs fi show >> warning, device 2 is missing >> Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 >> Total devices 3 FS bytes used 2.26GiB >> devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 >> *** Some devices missing >> >> [root@f24s ~]# btrfs device ready /dev/mapper/VG-3 >> [root@f24s ~]# > > You won't get any output from it regardless, you have to check the return > code as it's intended to be a tool for scripts and such. How do I check the return code? When I use strace, no matter what I'm getting +++ exited with 0 +++ I see both 'btrfs device ready' and the udev btrfs builtin test are calling BTRFS_IOC_DEVICES_READY, so it looks like udev is not using user space tools to check but rather a btrfs ioctl. So clearly that works or I wouldn't have stalled boots when all devices aren't present. -- Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-06 18:23 ` Chris Murphy @ 2016-07-06 18:29 ` Andrei Borzenkov 2016-07-06 19:17 ` Austin S. Hemmelgarn 1 sibling, 0 replies; 33+ messages in thread From: Andrei Borzenkov @ 2016-07-06 18:29 UTC (permalink / raw) To: Chris Murphy; +Cc: Austin S. Hemmelgarn, Kai Krakow, Btrfs BTRFS On Wed, Jul 6, 2016 at 9:23 PM, Chris Murphy <lists@colorremedies.com> wrote: >>> [root@f24s ~]# btrfs fi show >>> warning, device 2 is missing >>> Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 >>> Total devices 3 FS bytes used 2.26GiB >>> devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 >>> *** Some devices missing >>> >>> [root@f24s ~]# btrfs device ready /dev/mapper/VG-3 >>> [root@f24s ~]# >> >> You won't get any output from it regardless, you have to check the return >> code as it's intended to be a tool for scripts and such. > > How do I check the return code? When I use strace, no matter what I'm getting > > +++ exited with 0 +++ > > I see both 'btrfs device ready' and the udev btrfs builtin test are > calling BTRFS_IOC_DEVICES_READY so, it looks like udev is not using > user space tools to check but rather a btrfs ioctl. Correct. It is possible that the ioctl returns a correct result only the very first time; notice that in your example btrfs had already seen all the other devices at least once, while at boot it is really the case that the other devices are missing. Which returns us to the question: how can we reliably query the kernel about the mountability of a filesystem? > So clearly that > works or I wouldn't have stalled boots when all devices aren't > present. > > -- > Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-06 18:23 ` Chris Murphy 2016-07-06 18:29 ` Andrei Borzenkov @ 2016-07-06 19:17 ` Austin S. Hemmelgarn 2016-07-06 20:00 ` Chris Murphy 1 sibling, 1 reply; 33+ messages in thread From: Austin S. Hemmelgarn @ 2016-07-06 19:17 UTC (permalink / raw) To: Chris Murphy; +Cc: Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On 2016-07-06 14:23, Chris Murphy wrote: > On Wed, Jul 6, 2016 at 12:04 PM, Austin S. Hemmelgarn > <ahferroin7@gmail.com> wrote: >> On 2016-07-06 13:19, Chris Murphy wrote: >>> >>> On Wed, Jul 6, 2016 at 3:51 AM, Andrei Borzenkov <arvidjaar@gmail.com> >>> wrote: >>>> >>>> 3) can we query btrfs whether it is mountable in degraded mode? >>>> according to documentation, "btrfs device ready" (which udev builtin >>>> follows) checks "if it has ALL of it’s devices in cache for mounting". >>>> This is required for proper systemd ordering of services. >>> >>> >>> Where does udev builtin use btrfs itself? I see "btrfs ready $device" >>> which is not a valid btrfs user space command. >>> >>> I never get any errors from "btrfs device ready" even when too many >>> devices are missing. I don't know what it even does or if it's broken. >>> >>> This is a three device raid1 where I removed 2 devices and "btrfs >>> device ready" does not complain, it always returns silent for me no >>> matter what. It's been this way for years as far as I know. 
>>> >>> [root@f24s ~]# lvs >>> LV VG Attr LSize Pool Origin Data% Meta% Move >>> Log Cpy%Sync Convert >>> 1 VG Vwi-a-tz-- 50.00g thintastic 2.55 >>> 2 VG Vwi-a-tz-- 50.00g thintastic 4.00 >>> 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 >>> thintastic VG twi-aotz-- 90.00g 5.05 2.92 >>> [root@f24s ~]# btrfs fi show >>> Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 >>> Total devices 3 FS bytes used 2.26GiB >>> devid 1 size 50.00GiB used 3.00GiB path /dev/mapper/VG-1 >>> devid 2 size 50.00GiB used 2.01GiB path /dev/mapper/VG-2 >>> devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 >>> >>> [root@f24s ~]# btrfs device ready /dev/mapper/VG-1 >>> [root@f24s ~]# >>> [root@f24s ~]# lvchange -an VG/1 >>> [root@f24s ~]# lvchange -an VG/2 >>> [root@f24s ~]# btrfs dev scan >>> Scanning for Btrfs filesystems >>> [root@f24s ~]# lvs >>> LV VG Attr LSize Pool Origin Data% Meta% Move >>> Log Cpy%Sync Convert >>> 1 VG Vwi---tz-- 50.00g thintastic >>> 2 VG Vwi---tz-- 50.00g thintastic >>> 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 >>> thintastic VG twi-aotz-- 90.00g 5.05 2.92 >>> [root@f24s ~]# btrfs fi show >>> warning, device 2 is missing >>> Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 >>> Total devices 3 FS bytes used 2.26GiB >>> devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 >>> *** Some devices missing >>> >>> [root@f24s ~]# btrfs device ready /dev/mapper/VG-3 >>> [root@f24s ~]# >> >> You won't get any output from it regardless, you have to check the return >> code as it's intended to be a tool for scripts and such. > > How do I check the return code? When I use strace, no matter what I'm getting > > +++ exited with 0 +++ > > I see both 'brfs device ready' and the udev btrfs builtin test are > calling BTRFS_IOC_DEVICES_READY so, it looks like udev is not using > user space tools to check but rather a btrfs ioctl. So clearly that > works or I wouldn't have stalled boots when all devices aren't > present. 
> In bash or most other POSIX-compliant shells, you can run this: echo $? to get the return code of the previous command. In your case though, it may be reporting the FS ready because it had already seen all the devices; IIUC, the flag that the check reads is only set once and never unset, which is not a good design in this case. ^ permalink raw reply [flat|nested] 33+ messages in thread
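Since `btrfs device ready` prints nothing in either case, its exit status is the only signal; the pattern looks like the sketch below, with `true`/`false` standing in for the real command (which needs an actual btrfs device to probe):

```shell
#!/bin/sh
# Branch directly on the exit status, as a script would do with
# "btrfs device ready <dev>" (0 = all devices seen, non-zero = not ready).
if true; then echo "device ready"; fi

# Or capture the status explicitly, as with 'echo $?' above;
# in a || list, $? still holds the failing command's status.
false || echo "not ready, status $?"
```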
* Re: 64-btrfs.rules and degraded boot 2016-07-06 19:17 ` Austin S. Hemmelgarn @ 2016-07-06 20:00 ` Chris Murphy 2016-07-07 17:00 ` Goffredo Baroncelli 0 siblings, 1 reply; 33+ messages in thread From: Chris Murphy @ 2016-07-06 20:00 UTC (permalink / raw) To: Austin S. Hemmelgarn Cc: Chris Murphy, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On Wed, Jul 6, 2016 at 1:17 PM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote: > In bash or most other POSIX compliant shells, you can run this: > echo $? > to get the return code of the previous command. > > In your case though, it may be reporting the FS ready because it had already > seen all the devices, IIUC, the flag that checks is only set once, and never > unset, which is not a good design in this case. Oh dear. [root@f24s ~]# lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert 1 VG Vwi---tz-- 50.00g thintastic 2 VG Vwi---tz-- 50.00g thintastic 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 thintastic VG twi-aotz-- 90.00g 5.05 2.92 [root@f24s ~]# btrfs dev scan Scanning for Btrfs filesystems [root@f24s ~]# echo $? 0 [root@f24s ~]# btrfs device ready /dev/mapper/VG-3 [root@f24s ~]# echo $? 0 [root@f24s ~]# btrfs fi show warning, device 2 is missing Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 Total devices 3 FS bytes used 2.26GiB devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 *** Some devices missing Cute, device 1 is also missing but that's not mentioned. In any case, the device is still ready even after a dev scan. I guess this isn't exactly testable all that easily unless I reboot. -- Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-06 20:00 ` Chris Murphy @ 2016-07-07 17:00 ` Goffredo Baroncelli 0 siblings, 0 replies; 33+ messages in thread From: Goffredo Baroncelli @ 2016-07-07 17:00 UTC (permalink / raw) To: Chris Murphy, Austin S. Hemmelgarn Cc: Andrei Borzenkov, Kai Krakow, Btrfs BTRFS On 2016-07-06 22:00, Chris Murphy wrote: > On Wed, Jul 6, 2016 at 1:17 PM, Austin S. Hemmelgarn > <ahferroin7@gmail.com> wrote: > >> In bash or most other POSIX compliant shells, you can run this: >> echo $? >> to get the return code of the previous command. >> >> In your case though, it may be reporting the FS ready because it had already >> seen all the devices, IIUC, the flag that checks is only set once, and never >> unset, which is not a good design in this case. > > Oh dear. > > [root@f24s ~]# lvs > LV VG Attr LSize Pool Origin Data% Meta% Move > Log Cpy%Sync Convert > 1 VG Vwi---tz-- 50.00g thintastic > 2 VG Vwi---tz-- 50.00g thintastic > 3 VG Vwi-a-tz-- 50.00g thintastic 2.54 > thintastic VG twi-aotz-- 90.00g 5.05 2.92 > [root@f24s ~]# btrfs dev scan > Scanning for Btrfs filesystems > [root@f24s ~]# echo $? > 0 > [root@f24s ~]# btrfs device ready /dev/mapper/VG-3 > [root@f24s ~]# echo $? > 0 > [root@f24s ~]# btrfs fi show > warning, device 2 is missing > Label: none uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7 > Total devices 3 FS bytes used 2.26GiB > devid 3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3 > *** Some devices missing > > > Cute, device 1 is also missing but that's not mentioned. In any case, > the device is still ready even after a dev scan. I guess this isn't > exactly testable all that easily unless I reboot. IIRC, a device, once "registered" by "btrfs dev scan", is never removed from the set of available devices. This means that if you remove a valid device after it has already been scanned, "btrfs dev ready" still returns OK until a reboot happens. 
From your email, it is not clear if you rebooted (or rmmod-ded btrfs) after you removed the devices. Only my 2¢... BR G.Baroncelli > > > -- gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-06 17:19 ` Chris Murphy 2016-07-06 18:04 ` Austin S. Hemmelgarn @ 2016-07-06 18:24 ` Andrei Borzenkov 2016-07-06 18:57 ` Chris Murphy 1 sibling, 1 reply; 33+ messages in thread From: Andrei Borzenkov @ 2016-07-06 18:24 UTC (permalink / raw) To: Chris Murphy; +Cc: Kai Krakow, Btrfs BTRFS On Wed, Jul 6, 2016 at 8:19 PM, Chris Murphy <lists@colorremedies.com> wrote: > > I'm mainly concerned with rootfs. And I'm mainly concerned with a very > simple 2 disk raid1. With a simple user opt in using > rootflags=degraded, it should be possible to boot the system. Right > now it's not possible. Maybe just deleting 64-btrfs.rules would fix > this problem, I haven't tried it. > While deleting this rule will fix your specific degraded 2-disk raid1, it will break non-degraded multi-device filesystems. The logic currently implemented by systemd assumes that mount is called after the prerequisites have been fulfilled. Deleting this rule will call mount as soon as the very first device is seen; such a filesystem is obviously not mountable. An equivalent of this rule is required under systemd, and desired in general to avoid polling. On the systemd list I outlined a possible alternative implementation as a systemd service instead of the really hackish udev rule. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot 2016-07-06 18:24 ` Andrei Borzenkov @ 2016-07-06 18:57 ` Chris Murphy 2016-07-07 17:07 ` Goffredo Baroncelli 0 siblings, 1 reply; 33+ messages in thread From: Chris Murphy @ 2016-07-06 18:57 UTC (permalink / raw) To: Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS On Wed, Jul 6, 2016 at 12:24 PM, Andrei Borzenkov <arvidjaar@gmail.com> wrote: > On Wed, Jul 6, 2016 at 8:19 PM, Chris Murphy <lists@colorremedies.com> wrote: >> >> I'm mainly concerned with rootfs. And I'm mainly concerned with a very >> simple 2 disk raid1. With a simple user opt in using >> rootflags=degraded, it should be possible to boot the system. Right >> now it's not possible. Maybe just deleting 64-btrfs.rules would fix >> this problem, I haven't tried it. >> > > While deleting this rule will fix your specific degraded 2 disk raid 1 > it will break non-degraded multi-device filesystem. Logic currently > implemented by systemd assumes that mount is called after > prerequisites have been fulfilled. Deleting this rule will call mount > as soon as the very first device is seen; such filesystem is obviously > not mountable. Seems like we need more granularity by btrfs ioctl for device ready, e.g. some way to indicate: 0 all devices ready 1 devices not ready (don't even try to mount) 2 minimum devices ready (degraded mount possible) Btrfs multiple device single and raid0 only return code 0 or 1. Where raid 1, 5, 6 could return code 2. The systemd default policy for code 2 could be to wait some amount of time to see if state goes to 0. At the timeout, try to mount anyway. If rootflags=degraded, it mounts. If not, mount fails, and we get a dracut prompt. That's better behavior than now. > Equivalent of this rule is required under systemd and desired in > general to avoid polling. On systemd list I outlined possible > alternative implementation as systemd service instead of really > hackish udev rule. I'll go read it there. Thanks. 
-- Chris Murphy ^ permalink raw reply [flat|nested] 33+ messages in thread
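The three-state policy Chris proposes could be consumed roughly as in the sketch below; `decide` and its inputs are hypothetical, since the ioctl today only distinguishes ready from not ready.

```shell
#!/bin/sh
# Hypothetical consumer of a three-state readiness result:
#   0 = all devices ready, 1 = not mountable, 2 = degraded mount possible.
decide() {
    state="$1"       # readiness state from the (hypothetical) extended ioctl
    degraded_ok="$2" # 1 if the user passed rootflags=degraded
    case "$state" in
        0) echo "mount" ;;
        2) if [ "$degraded_ok" -eq 1 ]; then
               echo "mount -o degraded"
           else
               echo "wait"   # keep waiting: state may still become 0
           fi ;;
        *) echo "wait" ;;
    esac
}

decide 0 0   # all devices present
decide 2 1   # degraded possible and user opted in
decide 2 0   # degraded possible but no opt-in: keep waiting
```

Under this policy, "wait" would be bounded by a timeout, after which the mount is attempted anyway and failure drops to the dracut prompt, matching the behavior described above.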
* Re: 64-btrfs.rules and degraded boot 2016-07-06 18:57 ` Chris Murphy @ 2016-07-07 17:07 ` Goffredo Baroncelli 0 siblings, 0 replies; 33+ messages in thread From: Goffredo Baroncelli @ 2016-07-07 17:07 UTC (permalink / raw) To: Chris Murphy, Andrei Borzenkov; +Cc: Kai Krakow, Btrfs BTRFS On 2016-07-06 20:57, Chris Murphy wrote: [...] > > Seems like we need more granularity by btrfs ioctl for device ready, > e.g. some way to indicate: > > 0 all devices ready > 1 devices not ready (don't even try to mount) > 2 minimum devices ready (degraded mount possible) > > > Btrfs multiple device single and raid0 only return code 0 or 1. Where > raid 1, 5, 6 could return code 2. The systemd default policy for code > 2 could be to wait some amount of time to see if state goes to 0. At > the timeout, try to mount anyway. If rootflags=degraded, it mounts. If > not, mount fails, and we get a dracut prompt. > Pay attention that, to return 2, you have to scan all the VGs to check whether all the involved devices are available: i.e. a filesystem composed of 5 disks may have a RAID5 VG with only 3 disks used for data, and a RAID1 VG for metadata on the other two disks.... Think about performing this check for each disk that appears.... I fear that it is too expensive > That's better behavior than now. > >> Equivalent of this rule is required under systemd and desired in >> general to avoid polling. On systemd list I outlined possible >> alternative implementation as systemd service instead of really >> hackish udev rule. > > I'll go read it there. Thanks. > > -- gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: 64-btrfs.rules and degraded boot
  2016-07-05 18:53 64-btrfs.rules and degraded boot Chris Murphy
  2016-07-05 19:27 ` Kai Krakow
@ 2016-07-07 16:37 ` Goffredo Baroncelli
  1 sibling, 0 replies; 33+ messages in thread
From: Goffredo Baroncelli @ 2016-07-07 16:37 UTC (permalink / raw)
To: Chris Murphy, Btrfs BTRFS

On 2016-07-05 20:53, Chris Murphy wrote:
> I am kinda confused about this "btrfs ready $devnode" portion. Isn't
> it "btrfs device ready $devnode" if this is based on user space tools?

systemd implemented this as an internal command (a udev builtin), so it
does not go through the user-space btrfs tool.

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
end of thread, other threads:[~2016-07-12 15:34 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-05 18:53 64-btrfs.rules and degraded boot Chris Murphy
2016-07-05 19:27 ` Kai Krakow
2016-07-05 19:30 ` Chris Murphy
2016-07-05 20:10 ` Chris Murphy
2016-07-06  9:51 ` Andrei Borzenkov
2016-07-06 11:45 ` Austin S. Hemmelgarn
2016-07-06 11:55 ` Andrei Borzenkov
2016-07-06 12:14 ` Austin S. Hemmelgarn
2016-07-06 12:39 ` Andrei Borzenkov
2016-07-06 12:48 ` Austin S. Hemmelgarn
2016-07-07 16:52 ` Goffredo Baroncelli
2016-07-07 18:23 ` Austin S. Hemmelgarn
2016-07-07 18:58 ` Chris Murphy
2016-07-07 19:14 ` Chris Murphy
2016-07-07 19:59 ` Austin S. Hemmelgarn
2016-07-07 20:20 ` Chris Murphy
2016-07-08 12:24 ` Austin S. Hemmelgarn
2016-07-11 21:07 ` Chris Murphy
2016-07-12 15:34 ` Austin S. Hemmelgarn
2016-07-07 20:13 ` Goffredo Baroncelli
2016-07-07 19:41 ` Goffredo Baroncelli
2016-07-06 12:49 ` Tomasz Torcz
2016-07-06 17:19 ` Chris Murphy
2016-07-06 18:04 ` Austin S. Hemmelgarn
2016-07-06 18:23 ` Chris Murphy
2016-07-06 18:29 ` Andrei Borzenkov
2016-07-06 19:17 ` Austin S. Hemmelgarn
2016-07-06 20:00 ` Chris Murphy
2016-07-07 17:00 ` Goffredo Baroncelli
2016-07-06 18:24 ` Andrei Borzenkov
2016-07-06 18:57 ` Chris Murphy
2016-07-07 17:07 ` Goffredo Baroncelli
2016-07-07 16:37 ` Goffredo Baroncelli