* 64-btrfs.rules and degraded boot
@ 2016-07-05 18:53 Chris Murphy
  2016-07-05 19:27 ` Kai Krakow
  2016-07-07 16:37 ` Goffredo Baroncelli
  0 siblings, 2 replies; 33+ messages in thread
From: Chris Murphy @ 2016-07-05 18:53 UTC (permalink / raw)
  To: Btrfs BTRFS

For some reason I thought it was possible to do degraded Btrfs boots
by removing root=UUID= in favor of a remaining good block device, e.g.
root=/dev/vda2, and then adding degraded to rootflags. But this
doesn't work on either CentOS 7.2 or Fedora Rawhide. What happens is
that systemd waits for vda2 (or the UUID) indefinitely; it doesn't even
try to mount the volume.

I think it's due to the udev rule that's basically marking the device
as not ready because not all of the filesystem's devices are there.

[root@f24m ~]# cat /usr/lib/udev/rules.d/64-btrfs.rules
# do not edit this file, it will be overwritten on update

SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# let the kernel know about this btrfs filesystem, and check if it is complete
IMPORT{builtin}="btrfs ready $devnode"

# mark the device as not ready to be used by the system
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

LABEL="btrfs_end"


I am kinda confused about this "btrfs ready $devnode" portion. Isn't
it "btrfs device ready $devnode" if this is based on user space tools?




-- 
Chris Murphy


* Re: 64-btrfs.rules and degraded boot
  2016-07-05 18:53 64-btrfs.rules and degraded boot Chris Murphy
@ 2016-07-05 19:27 ` Kai Krakow
  2016-07-05 19:30   ` Chris Murphy
  2016-07-07 16:37 ` Goffredo Baroncelli
  1 sibling, 1 reply; 33+ messages in thread
From: Kai Krakow @ 2016-07-05 19:27 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 5 Jul 2016 12:53:02 -0600,
Chris Murphy <lists@colorremedies.com> wrote:

> For some reason I thought it was possible to do degraded Btrfs boots
> by removing root=UUID= in favor of a remaining good block device, e.g.
> root=/dev/vda2, and then adding degraded to rootflags. But this
> doesn't work either on CentOS 7.2 or Fedora Rawhide. What happens is
> systemd waits for vda2 (or by UUID) indefinitely, it doesn't even try
> to mount the volume.
> 
> I think it's due to the udev rule that's basically saying the device
> isn't ready because not all of its devices are there.
> 
> [root@f24m ~]# cat /usr/lib/udev/rules.d/64-btrfs.rules
> # do not edit this file, it will be overwritten on update
> 
> SUBSYSTEM!="block", GOTO="btrfs_end"
> ACTION=="remove", GOTO="btrfs_end"
> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
> 
> # let the kernel know about this btrfs filesystem, and check if it is complete
> IMPORT{builtin}="btrfs ready $devnode"

This doesn't come from the user-space tools but from the udev builtins,
I think:

# udevadm test-builtin btrfs
 
> # mark the device as not ready to be used by the system
> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
> 
> LABEL="btrfs_end"
> 
> 
> I am kinda confused about this "btrfs ready $devnode" portion. Isn't
> it "btrfs device ready $devnode" if this is based on user space tools?
> 
> 
> 
> 



-- 
Regards,
Kai

Replies to list-only preferred.



* Re: 64-btrfs.rules and degraded boot
  2016-07-05 19:27 ` Kai Krakow
@ 2016-07-05 19:30   ` Chris Murphy
  2016-07-05 20:10     ` Chris Murphy
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Murphy @ 2016-07-05 19:30 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Btrfs BTRFS

On Tue, Jul 5, 2016 at 1:27 PM, Kai Krakow <hurikhan77@gmail.com> wrote:
> On Tue, 5 Jul 2016 12:53:02 -0600,
> Chris Murphy <lists@colorremedies.com> wrote:
>
>> For some reason I thought it was possible to do degraded Btrfs boots
>> by removing root=UUID= in favor of a remaining good block device, e.g.
>> root=/dev/vda2, and then adding degraded to rootflags. But this
>> doesn't work either on CentOS 7.2 or Fedora Rawhide. What happens is
>> systemd waits for vda2 (or by UUID) indefinitely, it doesn't even try
>> to mount the volume.
>>
>> I think it's due to the udev rule that's basically saying the device
>> isn't ready because not all of its devices are there.
>>
>> [root@f24m ~]# cat /usr/lib/udev/rules.d/64-btrfs.rules
>> # do not edit this file, it will be overwritten on update
>>
>> SUBSYSTEM!="block", GOTO="btrfs_end"
>> ACTION=="remove", GOTO="btrfs_end"
>> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
>>
>> # let the kernel know about this btrfs filesystem, and check if it is
>> complete IMPORT{builtin}="btrfs ready $devnode"
>
> This doesn't come from the user-space tools but from the udev builtins,
> I think:
>
> # udevadm test-builtin btrfs

[root@f24m ~]# udevadm test-builtin btrfs
calling: test-builtin
syspath missing
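
(A note for reference - this invocation detail is my assumption about
the tool, not something stated in the thread: test-builtin expects a
device syspath argument, e.g. "udevadm test-builtin btrfs
/sys/class/block/vda2", with vda2 being a hypothetical btrfs member
device, which is why it fails with "syspath missing" when run bare.)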


# dnf provides /usr/lib/udev/rules.d/64-btrfs.rules
Last metadata expiration check: 1:17:58 ago on Tue Jul  5 12:11:07 2016.
systemd-udev-229-8.fc24.x86_64 : Rule-based device node and kernel event manager
Repo        : @System





-- 
Chris Murphy


* Re: 64-btrfs.rules and degraded boot
  2016-07-05 19:30   ` Chris Murphy
@ 2016-07-05 20:10     ` Chris Murphy
  2016-07-06  9:51       ` Andrei Borzenkov
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Murphy @ 2016-07-05 20:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Kai Krakow, Btrfs BTRFS

I started a systemd-devel@ thread since that's where most udev stuff
gets talked about.

https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html



-- 
Chris Murphy


* Re: 64-btrfs.rules and degraded boot
  2016-07-05 20:10     ` Chris Murphy
@ 2016-07-06  9:51       ` Andrei Borzenkov
  2016-07-06 11:45         ` Austin S. Hemmelgarn
  2016-07-06 17:19         ` Chris Murphy
  0 siblings, 2 replies; 33+ messages in thread
From: Andrei Borzenkov @ 2016-07-06  9:51 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Kai Krakow, Btrfs BTRFS

On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com> wrote:
> I started a systemd-devel@ thread since that's where most udev stuff
> gets talked about.
>
> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>

Before discussing how to implement it in systemd, we need to decide
what to implement. I.e.

1) do you always want to mount the filesystem in degraded mode if not
enough devices are present, or only if an explicit hint is given?
2) do you want to restrict degraded handling to root only, or to other
filesystems as well? Note that there could be more early-boot
filesystems that absolutely need the same treatment (enter separate
/usr), and there are also normal filesystems that may need to be
mounted even degraded.
3) can we query btrfs whether it is mountable in degraded mode?
According to the documentation, "btrfs device ready" (which the udev
builtin follows) checks "if it has ALL of it’s devices in cache for
mounting". This is required for proper systemd ordering of services.


* Re: 64-btrfs.rules and degraded boot
  2016-07-06  9:51       ` Andrei Borzenkov
@ 2016-07-06 11:45         ` Austin S. Hemmelgarn
  2016-07-06 11:55           ` Andrei Borzenkov
  2016-07-06 17:19         ` Chris Murphy
  1 sibling, 1 reply; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-06 11:45 UTC (permalink / raw)
  To: Andrei Borzenkov, Chris Murphy; +Cc: Kai Krakow, Btrfs BTRFS

On 2016-07-06 05:51, Andrei Borzenkov wrote:
> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com> wrote:
>> I started a systemd-devel@ thread since that's where most udev stuff
>> gets talked about.
>>
>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>
>
> Before discussing how to implement it in systemd, we need to decide
> what to implement. I.e.
>
> 1) do you always want to mount filesystem in degraded mode if not
> enough devices are present or only if explicit hint is given?
> 2) do you want to restrict degrade handling to root only or to other
> filesystems as well? Note that there could be more early boot
> filesystems that absolutely need same treatment (enters separate
> /usr), and there are also normal filesystems that may need be mounted
> even degraded.
> 3) can we query btrfs whether it is mountable in degraded mode?
> according to documentation, "btrfs device ready" (which udev builtin
> follows) checks "if it has ALL of it’s devices in cache for mounting".
> This is required for proper systemd ordering of services.

To be entirely honest, if it were me, I'd want systemd to fsck off.  If 
the kernel mount(2) call succeeds, then the filesystem was ready enough 
to mount, and if it doesn't, then it wasn't, end of story.  The whole 
concept of trying to track in userspace something the kernel itself 
tracks and knows a whole lot more about is absolutely stupid.  It makes 
some sense when dealing with LVM or MD, because that is potentially a 
security issue (someone could inject a bogus device node that you then 
mount instead of your desired target), but it makes no sense here, 
because there's no way to prevent the equivalent from happening in BTRFS.

As far as the udev rules go, I'm pretty certain that _we_ ship those
with btrfs-progs; I have no idea why they're packaged with udev in
CentOS (oh wait, I bet they package every single possible udev rule in
that package just in case, don't they?).


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 11:45         ` Austin S. Hemmelgarn
@ 2016-07-06 11:55           ` Andrei Borzenkov
  2016-07-06 12:14             ` Austin S. Hemmelgarn
  2016-07-06 12:49             ` Tomasz Torcz
  0 siblings, 2 replies; 33+ messages in thread
From: Andrei Borzenkov @ 2016-07-06 11:55 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2016-07-06 05:51, Andrei Borzenkov wrote:
>>
>> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com>
>> wrote:
>>>
>>> I started a systemd-devel@ thread since that's where most udev stuff
>>> gets talked about.
>>>
>>>
>>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>>
>>
>> Before discussing how to implement it in systemd, we need to decide
>> what to implement. I.e.
>>
>> 1) do you always want to mount filesystem in degraded mode if not
>> enough devices are present or only if explicit hint is given?
>> 2) do you want to restrict degrade handling to root only or to other
>> filesystems as well? Note that there could be more early boot
>> filesystems that absolutely need same treatment (enters separate
>> /usr), and there are also normal filesystems that may need be mounted
>> even degraded.
>> 3) can we query btrfs whether it is mountable in degraded mode?
>> according to documentation, "btrfs device ready" (which udev builtin
>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>> This is required for proper systemd ordering of services.
>
>
> To be entirely honest, if it were me, I'd want systemd to fsck off.  If the
> kernel mount(2) call succeeds, then the filesystem was ready enough to
> mount, and if it doesn't, then it wasn't, end of story.

How should user space know when to try the mount? What is user space
supposed to do during boot if the mount fails? Do you suggest

while true; do
  mount /dev/foo && exit 0
done

as part of the startup sequence? And note that systemd is nowhere
involved so far.

> The whole concept
> of trying to track in userspace something the kernel itself tracks and knows
> a whole lot more about is absolutely stupid.

It need not be user space. If the kernel notifies user space when the
filesystem is mountable, problem solved. It could be a udev event,
netlink, whatever. Until the kernel does that, user space needs to
either poll or somehow track it based on available events.

> It makes some sense when
> dealing with LVM or MD, because that is potentially a security issue
> (someone could inject a bogus device node that you then mount instead of
> your desired target),

I do not understand that at all. MD and LVM have exactly the same
problem - they need to know when they can assemble the MD array/VG. I
fail to see what it has to do with security, sorry.

> but it makes no sense here, because there's no way to
> prevent the equivalent from happening in BTRFS.
>
> As far as the udev rules, I'm pretty certain that _we_ ship those with
> btrfs-progs,

No, you do not. You ship a rule to rename devices to be more
"user-friendly". But the rule in question has always been part of
udev.

> I have no idea why they're packaged with udev in CentOS (oh
> wait, I bet they package every single possible udev rule in that package
> just in case, don't they?).


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 11:55           ` Andrei Borzenkov
@ 2016-07-06 12:14             ` Austin S. Hemmelgarn
  2016-07-06 12:39               ` Andrei Borzenkov
  2016-07-06 12:49             ` Tomasz Torcz
  1 sibling, 1 reply; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-06 12:14 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On 2016-07-06 07:55, Andrei Borzenkov wrote:
> On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> On 2016-07-06 05:51, Andrei Borzenkov wrote:
>>>
>>> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com>
>>> wrote:
>>>>
>>>> I started a systemd-devel@ thread since that's where most udev stuff
>>>> gets talked about.
>>>>
>>>>
>>>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>>>
>>>
>>> Before discussing how to implement it in systemd, we need to decide
>>> what to implement. I.e.
>>>
>>> 1) do you always want to mount filesystem in degraded mode if not
>>> enough devices are present or only if explicit hint is given?
>>> 2) do you want to restrict degrade handling to root only or to other
>>> filesystems as well? Note that there could be more early boot
>>> filesystems that absolutely need same treatment (enters separate
>>> /usr), and there are also normal filesystems that may need be mounted
>>> even degraded.
>>> 3) can we query btrfs whether it is mountable in degraded mode?
>>> according to documentation, "btrfs device ready" (which udev builtin
>>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>>> This is required for proper systemd ordering of services.
>>
>>
>> To be entirely honest, if it were me, I'd want systemd to fsck off.  If the
>> kernel mount(2) call succeeds, then the filesystem was ready enough to
>> mount, and if it doesn't, then it wasn't, end of story.
>
> How should user space know when to try mount? What user space is
> supposed to do during boot if mount fails? Do you suggest
>
> while true; do
>   mount /dev/foo && exit 0
> done
>
> as part of startup sequence? And note that nowhere is systemd involved so far.
Nowhere, except that if you have a filesystem in fstab (or a mount 
unit, which I hate for other reasons that I will not go into right now) 
and you mount it while systemd thinks the device isn't ready, systemd 
unmounts it _immediately_.  In the case of boot, it's because systemd 
thinks the device isn't ready that you can't mount degraded with a 
missing device.  In the case of the root filesystem at least, the 
initramfs is expected to handle this, and most of them do poll in some 
way, or have other methods of determining this.  I occasionally have 
issues with it with dracut without systemd, but that's due to a 
separate bug there involving the device mapper.

>
>> The whole concept
>> of trying to track in userspace something the kernel itself tracks and knows
>> a whole lot more about is absolutely stupid.
>
> It need not be user space. If kernel notifies user space when
> filesystem is mountable, problem solved. It could be udev event,
> netlink, whatever. Until kernel does it, user space need to either
> poll or somehow track it based on available events.
This I agree could be done better, but it absolutely should not be in 
userspace; the notification needs to come from the kernel, but that 
leads to the problem of knowing whether the FS can mount degraded, or 
only read-only, or any number of other situations.
>
>> It makes some sense when
>> dealing with LVM or MD, because that is potentially a security issue
>> (someone could inject a bogus device node that you then mount instead of
>> your desired target),
>
> I do not understand it at all. MD and LVM has exactly the same problem
> - they need to know when they can assemble MD/VG. I miss what it has
> to do with security, sorry.
If you don't track whether or not the device is assembled, then someone 
could create an arbitrary device node with the same name and then get 
you to mount that, possibly causing all kinds of issues depending on any 
number of other factors.
>
>> but it makes no sense here, because there's no way to
>> prevent the equivalent from happening in BTRFS.
>>
>> As far as the udev rules, I'm pretty certain that _we_ ship those with
>> btrfs-progs,
>
> No, you do not. You ship rule to rename devices to be more
> "user-friendly". But the rule in question has always been part of
> udev.
Ah, you're right, I was mistaken about this.
>
>> I have no idea why they're packaged with udev in CentOS (oh
>> wait, I bet they package every single possible udev rule in that package
>> just in case, don't they?).



* Re: 64-btrfs.rules and degraded boot
  2016-07-06 12:14             ` Austin S. Hemmelgarn
@ 2016-07-06 12:39               ` Andrei Borzenkov
  2016-07-06 12:48                 ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 33+ messages in thread
From: Andrei Borzenkov @ 2016-07-06 12:39 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS



Sent from iPhone

> On 6 July 2016, at 15:14, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> 
>> On 2016-07-06 07:55, Andrei Borzenkov wrote:
>> On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn
>> <ahferroin7@gmail.com> wrote:
>>> On 2016-07-06 05:51, Andrei Borzenkov wrote:
>>>> 
>>>> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com>
>>>> wrote:
>>>>> 
>>>>> I started a systemd-devel@ thread since that's where most udev stuff
>>>>> gets talked about.
>>>>> 
>>>>> 
>>>>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>>> 
>>>> Before discussing how to implement it in systemd, we need to decide
>>>> what to implement. I.e.
>>>> 
>>>> 1) do you always want to mount filesystem in degraded mode if not
>>>> enough devices are present or only if explicit hint is given?
>>>> 2) do you want to restrict degrade handling to root only or to other
>>>> filesystems as well? Note that there could be more early boot
>>>> filesystems that absolutely need same treatment (enters separate
>>>> /usr), and there are also normal filesystems that may need be mounted
>>>> even degraded.
>>>> 3) can we query btrfs whether it is mountable in degraded mode?
>>>> according to documentation, "btrfs device ready" (which udev builtin
>>>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>>>> This is required for proper systemd ordering of services.
>>> 
>>> 
>>> To be entirely honest, if it were me, I'd want systemd to fsck off.  If the
>>> kernel mount(2) call succeeds, then the filesystem was ready enough to
>>> mount, and if it doesn't, then it wasn't, end of story.
>> 
>> How should user space know when to try mount? What user space is
>> supposed to do during boot if mount fails? Do you suggest
>> 
>> while true; do
>>  mount /dev/foo && exit 0
>> done
>> 
>> as part of startup sequence? And note that nowhere is systemd involved so far.
> Nowhere there, except if you have a filesystem in fstab (or a mount unit, which I hate for other reasons that I will not go into right now), and you mount it and systemd thinks the device isn't ready, it unmounts it _immediately_.  In the case of boot, it's because of systemd thinking the device isn't ready that you can't mount degraded with a missing device.  In the case of the root filesystem at least, the initramfs is expected to handle this, and most of them do poll in some way, or have other methods of determining this.  I occasionally have issues with it with dracut without systemd, but that's due to a separate bug there involving the device mapper.
> 

How does this systemd bashing answer my question - how does user space know when it can call mount at startup?


>> 
>>> The whole concept
>>> of trying to track in userspace something the kernel itself tracks and knows
>>> a whole lot more about is absolutely stupid.
>> 
>> It need not be user space. If kernel notifies user space when
>> filesystem is mountable, problem solved. It could be udev event,
>> netlink, whatever. Until kernel does it, user space need to either
>> poll or somehow track it based on available events.
> THis I agree could be done better, but it absolutely should not be in userspace, the notification needs to come from the kernel, but that leads to the problem of knowing whether or not the FS can mount degraded, or only ro, or any number of other situations.
>> 
>>> It makes some sense when
>>> dealing with LVM or MD, because that is potentially a security issue
>>> (someone could inject a bogus device node that you then mount instead of
>>> your desired target),
>> 
>> I do not understand it at all. MD and LVM has exactly the same problem
>> - they need to know when they can assemble MD/VG. I miss what it has
>> to do with security, sorry.
> If you don't track whether or not the device is assembled, then someone could create an arbitrary device node with the same name and then get you to mount that, possibly causing all kinds of issues depending on any number of other factors.

The device node is created as soon as the array is seen for the first time. If you imply someone may replace it, what prevents them from doing so at any arbitrary time in the future?

>> 
>>> but it makes no sense here, because there's no way to
>>> prevent the equivalent from happening in BTRFS.
>>> 
>>> As far as the udev rules, I'm pretty certain that _we_ ship those with
>>> btrfs-progs,
>> 
>> No, you do not. You ship rule to rename devices to be more
>> "user-friendly". But the rule in question has always been part of
>> udev.
> Ah, you're right, I was mistaken about this.
>> 
>>> I have no idea why they're packaged with udev in CentOS (oh
>>> wait, I bet they package every single possible udev rule in that package
>>> just in case, don't they?).
> 


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 12:39               ` Andrei Borzenkov
@ 2016-07-06 12:48                 ` Austin S. Hemmelgarn
  2016-07-07 16:52                   ` Goffredo Baroncelli
  0 siblings, 1 reply; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-06 12:48 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On 2016-07-06 08:39, Andrei Borzenkov wrote:
>
>
> Sent from iPhone
>
>> On 6 July 2016, at 15:14, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
>>
>>> On 2016-07-06 07:55, Andrei Borzenkov wrote:
>>> On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn
>>> <ahferroin7@gmail.com> wrote:
>>>> On 2016-07-06 05:51, Andrei Borzenkov wrote:
>>>>>
>>>>> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com>
>>>>> wrote:
>>>>>>
>>>>>> I started a systemd-devel@ thread since that's where most udev stuff
>>>>>> gets talked about.
>>>>>>
>>>>>>
>>>>>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>>>>
>>>>> Before discussing how to implement it in systemd, we need to decide
>>>>> what to implement. I.e.
>>>>>
>>>>> 1) do you always want to mount filesystem in degraded mode if not
>>>>> enough devices are present or only if explicit hint is given?
>>>>> 2) do you want to restrict degrade handling to root only or to other
>>>>> filesystems as well? Note that there could be more early boot
>>>>> filesystems that absolutely need same treatment (enters separate
>>>>> /usr), and there are also normal filesystems that may need be mounted
>>>>> even degraded.
>>>>> 3) can we query btrfs whether it is mountable in degraded mode?
>>>>> according to documentation, "btrfs device ready" (which udev builtin
>>>>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>>>>> This is required for proper systemd ordering of services.
>>>>
>>>>
>>>> To be entirely honest, if it were me, I'd want systemd to fsck off.  If the
>>>> kernel mount(2) call succeeds, then the filesystem was ready enough to
>>>> mount, and if it doesn't, then it wasn't, end of story.
>>>
>>> How should user space know when to try mount? What user space is
>>> supposed to do during boot if mount fails? Do you suggest
>>>
>>> while true; do
>>>  mount /dev/foo && exit 0
>>> done
>>>
>>> as part of startup sequence? And note that nowhere is systemd involved so far.
>> Nowhere there, except if you have a filesystem in fstab (or a mount unit, which I hate for other reasons that I will not go into right now), and you mount it and systemd thinks the device isn't ready, it unmounts it _immediately_.  In the case of boot, it's because of systemd thinking the device isn't ready that you can't mount degraded with a missing device.  In the case of the root filesystem at least, the initramfs is expected to handle this, and most of them do poll in some way, or have other methods of determining this.  I occasionally have issues with it with dracut without systemd, but that's due to a separate bug there involving the device mapper.
>>
>
> How this systemd bashing answers my question - how user space knows when it can call mount at startup?
You mentioned that systemd wasn't involved, which is patently false if 
it's being used as your init system, and I was admittedly mostly 
responding to that.

Now, to answer the primary question which I forgot to answer:
Userspace doesn't.  Systemd doesn't either, but assumes it does and 
checks in a flawed way.  Dracut's polling loop assumes it does but 
sometimes fails in a different way.  There is no way other than calling 
mount right now to know for sure whether the mount will succeed, and 
that actually applies to a certain degree to any filesystem (because 
any number of things that are outside of even the kernel's control 
might happen while trying to mount the device).
>
>
>>>
>>>> The whole concept
>>>> of trying to track in userspace something the kernel itself tracks and knows
>>>> a whole lot more about is absolutely stupid.
>>>
>>> It need not be user space. If kernel notifies user space when
>>> filesystem is mountable, problem solved. It could be udev event,
>>> netlink, whatever. Until kernel does it, user space need to either
>>> poll or somehow track it based on available events.
>> THis I agree could be done better, but it absolutely should not be in userspace, the notification needs to come from the kernel, but that leads to the problem of knowing whether or not the FS can mount degraded, or only ro, or any number of other situations.
>>>
>>>> It makes some sense when
>>>> dealing with LVM or MD, because that is potentially a security issue
>>>> (someone could inject a bogus device node that you then mount instead of
>>>> your desired target),
>>>
>>> I do not understand it at all. MD and LVM has exactly the same problem
>>> - they need to know when they can assemble MD/VG. I miss what it has
>>> to do with security, sorry.
>> If you don't track whether or not the device is assembled, then someone could create an arbitrary device node with the same name and then get you to mount that, possibly causing all kinds of issues depending on any number of other factors.
>
> Device node is created as soon as array is seen for the first time. If you imply someone may replace it, what prevents doing it at any arbitrary time in the future?
It's still possible, but it's not as easy, because replacing it after 
it's mounted would require a remount to have any effect.  The most 
reliable time to do something like this is during boot, before the 
mount.  LVM and/or MD may or may not replace the node properly when 
they start (I don't have enough background on MD and haven't tested 
with LVM), but if that's after the fake node has already been mounted, 
then it won't help much, except for helping cover up the attack.
>
>>>
>>>> but it makes no sense here, because there's no way to
>>>> prevent the equivalent from happening in BTRFS.
>>>>
>>>> As far as the udev rules, I'm pretty certain that _we_ ship those with
>>>> btrfs-progs,
>>>
>>> No, you do not. You ship rule to rename devices to be more
>>> "user-friendly". But the rule in question has always been part of
>>> udev.
>> Ah, you're right, I was mistaken about this.
>>>
>>>> I have no idea why they're packaged with udev in CentOS (oh
>>>> wait, I bet they package every single possible udev rule in that package
>>>> just in case, don't they?).
>>



* Re: 64-btrfs.rules and degraded boot
  2016-07-06 11:55           ` Andrei Borzenkov
  2016-07-06 12:14             ` Austin S. Hemmelgarn
@ 2016-07-06 12:49             ` Tomasz Torcz
  1 sibling, 0 replies; 33+ messages in thread
From: Tomasz Torcz @ 2016-07-06 12:49 UTC (permalink / raw)
  To: Btrfs BTRFS

On Wed, Jul 06, 2016 at 02:55:37PM +0300, Andrei Borzenkov wrote:
> On Wed, Jul 6, 2016 at 2:45 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
> > On 2016-07-06 05:51, Andrei Borzenkov wrote:
> >>
> >> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com>
> >> wrote:
> >>>
> >>> I started a systemd-devel@ thread since that's where most udev stuff
> >>> gets talked about.
> >>>
> >>>
> >>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
> >>>
> >>
> >> Before discussing how to implement it in systemd, we need to decide
> >> what to implement. I.e.
> >>
> >> 1) do you always want to mount filesystem in degraded mode if not
> >> enough devices are present or only if explicit hint is given?
> >> 2) do you want to restrict degrade handling to root only or to other
> >> filesystems as well? Note that there could be more early boot
> >> filesystems that absolutely need same treatment (enters separate
> >> /usr), and there are also normal filesystems that may need be mounted
> >> even degraded.
> >> 3) can we query btrfs whether it is mountable in degraded mode?
> >> according to documentation, "btrfs device ready" (which udev builtin
> >> follows) checks "if it has ALL of it’s devices in cache for mounting".
> >> This is required for proper systemd ordering of services.
> >
> >
> > To be entirely honest, if it were me, I'd want systemd to fsck off.  If the
> > kernel mount(2) call succeeds, then the filesystem was ready enough to
> > mount, and if it doesn't, then it wasn't, end of story.
> 
> How should user space know when to try mount? What user space is
> supposed to do during boot if mount fails? Do you suggest
> 
> while true; do
>   mount /dev/foo && exit 0
> done
> 
> as part of startup sequence? And note that nowhere is systemd involved so far.

  Getting rid of such loops was the original motivation for the ioctl:
http://www.spinics.net/lists/linux-btrfs/msg17372.html

  Maybe the ioctl needs extending? Instead of returning 1/0, it could
take a flag saying "return 1 as soon as degraded mount is possible"?
  
-- 
Tomasz Torcz                 Morality must always be based on practicality.
xmpp: zdzichubg@chrome.pl                -- Baron Vladimir Harkonnen



* Re: 64-btrfs.rules and degraded boot
  2016-07-06  9:51       ` Andrei Borzenkov
  2016-07-06 11:45         ` Austin S. Hemmelgarn
@ 2016-07-06 17:19         ` Chris Murphy
  2016-07-06 18:04           ` Austin S. Hemmelgarn
  2016-07-06 18:24           ` Andrei Borzenkov
  1 sibling, 2 replies; 33+ messages in thread
From: Chris Murphy @ 2016-07-06 17:19 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On Wed, Jul 6, 2016 at 3:51 AM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> On Tue, Jul 5, 2016 at 11:10 PM, Chris Murphy <lists@colorremedies.com> wrote:
>> I started a systemd-devel@ thread since that's where most udev stuff
>> gets talked about.
>>
>> https://lists.freedesktop.org/archives/systemd-devel/2016-July/037031.html
>>
>
> Before discussing how to implement it in systemd, we need to decide
> what to implement. I.e.

Fair.


> 1) do you always want to mount filesystem in degraded mode if not
> enough devices are present or only if explicit hint is given?

Right now on Btrfs, it should be explicit. The faulty device concept,
handling, and notification is not mature. It's not a good idea to
silently mount degraded considering Btrfs does not actively catch up
the devices that are behind the next time there's a normal mount. It
only fixes things passively. So the user must opt into degraded mounts
rather than opt out.

The problem is that the current udev rule is doing its own check for
device availability, so the mount command with an explicit hint doesn't
even get attempted.



> 2) do you want to restrict degrade handling to root only or to other
> filesystems as well? Note that there could be more early boot
> filesystems that absolutely need same treatment (enters separate
> /usr), and there are also normal filesystems that may need be mounted
> even degraded.

I'm mainly concerned with the rootfs. And I'm mainly concerned with a
very simple 2-disk raid1. With a simple user opt-in using
rootflags=degraded, it should be possible to boot the system. Right
now it's not possible. Maybe just deleting 64-btrfs.rules would fix
this problem; I haven't tried it.


> 3) can we query btrfs whether it is mountable in degraded mode?
> according to documentation, "btrfs device ready" (which udev builtin
> follows) checks "if it has ALL of it’s devices in cache for mounting".
> This is required for proper systemd ordering of services.

Where does the udev builtin use btrfs itself? I see "btrfs ready
$device", which is not a valid btrfs user space command.

I never get any errors from "btrfs device ready", even when too many
devices are missing. I don't know what it even does or whether it's broken.

This is a three-device raid1 where I removed 2 devices, and "btrfs
device ready" does not complain; it always returns silently for me no
matter what. It's been this way for years as far as I know.

[root@f24s ~]# lvs
  LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move
Log Cpy%Sync Convert
  1          VG Vwi-a-tz-- 50.00g thintastic        2.55
  2          VG Vwi-a-tz-- 50.00g thintastic        4.00
  3          VG Vwi-a-tz-- 50.00g thintastic        2.54
  thintastic VG twi-aotz-- 90.00g                   5.05   2.92
[root@f24s ~]# btrfs fi show
Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
    Total devices 3 FS bytes used 2.26GiB
    devid    1 size 50.00GiB used 3.00GiB path /dev/mapper/VG-1
    devid    2 size 50.00GiB used 2.01GiB path /dev/mapper/VG-2
    devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3

[root@f24s ~]# btrfs device ready /dev/mapper/VG-1
[root@f24s ~]#
[root@f24s ~]# lvchange -an VG/1
[root@f24s ~]# lvchange -an VG/2
[root@f24s ~]# btrfs dev scan
Scanning for Btrfs filesystems
[root@f24s ~]# lvs
  LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move
Log Cpy%Sync Convert
  1          VG Vwi---tz-- 50.00g thintastic
  2          VG Vwi---tz-- 50.00g thintastic
  3          VG Vwi-a-tz-- 50.00g thintastic        2.54
  thintastic VG twi-aotz-- 90.00g                   5.05   2.92
[root@f24s ~]# btrfs fi show
warning, device 2 is missing
Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
    Total devices 3 FS bytes used 2.26GiB
    devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
    *** Some devices missing

[root@f24s ~]# btrfs device ready /dev/mapper/VG-3
[root@f24s ~]#




-- 
Chris Murphy


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 17:19         ` Chris Murphy
@ 2016-07-06 18:04           ` Austin S. Hemmelgarn
  2016-07-06 18:23             ` Chris Murphy
  2016-07-06 18:24           ` Andrei Borzenkov
  1 sibling, 1 reply; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-06 18:04 UTC (permalink / raw)
  To: Chris Murphy, Andrei Borzenkov; +Cc: Kai Krakow, Btrfs BTRFS

On 2016-07-06 13:19, Chris Murphy wrote:
> On Wed, Jul 6, 2016 at 3:51 AM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>> 3) can we query btrfs whether it is mountable in degraded mode?
>> according to documentation, "btrfs device ready" (which udev builtin
>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>> This is required for proper systemd ordering of services.
>
> Where does udev builtin use btrfs itself? I see "btrfs ready $device"
> which is not a valid btrfs user space command.
>
> I never get any errors from "btrfs device ready" even when too many
> devices are missing. I don't know what it even does or if it's broken.
>
> This is a three device raid1 where I removed 2 devices and "btrfs
> device ready" does not complain, it always returns silent for me no
> matter what. It's been this way for years as far as I know.
>
> [root@f24s ~]# lvs
>   LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move
> Log Cpy%Sync Convert
>   1          VG Vwi-a-tz-- 50.00g thintastic        2.55
>   2          VG Vwi-a-tz-- 50.00g thintastic        4.00
>   3          VG Vwi-a-tz-- 50.00g thintastic        2.54
>   thintastic VG twi-aotz-- 90.00g                   5.05   2.92
> [root@f24s ~]# btrfs fi show
> Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
>     Total devices 3 FS bytes used 2.26GiB
>     devid    1 size 50.00GiB used 3.00GiB path /dev/mapper/VG-1
>     devid    2 size 50.00GiB used 2.01GiB path /dev/mapper/VG-2
>     devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
>
> [root@f24s ~]# btrfs device ready /dev/mapper/VG-1
> [root@f24s ~]#
> [root@f24s ~]# lvchange -an VG/1
> [root@f24s ~]# lvchange -an VG/2
> [root@f24s ~]# btrfs dev scan
> Scanning for Btrfs filesystems
> [root@f24s ~]# lvs
>   LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move
> Log Cpy%Sync Convert
>   1          VG Vwi---tz-- 50.00g thintastic
>   2          VG Vwi---tz-- 50.00g thintastic
>   3          VG Vwi-a-tz-- 50.00g thintastic        2.54
>   thintastic VG twi-aotz-- 90.00g                   5.05   2.92
> [root@f24s ~]# btrfs fi show
> warning, device 2 is missing
> Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
>     Total devices 3 FS bytes used 2.26GiB
>     devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
>     *** Some devices missing
>
> [root@f24s ~]# btrfs device ready /dev/mapper/VG-3
> [root@f24s ~]#
You won't get any output from it regardless; you have to check the 
return code, as it's intended to be a tool for scripts and such.



* Re: 64-btrfs.rules and degraded boot
  2016-07-06 18:04           ` Austin S. Hemmelgarn
@ 2016-07-06 18:23             ` Chris Murphy
  2016-07-06 18:29               ` Andrei Borzenkov
  2016-07-06 19:17               ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 33+ messages in thread
From: Chris Murphy @ 2016-07-06 18:23 UTC (permalink / raw)
  To: Austin S. Hemmelgarn
  Cc: Chris Murphy, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS

On Wed, Jul 6, 2016 at 12:04 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2016-07-06 13:19, Chris Murphy wrote:
>>
>> On Wed, Jul 6, 2016 at 3:51 AM, Andrei Borzenkov <arvidjaar@gmail.com>
>> wrote:
>>>
>>> 3) can we query btrfs whether it is mountable in degraded mode?
>>> according to documentation, "btrfs device ready" (which udev builtin
>>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>>> This is required for proper systemd ordering of services.
>>
>>
>> Where does udev builtin use btrfs itself? I see "btrfs ready $device"
>> which is not a valid btrfs user space command.
>>
>> I never get any errors from "btrfs device ready" even when too many
>> devices are missing. I don't know what it even does or if it's broken.
>>
>> This is a three device raid1 where I removed 2 devices and "btrfs
>> device ready" does not complain, it always returns silent for me no
>> matter what. It's been this way for years as far as I know.
>>
>> [root@f24s ~]# lvs
>>   LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move
>> Log Cpy%Sync Convert
>>   1          VG Vwi-a-tz-- 50.00g thintastic        2.55
>>   2          VG Vwi-a-tz-- 50.00g thintastic        4.00
>>   3          VG Vwi-a-tz-- 50.00g thintastic        2.54
>>   thintastic VG twi-aotz-- 90.00g                   5.05   2.92
>> [root@f24s ~]# btrfs fi show
>> Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
>>     Total devices 3 FS bytes used 2.26GiB
>>     devid    1 size 50.00GiB used 3.00GiB path /dev/mapper/VG-1
>>     devid    2 size 50.00GiB used 2.01GiB path /dev/mapper/VG-2
>>     devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
>>
>> [root@f24s ~]# btrfs device ready /dev/mapper/VG-1
>> [root@f24s ~]#
>> [root@f24s ~]# lvchange -an VG/1
>> [root@f24s ~]# lvchange -an VG/2
>> [root@f24s ~]# btrfs dev scan
>> Scanning for Btrfs filesystems
>> [root@f24s ~]# lvs
>>   LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move
>> Log Cpy%Sync Convert
>>   1          VG Vwi---tz-- 50.00g thintastic
>>   2          VG Vwi---tz-- 50.00g thintastic
>>   3          VG Vwi-a-tz-- 50.00g thintastic        2.54
>>   thintastic VG twi-aotz-- 90.00g                   5.05   2.92
>> [root@f24s ~]# btrfs fi show
>> warning, device 2 is missing
>> Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
>>     Total devices 3 FS bytes used 2.26GiB
>>     devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
>>     *** Some devices missing
>>
>> [root@f24s ~]# btrfs device ready /dev/mapper/VG-3
>> [root@f24s ~]#
>
> You won't get any output from it regardless, you have to check the return
> code as it's intended to be a tool for scripts and such.

How do I check the return code? When I use strace, no matter what, I'm getting

+++ exited with 0 +++

I see that both 'btrfs device ready' and the udev btrfs builtin test
are calling BTRFS_IOC_DEVICES_READY, so it looks like udev is not using
user space tools to check but rather a btrfs ioctl. So clearly that
works, or I wouldn't have stalled boots when all devices aren't
present.
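
For the curious, here's a minimal standalone sketch of that check from
user space. It mirrors what the builtin appears to do (open
/dev/btrfs-control and hand it a member device path), but it is a
reconstruction for illustration, not the actual udev source:

/* build: gcc -o btrfs-ready btrfs-ready.c ; run as root */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

int main(int argc, char **argv)
{
    struct btrfs_ioctl_vol_args args = { 0 };
    int fd, ret;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <member-device>\n", argv[0]);
        return 2;
    }

    /* the query goes to the control node, not the member device */
    fd = open("/dev/btrfs-control", O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        perror("open /dev/btrfs-control");
        return 2;
    }
    strncpy(args.name, argv[1], sizeof(args.name) - 1);

    /* 0: all member devices of this fs are in the kernel's cache;
     * 1: some are still missing; <0: error */
    ret = ioctl(fd, BTRFS_IOC_DEVICES_READY, &args);
    close(fd);

    printf("%s: %s\n", argv[1], ret == 0 ? "ready" : "not ready");
    return ret;
}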

-- 
Chris Murphy


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 17:19         ` Chris Murphy
  2016-07-06 18:04           ` Austin S. Hemmelgarn
@ 2016-07-06 18:24           ` Andrei Borzenkov
  2016-07-06 18:57             ` Chris Murphy
  1 sibling, 1 reply; 33+ messages in thread
From: Andrei Borzenkov @ 2016-07-06 18:24 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Kai Krakow, Btrfs BTRFS

On Wed, Jul 6, 2016 at 8:19 PM, Chris Murphy <lists@colorremedies.com> wrote:
>
> I'm mainly concerned with rootfs. And I'm mainly concerned with a very
> simple 2 disk raid1. With a simple user opt in using
> rootflags=degraded, it should be possible to boot the system. Right
> now it's not possible. Maybe just deleting 64-btrfs.rules would fix
> this problem, I haven't tried it.
>

While deleting this rule would fix your specific degraded 2-disk raid1
case, it would break non-degraded multi-device filesystems. The logic
currently implemented by systemd assumes that mount is called after the
prerequisites have been fulfilled. Deleting this rule would make mount
be called as soon as the very first device is seen; such a filesystem
is obviously not mountable.

An equivalent of this rule is required under systemd, and desired in
general to avoid polling. On the systemd list I outlined a possible
alternative implementation as a systemd service instead of the really
hackish udev rule.


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 18:23             ` Chris Murphy
@ 2016-07-06 18:29               ` Andrei Borzenkov
  2016-07-06 19:17               ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 33+ messages in thread
From: Andrei Borzenkov @ 2016-07-06 18:29 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Austin S. Hemmelgarn, Kai Krakow, Btrfs BTRFS

On Wed, Jul 6, 2016 at 9:23 PM, Chris Murphy <lists@colorremedies.com> wrote:
>>> [root@f24s ~]# btrfs fi show
>>> warning, device 2 is missing
>>> Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
>>>     Total devices 3 FS bytes used 2.26GiB
>>>     devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
>>>     *** Some devices missing
>>>
>>> [root@f24s ~]# btrfs device ready /dev/mapper/VG-3
>>> [root@f24s ~]#
>>
>> You won't get any output from it regardless, you have to check the return
>> code as it's intended to be a tool for scripts and such.
>
> How do I check the return code? When I use strace, no matter what I'm getting
>
> +++ exited with 0 +++
>
> I see both 'brfs device ready' and the udev btrfs builtin test are
> calling BTRFS_IOC_DEVICES_READY so, it looks like udev is not using
> user space tools to check but rather a btrfs ioctl.

Correct. It is possible that the ioctl returns a correct result only
the very first time; notice that in your example btrfs had seen all the
other devices at least once, while at boot it is really the case that
the other devices are missing so far.

Which returns us to the question - how can we reliably query the kernel
about the mountability of a filesystem?

> So clearly that
> works or I wouldn't have stalled boots when all devices aren't
> present.
>
> --
> Chris Murphy


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 18:24           ` Andrei Borzenkov
@ 2016-07-06 18:57             ` Chris Murphy
  2016-07-07 17:07               ` Goffredo Baroncelli
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Murphy @ 2016-07-06 18:57 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On Wed, Jul 6, 2016 at 12:24 PM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> On Wed, Jul 6, 2016 at 8:19 PM, Chris Murphy <lists@colorremedies.com> wrote:
>>
>> I'm mainly concerned with rootfs. And I'm mainly concerned with a very
>> simple 2 disk raid1. With a simple user opt in using
>> rootflags=degraded, it should be possible to boot the system. Right
>> now it's not possible. Maybe just deleting 64-btrfs.rules would fix
>> this problem, I haven't tried it.
>>
>
> While deleting this rule will fix your specific degraded 2 disk raid 1
> it will break non-degraded multi-device filesystem. Logic currently
> implemented by systemd assumes that mount is called after
> prerequisites have been fulfilled. Deleting this rule will call mount
> as soon as the very first device is seen; such filesystem is obviously
> not mountable.

Seems like we need more granularity from the btrfs ioctl for device
ready, e.g. some way to indicate:

0 all devices ready
1 devices not ready (don't even try to mount)
2 minimum devices ready (degraded mount possible)


Btrfs multiple-device single and raid0 would only return code 0 or 1,
where raid1, 5, 6 could return code 2. The systemd default policy for
code 2 could be to wait some amount of time to see if the state goes to
0. At the timeout, try to mount anyway. If rootflags=degraded, it
mounts. If not, the mount fails, and we get a dracut prompt.

That's better behavior than now.
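
In code form, the proposed policy would look roughly like this - note
that the three-state result is hypothetical (today's ioctl only answers
ready/not-ready), so this is a sketch of the proposal, not an existing
API:

#include <stdio.h>

/* hypothetical three-state result of an extended "devices ready" query */
enum ready_state { READY_ALL = 0, READY_NONE = 1, READY_DEGRADED = 2 };

/* decide whether to attempt mount(2) now, per the proposed policy */
static int should_try_mount(enum ready_state st, int timed_out)
{
    if (st == READY_ALL)
        return 1;   /* all devices present: mount normally */
    if (st == READY_DEGRADED && timed_out)
        return 1;   /* waited long enough: try anyway; succeeds only
                     * if the user opted in with rootflags=degraded */
    return 0;       /* not enough devices, or still waiting */
}

int main(void)
{
    printf("%d %d %d\n",
           should_try_mount(READY_ALL, 0),
           should_try_mount(READY_DEGRADED, 1),
           should_try_mount(READY_NONE, 1));
    return 0;
}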

> Equivalent of this rule is required under systemd and desired in
> general to avoid polling. On systemd list I outlined possible
> alternative implementation as systemd service instead of really
> hackish udev rule.

I'll go read it there. Thanks.


-- 
Chris Murphy


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 18:23             ` Chris Murphy
  2016-07-06 18:29               ` Andrei Borzenkov
@ 2016-07-06 19:17               ` Austin S. Hemmelgarn
  2016-07-06 20:00                 ` Chris Murphy
  1 sibling, 1 reply; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-06 19:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Andrei Borzenkov, Kai Krakow, Btrfs BTRFS

On 2016-07-06 14:23, Chris Murphy wrote:
> On Wed, Jul 6, 2016 at 12:04 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> On 2016-07-06 13:19, Chris Murphy wrote:
>>>
>>> On Wed, Jul 6, 2016 at 3:51 AM, Andrei Borzenkov <arvidjaar@gmail.com>
>>> wrote:
>>>>
>>>> 3) can we query btrfs whether it is mountable in degraded mode?
>>>> according to documentation, "btrfs device ready" (which udev builtin
>>>> follows) checks "if it has ALL of it’s devices in cache for mounting".
>>>> This is required for proper systemd ordering of services.
>>>
>>>
>>> Where does udev builtin use btrfs itself? I see "btrfs ready $device"
>>> which is not a valid btrfs user space command.
>>>
>>> I never get any errors from "btrfs device ready" even when too many
>>> devices are missing. I don't know what it even does or if it's broken.
>>>
>>> This is a three device raid1 where I removed 2 devices and "btrfs
>>> device ready" does not complain, it always returns silent for me no
>>> matter what. It's been this way for years as far as I know.
>>>
>>> [root@f24s ~]# lvs
>>>   LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move
>>> Log Cpy%Sync Convert
>>>   1          VG Vwi-a-tz-- 50.00g thintastic        2.55
>>>   2          VG Vwi-a-tz-- 50.00g thintastic        4.00
>>>   3          VG Vwi-a-tz-- 50.00g thintastic        2.54
>>>   thintastic VG twi-aotz-- 90.00g                   5.05   2.92
>>> [root@f24s ~]# btrfs fi show
>>> Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
>>>     Total devices 3 FS bytes used 2.26GiB
>>>     devid    1 size 50.00GiB used 3.00GiB path /dev/mapper/VG-1
>>>     devid    2 size 50.00GiB used 2.01GiB path /dev/mapper/VG-2
>>>     devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
>>>
>>> [root@f24s ~]# btrfs device ready /dev/mapper/VG-1
>>> [root@f24s ~]#
>>> [root@f24s ~]# lvchange -an VG/1
>>> [root@f24s ~]# lvchange -an VG/2
>>> [root@f24s ~]# btrfs dev scan
>>> Scanning for Btrfs filesystems
>>> [root@f24s ~]# lvs
>>>   LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move
>>> Log Cpy%Sync Convert
>>>   1          VG Vwi---tz-- 50.00g thintastic
>>>   2          VG Vwi---tz-- 50.00g thintastic
>>>   3          VG Vwi-a-tz-- 50.00g thintastic        2.54
>>>   thintastic VG twi-aotz-- 90.00g                   5.05   2.92
>>> [root@f24s ~]# btrfs fi show
>>> warning, device 2 is missing
>>> Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
>>>     Total devices 3 FS bytes used 2.26GiB
>>>     devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
>>>     *** Some devices missing
>>>
>>> [root@f24s ~]# btrfs device ready /dev/mapper/VG-3
>>> [root@f24s ~]#
>>
>> You won't get any output from it regardless, you have to check the return
>> code as it's intended to be a tool for scripts and such.
>
> How do I check the return code? When I use strace, no matter what I'm getting
>
> +++ exited with 0 +++
>
> I see both 'brfs device ready' and the udev btrfs builtin test are
> calling BTRFS_IOC_DEVICES_READY so, it looks like udev is not using
> user space tools to check but rather a btrfs ioctl. So clearly that
> works or I wouldn't have stalled boots when all devices aren't
> present.
>
In bash or most other POSIX-compliant shells, you can run this:
echo $?
to get the return code of the previous command.

In your case though, it may be reporting the FS ready because it had 
already seen all the devices; IIUC, the flag that check uses is only 
set once and never unset, which is not a good design in this case.


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 19:17               ` Austin S. Hemmelgarn
@ 2016-07-06 20:00                 ` Chris Murphy
  2016-07-07 17:00                   ` Goffredo Baroncelli
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Murphy @ 2016-07-06 20:00 UTC (permalink / raw)
  To: Austin S. Hemmelgarn
  Cc: Chris Murphy, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS

On Wed, Jul 6, 2016 at 1:17 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

> In bash or most other POSIX compliant shells, you can run this:
> echo $?
> to get the return code of the previous command.
>
> In your case though, it may be reporting the FS ready because it had already
> seen all the devices, IIUC, the flag that checks is only set once, and never
> unset, which is not a good design in this case.

Oh dear.

[root@f24s ~]# lvs
  LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move
Log Cpy%Sync Convert
  1          VG Vwi---tz-- 50.00g thintastic
  2          VG Vwi---tz-- 50.00g thintastic
  3          VG Vwi-a-tz-- 50.00g thintastic        2.54
  thintastic VG twi-aotz-- 90.00g                   5.05   2.92
[root@f24s ~]# btrfs dev scan
Scanning for Btrfs filesystems
[root@f24s ~]# echo $?
0
[root@f24s ~]# btrfs device ready /dev/mapper/VG-3
[root@f24s ~]# echo $?
0
[root@f24s ~]# btrfs fi show
warning, device 2 is missing
Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
    Total devices 3 FS bytes used 2.26GiB
    devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
    *** Some devices missing


Cute - device 1 is also missing, but that's not mentioned. In any case,
the device is still reported ready even after a dev scan. I guess this
isn't exactly testable all that easily unless I reboot.



-- 
Chris Murphy


* Re: 64-btrfs.rules and degraded boot
  2016-07-05 18:53 64-btrfs.rules and degraded boot Chris Murphy
  2016-07-05 19:27 ` Kai Krakow
@ 2016-07-07 16:37 ` Goffredo Baroncelli
  1 sibling, 0 replies; 33+ messages in thread
From: Goffredo Baroncelli @ 2016-07-07 16:37 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

On 2016-07-05 20:53, Chris Murphy wrote:
> I am kinda confused about this "btrfs ready $devnode" portion. Isn't
> it "btrfs device ready $devnode" if this is based on user space tools?

systemd implemented this as an internal command.

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: 64-btrfs.rules and degraded boot
  2016-07-06 12:48                 ` Austin S. Hemmelgarn
@ 2016-07-07 16:52                   ` Goffredo Baroncelli
  2016-07-07 18:23                     ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 33+ messages in thread
From: Goffredo Baroncelli @ 2016-07-07 16:52 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Andrei Borzenkov
  Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On 2016-07-06 14:48, Austin S. Hemmelgarn wrote:
> On 2016-07-06 08:39, Andrei Borzenkov wrote:
[....]
>>>>> 
>>>>> To be entirely honest, if it were me, I'd want systemd to
>>>>> fsck off.  If the kernel mount(2) call succeeds, then the
>>>>> filesystem was ready enough to mount, and if it doesn't, then
>>>>> it wasn't, end of story.
>>>> 
>>>> How should user space know when to try mount? What user space
>>>> is supposed to do during boot if mount fails? Do you suggest
>>>> 
>>>> while true; do mount /dev/foo && exit 0 done
>>>> 
>>>> as part of startup sequence? And note that nowhere is systemd
>>>> involved so far.
>>> Nowhere there, except if you have a filesystem in fstab (or a
>>> mount unit, which I hate for other reasons that I will not go
>>> into right now), and you mount it and systemd thinks the device
>>> isn't ready, it unmounts it _immediately_.  In the case of boot,
>>> it's because of systemd thinking the device isn't ready that you
>>> can't mount degraded with a missing device.  In the case of the
>>> root filesystem at least, the initramfs is expected to handle
>>> this, and most of them do poll in some way, or have other methods
>>> of determining this.  I occasionally have issues with it with
>>> dracut without systemd, but that's due to a separate bug there
>>> involving the device mapper.
>>> 
>> 
>> How does this systemd bashing answer my question - how does user space
>> know when it can call mount at startup?
> You mentioned that systemd wasn't involved, which is patently false
> if it's being used as your init system, and I was admittedly mostly
> responding to that.
> 
> Now, to answer the primary question which I forgot to answer: 
> Userspace doesn't.  Systemd doesn't either but assumes it does and
> checks in a flawed way.  Dracut's polling loop assumes it does but
> sometimes fails in a different way.  There is no way other than
> calling mount right now to know for sure if the mount will succeed,
> and that actually applies to a certain degree to any filesystem
> (because any number of things that are outside of even the kernel's
> control might happen while trying to mount the device).

I think there is no simple answer, and the answer may depend on context.
In the past, I made a prototype of a mount helper for btrfs [1]; the aim was to:

1) get rid of the current btrfs volume discovery (udev triggering "btrfs dev scan"), which has a lot of strange corner cases (what happens when a device disappears?)
2) create a place where we can develop and define strategies to handle all (or most) of the cases of [partial] failure of a [multi-device] btrfs filesystem

By default, my mount.btrfs waited for the devices needed by a filesystem, and mounted in degraded mode if not all of the devices appeared (depending on a switch); if the timeout is reached, an error is returned.

It doesn't need any special udev rule, because it performs the device discovery itself using libuuid. I think that mounting a filesystem, and handling all the possible failure cases by relying on udev and the syntax of the udev rules, is more a problem than a solution. Add to that the fact that udev and the udev rules are developed in a different project, and the difficulties increase.

I think that BTRFS, because of its complexity and its peculiarities, needs a dedicated tool like a mount helper.

My mount.btrfs is not able to solve all the problems, but it might be a start for handling these issues.
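
Roughly, the policy looked like this (a sketch, not the actual helper: the device, mount point and timeout are made-up examples, and the device wait is approximated here with "btrfs device ready" instead of the libuuid discovery):

    #!/bin/bash
    dev=/dev/sda2 mnt=/mnt timeout=30
    # Wait up to $timeout seconds for the kernel to see all devices.
    for ((i = 0; i < timeout; i++)); do
        if btrfs device ready "$dev"; then
            mount "$dev" "$mnt"
            exit $?
        fi
        sleep 1
    done
    # Timeout: not all devices appeared; fall back to a degraded mount
    # (in the real helper this fallback depends on a switch).
    mount -o degraded "$dev" "$mnt"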

BR
G.Baroncelli


[1] http://www.spinics.net/lists/linux-btrfs/msg28764.html



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-06 20:00                 ` Chris Murphy
@ 2016-07-07 17:00                   ` Goffredo Baroncelli
  0 siblings, 0 replies; 33+ messages in thread
From: Goffredo Baroncelli @ 2016-07-07 17:00 UTC (permalink / raw)
  To: Chris Murphy, Austin S. Hemmelgarn
  Cc: Andrei Borzenkov, Kai Krakow, Btrfs BTRFS

On 2016-07-06 22:00, Chris Murphy wrote:
> On Wed, Jul 6, 2016 at 1:17 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
> 
>> In bash or most other POSIX compliant shells, you can run this:
>> echo $?
>> to get the return code of the previous command.
>>
>> In your case though, it may be reporting the FS ready because it had already
> seen all the devices; IIUC, the flag that check reads is only set once, and never
> unset, which is not a good design in this case.
> 
> Oh dear.
> 
> [root@f24s ~]# lvs
>   LV         VG Attr       LSize  Pool       Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   1          VG Vwi---tz-- 50.00g thintastic
>   2          VG Vwi---tz-- 50.00g thintastic
>   3          VG Vwi-a-tz-- 50.00g thintastic        2.54
>   thintastic VG twi-aotz-- 90.00g                   5.05   2.92
> [root@f24s ~]# btrfs dev scan
> Scanning for Btrfs filesystems
> [root@f24s ~]# echo $?
> 0
> [root@f24s ~]# btrfs device ready /dev/mapper/VG-3
> [root@f24s ~]# echo $?
> 0
> [root@f24s ~]# btrfs fi show
> warning, device 2 is missing
> Label: none  uuid: 96240fd9-ea76-47e7-8cf4-05d3570ccfd7
>     Total devices 3 FS bytes used 2.26GiB
>     devid    3 size 50.00GiB used 3.01GiB path /dev/mapper/VG-3
>     *** Some devices missing
> 
> 
> Cute, device 1 is also missing but that's not mentioned. In any case,
> the device is still ready even after a dev scan. I guess this isn't
> exactly testable all that easily unless I reboot.

IIRC, once a device is "registered" by "btrfs dev scan", it is never removed from the set of available devices. This means that if you remove a valid device after it has already been scanned, "btrfs dev ready" still returns OK until a reboot happens.

From your email, it is not clear whether you rebooted (or rmmod-ded btrfs) after you removed the devices.
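
One way to test this without a full reboot, assuming the filesystem is unmounted and nothing else holds the module (just a sketch; the mount point is a made-up example):

    umount /mnt                    # made-up mount point
    rmmod btrfs && modprobe btrfs  # unloading drops the registered-device list
    btrfs device scan              # re-register only what is present now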

Only my 2¢...

BR
G.Baroncelli
> 
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-06 18:57             ` Chris Murphy
@ 2016-07-07 17:07               ` Goffredo Baroncelli
  0 siblings, 0 replies; 33+ messages in thread
From: Goffredo Baroncelli @ 2016-07-07 17:07 UTC (permalink / raw)
  To: Chris Murphy, Andrei Borzenkov; +Cc: Kai Krakow, Btrfs BTRFS

On 2016-07-06 20:57, Chris Murphy wrote:
[...]
> 
> Seems like we need more granularity from the btrfs ioctl for device ready,
> e.g. some way to indicate:
> 
> 0 all devices ready
> 1 devices not ready (don't even try to mount)
> 2 minimum devices ready (degraded mount possible)
> 
> 
> Btrfs multiple-device single and raid0 would only return code 0 or 1, where
> raid 1, 5, 6 could return code 2. The systemd default policy for code
> 2 could be to wait some amount of time to see if state goes to 0. At
> the timeout, try to mount anyway. If rootflags=degraded, it mounts. If
> not, mount fails, and we get a dracut prompt.
> 
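
If such a tri-state code existed, an initramfs could consume it roughly like this (purely an illustration of the proposal quoted above: today "btrfs device ready" only returns 0 or 1, and the device and timeout are made up):

    dev=/dev/vda2 rc=1
    # Poll until complete (0); 1 = don't try, 2 = degraded possible.
    for ((i = 0; i < 90; i++)); do
        btrfs device ready "$dev"; rc=$?
        [ "$rc" -eq 0 ] && break
        sleep 1
    done
    if [ "$rc" -eq 0 ]; then
        mount "$dev" /sysroot
    elif [ "$rc" -eq 2 ] && grep -q degraded /proc/cmdline; then
        # crude stand-in for a real rootflags=degraded check
        mount -o degraded "$dev" /sysroot
    fi
    # otherwise fall through to the emergency shell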

Pay attention that to return 2, you have to scan all the block groups to check whether all the involved devices are available: i.e. a filesystem composed of 5 disks may have a RAID5 block group using only 3 disks for data, and a RAID1 block group for metadata on the other two disks....

Now think about performing this check for every disk that appears.... I fear that it is too expensive


> That's better behavior than now.
> 
>> Equivalent of this rule is required under systemd and desired in
>> general to avoid polling. On systemd list I outlined possible
>> alternative implementation as systemd service instead of really
>> hackish udev rule.
> 
> I'll go read it there. Thanks.
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-07 16:52                   ` Goffredo Baroncelli
@ 2016-07-07 18:23                     ` Austin S. Hemmelgarn
  2016-07-07 18:58                       ` Chris Murphy
  2016-07-07 19:41                       ` Goffredo Baroncelli
  0 siblings, 2 replies; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-07 18:23 UTC (permalink / raw)
  To: kreijack, Andrei Borzenkov; +Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On 2016-07-07 12:52, Goffredo Baroncelli wrote:
> On 2016-07-06 14:48, Austin S. Hemmelgarn wrote:
>> On 2016-07-06 08:39, Andrei Borzenkov wrote:
> [....]
>>>>>>
>>>>>> To be entirely honest, if it were me, I'd want systemd to
>>>>>> fsck off.  If the kernel mount(2) call succeeds, then the
>>>>>> filesystem was ready enough to mount, and if it doesn't, then
>>>>>> it wasn't, end of story.
>>>>>
>>>>> How should user space know when to try mount? What user space
>>>>> is supposed to do during boot if mount fails? Do you suggest
>>>>>
>>>>> while true; do mount /dev/foo && exit 0; done
>>>>>
>>>>> as part of startup sequence? And note that nowhere is systemd
>>>>> involved so far.
>>>> Nowhere there, except if you have a filesystem in fstab (or a
>>>> mount unit, which I hate for other reasons that I will not go
>>>> into right now), and you mount it and systemd thinks the device
>>>> isn't ready, it unmounts it _immediately_.  In the case of boot,
>>>> it's because of systemd thinking the device isn't ready that you
>>>> can't mount degraded with a missing device.  In the case of the
>>>> root filesystem at least, the initramfs is expected to handle
>>>> this, and most of them do poll in some way, or have other methods
>>>> of determining this.  I occasionally have issues with it with
>>>> dracut without systemd, but that's due to a separate bug there
>>>> involving the device mapper.
>>>>
>>>
>>> How does this systemd bashing answer my question - how does user space
>>> know when it can call mount at startup?
>> You mentioned that systemd wasn't involved, which is patently false
>> if it's being used as your init system, and I was admittedly mostly
>> responding to that.
>>
>> Now, to answer the primary question which I forgot to answer:
>> Userspace doesn't.  Systemd doesn't either but assumes it does and
>> checks in a flawed way.  Dracut's polling loop assumes it does but
>> sometimes fails in a different way.  There is no way other than
>> calling mount right now to know for sure if the mount will succeed,
>> and that actually applies to a certain degree to any filesystem
>> (because any number of things that are outside of even the kernel's
>> control might happen while trying to mount the device).
>
> I think there is no simple answer, and the answer may depend on context.
> In the past, I made a prototype of a mount helper for btrfs [1]; the aim was to:
>
> 1) get rid of the current btrfs volume discovery (udev triggering "btrfs dev scan"), which has a lot of strange corner cases (what happens when a device disappears?)
> 2) create a place where we can develop and define strategies to handle all (or most) of the cases of [partial] failure of a [multi-device] btrfs filesystem
>
> By default, my mount.btrfs waited for the devices needed by a filesystem, and mounted in degraded mode if not all of the devices appeared (depending on a switch); if the timeout is reached, an error is returned.
>
> It doesn't need any special udev rule, because it performs the device discovery itself using libuuid. I think that mounting a filesystem, and handling all the possible failure cases by relying on udev and the syntax of the udev rules, is more a problem than a solution. Add to that the fact that udev and the udev rules are developed in a different project, and the difficulties increase.
>
> I think that BTRFS, because of its complexity and its peculiarities, needs a dedicated tool like a mount helper.
>
> My mount.btrfs is not able to solve all the problems, but it might be a start for handling these issues.
FWIW, I've pretty much always been of the opinion that the device 
discovery belongs in a mount helper.  The auto-discovery from udev (and 
more importantly, how the kernel handles being told about a device) is 
much of the reason that it's so inherently dangerous to do block level 
copies.  There's obviously no way that can be changed now without 
breaking something, but that's on the really short list of things that I 
personally feel are worth breaking to fix a particularly dangerous 
pitfall.  The recent discovery that device ready state is write-once 
when set just reinforces this in my opinion.
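
The pitfall in miniature (a sketch with made-up device names; do not run this against real data):

    # Byte-identical clone of a btrfs member device:
    dd if=/dev/sdb1 of=/dev/sdc1 bs=1M
    # udev sees sdc1, runs the "btrfs ready" builtin on it, and the
    # kernel now knows two devices with the same fsid and devid; its
    # registry may point at either copy, so a later mount can read
    # from -- or write through -- the wrong disk.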

Here's how I would picture the ideal situation:
* A device is processed by udev.  It detects that it's part of a BTRFS 
array, updates blkid and whatever else in userspace with this info, and 
then stops without telling the kernel.
* The kernel tracks devices until the filesystem they are part of is 
unmounted, or a mount of that FS fails.
* When the user goes to mount a BTRFS filesystem, they use a mount 
helper.
   1. This helper queries udev/blkid/whatever to see which devices are 
part of an array.
   2. Once the helper determines which devices are potentially in the 
requested FS, it checks the following things to ensure array integrity:
     - Does each device report the same number of component devices for 
the array?
     - Does the reported number match the number of devices found?
     - If a mount by UUID is requested, do all the labels match on each 
device?
     - If a mount by LABEL is requested, do all the UUID's match on each 
device?
     - If a mount by path is requested, do all the component devices 
reported by that device have matching LABEL _and_ UUID?
     - Is any of the devices found already in-use by another mount?
   3. If any of the above checks fails, and the user has not specified 
an option to request a mount anyway, report the error and exit with 
non-zero status _before_ even talking to the kernel.
   4. If only the second check fails (the check verifying the number of 
devices found), and it fails because the number found is less than 
required for a non-degraded mount, ignore that check if and only if the 
user specified -o degraded.
   5. If any of the other checks fail, ignore them if and only if the 
user asks to ignore that specific check.
   6. Otherwise, notify the kernel about the devices and call mount(2).
* The mount helper parses its own set of special options similar to the 
bg/fg/retry options used by mount.nfs to allow for timeouts when 
mounting, as well as asynchronous mounts in the background.
* btrfs device scan becomes a no-op
* btrfs device ready uses the above logic minus step 6 to determine if a 
filesystem is probably ready.

Such a situation would probably eliminate or at least reduce most of our 
current issues with device discovery, and provide much better error 
reporting and general flexibility.
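
A minimal sketch of steps 1 and 2 for a mount-by-UUID request (the UUID is just an example, and the num_devices parsing assumes the output format of "btrfs inspect-internal dump-super" from current btrfs-progs; older progs shipped btrfs-show-super instead):

    #!/bin/bash
    uuid=96240fd9-ea76-47e7-8cf4-05d3570ccfd7   # example UUID
    # Step 1: ask blkid which devices carry this filesystem UUID.
    mapfile -t devs < <(blkid -t UUID="$uuid" -o device)
    found=${#devs[@]}
    # Step 2: every member's superblock must agree on num_devices,
    # and that count must match what was actually found.
    for d in "${devs[@]}"; do
        want=$(btrfs inspect-internal dump-super "$d" \
               | awk '/^num_devices/ { print $2 }')
        if [ "$found" -ne "$want" ]; then
            echo "$d: expects $want devices, found $found" >&2
            exit 1   # an explicit -o degraded override would be honored here
        fi
    done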

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-07 18:23                     ` Austin S. Hemmelgarn
@ 2016-07-07 18:58                       ` Chris Murphy
  2016-07-07 19:14                         ` Chris Murphy
                                           ` (2 more replies)
  2016-07-07 19:41                       ` Goffredo Baroncelli
  1 sibling, 3 replies; 33+ messages in thread
From: Chris Murphy @ 2016-07-07 18:58 UTC (permalink / raw)
  To: Austin S. Hemmelgarn
  Cc: Goffredo Baroncelli, Andrei Borzenkov, Chris Murphy, Kai Krakow,
	Btrfs BTRFS

On Thu, Jul 7, 2016 at 12:23 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

>
> Here's how I would picture the ideal situation:
> * A device is processed by udev.  It detects that it's part of a BTRFS
> array, updates blkid and whatever else in userspace with this info, and then
> stops without telling the kernel.
> * The kernel tracks devices until the filesystem they are part of is
> unmounted, or a mount of that FS fails.
> * When the user goes to mount a BTRFS filesystem, they use a mount
> helper.
>   1. This helper queries udev/blkid/whatever to see which devices are part
> of an array.
>   2. Once the helper determines which devices are potentially in the
> requested FS, it checks the following things to ensure array integrity:
>     - Does each device report the same number of component devices for the
> array?
>     - Does the reported number match the number of devices found?
>     - If a mount by UUID is requested, do all the labels match on each
> device?
>     - If a mount by LABEL is requested, do all the UUID's match on each
> device?
>     - If a mount by path is requested, do all the component devices reported
> by that device have matching LABEL _and_ UUID?
>     - Is any of the devices found already in-use by another mount?
>   3. If any of the above checks fails, and the user has not specified an
> option to request a mount anyway, report the error and exit with non-zero
> status _before_ even talking to the kernel.
>   4. If only the second check fails (the check verifying the number of
> devices found), and it fails because the number found is less than required
> for a non-degraded mount, ignore that check if and only if the user
> specified -o degraded.
>   5. If any of the other checks fail, ignore them if and only if the user
> asks to ignore that specific check.
>   6. Otherwise, notify the kernel about the devices and call mount(2).
> * The mount helper parses its own set of special options similar to the
> bg/fg/retry options used by mount.nfs to allow for timeouts when mounting,
> as well as asynchronous mounts in the background.
> * btrfs device scan becomes a no-op
> * btrfs device ready uses the above logic minus step 6 to determine if a
> filesystem is probably ready.
>
> Such a situation would probably eliminate or at least reduce most of our
> current issues with device discovery, and provide much better error
> reporting and general flexibility.

It might be useful to see where ZFS and LVM work and fail in this
regard. And also plan for D-Bus support to get state notifications up
to something like storaged or other such user space management tools.
Early on in Fedora there were many difficulties between systemd and
LVM, so avoiding whatever that was about would be nice.

Also, tangentially related, Fedora is replacing udisks2 with storaged.
Storaged already has a Btrfs plug-in so there should be better
awareness there. I get all kinds of damn strange behaviors in GNOME
with Btrfs multiple device volumes: volume names appearing twice in
the UI, unmounting one causes umount errors with the other.
https://fedoraproject.org/wiki/Changes/Replace_UDisks2_by_Storaged
http://storaged.org/

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-07 18:58                       ` Chris Murphy
@ 2016-07-07 19:14                         ` Chris Murphy
  2016-07-07 19:59                         ` Austin S. Hemmelgarn
  2016-07-07 20:13                         ` Goffredo Baroncelli
  2 siblings, 0 replies; 33+ messages in thread
From: Chris Murphy @ 2016-07-07 19:14 UTC (permalink / raw)
  To: Btrfs BTRFS

More Btrfs udev issues, they involve making btrfs multiple device
volumes via 'btrfs dev add' which then causes problems at boot time.
https://bugzilla.opensuse.org/show_bug.cgi?id=912170
https://bugzilla.suse.com/show_bug.cgi?id=984516

The last part is amusing in that the proposed fix is going to end up
in btrfs-progs. And so that's why:

[chris@f24m ~]$ dnf provides /usr/lib/udev/rules.d/64-btrfs-dm.rules
Last metadata expiration check: 1:18:18 ago on Thu Jul  7 11:54:20 2016.
btrfs-progs-4.6-1.fc25.x86_64 : Userspace programs for btrfs
Repo        : @System

[chris@f24m ~]$ dnf provides /usr/lib/udev/rules.d/64-btrfs.rules
Last metadata expiration check: 1:18:30 ago on Thu Jul  7 11:54:20 2016.
systemd-udev-229-8.fc24.x86_64 : Rule-based device node and kernel event manager
Repo        : @System

Ha. So the btrfs rule is provided by udev upstream. The dm-specific
Btrfs rule is provided by Btrfs upstream. That's not confusing at all.


Chris Murphy

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-07 18:23                     ` Austin S. Hemmelgarn
  2016-07-07 18:58                       ` Chris Murphy
@ 2016-07-07 19:41                       ` Goffredo Baroncelli
  1 sibling, 0 replies; 33+ messages in thread
From: Goffredo Baroncelli @ 2016-07-07 19:41 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Andrei Borzenkov
  Cc: Chris Murphy, Kai Krakow, Btrfs BTRFS

On 2016-07-07 20:23, Austin S. Hemmelgarn wrote:
[...]
> FWIW, I've pretty much always been of the opinion that the device discovery belongs in a mount helper.  The auto-discovery from udev (and more importantly, how the kernel handles being told about a device) is much of the reason that it's so inherently dangerous to do block level copies.  There's obviously no way that can be changed now without breaking something, but that's on the really short list of things that I personally feel are worth breaking to fix a particularly dangerous pitfall.  The recent discovery that device ready state is write-once when set just reinforces this in my opinion.
> 
> Here's how I would picture the ideal situation:
> * A device is processed by udev.  It detects that it's part of a BTRFS array, updates blkid and whatever else in userspace with this info, and then stops without telling the kernel.
> * The kernel tracks devices until the filesystem they are part of is unmounted, or a mount of that FS fails.
> * When the user goes to mount a BTRFS filesystem, they use a mount helper.
>   1. This helper queries udev/blkid/whatever to see which devices are part of an array.
>   2. Once the helper determines which devices are potentially in the requested FS, it checks the following things to ensure array integrity:
>     - Does each device report the same number of component devices for the array?
>     - Does the reported number match the number of devices found?
>     - If a mount by UUID is requested, do all the labels match on each device?
>     - If a mount by LABEL is requested, do all the UUID's match on each device?
>     - If a mount by path is requested, do all the component devices reported by that device have matching LABEL _and_ UUID?
>     - Is any of the devices found already in-use by another mount?
        ^^^^^^^^^^^^^^^^^ It is possible to mount the same device twice.

I'd add my favorite:
	- is there a conflict of device UUIDs (i.e. two different disks with the same UUID)?

Anyway, point 2 has to run in a loop until a timeout: i.e. if systemd asks to mount a filesystem when the first device appears, wait for all the devices to appear.

>   3. If any of the above checks fails, and the user has not specified an option to request a mount anyway, report the error and exit with non-zero status _before_ even talking to the kernel.
>   4. If only the second check fails (the check verifying the number of devices found), and it fails because the number found is less than required for a non-degraded mount, ignore that check if and only if the user specified -o degraded.
>   5. If any of the other checks fail, ignore them if and only if the user asks to ignore that specific check.
>   6. Otherwise, notify the kernel about the devices and call mount(2).
> * The mount helper parses its own set of special options similar to the bg/fg/retry options used by mount.nfs to allow for timeouts when mounting, as well as asynchronous mounts in the background.
> * btrfs device scan becomes a no-op
> * btrfs device ready uses the above logic minus step 6 to determine if a filesystem is probably ready.
> 
> Such a situation would probably eliminate or at least reduce most of our current issues with device discovery, and provide much better error reporting and general flexibility.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-07 18:58                       ` Chris Murphy
  2016-07-07 19:14                         ` Chris Murphy
@ 2016-07-07 19:59                         ` Austin S. Hemmelgarn
  2016-07-07 20:20                           ` Chris Murphy
  2016-07-07 20:13                         ` Goffredo Baroncelli
  2 siblings, 1 reply; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-07 19:59 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS

On 2016-07-07 14:58, Chris Murphy wrote:
> On Thu, Jul 7, 2016 at 12:23 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>>
>> Here's how I would picture the ideal situation:
>> * A device is processed by udev.  It detects that it's part of a BTRFS
>> array, updates blkid and whatever else in userspace with this info, and then
>> stops without telling the kernel.
>> * The kernel tracks devices until the filesystem they are part of is
>> unmounted, or a mount of that FS fails.
>> * When the user goes to mount a BTRFS filesystem, they use a mount
>> helper.
>>   1. This helper queries udev/blkid/whatever to see which devices are part
>> of an array.
>>   2. Once the helper determines which devices are potentially in the
>> requested FS, it checks the following things to ensure array integrity:
>>     - Does each device report the same number of component devices for the
>> array?
>>     - Does the reported number match the number of devices found?
>>     - If a mount by UUID is requested, do all the labels match on each
>> device?
>>     - If a mount by LABEL is requested, do all the UUID's match on each
>> device?
>>     - If a mount by path is requested, do all the component devices reported
>> by that device have matching LABEL _and_ UUID?
>>     - Is any of the devices found already in-use by another mount?
>>   3. If any of the above checks fails, and the user has not specified an
>> option to request a mount anyway, report the error and exit with non-zero
>> status _before_ even talking to the kernel.
>>   4. If only the second check fails (the check verifying the number of
>> devices found), and it fails because the number found is less than required
>> for a non-degraded mount, ignore that check if and only if the user
>> specified -o degraded.
>>   5. If any of the other checks fail, ignore them if and only if the user
>> asks to ignore that specific check.
>>   6. Otherwise, notify the kernel about the devices and call mount(2).
>> * The mount helper parses its own set of special options similar to the
>> bg/fg/retry options used by mount.nfs to allow for timeouts when mounting,
>> as well as asynchronous mounts in the background.
>> * btrfs device scan becomes a no-op
>> * btrfs device ready uses the above logic minus step 6 to determine if a
>> filesystem is probably ready.
>>
>> Such a situation would probably eliminate or at least reduce most of our
>> current issues with device discovery, and provide much better error
>> reporting and general flexibility.
>
> It might be useful to see where ZFS and LVM work and fail in this
> regard. And also plan for D-Bus support to get state notifications up
> to something like storaged or other such user space management tools.
> Early on in Fedora there were many difficulties between systemd and
> LVM, so avoiding whatever that was about would be nice.
D-Bus support needs to be optional, period.  Not everybody uses D-Bus (I 
have dozens of systems that get by just fine without it, and know 
hundreds of other people who do as well), and even people who do don't 
always use every tool needed (on the one system I manage that does have 
it, the only things I need it for are Avahi, ConsoleKit, udev, and 
NetworkManager, and I'm getting pretty close to the point of getting rid 
of NM and CK and re-implementing or forking Avahi).  You have to 
consider the fact that there are and always will be people who do not 
install a GUI on their system and want the absolute minimum of software 
installed.
>
> Also, tangentially related, Fedora is replacing udisks2 with storaged.
> Storaged already has a Btrfs plug-in so there should be better
> awareness there. I get all kinds of damn strange behaviors in GNOME
> with Btrfs multiple device volumes: volume names appearing twice in
> the UI, unmounting one causes umount errors with the other.
> https://fedoraproject.org/wiki/Changes/Replace_UDisks2_by_Storaged
> http://storaged.org/
Personally, I don't care what Fedora is doing, or even what GNOME (or 
any other DE for that matter, the only reason I use Xfce is because some 
things need a GUI (many of them unnecessarily), and that happens to be 
the DE I have the fewest complaints about) is doing.  The only reason 
that things like GNOME Disks and such exist is because they're trying to 
imitate Windows and OS X, which is all well and good for a desktop, but 
is absolute crap for many server and embedded environments (Microsoft 
finally realized this, and Windows Server 2012 added the ability to 
install without a full desktop, which actually means that they have 
_more_ options than a number of Linux distributions (yes you can rip out 
the desktop on many distros if you want, but that takes an insane amount 
of effort most of the time, not to mention storage space)).

Storaged also qualifies as something that _needs_ to be optional, 
especially because it appears to require systemd (and it falls into the 
same category as D-Bus of 'unnecessary bloat on many systems').  Adding 
a mandatory dependency on systemd _will_ split the community and 
severely piss off quite a few people (you will likely get some rather 
nasty looks from a number of senior kernel developers if you meet them 
in person).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-07 18:58                       ` Chris Murphy
  2016-07-07 19:14                         ` Chris Murphy
  2016-07-07 19:59                         ` Austin S. Hemmelgarn
@ 2016-07-07 20:13                         ` Goffredo Baroncelli
  2 siblings, 0 replies; 33+ messages in thread
From: Goffredo Baroncelli @ 2016-07-07 20:13 UTC (permalink / raw)
  To: Chris Murphy, Austin S. Hemmelgarn
  Cc: Andrei Borzenkov, Kai Krakow, Btrfs BTRFS

On 2016-07-07 20:58, Chris Murphy wrote:
> I get all kinds of damn strange behaviors in GNOME
> with Btrfs multiple device volumes: volume names appearing twice in
> the UI, unmounting one causes umount errors with the other.
> https://fedoraproject.org/wiki/Changes/Replace_UDisks2_by_Storaged
> http://storaged.org/

Unfortunately BTRFS is a mess from this point of view. Some btrfs subcommands query the system by inspecting the data stored on the disks directly; others use the ioctl(2) syscall, which reports what the kernel thinks. Unfortunately, due to caching, these two sources of information can be out of sync.

Often, when some command's output doesn't convince me, I run "sync"; repeating the command then gives better output ("btrfs fi show" is one of these commands).
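
For example (a sketch; whether the second run is actually fresher depends on what was cached):

    btrfs fi show    # may print stale, on-disk state
    sync             # flush dirty data and superblocks
    btrfs fi show    # usually agrees with the kernel afterwards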

BR
G.Baroncelli


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-07 19:59                         ` Austin S. Hemmelgarn
@ 2016-07-07 20:20                           ` Chris Murphy
  2016-07-08 12:24                             ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Murphy @ 2016-07-07 20:20 UTC (permalink / raw)
  To: Austin S. Hemmelgarn
  Cc: Chris Murphy, Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow,
	Btrfs BTRFS

On Thu, Jul 7, 2016 at 1:59 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

> D-Bus support needs to be optional, period.  Not everybody uses D-Bus (I
> have dozens of systems that get by just fine without it, and know hundreds
> of other people who do as well), and even people who do don't always use
> every tool needed (on the one system I manage that does have it, the only
> things I need it for are Avahi, ConsoleKit, udev, and NetworkManager, and
> I'm getting pretty close to the point of getting rid of NM and CK and
> re-implementing or forking Avahi).  You have to consider the fact that there
> are and always will be people who do not install a GUI on their system and
> want the absolute minimum of software installed.

That's fine, they can monitor kernel messages directly as their
notification system. I'm concerned with people who don't ever look at
kernel messages, you know, mortal users who have better things to do
with a computer than that. It's important for most anyone to not have
to wait for problems to manifest traumatically.


> Personally, I don't care what Fedora is doing, or even what GNOME (or any
> other DE for that matter, the only reason I use Xfce is because some things
> need a GUI (many of them unnecessarily), and that happens to be the DE I
> have the fewest complaints about) is doing.  The only reason that things
> like GNOME Disks and such exist is because they're trying to imitate Windows
> and OS X, which is all well and good for a desktop, but is absolute crap for
> many server and embedded environments (Microsoft finally realized this, and
> Windows Server 2012 added the ability to install without a full desktop,
> which actually means that they have _more_ options than a number of Linux
> distributions (yes you can rip out the desktop on many distros if you want,
> but that takes an insane amount of effort most of the time, not to mention
> storage space)).

I'm willing to bet dollars to donuts Xfce fans would love to know if
one of their rootfs mirrors is spewing read errors, while smartd
defers to the drive which says "hey no problems here". GNOME at least
does report certain critical smart errors, but that still leaves
something like 40% of drive failures happening without prior notice.


> Storaged also qualifies as something that _needs_ to be optional, especially
> because it appears to require systemd (and it falls into the same category
> as D-Bus of 'unnecessary bloat on many systems').  Adding a mandatory
> dependency on systemd _will_ split the community and severely piss off quite
> a few people (you will likely get some rather nasty looks from a number of
> senior kernel developers if you meet them in person).

I just want things to work for users, defined as people who would like
to stop depending on Windows and macOS for both server and desktop
usage. I don't really care about ideological issues outside of that
goal.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-07 20:20                           ` Chris Murphy
@ 2016-07-08 12:24                             ` Austin S. Hemmelgarn
  2016-07-11 21:07                               ` Chris Murphy
  0 siblings, 1 reply; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-08 12:24 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS

On 2016-07-07 16:20, Chris Murphy wrote:
> On Thu, Jul 7, 2016 at 1:59 PM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>> D-Bus support needs to be optional, period.  Not everybody uses D-Bus (I
>> have dozens of systems that get by just fine without it, and know hundreds
>> of other people who do as well), and even people who do don't always use
>> every tool needed (on the one system I manage that does have it, the only
>> things I need it for are Avahi, ConsoleKit, udev, and NetworkManager, and
>> I'm getting pretty close to the point of getting rid of NM and CK and
>> re-implementing or forking Avahi).  You have to consider the fact that there
>> are and always will be people who do not install a GUI on their system and
>> want the absolute minimum of software installed.
>
> That's fine, they can monitor kernel messages directly as their
> notification system. I'm concerned with people who don't ever look at
> kernel messages, you know, mortal users who have better things to do
> with a computer than that. It's important for most anyone to not have
> to wait for problems to manifest traumatically.
My point is that they probably need btrfs-progs too.  Take me for 
example, I don't use some fancy graphical tool to tell me when my disks 
are failing, but I don't scrape kernel logs either.  I have things set 
up to monitor the disks directly (using btrfs-progs in the case of stuff 
that can check for), and notify me via e-mail if there's an issue.  Not 
supporting that use case at all would be like e2fsprogs adding a 
dependency on X11 and telling everyone who doesn't want to use X11 to 
just go implement their own tools.  If that happened, e2fsprogs would 
get forked, the commit reverted in that fork, and most of the 
non-enterprise distros would probably switch pretty damn quick to the 
forked version.
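
The shape of that setup, sketched (the mount point and the use of mailx are made-up examples):

    # Cron job: mail any non-zero btrfs error counters for /srv.
    out=$(btrfs device stats /srv | awk '$2 != 0')
    [ -n "$out" ] && printf '%s\n' "$out" \
        | mail -s "btrfs errors on $(hostname)" root
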
>
>
>> Personally, I don't care what Fedora is doing, or even what GNOME (or any
>> other DE for that matter, the only reason I use Xfce is because some things
>> need a GUI (many of them unnecessarily), and that happens to be the DE I
>> have the fewest complaints about) is doing.  The only reason that things
>> like GNOME Disks and such exist is because they're trying to imitate Windows
>> and OS X, which is all well and good for a desktop, but is absolute crap for
>> many server and embedded environments (Microsoft finally realized this, and
>> Windows Server 2012 added the ability to install without a full desktop,
>> which actually means that they have _more_ options than a number of Linux
>> distributions (yes you can rip out the desktop on many distros if you want,
>> but that takes an insane amount of effort most of the time, not to mention
>> storage space)).
>
> I'm willing to bet dollars to donuts Xfce fans would love to know if
> one of their rootfs mirrors is spewing read errors, while smartd
> defers to the drive which says "hey no problems here". GNOME at least
> does report certain critical smart errors, but that still leaves
> something like 40% of drive failures happening without prior notice.
I'm not saying some specific users don't care, I'm saying that requiring 
people to have a specific software stack which may not work for their 
use case is a stupid choice for something as low level as this.  Yes 
people want to know when something failed, but we shouldn't mandate 
_how_ they check this on a given system.  There need to be 
more choices than just a GUI tool and talking directly to the kernel. 
Looking at this another way, it is fully possible to implement something 
to do this in a DE agnostic manner _without depending on D-BUS_ using 
the tools as they are right now.  An initial implementation would of 
course be inefficient, but until we get notifications _from the kernel_ 
about FS state, we have to poll regardless, which means that having 
D-Bus support would not help (and would probably just make things slower).
>
>
>> Storaged also qualifies as something that _needs_ to be optional, especially
>> because it appears to require systemd (and it falls into the same category
>> as D-Bus of 'unnecessary bloat on many systems').  Adding a mandatory
>> dependency on systemd _will_ split the community and severely piss off quite
>> a few people (you will likely get some rather nasty looks from a number of
>> senior kernel developers if you meet them in person).
>
> I just want things to work for users, defined as people who would like
> to stop depending on Windows and macOS for both server and desktop
> usage. I don't really care about ideological issues outside of that
> goal.
Making us hard depend on storaged would not help this goal.  It's no 
different than the Microsoft and Apple approach of 'our way or not at all'.

To clarify, I'm not trying to argue against adding support, I'm arguing 
against it being mandatory.  A filesystem which requires specific system 
services to be running just for regular maintenance tasks is not a well 
designed filesystem.  To be entirely honest, I'm not all that happy 
about the functional dependency on udev to have device discovery, but 
there's no point in me arguing about that...

Just thinking aloud, but why not do a daemon that does the actual 
monitoring, and then provide an interface (at least a UNIX domain 
socket, and optionally a D-Bus endpoint) that other tools can use to 
query filesystem status.  LVM already has a similar setup for monitoring 
DM-RAID volumes, snapshots, and thin storage pools, although it's 
designed as an event driven tool that does something when specific 
things happen (for example, possibly auto-extending snapshots when they 
start to get full).  Other than the D-Bus support, I could probably 
write a basic piece of software to do this in Python in about a week of 
work (most of which would be figuring out the edge cases and making sure 
it works on both 2.7 and 3); it would provide similar functionality 
(with better configurability too) and could easily provide an 
interface to query filesystem status.
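
A toy version of the query interface, just to show the shape (the socket path is made up, and a real daemon would be long-running and event driven rather than shelling out per connection):

    # Answer each connection on a UNIX socket with the current status.
    socat UNIX-LISTEN:/run/btrfs-mon.sock,fork \
        SYSTEM:'btrfs device stats /srv; btrfs fi show /srv'
    # A client would read it with: socat - UNIX-CONNECT:/run/btrfs-mon.sock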

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-08 12:24                             ` Austin S. Hemmelgarn
@ 2016-07-11 21:07                               ` Chris Murphy
  2016-07-12 15:34                                 ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Murphy @ 2016-07-11 21:07 UTC (permalink / raw)
  To: Austin S. Hemmelgarn
  Cc: Chris Murphy, Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow,
	Btrfs BTRFS

On Fri, Jul 8, 2016 at 6:24 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:

> To clarify, I'm not trying to argue against adding support, I'm arguing
> against it being mandatory.

By "D-Bus support" I did not mean to indicate mandating it, just that
it would be one possible way to get some very basic state change
messages to user space tools so we're doing the least amount of wheel
reinvention as possible.


>  A filesystem which requires specific system
> services to be running just for regular maintenance tasks is not a well
> designed filesystem.  To be entirely honest, I'm not all that happy about
> the functional dependency on udev to have device discovery, but there's no
> point in me arguing about that...

Well, everything else that came before it is effectively deprecated, so
there's no going back. The way forward would be to give udev more
granular state information about a Btrfs volume than just 0 and 1.

>
> Just thinking aloud, but why not do a daemon that does the actual
> monitoring, and then provide an interface (at least a UNIX domain socket,
> and optionally a D-Bus endpoint) that other tools can use to query
> filesystem status.  LVM already has a similar setup for monitoring DM-RAID
> volumes, snapshots, and thin storage pools, although it's designed as an
> event driven tool that does something when specific things happen (for
> example, possibly auto-extending snapshots when they start to get full).

That would be consistent with mdadm --monitor and dmeventd, but it is
yet another wheel reinvention at the lower level, which then also
necessitates higher level things to adapt to that interface. It would
be neat if there could be some unification and consistency.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 64-btrfs.rules and degraded boot
  2016-07-11 21:07                               ` Chris Murphy
@ 2016-07-12 15:34                                 ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 33+ messages in thread
From: Austin S. Hemmelgarn @ 2016-07-12 15:34 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Goffredo Baroncelli, Andrei Borzenkov, Kai Krakow, Btrfs BTRFS

On 2016-07-11 17:07, Chris Murphy wrote:
> On Fri, Jul 8, 2016 at 6:24 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>> To clarify, I'm not trying to argue against adding support, I'm arguing
>> against it being mandatory.
>
> By "D-Bus support" I did not mean to indicate mandating it, just that
> it would be one possible way to get some very basic state change
> messages to user space tools so we're doing the least amount of wheel
> reinvention as possible.
Minimizing the amount of work would be good, but I would not agree that 
D-Bus achieves it.  It's easy to debug socket-based IPC; it's not easy to 
debug D-Bus based IPC.  From a development perspective, I'd say we need 
to get something working with sockets first, and then worry about D-Bus 
once we have working infrastructure and abstraction for IPC.
>
>
>>  A filesystem which requires specific system
>> services to be running just for regular maintenance tasks is not a well
>> designed filesystem.  To be entirely honest, I'm not all that happy about
>> the functional dependency on udev to have device discovery, but there's no
>> point in me arguing about that...
>
> Well everything else that came before it is effectively deprecated, so
> there's no going back. The way forward would be to get udev more
> granular state information about a Btrfs volume than 0 and 1.
People still use other options, usually in embedded systems, so 
alternatives do exist and are used.

That said, I couldn't agree more about reporting more info about the 
state of the FS, but I still feel that scanning on device connection is 
not a good thing with the way things are currently designed in the 
kernel, not just the binary state reporting.
>
>>
>> Just thinking aloud, but why not do a daemon that does the actual
>> monitoring, and then provide an interface (at least a UNIX domain socket,
>> and optionally a D-Bus endpoint) that other tools can use to query
>> filesystem status.  LVM already has a similar setup for monitoring DM-RAID
>> volumes, snapshots, and thin storage pools, although it's designed as an
>> event driven tool that does something when specific things happen (for
>> example, possibly auto-extending snapshots when they start to get full).
>
> That would be consistent with mdadm --monitor and dmeventd, but it is
> yet another wheel reinvention at the lower level, which then also
> necessitates higher level things to adapt to that interface. It would
> be neat if there could be some unification and consistency.
>
A consistent external API would be a good thing, but I'm not sure if 
unifying the internal design would be.  Trying to unify handling in an 
external project would make things less reliable, not more reliable, 
because


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2016-07-12 15:34 UTC | newest]

Thread overview: 33+ messages
2016-07-05 18:53 64-btrfs.rules and degraded boot Chris Murphy
2016-07-05 19:27 ` Kai Krakow
2016-07-05 19:30   ` Chris Murphy
2016-07-05 20:10     ` Chris Murphy
2016-07-06  9:51       ` Andrei Borzenkov
2016-07-06 11:45         ` Austin S. Hemmelgarn
2016-07-06 11:55           ` Andrei Borzenkov
2016-07-06 12:14             ` Austin S. Hemmelgarn
2016-07-06 12:39               ` Andrei Borzenkov
2016-07-06 12:48                 ` Austin S. Hemmelgarn
2016-07-07 16:52                   ` Goffredo Baroncelli
2016-07-07 18:23                     ` Austin S. Hemmelgarn
2016-07-07 18:58                       ` Chris Murphy
2016-07-07 19:14                         ` Chris Murphy
2016-07-07 19:59                         ` Austin S. Hemmelgarn
2016-07-07 20:20                           ` Chris Murphy
2016-07-08 12:24                             ` Austin S. Hemmelgarn
2016-07-11 21:07                               ` Chris Murphy
2016-07-12 15:34                                 ` Austin S. Hemmelgarn
2016-07-07 20:13                         ` Goffredo Baroncelli
2016-07-07 19:41                       ` Goffredo Baroncelli
2016-07-06 12:49             ` Tomasz Torcz
2016-07-06 17:19         ` Chris Murphy
2016-07-06 18:04           ` Austin S. Hemmelgarn
2016-07-06 18:23             ` Chris Murphy
2016-07-06 18:29               ` Andrei Borzenkov
2016-07-06 19:17               ` Austin S. Hemmelgarn
2016-07-06 20:00                 ` Chris Murphy
2016-07-07 17:00                   ` Goffredo Baroncelli
2016-07-06 18:24           ` Andrei Borzenkov
2016-07-06 18:57             ` Chris Murphy
2016-07-07 17:07               ` Goffredo Baroncelli
2016-07-07 16:37 ` Goffredo Baroncelli
