* [survey]  BTRFS_IOC_DEVICES_READY return status
@ 2015-06-12 13:16 Anand Jain
  2015-06-12 18:04 ` [systemd-devel] " Andrei Borzenkov
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Anand Jain @ 2015-06-12 13:16 UTC (permalink / raw)
  To: systemd-devel, linux-btrfs@vger.kernel.org >> linux-btrfs
  Cc: lennart, dsterba



BTRFS_IOC_DEVICES_READY is meant to check whether all the required
devices are known to the btrfs kernel module, so that an admin or a
system application can mount the FS. The check is made against the
device passed in the argument.

However, the actual implementation does a bit more than that: it
also scans and registers the device provided in the argument (same
as the btrfs device scan subcommand or the BTRFS_IOC_SCAN_DEV ioctl).

So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl;
it's a write command as well.
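
For readers who have not called this ioctl directly, here is a minimal
user-space sketch in Python. It assumes the asm-generic ioctl number
encoding (x86, arm64 and similar; some architectures differ) and the
struct btrfs_ioctl_vol_args layout from the btrfs UAPI header. The helper
names pack_vol_args and devices_ready are made up for illustration.

```python
import fcntl
import os

# asm-generic ioctl encoding; an assumption that does not hold on every arch.
_IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT = 0, 8, 16, 30
_IOC_WRITE, _IOC_READ = 1, 2

BTRFS_IOCTL_MAGIC = 0x94
BTRFS_PATH_NAME_MAX = 4087
# struct btrfs_ioctl_vol_args { __s64 fd; char name[BTRFS_PATH_NAME_MAX + 1]; }
VOL_ARGS_SIZE = 8 + BTRFS_PATH_NAME_MAX + 1


def _ioc(direction, nr):
    """Build an ioctl request number for a btrfs_ioctl_vol_args argument."""
    return ((direction << _IOC_DIRSHIFT) | (VOL_ARGS_SIZE << _IOC_SIZESHIFT)
            | (BTRFS_IOCTL_MAGIC << _IOC_TYPESHIFT) | (nr << _IOC_NRSHIFT))


BTRFS_IOC_SCAN_DEV = _ioc(_IOC_WRITE, 4)
BTRFS_IOC_DEVICES_READY = _ioc(_IOC_READ, 39)


def pack_vol_args(device_path):
    """Return a zeroed vol_args buffer with the device path in the name field."""
    name = os.fsencode(device_path)
    if len(name) > BTRFS_PATH_NAME_MAX:
        raise ValueError("device path too long")
    args = bytearray(VOL_ARGS_SIZE)
    args[8:8 + len(name)] = name
    return args


def devices_ready(device_path, control="/dev/btrfs-control"):
    """Return 0 (ready) or 1 (not ready); needs root.

    Note the side effect discussed above: the device is also scanned
    and registered, so this is not a read-only query.
    """
    fd = os.open(control, os.O_RDWR)
    try:
        return fcntl.ioctl(fd, BTRFS_IOC_DEVICES_READY,
                           pack_vol_args(device_path))
    finally:
        os.close(fd)
```

Calling devices_ready("/dev/sda2") as root would return the same 0/1
status that 'btrfs device ready /dev/sda2' reports through its exit code.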

Next, the kernel only checks whether total_devices (read from the
SB) is equal to num_devices (counted in the list) to report the
status as 0 (ready) or 1 (not ready). This does not work in the
other device pool states such as missing, seeding, or replacing,
since total_devices is not equal to num_devices in these states
even though the device pool is ready to mount. That is a bug, but
it is not part of this discussion.


Questions:

  - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and
    register the device provided (same as the btrfs device scan
    command or the BTRFS_IOC_SCAN_DEV ioctl),
    OR can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface
    to check the state of the device pool?

  - If the device in the argument is already mounted,
    can it straightaway return 0 (ready)? (As of now it would
    again independently read the SB, determine total_devices,
    and check against num_devices.)

  - What should the expected return be when the FS is mounted
    and there is a missing device?


Thanks, Anand


* Re: [systemd-devel] [survey]  BTRFS_IOC_DEVICES_READY return status
  2015-06-12 13:16 [survey] BTRFS_IOC_DEVICES_READY return status Anand Jain
@ 2015-06-12 18:04 ` Andrei Borzenkov
  2015-06-12 20:08   ` Goffredo Baroncelli
  2015-06-13  7:20 ` btrfs filesystem show confused when label is same as mountpoint Sjoerd
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Andrei Borzenkov @ 2015-06-12 18:04 UTC (permalink / raw)
  To: Anand Jain
  Cc: systemd-devel, linux-btrfs@vger.kernel.org >> linux-btrfs, dsterba

On Fri, 12 Jun 2015 21:16:30 +0800
Anand Jain <anand.jain@oracle.com> wrote:

> 
> 
> BTRFS_IOC_DEVICES_READY is to check if all the required devices
> are known by the btrfs kernel, so that admin/system-application
> could mount the FS. It is checked against a device in the argument.
> 
> However the actual implementation is bit more than just that,
> in the way that it would also scan and register the device
> provided in the argument (same as btrfs device scan subcommand
> or BTRFS_IOC_SCAN_DEV ioctl).
> 
> So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl,
> but its a write command as well.
> 
> Next, since in the kernel we only check if total_devices
> (read from SB)  is equal to num_devices (counted in the list)
> to state the status as 0 (ready) or 1 (not ready). But this
> does not work in rest of the device pool state like missing,
> seeding, replacing since total_devices is actually not equal
> to num_devices in these state but device pool is ready for
> the mount and its a bug which is not part of this discussions.
> 
> 
> Questions:
> 
>   - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and
>     register the device provided (same as btrfs device scan
>     command or the BTRFS_IOC_SCAN_DEV ioctl)
>     OR can BTRFS_IOC_DEVICES_READY be read-only ioctl interface
>     to check the state of the device pool. ?
> 

udev is using it to incrementally assemble multi-device btrfs, so in
this case I think it should. Are there any other users?

>   - If the the device in the argument is already mounted,
>     can it straightaway return 0 (ready) ? (as of now it would
>     again independently read the SB determine total_devices
>     and check against num_devices.
> 

I think yes; the obvious use case is btrfs mounted in the initrd,
followed by a later coldplug. There is no point in waiting for anything,
as the filesystem is obviously there.

>   - What should be the expected return when the FS is mounted
>     and there is a missing device.
> 

This is similar to the problem mdadm had to solve. mdadm starts a timer
as soon as enough RAID devices are present; if the timer expires before
the RAID is complete, the RAID is started in degraded mode. This avoids
spurious rebuilds. So it would be good if btrfs could distinguish between
enough devices to mount and all devices.
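
As a rough illustration of that distinction, the sketch below models a
three-state readiness answer plus an mdadm-style timer in Python. It is
not how btrfs or mdadm is implemented; the names PoolState, classify and
decide, and the max_missing parameter (how many devices the RAID profile
can tolerate losing) are assumptions made for the example.

```python
from enum import Enum


class PoolState(Enum):
    NOT_READY = "not ready"
    DEGRADED_OK = "enough devices to mount"
    COMPLETE = "all devices present"


def classify(total_devices, present_devices, max_missing):
    """Map device counts to a three-state readiness answer."""
    missing = total_devices - present_devices
    if missing <= 0:
        return PoolState.COMPLETE
    if missing <= max_missing:
        return PoolState.DEGRADED_OK
    return PoolState.NOT_READY


def decide(state, seconds_since_degraded_ok, timeout):
    """Return 'mount', 'mount-degraded' or 'wait'.

    A degraded mount is only chosen once the timer that started when
    the pool first became mountable has expired.
    """
    if state is PoolState.COMPLETE:
        return "mount"
    if state is PoolState.DEGRADED_OK and seconds_since_degraded_ok >= timeout:
        return "mount-degraded"
    return "wait"
```

With a 30 second timeout, a 4-device pool that tolerates one missing
device and has 3 devices present would wait at 10 seconds and mount
degraded at 30 seconds.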


* Re: [systemd-devel] [survey]  BTRFS_IOC_DEVICES_READY return status
  2015-06-12 18:04 ` [systemd-devel] " Andrei Borzenkov
@ 2015-06-12 20:08   ` Goffredo Baroncelli
  2015-06-13  9:35     ` Anand Jain
  0 siblings, 1 reply; 19+ messages in thread
From: Goffredo Baroncelli @ 2015-06-12 20:08 UTC (permalink / raw)
  To: Andrei Borzenkov, Anand Jain
  Cc: systemd-devel, linux-btrfs@vger.kernel.org >> linux-btrfs, dsterba

On 2015-06-12 20:04, Andrei Borzenkov wrote:
> On Fri, 12 Jun 2015 21:16:30 +0800
> Anand Jain <anand.jain@oracle.com> wrote:
> 
>>
>>
>> BTRFS_IOC_DEVICES_READY is to check if all the required devices
>> are known by the btrfs kernel, so that admin/system-application
>> could mount the FS. It is checked against a device in the argument.
>>
>> However the actual implementation is bit more than just that,
>> in the way that it would also scan and register the device
>> provided in the argument (same as btrfs device scan subcommand
>> or BTRFS_IOC_SCAN_DEV ioctl).
>>
>> So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl,
>> but its a write command as well.
>>
>> Next, since in the kernel we only check if total_devices
>> (read from SB)  is equal to num_devices (counted in the list)
>> to state the status as 0 (ready) or 1 (not ready). But this
>> does not work in rest of the device pool state like missing,
>> seeding, replacing since total_devices is actually not equal
>> to num_devices in these state but device pool is ready for
>> the mount and its a bug which is not part of this discussions.
>>
>>
>> Questions:
>>
>>   - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and
>>     register the device provided (same as btrfs device scan
>>     command or the BTRFS_IOC_SCAN_DEV ioctl)
>>     OR can BTRFS_IOC_DEVICES_READY be read-only ioctl interface
>>     to check the state of the device pool. ?
>>
> 
> udev is using it to incrementally assemble multi-device btrfs, so in
> this case I think it should. 

I agree, the ioctl name is confusing, but unfortunately this is an API and 
it has to stay here forever. Udev uses it, so we know for sure that it
is widely used.

> Are there any other users?
> 
>>   - If the the device in the argument is already mounted,
>>     can it straightaway return 0 (ready) ? (as of now it would
>>     again independently read the SB determine total_devices
>>     and check against num_devices.
>>
> 
> I think yes; obvious use case is btrfs mounted in initrd and later
> coldplug. There is no point to wait for anything as filesystem is
> obviously there.
> 
>>   - What should be the expected return when the FS is mounted
>>     and there is a missing device.

I suggest not investing further energy in an ioctl API. If you want this kind of information, you (we) should export it in sysfs.
In an ideal world:

- a new btrfs device appears
- udev registers it with BTRFS_IOC_SCAN_DEV
- udev (or mount?) checks the status of the filesystem by reading the sysfs entries (total devices, present devices, seed devices, raid level, ...); on the basis of the local policy (allow degraded mount, device timeout, how many devices are missing, filesystem redundancy level, ...) udev (mount) may mount the filesystem with the appropriate parameters (ro, degraded, or even insert a spare device to replace a missing one, ...)
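
The flow above could look roughly like the following user-space sketch.
The sysfs attribute names (total_devices, present_devices) and the
directory passed in are hypothetical: no such btrfs sysfs layout is
defined in this thread. The function names and the policy itself are
also only illustrative.

```python
import os


def read_pool_attrs(sysfs_dir):
    """Read every regular file in a (hypothetical) per-filesystem sysfs dir."""
    attrs = {}
    for name in os.listdir(sysfs_dir):
        path = os.path.join(sysfs_dir, name)
        if os.path.isfile(path):
            with open(path) as f:
                attrs[name] = f.read().strip()
    return attrs


def mount_options(attrs, allow_degraded):
    """Apply a local policy; return mount options, or None to keep waiting."""
    total = int(attrs["total_devices"])
    present = int(attrs["present_devices"])
    if present >= total:
        return ["rw"]
    if allow_degraded:
        # Policy choice for this sketch: degraded mounts are read-only.
        return ["degraded", "ro"]
    return None
```

The point of the split is that the kernel only exports facts, while the
decision (wait, mount, mount degraded) stays in user space.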

>>
> 
> This is similar to problem mdadm had to solve. mdadm starts timer as
> soon as enough raid devices are present; if timer expires before raid
> is complete, raid is started in degraded mode. This avoids spurious
> rebuilds. So it would be good if btrfs could distinguish between enough
> devices to mount and all devices.

These are two different things: how to export the filesystem information (I am still convinced that it has to be exported via sysfs), and what the system has to do in case of ... (a missing device?). The latter is policy, and I think that it should not reside in the kernel.




-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* btrfs filesystem show confused when label is same as mountpoint
  2015-06-12 13:16 [survey] BTRFS_IOC_DEVICES_READY return status Anand Jain
  2015-06-12 18:04 ` [systemd-devel] " Andrei Borzenkov
@ 2015-06-13  7:20 ` Sjoerd
  2015-06-13  9:51   ` Duncan
  2015-06-15 10:27 ` [survey] BTRFS_IOC_DEVICES_READY return status Lennart Poettering
  2015-06-15 15:01 ` David Sterba
  3 siblings, 1 reply; 19+ messages in thread
From: Sjoerd @ 2015-06-13  7:20 UTC (permalink / raw)
  To: BTRFS

Hi,

I've a btrfs partition with label 'MULTIMEDIA' (all capitals) and mounted it 
on /data/Multimedia (only M capital) and see the following when doing a btrfs 
fi show:

for mountpoint:
btrfs fi show /data/Multimedia
Btrfs v3.17

versus for label:
btrfs fi show MULTIMEDIA
Label: 'MULTIMEDIA'  uuid: ce5d23cd-73a4-4f7c-83cd-2c40d12f6697
        Total devices 4 FS bytes used 5.04TiB
        devid    1 size 1.48TiB used 1.26TiB path /dev/sda2
        devid    2 size 1.48TiB used 1.26TiB path /dev/sdc2
        devid    3 size 1.48TiB used 1.26TiB path /dev/sdd2
        devid    4 size 1.48TiB used 1.26TiB path /dev/sde2


So in the latter case I get the results I was looking for.


It's not really a question, but I couldn't find anything on the bugtracker (if 
it's a bug in the first place) or any known issue, so this is just to let you 
know, because it took me a while to figure out why I didn't get results for this 
particular mountpoint, while for others I did ;)

Cheers,
Sjoerd



* Re: [systemd-devel] [survey]  BTRFS_IOC_DEVICES_READY return status
  2015-06-12 20:08   ` Goffredo Baroncelli
@ 2015-06-13  9:35     ` Anand Jain
  2015-06-13 15:09       ` Goffredo Baroncelli
                         ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Anand Jain @ 2015-06-13  9:35 UTC (permalink / raw)
  To: kreijack, Andrei Borzenkov
  Cc: systemd-devel, linux-btrfs@vger.kernel.org >> linux-btrfs, dsterba


Thanks for your replies, Andrei and Goffredo.
More below...

On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
> On 2015-06-12 20:04, Andrei Borzenkov wrote:
>> On Fri, 12 Jun 2015 21:16:30 +0800
>> Anand Jain <anand.jain@oracle.com> wrote:
>>
>>>
>>>
>>> BTRFS_IOC_DEVICES_READY is to check if all the required devices
>>> are known by the btrfs kernel, so that admin/system-application
>>> could mount the FS. It is checked against a device in the argument.
>>>
>>> However the actual implementation is bit more than just that,
>>> in the way that it would also scan and register the device
>>> provided in the argument (same as btrfs device scan subcommand
>>> or BTRFS_IOC_SCAN_DEV ioctl).
>>>
>>> So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl,
>>> but its a write command as well.
>>>
>>> Next, since in the kernel we only check if total_devices
>>> (read from SB)  is equal to num_devices (counted in the list)
>>> to state the status as 0 (ready) or 1 (not ready). But this
>>> does not work in rest of the device pool state like missing,
>>> seeding, replacing since total_devices is actually not equal
>>> to num_devices in these state but device pool is ready for
>>> the mount and its a bug which is not part of this discussions.
>>>
>>>
>>> Questions:
>>>
>>>    - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and
>>>      register the device provided (same as btrfs device scan
>>>      command or the BTRFS_IOC_SCAN_DEV ioctl)
>>>      OR can BTRFS_IOC_DEVICES_READY be read-only ioctl interface
>>>      to check the state of the device pool. ?
>>>
>>
>> udev is using it to incrementally assemble multi-device btrfs, so in
>> this case I think it should.

  Nice. Thanks for letting me know this.

> I agree, the ioctl name is confusing, but unfortunately this is an API and
> it has to be stay here forever. Udev uses it, so we know for sure that it
> is widely used.

  OK. What goes in stays there forever. It's time to update
  the man page instead.

>> Are there any other users?
>>
>>>    - If the the device in the argument is already mounted,
>>>      can it straightaway return 0 (ready) ? (as of now it would
>>>      again independently read the SB determine total_devices
>>>      and check against num_devices.
>>>
>>
>> I think yes; obvious use case is btrfs mounted in initrd and later
>> coldplug. There is no point to wait for anything as filesystem is
>> obviously there.
>>

  There is a small difference. Suppose the device is already mounted
  and there are two device paths, PA and PB, for the same device.
  The path last given to either 'btrfs dev scan (BTRFS_IOC_SCAN_DEV)'
  or 'btrfs device ready (BTRFS_IOC_DEVICES_READY)' is the one shown
  in the 'btrfs filesystem show' or '/proc/self/mounts' output.
  This does not mean that the btrfs kernel closes the first device path
  and reopens the second one; it just updates the device path
  in the kernel.

  Further, the problem is worse in this example:
  you use dd to copy device A to device B.
  After you mount device A, just providing device B to either of
  the above two commands lets the kernel update the device path.
  All the I/O (since the device is mounted) still goes to
  device A (not B), but /proc/self/mounts and 'btrfs fi show'
  show it as device B (not A).

  It's a bug, and very tricky to fix.

   - We can't return -EBUSY for subsequent (after mount) calls
   to the above two ioctls (if a mounted device is used as the
   argument), since the admin or a system application might
   actually call them again to mount subvols.

   - We can return success (without updating the device path), but
   we would be wrong when device A has been copied to device B using
   dd, since we would check against the on-device SB's fsid/uuid/devid.
   Comparing the device paths with strcmp is not practical, since there
   can be different paths to the same device (let's say mapper).

   (Any suggestions on how to check whether it's the same device in
   the kernel?)

   - Also, if we don't allow the device path to be updated after the
   device is mounted, is there a chance that we would be stuck with
   the device path from the initrd, which does not make any sense to
   the user?


>>>    - What should be the expected return when the FS is mounted
>>>      and there is a missing device.
>
> I suggest to not invest further energy on a ioctl API. If you want these kind of information, you (we) should export these in sysfs:
> In an ideal world:
>
> - a new btrfs device appears
> - udev register it with BTRFS_IOC_SCAN_DEV:
> - udev (or mount ?) checks the status of the filesystem reading the sysfs entries (total devices, present devices, seed devices, raid level....); on the basis of the local policy (allow degraded mount, device timeout, how many device are missing, filesystem redundancy level.....) udev (mount) may mount the filesystem with the appropriate parameter (ro, degraded, or even insert a spare device to correct a missing device....)

  Yes, a sysfs interface is coming. A few framework patches were sent
  some time back; any comments will help. On the ioctl part I am trying
  to fix the bug(s).

>>>
>>
>> This is similar to problem mdadm had to solve. mdadm starts timer as
>> soon as enough raid devices are present; if timer expires before raid
>> is complete, raid is started in degraded mode. This avoids spurious
>> rebuilds. So it would be good if btrfs could distinguish between enough
>> devices to mount and all devices.

> These are two different things: how export the filesystem information (I am still convinced that these have to be exported via sysfs), and what the system has to do in case of ... (a missing device ?). The latter is a policy, and I think that it should be not rely in the kernel.
>
>


* Re: btrfs filesystem show confused when label is same as mountpoint
  2015-06-13  7:20 ` btrfs filesystem show confused when label is same as mountpoint Sjoerd
@ 2015-06-13  9:51   ` Duncan
  2015-06-25 16:37     ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: Duncan @ 2015-06-13  9:51 UTC (permalink / raw)
  To: linux-btrfs

Sjoerd posted on Sat, 13 Jun 2015 09:20:12 +0200 as excerpted:

> versus for label:
> btrfs fi show MULTIMEDIA
> Label: 'MULTIMEDIA'  uuid: ce5d23cd-73a4-4f7c-83cd-2c40d12f6697

Hmm... I wasn't even aware that you could /use/ label!  But sure enough, 
it works here, too:

btrfs fi show rt0238gcnx+35l0
Label: 'rt0238gcnx+35l0'  uuid: 8f8d79ef-a86f-4306-a255-e0519e0f6132
        Total devices 2 FS bytes used 1.94GiB
        devid    1 size 8.00GiB used 3.78GiB path /dev/sda5
        devid    2 size 8.00GiB used 3.78GiB path /dev/sdb5

btrfs-progs v4.0.1


It works for UUID as well...

btrfs fi show 8f8d79ef-a86f-4306-a255-e0519e0f6132
Label: 'rt0238gcnx+35l0'  uuid: 8f8d79ef-a86f-4306-a255-e0519e0f6132
        Total devices 2 FS bytes used 1.94GiB
        devid    1 size 8.00GiB used 3.78GiB path /dev/sda5
        devid    2 size 8.00GiB used 3.78GiB path /dev/sdb5

btrfs-progs v4.0.1

... but that's a lot of arbitrary typing.

Doesn't work with partlabel or id (see /dev/disk/by-*), however. =:^(

Anyway, thanks!  Learned something new about btrfs fi show, today! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: [systemd-devel] [survey]  BTRFS_IOC_DEVICES_READY return status
  2015-06-13  9:35     ` Anand Jain
@ 2015-06-13 15:09       ` Goffredo Baroncelli
       [not found]         ` <pan$63061$a3cdf5f6$a390adbd$e6097ad9@cox.net>
  2015-06-15 10:46         ` Lennart Poettering
  2015-06-14  5:48       ` Andrei Borzenkov
  2015-06-15 10:41       ` Lennart Poettering
  2 siblings, 2 replies; 19+ messages in thread
From: Goffredo Baroncelli @ 2015-06-13 15:09 UTC (permalink / raw)
  To: Anand Jain, Andrei Borzenkov
  Cc: systemd-devel, linux-btrfs@vger.kernel.org >> linux-btrfs, dsterba

On 2015-06-13 11:35, Anand Jain wrote:
> 
> Thanks for your reply Andrei and Goffredo. more below...
> 
> On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
>> On 2015-06-12 20:04, Andrei Borzenkov wrote:
>>> On Fri, 12 Jun 2015 21:16:30 +0800 Anand Jain
>>> <anand.jain@oracle.com> wrote:
>>> 
>>>> 
>>>> 
>>>> BTRFS_IOC_DEVICES_READY is to check if all the required
>>>> devices are known by the btrfs kernel, so that
>>>> admin/system-application could mount the FS. It is checked
>>>> against a device in the argument.
>>>> 
>>>> However the actual implementation is bit more than just that, 
>>>> in the way that it would also scan and register the device 
>>>> provided in the argument (same as btrfs device scan subcommand 
>>>> or BTRFS_IOC_SCAN_DEV ioctl).
>>>> 
>>>> So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl, 
>>>> but its a write command as well.
>>>> 
>>>> Next, since in the kernel we only check if total_devices (read
>>>> from SB)  is equal to num_devices (counted in the list) to
>>>> state the status as 0 (ready) or 1 (not ready). But this does
>>>> not work in rest of the device pool state like missing, 
>>>> seeding, replacing since total_devices is actually not equal to
>>>> num_devices in these state but device pool is ready for the
>>>> mount and its a bug which is not part of this discussions.
>>>> 
>>>> 
>>>> Questions:
>>>> 
>>>> - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and 
>>>> register the device provided (same as btrfs device scan command
>>>> or the BTRFS_IOC_SCAN_DEV ioctl) OR can BTRFS_IOC_DEVICES_READY
>>>> be read-only ioctl interface to check the state of the device
>>>> pool. ?
>>>> 
>>> 
>>> udev is using it to incrementally assemble multi-device btrfs, so
>>> in this case I think it should.
> 
> Nice. Thanks for letting me know this.
> 
>> I agree, the ioctl name is confusing, but unfortunately this is an
>> API and it has to be stay here forever. Udev uses it, so we know
>> for sure that it is widely used.
> 
> ok. what goes in stays there forever. its time to update the man page
> rather.
> 
>>> Are there any other users?
>>> 
>>>> - If the the device in the argument is already mounted, can it
>>>> straightaway return 0 (ready) ? (as of now it would again
>>>> independently read the SB determine total_devices and check
>>>> against num_devices.
>>>> 
>>> 
>>> I think yes; obvious use case is btrfs mounted in initrd and
>>> later coldplug. There is no point to wait for anything as
>>> filesystem is obviously there.
>>> 
> 
> There is little difference. If the device is already mounted. And
> there are two device paths for the same device PA and PB. The path as
> last given to either 'btrfs dev scan (BTRFS_IOC_SCAN_DEV)' or 'btrfs
> device ready (BTRFS_IOC_DEVICES_READY)' will be shown in the 'btrfs
> filesystem show' or '/proc/self/mounts' output. It does not mean that
> btrfs kernel will close the first device path and reopen the 2nd
> given device path, it just updates the device path in the kernel.
> 
> Further, the problem will be more intense in this eg. if you use dd
> and copy device A to device B. After you mount device A, by just
> providing device B in the above two commands you could let kernel
> update the device path, again all the IO (since device is mounted)
> are still going to the device A (not B), but /proc/self/mounts and
> 'btrfs fi show' shows it as device B (not A).
> 
> Its a bug. very tricky to fix.

In the past [*] I proposed a mount.btrfs helper. I tried to move the logic outside the kernel.
I think that the problem is that we try to manage all these cases from a device point of view: when a device appears, we register the device and we try to mount the filesystem... This works very well for a 1-volume filesystem. For the other cases there is a mess between the different layers:
- kernel
- udev/systemd
- initrd logic

My attempt followed a different idea: the mount helper waits for the devices if needed, or, where appropriate, mounts the filesystem in degraded mode. All devices are passed as mount arguments (--device=/dev/sdX) and there is no device registration: this avoids all these problems.

[*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767

Back to your questions:

> - we can't return -EBUSY for subsequent (after mount) calls for the
> above two ioctls (if a mounted device is used as an argument). Since
> admin/system-application might actually call again to mount subvols.

I am not sure that the two things are related: mount doesn't use BTRFS_IOC_DEVICES_READY. After BTRFS_IOC_DEVICES_READY returns OK, all the filesystems belonging to this FSID should be mounted, but that is the job of systemd/initramfs/sysv... A further failed BTRFS_IOC_DEVICES_READY shouldn't cause any problem...


> 
> - we can return success (without updating the device path) but, we
> would be wrong when device A is copied into device B using dd. Since
> we would check against the on device SB's fsid/uuid/devid. Checking
> using strcmp the device paths is not practical since there can be
> different paths to the same device (lets says mapper).

> 
> (any suggestion on how to check if its the same device in the 
> kernel?).

Check major/minor?
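
A user-space analogue of that idea, as a hedged sketch: two paths name
the same device node when the major/minor numbers of what they resolve
to are equal. The helper names are made up for illustration; inside the
kernel the equivalent would be comparing dev_t values, which is not
shown here.

```python
import os
import stat


def device_numbers(path):
    """Return (major, minor) of the device node that path resolves to."""
    st = os.stat(path)  # follows symlinks such as /dev/disk/by-*/ entries
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError(f"{path} is not a device node")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def same_device(path_a, path_b):
    """True if both paths resolve to the same major/minor pair."""
    return device_numbers(path_a) == device_numbers(path_b)
```

This tells aliases of one node apart from a dd copy on another disk,
which string comparison of paths cannot do. It would not recognise two
multipath legs to the same physical disk, since those have different
device numbers.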

> 
> - Also if we don't let to update the device path after device is 
> mounted, then are there chances that we would be stuck with the 
> device path during initrd which does not make any sense to the user
> ?
> 
> 
>>>> - What should be the expected return when the FS is mounted and
>>>> there is a missing device.
>> 
>> I suggest to not invest further energy on a ioctl API. If you want
>> these kind of information, you (we) should export these in sysfs: 
>> In an ideal world:
>> 
>> - a new btrfs device appears - udev register it with
>> BTRFS_IOC_SCAN_DEV: - udev (or mount ?) checks the status of the
>> filesystem reading the sysfs entries (total devices, present
>> devices, seed devices, raid level....); on the basis of the local
>> policy (allow degraded mount, device timeout, how many device are
>> missing, filesystem redundancy level.....) udev (mount) may mount
>> the filesystem with the appropriate parameter (ro, degraded, or
>> even insert a spare device to correct a missing device....)
> 
> Yes. sysfs interface is coming. few framework patch were sent
> sometime back, any comments will help. On the ioctl part I am trying
> to fix the bug(s).




> 
>>>> 
>>> 
>>> This is similar to problem mdadm had to solve. mdadm starts timer
>>> as soon as enough raid devices are present; if timer expires
>>> before raid is complete, raid is started in degraded mode. This
>>> avoids spurious rebuilds. So it would be good if btrfs could
>>> distinguish between enough devices to mount and all devices.
> 
>> These are two different things: how export the filesystem
>> information (I am still convinced that these have to be exported
>> via sysfs), and what the system has to do in case of ... (a missing
>> device ?). The latter is a policy, and I think that it should be
>> not rely in the kernel.
>> 
>> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5



* Re: [systemd-devel] [survey]  BTRFS_IOC_DEVICES_READY return status
  2015-06-13  9:35     ` Anand Jain
  2015-06-13 15:09       ` Goffredo Baroncelli
@ 2015-06-14  5:48       ` Andrei Borzenkov
  2015-06-15 10:41       ` Lennart Poettering
  2 siblings, 0 replies; 19+ messages in thread
From: Andrei Borzenkov @ 2015-06-14  5:48 UTC (permalink / raw)
  To: Anand Jain
  Cc: kreijack, systemd-devel,
	linux-btrfs@vger.kernel.org >> linux-btrfs, dsterba

On Sat, 13 Jun 2015 17:35:53 +0800
Anand Jain <anand.jain@oracle.com> wrote:

> 
> Thanks for your reply Andrei and Goffredo.
> more below...
> 
> On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
> > On 2015-06-12 20:04, Andrei Borzenkov wrote:
> >> On Fri, 12 Jun 2015 21:16:30 +0800
> >> Anand Jain <anand.jain@oracle.com> wrote:
> >>
> >>>
> >>>
> >>> BTRFS_IOC_DEVICES_READY is to check if all the required devices
> >>> are known by the btrfs kernel, so that admin/system-application
> >>> could mount the FS. It is checked against a device in the argument.
> >>>
> >>> However the actual implementation is bit more than just that,
> >>> in the way that it would also scan and register the device
> >>> provided in the argument (same as btrfs device scan subcommand
> >>> or BTRFS_IOC_SCAN_DEV ioctl).
> >>>
> >>> So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl,
> >>> but its a write command as well.
> >>>
> >>> Next, since in the kernel we only check if total_devices
> >>> (read from SB)  is equal to num_devices (counted in the list)
> >>> to state the status as 0 (ready) or 1 (not ready). But this
> >>> does not work in rest of the device pool state like missing,
> >>> seeding, replacing since total_devices is actually not equal
> >>> to num_devices in these state but device pool is ready for
> >>> the mount and its a bug which is not part of this discussions.
> >>>
> >>>
> >>> Questions:
> >>>
> >>>    - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and
> >>>      register the device provided (same as btrfs device scan
> >>>      command or the BTRFS_IOC_SCAN_DEV ioctl)
> >>>      OR can BTRFS_IOC_DEVICES_READY be read-only ioctl interface
> >>>      to check the state of the device pool. ?
> >>>
> >>
> >> udev is using it to incrementally assemble multi-device btrfs, so in
> >> this case I think it should.
> 
>   Nice. Thanks for letting me know this.
> 
> > I agree, the ioctl name is confusing, but unfortunately this is an API and
> > it has to be stay here forever. Udev uses it, so we know for sure that it
> > is widely used.
> 
>   ok. what goes in stays there forever. its time to update
>   the man page rather.
> 
> >> Are there any other users?
> >>
> >>>    - If the the device in the argument is already mounted,
> >>>      can it straightaway return 0 (ready) ? (as of now it would
> >>>      again independently read the SB determine total_devices
> >>>      and check against num_devices.
> >>>
> >>
> >> I think yes; obvious use case is btrfs mounted in initrd and later
> >> coldplug. There is no point to wait for anything as filesystem is
> >> obviously there.
> >>
> 
>   There is little difference. If the device is already mounted.
>   And there are two device paths for the same device PA and PB.
>   The path as last given to either 'btrfs dev scan (BTRFS_IOC_SCAN_DEV)'
>   or 'btrfs device ready (BTRFS_IOC_DEVICES_READY)' will be shown
>   in the 'btrfs filesystem show' or '/proc/self/mounts' output.
>   It does not mean that btrfs kernel will close the first device path
>   and reopen the 2nd given device path, it just updates the device path
>   in the kernel.
> 
>   Further, the problem will be more intense in this eg.
>   if you use dd and copy device A to device B.
>   After you mount device A, by just providing device B in the
>   above two commands you could let kernel update the device path,
>   again all the IO (since device is mounted) are still going to
>   the device A (not B), but /proc/self/mounts and 'btrfs fi show'
>   shows it as device B (not A).
> 
>   Its a bug. very tricky to fix.
> 
>    - we can't return -EBUSY for subsequent (after mount) calls
>    for the above two ioctls (if a mounted device is used as an argument).
>    Since admin/system-application might actually call again to
>    mount subvols.
> 
>    - we can return success (without updating the device path) but,
>    we would be wrong when device A is copied into device B using dd.
>    Since we would check against the on device SB's fsid/uuid/devid.
>    Checking using strcmp the device paths is not practical since there
>    can be different paths to the same device (lets says mapper).
> 

Neither of those problems is specific to a mounted filesystem. The
order of device discovery is non-deterministic. If you duplicate devices
(snapshot, dd) it is unpredictable which devices will be included in
the btrfs filesystem. I.e. if you have A, B, C and A1, B1, C1, the
filesystem could be assembled as A, B, C1 and on the next boot as A, B1, C.

Other systems attempt to mitigate such situations by keeping track of
both on-disk identification and physical device properties (e.g.
changing the LU number will cause VMware to block access to the disk on
the assumption that it is a snapshot). One possibility is to store the
disk's physical identity (UUID, serial number) and compare it on access.

Unless this is done, the only way to guard against such a case is to
perform a full device scan and block any attempt to mount a filesystem
that has duplicated members until the admin resolves the issue. If the
filesystem is already mounted, any attempt to add a duplicated member
must be rejected.
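
To illustrate the kind of check I mean, here is a rough user-space sketch
(not kernel code). It assumes a full scan yields (path, fsid, devid)
tuples read from each superblock; that tuple layout and both function
names are made up for this example.

```python
from collections import defaultdict


def find_duplicate_members(scanned):
    """Return {(fsid, devid): [paths]} for identities seen on more than one path.

    `scanned` is an iterable of (path, fsid, devid) tuples. This layout is a
    hypothetical stand-in for whatever a full device scan would report.
    """
    by_identity = defaultdict(list)
    for path, fsid, devid in scanned:
        by_identity[(fsid, devid)].append(path)
    return {
        ident: sorted(paths)
        for ident, paths in by_identity.items()
        if len(paths) > 1
    }


def may_mount(scanned, fsid):
    """Refuse to mount `fsid` while any of its members is duplicated."""
    duplicates = find_duplicate_members(scanned)
    return not any(ident[0] == fsid for ident in duplicates)
```

Note that this only detects the duplication. Deciding which copy is the
real one would still need the physical identity mentioned above.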

>    (any suggestion on how to check if its the same device in the
>    kernel?).
> 

I do not know the kernel interfaces, but querying for unique device
identification sounds like the most generic approach. OTOH there are
multiple supported methods and that is already what user space does
quite well, so could it be done in cooperation with user space?

>    - Also if we don't let to update the device path after device is
>    mounted, then are there chances that we would be stuck with the
>    device path during initrd which does not make any sense to the
>    user ?
> 

I think this is independent. udev is only concerned with "is btrfs
ready to be mounted" or not. I do not see why it would prevent updating
device paths in kernel if needed.

> 
> >>>    - What should be the expected return when the FS is mounted
> >>>      and there is a missing device.
> >
> > I suggest to not invest further energy on a ioctl API. If you want these kind of information, you (we) should export these in sysfs:
> > In an ideal world:
> >
> > - a new btrfs device appears
> > - udev register it with BTRFS_IOC_SCAN_DEV:
> > - udev (or mount ?) checks the status of the filesystem reading the sysfs entries (total devices, present devices, seed devices, raid level....); on the basis of the local policy (allow degraded mount, device timeout, how many device are missing, filesystem redundancy level.....) udev (mount) may mount the filesystem with the appropriate parameter (ro, degraded, or even insert a spare device to correct a missing device....)
> 
>   Yes. sysfs interface is coming. few framework patch were sent sometime
>   back, any comments will help. On the ioctl part I am trying to fix the
>   bug(s).
> 

As we need to call into btrfs anyway, it would actually be easier if
BTRFS_IOC_DEVICES_READY could return both an indication that btrfs can be
mounted and whether it is incomplete. It already has all the information
and it saves extra calls each time. But as the ioctl is established API
now, it probably won't happen (BTRFS_IOC_DEVICES_READY_V2?)
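
As a sketch of what such a combined answer could look like (purely
hypothetical, no such interface exists): a three-way result derived from
total_devices, num_devices and a tolerated number of missing devices.
The names PoolState, classify_pool and max_missing are assumptions for
illustration, and the sketch ignores the seeding and replace cases where
the two counters legitimately differ.

```python
from enum import Enum


class PoolState(Enum):
    NOT_READY = "not-ready"          # too few devices even for a degraded mount
    DEGRADED = "mountable-degraded"  # mountable, but some devices are missing
    COMPLETE = "complete"            # all devices are known


def classify_pool(total_devices, num_devices, max_missing):
    """Classify a device pool from superblock and scan counters.

    total_devices: device count recorded in the superblock.
    num_devices:   devices currently registered for this filesystem.
    max_missing:   hypothetical policy input, e.g. 1 for a two-copy profile.
    """
    missing = total_devices - num_devices
    if missing <= 0:
        return PoolState.COMPLETE
    if missing <= max_missing:
        return PoolState.DEGRADED
    return PoolState.NOT_READY
```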

> >>>
> >>
> >> This is similar to problem mdadm had to solve. mdadm starts timer as
> >> soon as enough raid devices are present; if timer expires before raid
> >> is complete, raid is started in degraded mode. This avoids spurious
> >> rebuilds. So it would be good if btrfs could distinguish between enough
> >> devices to mount and all devices.
> 
> > These are two different things: how export the filesystem information (I am still convinced that these have to be exported via sysfs), and what the system has to do in case of ... (a missing device ?). The latter is a policy, and I think that it should be not rely in the kernel.
> >
> >


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
       [not found]         ` <pan$63061$a3cdf5f6$a390adbd$e6097ad9@cox.net>
@ 2015-06-14 19:44           ` Goffredo Baroncelli
  0 siblings, 0 replies; 19+ messages in thread
From: Goffredo Baroncelli @ 2015-06-14 19:44 UTC (permalink / raw)
  To: systemd-devel, linux-btrfs, 1i5t5.duncan

On 2015-06-14 06:05, Duncan wrote:
> Goffredo Baroncelli posted on Sat, 13 Jun 2015 17:09:19 +0200 as
> excerpted:
> 
>> My attempt followed a different idea: the mount helper waits the devices
>> if needed, or if it is the case it mounts the filesystem in degraded
>> mode.
>> All devices are passed as mount arguments (--device=/dev/sdX), there is
>> no a device registration: this avoids all these problems.
>>
>> [*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767
> 
> But /dev/sdX doesn't always work, because, for instance, my usual /dev/sdb 
> was slow to respond on my last boot, and currently appears as /dev/sdf, 
> with sdb/c/d/e being my (multi-type) sdcard, etc, adapter, medialess.

Please take a look at my patch.

You may mount the filesystem in different ways:
- by device (/dev/sdxxx)
- by UUID (UUID=)
- by LABEL (LABEL=)

The helper finds the right devices and, if needed, waits for the other devices.
When it has collected all the devices, these are passed to the kernel via
the "device=/dev/sdx" mount option, so the registration would no longer be needed.
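
A minimal sketch of that last step, assuming the helper has already
resolved the member devices. The function name is made up and this is
not the code from the patch; only the btrfs "device=" and "degraded"
mount options are real.

```python
def build_mount_options(device_paths, extra_options=()):
    """Build a btrfs mount option string that names every member device.

    device_paths:  member device nodes already resolved by the helper.
    extra_options: further options, e.g. ("degraded",) when devices are missing.
    """
    options = [f"device={path}" for path in device_paths]
    options.extend(extra_options)
    return ",".join(options)
```

The resulting string would be handed to mount via -o, for example
"device=/dev/sda5,device=/dev/sdb5,degraded".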

> 
> Tho if /dev/disk/by-*/* works, I could use that.  Tho AFAIK it's udev 
> that fills that in, so udev would be necessary.

I never wrote that udev is not necessary. I only think that relying on udev
to handle a multi-volume filesystem is too complicated. The responsibility
is spread across too many layers.



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [survey]  BTRFS_IOC_DEVICES_READY return status
  2015-06-12 13:16 [survey] BTRFS_IOC_DEVICES_READY return status Anand Jain
  2015-06-12 18:04 ` [systemd-devel] " Andrei Borzenkov
  2015-06-13  7:20 ` btrfs filesystem show confused when label is same as mountpoint Sjoerd
@ 2015-06-15 10:27 ` Lennart Poettering
  2015-06-15 15:01 ` David Sterba
  3 siblings, 0 replies; 19+ messages in thread
From: Lennart Poettering @ 2015-06-15 10:27 UTC (permalink / raw)
  To: Anand Jain
  Cc: systemd-devel, linux-btrfs@vger.kernel.org >> linux-btrfs, dsterba

On Fri, 12.06.15 21:16, Anand Jain (anand.jain@oracle.com) wrote:

> 
> 
> BTRFS_IOC_DEVICES_READY is to check if all the required devices
> are known by the btrfs kernel, so that admin/system-application
> could mount the FS. It is checked against a device in the argument.
> 
> However the actual implementation is bit more than just that,
> in the way that it would also scan and register the device
> provided in the argument (same as btrfs device scan subcommand
> or BTRFS_IOC_SCAN_DEV ioctl).
> 
> So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl,
> but its a write command as well.
> 
> Next, since in the kernel we only check if total_devices
> (read from SB)  is equal to num_devices (counted in the list)
> to state the status as 0 (ready) or 1 (not ready). But this
> does not work in rest of the device pool state like missing,
> seeding, replacing since total_devices is actually not equal
> to num_devices in these state but device pool is ready for
> the mount and its a bug which is not part of this discussions.
> 
> 
> Questions:
> 
>  - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and
>    register the device provided (same as btrfs device scan
>    command or the BTRFS_IOC_SCAN_DEV ioctl)
>    OR can BTRFS_IOC_DEVICES_READY be read-only ioctl interface
>    to check the state of the device pool. ?

I am pretty sure the kernel should not change API on this now. Hence:
stick to the current behaviour, please.

>  - If the the device in the argument is already mounted,
>    can it straightaway return 0 (ready) ? (as of now it would
>    again independently read the SB determine total_devices
>    and check against num_devices.

Yeah, I figure that might make sense to do.

>  - What should be the expected return when the FS is mounted
>    and there is a missing device.

An error, as it already does.

I am pretty sure that mounting "degraded" file systems should be an
exceptional operation, and not the common scheme. If it should happen
automatically at all, then it should be triggered by some daemon or
so, but not by udev/systemd.

Lennart

-- 
Lennart Poettering, Red Hat

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
  2015-06-13  9:35     ` Anand Jain
  2015-06-13 15:09       ` Goffredo Baroncelli
  2015-06-14  5:48       ` Andrei Borzenkov
@ 2015-06-15 10:41       ` Lennart Poettering
  2 siblings, 0 replies; 19+ messages in thread
From: Lennart Poettering @ 2015-06-15 10:41 UTC (permalink / raw)
  To: Anand Jain
  Cc: kreijack, Andrei Borzenkov, systemd-devel, dsterba,
	linux-btrfs@vger.kernel.org >> linux-btrfs

On Sat, 13.06.15 17:35, Anand Jain (anand.jain@oracle.com) wrote:

> >>Are there any other users?
> >>
> >>>   - If the the device in the argument is already mounted,
> >>>     can it straightaway return 0 (ready) ? (as of now it would
> >>>     again independently read the SB determine total_devices
> >>>     and check against num_devices.
> >>>
> >>
> >>I think yes; obvious use case is btrfs mounted in initrd and later
> >>coldplug. There is no point to wait for anything as filesystem is
> >>obviously there.
> >>
> 
>  There is little difference. If the device is already mounted.
>  And there are two device paths for the same device PA and PB.
>  The path as last given to either 'btrfs dev scan (BTRFS_IOC_SCAN_DEV)'
>  or 'btrfs device ready (BTRFS_IOC_DEVICES_READY)' will be shown
>  in the 'btrfs filesystem show' or '/proc/self/mounts' output.
>  It does not mean that btrfs kernel will close the first device path
>  and reopen the 2nd given device path, it just updates the device path
>  in the kernel.

The device paths shown in /proc/self/mountinfo are also weird in other
cases: if people boot up without an initrd, and use a btrfs fs as root,
then it will always carry the string /dev/root in there, which is
completely useless, since such a device never exists in userspace or
/sys, and hence one cannot make sense of it. Moreover, if one then asks
the kernel for the devices backing the btrfs fs via the ioctl it will
also return /dev/root for it, which is really useless.

I think in general I'd prefer if btrfs would stop returning the device
paths it got from userspace or the kernel, and would always return
sanitized ones that use the official kernel names for the devices in
them. Specifically, the member devices ioctl should always return
names like "/dev/sda5", even if I mount something using root= on the
kernel cmdline, or if I mount /dev/disk/by-uuid/.... via a symlink
instead of the real kernel name of the device.

Then, I think it would be a good idea to always update the device
string shown in /proc/self/mountinfo to be a concatenated version of
the list of device names reported by the ioctl. So that a btrfs RAID
would show "/dev/sda5:/dev/sdb6:/dev/sdc5" or so. And if I remove or
add backing devices the string really should be updated.

The btrfs client side tools then could use udev to get a list of the
device node symlinks for each device to help the user identifying
which backing devices belong to a btrfs pool.
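
To make the idea concrete, here is a small user-space approximation. It
is only a sketch: the kernel would use its own device names rather than
resolve symlinks, and the function name is an assumption for this example.

```python
import os


def canonical_device_string(paths):
    """Resolve symlinks, drop repeated devices, and join the names with ':'.

    os.path.realpath() stands in for "the official kernel name": it turns a
    /dev/disk/by-uuid/... symlink into the device node it points to.
    """
    resolved = []
    for path in paths:
        real = os.path.realpath(path)
        if real not in resolved:
            resolved.append(real)
    return ":".join(resolved)
```

Given a by-uuid symlink to /dev/sda5 plus /dev/sdb6, this would yield
"/dev/sda5:/dev/sdb6", regardless of which path was used at mount time.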

Lennart

-- 
Lennart Poettering, Red Hat

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
  2015-06-13 15:09       ` Goffredo Baroncelli
       [not found]         ` <pan$63061$a3cdf5f6$a390adbd$e6097ad9@cox.net>
@ 2015-06-15 10:46         ` Lennart Poettering
  2015-06-15 17:23           ` Goffredo Baroncelli
  1 sibling, 1 reply; 19+ messages in thread
From: Lennart Poettering @ 2015-06-15 10:46 UTC (permalink / raw)
  To: kreijack
  Cc: Anand Jain, Andrei Borzenkov, systemd-devel, dsterba,
	linux-btrfs@vger.kernel.org >> linux-btrfs

On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreijack@libero.it) wrote:

> > Further, the problem will be more intense in this eg. if you use dd
> > and copy device A to device B. After you mount device A, by just
> > providing device B in the above two commands you could let kernel
> > update the device path, again all the IO (since device is mounted)
> > are still going to the device A (not B), but /proc/self/mounts and
> > 'btrfs fi show' shows it as device B (not A).
> > 
> > Its a bug. very tricky to fix.
> 
> In the past [*] I proposed a mount.btrfs helper . I tried to move the logic outside the kernel.
> I think that the problem is that we try to manage all these cases
> from a device point of view: when a device appears, we register the
> device and we try to mount the filesystem... This works very well
> when there is 1-volume filesystem. For the other cases there is a
> mess between the different layers:

> - kernel
> - udev/systemd
> - initrd logic
> 
> My attempt followed a different idea: the mount helper waits the
> devices if needed, or if it is the case it mounts the filesystem in
> degraded mode. All devices are passed as mount arguments
> (--device=/dev/sdX), there is no a device registration: this avoids
> all these problems.

Hmm, no. /bin/mount should not block for devices. That's generally
incompatible with how the tool is used, and in particular from
systemd. We would not make use of such a scheme in
systemd. /bin/mount should always be short-running.

I am pretty sure that if such automatic degraded mounting should be
supported, then this should be done with some background storage
daemon that alters the effect of the READY ioctl somehow after the
timeout, and then retriggers the devices so that systemd takes
note. (or, alternatively: such a scheme could even be implemented all
in kernel, based on some configurable kernel setting...)

Lennart

-- 
Lennart Poettering, Red Hat

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [survey]  BTRFS_IOC_DEVICES_READY return status
  2015-06-12 13:16 [survey] BTRFS_IOC_DEVICES_READY return status Anand Jain
                   ` (2 preceding siblings ...)
  2015-06-15 10:27 ` [survey] BTRFS_IOC_DEVICES_READY return status Lennart Poettering
@ 2015-06-15 15:01 ` David Sterba
  3 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2015-06-15 15:01 UTC (permalink / raw)
  To: Anand Jain
  Cc: systemd-devel, linux-btrfs@vger.kernel.org >> linux-btrfs, lennart

On Fri, Jun 12, 2015 at 09:16:30PM +0800, Anand Jain wrote:
> BTRFS_IOC_DEVICES_READY is to check if all the required devices
> are known by the btrfs kernel, so that admin/system-application
> could mount the FS. It is checked against a device in the argument.
> 
> However the actual implementation is bit more than just that,
> in the way that it would also scan and register the device
> provided in the argument (same as btrfs device scan subcommand
> or BTRFS_IOC_SCAN_DEV ioctl).
> 
> So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl,
> but its a write command as well.

The implemented DEVICES_READY behaviour is intentional, but not a good
example of ioctl interface design. I asked for a more generic interface
for querying devices when this patch was submitted, but to no avail.

> Next, since in the kernel we only check if total_devices
> (read from SB)  is equal to num_devices (counted in the list)
> to state the status as 0 (ready) or 1 (not ready). But this
> does not work in rest of the device pool state like missing,
> seeding, replacing since total_devices is actually not equal
> to num_devices in these state but device pool is ready for
> the mount and its a bug which is not part of this discussions.

That's an example of why the single-shot ioctl is bad - it relies on some
internal state that's otherwise nontrivial to get.

> Questions:
> 
>   - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and
>     register the device provided (same as btrfs device scan
>     command or the BTRFS_IOC_SCAN_DEV ioctl)
>     OR can BTRFS_IOC_DEVICES_READY be read-only ioctl interface
>     to check the state of the device pool. ?

This has been mentioned in the thread, we cannot change the ioctl that
way. Extensions are possible as far as they stay backward compatible
without changes to the existing users.

>   - If the the device in the argument is already mounted,
>     can it straightaway return 0 (ready) ? (as of now it would
>     again independently read the SB determine total_devices
>     and check against num_devices.

We can do that, looks like a safe optimization.

>   - What should be the expected return when the FS is mounted
>     and there is a missing device.

I think the current ioctl cannot give a good answer to that, similar to
the seeding or dev-replace case. We'd need an improved ioctl or do it
via sysfs which is my preference at the moment.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
  2015-06-15 10:46         ` Lennart Poettering
@ 2015-06-15 17:23           ` Goffredo Baroncelli
  2015-06-15 17:38             ` Lennart Poettering
  0 siblings, 1 reply; 19+ messages in thread
From: Goffredo Baroncelli @ 2015-06-15 17:23 UTC (permalink / raw)
  To: Lennart Poettering
  Cc: Anand Jain, Andrei Borzenkov, systemd-devel, dsterba,
	linux-btrfs@vger.kernel.org >> linux-btrfs

On 2015-06-15 12:46, Lennart Poettering wrote:
> On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreijack@libero.it) wrote:
> 
>>> Further, the problem will be more intense in this eg. if you use dd
>>> and copy device A to device B. After you mount device A, by just
>>> providing device B in the above two commands you could let kernel
>>> update the device path, again all the IO (since device is mounted)
>>> are still going to the device A (not B), but /proc/self/mounts and
>>> 'btrfs fi show' shows it as device B (not A).
>>>
>>> Its a bug. very tricky to fix.
>>
>> In the past [*] I proposed a mount.btrfs helper . I tried to move the logic outside the kernel.
>> I think that the problem is that we try to manage all these cases
>> from a device point of view: when a device appears, we register the
>> device and we try to mount the filesystem... This works very well
>> when there is 1-volume filesystem. For the other cases there is a
>> mess between the different layers:
> 
>> - kernel
>> - udev/systemd
>> - initrd logic
>>
>> My attempt followed a different idea: the mount helper waits the
>> devices if needed, or if it is the case it mounts the filesystem in
>> degraded mode. All devices are passed as mount arguments
>> (--device=/dev/sdX), there is no a device registration: this avoids
>> all these problems.
> 
> Hmm, no. /bin/mount should not block for devices. That's generally
> incompatible with how the tool is used, and in particular from
> systemd. We would not make use for such a scheme in
> systemd. /bin/mount should always be short-running.

Apart from systemd, what are these incompatibilities?

> 
> I am pretty sure that if such automatic degraded mounting should be
> supported, then this should be done with some background storage
> daemon that alters the effect of the READY ioctl somehow after the
> timeout, and then retriggers the devices so that systemd takes
> note. (or, alternatively: such a scheme could even be implemented all
> in kernel, based on some configurable kernel setting...)

I recognize that this solution provides the maximum compatibility with the current implementation. However it seems too complex to me. Re-triggering a device seems to me more of a workaround than a solution.

Could a generator do this job? I.e. this generator (or storage daemon) waits until all (or enough) devices have appeared, then it creates a .mount unit: do you think that is doable?


> 
> Lennart
> 
Goffredo

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
  2015-06-15 17:23           ` Goffredo Baroncelli
@ 2015-06-15 17:38             ` Lennart Poettering
  2015-06-17 19:10               ` Goffredo Baroncelli
  0 siblings, 1 reply; 19+ messages in thread
From: Lennart Poettering @ 2015-06-15 17:38 UTC (permalink / raw)
  To: Goffredo Baroncelli
  Cc: Anand Jain, Andrei Borzenkov, systemd-devel, dsterba,
	linux-btrfs@vger.kernel.org >> linux-btrfs

On Mon, 15.06.15 19:23, Goffredo Baroncelli (kreijack@inwind.it) wrote:

> On 2015-06-15 12:46, Lennart Poettering wrote:
> > On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreijack@libero.it) wrote:
> > 
> >>> Further, the problem will be more intense in this eg. if you use dd
> >>> and copy device A to device B. After you mount device A, by just
> >>> providing device B in the above two commands you could let kernel
> >>> update the device path, again all the IO (since device is mounted)
> >>> are still going to the device A (not B), but /proc/self/mounts and
> >>> 'btrfs fi show' shows it as device B (not A).
> >>>
> >>> Its a bug. very tricky to fix.
> >>
> >> In the past [*] I proposed a mount.btrfs helper . I tried to move the logic outside the kernel.
> >> I think that the problem is that we try to manage all these cases
> >> from a device point of view: when a device appears, we register the
> >> device and we try to mount the filesystem... This works very well
> >> when there is 1-volume filesystem. For the other cases there is a
> >> mess between the different layers:
> > 
> >> - kernel
> >> - udev/systemd
> >> - initrd logic
> >>
> >> My attempt followed a different idea: the mount helper waits the
> >> devices if needed, or if it is the case it mounts the filesystem in
> >> degraded mode. All devices are passed as mount arguments
> >> (--device=/dev/sdX), there is no a device registration: this avoids
> >> all these problems.
> > 
> > Hmm, no. /bin/mount should not block for devices. That's generally
> > incompatible with how the tool is used, and in particular from
> > systemd. We would not make use for such a scheme in
> > systemd. /bin/mount should always be short-running.
> 
> Apart systemd, which are these incompatibilities ? 

Well, /bin/mount is not a daemon, and it should not be one.

> > I am pretty sure that if such automatic degraded mounting should be
> > supported, then this should be done with some background storage
> > daemon that alters the effect of the READY ioctl somehow after the
> > timeout, and then retriggers the devices so that systemd takes
> > note. (or, alternatively: such a scheme could even be implemented all
> > in kernel, based on some configurable kernel setting...)
> 
> I recognize that this solution provides the maximum compatibility
> with the current implementation. However it seems too complex to
> me. Re-trigging a devices seems to me more a workaround than a
> solution.

Well, it's not really ugly. I mean, if the state or properties of a
device change, then udev should update its information about it, and
that's done via a retrigger. We do that all the time already, for
example when an existing loopback device gets a backing file assigned
or removed. I am pretty sure that loopback case is very close to what
you want to do here, hence retriggering (either from the kernel side,
or from userspace), appears like an OK thing to do.

> Could a generator do this job ? I.e. this generator (or storage
> daemon) waits that all (or enough) devices are appeared, then it
> creates a .mount unit: do you think that it is doable ?

systemd generators are a way to extend the systemd unit dep tree with
units. They are very short running, and are executed only very very
early at boot. They cannot wait for anything, they don't have access
to devices and are not run when devices appear.

Lennart

-- 
Lennart Poettering, Red Hat

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
  2015-06-15 17:38             ` Lennart Poettering
@ 2015-06-17 19:10               ` Goffredo Baroncelli
  2015-06-17 21:02                 ` Lennart Poettering
  0 siblings, 1 reply; 19+ messages in thread
From: Goffredo Baroncelli @ 2015-06-17 19:10 UTC (permalink / raw)
  To: Lennart Poettering; +Cc: systemd Mailing List, linux-btrfs

On 2015-06-15 19:38, Lennart Poettering wrote:
> On Mon, 15.06.15 19:23, Goffredo Baroncelli (kreijack@inwind.it) wrote:
> 
>> On 2015-06-15 12:46, Lennart Poettering wrote:
>>> On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreijack@libero.it) wrote:
>>>
>>>>> Further, the problem will be more intense in this eg. if you use dd
>>>>> and copy device A to device B. After you mount device A, by just
>>>>> providing device B in the above two commands you could let kernel
>>>>> update the device path, again all the IO (since device is mounted)
>>>>> are still going to the device A (not B), but /proc/self/mounts and
>>>>> 'btrfs fi show' shows it as device B (not A).
>>>>>
>>>>> Its a bug. very tricky to fix.
>>>>
>>>> In the past [*] I proposed a mount.btrfs helper . I tried to move the logic outside the kernel.
>>>> I think that the problem is that we try to manage all these cases
>>>> from a device point of view: when a device appears, we register the
>>>> device and we try to mount the filesystem... This works very well
>>>> when there is 1-volume filesystem. For the other cases there is a
>>>> mess between the different layers:
>>>
>>>> - kernel
>>>> - udev/systemd
>>>> - initrd logic
>>>>
>>>> My attempt followed a different idea: the mount helper waits the
>>>> devices if needed, or if it is the case it mounts the filesystem in
>>>> degraded mode. All devices are passed as mount arguments
>>>> (--device=/dev/sdX), there is no a device registration: this avoids
>>>> all these problems.
>>>
>>> Hmm, no. /bin/mount should not block for devices. That's generally
>>> incompatible with how the tool is used, and in particular from
>>> systemd. We would not make use for such a scheme in
>>> systemd. /bin/mount should always be short-running.
>>
>> Apart systemd, which are these incompatibilities ? 
> 
> Well, /bin/mount is not a daemon, and it should not be one.

My helper is not a daemon; you were correct the first time: it blocks until all needed (or enough) devices have appeared.
Anyway this should not be different from mounting an NFS filesystem. Even in that case the mount helper blocks until the connection is established. The blocking time is not negligible, even though it is not as long as a device timeout ...

> 
>>> I am pretty sure that if such automatic degraded mounting should be
>>> supported, then this should be done with some background storage
>>> daemon that alters the effect of the READY ioctl somehow after the
>>> timeout, and then retriggers the devices so that systemd takes
>>> note. (or, alternatively: such a scheme could even be implemented all
>>> in kernel, based on some configurable kernel setting...)
>>
>> I recognize that this solution provides the maximum compatibility
>> with the current implementation. However it seems too complex to
>> me. Re-trigging a devices seems to me more a workaround than a
>> solution.
> 
> Well, it's not really ugly. I mean, if the state or properties of a
> device change, then udev should update its information about it, and
> that's done via a retrigger. We do that all the time already, for
> example when an existing loopback device gets a backing file assigned
> or removed. I am pretty sure that loopback case is very close to what
> you want to do here, hence retriggering (either from the kernel side,
> or from userspace), appears like an OK thing to do.

What seems strange to me is that in this case the devices have not changed their status.
How is this problem managed in the md/dm raid cases?

> 
>> Could a generator do this job ? I.e. this generator (or storage
>> daemon) waits that all (or enough) devices are appeared, then it
>> creates a .mount unit: do you think that it is doable ?
> 
> systemd generators are a way to extend the systemd unit dep tree with
> units. They are very short running, and are executed only very very
> early at boot. They cannot wait for anything, they don't have access
> to devices and are not run when devices appear.
> 
> Lennart
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
  2015-06-17 19:10               ` Goffredo Baroncelli
@ 2015-06-17 21:02                 ` Lennart Poettering
  2015-06-18  2:40                   ` Andrei Borzenkov
  0 siblings, 1 reply; 19+ messages in thread
From: Lennart Poettering @ 2015-06-17 21:02 UTC (permalink / raw)
  To: kreijack; +Cc: systemd Mailing List, linux-btrfs

On Wed, 17.06.15 21:10, Goffredo Baroncelli (kreijack@libero.it) wrote:

> > Well, /bin/mount is not a daemon, and it should not be one.
> 
> My helper is not a deamon; you was correct the first time: it blocks
> until all needed/enough devices are appeared.
> Anyway this should not be different from mounting a nfs
> filesystem. Even in this case the mount helper blocks until the
> connection happened. The block time is not negligible, even tough
> not long as a device timeout ...

Well, the mount tool doesn't wait for the network to be configured or
so. It just waits for a response from the server. That's quite a
difference.

> > Well, it's not really ugly. I mean, if the state or properties of a
> > device change, then udev should update its information about it, and
> > that's done via a retrigger. We do that all the time already, for
> > example when an existing loopback device gets a backing file assigned
> > or removed. I am pretty sure that loopback case is very close to what
> > you want to do here, hence retriggering (either from the kernel side,
> > or from userspace), appears like an OK thing to do.
> 
> What seems strange to me is that in this case the devices don't have changed their status.
> How this problem is managed in the md/dm raid cases ?

md has a daemon mdmon to my knowledge.

Lennart

-- 
Lennart Poettering, Red Hat

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
  2015-06-17 21:02                 ` Lennart Poettering
@ 2015-06-18  2:40                   ` Andrei Borzenkov
  0 siblings, 0 replies; 19+ messages in thread
From: Andrei Borzenkov @ 2015-06-18  2:40 UTC (permalink / raw)
  To: Lennart Poettering; +Cc: kreijack, systemd Mailing List, linux-btrfs

On Wed, 17 Jun 2015 23:02:02 +0200
Lennart Poettering <lennart@poettering.net> wrote:

> On Wed, 17.06.15 21:10, Goffredo Baroncelli (kreijack@libero.it) wrote:
> 
> > > Well, /bin/mount is not a daemon, and it should not be one.
> > 
> > My helper is not a deamon; you was correct the first time: it blocks
> > until all needed/enough devices are appeared.
> > Anyway this should not be different from mounting a nfs
> > filesystem. Even in this case the mount helper blocks until the
> > connection happened. The block time is not negligible, even tough
> > not long as a device timeout ...
> 
> Well, the mount tool doesn't wait for the network to be configured or
> so. It just waits for a response from the server. That's quite a
> difference.
> 
> > > Well, it's not really ugly. I mean, if the state or properties of a
> > > device change, then udev should update its information about it, and
> > > that's done via a retrigger. We do that all the time already, for
> > > example when an existing loopback device gets a backing file assigned
> > > or removed. I am pretty sure that loopback case is very close to what
> > > you want to do here, hence retriggering (either from the kernel side,
> > > or from userspace), appears like an OK thing to do.
> > 
> > > What seems strange to me is that in this case the devices haven't changed their status.
> > > How is this problem managed in the md/dm RAID cases?
> 
> md has a daemon mdmon to my knowledge.
> 

No, mdmon does something different. What mdadm does is start a timer
when the RAID is complete enough to be started in degraded mode. If
notifications for the missing devices appear after that, the RAID is
started normally. If no notification appears before the timer expires,
the RAID is started in degraded mode.

ACTION=="add|change", IMPORT{program}="BINDIR/mdadm --incremental --export $devnode --offroot ${DEVLINKS}"
ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"
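
For illustration, the timer behaviour described above can be sketched as a
small state machine. This is only a model under stated assumptions, not
mdadm's code: the class name, the field names, and the timeout value are
hypothetical, and time is passed in explicitly so the logic can be followed
without real timers.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ArrayAssembly:
    """Hypothetical model of one array being assembled incrementally."""
    total_devices: int            # members the array expects in total
    min_devices: int              # fewest members that allow a degraded start
    timeout: float                # seconds to wait for stragglers (assumed value)
    present: Set[str] = field(default_factory=set)
    deadline: Optional[float] = None
    state: str = "waiting"        # waiting -> started | started-degraded

    def device_added(self, name: str, now: float) -> str:
        """Handle an add/change notification for one member device."""
        if self.state != "waiting":
            return self.state
        self.present.add(name)
        if len(self.present) >= self.total_devices:
            # All members showed up: start normally and drop the timer.
            self.state = "started"
            self.deadline = None
        elif len(self.present) >= self.min_devices and self.deadline is None:
            # Complete enough for a degraded start: arm the last-resort timer.
            self.deadline = now + self.timeout
        return self.state

    def tick(self, now: float) -> str:
        """Called when time passes; starts degraded once the timer expires."""
        if (self.state == "waiting" and self.deadline is not None
                and now >= self.deadline):
            self.state = "started-degraded"
        return self.state
```

In this model the timer is armed once, at the moment the array first becomes
startable in degraded mode, and a late device arriving before the deadline
leads to a normal start.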


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: btrfs filesystem show confused when label is same as mountpoint
  2015-06-13  9:51   ` Duncan
@ 2015-06-25 16:37     ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2015-06-25 16:37 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Sat, Jun 13, 2015 at 09:51:41AM +0000, Duncan wrote:
> Sjoerd posted on Sat, 13 Jun 2015 09:20:12 +0200 as excerpted:
> 
> > versus for label:
> > btrfs fi show MULTIMEDIA
> > Label: 'MULTIMEDIA'  uuid: ce5d23cd-73a4-4f7c-83cd-2c40d12f6697
> 
> Hmm... I wasn't even aware that you could /use/ label!  But sure enough, 
> it works here, too:
> 
> btrfs fi show rt0238gcnx+35l0
> Label: 'rt0238gcnx+35l0'  uuid: 8f8d79ef-a86f-4306-a255-e0519e0f6132
>         Total devices 2 FS bytes used 1.94GiB
>         devid    1 size 8.00GiB used 3.78GiB path /dev/sda5
>         devid    2 size 8.00GiB used 3.78GiB path /dev/sdb5
> 
> btrfs-progs v4.0.1
> 
> 
> It works for UUID as well...
> 
> btrfs fi show 8f8d79ef-a86f-4306-a255-e0519e0f6132
> Label: 'rt0238gcnx+35l0'  uuid: 8f8d79ef-a86f-4306-a255-e0519e0f6132
>         Total devices 2 FS bytes used 1.94GiB
>         devid    1 size 8.00GiB used 3.78GiB path /dev/sda5
>         devid    2 size 8.00GiB used 3.78GiB path /dev/sdb5
> 
> btrfs-progs v4.0.1
> 
> ... but that's a lot of arbitrary typing.
> 
> Doesn't work with partlabel or id (see /dev/disk/by-*), however. =:^(

The command line tries to guess whether the argument is a label, UUID, or
path. If we want to add support for partlabel and/or partuuid, we can't use
the bare string, but could possibly use the blkid tags, like

 $ btrfs fi show PARTUUID="8f8d79ef-a86f-4306-a255-e0519e0f6132"
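
To make the difference between guessing and explicit tags concrete, here is a
rough sketch of how an argument could be classified. It is not the btrfs-progs
implementation: the function name, the guessing order, and the rule that a
path starts with "/" are assumptions made only for this example.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID form.
UUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
# Explicit blkid-style tags of the form NAME=value.
TAG_RE = re.compile(r"^(LABEL|UUID|PARTLABEL|PARTUUID)=(.*)$")


def classify(arg):
    """Return a (kind, value) pair for a 'btrfs fi show'-style argument."""
    m = TAG_RE.match(arg)
    if m:
        value = m.group(2)
        # Strip one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        return (m.group(1), value)
    # No explicit tag: fall back to guessing from the bare string.
    if arg.startswith("/"):
        return ("PATH", arg)      # assumption: only absolute paths count
    if UUID_RE.match(arg):
        return ("UUID", arg)
    return ("LABEL", arg)         # anything else is treated as a label
```

With an explicit tag nothing has to be guessed, so a partition UUID can no
longer be mistaken for a filesystem UUID, and a label that happens to look
like something else can be requested as LABEL=... .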


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-06-25 16:37 UTC | newest]

Thread overview: 19+ messages
2015-06-12 13:16 [survey] BTRFS_IOC_DEVICES_READY return status Anand Jain
2015-06-12 18:04 ` [systemd-devel] " Andrei Borzenkov
2015-06-12 20:08   ` Goffredo Baroncelli
2015-06-13  9:35     ` Anand Jain
2015-06-13 15:09       ` Goffredo Baroncelli
     [not found]         ` <pan$63061$a3cdf5f6$a390adbd$e6097ad9@cox.net>
2015-06-14 19:44           ` Goffredo Baroncelli
2015-06-15 10:46         ` Lennart Poettering
2015-06-15 17:23           ` Goffredo Baroncelli
2015-06-15 17:38             ` Lennart Poettering
2015-06-17 19:10               ` Goffredo Baroncelli
2015-06-17 21:02                 ` Lennart Poettering
2015-06-18  2:40                   ` Andrei Borzenkov
2015-06-14  5:48       ` Andrei Borzenkov
2015-06-15 10:41       ` Lennart Poettering
2015-06-13  7:20 ` btrfs filesystem show confused when label is same as mountpoint Sjoerd
2015-06-13  9:51   ` Duncan
2015-06-25 16:37     ` David Sterba
2015-06-15 10:27 ` [survey] BTRFS_IOC_DEVICES_READY return status Lennart Poettering
2015-06-15 15:01 ` David Sterba
