All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] getroot: Correctly handle missing btrfs device identifiers.
@ 2016-03-17 15:41 James Johnston
  2016-03-17 17:11 ` Andrei Borzenkov
  0 siblings, 1 reply; 4+ messages in thread
From: James Johnston @ 2016-03-17 15:41 UTC (permalink / raw)
  To: grub-devel

From 97bcb9eb8c34329a8ca1ec88fd89296e273ceb24 Mon Sep 17 00:00:00 2001
From: James Johnston <johnstonj.public@codenest.com>
Date: Thu, 17 Mar 2016 15:10:49 +0000
Subject: [PATCH] getroot: Correctly handle missing btrfs device identifiers.

The btrfs driver's BTRFS_IOC_FS_INFO ioctl only tells you the maximum
device ID and the number of devices.  It does not tell you which
device IDs may no longer be allocated - for example, if a device is
later deleted from the file system, its device ID won't be allocated
any more.

You must then probe each device ID with BTRFS_IOC_DEV_INFO to see if
it is a valid device or not.  It is not an error condition for this
ioctl to return ENODEV: it seems to mean that the device was deleted
by the user.

Signed-off-by: James Johnston <johnstonj.public@codenest.com>
---
 grub-core/osdep/linux/getroot.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/grub-core/osdep/linux/getroot.c b/grub-core/osdep/linux/getroot.c
index 10480b6..852f3a2 100644
--- a/grub-core/osdep/linux/getroot.c
+++ b/grub-core/osdep/linux/getroot.c
@@ -242,15 +242,29 @@ grub_find_root_devices_from_btrfs (const char *dir)
       struct btrfs_ioctl_dev_info_args devi;
       memset (&devi, 0, sizeof (devi));
       devi.devid = i;
+      /* We have to probe all possible device IDs, since btrfs doesn't
+	 give us a list of all the valid ones.  */
       if (ioctl (fd, BTRFS_IOC_DEV_INFO, &devi) < 0)
 	{
-	  close (fd);
-	  free (ret);
-	  return NULL;
+	  /* ENODEV is normal, e.g. if the first device is deleted from
+	     an array. */
+	  int last_error = errno;
+	  if (last_error != ENODEV)
+	    {
+	      grub_util_warn(_("btrfs device info could not be read "
+			       "for %s, device %d: errno %d"), dir, i,
+			       last_error);
+	      close (fd);
+	      free (ret);
+	      return NULL;
+	    }
+	}
+      else
+	{
+	  ret[j++] = xstrdup ((char *) devi.path);
+	  if (j >= fsi.num_devices)
+	    break;
 	}
-      ret[j++] = xstrdup ((char *) devi.path);
-      if (j >= fsi.num_devices)
-	break;
     }
   close (fd);
   ret[j] = 0;
-- 
1.9.5.msysgit.1





^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] getroot: Correctly handle missing btrfs device identifiers.
  2016-03-17 15:41 [PATCH] getroot: Correctly handle missing btrfs device identifiers James Johnston
@ 2016-03-17 17:11 ` Andrei Borzenkov
  0 siblings, 0 replies; 4+ messages in thread
From: Andrei Borzenkov @ 2016-03-17 17:11 UTC (permalink / raw)
  To: The development of GNU GRUB

17.03.2016 18:41, James Johnston пишет:
> From 97bcb9eb8c34329a8ca1ec88fd89296e273ceb24 Mon Sep 17 00:00:00 2001
> From: James Johnston <johnstonj.public@codenest.com>
> Date: Thu, 17 Mar 2016 15:10:49 +0000
> Subject: [PATCH] getroot: Correctly handle missing btrfs device identifiers.
> 
> The btrfs driver's BTRFS_IOC_FS_INFO ioctl only tells you the maximum
> device ID and the number of devices.  It does not tell you which
> device IDs may no longer be allocated - for example, if a device is
> later deleted from the file system, its device ID won't be allocated
> any more.
> 
> You must then probe each device ID with BTRFS_IOC_DEV_INFO to see if
> it is a valid device or not.  It is not an error condition for this
> ioctl to return ENODEV: it seems to mean that the device was deleted
> by the user.
> 

Kernel returns ENODEV if device was not found; we do not know why it was
not found. We need some other way to verify that all devices are
actually present to preserve current behavior.

Actually we probably need to handle redundant layout where non-essential
devices are missing but that is another topic.

> Signed-off-by: James Johnston <johnstonj.public@codenest.com>
> ---
>  grub-core/osdep/linux/getroot.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/grub-core/osdep/linux/getroot.c b/grub-core/osdep/linux/getroot.c
> index 10480b6..852f3a2 100644
> --- a/grub-core/osdep/linux/getroot.c
> +++ b/grub-core/osdep/linux/getroot.c
> @@ -242,15 +242,29 @@ grub_find_root_devices_from_btrfs (const char *dir)
>        struct btrfs_ioctl_dev_info_args devi;
>        memset (&devi, 0, sizeof (devi));
>        devi.devid = i;
> +      /* We have to probe all possible device IDs, since btrfs doesn't
> +	 give us a list of all the valid ones.  */
>        if (ioctl (fd, BTRFS_IOC_DEV_INFO, &devi) < 0)
>  	{
> -	  close (fd);
> -	  free (ret);
> -	  return NULL;
> +	  /* ENODEV is normal, e.g. if the first device is deleted from
> +	     an array. */
> +	  int last_error = errno;
> +	  if (last_error != ENODEV)

Why this temporary variable? Why you cannot check errno directly?

> +	    {
> +	      grub_util_warn(_("btrfs device info could not be read "
> +			       "for %s, device %d: errno %d"), dir, i,
> +			       last_error);
> +	      close (fd);
> +	      free (ret);
> +	      return NULL;
> +	    }
> +	}
> +      else
> +	{
> +	  ret[j++] = xstrdup ((char *) devi.path);
> +	  if (j >= fsi.num_devices)
> +	    break;
>  	}
> -      ret[j++] = xstrdup ((char *) devi.path);
> -      if (j >= fsi.num_devices)
> -	break;
>      }
>    close (fd);
>    ret[j] = 0;
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH] getroot: Correctly handle missing btrfs device identifiers.
@ 2016-03-18  4:48 James Johnston
  0 siblings, 0 replies; 4+ messages in thread
From: James Johnston @ 2016-03-18  4:48 UTC (permalink / raw)
  To: 'The development of GNU GRUB'

Hi Andrei,

Thank you for considering and reviewing my patch.  I did a little more checking of the actual btrfs sources in kernel, and also userspace btrfs tools to try to address your concerns, since the ioctl interface is basically undocumented.

The scenario I am trying to address is a system such as my own, which reports the following:

$ btrfs fi show /
Label: 'SystemRoot'  uuid: 5c756382-8dc5-4afe-ac87-a604e9e7a9a2
        Total devices 2 FS bytes used 2.65GiB
        devid    2 size 5.60GiB used 3.53GiB path /dev/mapper/System1RootCrypt
        devid    3 size 5.60GiB used 3.53GiB path /dev/mapper/System0RootCrypt

btrfs-progs v4.0

As you can see: (1) there are only two devices, (2) they are numbered with devids #2 and #3.  There is no #1 device and if you ask for it with BTRFS_IOC_DEV_INFO, you get ENODEV.  (How did my system get this way?  I converted all the disk partitions to use LVM, one mirror at a time - this involved deleting/adding btrfs disks.  Somehow after this whole process I realized I had this new numbering scheme.)

Yet because there is no device #1, grub_find_root_devices_from_btrfs returns NULL.  This means that "grub-probe --target=device /" only prints one device, not two (and the one device is printed by fallback code that is called for when grub_find_root_devices_from_btrfs returns NULL).  That is the behavior I am trying to correct here.

> -----Original Message-----
> From: Andrei Borzenkov
> Sent: Thursday, March 17, 2016 17:12
> 
> 17.03.2016 18:41, James Johnston пишет:
> > From 97bcb9eb8c34329a8ca1ec88fd89296e273ceb24 Mon Sep 17 00:00:00 2001
> > From: James Johnston <johnstonj.public@codenest.com>
> > Date: Thu, 17 Mar 2016 15:10:49 +0000
> > Subject: [PATCH] getroot: Correctly handle missing btrfs device identifiers.
> >
> > The btrfs driver's BTRFS_IOC_FS_INFO ioctl only tells you the maximum
> > device ID and the number of devices.  It does not tell you which
> > device IDs may no longer be allocated - for example, if a device is
> > later deleted from the file system, its device ID won't be allocated
> > any more.
> >
> > You must then probe each device ID with BTRFS_IOC_DEV_INFO to see if
> > it is a valid device or not.  It is not an error condition for this
> > ioctl to return ENODEV: it seems to mean that the device was deleted
> > by the user.
> >
> 
> Kernel returns ENODEV if device was not found; we do not know why it was
> not found. We need some other way to verify that all devices are
> actually present to preserve current behavior.
> 
> Actually we probably need to handle redundant layout where non-essential
> devices are missing but that is another topic.

So, I did some investigating to see where the kernel returns ENODEV and it seems that it just returns this if there just isn't a btrfs device with that ID in the driver's list of devices.  Since the ioctl is not terribly helpful beyond ENODEV, we just have to dig into the sources to find out why. 

Note that it seems the driver list contains physically missing btrfs devices - if a btrfs device is physically missing, it is not cause to return ENODEV.

Here is the BTRFS_IOC_DEV_INFO ioctl:

https://github.com/torvalds/linux/blob/82b666eee71618b7ca812ee529af116582617dec/fs/btrfs/ioctl.c#L2746

As you can see, the function only returns ENODEV when btrfs_find_device returns NULL.  Let's examine that and find out:

https://github.com/torvalds/linux/blob/82b666eee71618b7ca812ee529af116582617dec/fs/btrfs/volumes.c#L6113

As we can see it just examines all devices for the file system and sees if any match the given devid (since we aren't comparing UUIDs).  Seems obvious enough.

What about if a device is missing?  Note that a device will still be in the list even if it is physically missing.  It just won't have a device path (i.e. null).

btrfs_find_device_missing_or_by_path function specifically returns devices marked missing:
https://github.com/torvalds/linux/blob/82b666eee71618b7ca812ee529af116582617dec/fs/btrfs/volumes.c#L2073

add_missing_dev does the obvious:
https://github.com/torvalds/linux/blob/82b666eee71618b7ca812ee529af116582617dec/fs/btrfs/volumes.c#L6133

You can search that file for the word "missing" to try to follow along and see that the driver does track missing devices.

In the userspace tools, we can look at btrfs-progs to see how they enumerate physical devices.  The load_device_info function enumerates devices in basically the same way I do in my patch - I tried to imitate the userspace tools, figuring I can't go wrong with that:
http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/tree/cmds-fi-usage.c?h=v4.4.1#n511
In this loop you can see that ENODEV is not considered a real error.  They just "continue" the loop to the next possible devid.  Also you see they loop to "fi_args.max_id" like how grub does, and just try all the possible IDs.

Noteworthy in the above btrfs userspace load_device_info function is that they handle physically missing devices:

		if (!dev_info.path[0]) {
			strcpy(info[ndevs].path, "missing");
		} else {
			strcpy(info[ndevs].path, (char *)dev_info.path);

You can see it when running "btrfs fi show /":

Data,RAID1: Size:3.00GiB, Used:2.53GiB
   /dev/mapper/System0RootCrypt    3.00GiB
   missing         3.00GiB

My patch does no such special check, because the original grub code didn't do a special check either.  In that regard, your request of "preserving current behavior" has been retained.  When grub-probe encounters a missing btrfs device (as opposed to an unallocated devid), I get this for "grub-probe -v --target=device /" after taking System1RootCrypt device offline:

./grub-probe: info: cannot open `/boot/grub/device.map': No such file or directory.
./grub-probe: error: failed to get canonical path of `'.

That's because our grub_find_root_devices_from_btrfs function didn't do any special handling for the null string, which signifies a missing device.

I'm not sure if this is really a desirable error to have happen.  But that's another issue for another patch I think.  (How would you like to see it handled?)

In summary:

1.  If the device is physically present, you get no error and the path is returned normally.
2.  If the device is absent (e.g. the file system was mounted degraded), you get a zero length path.
3.  If the device ID is not part of the file system to begin with, you get ENODEV.

If I'm getting any of the above wrong, please let me know. :)

> Why this temporary variable? Why you cannot check errno directly?
> 
> > +	    {
> > +	      grub_util_warn(_("btrfs device info could not be read "
> > +			       "for %s, device %d: errno %d"), dir, i,
> > +			       last_error);

Because I wanted to log it if there was a problem; this log message will hopefully help the next guy who has some problem with this function.

The "_" is presumably a call to gettext function and I don't know if gettext might do anything that would reset errno.  The safe thing is just to save it to a temporary and then call gettext via "_".

Best regards,

James Johnston




^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] getroot: Correctly handle missing btrfs device identifiers.
@ 2016-03-17 15:32 James Johnston
  0 siblings, 0 replies; 4+ messages in thread
From: James Johnston @ 2016-03-17 15:32 UTC (permalink / raw)
  To: grub-devel

From 97bcb9eb8c34329a8ca1ec88fd89296e273ceb24 Mon Sep 17 00:00:00 2001
From: James Johnston <johnstonj.public@codenest.com>
Date: Thu, 17 Mar 2016 15:10:49 +0000
Subject: [PATCH] getroot: Correctly handle missing btrfs device identifiers.

The btrfs driver's BTRFS_IOC_FS_INFO ioctl only tells you the maximum
device ID and the number of devices.  It does not tell you which
device IDs may no longer be allocated - for example, if a device is
later deleted from the file system, its device ID won't be allocated
any more.

You must then probe each device ID with BTRFS_IOC_DEV_INFO to see if
it is a valid device or not.  It is not an error condition for this
ioctl to return ENODEV: it seems to mean that the device was deleted
by the user.

Signed-off-by: James Johnston <johnstonj.public@codenest.com>
---
 grub-core/osdep/linux/getroot.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/grub-core/osdep/linux/getroot.c b/grub-core/osdep/linux/getroot.c
index 10480b6..852f3a2 100644
--- a/grub-core/osdep/linux/getroot.c
+++ b/grub-core/osdep/linux/getroot.c
@@ -242,15 +242,29 @@ grub_find_root_devices_from_btrfs (const char *dir)
       struct btrfs_ioctl_dev_info_args devi;
       memset (&devi, 0, sizeof (devi));
       devi.devid = i;
+      /* We have to probe all possible device IDs, since btrfs doesn't
+	 give us a list of all the valid ones.  */
       if (ioctl (fd, BTRFS_IOC_DEV_INFO, &devi) < 0)
 	{
-	  close (fd);
-	  free (ret);
-	  return NULL;
+	  /* ENODEV is normal, e.g. if the first device is deleted from
+	     an array. */
+	  int last_error = errno;
+	  if (last_error != ENODEV)
+	    {
+	      grub_util_warn(_("btrfs device info could not be read "
+			       "for %s, device %d: errno %d"), dir, i,
+			       last_error);
+	      close (fd);
+	      free (ret);
+	      return NULL;
+	    }
+	}
+      else
+	{
+	  ret[j++] = xstrdup ((char *) devi.path);
+	  if (j >= fsi.num_devices)
+	    break;
 	}
-      ret[j++] = xstrdup ((char *) devi.path);
-      if (j >= fsi.num_devices)
-	break;
     }
   close (fd);
   ret[j] = 0;
-- 
1.9.5.msysgit.1





^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-03-18  4:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-17 15:41 [PATCH] getroot: Correctly handle missing btrfs device identifiers James Johnston
2016-03-17 17:11 ` Andrei Borzenkov
  -- strict thread matches above, loose matches on Subject: below --
2016-03-18  4:48 James Johnston
2016-03-17 15:32 James Johnston

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.