All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device
@ 2020-04-28 15:22 Anand Jain
  2020-04-28 15:22 ` [PATCH 1/3] btrfs: drop useless goto in open_fs_devices Anand Jain
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Anand Jain @ 2020-04-28 15:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, nborisov, josef

v3 REBASED: Based on the latest misc-next. for for-5.8.
   Dropped the following patches as there were concerns about the usage
   of error code -EUCLEAN
	btrfs: remove identified alien device in open_fs_devices
	btrfs: remove identified alien btrfs device in open_fs_devices

   Rmaining 3 patches here have obtained reviewed-by. With this pathset
   the pertaining fstests btrfs/197 and btrfs/198 (which tests 3 bugs)
   would pass as the patch 2/3 fixed a bug and 3/3 fixed the trigger
   of 2 other bugs (patch 1/3 is just a cleanup). Further at the moment
   I am not sure if there is any other trigger where it could again leave
   an alien device in the fs_devices leading to the same/similar bugs.

==== original email ====
v3: Fix alien device is due to wipefs in Patch4.
    Fix a nit in Patch3.
    Patches are reordered.

Alien device is a device in fs_devices list having a different fsid than
the expected fsid or no btrfs_magic. This patch set fixes issues found due
to the same.

Patch1: is a cleanup patch, not related.
Patch2: fixes failing to mount a degraded RAIDs (RAID1/5/6/10), by
	hardening the function btrfs_free_extra_devids().
Patch3: fixes the missing device (due to alien btrfs-device) not missing in
	the userland, by hardening the function btrfs_open_one_device().
Patch4: fixes the missing device (due to alien device) not missing in
	the userland, by returning EUCLEAN in btrfs_read_dev_one_super().
Patch5: eliminates the source of the alien device in the fs_devices.

PS: Fundamentally its wrong approach that btrfs-progs deduces the device
missing state in the userland instead of obtaining it from the kernel.
I remember objecting on the btrfs-progs patch which did that, but still
it got merged, bugs in p3 and p4 are its side effects. I wrote
patches to read device_state from the kernel using ioctl, procfs and
sysfs but it didn't get the due attention till a merger.

Anand Jain (3):
  btrfs: drop useless goto in open_fs_devices
  btrfs: include non-missing as a qualifier for the latest_bdev
  btrfs: free alien device due to device add

 fs/btrfs/volumes.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] btrfs: drop useless goto in open_fs_devices
  2020-04-28 15:22 [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device Anand Jain
@ 2020-04-28 15:22 ` Anand Jain
  2020-04-30 14:09   ` David Sterba
  2020-04-28 15:22 ` [PATCH 2/3] btrfs: include non-missing as a qualifier for the latest_bdev Anand Jain
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2020-04-28 15:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, nborisov, josef

There is no need of goto out in open_fs_devices() as there is nothing
special done at %out:. So refactor it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index aa1e0ae32943..bf953c4895f3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1185,7 +1185,6 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 {
 	struct btrfs_device *device;
 	struct btrfs_device *latest_dev = NULL;
-	int ret = 0;
 
 	flags |= FMODE_EXCL;
 
@@ -1198,16 +1197,15 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 		    device->generation > latest_dev->generation)
 			latest_dev = device;
 	}
-	if (fs_devices->open_devices == 0) {
-		ret = -EINVAL;
-		goto out;
-	}
+	if (fs_devices->open_devices == 0)
+		return -EINVAL;
+
 	fs_devices->opened = 1;
 	fs_devices->latest_bdev = latest_dev->bdev;
 	fs_devices->total_rw_bytes = 0;
 	fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
-out:
-	return ret;
+
+	return 0;
 }
 
 static int devid_cmp(void *priv, struct list_head *a, struct list_head *b)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] btrfs: include non-missing as a qualifier for the latest_bdev
  2020-04-28 15:22 [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device Anand Jain
  2020-04-28 15:22 ` [PATCH 1/3] btrfs: drop useless goto in open_fs_devices Anand Jain
@ 2020-04-28 15:22 ` Anand Jain
  2020-04-30 13:46   ` David Sterba
  2020-04-28 15:22 ` [PATCH 3/3] btrfs: free alien device due to device add Anand Jain
  2020-04-30  6:05 ` [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device Nikolay Borisov
  3 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2020-04-28 15:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, nborisov, josef

btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
to point to the bdev with greatest device::generation number.
For a typical-missing device the generation number is zero so
fs_devices::latest_bdev will never point to it.

But if the missing device is due to alienation [1], then
device::generation is not-zero and if it is >= to rest of
device::generation in the list, then fs_devices::latest_bdev
ends up pointing to the missing device and reports the error
like this [2]

[1] We maintain devices of a fsid (as in fs_device::fsid) in the
fs_devices::devices list, a device is considered as an alien device
if its fsid does not match with the fs_device::fsid

$ mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs

$ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
$ sleep 3 # avoid racing with udev's useless scans if needed
$ btrfs dev add -f /dev/sdb /btrfs
$ mount -o degraded /dev/sdc /btrfs1

[2]
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

kernel: BTRFS warning (device sdc): devid 1 uuid 072a0192-675b-4d5a-8640-a5cf2b2c704d is missing
kernel: BTRFS error (device sdc): failed to read devices
kernel: BTRFS error (device sdc): open_ctree failed

Fix the root of the issue, by checking if the the device is not
missing before it can be a contender for the fs_devices::latest_bdev
title.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
v3-rebased: update change log.

 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 914402d3c1de..a67af16d960d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1042,6 +1042,8 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
 							&device->dev_state)) {
 			if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
 			     &device->dev_state) &&
+			    !test_bit(BTRFS_DEV_STATE_MISSING,
+			     &device->dev_state) &&
 			     (!latest_dev ||
 			      device->generation > latest_dev->generation)) {
 				latest_dev = device;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] btrfs: free alien device due to device add
  2020-04-28 15:22 [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device Anand Jain
  2020-04-28 15:22 ` [PATCH 1/3] btrfs: drop useless goto in open_fs_devices Anand Jain
  2020-04-28 15:22 ` [PATCH 2/3] btrfs: include non-missing as a qualifier for the latest_bdev Anand Jain
@ 2020-04-28 15:22 ` Anand Jain
  2020-04-30 13:31   ` David Sterba
                     ` (2 more replies)
  2020-04-30  6:05 ` [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device Nikolay Borisov
  3 siblings, 3 replies; 18+ messages in thread
From: Anand Jain @ 2020-04-28 15:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, nborisov, josef

When the old device has new fsid through btrfs device add -f <dev> our
fs_devices list has an alien device in one of the fs_devices. So this is
a trigger and not the root cause of the issue. This patch fixes the trigger.

By having an alien device in fs_devices, we have two issues so far

1. missing device is not shows as missing in the userland

Which is due to cracks in the function btrfs_open_one_device() and
hardened by the pending patches in the ml.
 btrfs: remove identified alien device in open_fs_devices
 btrfs: remove identified alien btrfs device in open_fs_devices

2. mount of a degraded fs_devices fails

Which is due to cracks in the function btrfs_free_extra_devids() and
hardened by patch (included in the set).
 btrfs: include non-missing as a qualifier for the latest_bdev

Now the trigger for both of this issue is that there is an alien (does not
contain the intended fsid/btrfs_magic) device in the fs_devices.

We know a device can be scanned/added through
btrfs-control::BTRFS_IOC_SCAN_DEV|BTRFS_IOC_DEVICES_READY
or by
ioctl::BTRFS_IOC_ADD_DEV

And device coming through btrfs-control is checked against the all other
devices in btrfs kernel but not coming through BTRFS_IOC_ADD_DEV.

This patch checks if the device add is alienating any other scanned
device and deletes it.

In fact, this patch fixes both the issues 1 and 2 (above) by eliminating
the trigger of the issue, but still they have their own patch as well
because its the right way to harden the functions and fill the cracks.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
v3-rebased: change log updated.

 fs/btrfs/volumes.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a67af16d960d..300ee5f0bfa2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2665,6 +2665,19 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	/* Update ctime/mtime for libblkid */
 	update_dev_time(device_path);
+
+	/*
+	 * Now that we have written a new sb into this device, check all other
+	 * fs_devices list if it alienates any scanned device.
+	 */
+	mutex_lock(&uuid_mutex);
+	/*
+	 * Ignore the return as we are successfull in the core task - to added
+	 * the device
+	 */
+	btrfs_free_stale_devices(device_path, NULL);
+	mutex_unlock(&uuid_mutex);
+
 	return ret;
 
 error_sysfs:
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device
  2020-04-28 15:22 [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device Anand Jain
                   ` (2 preceding siblings ...)
  2020-04-28 15:22 ` [PATCH 3/3] btrfs: free alien device due to device add Anand Jain
@ 2020-04-30  6:05 ` Nikolay Borisov
  2020-04-30 17:54   ` Anand Jain
  3 siblings, 1 reply; 18+ messages in thread
From: Nikolay Borisov @ 2020-04-30  6:05 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: dsterba, josef



On 28.04.20 г. 18:22 ч., Anand Jain wrote:
> v3 REBASED: Based on the latest misc-next. for for-5.8.
>    Dropped the following patches as there were concerns about the usage
>    of error code -EUCLEAN
> 	btrfs: remove identified alien device in open_fs_devices
> 	btrfs: remove identified alien btrfs device in open_fs_devices
> 
>    Rmaining 3 patches here have obtained reviewed-by. With this pathset
>    the pertaining fstests btrfs/197 and btrfs/198 (which tests 3 bugs)
>    would pass as the patch 2/3 fixed a bug and 3/3 fixed the trigger
>    of 2 other bugs (patch 1/3 is just a cleanup). Further at the moment
>    I am not sure if there is any other trigger where it could again leave
>    an alien device in the fs_devices leading to the same/similar bugs.
> 
> ==== original email ====
> v3: Fix alien device is due to wipefs in Patch4.
>     Fix a nit in Patch3.
>     Patches are reordered.
> 
> Alien device is a device in fs_devices list having a different fsid than
> the expected fsid or no btrfs_magic. This patch set fixes issues found due
> to the same.
> 
> Patch1: is a cleanup patch, not related.
> Patch2: fixes failing to mount a degraded RAIDs (RAID1/5/6/10), by
> 	hardening the function btrfs_free_extra_devids().
> Patch3: fixes the missing device (due to alien btrfs-device) not missing in
> 	the userland, by hardening the function btrfs_open_one_device().
> Patch4: fixes the missing device (due to alien device) not missing in
> 	the userland, by returning EUCLEAN in btrfs_read_dev_one_super().
> Patch5: eliminates the source of the alien device in the fs_devices.
> 
> PS: Fundamentally its wrong approach that btrfs-progs deduces the device
> missing state in the userland instead of obtaining it from the kernel.
> I remember objecting on the btrfs-progs patch which did that, but still
> it got merged, bugs in p3 and p4 are its side effects. I wrote
> patches to read device_state from the kernel using ioctl, procfs and
> sysfs but it didn't get the due attention till a merger.
> 
> Anand Jain (3):
>   btrfs: drop useless goto in open_fs_devices
>   btrfs: include non-missing as a qualifier for the latest_bdev
>   btrfs: free alien device due to device add
> 
>  fs/btrfs/volumes.c | 30 ++++++++++++++++++++++--------
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 


One thing I'm not clear is how can we get into a situation of an alien
device. I.e devices should be in fs_devices iff they are part of the
filesystem, no ?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device
  2020-04-30 17:54   ` Anand Jain
@ 2020-04-30 10:28     ` Nikolay Borisov
  2020-05-01 19:45       ` Anand Jain
  0 siblings, 1 reply; 18+ messages in thread
From: Nikolay Borisov @ 2020-04-30 10:28 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: dsterba, josef



On 30.04.20 г. 20:54 ч., Anand Jain wrote:
> 
> 
> On 30/4/20 2:05 pm, Nikolay Borisov wrote:
>>
>>
>> On 28.04.20 г. 18:22 ч., Anand Jain wrote:
>>> v3 REBASED: Based on the latest misc-next. for for-5.8.
>>>     Dropped the following patches as there were concerns about the usage
>>>     of error code -EUCLEAN
>>>     btrfs: remove identified alien device in open_fs_devices
>>>     btrfs: remove identified alien btrfs device in open_fs_devices
>>>
>>>     Rmaining 3 patches here have obtained reviewed-by. With this pathset
>>>     the pertaining fstests btrfs/197 and btrfs/198 (which tests 3 bugs)
>>>     would pass as the patch 2/3 fixed a bug and 3/3 fixed the trigger
>>>     of 2 other bugs (patch 1/3 is just a cleanup). Further at the moment
>>>     I am not sure if there is any other trigger where it could again
>>> leave
>>>     an alien device in the fs_devices leading to the same/similar bugs.
>>>
>>> ==== original email ====
>>> v3: Fix alien device is due to wipefs in Patch4.
>>>      Fix a nit in Patch3.
>>>      Patches are reordered.
>>>
>>> Alien device is a device in fs_devices list having a different fsid than
>>> the expected fsid or no btrfs_magic. This patch set fixes issues
>>> found due
>>> to the same.
>>>
>>> Patch1: is a cleanup patch, not related.
>>> Patch2: fixes failing to mount a degraded RAIDs (RAID1/5/6/10), by
>>>     hardening the function btrfs_free_extra_devids().
>>> Patch3: fixes the missing device (due to alien btrfs-device) not
>>> missing in
>>>     the userland, by hardening the function btrfs_open_one_device().
>>> Patch4: fixes the missing device (due to alien device) not missing in
>>>     the userland, by returning EUCLEAN in btrfs_read_dev_one_super().
>>> Patch5: eliminates the source of the alien device in the fs_devices.
>>>
>>> PS: Fundamentally its wrong approach that btrfs-progs deduces the device
>>> missing state in the userland instead of obtaining it from the kernel.
>>> I remember objecting on the btrfs-progs patch which did that, but still
>>> it got merged, bugs in p3 and p4 are its side effects. I wrote
>>> patches to read device_state from the kernel using ioctl, procfs and
>>> sysfs but it didn't get the due attention till a merger.
>>>
>>> Anand Jain (3):
>>>    btrfs: drop useless goto in open_fs_devices
>>>    btrfs: include non-missing as a qualifier for the latest_bdev
>>>    btrfs: free alien device due to device add
>>>
>>>   fs/btrfs/volumes.c | 30 ++++++++++++++++++++++--------
>>>   1 file changed, 22 insertions(+), 8 deletions(-)
>>>
>>
>>
>> One thing I'm not clear is how can we get into a situation of an alien
>> device. I.e devices should be in fs_devices iff they are part of the>
>> filesystem, no ?
>>
> 
> I think you are missing the point that, when the devices (of a
> raid1/raid5/raid6) are unmounted, we don't free any of their
> fs_devices::device. So in this situation if one of those devices is
> added to any another fsid (using btrfs device add) or wiped using wipefs
> -a, we still don't free the device's former fs_devices::device entry in
> the kernel and it acts as an alien device among its former partners when
> it is mounting.

So it could happen only due to a deliberate action by the user
and not during normal operation. In this case is it not the user's
responsibility to remove/forget the device from that file system?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] btrfs: free alien device due to device add
  2020-04-28 15:22 ` [PATCH 3/3] btrfs: free alien device due to device add Anand Jain
@ 2020-04-30 13:31   ` David Sterba
  2020-05-01 20:01     ` Anand Jain
  2020-05-04 18:58   ` [PATCH v4 2/3] btrfs: include non-missing as a qualifier for the latest_bdev Anand Jain
  2020-05-04 18:58   ` [PATCH v4 3/3] btrfs: free alien device due to device add Anand Jain
  2 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2020-04-30 13:31 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba, nborisov, josef

On Tue, Apr 28, 2020 at 11:22:27PM +0800, Anand Jain wrote:
> When the old device has new fsid through btrfs device add -f <dev> our
> fs_devices list has an alien device in one of the fs_devices. So this is
> a trigger and not the root cause of the issue. This patch fixes the trigger.
> 
> By having an alien device in fs_devices, we have two issues so far
> 
> 1. missing device is not shows as missing in the userland
> 
> Which is due to cracks in the function btrfs_open_one_device() and
> hardened by the pending patches in the ml.
>  btrfs: remove identified alien device in open_fs_devices
>  btrfs: remove identified alien btrfs device in open_fs_devices
> 
> 2. mount of a degraded fs_devices fails
> 
> Which is due to cracks in the function btrfs_free_extra_devids() and
> hardened by patch (included in the set).
>  btrfs: include non-missing as a qualifier for the latest_bdev
> 
> Now the trigger for both of this issue is that there is an alien (does not
> contain the intended fsid/btrfs_magic) device in the fs_devices.
> 
> We know a device can be scanned/added through
> btrfs-control::BTRFS_IOC_SCAN_DEV|BTRFS_IOC_DEVICES_READY
> or by
> ioctl::BTRFS_IOC_ADD_DEV
> 
> And device coming through btrfs-control is checked against the all other
> devices in btrfs kernel but not coming through BTRFS_IOC_ADD_DEV.
> 
> This patch checks if the device add is alienating any other scanned
> device and deletes it.
> 
> In fact, this patch fixes both the issues 1 and 2 (above) by eliminating
> the trigger of the issue, but still they have their own patch as well
> because its the right way to harden the functions and fill the cracks.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> ---
> v3-rebased: change log updated.
> 
>  fs/btrfs/volumes.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a67af16d960d..300ee5f0bfa2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2665,6 +2665,19 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  
>  	/* Update ctime/mtime for libblkid */
>  	update_dev_time(device_path);
> +
> +	/*
> +	 * Now that we have written a new sb into this device, check all other
> +	 * fs_devices list if it alienates any scanned device.
> +	 */
> +	mutex_lock(&uuid_mutex);
> +	/*
> +	 * Ignore the return as we are successfull in the core task - to added
> +	 * the device
> +	 */
> +	btrfs_free_stale_devices(device_path, NULL);

So this is open coding btrfs_forget_devices, so the helper should be
used.

As there's NULL passed, all stale devices will be removed from the list,
but we can remove just the device being added, no? And before the whole
operation starts, not after. The closest moment is before
btrfs_commit_transaction, that's where the superblock gets overwritten.

> +	mutex_unlock(&uuid_mutex);
> +
>  	return ret;
>  
>  error_sysfs:
> -- 
> 2.23.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: include non-missing as a qualifier for the latest_bdev
  2020-04-28 15:22 ` [PATCH 2/3] btrfs: include non-missing as a qualifier for the latest_bdev Anand Jain
@ 2020-04-30 13:46   ` David Sterba
  2020-05-01 22:54     ` Anand Jain
  0 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2020-04-30 13:46 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba, nborisov, josef

On Tue, Apr 28, 2020 at 11:22:26PM +0800, Anand Jain wrote:
> btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
> to point to the bdev with greatest device::generation number.
> For a typical-missing device the generation number is zero so
> fs_devices::latest_bdev will never point to it.
> 
> But if the missing device is due to alienation [1], then
> device::generation is not-zero and if it is >= to rest of
> device::generation in the list, then fs_devices::latest_bdev
> ends up pointing to the missing device and reports the error
> like this [2]
> 
> [1] We maintain devices of a fsid (as in fs_device::fsid) in the
> fs_devices::devices list, a device is considered as an alien device
> if its fsid does not match with the fs_device::fsid
> 
> $ mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs

Please put each command on one line for clarity

> $ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
> $ sleep 3 # avoid racing with udev's useless scans if needed
> $ btrfs dev add -f /dev/sdb /btrfs
> $ mount -o degraded /dev/sdc /btrfs1

So the cause is a second mkfs on some devices, but is the degraded mount
supposed to work? The example goes:

- create first filesystem with device A
- create second filesystem with device B and C
- add device B to the first filesystem, effectively making it missing
- mount first filesystem, degraded because of the missing device

For a reproducer that's ok, but is this something that we can expect to
happen in practice? The flag -f should prevent accidental overwrite, but
yes the kernel code needs to deal with that in any case.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] btrfs: drop useless goto in open_fs_devices
  2020-04-28 15:22 ` [PATCH 1/3] btrfs: drop useless goto in open_fs_devices Anand Jain
@ 2020-04-30 14:09   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2020-04-30 14:09 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba, nborisov, josef

On Tue, Apr 28, 2020 at 11:22:25PM +0800, Anand Jain wrote:
> There is no need of goto out in open_fs_devices() as there is nothing
> special done at %out:. So refactor it.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>

As this is independent cleanup, I'll add it to misc-next, thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device
  2020-04-30  6:05 ` [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device Nikolay Borisov
@ 2020-04-30 17:54   ` Anand Jain
  2020-04-30 10:28     ` Nikolay Borisov
  0 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2020-04-30 17:54 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: dsterba, josef



On 30/4/20 2:05 pm, Nikolay Borisov wrote:
> 
> 
> On 28.04.20 г. 18:22 ч., Anand Jain wrote:
>> v3 REBASED: Based on the latest misc-next. for for-5.8.
>>     Dropped the following patches as there were concerns about the usage
>>     of error code -EUCLEAN
>> 	btrfs: remove identified alien device in open_fs_devices
>> 	btrfs: remove identified alien btrfs device in open_fs_devices
>>
>>     Rmaining 3 patches here have obtained reviewed-by. With this pathset
>>     the pertaining fstests btrfs/197 and btrfs/198 (which tests 3 bugs)
>>     would pass as the patch 2/3 fixed a bug and 3/3 fixed the trigger
>>     of 2 other bugs (patch 1/3 is just a cleanup). Further at the moment
>>     I am not sure if there is any other trigger where it could again leave
>>     an alien device in the fs_devices leading to the same/similar bugs.
>>
>> ==== original email ====
>> v3: Fix alien device is due to wipefs in Patch4.
>>      Fix a nit in Patch3.
>>      Patches are reordered.
>>
>> Alien device is a device in fs_devices list having a different fsid than
>> the expected fsid or no btrfs_magic. This patch set fixes issues found due
>> to the same.
>>
>> Patch1: is a cleanup patch, not related.
>> Patch2: fixes failing to mount a degraded RAIDs (RAID1/5/6/10), by
>> 	hardening the function btrfs_free_extra_devids().
>> Patch3: fixes the missing device (due to alien btrfs-device) not missing in
>> 	the userland, by hardening the function btrfs_open_one_device().
>> Patch4: fixes the missing device (due to alien device) not missing in
>> 	the userland, by returning EUCLEAN in btrfs_read_dev_one_super().
>> Patch5: eliminates the source of the alien device in the fs_devices.
>>
>> PS: Fundamentally its wrong approach that btrfs-progs deduces the device
>> missing state in the userland instead of obtaining it from the kernel.
>> I remember objecting on the btrfs-progs patch which did that, but still
>> it got merged, bugs in p3 and p4 are its side effects. I wrote
>> patches to read device_state from the kernel using ioctl, procfs and
>> sysfs but it didn't get the due attention till a merger.
>>
>> Anand Jain (3):
>>    btrfs: drop useless goto in open_fs_devices
>>    btrfs: include non-missing as a qualifier for the latest_bdev
>>    btrfs: free alien device due to device add
>>
>>   fs/btrfs/volumes.c | 30 ++++++++++++++++++++++--------
>>   1 file changed, 22 insertions(+), 8 deletions(-)
>>
> 
> 
> One thing I'm not clear is how can we get into a situation of an alien
> device. I.e devices should be in fs_devices iff they are part of the> filesystem, no ?
> 

I think you are missing the point that, when the devices (of a
raid1/raid5/raid6) are unmounted, we don't free any of their
fs_devices::device. So in this situation if one of those devices is
added to any another fsid (using btrfs device add) or wiped using wipefs
-a, we still don't free the device's former fs_devices::device entry in
the kernel and it acts as an alien device among its former partners when
it is mounting.
Now when this former partner(s) is(are) trying to mount in degraded, the
mount thread reads this device's SB (which is now an alien) and gets new
fsid+devid. The function btrfs_open_one_device() sees the non-matching
fsid+devid and it just ignores and still does not free the
fs_devices::device. If it had done, then the add_missing_dev() could
have allocated a fresh fs_devices::device using the data in the
chunk-tree instead of just setting the device state as missing in the
function read_one_dev().

Please look at fstests btrfs/197 and btrfs/198 they tests these
scenarios.

There are three bugs.
  1. raid1 mount -o degraded failed (patch 2/3 fixes it) (this bug is
     not related to the alien device issue, but we need this patch
     to continue testing).
  2. btrfs fi show -m does not show the missing device after mount of
     the degraded raid1.
    a. Bug triggered by btrfs device add. Fixed by the dropped patch
       [patch] btrfs: remove identified alien btrfs device in 
open_fs_devices
       root cause is btrfs_open_one_device() did not free the known
       alien device which contains the btrfs_magic but does not contain
       the required fsid+devid.
    b. Bug triggered by wipefs -a. Fixed by the dropped patch
       [patch] btrfs: remove identified alien device in open_fs_devices
       root cause is mount thread does not free the identified alien
       (not even contains the btrfs_magic) device.
3. Device add ioctl trigger. (patch 3/3 fixes it).

We still need those two dropped patches. They are stuck with the
-EUCLEAN usage. It can be discussed and sent as separate patches.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device
  2020-04-30 10:28     ` Nikolay Borisov
@ 2020-05-01 19:45       ` Anand Jain
  0 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2020-05-01 19:45 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: dsterba, josef



On 30/4/20 6:28 pm, Nikolay Borisov wrote:
> 
> 
> On 30.04.20 г. 20:54 ч., Anand Jain wrote:
>>
>>
>> On 30/4/20 2:05 pm, Nikolay Borisov wrote:
>>>
>>>
>>> On 28.04.20 г. 18:22 ч., Anand Jain wrote:
>>>> v3 REBASED: Based on the latest misc-next. for for-5.8.
>>>>      Dropped the following patches as there were concerns about the usage
>>>>      of error code -EUCLEAN
>>>>      btrfs: remove identified alien device in open_fs_devices
>>>>      btrfs: remove identified alien btrfs device in open_fs_devices
>>>>
>>>>      Rmaining 3 patches here have obtained reviewed-by. With this pathset
>>>>      the pertaining fstests btrfs/197 and btrfs/198 (which tests 3 bugs)
>>>>      would pass as the patch 2/3 fixed a bug and 3/3 fixed the trigger
>>>>      of 2 other bugs (patch 1/3 is just a cleanup). Further at the moment
>>>>      I am not sure if there is any other trigger where it could again
>>>> leave
>>>>      an alien device in the fs_devices leading to the same/similar bugs.
>>>>
>>>> ==== original email ====
>>>> v3: Fix alien device is due to wipefs in Patch4.
>>>>       Fix a nit in Patch3.
>>>>       Patches are reordered.
>>>>
>>>> Alien device is a device in fs_devices list having a different fsid than
>>>> the expected fsid or no btrfs_magic. This patch set fixes issues
>>>> found due
>>>> to the same.
>>>>
>>>> Patch1: is a cleanup patch, not related.
>>>> Patch2: fixes failing to mount a degraded RAIDs (RAID1/5/6/10), by
>>>>      hardening the function btrfs_free_extra_devids().
>>>> Patch3: fixes the missing device (due to alien btrfs-device) not
>>>> missing in
>>>>      the userland, by hardening the function btrfs_open_one_device().
>>>> Patch4: fixes the missing device (due to alien device) not missing in
>>>>      the userland, by returning EUCLEAN in btrfs_read_dev_one_super().
>>>> Patch5: eliminates the source of the alien device in the fs_devices.
>>>>
>>>> PS: Fundamentally its wrong approach that btrfs-progs deduces the device
>>>> missing state in the userland instead of obtaining it from the kernel.
>>>> I remember objecting on the btrfs-progs patch which did that, but still
>>>> it got merged, bugs in p3 and p4 are its side effects. I wrote
>>>> patches to read device_state from the kernel using ioctl, procfs and
>>>> sysfs but it didn't get the due attention till a merger.
>>>>
>>>> Anand Jain (3):
>>>>     btrfs: drop useless goto in open_fs_devices
>>>>     btrfs: include non-missing as a qualifier for the latest_bdev
>>>>     btrfs: free alien device due to device add
>>>>
>>>>    fs/btrfs/volumes.c | 30 ++++++++++++++++++++++--------
>>>>    1 file changed, 22 insertions(+), 8 deletions(-)
>>>>
>>>
>>>
>>> One thing I'm not clear is how can we get into a situation of an alien
>>> device. I.e devices should be in fs_devices iff they are part of the>
>>> filesystem, no ?
>>>
>>
>> I think you are missing the point that, when the devices (of a
>> raid1/raid5/raid6) are unmounted, we don't free any of their
>> fs_devices::device. So in this situation if one of those devices is
>> added to any another fsid (using btrfs device add) or wiped using wipefs
>> -a, we still don't free the device's former fs_devices::device entry in
>> the kernel and it acts as an alien device among its former partners when
>> it is mounting.
> 
> So it could happen only due to a deliberate action by the user
> and not during normal operation. In this case is it not the user's
> responsibility to remove/forget the device from that file system?
> 

Anyone can run into it. I did. Instead of confusing them with showing
that a device belongs to two FSID its better to fix. A device can't
belong to two fsid.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] btrfs: free alien device due to device add
  2020-04-30 13:31   ` David Sterba
@ 2020-05-01 20:01     ` Anand Jain
  2020-05-05 17:02       ` David Sterba
  0 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2020-05-01 20:01 UTC (permalink / raw)
  To: dsterba, linux-btrfs, dsterba, nborisov, josef



On 30/4/20 9:31 pm, David Sterba wrote:
> On Tue, Apr 28, 2020 at 11:22:27PM +0800, Anand Jain wrote:
>> When the old device has new fsid through btrfs device add -f <dev> our
>> fs_devices list has an alien device in one of the fs_devices. So this is
>> a trigger and not the root cause of the issue. This patch fixes the trigger.
>>
>> By having an alien device in fs_devices, we have two issues so far
>>
>> 1. missing device is not shows as missing in the userland
>>
>> Which is due to cracks in the function btrfs_open_one_device() and
>> hardened by the pending patches in the ml.
>>   btrfs: remove identified alien device in open_fs_devices
>>   btrfs: remove identified alien btrfs device in open_fs_devices
>>
>> 2. mount of a degraded fs_devices fails
>>
>> Which is due to cracks in the function btrfs_free_extra_devids() and
>> hardened by patch (included in the set).
>>   btrfs: include non-missing as a qualifier for the latest_bdev
>>
>> Now the trigger for both of this issue is that there is an alien (does not
>> contain the intended fsid/btrfs_magic) device in the fs_devices.
>>
>> We know a device can be scanned/added through
>> btrfs-control::BTRFS_IOC_SCAN_DEV|BTRFS_IOC_DEVICES_READY
>> or by
>> ioctl::BTRFS_IOC_ADD_DEV
>>
>> And device coming through btrfs-control is checked against the all other
>> devices in btrfs kernel but not coming through BTRFS_IOC_ADD_DEV.
>>
>> This patch checks if the device add is alienating any other scanned
>> device and deletes it.
>>
>> In fact, this patch fixes both the issues 1 and 2 (above) by eliminating
>> the trigger of the issue, but still they have their own patch as well
>> because its the right way to harden the functions and fill the cracks.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>> v3-rebased: change log updated.
>>
>>   fs/btrfs/volumes.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index a67af16d960d..300ee5f0bfa2 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2665,6 +2665,19 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>   
>>   	/* Update ctime/mtime for libblkid */
>>   	update_dev_time(device_path);
>> +
>> +	/*
>> +	 * Now that we have written a new sb into this device, check all other
>> +	 * fs_devices list if it alienates any scanned device.
>> +	 */
>> +	mutex_lock(&uuid_mutex);
>> +	/*
>> +	 * Ignore the return as we are successfull in the core task - to added
>> +	 * the device
>> +	 */
>> +	btrfs_free_stale_devices(device_path, NULL);
> 
> So this is open coding btrfs_forget_devices, so the helper should be
> used.
> 

  Ah. I didn't notice that. Will fix.

> As there's NULL passed, 

  NULL is passed to the 2nd argument skip_device

> all stale devices will be removed from the list,

  No, It means it does not have any particular device to skip.
  Added device is already part of mounted fs_device list,
  the loop skips its check. So no need to skip_device.

> but we can remove just the device being added, no?

  It does exactly that.
  btrfs_free_stale_devices(device_path, NULL);

  It removes the device from all other fs_devices which are _unmounted_.

> And before the whole
> operation starts, not after. 

  What if the add fails? Then we have to add scanned device back to avoid
  that mess. why not remove after we have successfully add the device
  to the mounted fsid.

> The closest moment is before
> btrfs_commit_transaction, that's where the superblock gets overwritten.
> 

Thanks, Anand


>> +	mutex_unlock(&uuid_mutex);
>> +
>>   	return ret;
>>   
>>   error_sysfs:
>> -- 
>> 2.23.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: include non-missing as a qualifier for the latest_bdev
  2020-04-30 13:46   ` David Sterba
@ 2020-05-01 22:54     ` Anand Jain
  0 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2020-05-01 22:54 UTC (permalink / raw)
  To: dsterba, linux-btrfs, dsterba, nborisov, josef



On 30/4/20 9:46 pm, David Sterba wrote:
> On Tue, Apr 28, 2020 at 11:22:26PM +0800, Anand Jain wrote:
>> btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
>> to point to the bdev with greatest device::generation number.
>> For a typical-missing device the generation number is zero so
>> fs_devices::latest_bdev will never point to it.
>>
>> But if the missing device is due to alienation [1], then
>> device::generation is not-zero and if it is >= to rest of
>> device::generation in the list, then fs_devices::latest_bdev
>> ends up pointing to the missing device and reports the error
>> like this [2]
>>
>> [1] We maintain devices of a fsid (as in fs_device::fsid) in the
>> fs_devices::devices list, a device is considered as an alien device
>> if its fsid does not match with the fs_device::fsid
>>
>> $ mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
> 
> Please put each command on one line for clarity
> 

yep.

>> $ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
>> $ sleep 3 # avoid racing with udev's useless scans if needed
>> $ btrfs dev add -f /dev/sdb /btrfs
>> $ mount -o degraded /dev/sdc /btrfs1
> 
> So the cause is a second mkfs on some devices, but is the degraded mount
> supposed to work? The example goes:
> 
Yes. It must work. We don't know if the user is mounting B just after
mkfs or if it already contains some data.

> - create first filesystem with device A
> - create second filesystem with device B and C
> - add device B to the first filesystem, effectively making it missing
> - mount first filesystem, degraded because of the missing device
> 
> For a reproducer that's ok, but is this something that we can expect to
> happen in practice? The flag -f should prevent accidental overwrite, but
> yes the kernel code needs to deal with that in any case.
> 

Its a configuration related, so is left the user how they arrive at
their understanding of what configuration is suitable for them. Yes its
better to fix loop holes in its path. I encountered it when testing
something else.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v4 2/3] btrfs: include non-missing as a qualifier for the latest_bdev
  2020-04-28 15:22 ` [PATCH 3/3] btrfs: free alien device due to device add Anand Jain
  2020-04-30 13:31   ` David Sterba
@ 2020-05-04 18:58   ` Anand Jain
  2020-05-04 18:58   ` [PATCH v4 3/3] btrfs: free alien device due to device add Anand Jain
  2 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2020-05-04 18:58 UTC (permalink / raw)
  To: linux-btrfs

btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
to point to the bdev with greatest device::generation number.
For a typical-missing device the generation number is zero so
fs_devices::latest_bdev will never point to it.

But if the missing device is due to alienation [1], then
device::generation is not-zero and if it is >= to rest of
device::generation in the list, then fs_devices::latest_bdev
ends up pointing to the missing device and reports the error
like this [2]

[1] We maintain devices of a fsid (as in fs_device::fsid) in the
fs_devices::devices list, a device is considered as an alien device
if its fsid does not match with the fs_device::fsid

Consider a working raid1.
$ mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
$ mount /dev/sdb /btrfs
$ umount /btrfs

While raid1 was unmounted the user decides to force add one its
device to another btrfs fs.

$ mkfs.btrfs -fq /dev/sdd
$ mount /dev/sdd /btrfs1
$ btrfs dev add -f /dev/sdb /btrfs1

Now the original raid1 fails to mount in degraded, because
fs_devices::latest_bdev is pointing to the alien device.

$ mount -o degraded /dev/sdc /btrfs

[2]
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

kernel: BTRFS warning (device sdc): devid 1 uuid 072a0192-675b-4d5a-8640-a5cf2b2c704d is missing
kernel: BTRFS error (device sdc): failed to read devices
kernel: BTRFS error (device sdc): open_ctree failed

Fix the root of the issue, by checking if the the device is not
missing before it can be a contender for the fs_devices::latest_bdev
title.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
v3-rebased: update change log.
v4: update change log.

 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bf953c4895f3..f93739afe681 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1042,6 +1042,8 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
 							&device->dev_state)) {
 			if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
 			     &device->dev_state) &&
+			    !test_bit(BTRFS_DEV_STATE_MISSING,
+			     &device->dev_state) &&
 			     (!latest_dev ||
 			      device->generation > latest_dev->generation)) {
 				latest_dev = device;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 3/3] btrfs: free alien device due to device add
  2020-04-28 15:22 ` [PATCH 3/3] btrfs: free alien device due to device add Anand Jain
  2020-04-30 13:31   ` David Sterba
  2020-05-04 18:58   ` [PATCH v4 2/3] btrfs: include non-missing as a qualifier for the latest_bdev Anand Jain
@ 2020-05-04 18:58   ` Anand Jain
  2020-05-05 19:34     ` David Sterba
  2 siblings, 1 reply; 18+ messages in thread
From: Anand Jain @ 2020-05-04 18:58 UTC (permalink / raw)
  To: linux-btrfs

When the old device has new fsid through btrfs device add -f <dev> our
fs_devices list has an alien device in one of the fs_devices. So this is
a trigger and not the root cause of the issue. This patch fixes the trigger.

By having an alien device in fs_devices, we have two issues so far

1. missing device is not shows as missing in the userland

Which is due to cracks in the function btrfs_open_one_device() and
hardened by the pending patches in the ml.
 btrfs: remove identified alien device in open_fs_devices
 btrfs: remove identified alien btrfs device in open_fs_devices

2. mount of a degraded fs_devices fails

Which is due to cracks in the function btrfs_free_extra_devids() and
hardened by patch (included in the set).
 btrfs: include non-missing as a qualifier for the latest_bdev

The trigger for both of this issue is that there is an alien (does not
contain the intended fsid/btrfs_magic) device in the fs_devices.

We know a device can be scanned/added through
btrfs-control::BTRFS_IOC_SCAN_DEV|BTRFS_IOC_DEVICES_READY
or by
ioctl::BTRFS_IOC_ADD_DEV

And device coming through btrfs-control is checked against the all other
devices in btrfs kernel but not coming through BTRFS_IOC_ADD_DEV.

This patch checks if the device add is alienating any other scanned
device and deletes it.

In fact, this patch fixes both the issues 1 and 2 (above) by eliminating
the trigger of the issue, but still they have their own patch as well
because its the right way to harden the functions and fill the cracks.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
v3-rebased: change log updated.
v4: use the helper btrfs_forget_devices().

 fs/btrfs/volumes.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f93739afe681..8e6f110b375d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2664,6 +2664,15 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	/* Update ctime/mtime for libblkid */
 	update_dev_time(device_path);
+
+	/*
+	 * Now that we have written a new sb into this device, check all other
+	 * fs_devices list if device_path alienates any other scanned device.
+	 * Ignore the return as we are successfull in the core task - add
+	 * the device.
+	 */
+	btrfs_forget_devices(device_path);
+
 	return ret;
 
 error_sysfs:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] btrfs: free alien device due to device add
  2020-05-01 20:01     ` Anand Jain
@ 2020-05-05 17:02       ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2020-05-05 17:02 UTC (permalink / raw)
  To: Anand Jain; +Cc: dsterba, linux-btrfs, dsterba, nborisov, josef

On Sat, May 02, 2020 at 04:01:28AM +0800, Anand Jain wrote:
>   Ah. I didn't notice that. Will fix.
> 
> > As there's NULL passed, 
> 
>   NULL is passed to the 2nd argument skip_device
> 
> > all stale devices will be removed from the list,
> 
>   No, It means it does not have any particular device to skip.
>   Added device is already part of mounted fs_device list,
>   the loop skips its check. So no need to skip_device.
> 
> > but we can remove just the device being added, no?
> 
>   It does exactly that.
>   btrfs_free_stale_devices(device_path, NULL);
> 
>   It removes the device from all other fs_devices which are _unmounted_.

Right, I got it wrong.

> > And before the whole
> > operation starts, not after. 
> 
>   What if the add fails? Then we have to add scanned device back to avoid
>   that mess. why not remove after we have successfully add the device
>   to the mounted fsid.

As there's no synchronization, there will be either need for a rollback
action or stale information after the change is permanent (device added
to new filesystem) but no fs_devices update happens.

The difference is that when the device is removed first, the rollback
happens only in exceptional cases, when the commit fails.

OTOH the time window between commit and removal happens each time. The
ohly hope here is that the window is really short so there's practically
zero chance of another process to jump in trying to use the device.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 3/3] btrfs: free alien device due to device add
  2020-05-04 18:58   ` [PATCH v4 3/3] btrfs: free alien device due to device add Anand Jain
@ 2020-05-05 19:34     ` David Sterba
  2020-05-05 23:40       ` Anand Jain
  0 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2020-05-05 19:34 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Tue, May 05, 2020 at 02:58:26AM +0800, Anand Jain wrote:
> When the old device has new fsid through btrfs device add -f <dev> our
> fs_devices list has an alien device in one of the fs_devices. So this is
> a trigger and not the root cause of the issue. This patch fixes the trigger.
> 
> By having an alien device in fs_devices, we have two issues so far
> 
> 1. missing device is not shows as missing in the userland
> 
> Which is due to cracks in the function btrfs_open_one_device() and
> hardened by the pending patches in the ml.
>  btrfs: remove identified alien device in open_fs_devices
>  btrfs: remove identified alien btrfs device in open_fs_devices

Such notes do not belong to changelog, the subjects don't say enough
about the problem or the fix.

> 2. mount of a degraded fs_devices fails
> 
> Which is due to cracks in the function btrfs_free_extra_devids() and
> hardened by patch (included in the set).
>  btrfs: include non-missing as a qualifier for the latest_bdev

Same.

> The trigger for both of this issue is that there is an alien (does not
> contain the intended fsid/btrfs_magic) device in the fs_devices.
> 
> We know a device can be scanned/added through
> btrfs-control::BTRFS_IOC_SCAN_DEV|BTRFS_IOC_DEVICES_READY
> or by
> ioctl::BTRFS_IOC_ADD_DEV
> 
> And device coming through btrfs-control is checked against the all other
> devices in btrfs kernel but not coming through BTRFS_IOC_ADD_DEV.
> 
> This patch checks if the device add is alienating any other scanned
> device and deletes it.
> 
> In fact, this patch fixes both the issues 1 and 2 (above) by eliminating
> the trigger of the issue, but still they have their own patch as well
> because its the right way to harden the functions and fill the cracks.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>

So I'm not sure how much the review holds when the code changes between
iterations. The right way is to drop the rev-by and CC the person for
any non-trivial or maybe for all changes.

I also rewrote the changelog as it's sometimes hard to understand what
you mean, I had most problems with the use of the term 'trigger'.

> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2664,6 +2664,15 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  
>  	/* Update ctime/mtime for libblkid */
>  	update_dev_time(device_path);
> +
> +	/*
> +	 * Now that we have written a new sb into this device, check all other
> +	 * fs_devices list if device_path alienates any other scanned device.
> +	 * Ignore the return as we are successfull in the core task - add
> +	 * the device.
> +	 */
> +	btrfs_forget_devices(device_path);

Actually this should go before update_dev_time, as this is the point
when the file change could be detected by udev and rescanned, which
increases the window between commit and removal from fs_devices.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 3/3] btrfs: free alien device due to device add
  2020-05-05 19:34     ` David Sterba
@ 2020-05-05 23:40       ` Anand Jain
  0 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2020-05-05 23:40 UTC (permalink / raw)
  To: dsterba, linux-btrfs




>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> 
> So I'm not sure how much the review holds when the code changes between
> iterations. The right way is to drop the rev-by and CC the person for
> any non-trivial or maybe for all changes.
> 

  only thing that changed is use of helper function
  btrfs_forget_devices() instead of the open-code, so I kept the rev-by.
  Yeah I kind of noticed late that cc list added during git format-patch
  didn't actually add cc in the git send <patch-file>


>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2664,6 +2664,15 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>   
>>   	/* Update ctime/mtime for libblkid */
>>   	update_dev_time(device_path);
>> +
>> +	/*
>> +	 * Now that we have written a new sb into this device, check all other
>> +	 * fs_devices list if device_path alienates any other scanned device.
>> +	 * Ignore the return as we are successfull in the core task - add
>> +	 * the device.
>> +	 */
>> +	btrfs_forget_devices(device_path);
> 
> Actually this should go before update_dev_time, as this is the point
> when the file change could be detected by udev and rescanned, which
> increases the window between commit and removal from fs_devices.
> 

  You are right, theoretically btrfs_forget_devices() should be before
  update_dev_time()


Noted and agreed on rest of the stuffs.

Thanks, Anand

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-05-05 22:41 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28 15:22 [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device Anand Jain
2020-04-28 15:22 ` [PATCH 1/3] btrfs: drop useless goto in open_fs_devices Anand Jain
2020-04-30 14:09   ` David Sterba
2020-04-28 15:22 ` [PATCH 2/3] btrfs: include non-missing as a qualifier for the latest_bdev Anand Jain
2020-04-30 13:46   ` David Sterba
2020-05-01 22:54     ` Anand Jain
2020-04-28 15:22 ` [PATCH 3/3] btrfs: free alien device due to device add Anand Jain
2020-04-30 13:31   ` David Sterba
2020-05-01 20:01     ` Anand Jain
2020-05-05 17:02       ` David Sterba
2020-05-04 18:58   ` [PATCH v4 2/3] btrfs: include non-missing as a qualifier for the latest_bdev Anand Jain
2020-05-04 18:58   ` [PATCH v4 3/3] btrfs: free alien device due to device add Anand Jain
2020-05-05 19:34     ` David Sterba
2020-05-05 23:40       ` Anand Jain
2020-04-30  6:05 ` [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device Nikolay Borisov
2020-04-30 17:54   ` Anand Jain
2020-04-30 10:28     ` Nikolay Borisov
2020-05-01 19:45       ` Anand Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.