All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount.
@ 2021-09-22 17:06 Li Zhang
  2021-09-23  7:41 ` Nikolay Borisov
  0 siblings, 1 reply; 5+ messages in thread
From: Li Zhang @ 2021-09-22 17:06 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Li Zhang

reference to:
https://github.com/kdave/btrfs-progs/issues/389
If a device offline somehow, but after a moment, it is back
online, btrfs filesystem only mark the device is missing, but
don't mark it right.

However, Although filesystem mark device presented, in my
case the /dev/loop2 is come back, the generation of this
/dev/loop2 super block  is not right.Because the device's
data is not up-to-date.  So the info of status of scrubs
would list as following.:

$ losetup /dev/loop1 test1
$ losetup /dev/loop2 test2
$ losetup -d /dev/loop2
$ mount -o degraded /dev/loop1 /mnt/1
$ touch /mnt/1/file1.txt
$ umount /mnt/1
$ losetup /dev/loop2 test2
$ mount /dev/loop1 /mnt/1
$ btrfs scrub start /mnt/1
scrub started on /mnt/1, fsid 88a3295f-c052-4208-9b1f-7b2c5074c20a (pid=2477)
$ WARNING: errors detected during scrubbing, corrected

$btrfs scrub status /mnt/1
UUID:             88a3295f-c052-4208-9b1f-7b2c5074c20a
Scrub started:    Thu Sep 23 00:34:55 2021
Status:           finished
Duration:         0:00:01
Total to scrub:   272.00KiB
Rate:             272.00KiB/s
Error summary:    super=2 verify=5
  Corrected:      5
  Uncorrectable:  0
  Unverified:     0

And if we umount and mount again everything would be all right.

In my opinion, we could improve this scrub in the following
way, i personally vote the second method

1) Sync all data immediately during mounting, but it waste IO
resource.

2) In scrub, we should give detailed information of every device
instead of a single filesystem, since scrub is launching a number
of thread to scanning each device instead scan whole filesystem.
Futher more we should give user hint about how to fix it, in my
case, user should umount filesystem and mount it again. But if
data is corrupt, the repair program might be called,  in case of
further damage, user should replace a device and reconstruct
ata using raid1, raid6 and so on.

Signed-off-by: Li Zhang <zhanglikernel@gmail.com>
---
 fs/btrfs/volumes.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 464485a..99fdbaa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7198,6 +7198,11 @@ static int read_one_dev(struct extent_buffer *leaf,
 			set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
 		}
 
+		if (device->bdev && test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) && device->fs_devices->missing_devices > 0) {
+			device->fs_devices->missing_devices--;
+			clear_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
+		}
+
 		/* Move the device to its own fs_devices */
 		if (device->fs_devices != fs_devices) {
 			ASSERT(test_bit(BTRFS_DEV_STATE_MISSING,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount.
  2021-09-22 17:06 [PATCH] clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount Li Zhang
@ 2021-09-23  7:41 ` Nikolay Borisov
  2021-09-23 10:10   ` li zhang
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2021-09-23  7:41 UTC (permalink / raw)
  To: Li Zhang, linux-btrfs



On 22.09.21 г. 20:06, Li Zhang wrote:
> reference to:
> https://github.com/kdave/btrfs-progs/issues/389
> If a device offline somehow, but after a moment, it is back
> online, btrfs filesystem only mark the device is missing, but
> don't mark it right.
> 
> However, Although filesystem mark device presented, in my
> case the /dev/loop2 is come back, the generation of this
> /dev/loop2 super block  is not right.Because the device's
> data is not up-to-date.  So the info of status of scrubs
> would list as following.:
> 
> $ losetup /dev/loop1 test1
> $ losetup /dev/loop2 test2
> $ losetup -d /dev/loop2
> $ mount -o degraded /dev/loop1 /mnt/1
> $ touch /mnt/1/file1.txt
> $ umount /mnt/1
> $ losetup /dev/loop2 test2
> $ mount /dev/loop1 /mnt/1
> $ btrfs scrub start /mnt/1
> scrub started on /mnt/1, fsid 88a3295f-c052-4208-9b1f-7b2c5074c20a (pid=2477)
> $ WARNING: errors detected during scrubbing, corrected
> 

THis seems it can be turned into an fstests rather easily, care to send
a patch for it as well ?

> $btrfs scrub status /mnt/1
> UUID:             88a3295f-c052-4208-9b1f-7b2c5074c20a
> Scrub started:    Thu Sep 23 00:34:55 2021
> Status:           finished
> Duration:         0:00:01
> Total to scrub:   272.00KiB
> Rate:             272.00KiB/s
> Error summary:    super=2 verify=5
>   Corrected:      5
>   Uncorrectable:  0
>   Unverified:     0
> 
> And if we umount and mount again everything would be all right.
> 
> In my opinion, we could improve this scrub in the following
> way, i personally vote the second method
> 
> 1) Sync all data immediately during mounting, but it waste IO
> resource.
> 
> 2) In scrub, we should give detailed information of every device
> instead of a single filesystem, since scrub is launching a number
> of thread to scanning each device instead scan whole filesystem.
> Futher more we should give user hint about how to fix it, in my
> case, user should umount filesystem and mount it again. But if
> data is corrupt, the repair program might be called,  in case of
> further damage, user should replace a device and reconstruct
> ata using raid1, raid6 and so on.
> 
> Signed-off-by: Li Zhang <zhanglikernel@gmail.com>
> ---
>  fs/btrfs/volumes.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 464485a..99fdbaa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7198,6 +7198,11 @@ static int read_one_dev(struct extent_buffer *leaf,
>  			set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
>  		}
>  
> +		if (device->bdev && test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) && device->fs_devices->missing_devices > 0) {
> +			device->fs_devices->missing_devices--;
> +			clear_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
> +		}
> +
>  		/* Move the device to its own fs_devices */
>  		if (device->fs_devices != fs_devices) {
>  			ASSERT(test_bit(BTRFS_DEV_STATE_MISSING,
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount.
  2021-09-23  7:41 ` Nikolay Borisov
@ 2021-09-23 10:10   ` li zhang
  2021-09-23 10:53     ` Nikolay Borisov
  0 siblings, 1 reply; 5+ messages in thread
From: li zhang @ 2021-09-23 10:10 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

Sure, I would love to do it.

To avoid ambiguity, Should I and write a test
case to detect whether clear the BTRFS_DEV_STATE_MISSING
if filesystem found a device, but it was marked to
BTRFS_DEV_STATE_MISSING.

Nikolay Borisov <nborisov@suse.com> 于2021年9月23日周四 下午3:41写道:
>
>
>
> On 22.09.21 г. 20:06, Li Zhang wrote:
> > reference to:
> > https://github.com/kdave/btrfs-progs/issues/389
> > If a device offline somehow, but after a moment, it is back
> > online, btrfs filesystem only mark the device is missing, but
> > don't mark it right.
> >
> > However, Although filesystem mark device presented, in my
> > case the /dev/loop2 is come back, the generation of this
> > /dev/loop2 super block  is not right.Because the device's
> > data is not up-to-date.  So the info of status of scrubs
> > would list as following.:
> >
> > $ losetup /dev/loop1 test1
> > $ losetup /dev/loop2 test2
> > $ losetup -d /dev/loop2
> > $ mount -o degraded /dev/loop1 /mnt/1
> > $ touch /mnt/1/file1.txt
> > $ umount /mnt/1
> > $ losetup /dev/loop2 test2
> > $ mount /dev/loop1 /mnt/1
> > $ btrfs scrub start /mnt/1
> > scrub started on /mnt/1, fsid 88a3295f-c052-4208-9b1f-7b2c5074c20a (pid=2477)
> > $ WARNING: errors detected during scrubbing, corrected
> >
>
> THis seems it can be turned into an fstests rather easily, care to send
> a patch for it as well ?
>
> > $btrfs scrub status /mnt/1
> > UUID:             88a3295f-c052-4208-9b1f-7b2c5074c20a
> > Scrub started:    Thu Sep 23 00:34:55 2021
> > Status:           finished
> > Duration:         0:00:01
> > Total to scrub:   272.00KiB
> > Rate:             272.00KiB/s
> > Error summary:    super=2 verify=5
> >   Corrected:      5
> >   Uncorrectable:  0
> >   Unverified:     0
> >
> > And if we umount and mount again everything would be all right.
> >
> > In my opinion, we could improve this scrub in the following
> > way, i personally vote the second method
> >
> > 1) Sync all data immediately during mounting, but it waste IO
> > resource.
> >
> > 2) In scrub, we should give detailed information of every device
> > instead of a single filesystem, since scrub is launching a number
> > of thread to scanning each device instead scan whole filesystem.
> > Futher more we should give user hint about how to fix it, in my
> > case, user should umount filesystem and mount it again. But if
> > data is corrupt, the repair program might be called,  in case of
> > further damage, user should replace a device and reconstruct
> > ata using raid1, raid6 and so on.
> >
> > Signed-off-by: Li Zhang <zhanglikernel@gmail.com>
> > ---
> >  fs/btrfs/volumes.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 464485a..99fdbaa 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -7198,6 +7198,11 @@ static int read_one_dev(struct extent_buffer *leaf,
> >                       set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
> >               }
> >
> > +             if (device->bdev && test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) && device->fs_devices->missing_devices > 0) {
> > +                     device->fs_devices->missing_devices--;
> > +                     clear_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
> > +             }
> > +
> >               /* Move the device to its own fs_devices */
> >               if (device->fs_devices != fs_devices) {
> >                       ASSERT(test_bit(BTRFS_DEV_STATE_MISSING,
> >

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount.
  2021-09-23 10:10   ` li zhang
@ 2021-09-23 10:53     ` Nikolay Borisov
  2021-09-23 14:34       ` li zhang
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2021-09-23 10:53 UTC (permalink / raw)
  To: li zhang; +Cc: linux-btrfs



On 23.09.21 г. 13:10, li zhang wrote:
> Sure, I would love to do it.
> 
> To avoid ambiguity, Should I and write a test
> case to detect whether clear the BTRFS_DEV_STATE_MISSING
> if filesystem found a device, but it was marked to
> BTRFS_DEV_STATE_MISSING.

Yes, basically the idea is for the test case to fail without your patch
and pass with your patch. That way we'll ensure this won't regress in
the future.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount.
  2021-09-23 10:53     ` Nikolay Borisov
@ 2021-09-23 14:34       ` li zhang
  0 siblings, 0 replies; 5+ messages in thread
From: li zhang @ 2021-09-23 14:34 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

Got it.

Nikolay Borisov <nborisov@suse.com> 于2021年9月23日周四 下午6:53写道:
>
>
>
> On 23.09.21 г. 13:10, li zhang wrote:
> > Sure, I would love to do it.
> >
> > To avoid ambiguity, Should I and write a test
> > case to detect whether clear the BTRFS_DEV_STATE_MISSING
> > if filesystem found a device, but it was marked to
> > BTRFS_DEV_STATE_MISSING.
>
> Yes, basically the idea is for the test case to fail without your patch
> and pass with your patch. That way we'll ensure this won't regress in
> the future.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-09-23 14:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-22 17:06 [PATCH] clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount Li Zhang
2021-09-23  7:41 ` Nikolay Borisov
2021-09-23 10:10   ` li zhang
2021-09-23 10:53     ` Nikolay Borisov
2021-09-23 14:34       ` li zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.