[PATCH] clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount.

* [PATCH] clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount.
@ 2021-09-22 17:06 Li Zhang
  2021-09-23  7:41 ` Nikolay Borisov
  0 siblings, 1 reply; 5+ messages in thread
From: Li Zhang @ 2021-09-22 17:06 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Li Zhang

reference to:
https://github.com/kdave/btrfs-progs/issues/389
If a device offline somehow, but after a moment, it is back
online, btrfs filesystem only mark the device is missing, but
don't mark it right.

However, Although filesystem mark device presented, in my
case the /dev/loop2 is come back, the generation of this
/dev/loop2 super block  is not right.Because the device's
data is not up-to-date.  So the info of status of scrubs
would list as following.:

$ losetup /dev/loop1 test1
$ losetup /dev/loop2 test2
$ losetup -d /dev/loop2
$ mount -o degraded /dev/loop1 /mnt/1
$ touch /mnt/1/file1.txt
$ umount /mnt/1
$ losetup /dev/loop2 test2
$ mount /dev/loop1 /mnt/1
$ btrfs scrub start /mnt/1
scrub started on /mnt/1, fsid 88a3295f-c052-4208-9b1f-7b2c5074c20a (pid=2477)
$ WARNING: errors detected during scrubbing, corrected

$btrfs scrub status /mnt/1
UUID:             88a3295f-c052-4208-9b1f-7b2c5074c20a
Scrub started:    Thu Sep 23 00:34:55 2021
Status:           finished
Duration:         0:00:01
Total to scrub:   272.00KiB
Rate:             272.00KiB/s
Error summary:    super=2 verify=5
  Corrected:      5
  Uncorrectable:  0
  Unverified:     0

And if we umount and mount again everything would be all right.

In my opinion, we could improve this scrub in the following
way, i personally vote the second method

1) Sync all data immediately during mounting, but it waste IO
resource.

2) In scrub, we should give detailed information of every device
instead of a single filesystem, since scrub is launching a number
of thread to scanning each device instead scan whole filesystem.
Futher more we should give user hint about how to fix it, in my
case, user should umount filesystem and mount it again. But if
data is corrupt, the repair program might be called,  in case of
further damage, user should replace a device and reconstruct
ata using raid1, raid6 and so on.

Signed-off-by: Li Zhang <zhanglikernel@gmail.com>
---
 fs/btrfs/volumes.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 464485a..99fdbaa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7198,6 +7198,11 @@ static int read_one_dev(struct extent_buffer *leaf,
 			set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
 		}
 
+		if (device->bdev && test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) && device->fs_devices->missing_devices > 0) {
+			device->fs_devices->missing_devices--;
+			clear_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state);
+		}
+
 		/* Move the device to its own fs_devices */
 		if (device->fs_devices != fs_devices) {
 			ASSERT(test_bit(BTRFS_DEV_STATE_MISSING,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread