* RAID1 disk missing
From: Thommandra Gowtham
To: linux-btrfs
Date: 2020-07-30 11:38 UTC

Hi,

I have root on btrfs and am moving from 'single' to a RAID1
configuration with 2 disks. If one of the disks goes bad, i.e. becomes
completely inaccessible to the kernel (perhaps due to a hardware
issue), we see errors like the ones below:

[24710.550168] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96618, rd 16870, flush 105, corrupt 0, gen 0
[24710.561121] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96619, rd 16870, flush 105, corrupt 0, gen 0
[24710.572056] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96620, rd 16870, flush 105, corrupt 0, gen 0
[24710.582983] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96621, rd 16870, flush 105, corrupt 0, gen 0
[24710.593993] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96622, rd 16870, flush 105, corrupt 0, gen 0
[24710.605112] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96623, rd 16870, flush 105, corrupt 0, gen 0

The above is expected because one of the disks is missing. How do I
make sure that the system keeps working until a replacement disk is
added? That can take a few days or a week.

# btrfs fi show
Label: 'rpool'  uuid: 2e9cf1a2-6688-4f7d-b371-a3a878e4bdf3
        Total devices 2 FS bytes used 10.86GiB
        devid    1 size 206.47GiB used 28.03GiB path /dev/sdb3
        *** Some devices missing

Sometimes the bad disk works fine again after a power-cycle. When the
disk is seen again by the kernel after the power-cycle, we see errors
like these:

[  222.410779] BTRFS error (device sdb3): parent transid verify failed on 1042750283776 wanted 422935 found 422735
[  222.429451] BTRFS error (device sdb3): parent transid verify failed on 1042750353408 wanted 422939 found 422899
[  222.442354] BTRFS error (device sdb3): parent transid verify failed on 1042750357504 wanted 422915 found 422779

In several cases btrfs is then unable to mount the filesystem due to
these errors. How do I proactively take action when a disk goes
missing (and may take a few days to get replaced)? Is moving back from
RAID1 to 'single' the only solution?

Please let me know your inputs.

I am using:

# btrfs --version
btrfs-progs v4.4

Ubuntu 16.04: 4.15.0-36-generic #1 SMP Mon Oct 22 21:20:30 PDT 2018
x86_64 x86_64 x86_64 GNU/Linux

btrfs in RAID1 configuration:

# btrfs fi show
Label: 'rpool'  uuid: 2e9cf1a2-6688-4f7d-b371-a3a878e4bdf3
        Total devices 2 FS bytes used 11.14GiB
        devid    1 size 206.47GiB used 28.03GiB path /dev/sdb3
        devid    2 size 206.47GiB used 28.03GiB path /dev/sda3

Regards,
Gowtham
* Re: RAID1 disk missing
From: Zygo Blaxell
To: Thommandra Gowtham
Cc: linux-btrfs
Date: 2020-07-30 23:59 UTC

On Thu, Jul 30, 2020 at 05:08:53PM +0530, Thommandra Gowtham wrote:
> Hi,
>
> I have root on btrfs and am moving from 'single' to a RAID1
> configuration with 2 disks. If one of the disks goes bad, i.e. becomes
> completely inaccessible to the kernel (perhaps due to a hardware
> issue), we see errors like the ones below:
>
> [24710.550168] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96618, rd 16870, flush 105, corrupt 0, gen 0
> [24710.561121] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96619, rd 16870, flush 105, corrupt 0, gen 0
> [24710.572056] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96620, rd 16870, flush 105, corrupt 0, gen 0
> [24710.582983] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96621, rd 16870, flush 105, corrupt 0, gen 0
> [24710.593993] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96622, rd 16870, flush 105, corrupt 0, gen 0
> [24710.605112] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96623, rd 16870, flush 105, corrupt 0, gen 0
>
> The above is expected because one of the disks is missing. How do I
> make sure that the system keeps working until a replacement disk is
> added? That can take a few days or a week.

btrfs doesn't have a good way to eject a disk from the array if it
fails while mounted. It should, but it doesn't.

You might be able to drop the SCSI device with:

	echo 1 > /sys/block/sdb/device/delete

which will at least stop the flood of kernel errors.

> # btrfs fi show
> Label: 'rpool'  uuid: 2e9cf1a2-6688-4f7d-b371-a3a878e4bdf3
>         Total devices 2 FS bytes used 10.86GiB
>         devid    1 size 206.47GiB used 28.03GiB path /dev/sdb3
>         *** Some devices missing
>
> Sometimes the bad disk works fine again after a power-cycle. When the
> disk is seen again by the kernel after the power-cycle, we see errors
> like these:
>
> [  222.410779] BTRFS error (device sdb3): parent transid verify failed on 1042750283776 wanted 422935 found 422735
> [  222.429451] BTRFS error (device sdb3): parent transid verify failed on 1042750353408 wanted 422939 found 422899
> [  222.442354] BTRFS error (device sdb3): parent transid verify failed on 1042750357504 wanted 422915 found 422779

btrfs has data integrity checks on references between nodes in the
filesystem tree. These integrity checks can detect silent data
corruption (except in nodatasum files and on short csum collisions)
from any cause, including a disconnected raid1 array member. btrfs
doesn't handle device disconnects or IO errors specially, since the
data integrity checks are sufficient.

When a disk is disconnected in raid1, blocks are not updated on the
disconnected disk. If the disk is reconnected later, every update
that occurred while the disk was disconnected is detected by btrfs as
a silent data corruption error, and can be repaired the same way as
any other silent data corruption. Scrub or device replace will fix
such corruptions after the disk is reconnected or replaced, and any
bad data detected during normal reads will be repaired as well.

Generally it's not a good idea to continue to use a disk that
intermittently disconnects. Each time it happens, you must run a
scrub to verify all data is present on both disks and repair any lost
writes on the disconnected disk. You don't necessarily need to do
this immediately--if the other disk is healthy, btrfs will just repair
the out-of-sync disk when normal reads trip over errors. You can
schedule the scrub for a maintenance window.
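For example, after the flaky disk reconnects, something like this (a
sketch--substitute your own mount point for /mnt) would verify both
copies and repair the stale one:

	# -B runs in the foreground, -d reports per-device statistics
	btrfs scrub start -Bd /mnt

	# error counters should stop growing once the repair is done;
	# -z resets them afterwards so that new errors stand out
	btrfs device stats /mnt
	btrfs device stats -z /mnt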
In some cases intermittent disconnects can happen due to a bad power
supply or bad cabling rather than a broken disk, but in any case, if
there are intermittent disconnects then _some_ hardware is broken and
needs to be replaced.

If you have two disks that intermittently disconnect, it will break
the array. raid1 tolerates one and only one disk failure. If a second
disk fails before scrub/replace is finished on the first failing disk,
the filesystem will be severely damaged; the remaining options are
btrfs check --repair, or mkfs and start over.

> In several cases btrfs is then unable to mount the filesystem due to
> these errors. How do I proactively take action when a disk goes
> missing (and may take a few days to get replaced)?

Normally no action is required for raid1[1]. If the disk is causing a
performance or power issue (i.e. it's still responding to IO requests
but very slowly, or it's failing so badly that it's damaging the power
supply), then we'll disconnect it, but normally we don't touch the
array [2] at all until the replacement disk arrives.

> Is moving back from RAID1 to 'single' the only solution?

In a 2-disk array there is little difference between degraded mode and
single. Almost any failure event that will kill a degraded raid1 array
will also kill a single-disk filesystem.

If it's a small array, you could balance metadata to raid1 (if you
still have 2 or more disks left) or dup (if you are down to just one
disk). This will provide slightly more robustness against a second
partial disk failure while the array is degraded (i.e. a bad sector on
the disk that is still online). For large arrays the metadata balance
will take far longer than the disk replacement time, so there's no
point.

> Please let me know your inputs.

Also note that some disks have firmware bugs that break write caching
when there are UNC errors on the disk. Unfortunately it's hard to
tell whether your drive firmware has such a bug until it has bad
sectors. If you have a drive with this type of bug in a raid1 array,
btrfs will simply repair all the write cache corruption from copies of
the data stored on the healthy array members. In degraded mode such
repair is no longer possible, so you may want to use hdparm -W0 on all
disks in the array while it is degraded.
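Putting the balance and hdparm suggestions above together, a rough
sketch for a 2-disk array that is down to one disk (assuming the
survivor is /dev/sdb with the filesystem mounted at /mnt; both names
are illustrative):

	# keep two metadata copies on the one remaining disk;
	# balance may want -f before it agrees to reduce redundancy
	btrfs balance start -mconvert=dup /mnt

	# disable the drive write cache while the array is degraded
	hdparm -W0 /dev/sdb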
[1] This doesn't work for btrfs raid5 and raid6--the array is more or
less useless while disks are missing, and the only way to fix it is to
replace (not delete) the missing devices, or fix the kernel bugs.

[2] Literally, we do not touch the array. There is a small but
non-zero risk of damaging an array every time a person holds a disk
in their hands. Humans sometimes drop things, and disks get more
physically fragile and sensitive to handling as they age. We don't
take those risks more than we have to.

* Re: RAID1 disk missing
From: Thommandra Gowtham
To: Zygo Blaxell
Cc: linux-btrfs
Date: 2020-08-01 6:38 UTC

Thank you for the response.

> > [24710.605112] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96623, rd 16870, flush 105, corrupt 0, gen 0
> >
> > The above is expected because one of the disks is missing. How do I
> > make sure that the system keeps working until a replacement disk is
> > added? That can take a few days or a week.
>
> btrfs doesn't have a good way to eject a disk from the array if it
> fails while mounted. It should, but it doesn't.
>
> You might be able to drop the SCSI device with:
>
> 	echo 1 > /sys/block/sdb/device/delete
>
> which will at least stop the flood of kernel errors.

Actually, it doesn't. I am simulating a disk failure using the above
command, and that is when the btrfs errors on the disk start to
increase. If a disk in RAID1 goes missing, can we expect btrfs to work
on a single disk until a replacement is added (which might take a few
weeks)? And is there a way to suppress these errors on the missing
disk, i.e. 'sda'?

# echo 1 > /sys/block/sda/device/delete
[83617.630080] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[83617.640052] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[83617.650015] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0

# btrfs device stats /.rootbe/
[/dev/sdb3].write_io_errs   0
[/dev/sdb3].read_io_errs    0
[/dev/sdb3].flush_io_errs   0
[/dev/sdb3].corruption_errs 0
[/dev/sdb3].generation_errs 0
[/dev/sda3].write_io_errs   1010
[/dev/sda3].read_io_errs    0
[/dev/sda3].flush_io_errs   3
[/dev/sda3].corruption_errs 0
[/dev/sda3].generation_errs 0

And then I attach the disk back using:

# echo '- - -' > /sys/class/scsi_host/host0/scan

But the RAID1 doesn't recover even when I do a scrub. Actually, doing
a scrub makes the kernel hang at this point. The only way forward is
to power-cycle the system and try the scrub again, or de-mirror.

# btrfs scrub start -B /.rootbe
[83979.085152] INFO: task btrfs-transacti:473 blocked for more than 120 seconds.
[83979.093131]       Tainted: P        W  OE    4.15.0-36-generic #1
[83979.099942] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[83979.108869] INFO: task systemd-journal:531 blocked for more than 120 seconds.

After a power-cycle:

# btrfs scrub start -B /.rootbe
scrub done for 8af19714-9a5b-41cb-9957-6ec85bdf97d1
	scrub started at Thu Jul 30 23:59:08 2020 and finished after 00:00:19
	total bytes scrubbed: 4.38GiB with 574559 errors
	error details: read=574557 super=2
	corrected errors: 3008, uncorrectable errors: 571549, unverified errors: 0
ERROR: there are uncorrectable errors

Sometimes after a power-cycle btrfs is unable to verify checksums and
ends up unable to mount the disk, or mounts it read-only. The system
is then not usable at all.

Is there a way I can take some action when one disk in the RAID goes
missing so that btrfs ignores the metadata from the missing disk
if/when it comes back online? There are only two disks on the system
and the replacement will take time. I do not mind permanently removing
the disk from the RAID1 and temporarily running on a single disk until
replacement hardware arrives. Please let me know.

I cannot mount as degraded because it is the active root fs and it
says that the mount point is busy.
Thanks,
Gowtham
* Re: RAID1 disk missing
From: Zygo Blaxell
To: Thommandra Gowtham
Cc: linux-btrfs
Date: 2020-08-02 6:17 UTC

On Sat, Aug 01, 2020 at 12:08:38PM +0530, Thommandra Gowtham wrote:
> Thank you for the response.
>
> > > [24710.605112] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 96623, rd 16870, flush 105, corrupt 0, gen 0
> > >
> > > The above is expected because one of the disks is missing. How do I
> > > make sure that the system keeps working until a replacement disk is
> > > added? That can take a few days or a week.
> >
> > btrfs doesn't have a good way to eject a disk from the array if it
> > fails while mounted. It should, but it doesn't.
> >
> > You might be able to drop the SCSI device with:
> >
> > 	echo 1 > /sys/block/sdb/device/delete
> >
> > which will at least stop the flood of kernel errors.
>
> Actually, it doesn't. I am simulating a disk failure using the above
> command, and that is when the btrfs errors on the disk start to
> increase. If a disk in RAID1 goes missing, can we expect btrfs to work
> on a single disk until a replacement is added (which might take a few
> weeks)? And is there a way to suppress these errors on the missing
> disk, i.e. 'sda'?

If the filesystem is unmounted and then mounted degraded (which would
require a reboot if it's the root fs) then the kernel log spam stops.
It also stops when I do the device delete on a more recent kernel
(5.0 or 5.4).

> # echo 1 > /sys/block/sda/device/delete
> [83617.630080] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> [83617.640052] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
> [83617.650015] BTRFS error (device sdb3): bdev /dev/sda3 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
>
> # btrfs device stats /.rootbe/
> [/dev/sdb3].write_io_errs   0
> [/dev/sdb3].read_io_errs    0
> [/dev/sdb3].flush_io_errs   0
> [/dev/sdb3].corruption_errs 0
> [/dev/sdb3].generation_errs 0
> [/dev/sda3].write_io_errs   1010
> [/dev/sda3].read_io_errs    0
> [/dev/sda3].flush_io_errs   3
> [/dev/sda3].corruption_errs 0
> [/dev/sda3].generation_errs 0
>
> And then I attach the disk back using:
>
> # echo '- - -' > /sys/class/scsi_host/host0/scan
>
> But the RAID1 doesn't recover even when I do a scrub. Actually, doing
> a scrub makes the kernel hang at this point. The only way forward is
> to power-cycle the system and try the scrub again, or de-mirror.
>
> # btrfs scrub start -B /.rootbe
> [83979.085152] INFO: task btrfs-transacti:473 blocked for more than 120 seconds.
> [83979.093131]       Tainted: P        W  OE    4.15.0-36-generic #1

Maybe try this with a more recent kernel. Several deadlock issues
were fixed between 4.15 and 5.0.

> [83979.099942] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [83979.108869] INFO: task systemd-journal:531 blocked for more than 120 seconds.
>
> After a power-cycle:
>
> # btrfs scrub start -B /.rootbe
> scrub done for 8af19714-9a5b-41cb-9957-6ec85bdf97d1
> 	scrub started at Thu Jul 30 23:59:08 2020 and finished after 00:00:19
> 	total bytes scrubbed: 4.38GiB with 574559 errors
> 	error details: read=574557 super=2
> 	corrected errors: 3008, uncorrectable errors: 571549, unverified errors: 0
> ERROR: there are uncorrectable errors

What is the output of 'btrfs scrub start -Bd' and 'btrfs fi usage'?
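That is, something like this, with your mount point from above:

	# per-device scrub statistics instead of the aggregate
	btrfs scrub start -Bd /.rootbe

	# block group profiles (raid1/single/dup) and per-device usage
	btrfs fi usage /.rootbe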
There should not be uncorrectable errors on raid1 unless there were
multiple disk failures, so we need to see the breakdown of block group
profiles and per-disk error counts to see what's going on there.

> Sometimes after a power-cycle btrfs is unable to verify checksums and
> ends up unable to mount the disk, or mounts it read-only. The system
> is then not usable at all.
>
> Is there a way I can take some action when one disk in the RAID goes
> missing so that btrfs ignores the metadata from the missing disk
> if/when it comes back online?

btrfs should check the metadata from the missing disk and correct it
when it doesn't match the online disk. Normally this is completely
automatic, so you don't have to do anything.

> There are only two disks on the system and the replacement will take
> time. I do not mind permanently removing the disk from the RAID1 and
> temporarily running on a single disk until replacement hardware
> arrives. Please let me know.
>
> I cannot mount as degraded because it is the active root fs and it
> says that the mount point is busy.

For the root fs you'd have to configure the boot loader to pass the
'rootflags' argument with 'degraded'.
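With GRUB on Ubuntu that would be something like this (a sketch--the
file and update command vary by distro):

	# /etc/default/grub
	GRUB_CMDLINE_LINUX="rootflags=degraded"

	# then regenerate the config and reboot
	update-grub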