* how to replace a failed drive?
@ 2021-09-01 22:07 Tomasz Chmielewski
  2021-09-02  0:15 ` Remi Gauvin
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Tomasz Chmielewski @ 2021-09-01 22:07 UTC (permalink / raw)
  To: Btrfs BTRFS

I'm trying to follow 
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices 
to replace a failed drive. But it seems to have been written by someone 
who has never attempted to replace a failed drive in a btrfs filesystem, 
and who has never used mdadm RAID (to see what a good RAID experience 
should look like).

What I have:

- RAID-10 over 4 devices (/dev/sd[a-d]2)
- 1 disk (/dev/sdb2) crashed and was no longer seen by the operating 
system
- it was replaced using hot-swapping - new drive registered itself as 
/dev/sde
- I've partitioned /dev/sde, so that /dev/sde2 matches the size of other 
btrfs devices
- because I couldn't remove the faulty device first (btrfs refused, as 
that would drop the filesystem below its current number of devices), 
I added the new device to the btrfs filesystem:

btrfs device add /dev/sde2 /data/lxd
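
(In hindsight, and assuming the new drive has not already been added 
with "device add", the one-step path for this situation is "btrfs 
replace", which rebuilds directly onto the new partition. A sketch - 
the devid used here is an assumption and should be confirmed against 
"btrfs filesystem show" first:

# When the source device has vanished, it is given by devid rather
# than by path; data is rebuilt from the remaining RAID-10 mirrors.
btrfs replace start 5 /dev/sde2 /data/lxd

# Watch the rebuild progress
btrfs replace status /data/lxd

)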


Now, I wonder, how can I remove the disk which crashed?

# btrfs device delete /dev/sdb2 /data/lxd
ERROR: not a block device: /dev/sdb2


# btrfs device remove /dev/sdb2 /data/lxd
ERROR: not a block device: /dev/sdb2


# btrfs filesystem show /data/lxd
Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
         Total devices 5 FS bytes used 2.84TiB
         devid    1 size 1.73TiB used 1.60TiB path /dev/sda2
         devid    3 size 1.73TiB used 1.60TiB path /dev/sdd2
         devid    4 size 1.73TiB used 1.60TiB path /dev/sdc2
         devid    6 size 1.73TiB used 0.00B path /dev/sde2
         *** Some devices missing


And, a gem:

# btrfs device delete missing /data/lxd
ERROR: error removing device 'missing': no missing devices found to 
remove


So according to "btrfs filesystem show /data/lxd" a device is missing, 
but according to "btrfs device delete missing /data/lxd" no device is 
missing. So confusing!
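

(One thing worth noting: "btrfs device remove" also accepts a numeric 
devid, which sidesteps the now-nonexistent /dev/sdb2 path entirely. A 
sketch - devid 5 is an assumption here, based on it being the only 
number absent from the "filesystem show" output above (1, 3, 4, 6), 
and should be double-checked first:

# Confirm which devid is missing before removing anything
btrfs filesystem show /data/lxd

# Remove the dead device by devid instead of by path
btrfs device remove 5 /data/lxd

)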


At this point, btrfs keeps producing massive amounts of logs - 
gigabytes of messages like:

[39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298373, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298374, rd 393827, flush 1565805, corrupt 0, gen 0
[... the same message repeats, with the write error counter 
incrementing each time ...]
[39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298380, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.747082] BTRFS warning (device sda2): lost page write due to IO 
error on /dev/sdb2
[39894585.747214] BTRFS error (device sda2): error writing primary super 
block to device 5



This is a REALLY, REALLY bad RAID experience.

How to recover at this point?


Tomasz Chmielewski


Thread overview: 8+ messages
2021-09-01 22:07 how to replace a failed drive? Tomasz Chmielewski
2021-09-02  0:15 ` Remi Gauvin
2021-09-02  6:03 ` Nikolay Borisov
2021-09-02  6:16 ` Nikolay Borisov
2021-09-02  7:45 ` Anand Jain
2021-09-02  8:00   ` Andrei Borzenkov
2021-09-02  8:04     ` Nikolay Borisov
2021-09-02  9:23     ` Tomasz Chmielewski
