* how to replace a failed drive?
@ 2021-09-01 22:07 Tomasz Chmielewski
  2021-09-02  0:15 ` Remi Gauvin
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Tomasz Chmielewski @ 2021-09-01 22:07 UTC (permalink / raw)
  To: Btrfs BTRFS

I'm trying to follow
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
to replace a failed drive. But it seems to have been written by someone who
never actually replaced a failed drive in a btrfs filesystem, and who never
used mdadm RAID (to see what a good RAID experience should look like).

What I have:

- RAID-10 over 4 devices (/dev/sd[a-d]2)
- 1 disk (/dev/sdb2) crashed and was no longer seen by the operating 
system
- it was replaced using hot-swapping - new drive registered itself as 
/dev/sde
- I've partitioned /dev/sde so that /dev/sde2 matches the size of the
other btrfs devices (size check sketched below)
- because I couldn't remove the faulty device (btrfs wouldn't let me go
below the current number of devices), I've added the new device to the
btrfs filesystem:

btrfs device add /dev/sde2 /data/lxd
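
(A minimal sketch of that size check, assuming lsblk is available and
using the device names above:

   lsblk -b -o NAME,SIZE /dev/sda2 /dev/sde2
)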


Now, I wonder, how can I remove the disk which crashed?

# btrfs device delete /dev/sdb2 /data/lxd
ERROR: not a block device: /dev/sdb2


# btrfs device remove /dev/sdb2 /data/lxd
ERROR: not a block device: /dev/sdb2


# btrfs filesystem show /data/lxd
Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
         Total devices 5 FS bytes used 2.84TiB
         devid    1 size 1.73TiB used 1.60TiB path /dev/sda2
         devid    3 size 1.73TiB used 1.60TiB path /dev/sdd2
         devid    4 size 1.73TiB used 1.60TiB path /dev/sdc2
         devid    6 size 1.73TiB used 0.00B path /dev/sde2
         *** Some devices missing


And, a gem:

# btrfs device delete missing /data/lxd
ERROR: error removing device 'missing': no missing devices found to 
remove


So according to "btrfs filesystem show /data/lxd" a device is missing, but
according to "btrfs device delete missing /data/lxd" no device is
missing. So confusing!


At this point, btrfs keeps producing massive amounts of logs - 
gigabytes, like:

[39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298373, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298374, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298375, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298376, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298377, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298378, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298379, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
60298380, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.747082] BTRFS warning (device sda2): lost page write due to IO 
error on /dev/sdb2
[39894585.747214] BTRFS error (device sda2): error writing primary super 
block to device 5



This is a REALLY, REALLY bad RAID experience.

How to recover at this point?


Tomasz Chmielewski


* Re: how to replace a failed drive?
  2021-09-01 22:07 how to replace a failed drive? Tomasz Chmielewski
@ 2021-09-02  0:15 ` Remi Gauvin
  2021-09-02  6:03 ` Nikolay Borisov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Remi Gauvin @ 2021-09-02  0:15 UTC (permalink / raw)
  To: Tomasz Chmielewski, Btrfs BTRFS

On 2021-09-01 6:07 p.m., Tomasz Chmielewski wrote:
> I'm trying to follow
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
> to replace a failed drive. But it seems to be written by a person who
> never attempted to replace a failed drive in btrfs filesystem, and who
> never used mdadm RAID (to see how good RAID experience should look like).
> 
> What I have:
> 
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced using hot-swapping - new drive registered itself as
> /dev/sde
> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of other
> btrfs devices
> - because I couldn't remove the faulty device (it wouldn't go below my
> current number of devices) I've added the new device to btrfs filesystem:
> 
> btrfs device add /dev/sde2 /data/lxd
> 
> 
> Now, I wonder, how can I remove the disk which crashed?
> 
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
> 
> 
> # btrfs device remove /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
> 
> 
> # btrfs filesystem show /data/lxd
> Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>         Total devices 5 FS bytes used 2.84TiB
>         devid    1 size 1.73TiB used 1.60TiB path /dev/sda2
>         devid    3 size 1.73TiB used 1.60TiB path /dev/sdd2
>         devid    4 size 1.73TiB used 1.60TiB path /dev/sdc2
>         devid    6 size 1.73TiB used 0.00B path /dev/sde2
>         *** Some devices missing
> 
> 
> And, a gem:
> 
> # btrfs device delete missing /data/lxd
> ERROR: error removing device 'missing': no missing devices found to remove
> 
> 
> So according to "btrfs filesystem show /data/lxd" device is missing, but
> according to "btrfs device delete missing /data/lxd" - no device is
> missing. So confusing!
> 
> 
> At this point, btrfs keeps producing massive amounts of logs -
> gigabytes, like:
> 
> [39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298373, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298374, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298375, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298376, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298377, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298378, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298379, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298380, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.747082] BTRFS warning (device sda2): lost page write due to IO
> error on /dev/sdb2
> [39894585.747214] BTRFS error (device sda2): error writing primary super
> block to device 5
> 
> 
> 
> This is REALLY, REALLY very bad RAID experience.
> 
> How to recover at this point?
> 
> 
> Tomasz Chmielewski



* Re: how to replace a failed drive?
  2021-09-01 22:07 how to replace a failed drive? Tomasz Chmielewski
  2021-09-02  0:15 ` Remi Gauvin
@ 2021-09-02  6:03 ` Nikolay Borisov
  2021-09-02  6:16 ` Nikolay Borisov
  2021-09-02  7:45 ` Anand Jain
  3 siblings, 0 replies; 8+ messages in thread
From: Nikolay Borisov @ 2021-09-02  6:03 UTC (permalink / raw)
  To: Tomasz Chmielewski, Btrfs BTRFS



On 2.09.21 г. 1:07, Tomasz Chmielewski wrote:
> I'm trying to follow
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
> to replace a failed drive. But it seems to be written by a person who
> never attempted to replace a failed drive in btrfs filesystem, and who
> never used mdadm RAID (to see how good RAID experience should look like).
> 
> What I have:
> 
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced using hot-swapping - new drive registered itself as
> /dev/sde
> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of other
> btrfs devices
> - because I couldn't remove the faulty device (it wouldn't go below my
> current number of devices) I've added the new device to btrfs filesystem:
> 
> btrfs device add /dev/sde2 /data/lxd
> 
> 
> Now, I wonder, how can I remove the disk which crashed?
> 
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2

Right, this happens because progs currently expects that the path to
the device can be resolved. Your case clearly demonstrates this is not
always possible once a device has crashed. So let me try and cook up a
fix for you.


<snip>


* Re: how to replace a failed drive?
  2021-09-01 22:07 how to replace a failed drive? Tomasz Chmielewski
  2021-09-02  0:15 ` Remi Gauvin
  2021-09-02  6:03 ` Nikolay Borisov
@ 2021-09-02  6:16 ` Nikolay Borisov
  2021-09-02  7:45 ` Anand Jain
  3 siblings, 0 replies; 8+ messages in thread
From: Nikolay Borisov @ 2021-09-02  6:16 UTC (permalink / raw)
  To: Tomasz Chmielewski, Btrfs BTRFS



On 2.09.21 г. 1:07, Tomasz Chmielewski wrote:
> I'm trying to follow
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
> to replace a failed drive. But it seems to be written by a person who
> never attempted to replace a failed drive in btrfs filesystem, and who
> never used mdadm RAID (to see how good RAID experience should look like).
> 
> What I have:
> 
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced using hot-swapping - new drive registered itself as
> /dev/sde
> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of other
> btrfs devices
> - because I couldn't remove the faulty device (it wouldn't go below my
> current number of devices) I've added the new device to btrfs filesystem:
> 
> btrfs device add /dev/sde2 /data/lxd
> 
> 
> Now, I wonder, how can I remove the disk which crashed?
> 
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2


Actually can you run

btrfs device remove missing /data/lxd ?

> 
> 
> # btrfs device remove /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
> 
> 
> # btrfs filesystem show /data/lxd
> Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>         Total devices 5 FS bytes used 2.84TiB
>         devid    1 size 1.73TiB used 1.60TiB path /dev/sda2
>         devid    3 size 1.73TiB used 1.60TiB path /dev/sdd2
>         devid    4 size 1.73TiB used 1.60TiB path /dev/sdc2
>         devid    6 size 1.73TiB used 0.00B path /dev/sde2
>         *** Some devices missing
> 
> 
> And, a gem:
> 
> # btrfs device delete missing /data/lxd
> ERROR: error removing device 'missing': no missing devices found to remove
> 
> 
> So according to "btrfs filesystem show /data/lxd" device is missing, but
> according to "btrfs device delete missing /data/lxd" - no device is
> missing. So confusing!
> 
> 
> At this point, btrfs keeps producing massive amounts of logs -
> gigabytes, like:
> 
> [39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298373, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298374, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298375, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298376, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298377, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298378, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298379, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298380, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.747082] BTRFS warning (device sda2): lost page write due to IO
> error on /dev/sdb2
> [39894585.747214] BTRFS error (device sda2): error writing primary super
> block to device 5
> 
> 
> 
> This is REALLY, REALLY very bad RAID experience.
> 
> How to recover at this point?
> 
> 
> Tomasz Chmielewski
> 


* Re: how to replace a failed drive?
  2021-09-01 22:07 how to replace a failed drive? Tomasz Chmielewski
                   ` (2 preceding siblings ...)
  2021-09-02  6:16 ` Nikolay Borisov
@ 2021-09-02  7:45 ` Anand Jain
  2021-09-02  8:00   ` Andrei Borzenkov
  3 siblings, 1 reply; 8+ messages in thread
From: Anand Jain @ 2021-09-02  7:45 UTC (permalink / raw)
  To: Tomasz Chmielewski, Btrfs BTRFS

On 02/09/2021 06:07, Tomasz Chmielewski wrote:
> I'm trying to follow 
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices 
> to replace a failed drive. But it seems to be written by a person who 
> never attempted to replace a failed drive in btrfs filesystem, and who 
> never used mdadm RAID (to see how good RAID experience should look like).
> 
> What I have:
> 
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced using hot-swapping - new drive registered itself as 
> /dev/sde
> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of other 
> btrfs devices
> - because I couldn't remove the faulty device (it wouldn't go below my 
> current number of devices) I've added the new device to btrfs filesystem:
> 


> btrfs device add /dev/sde2 /data/lxd

  The wiki is correct.

  $ btrfs replace start 7 /dev/sdf1 /mnt

  That is, 'btrfs replace start <devid-of-missing-dev> <new-dev> /mnt'.

  Do you mean this didn't work? As also mentioned in the wiki, the
  replace command is better than add and remove.

  Moving forward, as Nikolay suggested, remove-missing will help.
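
  (A minimal sketch of that flow for this setup, assuming the missing
  devid is 5, which the "error writing primary super block to device 5"
  line in the log above suggests, and assuming a spare device that is
  not yet part of the filesystem, here hypothetically /dev/sdf2:

    # confirm which devid the kernel reports as missing
    dmesg | grep 'is missing'

    # rebuild the missing member onto the spare in one step
    btrfs replace start 5 /dev/sdf2 /data/lxd
  )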

-Anand

> Now, I wonder, how can I remove the disk which crashed?
> 
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
> 
> 
> # btrfs device remove /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
> 
> 
> # btrfs filesystem show /data/lxd
> Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>          Total devices 5 FS bytes used 2.84TiB
>          devid    1 size 1.73TiB used 1.60TiB path /dev/sda2
>          devid    3 size 1.73TiB used 1.60TiB path /dev/sdd2
>          devid    4 size 1.73TiB used 1.60TiB path /dev/sdc2
>          devid    6 size 1.73TiB used 0.00B path /dev/sde2
>          *** Some devices missing
> 
> 
> And, a gem:
> 
> # btrfs device delete missing /data/lxd
> ERROR: error removing device 'missing': no missing devices found to remove
> 
> 
> So according to "btrfs filesystem show /data/lxd" device is missing, but 
> according to "btrfs device delete missing /data/lxd" - no device is 
> missing. So confusing!
> 
> 
> At this point, btrfs keeps producing massive amounts of logs - 
> gigabytes, like:
> 
> [39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
> 60298373, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
> 60298374, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
> 60298375, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
> 60298376, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
> 60298377, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
> 60298378, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
> 60298379, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr 
> 60298380, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.747082] BTRFS warning (device sda2): lost page write due to IO 
> error on /dev/sdb2
> [39894585.747214] BTRFS error (device sda2): error writing primary super 
> block to device 5
> 
> 
> 
> This is REALLY, REALLY very bad RAID experience.
> 
> How to recover at this point?
> 
> 
> Tomasz Chmielewski



* Re: how to replace a failed drive?
  2021-09-02  7:45 ` Anand Jain
@ 2021-09-02  8:00   ` Andrei Borzenkov
  2021-09-02  8:04     ` Nikolay Borisov
  2021-09-02  9:23     ` Tomasz Chmielewski
  0 siblings, 2 replies; 8+ messages in thread
From: Andrei Borzenkov @ 2021-09-02  8:00 UTC (permalink / raw)
  To: Anand Jain, Tomasz Chmielewski, Btrfs BTRFS

On 02.09.2021 10:45, Anand Jain wrote:
> On 02/09/2021 06:07, Tomasz Chmielewski wrote:
>> I'm trying to follow
>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
>> to replace a failed drive. But it seems to be written by a person who
>> never attempted to replace a failed drive in btrfs filesystem, and who
>> never used mdadm RAID (to see how good RAID experience should look like).
>>
>> What I have:
>>
>> - RAID-10 over 4 devices (/dev/sd[a-d]2)
>> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating
>> system
>> - it was replaced using hot-swapping - new drive registered itself as
>> /dev/sde
>> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of
>> other btrfs devices
>> - because I couldn't remove the faulty device (it wouldn't go below my
>> current number of devices) I've added the new device to btrfs filesystem:
>>
> 
> 
>> btrfs device add /dev/sde2 /data/lxd
> 
>  Wiki is correct.
> 
>  $ btrfs replace start 7 /dev/sdf1 /mnt
> 

Where exactly is the user supposed to find the correct devid of the
missing device? Because
...

>>
>> # btrfs filesystem show /data/lxd
>> Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>>          Total devices 5 FS bytes used 2.84TiB
>>          devid    1 size 1.73TiB used 1.60TiB path /dev/sda2
>>          devid    3 size 1.73TiB used 1.60TiB path /dev/sdd2
>>          devid    4 size 1.73TiB used 1.60TiB path /dev/sdc2
>>          devid    6 size 1.73TiB used 0.00B path /dev/sde2
>>          *** Some devices missing
>>

It only shows the existing devices. "Some devices missing" is not exactly
helpful. More useful would be "devid 7 missing".


* Re: how to replace a failed drive?
  2021-09-02  8:00   ` Andrei Borzenkov
@ 2021-09-02  8:04     ` Nikolay Borisov
  2021-09-02  9:23     ` Tomasz Chmielewski
  1 sibling, 0 replies; 8+ messages in thread
From: Nikolay Borisov @ 2021-09-02  8:04 UTC (permalink / raw)
  To: Andrei Borzenkov, Anand Jain, Tomasz Chmielewski, Btrfs BTRFS



On 2.09.21 г. 11:00, Andrei Borzenkov wrote:
> On 02.09.2021 10:45, Anand Jain wrote:
>> On 02/09/2021 06:07, Tomasz Chmielewski wrote:
>>> I'm trying to follow
>>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
>>> to replace a failed drive. But it seems to be written by a person who
>>> never attempted to replace a failed drive in btrfs filesystem, and who
>>> never used mdadm RAID (to see how good RAID experience should look like).
>>>
>>> What I have:
>>>
>>> - RAID-10 over 4 devices (/dev/sd[a-d]2)
>>> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating
>>> system
>>> - it was replaced using hot-swapping - new drive registered itself as
>>> /dev/sde
>>> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of
>>> other btrfs devices
>>> - because I couldn't remove the faulty device (it wouldn't go below my
>>> current number of devices) I've added the new device to btrfs filesystem:
>>>
>>
>>
>>> btrfs device add /dev/sde2 /data/lxd
>>
>>  Wiki is correct.
>>
>>  $ btrfs replace start 7 /dev/sdf1 /mnt
>>
> 
> Where exactly user is supposed to find out the correct number of missing
> device? Because
> ...
> 
>>>
>>> # btrfs filesystem show /data/lxd
>>> Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>>>          Total devices 5 FS bytes used 2.84TiB
>>>          devid    1 size 1.73TiB used 1.60TiB path /dev/sda2
>>>          devid    3 size 1.73TiB used 1.60TiB path /dev/sdd2
>>>          devid    4 size 1.73TiB used 1.60TiB path /dev/sdc2
>>>          devid    6 size 1.73TiB used 0.00B path /dev/sde2
>>>          *** Some devices missing
>>>
> 
> It only shows existing devices. "Some devices missing" is not exactly
> helping. More useful would be "devid 7 missing".

The missing device is generally reported in dmesg:

[168454.469038] BTRFS warning (device loop1): devid 3 uuid
5e73af15-91d4-416e-bafb-068801d8e561 is missing


But you are right, this is definitely not very user-friendly. I'll look
into whether it's possible to have the missing device printed by progs.
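
(A quick way to pull that warning out, assuming it is still in the kernel
ring buffer or in the journal:

  dmesg | grep 'devid .* is missing'
  journalctl -k | grep 'devid .* is missing'
)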

> 


* Re: how to replace a failed drive?
  2021-09-02  8:00   ` Andrei Borzenkov
  2021-09-02  8:04     ` Nikolay Borisov
@ 2021-09-02  9:23     ` Tomasz Chmielewski
  1 sibling, 0 replies; 8+ messages in thread
From: Tomasz Chmielewski @ 2021-09-02  9:23 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Anand Jain, Btrfs BTRFS

On 2021-09-02 10:00, Andrei Borzenkov wrote:
> On 02.09.2021 10:45, Anand Jain wrote:
>> On 02/09/2021 06:07, Tomasz Chmielewski wrote:
>>> I'm trying to follow
>>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
>>> to replace a failed drive. But it seems to be written by a person who
>>> never attempted to replace a failed drive in btrfs filesystem, and 
>>> who
>>> never used mdadm RAID (to see how good RAID experience should look 
>>> like).
>>> 
>>> What I have:
>>> 
>>> - RAID-10 over 4 devices (/dev/sd[a-d]2)
>>> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating
>>> system
>>> - it was replaced using hot-swapping - new drive registered itself as
>>> /dev/sde
>>> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of
>>> other btrfs devices
>>> - because I couldn't remove the faulty device (it wouldn't go below 
>>> my
>>> current number of devices) I've added the new device to btrfs 
>>> filesystem:
>>> 
>> 
>> 
>>> btrfs device add /dev/sde2 /data/lxd
>> 
>>  Wiki is correct.
>> 
>>  $ btrfs replace start 7 /dev/sdf1 /mnt
>> 
> 
> Where exactly user is supposed to find out the correct number of 
> missing
> device? Because
> ...
> 
>>> 
>>> # btrfs filesystem show /data/lxd
>>> Label: 'lxd5'  uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>>>          Total devices 5 FS bytes used 2.84TiB
>>>          devid    1 size 1.73TiB used 1.60TiB path /dev/sda2
>>>          devid    3 size 1.73TiB used 1.60TiB path /dev/sdd2
>>>          devid    4 size 1.73TiB used 1.60TiB path /dev/sdc2
>>>          devid    6 size 1.73TiB used 0.00B path /dev/sde2
>>>          *** Some devices missing
>>> 
> 
> It only shows existing devices. "Some devices missing" is not exactly
> helping. More useful would be "devid 7 missing".

Exactly this!

Fine documentation says:

    Now replace the absent device with the new drive /dev/sdf1 on the
    filesystem currently mounted on /mnt (since the device is absent,
    you can use any devid number that isn't present; 2,5,7,9 would all
    work the same):

      sudo btrfs replace start 7 /dev/sdf1 /mnt



I saw devids 1, 3, 4 and 6 in my "btrfs filesystem show ..." output.
Pairing that with "you can use any devid number that isn't present" from
the documentation, I used "2", as it was a devid number which wasn't
present.

But this failed with an error:

btrfs replace start 2 /dev/sde2 /data/lxd


This did work:

btrfs replace start 5 /dev/sde2 /data/lxd
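
(For completeness, a sketch of how the rebuild can be watched and then
verified once it completes, using stock btrfs-progs commands:

  btrfs replace status /data/lxd
  btrfs filesystem show /data/lxd
)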



Highly confusing, and again, not what the documentation says.


Tomasz Chmielewski
