* how to replace a failed drive?
From: Tomasz Chmielewski @ 2021-09-01 22:07 UTC (permalink / raw)
To: Btrfs BTRFS
I'm trying to follow
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
to replace a failed drive. But it seems to have been written by someone
who never attempted to replace a failed drive in a btrfs filesystem, and
who never used mdadm RAID (to see what a good RAID experience should
look like).
What I have:
- RAID-10 over 4 devices (/dev/sd[a-d]2)
- 1 disk (/dev/sdb2) crashed and was no longer seen by the operating
system
- it was replaced using hot-swapping - new drive registered itself as
/dev/sde
- I've partitioned /dev/sde so that /dev/sde2 matches the size of the
other btrfs devices
- because I couldn't remove the faulty device first (btrfs refused to go
below my current number of devices), I've added the new device to the
btrfs filesystem:
btrfs device add /dev/sde2 /data/lxd
Now, I wonder, how can I remove the disk which crashed?
# btrfs device delete /dev/sdb2 /data/lxd
ERROR: not a block device: /dev/sdb2
# btrfs device remove /dev/sdb2 /data/lxd
ERROR: not a block device: /dev/sdb2
# btrfs filesystem show /data/lxd
Label: 'lxd5' uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
Total devices 5 FS bytes used 2.84TiB
devid 1 size 1.73TiB used 1.60TiB path /dev/sda2
devid 3 size 1.73TiB used 1.60TiB path /dev/sdd2
devid 4 size 1.73TiB used 1.60TiB path /dev/sdc2
devid 6 size 1.73TiB used 0.00B path /dev/sde2
*** Some devices missing
And, a gem:
# btrfs device delete missing /data/lxd
ERROR: error removing device 'missing': no missing devices found to
remove
So according to "btrfs filesystem show /data/lxd" a device is missing,
but according to "btrfs device delete missing /data/lxd" no device is
missing. So confusing!
At this point, btrfs keeps producing massive amounts of logs -
gigabytes, like:
[39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
60298373, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
60298374, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
60298375, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
60298376, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
60298377, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
60298378, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
60298379, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
60298380, rd 393827, flush 1565805, corrupt 0, gen 0
[39894585.747082] BTRFS warning (device sda2): lost page write due to IO
error on /dev/sdb2
[39894585.747214] BTRFS error (device sda2): error writing primary super
block to device 5
This is a REALLY, REALLY bad RAID experience.
How to recover at this point?
Tomasz Chmielewski
* Re: how to replace a failed drive?
From: Remi Gauvin @ 2021-09-02 0:15 UTC (permalink / raw)
To: Tomasz Chmielewski, Btrfs BTRFS
On 2021-09-01 6:07 p.m., Tomasz Chmielewski wrote:
> I'm trying to follow
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
> to replace a failed drive. But it seems to be written by a person who
> never attempted to replace a failed drive in btrfs filesystem, and who
> never used mdadm RAID (to see how good RAID experience should look like).
>
> What I have:
>
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced using hot-swapping - new drive registered itself as
> /dev/sde
> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of other
> btrfs devices
> - because I couldn't remove the faulty device (it wouldn't go below my
> current number of devices) I've added the new device to btrfs filesystem:
>
> btrfs device add /dev/sde2 /data/lxd
>
>
> Now, I wonder, how can I remove the disk which crashed?
>
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
>
>
> # btrfs device remove /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
>
>
> # btrfs filesystem show /data/lxd
> Label: 'lxd5' uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
> Total devices 5 FS bytes used 2.84TiB
> devid 1 size 1.73TiB used 1.60TiB path /dev/sda2
> devid 3 size 1.73TiB used 1.60TiB path /dev/sdd2
> devid 4 size 1.73TiB used 1.60TiB path /dev/sdc2
> devid 6 size 1.73TiB used 0.00B path /dev/sde2
> *** Some devices missing
>
>
> And, a gem:
>
> # btrfs device delete missing /data/lxd
> ERROR: error removing device 'missing': no missing devices found to remove
>
>
> So according to "btrfs filesystem show /data/lxd" device is missing, but
> according to "btrfs device delete missing /data/lxd" - no device is
> missing. So confusing!
>
>
> At this point, btrfs keeps producing massive amounts of logs -
> gigabytes, like:
>
> [39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298373, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298374, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298375, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298376, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298377, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298378, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298379, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298380, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.747082] BTRFS warning (device sda2): lost page write due to IO
> error on /dev/sdb2
> [39894585.747214] BTRFS error (device sda2): error writing primary super
> block to device 5
>
>
>
> This is REALLY, REALLY very bad RAID experience.
>
> How to recover at this point?
>
>
> Tomasz Chmielewski
* Re: how to replace a failed drive?
From: Nikolay Borisov @ 2021-09-02 6:03 UTC (permalink / raw)
To: Tomasz Chmielewski, Btrfs BTRFS
On 2.09.21 г. 1:07, Tomasz Chmielewski wrote:
> I'm trying to follow
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
> to replace a failed drive. But it seems to be written by a person who
> never attempted to replace a failed drive in btrfs filesystem, and who
> never used mdadm RAID (to see how good RAID experience should look like).
>
> What I have:
>
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced using hot-swapping - new drive registered itself as
> /dev/sde
> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of other
> btrfs devices
> - because I couldn't remove the faulty device (it wouldn't go below my
> current number of devices) I've added the new device to btrfs filesystem:
>
> btrfs device add /dev/sde2 /data/lxd
>
>
> Now, I wonder, how can I remove the disk which crashed?
>
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
Right, this happens because progs currently expects that the path to
the device can be found. Your case clearly demonstrates this is not
always true after a crash has occurred. So let me try to cook up a
fix for you.
<snip>
* Re: how to replace a failed drive?
From: Nikolay Borisov @ 2021-09-02 6:16 UTC (permalink / raw)
To: Tomasz Chmielewski, Btrfs BTRFS
On 2.09.21 г. 1:07, Tomasz Chmielewski wrote:
> I'm trying to follow
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
> to replace a failed drive. But it seems to be written by a person who
> never attempted to replace a failed drive in btrfs filesystem, and who
> never used mdadm RAID (to see how good RAID experience should look like).
>
> What I have:
>
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced using hot-swapping - new drive registered itself as
> /dev/sde
> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of other
> btrfs devices
> - because I couldn't remove the faulty device (it wouldn't go below my
> current number of devices) I've added the new device to btrfs filesystem:
>
> btrfs device add /dev/sde2 /data/lxd
>
>
> Now, I wonder, how can I remove the disk which crashed?
>
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
Actually, can you run:

btrfs device remove missing /data/lxd
>
>
> # btrfs device remove /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
>
>
> # btrfs filesystem show /data/lxd
> Label: 'lxd5' uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
> Total devices 5 FS bytes used 2.84TiB
> devid 1 size 1.73TiB used 1.60TiB path /dev/sda2
> devid 3 size 1.73TiB used 1.60TiB path /dev/sdd2
> devid 4 size 1.73TiB used 1.60TiB path /dev/sdc2
> devid 6 size 1.73TiB used 0.00B path /dev/sde2
> *** Some devices missing
>
>
> And, a gem:
>
> # btrfs device delete missing /data/lxd
> ERROR: error removing device 'missing': no missing devices found to remove
>
>
> So according to "btrfs filesystem show /data/lxd" device is missing, but
> according to "btrfs device delete missing /data/lxd" - no device is
> missing. So confusing!
>
>
> At this point, btrfs keeps producing massive amounts of logs -
> gigabytes, like:
>
> [39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298373, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298374, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298375, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298376, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298377, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298378, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298379, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298380, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.747082] BTRFS warning (device sda2): lost page write due to IO
> error on /dev/sdb2
> [39894585.747214] BTRFS error (device sda2): error writing primary super
> block to device 5
>
>
>
> This is REALLY, REALLY very bad RAID experience.
>
> How to recover at this point?
>
>
> Tomasz Chmielewski
>
* Re: how to replace a failed drive?
From: Anand Jain @ 2021-09-02 7:45 UTC (permalink / raw)
To: Tomasz Chmielewski, Btrfs BTRFS
On 02/09/2021 06:07, Tomasz Chmielewski wrote:
> I'm trying to follow
> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
> to replace a failed drive. But it seems to be written by a person who
> never attempted to replace a failed drive in btrfs filesystem, and who
> never used mdadm RAID (to see how good RAID experience should look like).
>
> What I have:
>
> - RAID-10 over 4 devices (/dev/sd[a-d]2)
> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating system
> - it was replaced using hot-swapping - new drive registered itself as
> /dev/sde
> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of other
> btrfs devices
> - because I couldn't remove the faulty device (it wouldn't go below my
> current number of devices) I've added the new device to btrfs filesystem:
>
> btrfs device add /dev/sde2 /data/lxd
The wiki is correct.

$ btrfs replace start 7 /dev/sdf1 /mnt

That is 'btrfs replace start <devid-of-missing-dev> <new-dev> /mnt'.
Do you mean this didn't work? As also mentioned in the wiki, the
replace command is better than add and remove.

Moving forward, as Nikolay suggested, remove-missing will help.
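The replace-based workflow described above can be sketched as a small shell script. It is a deliberate dry run: it only prints the commands, since they act on real block devices, and the devid (2), device path and mount point are illustrative assumptions, not values confirmed in this thread.

```shell
#!/bin/sh
# Dry-run sketch of the replace-based recovery: prints the commands
# instead of executing them. MISSING_DEVID and NEW_DEV are illustrative
# assumptions -- substitute the real missing devid and new partition.
MISSING_DEVID=2
NEW_DEV=/dev/sde2
MNT=/data/lxd

run() { printf '+ %s\n' "$*"; }   # change the body to "$@" to actually execute

# -r reads only from the remaining good mirrors while rebuilding:
run btrfs replace start -r "$MISSING_DEVID" "$NEW_DEV" "$MNT"
# Poll until the rebuild reports finished:
run btrfs replace status "$MNT"
```

With the assumed values this prints the two commands prefixed by `+` so they can be reviewed before being run for real.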
-Anand
> Now, I wonder, how can I remove the disk which crashed?
>
> # btrfs device delete /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
>
>
> # btrfs device remove /dev/sdb2 /data/lxd
> ERROR: not a block device: /dev/sdb2
>
>
> # btrfs filesystem show /data/lxd
> Label: 'lxd5' uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
> Total devices 5 FS bytes used 2.84TiB
> devid 1 size 1.73TiB used 1.60TiB path /dev/sda2
> devid 3 size 1.73TiB used 1.60TiB path /dev/sdd2
> devid 4 size 1.73TiB used 1.60TiB path /dev/sdc2
> devid 6 size 1.73TiB used 0.00B path /dev/sde2
> *** Some devices missing
>
>
> And, a gem:
>
> # btrfs device delete missing /data/lxd
> ERROR: error removing device 'missing': no missing devices found to remove
>
>
> So according to "btrfs filesystem show /data/lxd" device is missing, but
> according to "btrfs device delete missing /data/lxd" - no device is
> missing. So confusing!
>
>
> At this point, btrfs keeps producing massive amounts of logs -
> gigabytes, like:
>
> [39894585.659909] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298373, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660096] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298374, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660288] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298375, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660478] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298376, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660667] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298377, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.660861] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298378, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661105] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298379, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.661298] BTRFS error (device sda2): bdev /dev/sdb2 errs: wr
> 60298380, rd 393827, flush 1565805, corrupt 0, gen 0
> [39894585.747082] BTRFS warning (device sda2): lost page write due to IO
> error on /dev/sdb2
> [39894585.747214] BTRFS error (device sda2): error writing primary super
> block to device 5
>
>
>
> This is REALLY, REALLY very bad RAID experience.
>
> How to recover at this point?
>
>
> Tomasz Chmielewski
* Re: how to replace a failed drive?
From: Andrei Borzenkov @ 2021-09-02 8:00 UTC (permalink / raw)
To: Anand Jain, Tomasz Chmielewski, Btrfs BTRFS
On 02.09.2021 10:45, Anand Jain wrote:
> On 02/09/2021 06:07, Tomasz Chmielewski wrote:
>> I'm trying to follow
>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
>> to replace a failed drive. But it seems to be written by a person who
>> never attempted to replace a failed drive in btrfs filesystem, and who
>> never used mdadm RAID (to see how good RAID experience should look like).
>>
>> What I have:
>>
>> - RAID-10 over 4 devices (/dev/sd[a-d]2)
>> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating
>> system
>> - it was replaced using hot-swapping - new drive registered itself as
>> /dev/sde
>> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of
>> other btrfs devices
>> - because I couldn't remove the faulty device (it wouldn't go below my
>> current number of devices) I've added the new device to btrfs filesystem:
>>
>
>
>> btrfs device add /dev/sde2 /data/lxd
>
> Wiki is correct.
>
> $ btrfs replace start 7 /dev/sdf1 /mnt
>
Where exactly is the user supposed to find the correct devid of the
missing device? Because
...
>>
>> # btrfs filesystem show /data/lxd
>> Label: 'lxd5' uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>> Total devices 5 FS bytes used 2.84TiB
>> devid 1 size 1.73TiB used 1.60TiB path /dev/sda2
>> devid 3 size 1.73TiB used 1.60TiB path /dev/sdd2
>> devid 4 size 1.73TiB used 1.60TiB path /dev/sdc2
>> devid 6 size 1.73TiB used 0.00B path /dev/sde2
>> *** Some devices missing
>>
It only shows the existing devices. "Some devices missing" is not
exactly helpful. More useful would be "devid 7 missing".
* Re: how to replace a failed drive?
From: Nikolay Borisov @ 2021-09-02 8:04 UTC (permalink / raw)
To: Andrei Borzenkov, Anand Jain, Tomasz Chmielewski, Btrfs BTRFS
On 2.09.21 г. 11:00, Andrei Borzenkov wrote:
> On 02.09.2021 10:45, Anand Jain wrote:
>> On 02/09/2021 06:07, Tomasz Chmielewski wrote:
>>> I'm trying to follow
>>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
>>> to replace a failed drive. But it seems to be written by a person who
>>> never attempted to replace a failed drive in btrfs filesystem, and who
>>> never used mdadm RAID (to see how good RAID experience should look like).
>>>
>>> What I have:
>>>
>>> - RAID-10 over 4 devices (/dev/sd[a-d]2)
>>> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating
>>> system
>>> - it was replaced using hot-swapping - new drive registered itself as
>>> /dev/sde
>>> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of
>>> other btrfs devices
>>> - because I couldn't remove the faulty device (it wouldn't go below my
>>> current number of devices) I've added the new device to btrfs filesystem:
>>>
>>
>>
>>> btrfs device add /dev/sde2 /data/lxd
>>
>> Wiki is correct.
>>
>> $ btrfs replace start 7 /dev/sdf1 /mnt
>>
>
> Where exactly user is supposed to find out the correct number of missing
> device? Because
> ...
>
>>>
>>> # btrfs filesystem show /data/lxd
>>> Label: 'lxd5' uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>>> Total devices 5 FS bytes used 2.84TiB
>>> devid 1 size 1.73TiB used 1.60TiB path /dev/sda2
>>> devid 3 size 1.73TiB used 1.60TiB path /dev/sdd2
>>> devid 4 size 1.73TiB used 1.60TiB path /dev/sdc2
>>> devid 6 size 1.73TiB used 0.00B path /dev/sde2
>>> *** Some devices missing
>>>
>
> It only shows existing devices. "Some devices missing" is not exactly
> helping. More useful would be "devid 7 missing".
The missing device is generally reported in dmesg:
[168454.469038] BTRFS warning (device loop1): devid 3 uuid
5e73af15-91d4-416e-bafb-068801d8e561 is missing
But you are right, this is definitely not very user-friendly. I'll
look into whether it's possible to have the missing device printed
by progs.
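Until then, the devid can be scraped out of that kernel warning with a small pipeline. The sample line below is the warning quoted above, standing in for live output of `dmesg | grep 'is missing'`:

```shell
# Extract the missing devid from a BTRFS "is missing" kernel warning.
# The sample line stands in for real output of: dmesg | grep 'is missing'
sample='[168454.469038] BTRFS warning (device loop1): devid 3 uuid 5e73af15-91d4-416e-bafb-068801d8e561 is missing'
devid=$(printf '%s\n' "$sample" | sed -n 's/.*devid \([0-9]*\).*is missing.*/\1/p')
echo "missing devid: $devid"   # prints: missing devid: 3
```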
>
* Re: how to replace a failed drive?
From: Tomasz Chmielewski @ 2021-09-02 9:23 UTC (permalink / raw)
To: Andrei Borzenkov; +Cc: Anand Jain, Btrfs BTRFS
On 2021-09-02 10:00, Andrei Borzenkov wrote:
> On 02.09.2021 10:45, Anand Jain wrote:
>> On 02/09/2021 06:07, Tomasz Chmielewski wrote:
>>> I'm trying to follow
>>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
>>> to replace a failed drive. But it seems to be written by a person who
>>> never attempted to replace a failed drive in btrfs filesystem, and
>>> who
>>> never used mdadm RAID (to see how good RAID experience should look
>>> like).
>>>
>>> What I have:
>>>
>>> - RAID-10 over 4 devices (/dev/sd[a-d]2)
>>> - 1 disk (/dev/sdb2) crashed and was no longer seen by the operating
>>> system
>>> - it was replaced using hot-swapping - new drive registered itself as
>>> /dev/sde
>>> - I've partitioned /dev/sde, so that /dev/sde2 matches the size of
>>> other btrfs devices
>>> - because I couldn't remove the faulty device (it wouldn't go below
>>> my
>>> current number of devices) I've added the new device to btrfs
>>> filesystem:
>>>
>>
>>
>>> btrfs device add /dev/sde2 /data/lxd
>>
>> Wiki is correct.
>>
>> $ btrfs replace start 7 /dev/sdf1 /mnt
>>
>
> Where exactly user is supposed to find out the correct number of
> missing
> device? Because
> ...
>
>>>
>>> # btrfs filesystem show /data/lxd
>>> Label: 'lxd5' uuid: 2b77b498-a644-430b-9dd9-2ad3d381448a
>>> Total devices 5 FS bytes used 2.84TiB
>>> devid 1 size 1.73TiB used 1.60TiB path /dev/sda2
>>> devid 3 size 1.73TiB used 1.60TiB path /dev/sdd2
>>> devid 4 size 1.73TiB used 1.60TiB path /dev/sdc2
>>> devid 6 size 1.73TiB used 0.00B path /dev/sde2
>>> *** Some devices missing
>>>
>
> It only shows existing devices. "Some devices missing" is not exactly
> helping. More useful would be "devid 7 missing".
Exactly this!
The fine documentation says:

Now replace the absent device with the new drive /dev/sdf1 on the
filesystem currently mounted on /mnt (since the device is absent, you
can use any devid number that isn't present; 2,5,7,9 would all work
the same):

sudo btrfs replace start 7 /dev/sdf1 /mnt

I saw devids 1, 3, 4 and 6 in my "btrfs filesystem show ..." output.
Pairing that with "you can use any devid number that isn't present"
from the documentation, I used "2", as it was a devid number which
wasn't present.

This failed with an error:

btrfs replace start 2 /dev/sde2 /data/lxd

This did work:

btrfs replace start 5 /dev/sde2 /data/lxd

Highly confusing, and again, not what the documentation says.
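For the record, the devid that finally worked here was also visible in the kernel log quoted in the first message ("error writing primary super block to device 5"). A pipeline like this pulls it out; the sample line is copied from that log:

```shell
# The failed devid also shows up in the superblock write errors.
# Sample line copied from the kernel log earlier in this thread.
sample='[39894585.747214] BTRFS error (device sda2): error writing primary super block to device 5'
devid=$(printf '%s\n' "$sample" | sed -n 's/.*super block to device \([0-9]*\).*/\1/p')
echo "failed devid: $devid"   # prints: failed devid: 5
```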
Tomasz Chmielewski