* Troubles removing missing device from RAID 6
From: Edmund Urbani @ 2020-07-19 14:13 UTC
To: linux-btrfs
Hello everyone,
after having RMA'd a faulty HDD from my RAID6 and having received the
replacement, I added the new disk to the filesystem. At that point the
missing device was still listed and I went ahead to remove it like so:
btrfs device delete missing /mnt/shared/
After a few hours that command aborted with an I/O error and the logs
revealed this problem:
[284564.279190] BTRFS info (device sda1): relocating block group
51490279391232 flags data|raid6
[284572.319649] btrfs_print_data_csum_error: 75 callbacks suppressed
[284572.319656] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386727936 csum 0x791e44cc expected csum 0xbd1725d0 mirror 2
[284572.320165] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386732032 csum 0xec5f6097 expected csum 0x9114b5fa mirror 2
[284572.320211] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386736128 csum 0x4d2fa4b9 expected csum 0xf8a923f9 mirror 2
[284572.320225] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386740224 csum 0xcad08362 expected csum 0xa9361ed3 mirror 2
[284572.320266] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386744320 csum 0x469ac192 expected csum 0xb1e94692 mirror 2
[284572.320279] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386748416 csum 0x69759c1f expected csum 0xb3b9aa86 mirror 2
[284572.320290] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386752512 csum 0xd3a7c5d5 expected csum 0xd351862f mirror 2
[284572.320465] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386756608 csum 0x1264af83 expected csum 0x3a2c0ed5 mirror 2
[284572.320480] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386760704 csum 0x260a13ef expected csum 0xb3b4aec0 mirror 2
[284572.320492] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386764800 csum 0x6b615cd9 expected csum 0x99eaf560 mirror 2
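The ten warnings above all point at the same inode (433 in root -9), and their offsets appear to advance in equal steps. A minimal sketch to check this from a log excerpt; the function names are made up for illustration, and the parser assumes the wrapped line format shown above:

```python
import re

# Pattern for the "csum failed" warnings as they appear in the log above,
# after each wrapped warning has been re-joined into one line.
CSUM_RE = re.compile(
    r"csum failed root (?P<root>-?\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<got>0x[0-9a-f]+) expected csum (?P<want>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)

def parse_csum_failures(log_text):
    """Return one dict per 'csum failed' warning found in log_text."""
    # Collapse all whitespace so warnings wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        {
            "root": int(m["root"]),
            "ino": int(m["ino"]),
            "off": int(m["off"]),
            "mirror": int(m["mirror"]),
        }
        for m in CSUM_RE.finditer(flat)
    ]

def offset_steps(failures):
    """Return the set of differences between consecutive offsets."""
    offs = [f["off"] for f in failures]
    return {b - a for a, b in zip(offs, offs[1:])}

# First three warnings from the log above.
SAMPLE = """
[284572.319656] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386727936 csum 0x791e44cc expected csum 0xbd1725d0 mirror 2
[284572.320165] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386732032 csum 0xec5f6097 expected csum 0x9114b5fa mirror 2
[284572.320211] BTRFS warning (device sda1): csum failed root -9 ino 433
off 386736128 csum 0x4d2fa4b9 expected csum 0xf8a923f9 mirror 2
"""

failures = parse_csum_failures(SAMPLE)
print(len(failures), offset_steps(failures))
```

A single step size of 4096 bytes would mean the failures cover one contiguous run of 4 KiB blocks rather than scattered locations.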
I ran a long SMART self-test on the drives in the array, which found no
problems. Currently I am running a scrub to try to fix the block group.
scrub status:
UUID: 9c3c3f8d-a601-4bd3-8871-d068dd500a15
Scrub started: Fri Jul 17 07:52:06 2020
Status: running
Duration: 14:47:07
Time left: 202:05:46
ETA: Tue Jul 28 00:07:36 2020
Total to scrub: 16.80TiB
Bytes scrubbed: 1.14TiB
Rate: 22.56MiB/s
Error summary: read=295132162
Corrected: 0
Uncorrectable: 295132162
Unverified: 0
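The "Time left" figure can be sanity-checked from the other numbers in the status output. A rough sketch, assuming the rate stays constant and that the sizes are binary units (TiB, MiB) as printed; the helper names are illustrative only:

```python
MIB_PER_TIB = 1024 * 1024

def remaining_hours(total_tib, done_tib, rate_mib_per_s):
    """Estimate the remaining scrub time in hours from size, progress and rate."""
    remaining_mib = (total_tib - done_tib) * MIB_PER_TIB
    return remaining_mib / rate_mib_per_s / 3600

def hms_to_hours(hms):
    """Convert an 'H:MM:SS' string such as '202:05:46' to hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60 + s / 3600

# Values copied from the scrub status above.
estimate = remaining_hours(total_tib=16.80, done_tib=1.14, rate_mib_per_s=22.56)
reported = hms_to_hours("202:05:46")
print(f"estimated {estimate:.1f} h, reported {reported:.1f} h")
```

Both figures come out at about 202 hours, so the reported time left follows from the current rate: at roughly 22.5 MiB/s, a full pass over 16.80 TiB takes more than a week.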
filesystem show (device list):
Label: none uuid: 9c3c3f8d-a601-4bd3-8871-d068dd500a15
Total devices 5 FS bytes used 16.80TiB
devid 3 size 9.09TiB used 8.76TiB path /dev/sda1
devid 4 size 9.09TiB used 8.76TiB path /dev/sdb1
devid 5 size 9.09TiB used 8.74TiB path /dev/sdd1
devid 6 size 9.09TiB used 498.53GiB path /dev/sdc1
*** Some devices missing
Is there anything else I can do to try and specifically fix that one
block group rather than scrubbing the entire filesystem? Also, is it
"normal" that scrub stats would show a huge number of "uncorrectable"
errors when a device is missing or should I be worried about that?
Kind regards,
Edmund
--
Liland is here for you during the crisis, too! #WirBleibenZuhause, and we
continue to deliver high quality and the best service.
Our support <mailto:support@liland.com> remains available as usual.
Your Team LILAND
Liland IT GmbH
Ferlach ● Wien ● München
Tel: +43 463 220111
Tel: +49 89 458 15 940
office@Liland.com
https://Liland.com
<https://twitter.com/lilandit>
<https://www.instagram.com/liland_com/>
<https://www.facebook.com/LilandIT/>
Copyright © 2020 Liland IT GmbH
This email may contain confidential and/or privileged information.
If you are not the intended recipient (or have received this email in
error) please notify the sender immediately and destroy this email. Any
unauthorised copying, disclosure or distribution of the material in this
email is strictly forbidden.
* Re: Troubles removing missing device from RAID 6
From: Anand Jain @ 2020-07-20 4:23 UTC
To: Edmund Urbani, linux-btrfs
As you have an additional slot for the new disk, the proper procedure
would have been
btrfs replace start -r <faulty-dev> <new-dev> /mnt
The -r option avoids reading from the faulty device unless no other good
copy of the data exists.
(In some cases there might not be any spare slots; I am looking into
fixing the replace command for those cases.)
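As a sketch of that procedure, the snippet below only builds the command lines and does not run anything. It assumes that a device which is already missing is named by its numeric devid (as listed by 'btrfs filesystem show') instead of a path; the devid and target device used here are placeholders, not values from this thread:

```python
import shlex

def replace_commands(src, target_dev, mountpoint):
    """Build (not run) the btrfs commands for an in-place device replacement.

    src is the failing device's path or, if the device is already missing,
    its numeric devid as shown by 'btrfs filesystem show'.
    """
    return [
        # -r: only read from the source device if no other good copy exists.
        ["btrfs", "replace", "start", "-r", str(src), target_dev, mountpoint],
        # Progress can be polled while the replace runs in the background.
        ["btrfs", "replace", "status", mountpoint],
    ]

# Hypothetical values: devid 2 and /dev/sdX1 are placeholders.
for cmd in replace_commands(src=2, target_dev="/dev/sdX1", mountpoint="/mnt/shared"):
    print(shlex.join(cmd))
```

Unlike 'btrfs device add' followed by 'btrfs device delete missing', a replace rebuilds the contents of the old device directly onto the new one instead of relocating every block group.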
Thanks, Anand
* Re: Troubles removing missing device from RAID 6
From: Zygo Blaxell @ 2020-07-21 0:57 UTC
To: Edmund Urbani; +Cc: linux-btrfs
On Sun, Jul 19, 2020 at 04:13:29PM +0200, Edmund Urbani wrote:
> Hello everyone,
>
> after having RMA'd a faulty HDD from my RAID6 and having received the
> replacement, I added the new disk to the filesystem. At that point the
> missing device was still listed and I went ahead to remove it like so:
>
> btrfs device delete missing /mnt/shared/
>
> After a few hours that command aborted with an I/O error and the logs
> revealed this problem:
>
> [284564.279190] BTRFS info (device sda1): relocating block group
> 51490279391232 flags data|raid6
> [284572.319649] btrfs_print_data_csum_error: 75 callbacks suppressed
> [284572.319656] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386727936 csum 0x791e44cc expected csum 0xbd1725d0 mirror 2
> [284572.320165] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386732032 csum 0xec5f6097 expected csum 0x9114b5fa mirror 2
> [284572.320211] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386736128 csum 0x4d2fa4b9 expected csum 0xf8a923f9 mirror 2
> [284572.320225] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386740224 csum 0xcad08362 expected csum 0xa9361ed3 mirror 2
> [284572.320266] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386744320 csum 0x469ac192 expected csum 0xb1e94692 mirror 2
> [284572.320279] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386748416 csum 0x69759c1f expected csum 0xb3b9aa86 mirror 2
> [284572.320290] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386752512 csum 0xd3a7c5d5 expected csum 0xd351862f mirror 2
> [284572.320465] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386756608 csum 0x1264af83 expected csum 0x3a2c0ed5 mirror 2
> [284572.320480] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386760704 csum 0x260a13ef expected csum 0xb3b4aec0 mirror 2
> [284572.320492] BTRFS warning (device sda1): csum failed root -9 ino 433 off
> 386764800 csum 0x6b615cd9 expected csum 0x99eaf560 mirror 2
>
> I ran a long SMART self-test on the drives in the array which found no
> problem.
You are hitting a few of the known bugs in btrfs raid5/6. See
https://lore.kernel.org/linux-btrfs/20200627032414.GX10769@hungrycats.org/
TL;DR don't expect anything to work right until 'btrfs replace' is done.
> Currently I am running a scrub to try to fix the block group.
Scrub can only correct errors that exist on the disk, so scrub has no
effect here. Wait until 'btrfs replace' is done, then scrub the other
disks in the array.
btrfs raid6 has broken read code for degraded mode. The errors above
all originate from trees inside the kernel (root -9 isn't a normal
on-disk root). Those errors don't exist on disk. The errors are
triggered repeatably by on-disk structures, so the errors will _appear_
to be persistent (i.e. if you try to balance the same block group twice
it will usually fail at the same spot); however, the on-disk structures
are valid, and should not produce an error if the kernel code were correct,
or if the missing disk is replaced.
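The "root -9" in the warnings is a 64-bit objectid printed as a signed number. A small sketch of the conversion, under the assumption (not stated in this thread) that it corresponds to the data relocation tree objectid, which the kernel headers define as -9ULL and which is used while block groups are being relocated:

```python
U64_MASK = (1 << 64) - 1

def as_u64(objectid):
    """Reinterpret a signed objectid as an unsigned 64-bit value."""
    return objectid & U64_MASK

def as_s64(value):
    """Reinterpret an unsigned 64-bit objectid as signed, as the log prints it."""
    return value - (1 << 64) if value >= (1 << 63) else value

# Assumption: BTRFS_DATA_RELOC_TREE_OBJECTID is defined as -9ULL in the kernel.
DATA_RELOC_TREE_OBJECTID = as_u64(-9)

print(hex(DATA_RELOC_TREE_OBJECTID), as_s64(DATA_RELOC_TREE_OBJECTID))
```

If that assumption holds, inode 433 in the log is not a user file but an internal inode created for the relocation of the block group named in the preceding "relocating block group" line.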
> scrub status:
>
> UUID: 9c3c3f8d-a601-4bd3-8871-d068dd500a15
>
> Scrub started: Fri Jul 17 07:52:06 2020
> Status: running
> Duration: 14:47:07
> Time left: 202:05:46
> ETA: Tue Jul 28 00:07:36 2020
> Total to scrub: 16.80TiB
> Bytes scrubbed: 1.14TiB
> Rate: 22.56MiB/s
> Error summary: read=295132162
> Corrected: 0
> Uncorrectable: 295132162
> Unverified: 0
>
> filesystem show (device list):
>
> Label: none uuid: 9c3c3f8d-a601-4bd3-8871-d068dd500a15
> Total devices 5 FS bytes used 16.80TiB
> devid 3 size 9.09TiB used 8.76TiB path /dev/sda1
> devid 4 size 9.09TiB used 8.76TiB path /dev/sdb1
> devid 5 size 9.09TiB used 8.74TiB path /dev/sdd1
> devid 6 size 9.09TiB used 498.53GiB path /dev/sdc1
> *** Some devices missing
>
> Is there anything else I can do to try and specifically fix that one block
> group rather than scrubbing the entire filesystem? Also, is it "normal" that
> scrub stats would show a huge number of "uncorrectable" errors when a device
> is missing or should I be worried about that?
There might be a few dozen KB of uncorrectable data after the 'btrfs
replace' is done, depending on how messy the original disk failure was.
You may want to zero the dev stats once the btrfs replace is done,
as the stats collected during degraded mode will be mostly garbage.
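A sketch of that follow-up, again only building the command lines without running them. Scrubbing one device at a time in the foreground is a choice made for this example, and the device list has to be adapted to the array after the replace:

```python
import shlex

def post_replace_commands(mountpoint, devices):
    """Build (not run) the follow-up commands once 'btrfs replace' has finished."""
    # Print the per-device error counters and reset them to zero (-z),
    # since the counts gathered in degraded mode are not meaningful.
    cmds = [["btrfs", "device", "stats", "-z", mountpoint]]
    # Scrub each device in turn; -B keeps the scrub in the foreground,
    # so the next one only starts after the previous one has finished.
    for dev in devices:
        cmds.append(["btrfs", "scrub", "start", "-B", dev])
    return cmds

# Device paths as listed in the report above; the order is arbitrary.
devices = ["/dev/sda1", "/dev/sdb1", "/dev/sdd1", "/dev/sdc1"]
for cmd in post_replace_commands("/mnt/shared", devices):
    print(shlex.join(cmd))
```

Resetting the counters first means that anything 'btrfs device stats' reports afterwards was found by the new scrub rather than left over from degraded mode.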
> Kind regards,
> Edmund
>
>
* Re: Troubles removing missing device from RAID 6
From: Edmund Urbani @ 2020-08-05 15:45 UTC
To: Zygo Blaxell; +Cc: linux-btrfs
Scrub failed while I was gone on vacation. Thankfully the filesystem is still up
and running "fine" in degraded mode. I ordered another drive to try and replace
the missing one properly this time around.
PS: Sorry about the other redundant thread I created. Somehow missed this reply
yesterday.