* scrub: unrepaired sectors detected
@ 2023-12-05  7:51 Stefan N
  2023-12-05 20:05 ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan N @ 2023-12-05  7:51 UTC (permalink / raw)
  To: linux-btrfs

Hi all,

I'm having trouble getting an array to perform a scrub or replace, and
would appreciate any assistance. I have two empty disks I can use to
move things around, but the intended outcome is to use them to replace
two of the smaller disks.

$ uname -a ; btrfs --version ; btrfs fi show
Linux $hostname 6.5.0-13-generic #13-Ubuntu SMP PREEMPT_DYNAMIC Fri
Nov  3 12:16:05 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
btrfs-progs v6.3.2
Label: none  uuid: 3cde0d85-f53e-4db6-ac2c-a0e6528c5ced
        Total devices 8 FS bytes used 71.32TiB
        devid    1 size 16.37TiB used 16.37TiB path /dev/sdg
        devid    2 size 10.91TiB used 10.91TiB path /dev/sdf
        devid    3 size 16.37TiB used 16.36TiB path /dev/sdd
        devid    4 size 16.37TiB used 12.54TiB path /dev/sda
        devid    5 size 10.91TiB used 10.91TiB path /dev/sde
        devid    6 size 10.91TiB used 10.91TiB path /dev/sdc
        devid    7 size 16.37TiB used 16.37TiB path /dev/sdh
        devid    8 size 10.91TiB used 10.91TiB path /dev/sdb

$ btrfs fi df /mnt/point/
Data, RAID6: total=71.97TiB, used=71.23TiB
System, RAID1C3: total=36.00MiB, used=6.62MiB
Metadata, RAID1C3: total=91.00GiB, used=85.09GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
$

Attempting to scrub
BTRFS error (device sdg): unrepaired sectors detected, full stripe
145926853230592 data stripe 2 errors 5-13
BTRFS info (device sdg): scrub: not finished on devid 2 with status: -5

Scrub device /dev/sdf (id 2) canceled
Scrub started:    Thu Nov 30 08:01:03 2023
Status:           aborted
Duration:         32:17:10
        data_extents_scrubbed: 89766644
        tree_extents_scrubbed: 0
        data_bytes_scrubbed: 5856020676608
        tree_bytes_scrubbed: 0
        read_errors: 0
        csum_errors: 0
        verify_errors: 0
        no_csum: 0
        csum_discards: 0
        super_errors: 0
        malloc_errors: 0
        uncorrectable_errors: 0
        unverified_errors: 0
        corrected_errors: 0
        last_physical: 7984173809664

Attempting a replace onto brand-new disks failed at ~50%; I ran it
twice with two different pairs of disks
Disk /dev/sdi: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
Disk /dev/sdl: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors

BTRFS error (device sdg): unrepaired sectors detected, full stripe
145926853230592 data stripe 2 errors 5-13
BTRFS error (device sdg): btrfs_scrub_dev(/dev/sdf, 2, /dev/sdl) failed -5

The data is fairly replaceable, so I have typically been deleting
files that fail checks and performing roughly 3-monthly scrubs and
weekly balances (musage/dusage=50).

Any help would be appreciated!

Cheers,

Stefan


* Re: scrub: unrepaired sectors detected
  2023-12-05  7:51 scrub: unrepaired sectors detected Stefan N
@ 2023-12-05 20:05 ` Qu Wenruo
  2023-12-09  1:50   ` Stefan N
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2023-12-05 20:05 UTC (permalink / raw)
  To: Stefan N, linux-btrfs



On 2023/12/5 18:21, Stefan N wrote:
> Hi all,
>
> I'm having trouble getting an array to perform a scrub or replace, and
> would appreciate any assistance. I have two empty disks I can use to
> move things around, but the intended outcome is to use them to replace
> two of the smaller disks.
>
> $ uname -a ; btrfs --version ; btrfs fi show
> Linux $hostname 6.5.0-13-generic #13-Ubuntu SMP PREEMPT_DYNAMIC Fri
> Nov  3 12:16:05 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
> btrfs-progs v6.3.2
> Label: none  uuid: 3cde0d85-f53e-4db6-ac2c-a0e6528c5ced
>          Total devices 8 FS bytes used 71.32TiB
>          devid    1 size 16.37TiB used 16.37TiB path /dev/sdg
>          devid    2 size 10.91TiB used 10.91TiB path /dev/sdf
>          devid    3 size 16.37TiB used 16.36TiB path /dev/sdd
>          devid    4 size 16.37TiB used 12.54TiB path /dev/sda
>          devid    5 size 10.91TiB used 10.91TiB path /dev/sde
>          devid    6 size 10.91TiB used 10.91TiB path /dev/sdc
>          devid    7 size 16.37TiB used 16.37TiB path /dev/sdh
>          devid    8 size 10.91TiB used 10.91TiB path /dev/sdb
>
> $ btrfs fi df /mnt/point/
> Data, RAID6: total=71.97TiB, used=71.23TiB
> System, RAID1C3: total=36.00MiB, used=6.62MiB
> Metadata, RAID1C3: total=91.00GiB, used=85.09GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> $
>
> Attempting to scrub
> BTRFS error (device sdg): unrepaired sectors detected, full stripe
> 145926853230592 data stripe 2 errors 5-13

This was introduced in recent kernels to detect RAID56 full stripes
which contain sectors that cannot be repaired.

It is fairly new behavior, added as an extra safety net, because
sometimes the scrub itself can further corrupt the P/Q stripes and
cause unrepairable sectors.

And I'm afraid that's already the case here.
Older RAID56 code (and even the newer code) still has the old write-hole
problem, so a previous power loss can reduce the redundancy and
eventually lead to data corruption.

Newer scrub code addresses this by detecting the problem and erroring
out, rather than further spreading the corruption.
> BTRFS info (device sdg): scrub: not finished on devid 2 with status: -5
>
> Scrub device /dev/sdf (id 2) canceled
> Scrub started:    Thu Nov 30 08:01:03 2023
> Status:           aborted
> Duration:         32:17:10
>          data_extents_scrubbed: 89766644
>          tree_extents_scrubbed: 0
>          data_bytes_scrubbed: 5856020676608
>          tree_bytes_scrubbed: 0
>          read_errors: 0
>          csum_errors: 0
>          verify_errors: 0
>          no_csum: 0
>          csum_discards: 0
>          super_errors: 0
>          malloc_errors: 0
>          uncorrectable_errors: 0
>          unverified_errors: 0
>          corrected_errors: 0
>          last_physical: 7984173809664
>
> Attempting a replace onto brand-new disks failed at ~50%; I ran it
> twice with two different pairs of disks
> Disk /dev/sdi: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
> Disk /dev/sdl: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors
>
> BTRFS error (device sdg): unrepaired sectors detected, full stripe
> 145926853230592 data stripe 2 errors 5-13
> BTRFS error (device sdg): btrfs_scrub_dev(/dev/sdf, 2, /dev/sdl) failed -5
>
> The data is fairly replaceable, so I have typically been deleting
> files that fail checks and performing roughly 3-monthly scrubs and
> weekly balances (musage/dusage=50).

This may be something that happened in the past but is only now caught
by the newer kernel.

Anyway, if you're fine with deleting some files (only 9 sectors are
affected), you can try to locate the inodes for the following bytenr
range:

  [145926853382144, 145926853414912]

The way to go is "btrfs inspect-internal logical-resolve -o <bytenr> <mnt>".

Delete all the involved files, increase the bytenr by 4K, and try
again, until there is no more output for any 4K block in the above range.

Normally it should only be one or two files.

Then retry the scrub, and repeat the loop until the scrub can finish
properly.
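
For example, a rough (untested) bash sketch of that loop, assuming a
4K sector size and the /mnt/point mount point from your "btrfs fi df"
output:

  # Walk the suspect logical range in 4K steps and print any paths
  # that still reference each block; delete whatever shows up, then re-run.
  for ((bytenr = 145926853382144; bytenr <= 145926853414912; bytenr += 4096)); do
      echo "== bytenr $bytenr =="
      btrfs inspect-internal logical-resolve -o "$bytenr" /mnt/point/
  done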

Thanks,
Qu



* Re: scrub: unrepaired sectors detected
  2023-12-05 20:05 ` Qu Wenruo
@ 2023-12-09  1:50   ` Stefan N
  2023-12-09  5:25     ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan N @ 2023-12-09  1:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

Hi Qu,

Thanks for explaining that, and giving a path to remediation.

Could you please explain how you derived the bytenr range from the log
message? My attempt to reverse-engineer the maths for the next reported
error was not successful:

BTRFS error (device sdg): unrepaired sectors detected, full stripe
145932367691776 data stripe 2 errors 14-15

Cheers,

Stefan


* Re: scrub: unrepaired sectors detected
  2023-12-09  1:50   ` Stefan N
@ 2023-12-09  5:25     ` Qu Wenruo
  0 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2023-12-09  5:25 UTC (permalink / raw)
  To: Stefan N, Qu Wenruo; +Cc: linux-btrfs





On 2023/12/9 12:20, Stefan N wrote:
> Hi Qu,
> 
> Thanks for explaining that, and giving a path to remediation.
> 
> Could you please explain how you derived the bytenr range from the log
> message? My attempt to reverse-engineer the maths for the next reported
> error was not successful:
> 
> BTRFS error (device sdg): unrepaired sectors detected, full stripe
> 145932367691776 data stripe 2 errors 14-15

145932367691776 is the logical bytenr where the full stripe starts. A
full stripe looks something like this:

	X             X+64K         X+128K                   X+64*N K
	|   Data 1    |   Data 2    |   ...   |    Data N    |

Data stripe 2 means it's the 3rd data stripe (we start counting from
data stripe 0).

So the 3rd data stripe covers the logical range
[Full stripe + 2 * 64K, Full stripe + 3 * 64K).

Furthermore, "errors" is for the vertical stripes, since btrfs is using 
fixed 64K stripe, and normally 4K sector size, we got 16 sectors for 
each data stripe.

And since the value is for vertical stripes, it applies to all data
stripes. But since the report is only for data stripe 2, we only need
to add the sector offsets, which gives:

  [Full stripe + 2 * 64K + 14 * 4K, Full stripe + 2 * 64K + 16 * 4K)
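
Plugging your new error into the formula above, with 64K = 65536 and
4K = 4096 (if I haven't slipped a digit):

  145932367691776 + 2 * 65536 + 14 * 4096 = 145932367880192
  145932367691776 + 2 * 65536 + 16 * 4096 = 145932367888384

i.e. the two affected sectors are the 4K blocks starting at
145932367880192 and 145932367884288.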

Hopefully this helps you pin down all the affected data.

BTW, considering that btrfs scrub tries all possible RAID6 repair
combinations, still having unrepairable data really means more than 2
corruptions. Did you experience more than 2 power losses before hitting
this problem?

Thanks,
Qu
