linux-btrfs.vger.kernel.org archive mirror
* Balance fails with csum errors, but scrub passes without errors
@ 2022-08-03 18:56 Martin
  2022-08-03 19:54 ` Thiago Ramon
  0 siblings, 1 reply; 5+ messages in thread
From: Martin @ 2022-08-03 18:56 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I recently had a hard drive start showing csum errors in a 13-drive
raid6 configuration, even though smartctl wasn't reporting any issues
with the drive.
- I ran a scrub on the whole FS; it showed a bunch of errors that (I
think) it repaired.
- Then I tried adding a new drive and running a balance; this failed
with csum errors pretty quickly, pointing at the same drive that had
the scrub errors.
- I ran a scrub on just that drive again, and it passed without
reporting any issues!
- Balance still fails with errors on that drive.
- I replaced the drive (btrfs replace), which finished just fine, but
balance still fails with errors.
I'm not sure what to do from here. Can someone advise on how I can
either repair these issues or delete the affected files and continue?


Initial scrub showing 260k errors:
    BTRFS warning (device sdf): checksum error at logical
63224657018880 on dev /dev/sde, physical 3870029381632, root 258,
inode 7735, offset 22675456, length 4096, links 1 (path: ...)
    BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
corrupt 268348, gen 4074
    BTRFS error (device sdf): fixed up error at logical 63224657018880
on dev /dev/sde
    BTRFS warning (device sdf): checksum error at logical
63224666390528 on dev /dev/sde, physical 3870030233600, root 258,
inode 7735, offset 32047104, length 4096, links 1 (path: ...)
    BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
corrupt 268349, gen 4074
    BTRFS warning (device sdf): checksum error at logical
63224675762176 on dev /dev/sde, physical 3870031085568, root 258,
inode 7735, offset 41418752, length 4096, links 1 (path: ...)
    BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
corrupt 268350, gen 4074
    BTRFS warning (device sdf): checksum error at logical
63224685133824 on dev /dev/sde, physical 3870031937536, root 258,
inode 7735, offset 50790400, length 4096, links 1 (path: ...)
    BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
corrupt 268351, gen 4074
    BTRFS error (device sdf): fixed up error at logical 63224666390528
on dev /dev/sde
    BTRFS error (device sdf): fixed up error at logical 63224675762176
on dev /dev/sde
    BTRFS error (device sdf): fixed up error at logical 63224685133824
on dev /dev/sde
    BTRFS warning (device sdf): checksum error at logical
63224694505472 on dev /dev/sde, physical 3870032789504, root 258,
inode 7735, offset 59375616, length 4096, links 1 (path: ...)
    BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
corrupt 268352, gen 4074
    BTRFS error (device sdf): fixed up error at logical 63224694505472
on dev /dev/sde
    BTRFS warning (device sdf): checksum error at logical
63225491095552 on dev /dev/sde, physical 3870105206784, root 258,
inode 7735, offset 69664768, length 4096, links 1 (path: ...)
    BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
corrupt 268353, gen 4074
    BTRFS warning (device sdf): checksum error at logical
63225500467200 on dev /dev/sde, physical 3870106058752, root 258,
inode 7735, offset 78118912, length 4096, links 1 (path: ...)
    BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
corrupt 268354, gen 4074
    BTRFS error (device sdf): fixed up error at logical 63225491095552
on dev /dev/sde
    BTRFS error (device sdf): fixed up error at logical 63225500467200
on dev /dev/sde

Balance fails with these errors:
    [Wed Aug  3 12:13:26 2022] BTRFS info (device sdn): balance: start
-dstripes=13..13
    [Wed Aug  3 12:13:26 2022] BTRFS info (device sdn): relocating
block group 103549454516224 flags data|raid6
    [Wed Aug  3 12:13:45 2022] btrfs_print_data_csum_error: 55
callbacks suppressed
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809305088 csum 0x26262de7 expected csum
0x0473ecb8 mirror 1
    [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809309184 csum 0x13e9e2a0 expected csum
0x723f00ca mirror 1
    [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809313280 csum 0x5c509a8f expected csum
0xfd89f318 mirror 1
    [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809317376 csum 0x42455521 expected csum
0x07cf450d mirror 1
    [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809305088 csum 0x26262de7 expected csum
0x0473ecb8 mirror 2
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809309184 csum 0x13e9e2a0 expected csum
0x723f00ca mirror 2
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809313280 csum 0x5c509a8f expected csum
0xfd89f318 mirror 2
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809317376 csum 0x42455521 expected csum
0x07cf450d mirror 2
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809305088 csum 0x26262de7 expected csum
0x0473ecb8 mirror 3
    [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809309184 csum 0x13e9e2a0 expected csum
0x723f00ca mirror 3
    [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
    [Wed Aug  3 12:13:48 2022] BTRFS info (device sdn): balance: ended
with status: -5

uname -a:
    Linux magneto 5.18.11-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Tue
Jul 12 22:52:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

btrfs --version
    btrfs-progs v5.18

btrfs fi show
    Label: 'raid6'  uuid: 4557fc3c-b70a-44cc-81b8-019658ea6cfd
    Total devices 14 FS bytes used 37.11TiB
    devid    1 size 9.10TiB used 3.44TiB path /dev/sdn
    devid    2 size 9.10TiB used 3.44TiB path /dev/sdk
    devid    3 size 7.28TiB used 3.41TiB path /dev/sdc
    devid    4 size 5.46TiB used 3.42TiB path /dev/sdh
    devid    5 size 3.64TiB used 3.41TiB path /dev/sdl
    devid    6 size 3.64TiB used 3.41TiB path /dev/sdb
    devid    7 size 5.46TiB used 3.41TiB path /dev/sdq
    devid    8 size 4.55TiB used 3.41TiB path /dev/sdf
    devid    9 size 4.55TiB used 3.41TiB path /dev/sdj
    devid   10 size 4.55TiB used 3.41TiB path /dev/sdm
    devid   11 size 4.55TiB used 3.41TiB path /dev/sdi
    devid   12 size 9.10TiB used 3.45TiB path /dev/sdg
    devid   13 size 9.10TiB used 3.45TiB path /dev/sde
    devid   14 size 9.10TiB used 61.09GiB path /dev/sdr


Thanks,
Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Balance fails with csum errors, but scrub passes without errors
  2022-08-03 18:56 Balance fails with csum errors, but scrub passes without errors Martin
@ 2022-08-03 19:54 ` Thiago Ramon
  2022-08-03 22:02   ` Martin
  0 siblings, 1 reply; 5+ messages in thread
From: Thiago Ramon @ 2022-08-03 19:54 UTC (permalink / raw)
  To: Martin; +Cc: linux-btrfs

I've had similar issues. There are two general cases you need to find
and correct: actual csum errors on file data, and csum errors outside
the file data (AFAIK only on compressed files).
The first case is easier to spot: read every file in the FS and log
anything that throws an I/O error. Running a find and cat'ing the
files to /dev/null should surface all the errors, though you might
prefer something more sophisticated that logs and resumes, in case you
hit problems along the way (you might stumble on a kernel BUG while
doing it).
Once you've found all the actually damaged files and dealt with them
(ddrescue, or just deleting them), what remains is to run the balance,
wait for an error, find the responsible file from the offset in the
error message (it's the offset inside the block group currently being
relocated), and defrag that file, which should be enough to clear the
error. Then resume the balance and continue on to the next one...
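To make that first step concrete, here's an untested sketch of the
scan (the function name and log file are placeholders; adjust the
mountpoint and paths to your setup):

```shell
# Untested sketch: read every regular file under a directory and append
# any path that fails to read to a log file. A data csum error shows up
# to userspace as an I/O (EIO) failure on read.
scan_for_read_errors() {
    dir="$1"
    badlist="$2"
    : > "$badlist"                      # truncate the log
    # -exec ... {} + keeps this POSIX-sh friendly and safe for odd filenames
    find "$dir" -type f -exec sh -c '
        badlist="$1"; shift
        for f in "$@"; do
            # cat to /dev/null forces a full read of each file
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f" >> "$badlist"
        done
    ' sh "$badlist" {} +
}

# e.g.: scan_for_read_errors /mountpoint /tmp/bad_files.txt
```

Anything that ends up in the log is a real read failure to deal with
(ddrescue or delete) before moving on to the balance step.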

I'm also going to use this new case as another notice that something
is horribly wrong with scrub on large raid6 arrays: I'm running yet
another 30M-file, 60TB scan on my 12x8TB array to find everything
scrub missed after my last replace...

On Wed, Aug 3, 2022 at 4:01 PM Martin <mbakiev@gmail.com> wrote:
>
> Hi,
>
> I recently had a hard drive start showing csum errors in a 13-drive
> raid6 configuration, even though smartctl wasn't reporting any issues
> with the drive.
> - I ran a scrub on the whole FS; it showed a bunch of errors that (I
> think) it repaired.
> - Then I tried adding a new drive and running a balance; this failed
> with csum errors pretty quickly, pointing at the same drive that had
> the scrub errors.
> - I ran a scrub on just that drive again, and it passed without
> reporting any issues!
> - Balance still fails with errors on that drive.
> - I replaced the drive (btrfs replace), which finished just fine, but
> balance still fails with errors.
> I'm not sure what to do from here. Can someone advise on how I can
> either repair these issues or delete the affected files and continue?
>
>
> Initial scrub showing 260k errors:
>     BTRFS warning (device sdf): checksum error at logical
> 63224657018880 on dev /dev/sde, physical 3870029381632, root 258,
> inode 7735, offset 22675456, length 4096, links 1 (path: ...)
>     BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
> corrupt 268348, gen 4074
>     BTRFS error (device sdf): fixed up error at logical 63224657018880
> on dev /dev/sde
>     BTRFS warning (device sdf): checksum error at logical
> 63224666390528 on dev /dev/sde, physical 3870030233600, root 258,
> inode 7735, offset 32047104, length 4096, links 1 (path: ...)
>     BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
> corrupt 268349, gen 4074
>     BTRFS warning (device sdf): checksum error at logical
> 63224675762176 on dev /dev/sde, physical 3870031085568, root 258,
> inode 7735, offset 41418752, length 4096, links 1 (path: ...)
>     BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
> corrupt 268350, gen 4074
>     BTRFS warning (device sdf): checksum error at logical
> 63224685133824 on dev /dev/sde, physical 3870031937536, root 258,
> inode 7735, offset 50790400, length 4096, links 1 (path: ...)
>     BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
> corrupt 268351, gen 4074
>     BTRFS error (device sdf): fixed up error at logical 63224666390528
> on dev /dev/sde
>     BTRFS error (device sdf): fixed up error at logical 63224675762176
> on dev /dev/sde
>     BTRFS error (device sdf): fixed up error at logical 63224685133824
> on dev /dev/sde
>     BTRFS warning (device sdf): checksum error at logical
> 63224694505472 on dev /dev/sde, physical 3870032789504, root 258,
> inode 7735, offset 59375616, length 4096, links 1 (path: ...)
>     BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
> corrupt 268352, gen 4074
>     BTRFS error (device sdf): fixed up error at logical 63224694505472
> on dev /dev/sde
>     BTRFS warning (device sdf): checksum error at logical
> 63225491095552 on dev /dev/sde, physical 3870105206784, root 258,
> inode 7735, offset 69664768, length 4096, links 1 (path: ...)
>     BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
> corrupt 268353, gen 4074
>     BTRFS warning (device sdf): checksum error at logical
> 63225500467200 on dev /dev/sde, physical 3870106058752, root 258,
> inode 7735, offset 78118912, length 4096, links 1 (path: ...)
>     BTRFS error (device sdf): bdev /dev/sde errs: wr 0, rd 0, flush 0,
> corrupt 268354, gen 4074
>     BTRFS error (device sdf): fixed up error at logical 63225491095552
> on dev /dev/sde
>     BTRFS error (device sdf): fixed up error at logical 63225500467200
> on dev /dev/sde
>
> Balance fails with these errors:
>     [Wed Aug  3 12:13:26 2022] BTRFS info (device sdn): balance: start
> -dstripes=13..13
>     [Wed Aug  3 12:13:26 2022] BTRFS info (device sdn): relocating
> block group 103549454516224 flags data|raid6
>     [Wed Aug  3 12:13:45 2022] btrfs_print_data_csum_error: 55
> callbacks suppressed
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809305088 csum 0x26262de7 expected csum
> 0x0473ecb8 mirror 1
>     [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
> errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809309184 csum 0x13e9e2a0 expected csum
> 0x723f00ca mirror 1
>     [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
> errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809313280 csum 0x5c509a8f expected csum
> 0xfd89f318 mirror 1
>     [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
> errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809317376 csum 0x42455521 expected csum
> 0x07cf450d mirror 1
>     [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
> errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809305088 csum 0x26262de7 expected csum
> 0x0473ecb8 mirror 2
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809309184 csum 0x13e9e2a0 expected csum
> 0x723f00ca mirror 2
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809313280 csum 0x5c509a8f expected csum
> 0xfd89f318 mirror 2
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809317376 csum 0x42455521 expected csum
> 0x07cf450d mirror 2
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809305088 csum 0x26262de7 expected csum
> 0x0473ecb8 mirror 3
>     [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809309184 csum 0x13e9e2a0 expected csum
> 0x723f00ca mirror 3
>     [Wed Aug  3 12:13:45 2022] BTRFS error (device sdn): bdev /dev/sdk
> errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
>     [Wed Aug  3 12:13:48 2022] BTRFS info (device sdn): balance: ended
> with status: -5
>
> uname -a:
>     Linux magneto 5.18.11-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Tue
> Jul 12 22:52:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs --version
>     btrfs-progs v5.18
>
> btrfs fi show
>     Label: 'raid6'  uuid: 4557fc3c-b70a-44cc-81b8-019658ea6cfd
>     Total devices 14 FS bytes used 37.11TiB
>     devid    1 size 9.10TiB used 3.44TiB path /dev/sdn
>     devid    2 size 9.10TiB used 3.44TiB path /dev/sdk
>     devid    3 size 7.28TiB used 3.41TiB path /dev/sdc
>     devid    4 size 5.46TiB used 3.42TiB path /dev/sdh
>     devid    5 size 3.64TiB used 3.41TiB path /dev/sdl
>     devid    6 size 3.64TiB used 3.41TiB path /dev/sdb
>     devid    7 size 5.46TiB used 3.41TiB path /dev/sdq
>     devid    8 size 4.55TiB used 3.41TiB path /dev/sdf
>     devid    9 size 4.55TiB used 3.41TiB path /dev/sdj
>     devid   10 size 4.55TiB used 3.41TiB path /dev/sdm
>     devid   11 size 4.55TiB used 3.41TiB path /dev/sdi
>     devid   12 size 9.10TiB used 3.45TiB path /dev/sdg
>     devid   13 size 9.10TiB used 3.45TiB path /dev/sde
>     devid   14 size 9.10TiB used 61.09GiB path /dev/sdr
>
>
> Thanks,
> Martin


* Re: Balance fails with csum errors, but scrub passes without errors
  2022-08-03 19:54 ` Thiago Ramon
@ 2022-08-03 22:02   ` Martin
  2022-08-03 23:13     ` Thiago Ramon
  0 siblings, 1 reply; 5+ messages in thread
From: Martin @ 2022-08-03 22:02 UTC (permalink / raw)
  To: Thiago Ramon; +Cc: linux-btrfs

> I've had similar issues. There are two general cases you need to find
> and correct: actual csum errors on file data, and csum errors outside
> the file data (AFAIK only on compressed files).
> The first case is easier to spot: read every file in the FS and log
> anything that throws an I/O error. Running a find and cat'ing the
> files to /dev/null should surface all the errors, though you might
> prefer something more sophisticated that logs and resumes, in case you
> hit problems along the way (you might stumble on a kernel BUG while
> doing it).
> Once you've found all the actually damaged files and dealt with them
> (ddrescue, or just deleting them), what remains is to run the balance,
> wait for an error, find the responsible file from the offset in the
> error message (it's the offset inside the block group currently being
> relocated), and defrag that file, which should be enough to clear the
> error. Then resume the balance and continue on to the next one...

Do you have more information on how to figure out which files are
affected from those log error messages?
Reading all the files first seems unnecessary if I can just use the
block group/offset to identify the damaged files.
Using `find . -inum 257` points me to a file, but that entire file
reads just fine, so I suspect it's the wrong one and that this has
something to do with the "root -9" part of the error message.


* Re: Balance fails with csum errors, but scrub passes without errors
  2022-08-03 22:02   ` Martin
@ 2022-08-03 23:13     ` Thiago Ramon
  2022-08-05 19:26       ` Martin
  0 siblings, 1 reply; 5+ messages in thread
From: Thiago Ramon @ 2022-08-03 23:13 UTC (permalink / raw)
  To: Martin; +Cc: linux-btrfs

Had to dig a bit through my IRC logs. The command is:
btrfs ins log -o $((block_group_start + offset)) /mountpoint

Eg. from your logs:
  [Wed Aug  3 12:13:26 2022] BTRFS info (device sdn): relocating block
group 103549454516224 flags data|raid6
  [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
root -9 ino 257 off 6809305088 csum 0x26262de7 expected csum
0x0473ecb8 mirror 1

You'd do:
btrfs ins log -o $((103549454516224 + 6809305088)) /mountpoint

It'll tell you which files are the problem. Chances are there are a
lot of them, and restarting the balance after fixing each one might be
too much trouble, which is why I suggest scanning all the files first.
But if you're lucky and only a few files are affected, just working
through them with the balance should be fine.
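For the arithmetic itself, here's a tiny hypothetical helper (the
resolve command needs the live filesystem, so it's only shown as a
comment):

```shell
# Hypothetical helper: combine the "relocating block group" start from
# dmesg with the "off" value of a csum failure to get the absolute
# logical address that the resolve command expects.
logical_addr() {
    echo "$(( $1 + $2 ))"
}

# From the logs above:
#   block group 103549454516224, csum failed off 6809305088
addr=$(logical_addr 103549454516224 6809305088)
echo "$addr"   # -> 103556263821312
# then: btrfs ins log -o "$addr" /mountpoint
```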

On Wed, Aug 3, 2022 at 7:03 PM Martin <mbakiev@gmail.com> wrote:
>
> > I've had similar issues. There are two general cases you need to find
> > and correct: actual csum errors on file data, and csum errors outside
> > the file data (AFAIK only on compressed files).
> > The first case is easier to spot: read every file in the FS and log
> > anything that throws an I/O error. Running a find and cat'ing the
> > files to /dev/null should surface all the errors, though you might
> > prefer something more sophisticated that logs and resumes, in case you
> > hit problems along the way (you might stumble on a kernel BUG while
> > doing it).
> > Once you've found all the actually damaged files and dealt with them
> > (ddrescue, or just deleting them), what remains is to run the balance,
> > wait for an error, find the responsible file from the offset in the
> > error message (it's the offset inside the block group currently being
> > relocated), and defrag that file, which should be enough to clear the
> > error. Then resume the balance and continue on to the next one...
>
> > Do you have more information on how to figure out which files are
> > affected from those log error messages?
> > Reading all the files first seems unnecessary if I can just use the
> > block group/offset to identify the damaged files.
> > Using `find . -inum 257` points me to a file, but that entire file
> > reads just fine, so I suspect it's the wrong one and that this has
> > something to do with the "root -9" part of the error message.


* Re: Balance fails with csum errors, but scrub passes without errors
  2022-08-03 23:13     ` Thiago Ramon
@ 2022-08-05 19:26       ` Martin
  0 siblings, 0 replies; 5+ messages in thread
From: Martin @ 2022-08-05 19:26 UTC (permalink / raw)
  To: Thiago Ramon; +Cc: linux-btrfs

Thanks for that info, Thiago! It was very helpful.

Turns out there was only 1 corrupted file on the FS, and after
deleting it and restoring from backup, the balance finished without
any more issues.
I'm going to run another scrub just in case, but everything looks normal now.

It's still something of a mystery what went wrong in a way that scrub
couldn't catch but balance could.

On Wed, Aug 3, 2022 at 5:13 PM Thiago Ramon <thiagoramon@gmail.com> wrote:
>
> Had to dig a bit through my IRC logs. The command is:
> btrfs ins log -o $((block_group_start + offset)) /mountpoint
>
> Eg. from your logs:
>   [Wed Aug  3 12:13:26 2022] BTRFS info (device sdn): relocating block
> group 103549454516224 flags data|raid6
>   [Wed Aug  3 12:13:45 2022] BTRFS warning (device sdn): csum failed
> root -9 ino 257 off 6809305088 csum 0x26262de7 expected csum
> 0x0473ecb8 mirror 1
>
> You'd do:
> btrfs ins log -o $((103549454516224 + 6809305088)) /mountpoint
>
> It'll tell you which files are the problem. Chances are there are a
> lot of them, and restarting the balance after fixing each one might be
> too much trouble, which is why I suggest scanning all the files first.
> But if you're lucky and only a few files are affected, just working
> through them with the balance should be fine.
>
> On Wed, Aug 3, 2022 at 7:03 PM Martin <mbakiev@gmail.com> wrote:
> >
> > > I've had similar issues. There are two general cases you need to find
> > > and correct: actual csum errors on file data, and csum errors outside
> > > the file data (AFAIK only on compressed files).
> > > The first case is easier to spot: read every file in the FS and log
> > > anything that throws an I/O error. Running a find and cat'ing the
> > > files to /dev/null should surface all the errors, though you might
> > > prefer something more sophisticated that logs and resumes, in case you
> > > hit problems along the way (you might stumble on a kernel BUG while
> > > doing it).
> > > Once you've found all the actually damaged files and dealt with them
> > > (ddrescue, or just deleting them), what remains is to run the balance,
> > > wait for an error, find the responsible file from the offset in the
> > > error message (it's the offset inside the block group currently being
> > > relocated), and defrag that file, which should be enough to clear the
> > > error. Then resume the balance and continue on to the next one...
> >
> > > Do you have more information on how to figure out which files are
> > > affected from those log error messages?
> > > Reading all the files first seems unnecessary if I can just use the
> > > block group/offset to identify the damaged files.
> > > Using `find . -inum 257` points me to a file, but that entire file
> > > reads just fine, so I suspect it's the wrong one and that this has
> > > something to do with the "root -9" part of the error message.


end of thread, other threads:[~2022-08-05 19:27 UTC | newest]

Thread overview: 5+ messages
2022-08-03 18:56 Balance fails with csum errors, but scrub passes without errors Martin
2022-08-03 19:54 ` Thiago Ramon
2022-08-03 22:02   ` Martin
2022-08-03 23:13     ` Thiago Ramon
2022-08-05 19:26       ` Martin
