On Fri, Nov 08, 2019 at 11:06:50PM +0100, Richard Weinberger wrote:
> On Tuesday, 5 November 2019 at 23:03:01 CET, Richard Weinberger wrote:
> > [10860370.764595] BTRFS error (device md1): unable to fixup (regular) error at logical 593483341824 on dev /dev/md1
> > [10860395.236787] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2292, gen 0
> > [10860395.237267] BTRFS error (device md1): unable to fixup (regular) error at logical 595304841216 on dev /dev/md1
> > [10860395.506085] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2293, gen 0
> > [10860395.506560] BTRFS error (device md1): unable to fixup (regular) error at logical 595326820352 on dev /dev/md1
> > [10860395.511546] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2294, gen 0
> > [10860395.512061] BTRFS error (device md1): unable to fixup (regular) error at logical 595327647744 on dev /dev/md1
> > [10860395.664956] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2295, gen 0
> > [10860395.664959] BTRFS error (device md1): unable to fixup (regular) error at logical 595344850944 on dev /dev/md1
> > [10860395.677733] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2296, gen 0
> > [10860395.677736] BTRFS error (device md1): unable to fixup (regular) error at logical 595346452480 on dev /dev/md1
> > [10860395.770918] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2297, gen 0
> > [10860395.771523] BTRFS error (device md1): unable to fixup (regular) error at logical 595357601792 on dev /dev/md1
> > [10860395.789808] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2298, gen 0
> > [10860395.790455] BTRFS error (device md1): unable to fixup (regular) error at logical 595359870976 on dev /dev/md1
> > [10860395.806699] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2299, gen 0
> > [10860395.807381] BTRFS error (device md1): unable to fixup (regular) error at logical 595361865728 on dev /dev/md1
> > [10860395.918793] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2300, gen 0
> > [10860395.919513] BTRFS error (device md1): unable to fixup (regular) error at logical 595372343296 on dev /dev/md1
> > [10860395.993817] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2301, gen 0
> > [10860395.994574] BTRFS error (device md1): unable to fixup (regular) error at logical 595384438784 on dev /dev/md1
> >
> > For obvious reasons, the "BTRFS error (device md1): unable to fixup (regular) error"
> > lines made me nervous, and I would like to understand better what is going on.
> > The system has ECC memory, and md1 is a RAID1 which passes all health checks.
> >
> > I tried to find the inodes behind the erroneous addresses, without success, e.g.:
> > $ btrfs inspect-internal logical-resolve -v -P 593483341824 /
> > ioctl ret=0, total_size=4096, bytes_left=4080, bytes_missing=0, cnt=0, missed=0
> > $ echo $?
> > 1
> >
> > My kernel is 4.12.14-lp150.12.64-default (openSUSE 15.0), so not super recent,
> > but AFAICT btrfs should be sane there. :-)
> >
> > What could cause these errors, and how can I dig further?
>
> I was able to reproduce this on vanilla v5.4-rc6.
>
> Instrumenting btrfs revealed that all erroneous blocks are data blocks
> (BTRFS_EXTENT_FLAG_DATA) and that they only have ->checksum_error set.
> Both the expected and computed checksums are non-zero.
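(Side note: the "cnt=0" in that logical-resolve output is the kernel reporting
zero inode references for the address, which matches your instrumentation.
If you want to poke at this without btrfs-progs, the ioctl behind
logical-resolve is easy to call directly. Below is a minimal sketch, not the
btrfs-progs code: it assumes the UAPI <linux/btrfs.h> header and root
privileges, and trims most error handling.)

/* Resolve a btrfs logical address to the inodes referencing it,
 * roughly what "btrfs inspect-internal logical-resolve" does.
 * elem_cnt == 0 with a successful return is the "cnt=0" case above:
 * the extent exists, but no inode references it.
 *
 * Build: gcc -o logical-ino logical-ino.c
 * Usage (as root): ./logical-ino <mountpoint> <logical-address>
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

int main(int argc, char **argv)
{
	struct btrfs_ioctl_logical_ino_args args;
	struct btrfs_data_container *inodes;
	unsigned int i;
	int fd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <mountpoint> <logical>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	inodes = calloc(1, 4096);	/* 4 KiB result buffer, as in btrfs-progs */
	if (!inodes)
		return 1;
	memset(&args, 0, sizeof(args));
	args.logical = strtoull(argv[2], NULL, 0);
	args.size = 4096;
	args.inodes = (uintptr_t)inodes;

	if (ioctl(fd, BTRFS_IOC_LOGICAL_INO, &args) < 0) {
		perror("BTRFS_IOC_LOGICAL_INO");
		return 1;
	}
	printf("cnt=%u missed=%u\n", inodes->elem_cnt, inodes->elem_missed);
	/* Results are (inode, offset, root) triples of u64. */
	for (i = 0; i < inodes->elem_cnt; i += 3)
		printf("inode %llu offset %llu root %llu\n",
		       (unsigned long long)inodes->val[i],
		       (unsigned long long)inodes->val[i + 1],
		       (unsigned long long)inodes->val[i + 2]);
	return 0;
}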
> To me it seems like all these blocks are orphaned data: while extent_from_logical()
> finds an extent for the affected logical addresses, none of the extents belong
> to an inode.
> This also explains why "btrfs inspect-internal logical-resolve" is unable to
> point me to an inode, and why scrub_print_warning("checksum error", sblock_to_check)
> does not log anything: that function returns early if no inode can be found
> for a data block...
>
> Is this something to worry about?
>
> Why does the scrubbing mechanism check orphaned blocks?

Because it would be absurdly expensive to figure out which blocks in an
extent are orphaned.

Scrub normally reads just the extent and csum trees, which are already
sorted in on-disk order, so it reads fast with no seeking.

To determine whether an extent has orphaned blocks, scrub would instead
have to follow backrefs until it either found references to every block
in the extent, or ran out of backrefs without finding a reference to at
least one block. The seeking this requires makes it hundreds to millions
of times more expensive than just reading and verifying the orphaned
blocks.

> Thanks,
> //richard
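For a sense of scale, here is a back-of-the-envelope model of that cost
gap. All numbers in it are illustrative assumptions about rotating
storage, not measurements:

/* Toy cost model: verifying a data block during scrub is part of a
 * sequential read, while proving a block orphaned needs at least one
 * random backref lookup, i.e. a seek.  Numbers are assumptions.
 */
#include <stdio.h>

int main(void)
{
	double seq_bw = 150e6;	/* assumed sequential read rate, bytes/s */
	double seek = 8e-3;	/* assumed average seek time, s */
	double block = 4096;	/* data block size, bytes */

	double verify = block / seq_bw;	/* stream in and checksum one block */
	double resolve = seek;		/* one backref lookup, best case */

	printf("verify one block:  %.1e s\n", verify);
	printf("resolve one block: %.1e s (about %.0fx more expensive)\n",
	       resolve, resolve / verify);
	return 0;
}

That already lands in the hundreds; the "millions" end of the range shows
up when a heavily snapshotted or reflinked extent needs many backref
lookups per block before a missing reference can be proven.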