Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
From: Zygo Blaxell <zblaxell@furryterror.org>
To: Richard Weinberger <richard@nod.at>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Decoding "unable to fixup (regular)" errors
Date: Fri, 8 Nov 2019 17:16:56 -0500
Message-ID: <20191108221656.GS22121@hungrycats.org> (raw)
In-Reply-To: <2590197.gOHgNE8CYM@blindfold>

[-- Attachment #1: Type: text/plain, Size: 4720 bytes --]

On Fri, Nov 08, 2019 at 11:06:50PM +0100, Richard Weinberger wrote:
> Am Dienstag, 5. November 2019, 23:03:01 CET schrieb Richard Weinberger:
> > [10860370.764595] BTRFS error (device md1): unable to fixup (regular) error at logical 593483341824 on dev /dev/md1
> > [10860395.236787] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2292, gen 0
> > [10860395.237267] BTRFS error (device md1): unable to fixup (regular) error at logical 595304841216 on dev /dev/md1
> > [10860395.506085] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2293, gen 0
> > [10860395.506560] BTRFS error (device md1): unable to fixup (regular) error at logical 595326820352 on dev /dev/md1
> > [10860395.511546] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2294, gen 0
> > [10860395.512061] BTRFS error (device md1): unable to fixup (regular) error at logical 595327647744 on dev /dev/md1
> > [10860395.664956] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2295, gen 0
> > [10860395.664959] BTRFS error (device md1): unable to fixup (regular) error at logical 595344850944 on dev /dev/md1
> > [10860395.677733] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2296, gen 0
> > [10860395.677736] BTRFS error (device md1): unable to fixup (regular) error at logical 595346452480 on dev /dev/md1
> > [10860395.770918] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2297, gen 0
> > [10860395.771523] BTRFS error (device md1): unable to fixup (regular) error at logical 595357601792 on dev /dev/md1
> > [10860395.789808] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2298, gen 0
> > [10860395.790455] BTRFS error (device md1): unable to fixup (regular) error at logical 595359870976 on dev /dev/md1
> > [10860395.806699] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2299, gen 0
> > [10860395.807381] BTRFS error (device md1): unable to fixup (regular) error at logical 595361865728 on dev /dev/md1
> > [10860395.918793] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2300, gen 0
> > [10860395.919513] BTRFS error (device md1): unable to fixup (regular) error at logical 595372343296 on dev /dev/md1
> > [10860395.993817] BTRFS error (device md1): bdev /dev/md1 errs: wr 0, rd 0, flush 0, corrupt 2301, gen 0
> > [10860395.994574] BTRFS error (device md1): unable to fixup (regular) error at logical 595384438784 on dev /dev/md1
> 
> > For obvious reasons the "BTRFS error (device md1): unable to fixup (regular) error" lines made me nervous
> > and I would like to understand better what is going on.
> > The system has ECC memory with md1 being a RAID1 which passes all health checks.
> > 
> > I tried to find the inodes behind the erroneous addresses without success.
> > e.g.
> > $ btrfs inspect-internal logical-resolve -v -P 593483341824 /
> > ioctl ret=0, total_size=4096, bytes_left=4080, bytes_missing=0, cnt=0, missed=0
> > $ echo $?
> > 1
> > 
> > My kernel is 4.12.14-lp150.12.64-default (OpenSUSE 15.0), so not super recent but AFAICT btrfs should be sane
> > there. :-)
> > 
> > What could cause the errors and how to dig further?
> 
> I was able to reproduce this on vanilla v5.4-rc6.
> 
> Instrumenting btrfs revealed that all erroneous blocks are data blocks (BTRFS_EXTENT_FLAG_DATA)
> and only have ->checksum_error set.
> Both expected and computed checksums are non-zero.
> 
> To me it seems like all these blocks are orphaned data, while extent_from_logical() finds and extent
> for the affected logical addresses, none of the extents belong to an inode.
> This explains also why "btrfs inspect-internal logical-resolve" is unable to point me to an
> inode. And why scrub_print_warning("checksum error", sblock_to_check) does not log anything.
> The function returns early if no inode can be found for a data block...
> 
> This is something to worry about?
> 
> Why does the scrubbing mechanism check orphaned blocks?

Because it would be absurdly expensive to figure out which blocks in an
extent are orphaned.

Scrub normally reads just the extent and csum trees which are already
sorted in on-disk order, so it reads fast with no seeking.

To determine if an extent has orphaned blocks, scrub would have to
follow backrefs until it found references to every block in the extent,
or ran out of backrefs without finding a reference to at least one block.
The seeking makes this hundreds to millions of times more expensive than
just reading and verifying the orphan blocks.

> Thanks,
> //richard
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

  reply index

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-05 22:03 Richard Weinberger
2019-11-08 22:06 ` Richard Weinberger
2019-11-08 22:16   ` Zygo Blaxell [this message]
2019-11-08 22:09 ` Zygo Blaxell
2019-11-08 22:21   ` Richard Weinberger
2019-11-08 22:25     ` Zygo Blaxell
2019-11-08 22:31       ` Richard Weinberger
2019-11-08 23:39         ` Zygo Blaxell
2019-11-09  9:58           ` checksum errors in orphaned blocks on multiple systems (Was: Re: Decoding "unable to fixup (regular)" errors) Richard Weinberger
2019-11-13  3:34             ` Zygo Blaxell
2019-11-09 10:00           ` Decoding "unable to fixup (regular)" errors Richard Weinberger
2019-11-13  3:31             ` Zygo Blaxell
2019-11-13 18:17             ` Chris Murphy
2019-11-13 18:24               ` Chris Murphy
2019-11-16  6:16               ` Zygo Blaxell

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191108221656.GS22121@hungrycats.org \
    --to=zblaxell@furryterror.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=richard@nod.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git