From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Christoph Anton Mitterer <calestyo@scientia.org>,
	linux-btrfs@vger.kernel.org
Subject: Re: BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22
Date: Tue, 1 Mar 2022 08:19:12 +0800
Message-ID: <d408c15d-60e2-0701-f1f1-e35087539ab3@gmx.com>
In-Reply-To: <74ccc4a0bbd181dd547c06b32be2b071612aeb85.camel@scientia.org>



On 2022/2/28 23:24, Christoph Anton Mitterer wrote:
> On Mon, 2022-02-28 at 14:48 +0800, Qu Wenruo wrote:
>> Btrfs handles checksum differently for metadata (tree block) and
>> data.
>>
>> For metadata, its header has 32 bytes reserved for checksum, and
>> that's
>> where the csum of metadata is.
>> Aka, inlined checksum.
>
> Ah, I see.
>
>
>> In the best case, it's just a leaf that got this corruption.
>> In that case, if you're using SHA256 and 16K nodesize, you get at
>> most
>> 2MiB range which can not be read.
>> (Again, on disk data can still be fine)
>
> It would be interesting to see how much is actually affected,...
> shouldn't it be possible to run something like dd_rescue on it? I mean
> I'd probably get thousands of csum errors, but in the end it should
> show me how much of the file is gone.

As said, no real file is damaged.
It's just that we can't get the csum.
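
For a sense of scale, the "at most 2MiB" figure quoted above can be reproduced with a back-of-the-envelope calculation. This is only a sketch: it assumes a 4KiB sectorsize (not stated in this thread) and ignores the leaf header and item headers, so it gives an upper bound. The function name is made up for illustration.

```python
def max_csum_leaf_coverage(nodesize, csum_size, sectorsize=4096):
    """Upper bound on the data bytes whose checksums fit in one csum leaf.

    Ignores the leaf header and per-item headers, so the real value is
    slightly lower. sectorsize=4096 is an assumption, not from the thread.
    """
    csums_per_leaf = nodesize // csum_size
    return csums_per_leaf * sectorsize


# SHA256 checksums are 32 bytes; 16K nodesize as in the quoted example.
print(max_csum_leaf_coverage(16 * 1024, 32) // 1024, "KiB")
```

With a 4-byte checksum such as crc32c, the same leaf would cover a correspondingly larger range.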

So mount with rescue=idatacsums, and verify the content against your
backup if you have one.
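
One possible way to do that verification, once the filesystem is mounted (read-only) with rescue=idatacsums, is to hash every file and compare it with the backup copy. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the backup mirrors the same directory layout.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_with_backup(mounted_root, backup_root):
    """Return relative paths that differ from, or are missing in, the backup."""
    differing = []
    for dirpath, _dirnames, filenames in os.walk(mounted_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Only compare regular files; symlinks and specials are skipped.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, mounted_root)
            dst = os.path.join(backup_root, rel)
            if not os.path.isfile(dst) or file_digest(src) != file_digest(dst):
                differing.append(rel)
    return sorted(differing)
```

Calling compare_with_backup("/mnt/broken", "/mnt/backup") (both paths are placeholders) would list the files worth a closer look.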

>>
>> Depends on the generation. If your current generation (can be checked
>> with btrfs ins dump-super) is close to the number 262166, then it's
>> possible it's rewritten recently.
>
>
> Hmm, I assume it's just "main" generation field?

Yep.
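
If you want to script that comparison, a small sketch follows. It assumes the dump-super output has a line starting with "generation" followed by whitespace and the number; please check that against what your btrfs-progs version actually prints. The function names are made up.

```python
import re


def superblock_generation(dump_super_text):
    """Extract the main generation from `btrfs ins dump-super` output.

    Assumes a line of the form 'generation<whitespace><number>'. Lines such
    as 'chunk_root_generation' do not match because of the line-start anchor.
    """
    match = re.search(r"^generation\s+(\d+)\s*$", dump_super_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'generation' line found")
    return int(match.group(1))


def generation_gap(dump_super_text, wanted):
    """How many generations the filesystem is ahead of the wanted transid."""
    return superblock_generation(dump_super_text) - wanted
```

A small gap relative to 262166 would suggest the block was rewritten recently; a large one points to an older problem.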

>
> Then the number would be *pretty* much off. Which makes the whole thing
> IMO quite strange... as said, the file was written around 2019,... and
> it had been sent/received at least once.
>
> So would expect that the corruption or bit-flip would need to have
> happened at some point after it was first sent/received?

I guess the csum tree block got corrupted at that time.

And fortunately that range isn't utilized much, so later reads/writes
didn't get interrupted by that corrupted tree block.

...
>
>
> On Mon, 2022-02-28 at 14:54 +0800, Qu Wenruo wrote:
>> It may not be a single file, but a lot of files.
>
> Shouldn't I be able to find out simply by copying away each file (like
> what I did during yesterday's backup)?

Yep, that's possible.

> Or something like tar -cf /dev/null /
>
> Every file that tar cannot read should give an error, and I'd see which
> are affected?

That's also a way.
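
The same idea as the tar run, sketched in Python: read every regular file to the end and record the ones where a read fails (without rescue=idatacsums I'd expect the affected files to fail with an I/O error). The function name and the opener parameter are hypothetical; the latter is only there so the failure path can be exercised without a broken filesystem.

```python
import os


def find_unreadable_files(root, chunk_size=1 << 20, opener=open):
    """Read every regular file under root; return (path, errno) for failures."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file data has csums.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with opener(path, "rb") as fh:
                    while fh.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((path, exc.errno))
    return failures
```

Note that os.walk crosses mount points, so you would want to point this at the btrfs mount rather than at / with other filesystems below it.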

>
>
>> As csum tree only stores two things, logical bytenr, and its csum.
>>
>> So we need some work to find out:
>>
>> 1) Which logical bytenr range is in that csum tree block
>>
>> 2) Which files owns the logical bytenr range.
>
> Is this possible already with standard tools?

We have a tool for 2), "btrfs ins logical-resolve", to search for all the
files owning a logical bytenr range.

But we don't have a tool for 1); maybe you can use "btrfs ins dump-tree
-b <bytenr>" to check the content of that corrupted tree block.
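
If dump-tree can still print csum items from that block, the range for 1) follows from each item: the key offset of a csum item is the logical bytenr where its range starts, and the item size divided by the csum size gives the number of sectors covered. A rough sketch of that arithmetic, assuming 4KiB sectors and 32-byte (SHA256) csums; the function name and the sample numbers are made up:

```python
def csum_item_range(key_offset, item_size, csum_size=32, sectorsize=4096):
    """Logical [start, end) range covered by one csum item.

    key_offset: offset field of the item key (start logical bytenr).
    item_size:  size of the item data in bytes (one csum per sector).
    csum_size and sectorsize defaults are assumptions for this sketch.
    """
    if item_size % csum_size:
        raise ValueError("item size is not a multiple of the csum size")
    length = (item_size // csum_size) * sectorsize
    return key_offset, key_offset + length


# Hypothetical item: starts at logical 13631488 and holds two csums.
start, end = csum_item_range(13631488, 64)
print(start, end)
```

Any bytenr inside such a range can then be handed to "btrfs ins logical-resolve" for 2).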
>
>
>
>> No common operations can help.
>>
>> But I can craft you a special fix to manually reset the generation of
>> that offending csum tree block, as a last resort method.
>
> I guess, if you'd say that the above way would work to find out which
> file was affected, and if it was only that one (which is not
> precious)... then I could simply copy all data off to some external
> disk, and just re-create the fs.
>
>
> If I'd delete the affected file(s) would btrfs simply clear the broken
> csum block?

Nope. That generation mismatch would prevent btrfs from doing any
modification, including CoWing the tree block to a new location.
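
To illustrate why: before a tree block can be modified it has to be read, and the read compares the generation the parent expects with the one stored in the block header. The following is a toy model of that check, not the kernel's code; the class and function names are invented, and only the message format is taken from the error in the subject line.

```python
from dataclasses import dataclass


@dataclass
class TreeBlockHeader:
    bytenr: int      # logical address of the tree block
    generation: int  # generation stored in the block's own header


def verify_parent_transid(header, parent_transid):
    """Return an error string on mismatch, or None if the block is accepted."""
    if header.generation != parent_transid:
        return (f"parent transid verify failed on {header.bytenr} "
                f"wanted {parent_transid} found {header.generation}")
    return None


# The block from this thread: the parent wants 262166, the header says 22.
print(verify_parent_transid(TreeBlockHeader(1382301696, 22), 262166))
```

As long as that check fails, the block is rejected at read time, so deleting files cannot get far enough to replace it.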

Thanks,
Qu

>
>
>> We have a way, since v5.11, we have a new mount option,
>> rescue=idatacsums, which can do exactly that, completely ignore data
>> csums.
>
> Ah :-)
>
>
> Thanks,
> Chris.
>
>
> PS: I'll start the memtest now, and report back once I have some news.

