From: Qu Wenruo <wqu@suse.com>
To: Christoph Anton Mitterer <calestyo@scientia.org>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22
Date: Tue, 1 Mar 2022 10:30:30 +0800
Message-ID: <66d31354-a6d1-01a0-3674-c4976bd7d557@suse.com>
In-Reply-To: <9049B0C3-5A30-4824-BCED-61AA9AC128E5@scientia.org>



On 2022/3/1 10:14, Christoph Anton Mitterer wrote:
> Hey.
> 
> memtest[0] showed that memory is in fact damaged in some higher region... as you've guessed, it's always a single bit flip.

Btrfs is now a pretty good memtest tool too :)
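
The transid values in the subject line (wanted 262166, found 22) are consistent with a single flipped bit. A minimal Python sketch to check this; the helper name flipped_bits is made up for illustration and is not part of any btrfs tool:

```python
def flipped_bits(wanted, found):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = wanted ^ found
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Values taken from the subject line of this thread.
wanted, found = 262166, 22
bits = flipped_bits(wanted, found)
print(f"wanted={wanted:#x} found={found:#x} differing bits={bits}")
# prints: wanted=0x40016 found=0x16 differing bits=[18]
```

The two values differ only in bit 18 (262166 - 22 = 262144 = 2^18).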

> 
> 
> 
> On 1 March 2022 01:19:12 CET, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> It would be interesting to see how much is actually affected,...
>>> shouldn't it be possible to run something like dd_rescue on it? I mean
>>> I'd probably get thousands of csum errors, but in the end it should
>>> show me how much of the file is gone.
>>
>> As said, no real file is damaged.
>> It's just that we can't get the csum.
> 
> Sure, I had understood that. What I meant was to find out how much of the file (or, if more were affected, which files) is not guaranteed to be integrity-protected, because its csum data is broken.
> 
> 
> 
>>> So would expect that the corruption or bit-flip would need to have
>>> happened at some point after it was first sent/received?
>>
>> I guess the corruption of the csum tree block happened at that time.
> 
> It's still a bit strange, though, because I have most likely run a scrub since then, and no errors were found...
> 
> But in principle, scrub should notice these corruptions in the csum tree, shouldn't it?

In theory it should, especially for a csum tree with the skinny metadata feature.

In that case we should do a tree search and locate that tree block.

But there is a catch: if the tree block is still cached in memory, we 
may not run the full, comprehensive check on it, and that may be the 
hole that allowed the corruption to sneak in.

Anyway, I need to investigate more to be sure how this happened without 
scrub noticing it, and find a way to make btrfs a more robust memtester :)

Thanks,
Qu
> 
> 
>> And fortunately that range doesn't get used much, so later
>> reads/writes won't get interrupted by that corrupted tree block.
> 
> That I don't understand. You mean the csum tree isn't read/written in that region (i.e. not unless the associated files are read)... and that's why it went unnoticed for so long?
> 
> 
> 
>>> Shouldn't I be able to find out simply by copying away each file (like
>>> what I did during yesterday's backup)?
>>
>> Yep, that's possible.
>>
>>> Or something like tar -cf /dev/null /
>>>
>>> Every file that tar cannot read should give an error, and I'd see which
>>> are affected?
>>
>> That's also a way.
> 
> Ok... if both work to find out which files are affected (in the sense that they cannot be verified because the csum is broken... and thus may or may not be valid)... then I guess that's the easiest way for me to recover.
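
The "read every file and see which ones fail" idea (the copy or the tar run above) could be sketched in Python roughly as follows. This assumes the kernel reports a failed checksum lookup as an I/O error on read(). find_unreadable is a hypothetical helper name, and the root directory should be the mount point of the affected filesystem rather than /, so that /proc and /sys are not walked:

```python
import os


def find_unreadable(root, chunk_size=1 << 20):
    """Read every regular file under root; return (path, error) for each failure."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files (FIFOs, sockets, devices).
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    # Read to EOF in chunks and discard the data,
                    # much like `tar -cf /dev/null` does.
                    while fh.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((path, exc))
    return failures


# Example use (the path is a placeholder):
# for path, exc in find_unreadable("/mnt/affected"):
#     print(f"{path}: {exc}")
```

Files that fail here are the ones whose data could not be read or verified; files that read cleanly had their data checked against the csums that are still reachable.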
> 
> 
> 
>>>> 1) Which logical bytenr range is in that csum tree block
>>>>
>>>> 2) Which files own the logical bytenr range.
>>>
>>> Is this possible already with standard tools?
>>
>> We have tools for 2), "btrfs ins logical-resolve" to search for all the
>> files owning a logical bytenr range.
> 
> So in principle, since my tar yesterday brought up no further errors, that should result in only that one file.
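
A rough sketch of step 2), assuming the logical bytenr range from step 1) is already known. "btrfs inspect-internal logical-resolve" takes a single logical address, so the sketch simply probes the range block by block. The function names, the 4096-byte step and the example values are assumptions for illustration, not something taken from this thread:

```python
import subprocess


def build_resolve_commands(start, length, mount_point, step=4096):
    """One logical-resolve command per step-sized block in [start, start + length)."""
    return [
        ["btrfs", "inspect-internal", "logical-resolve", str(logical), mount_point]
        for logical in range(start, start + length, step)
    ]


def resolve_range(start, length, mount_point, step=4096):
    """Run the commands (needs root) and collect the distinct paths they print."""
    paths = set()
    for cmd in build_resolve_commands(start, length, mount_point, step):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            paths.update(line for line in result.stdout.splitlines() if line)
    return sorted(paths)


# Example use (placeholder values, not the real range from this filesystem):
# print(resolve_range(8192, 16384, "/mnt/affected"))
```

Note that start and length here mean the data range covered by the broken csum tree block, not the bytenr of the tree block itself.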
> 
> 
>>> If I'd delete the affected file(s) would btrfs simply clear the broken
>>> csum block?
>>
>> Nope. That generation mismatch would prevent btrfs from doing any
>> modification, including CoWing the tree block to a new location.
> 
> Ah, OK.
> 
> 
> Thanks,
> Chris.
> 
> [0] For those who haven't seen it yet, there's pcmemtest (https://github.com/martinwhitaker/pcmemtest), which is a fork of (or based upon) memtest86+... but with UEFI support, which memtest86+ lacks.
> 

