From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Joe Hermaszewski <joe@monoid.al>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs crash on armv7
Date: Thu, 26 Nov 2020 14:26:40 +0800
Message-ID: <beb026bf-3f46-ae4d-7258-acf4c3ff001c@gmx.com>
In-Reply-To: <1f5cf01f-0f5e-8691-541d-efb763919577@gmx.com>



On 2020/11/26 2:15 PM, Qu Wenruo wrote:
> 
> 
> On 2020/11/25 11:28 PM, Joe Hermaszewski wrote:
>> Hi,
>>
>> I have an arm32 machine with four drives and a btrfs fs spanning them in RAID1.
>> The filesystem has started behaving badly recently and I'm writing to:
>>
>> - Solicit advice on how best to get the system back to a stable state
>> - Report a potential bug
>>
>> ## What happened:
>>
>> A couple of days ago I could no longer ssh into it, and on the serial
>> connection there were heaps of messages (and new ones appearing with great
>> frequency) along the lines of: `parent transid verify failed on blah... wanted
>> x got y`.
>>
>> Although I don't have a record of the precise messages I do remember that there
>> was a difference of `15` between x and y.
> 
> This is normally a sign of an unreliable HDD that lies about FLUSH,
> which breaks metadata COW.
> 
> But your btrfs check doesn't report the same problem, so I'm confused.
> 
> Would you please try to run a "btrfs check --readonly /dev/sda1" with
> the fs unmounted?
> 
> Also, could you provide the full dmesg of that mismatch?
> I'm asking for the exact numbers because I suspect hardware memory
> corruption.
> 
>>
>> I power-cycled the system and started a scrub after it rebooted; this was
>> interrupted quite promptly by several more errors in btrfs, and the disk
>> remounted RO.
>>
>> Every now and then in the kernel log I get messages like:
>>
>> `parent transid verify failed on blah... wanted x got y`
> 
> These don't show up in the gist dmesg, though.
> 
>>
>> ## Important info
>>
>> The dev stats are all zero.
>>
>> Here are the outputs of some btrfs commands, dmesg and the kernel log from the
>> previous two boots: https://gist.github.com/b1beab134403c5047e2efbceb98985f9
>>
>> The "cut here" portion of the kernel log is as follows
>>
>> ```
>> [  409.158097] ------------[ cut here ]------------
>> [  409.158205] WARNING: CPU: 1 PID: 217 at fs/btrfs/disk-io.c:531
>> btree_csum_one_bio+0x208/0x248 [btrfs]
> 
> The line number shows that the tree bytenr doesn't match the one in memory.
> This is very rare, especially when btrfs reports no other errors
> (at least in the gist).
> 
> Since no other problems show up in the gist, it could be a bit flip.
> Considering that the generation gap you mention is 15, I'm more
> suspicious of a memory bit flip.
> 
> If you can provide the full parent transid mismatch error message, it
> may help determine how likely hardware memory corruption is.

Also, if you can access the device, especially sda, please try attaching
it to a known working system and check whether "btrfs check" reports
any errors. (The file hole error can be ignored, as it won't cause any
problems.)
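
Once the exact "wanted" and "got" generations are known, the bit-flip
suspicion can be checked by looking at which bits differ rather than at
the subtraction alone. A minimal sketch, assuming the two numbers have
been copied out of the log by hand; the function names and the sample
values are hypothetical and not taken from the report:

```python
def flipped_bits(wanted, found):
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = wanted ^ found  # XOR leaves a 1 exactly where the bits disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def is_single_bit_flip(wanted, found):
    """True when exactly one bit differs between the two values."""
    return len(flipped_bits(wanted, found)) == 1


if __name__ == "__main__":
    # Made-up generation pairs for illustration only; not from the report.
    for wanted, found in [(1040, 1024), (1039, 1024)]:
        print(wanted, found, wanted - found,
              flipped_bits(wanted, found), is_single_bit_flip(wanted, found))
```

The subtraction by itself is not enough to judge this: the XOR is what
shows how many bits actually differ between the two values.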

Thanks,
Qu

> 
> Thanks,
> Qu


