From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
To: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
linux-ext4@vger.kernel.org
Subject: Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues
Date: Wed, 6 Jan 2021 17:46:05 +0000
Message-ID: <20210106174605.GL1551@shell.armlinux.org.uk>
In-Reply-To: <20210106172033.GA2165@willie-the-truck>
On Wed, Jan 06, 2021 at 05:20:34PM +0000, Will Deacon wrote:
> I've managed to reproduce the corruption on my AMD Seattle board (8x A57).
> I haven't had a chance to dig deeper yet, but here's the recipe which works
> for me:
>
> 1. I'm using GCC 4.9.4 simply to try to get as close as I can to rmk's
> setup. I don't know if this is necessary or not, but the toolchain is
> here:
>
> https://kernel.org/pub/tools/crosstool/files/bin/arm64/4.9.4/arm64-gcc-4.9.4-nolibc-aarch64-linux-gnu.tar.xz
>
> and I needed to pull down an old libmpfr to get cc1 to work:
>
> http://ports.ubuntu.com/pool/main/m/mpfr4/libmpfr4_3.1.2-1_arm64.deb
>
> 2. I build a 5.9 kernel with the config here:
>
> https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/bugs/rmk/config-5.9.0
>
> and the resulting Image is here:
>
> https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/bugs/rmk/Image-5.9.0
>
> 3. Using that kernel, I boot into a 64-bit Debian 10 filesystem and open a
> couple of terminals over SSH.
>
> 4. In one terminal, I run:
>
> $ while (true); do find /var /usr /bin /sbin -type f -print0 | xargs -0 md5sum > /dev/null; echo 2 | sudo tee /proc/sys/vm/drop_caches; done
>
> (note that sudo will prompt you for a password on the first iteration)
>
> 5. In the other terminal, I run:
>
> $ while (true); do ./hackbench ; sleep 1; done
>
> where hackbench is built from:
>
> https://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
>
> and compiled according to the comment in the source code.
>
> With that, I see the following after ten seconds or so:
>
> EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm md5sum: iget: checksum invalid
>
> Russell, Mark -- does this recipe explode reliably for you too?

It took a couple of iterations of the find loop (step 4) here on a kernel
where I'd dropped BLK_WBT=y from my .config, whereas I wasn't able
to provoke it on that kernel before. So running hackbench in parallel
seems to increase the probability.

I rebooted, set it going again, and on the first iteration it exploded
with an ext4 inode checksum failure. And again on the following reboot.
So yes, it looks like you've found a way to reproduce it more reliably.
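
When leaving the recipe above running unattended, it may help to have
something flag the first failure rather than watching the console. This
is a minimal sketch, assuming GNU grep (for -m) and a util-linux dmesg
that supports --follow; the function name ext4_csum_failure is
hypothetical, and the pattern is derived from the error line in Will's
report, not from the ext4 source.

```shell
#!/bin/sh
# Hypothetical helper: print the first ext4 inode checksum failure read
# from stdin, then exit. The pattern mirrors the message quoted above:
#   EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm md5sum: iget: checksum invalid
# It is unanchored, so a dmesg timestamp prefix such as "[ 12.3] " is fine.
ext4_csum_failure() {
    # -m 1: stop after the first matching line; -E: extended regex.
    # Exit status is 0 if a failure was seen, 1 otherwise.
    grep -m 1 -E 'EXT4-fs error \(device [^)]+\): .*inode #[0-9]+: .*checksum invalid'
}

# Example usage (needs permission to read the kernel log):
#   dmesg --follow | ext4_csum_failure
```

Stopping at the first match keeps the output to the one line that
matters, which can then be correlated with whichever iteration of the
find loop was running at the time.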
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!