From: Will Deacon <will@kernel.org>
To: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Mark Rutland <mark.rutland@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org
Subject: Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues
Date: Wed, 6 Jan 2021 17:20:34 +0000
Message-ID: <20210106172033.GA2165@willie-the-truck>
In-Reply-To: <20210106135253.GJ1551@shell.armlinux.org.uk>

On Wed, Jan 06, 2021 at 01:52:53PM +0000, Russell King - ARM Linux admin wrote:
> On Wed, Jan 06, 2021 at 11:53:59AM +0000, Mark Rutland wrote:
> > ... and are you using defconfig or something else?
> 
> Not sure I replied to this. I'm not using the defconfig; I have my own
> .config.
> 
> As I mentioned, Will has built a 5.10 kernel using Arnd's gcc 4.9.4
> and hasn't been able to reproduce it. He's sent me his kernel, which
> I've booted here, and haven't yet been able to provoke it.
> 
> Meanwhile, my 5.9 kernel continues to exhibit this problem, so I've
> sent Will my .config (which I'll include here). There are differences
> in some of the block layer configuration. There are also differences
> in the errata configuration, but we don't think they're the cause
> (those errata aren't relevant for Cortex-A72).
> 
> Our plan is:
> - Will is switching to 5.9, and using my config as a base for his
>   platform.
> - Will is going to send me his modified version of my config.
> - We are both going to build using the same kernel sources and same
>   config.
> - We are going to test our own kernels, and also swap kernel images
>   and test each other's.
> 
> Watch this space for more news...

I've managed to reproduce the corruption on my AMD Seattle board (8x A57).
I haven't had a chance to dig deeper yet, but here's the recipe which works
for me:

1. I'm using GCC 4.9.4 simply to try to get as close as I can to rmk's
   setup. I don't know if this is necessary or not, but the toolchain is
   here:

   https://kernel.org/pub/tools/crosstool/files/bin/arm64/4.9.4/arm64-gcc-4.9.4-nolibc-aarch64-linux-gnu.tar.xz

   and I needed to pull down an old libmpfr to get cc1 to work:

   http://ports.ubuntu.com/pool/main/m/mpfr4/libmpfr4_3.1.2-1_arm64.deb

2. I build a 5.9 kernel with the config here:

   https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/bugs/rmk/config-5.9.0

   and the resulting Image is here:

   https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/bugs/rmk/Image-5.9.0

3. Using that kernel, I boot into a 64-bit Debian 10 filesystem and open a
   couple of terminals over SSH.

4. In one terminal, I run:

   $ while (true); do find /var /usr /bin /sbin -type f -print0 | xargs -0 \
       md5sum > /dev/null; echo 2 | sudo tee /proc/sys/vm/drop_caches; done

   (note that sudo will prompt you for a password on the first iteration)

5. In the other terminal, I run:

   $ while (true); do ./hackbench ; sleep 1; done

   where hackbench is built from:

   https://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c

   and compiled according to the comment in the source code.
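
The two endless loops in steps 4 and 5 can also be written as bounded
functions, which makes them easier to script and to stop. This is a
minimal sketch, not the exact commands used above: the function names and
the DROP_CACHES override are hypothetical, and GNU find/xargs are assumed
for -print0/-0.

```shell
#!/bin/sh
# Sketch of the load generators from steps 4 and 5 as bounded functions.
# read_and_drop, run_repeatedly and DROP_CACHES are hypothetical names;
# GNU find/xargs are assumed for -print0 / -0.

# Path to the sysctl file; override it to dry-run the loop without root.
DROP_CACHES=${DROP_CACHES:-/proc/sys/vm/drop_caches}

# read_and_drop ITERATIONS DIR...
# Reads every regular file under DIR..., then writes 2 to drop_caches,
# which frees reclaimable slab objects (dentries and inodes), so the next
# pass has to look the inodes up, and re-verify their checksums, again.
read_and_drop() {
    iters=$1; shift
    i=0
    while [ "$i" -lt "$iters" ]; do
        find "$@" -type f -print0 | xargs -0 md5sum > /dev/null
        if [ -w "$DROP_CACHES" ]; then
            # Running as root (or dry-running against a plain file).
            echo 2 > "$DROP_CACHES"
        else
            echo 2 | sudo tee "$DROP_CACHES" > /dev/null
        fi
        i=$((i + 1))
    done
}

# run_repeatedly COUNT PAUSE CMD [ARGS...]
# Runs CMD COUNT times, sleeping PAUSE seconds after each run, and prints
# how many runs exited with a non-zero status.
run_repeatedly() {
    count=$1; pause=$2; shift 2
    n=0; failures=0
    while [ "$n" -lt "$count" ]; do
        "$@" || failures=$((failures + 1))
        sleep "$pause"
        n=$((n + 1))
    done
    echo "failures: $failures"
}
```

For example, "read_and_drop 100000 /var /usr /bin /sbin" in one terminal
and "run_repeatedly 100000 1 ./hackbench" in the other approximate the
original loops while still terminating on their own.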

With that, I see the following after ten seconds or so:

  EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm md5sum: iget: checksum invalid
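
To spot the failure without watching the console, the kernel log can be
filtered for this message. A minimal sketch, assuming the message format
shown above; ext4_csum_inode is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log lines on stdin and prints the
# inode number from each ext4 "checksum invalid" error, one per line.
# Assumes the message format quoted in this report.
ext4_csum_inode() {
    sed -n 's/.*EXT4-fs error.*inode #\([0-9][0-9]*\):.*checksum invalid.*/\1/p'
}
```

For example, "dmesg --follow | ext4_csum_inode" (util-linux dmesg) prints
each affected inode number as the error is reported.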

Russell, Mark -- does this recipe explode reliably for you too?

Will
