From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Olof Johansson <olof@lixom.net>,
	Jon Nettleton <jon@solid-run.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"mark.rutland@arm.com" <mark.rutland@arm.com>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	"arnd@arndb.de" <arnd@arndb.de>,
	"m.karthikeyan@mobiveil.co.in" <m.karthikeyan@mobiveil.co.in>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"Z.q. Hou" <zhiqiang.hou@nxp.com>,
	"l.subrahmanya@mobiveil.co.in" <l.subrahmanya@mobiveil.co.in>,
	"will.deacon@arm.com" <will.deacon@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Leo Li <leoyang.li@nxp.com>, "M.h. Lian" <minghuan.lian@nxp.com>,
	Xiaowei Bao <xiaowei.bao@nxp.com>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"andrew.murray@arm.com" <andrew.murray@arm.com>,
	"shawnguo@kernel.org" <shawnguo@kernel.org>,
	Mingkai Hu <mingkai.hu@nxp.com>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCHv9 00/12] PCI: Recode Mobiveil driver and add PCIe Gen4 driver for NXP Layerscape SoCs
Date: Sat, 29 Feb 2020 17:03:28 +0000
Message-ID: <20200229170328.GD25745@shell.armlinux.org.uk>
In-Reply-To: <20200229151907.GA7378@mit.edu>

On Sat, Feb 29, 2020 at 10:19:07AM -0500, Theodore Y. Ts'o wrote:
> On Sat, Feb 29, 2020 at 11:04:56AM +0000, Russell King - ARM Linux admin wrote:
> > Could it be a race condition, or some problem that's specific to the
> > ARM64 kernel that's provoking this corruption?
> 
> Since I got brought in mid-way through this discussion, can someone
> summarize the vital details of the bughunt?  What kernel version is
> involved, and is this a regression?  If so, what's the last version of
> the kernel where you didn't have a problem on this hardware?

It's a new platform.  I've run most 5.x kernels on it, but have only
recently had an NVMe drive.  Currently running a 5.5-based kernel (for
which I have to patch in support for the platform), and I've no idea if
it is a regression or not.

> Can you trigger this failure reliably?

No - the very first time I ended up with a corrupted ext4 fs was on the
8th February, and at that time it was put down to the NVMe not being
power-off safe: the machine had crashed sometime overnight, resulting
in a section of my network going offline (due to a pause frame storm).
So, I powered it down from its crashed state - and from what people tell
me, NVMe _may_ keep blocks unwritten to safe media for a considerable time.

I never bothered to investigate it because the explanation seemed
reasonable, and manually running e2fsck fixed the filesystem.

The system was then booted back into using the NVMe rootfs, and
continued to run without apparent issue until the 21st Feb, when I
cleanly shut it down and powered it off.  During the time it was
running, it likely saw many reboots of the 5.5 kernel.

I powered it back on yesterday morning, and this morning it found the
fs corruption while trying to do a logrotate.

As I said in my last email, I suspect it isn't an ext4 bug, but either
a locking implementation issue, coherency issue, or interconnect issue.
The 4k block with the affected inode looks perfectly reasonable with
the only exception that the checksum is incorrect for that one inode -
and other inodes stored in the same 4k block were modified afterwards.
It suggests to me that the writes to update the two 16-bit words
containing the checksum were somehow lost for this particular inode.
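
To illustrate the failure mode described above, here is a minimal
sketch of a 32-bit checksum stored as two 16-bit words, and of what
verification sees when the body is updated but the two checksum writes
never land.  Everything in it is an assumption made for illustration:
zlib.crc32 is only a stand-in for the checksum ext4 actually computes,
and the function names and byte strings are hypothetical, not taken
from the kernel or from the affected filesystem.

```python
import zlib


def checksum32(body: bytes) -> int:
    # Stand-in checksum: plain CRC-32 from the standard library.
    # This is NOT the algorithm or seed ext4 uses; it only gives us
    # a 32-bit value to split.
    return zlib.crc32(body) & 0xFFFFFFFF


def split_checksum(csum: int) -> tuple[int, int]:
    # Store a 32-bit checksum as two 16-bit words (low, high).
    return csum & 0xFFFF, (csum >> 16) & 0xFFFF


def join_checksum(lo: int, hi: int) -> int:
    # Rebuild the 32-bit checksum from its two 16-bit words.
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def verify(body: bytes, lo: int, hi: int) -> bool:
    # True when the stored words match the checksum of the body.
    return join_checksum(lo, hi) == checksum32(body)


# Hypothetical inode contents: same length, one byte differs.
old_body = b"inode mtime=1"
new_body = b"inode mtime=2"

# Checksum words as they were written for the old contents.
stored_lo, stored_hi = split_checksum(checksum32(old_body))

# Normal update: body and both checksum words are rewritten.
fresh_lo, fresh_hi = split_checksum(checksum32(new_body))
ok_after_full_update = verify(new_body, fresh_lo, fresh_hi)

# Lost update: body changed, but the two 16-bit writes went missing,
# so the stale words are still in place.
ok_after_lost_writes = verify(new_body, stored_lo, stored_hi)

print("full update verifies:", ok_after_full_update)
print("lost checksum writes verify:", ok_after_lost_writes)
```

In this sketch the inode body itself looks perfectly reasonable in both
cases; only the comparison against the stale pair of 16-bit words
fails, which matches the symptom described above.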

> Unfortunately, while I'm regularly running xfstests on x86_64 on a
> Google Compute Engine VM, I'm not doing any runs on arm64.  I can
> certainly build an arm-64.
> 
> There's a test-appliance designed to be run on ARM64 here[1].
> 
> [1] https://kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/xfstests-amd64.tar.xz

The filename seems to say "amd64" not "arm64" ?

> which is a Debian chroot, designed to be run via android-xfstests[2], but
> if you unpack it, it should be possible to enter the chroot and
> trigger the xfstests run manually on any arm64 system.
> 
> [2] https://thunk.org/android-xfstests
> 
> Does anyone know if kernel CI is running xfstests regularly?

I don't know...

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


