From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
To: Olof Johansson <olof@lixom.net>
Cc: "mark.rutland@arm.com" <mark.rutland@arm.com>,
    "devicetree@vger.kernel.org" <devicetree@vger.kernel.org>,
    Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
    "arnd@arndb.de" <arnd@arndb.de>,
    "m.karthikeyan@mobiveil.co.in" <m.karthikeyan@mobiveil.co.in>,
    "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
    "Z.q. Hou" <zhiqiang.hou@nxp.com>,
    "l.subrahmanya@mobiveil.co.in" <l.subrahmanya@mobiveil.co.in>,
    "will.deacon@arm.com" <will.deacon@arm.com>,
    "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
    Leo Li <leoyang.li@nxp.com>,
    "M.h. Lian" <minghuan.lian@nxp.com>,
    "robh+dt@kernel.org" <robh+dt@kernel.org>,
    Xiaowei Bao <xiaowei.bao@nxp.com>,
    "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
    "bhelgaas@google.com" <bhelgaas@google.com>,
    "andrew.murray@arm.com" <andrew.murray@arm.com>,
    "shawnguo@kernel.org" <shawnguo@kernel.org>,
    Mingkai Hu <mingkai.hu@nxp.com>,
    "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCHv9 00/12] PCI: Recode Mobiveil driver and add PCIe Gen4 driver for NXP Layerscape SoCs
Date: Mon, 10 Feb 2020 16:15:53 +0000
Message-ID: <20200210161553.GE25745@shell.armlinux.org.uk>
In-Reply-To: <CAOesGMj6B-X1s8-mYqS0N6GJXdKka1MxaNV=33D1H++h7bmXrA@mail.gmail.com>

On Mon, Feb 10, 2020 at 04:28:23PM +0100, Olof Johansson wrote:
> On Mon, Feb 10, 2020 at 4:23 PM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Mon, Feb 10, 2020 at 04:12:30PM +0100, Olof Johansson wrote:
> > > On Thu, Feb 6, 2020 at 11:57 AM Z.q. Hou <zhiqiang.hou@nxp.com> wrote:
> > > >
> > > > Hi Olof,
> > > >
> > > > Thanks a lot for your comments!
> > > > And sorry for my delay respond!
> > >
> > > Actually, they apply with only minor conflicts on top of current -next.
> > >
> > > Bjorn, any chance we can get you to pick these up pretty soon? They
> > > enable full use of a promising ARM developer system, the SolidRun
> > > HoneyComb, and would be quite valuable for me and others to be able to
> > > use with mainline or -next without any additional patches applied --
> > > which this patchset achieves.
> > >
> > > I know there are pending revisions based on feedback. I'll leave it up
> > > to you and others to determine if that can be done with incremental
> > > patches on top, or if it should be fixed before the initial patchset
> > > is applied. But all in all, it's holding up adaption by me and surely
> > > others of a very interesting platform -- I'm looking to replace my
> > > aging MacchiatoBin with one of these and would need PCIe/NVMe to work
> > > before I do.
> >
> > If you're going to be using NVMe, make sure you use a power-fail safe
> > version; I've already had one instance where ext4 failed to mount
> > because of a corrupted journal using an XPG SX8200 after the Honeycomb
> > Serror'd, and then I powered it down after a few hours before later
> > booting it back up.
> >
> > EXT4-fs (nvme0n1p2): INFO: recovery required on readonly filesystem
> > EXT4-fs (nvme0n1p2): write access will be enabled during recovery
> > JBD2: journal transaction 80849 on nvme0n1p2-8 is corrupt.
> > EXT4-fs (nvme0n1p2): error loading journal
>
> Hmm, using btrfs on mine, not sure if the exposure is similar or not.

As I understand the problem, it isn't a filesystem issue. It's a data
integrity issue with NVMe drives across a power failure: how they cache
the data, and how they ultimately write it to the NAND flash. Have a
read of:

https://www.kingston.com/en/solutions/servers-data-centers/ssd-power-loss-protection

NVMe drives and other SSDs are basically the same underlying technology
(only the host interface differs). Given the issues I've heard about,
and have now experienced with my own NVMe drive, I think the above is a
good pointer to the problems of flash mass storage.

As I understand it, the problem occurs when the drive's mapping table
has not been written back to flash, power is lost without the Standby
Immediate command having been sent, and the firmware has no way to
quickly save the table. On the next power-up, the firmware has to
reconstruct the mapping table, and depending on how that is done,
incorrect (old?) data may be returned for some blocks. That can happen
to any block on the drive, which means any data is at risk from a power
loss event, whether that is an outright power failure or a power-off
after a crash.

> Do you know if the SErr was due to a known issue and/or if it's
> something that's fixed in production silicon?

The SError is triggered by something on the PCIe side of things; if I
leave the Mellanox PCIe card out, I don't get them. The errata patches
I have merged into my tree help a bit: with the card plugged in, they
take the system from being unable to boot without an SError to booting
and lasting a while. The SErrors still come eventually, though, maybe
after a few days... and that's without the Mellanox Ethernet interface
even being up.

> (I still can't enable SMMU since across a warm reboot it fails
> *completely*, with nothing coming up and working. NXP folks, you
> listening? :)

Is it just a warm reboot? I thought I saw SMMU activity on a cold boot
as well, implying that there were devices active that Linux did not
know about.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
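To make the mapping-table failure mode described in this message more concrete, here is a toy model in Python. It is purely illustrative and rests on loudly simplified assumptions: writes go out-of-place to fresh flash pages, the logical-to-physical table lives in controller RAM, and the (naive) recovery path just reloads the last saved copy of the table. No real drive's firmware is described here; real firmware may rebuild the table differently (for example by scanning per-page metadata). The class `ToyFTL` and its methods are hypothetical names.

```python
"""Toy model (hypothetical) of an SSD flash translation layer losing its mapping table."""

NUM_LBAS = 4  # logical blocks visible to the host


class ToyFTL:
    def __init__(self):
        self.flash = []      # physical pages; append-only, i.e. out-of-place writes
        self.l2p_ram = {}    # logical-to-physical map held in controller RAM
        self.l2p_flash = {}  # last copy of the map saved to flash

    def write(self, lba, value):
        # Flash pages are not overwritten in place: the data lands on a fresh
        # page and only the RAM copy of the mapping table is updated.
        self.flash.append(value)
        self.l2p_ram[lba] = len(self.flash) - 1

    def read(self, lba):
        return self.flash[self.l2p_ram[lba]]

    def save_table(self):
        # Assumed to happen on an orderly shutdown, e.g. after Standby Immediate.
        self.l2p_flash = dict(self.l2p_ram)

    def power_loss_and_recover(self):
        # RAM contents are gone; this naive recovery reloads the last saved table.
        self.l2p_ram = dict(self.l2p_flash)


ftl = ToyFTL()
for lba in range(NUM_LBAS):
    ftl.write(lba, f"v1-{lba}")   # first version of every block
ftl.save_table()                  # a clean shutdown at some earlier point

ftl.write(2, "v2-2")              # host updates block 2 after the last save
print("before power loss:", ftl.read(2))
ftl.power_loss_and_recover()      # power lost without saving the table
print("after recovery:", ftl.read(2))
```

Running it prints `v2-2` before the simulated power loss and `v1-2` afterwards. The new data is still physically on flash, but the recovered table points at the old page, which is the "incorrect (old?) data" case: nothing in the filesystem went wrong, yet a block silently reverted.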