From: "Stéphane Graber" <stgraber@ubuntu.com>
To: linux-pci@vger.kernel.org
Cc: Rob Herring <robh@kernel.org>,
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Subject: PCIe regression on APM Merlin (aarch64 dev platform) preventing NVME initialization
Date: Thu, 18 Nov 2021 13:10:09 -0500
Message-ID: <CA+enf=v9rY_xnZML01oEgKLmvY1NGBUUhnSJaETmXtDtXfaczA@mail.gmail.com>

Hello,

I've recently been given access to a set of 4 APM X-Gene2 Merlin
boards (an old-ish development platform).

Running them on Ubuntu 20.04's stock 5.4 kernel worked fine, but any
other kernel would fail to boot due to an NVMe initialization timeout
that prevented the main drive from showing up at all.

Tracking this down, I first moved to clean mainline kernels and then
narrowed the issue down to somewhere between 5.4.0 and 5.5.0-rc1, which
sadly meant the merge window (so much for a quick bisect...). I then
bisected between those two points and came up with:

6dce5aa59e0bf2430733d7a8b11c205ec10f408e (refs/bisect/bad) PCI: xgene: Use inbound resources for setup

I finally switched to the latest 5.15.2 tree, reverted that one
commit, built a new kernel and confirmed that those boards now work
flawlessly.

Unfortunately, that's about the extent of my abilities with kernel
debugging, and I won't pretend to understand what that commit does or
how it may be breaking PCIe initialization on those systems.

I'm not technically blocked on this, since I can manually build my own
kernels with that one commit reverted every time, but that's obviously
not ideal and I'd much rather have this fixed upstream :)

== Good boot on 5.15.2 (commit reverted) ==
Full log at: https://gist.github.com/stgraber/e489b7e55dd7ffaac9f77dd8634ca2ff
root@entak:~# dmesg | grep -Ei "nvme|pci"
[ 0.094146] PCI: CLS 0 bytes, default 64
[ 0.130573] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 0.131324] xgene-pcie 1f2b0000.pcie: host bridge /soc/pcie@1f2b0000 ranges:
[ 0.131344] xgene-pcie 1f2b0000.pcie: No bus range found for /soc/pcie@1f2b0000, using [bus 00-ff]
[ 0.131365] xgene-pcie 1f2b0000.pcie: IO 0xc010000000..0xc01000ffff -> 0x0000000000
[ 0.131388] xgene-pcie 1f2b0000.pcie: MEM 0xc120000000..0xc13fffffff -> 0x0020000000
[ 0.131401] xgene-pcie 1f2b0000.pcie: MEM 0xe000000000..0xffffffffff -> 0xe000000000
[ 0.131416] xgene-pcie 1f2b0000.pcie: IB MEM 0x8000000000..0x807fffffff -> 0x8000000000
[ 0.131427] xgene-pcie 1f2b0000.pcie: IB MEM 0x0000000000..0x7fffffffff -> 0x0000000000
[ 0.131510] xgene-pcie 1f2b0000.pcie: (rc) x4 gen-3 link up
[ 0.131600] xgene-pcie 1f2b0000.pcie: PCI host bridge to bus 0000:00
[ 0.131612] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.131619] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 0.131629] pci_bus 0000:00: root bus resource [mem 0xc120000000-0xc13fffffff] (bus address [0x20000000-0x3fffffff])
[ 0.131637] pci_bus 0000:00: root bus resource [mem 0xe000000000-0xffffffffff pref]
[ 0.131671] pci 0000:00:00.0: [10e8:e004] type 01 class 0x060400
[ 0.131682] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131693] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131705] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131715] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131725] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131733] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131742] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131753] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131781] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x3e may corrupt adjacent RW1C bits
[ 0.131832] pci 0000:00:00.0: supports D1 D2
[ 0.132373] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x3e may corrupt adjacent RW1C bits
[ 0.132482] pci 0000:01:00.0: [144d:a80a] type 00 class 0x010802
[ 0.132518] pci 0000:01:00.0: reg 0x10: [mem 0x40000000-0x40003fff 64bit]
[ 0.132778] pci 0000:01:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:00:00.0 (capable of 63.012 Gb/s with 16.0 GT/s PCIe x4 link)
[ 0.143064] pci 0000:00:00.0: BAR 14: assigned [mem 0xc120000000-0xc1200fffff]
[ 0.143086] pci 0000:01:00.0: BAR 0: assigned [mem 0xc120000000-0xc120003fff 64bit]
[ 0.143105] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.143114] pci 0000:00:00.0: bridge window [mem 0xc120000000-0xc1200fffff]
[ 0.143315] pcieport 0000:00:00.0: PME: Signaling with IRQ 59
[ 0.143518] pcieport 0000:00:00.0: AER: enabled with IRQ 59
[ 1.596986] ehci-pci: EHCI PCI platform driver
[ 1.611674] ohci-pci: OHCI PCI platform driver
[ 3.347499] nvme nvme0: pci function 0000:01:00.0
[ 3.347531] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[ 3.350353] nvme nvme0: Shutdown timeout set to 10 seconds
[ 3.535444] nvme nvme0: 8/0/0 default/read/poll queues
[ 3.551454] nvme0n1: p1 p2 p3 p4
[ 6.963428] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 8.415778] EXT4-fs (nvme0n1p2): re-mounted. Opts: (null). Quota mode: none.
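
Each `MEM start..end -> pci` range line in these logs describes an outbound window that maps a CPU physical range onto a PCI bus range at a fixed offset. This is why the drive's BAR 0, assigned CPU address 0xc120000000, corresponds to bus address 0x20000000, matching the `bus address [0x20000000-0x3fffffff]` annotation. The `IB MEM` lines describe the inbound windows, meaning the ranges through which device DMA reaches system memory. Here those windows are identity-mapped. The Python sketch below only replays that offset arithmetic on values copied from the log. The `Window` class and `translate()` helper are hypothetical, and the sketch makes no claim about what the xgene driver actually programs into the controller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Window:
    """One 'cpu_start..cpu_end -> pci_start' range as printed by xgene-pcie (hypothetical helper)."""
    kind: str       # "MEM" (outbound) or "IB MEM" (inbound), as labelled in dmesg
    cpu_start: int
    cpu_end: int    # inclusive, as in the log
    pci_start: int

    def cpu_to_bus(self, addr: int) -> Optional[int]:
        # Same offset for every address inside the window; None if addr is outside it.
        if self.cpu_start <= addr <= self.cpu_end:
            return addr - self.cpu_start + self.pci_start
        return None


# Values copied from the dmesg output (identical in the good and bad boots).
WINDOWS = [
    Window("MEM", 0xC120000000, 0xC13FFFFFFF, 0x0020000000),
    Window("MEM", 0xE000000000, 0xFFFFFFFFFF, 0xE000000000),
    Window("IB MEM", 0x8000000000, 0x807FFFFFFF, 0x8000000000),
    Window("IB MEM", 0x0000000000, 0x7FFFFFFFFF, 0x0000000000),
]


def translate(addr: int, kind: str = "MEM") -> Optional[int]:
    """Translate a CPU address through the first window of the given kind that contains it."""
    for window in WINDOWS:
        if window.kind == kind:
            bus = window.cpu_to_bus(addr)
            if bus is not None:
                return bus
    return None


if __name__ == "__main__":
    print(hex(translate(0xC120000000)))            # NVMe BAR 0 -> 0x20000000
    print(hex(translate(0x8000001000, "IB MEM")))  # identity-mapped -> 0x8000001000
```
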
== Bad boot on 5.15.2 (clean build, nothing reverted) ==
Full log at: https://gist.github.com/stgraber/605e8e852d8de35c6bbe64fab0f83815
root@entak:~# cat /boot/efi/dmesg | grep -Ei "nvme|pci"
[ 0.094130] PCI: CLS 0 bytes, default 64
[ 0.130822] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 0.131556] xgene-pcie 1f2b0000.pcie: host bridge /soc/pcie@1f2b0000 ranges:
[ 0.131576] xgene-pcie 1f2b0000.pcie: No bus range found for /soc/pcie@1f2b0000, using [bus 00-ff]
[ 0.131596] xgene-pcie 1f2b0000.pcie: IO 0xc010000000..0xc01000ffff -> 0x0000000000
[ 0.131618] xgene-pcie 1f2b0000.pcie: MEM 0xc120000000..0xc13fffffff -> 0x0020000000
[ 0.131630] xgene-pcie 1f2b0000.pcie: MEM 0xe000000000..0xffffffffff -> 0xe000000000
[ 0.131646] xgene-pcie 1f2b0000.pcie: IB MEM 0x8000000000..0x807fffffff -> 0x8000000000
[ 0.131659] xgene-pcie 1f2b0000.pcie: IB MEM 0x0000000000..0x7fffffffff -> 0x0000000000
[ 0.131729] xgene-pcie 1f2b0000.pcie: (rc) x4 gen-3 link up
[ 0.131816] xgene-pcie 1f2b0000.pcie: PCI host bridge to bus 0000:00
[ 0.131827] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.131834] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 0.131844] pci_bus 0000:00: root bus resource [mem 0xc120000000-0xc13fffffff] (bus address [0x20000000-0x3fffffff])
[ 0.131852] pci_bus 0000:00: root bus resource [mem 0xe000000000-0xffffffffff pref]
[ 0.131886] pci 0000:00:00.0: [10e8:e004] type 01 class 0x060400
[ 0.131897] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131908] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131919] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131929] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131938] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131946] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131955] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131966] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 0.131994] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x3e may corrupt adjacent RW1C bits
[ 0.132044] pci 0000:00:00.0: supports D1 D2
[ 0.132590] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x3e may corrupt adjacent RW1C bits
[ 0.132700] pci 0000:01:00.0: [144d:a80a] type 00 class 0x010802
[ 0.132735] pci 0000:01:00.0: reg 0x10: [mem 0x40000000-0x40003fff 64bit]
[ 0.132996] pci 0000:01:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:00:00.0 (capable of 63.012 Gb/s with 16.0 GT/s PCIe x4 link)
[ 0.143038] pci 0000:00:00.0: BAR 14: assigned [mem 0xc120000000-0xc1200fffff]
[ 0.143059] pci 0000:01:00.0: BAR 0: assigned [mem 0xc120000000-0xc120003fff 64bit]
[ 0.143079] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.143087] pci 0000:00:00.0: bridge window [mem 0xc120000000-0xc1200fffff]
[ 0.143286] pcieport 0000:00:00.0: PME: Signaling with IRQ 59
[ 0.143474] pcieport 0000:00:00.0: AER: enabled with IRQ 59
[ 1.598863] ehci-pci: EHCI PCI platform driver
[ 1.613544] ohci-pci: OHCI PCI platform driver
[ 3.280872] nvme nvme0: pci function 0000:01:00.0
[ 3.280929] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[ 7.393328] pcieport 0000:00:00.0: AER: Corrected error received: 0000:01:00.0
[ 7.400550] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 7.409733] nvme 0000:01:00.0: device [144d:a80a] error status/mask=00000001/0000e000
[ 7.417703] nvme 0000:01:00.0: [ 0] RxErr
[ 7.423434] pci_generic_config_write32: 28 callbacks suppressed
[ 7.423439] pci_bus 0000:01: 2-byte config write to 0000:01:00.0 offset 0x7a may corrupt adjacent RW1C bits
[ 11.524622] pcieport 0000:00:00.0: AER: Corrected error received: 0000:01:00.0
[ 11.531828] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 11.541008] nvme 0000:01:00.0: device [144d:a80a] error status/mask=00000001/0000e000
[ 11.548978] nvme 0000:01:00.0: [ 0] RxErr
[ 11.554707] pci_bus 0000:01: 2-byte config write to 0000:01:00.0 offset 0x7a may corrupt adjacent RW1C bits
[ 64.046090] pcieport 0000:00:00.0: AER: Corrected error received: 0000:01:00.0
[ 64.053295] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[ 64.062475] nvme 0000:01:00.0: device [144d:a80a] error status/mask=00000001/0000e000
[ 64.070446] nvme 0000:01:00.0: [ 0] RxErr
[ 64.076175] pci_bus 0000:01: 2-byte config write to 0000:01:00.0 offset 0x7a may corrupt adjacent RW1C bits
[ 64.478625] nvme nvme0: I/O 16 QID 0 timeout, disable controller
[ 64.590606] nvme nvme0: Device shutdown incomplete; abort shutdown
[ 64.610619] pci_bus 0000:01: 2-byte config write to 0000:01:00.0 offset 0xb2 may corrupt adjacent RW1C bits
[ 64.620324] pci_bus 0000:01: 2-byte config write to 0000:01:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 64.629984] pci_bus 0000:01: 2-byte config write to 0000:01:00.0 offset 0x78 may corrupt adjacent RW1C bits
[ 64.639694] pci_bus 0000:01: 2-byte config write to 0000:01:00.0 offset 0x4 may corrupt adjacent RW1C bits
[ 64.649330] nvme nvme0: Identify Controller failed (-4)
[ 64.654541] nvme nvme0: Removing after probe failure status: -5
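
Every AER message in the bad boot carries the same `status/mask=00000001/0000e000` pair. In the PCIe AER capability, these are the device's Correctable Error Status and Mask registers. Only bit 0 (Receiver Error, printed as `RxErr`) is set, and bits 13-15 are masked. The Python sketch below shows one way to decode such a pair. The short names mirror the labels Linux's AER driver prints, such as the `[ 0] RxErr` lines above. The helper names are hypothetical, and reserved bits are shown as `bitN`. The sketch also assumes that only unmasked status bits are worth reporting.

```python
from typing import Dict, List

# Correctable Error Status/Mask register bits (PCIe AER capability), labelled with
# the short names Linux's AER driver prints. Bits not listed here are reserved.
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def bit_names(value: int) -> List[str]:
    """Names of all set bits in a 32-bit register value, lowest bit first."""
    return [CORRECTABLE_BITS.get(i, f"bit{i}") for i in range(32) if (value >> i) & 1]


def decode(field: str) -> Dict[str, List[str]]:
    """Decode a 'status/mask' pair as printed in dmesg, e.g. '00000001/0000e000'."""
    status_hex, mask_hex = field.split("/")
    status, mask = int(status_hex, 16), int(mask_hex, 16)
    return {
        "reported": bit_names(status & ~mask),  # set in status and not masked
        "masked": bit_names(mask),
    }


if __name__ == "__main__":
    print(decode("00000001/0000e000"))
    # {'reported': ['RxErr'], 'masked': ['NonFatalErr', 'CorrIntErr', 'HeaderOF']}
```
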
Thanks!
Stéphane