linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rob Herring <robh@kernel.org>
To: "Stéphane Graber" <stgraber@ubuntu.com>
Cc: PCI <linux-pci@vger.kernel.org>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Subject: Re: PCIe regression on APM Merlin (aarch64 dev platform) preventing NVME initialization
Date: Thu, 18 Nov 2021 15:20:55 -0600	[thread overview]
Message-ID: <CAL_JsqL8ofiJv4gyP+rnDEKxjd5+ANL3XwhxaSF13aznvGpE9g@mail.gmail.com> (raw)
In-Reply-To: <CA+enf=v9rY_xnZML01oEgKLmvY1NGBUUhnSJaETmXtDtXfaczA@mail.gmail.com>

On Thu, Nov 18, 2021 at 12:10 PM Stéphane Graber <stgraber@ubuntu.com> wrote:
>
> Hello,
>
> I've recently been given access to a set of 4 APM X-Gene2 Merlin
> boards (old-ish development platform).
> Running them on Ubuntu 20.04's stock 5.4 kernel worked fine but trying
> to run anything else would fail to boot due to a NVME initialization
> timeout preventing the main drive from showing up at all.
>
> Tracking this issue, I first moved to clean mainline kernels and then
> isolated the issue to be somewhere between 5.4.0 and 5.5.0-rc1, which
> sadly meant the merge window (so much for a quick bisect...). I've
> then bisected between those two points and came up with:
>
>   6dce5aa59e0bf2430733d7a8b11c205ec10f408e (refs/bisect/bad) PCI:
> xgene: Use inbound resources for setup
>
> I finally switched to the latest 5.15.2 tree, reverted that one
> commit, built a new kernel and confirmed that those boards now work
> flawlessly.
>
> Unfortunately that's about the extent of my abilities with kernel
> debugging and I won't pretend to understand what that commit does or
> how it may be breaking PCIe initialization on those systems.
>
> I'm not technically blocked on this, I can manually build my own
> kernels by reverting that one commit every time, but that's obviously
> not ideal and I'd much rather have this fixed upstream :)

Doesn't this platform have ACPI f/w you can use? From the log, it
looks like ACPI tables are passed to the kernel, but since a full DT
is passed it is used by default. Does 'acpi=on' not work?

Given no one noticed the breakage for 2 years, I'd really like to
remove these dts and binding files otherwise someone needs to convert
bindings to schema and fix warnings. Current stats look like this:
Processing apm:
warnings: 240
undocumented compat: 114

For example, I noticed that dma-ranges declares the entries are 32-bit
(0x42000000 is 32-bit prefetch), yet the PCI bus address and sizes are
>32-bit. AFAICT, that isn't part of the problem here.

> == Good boot on 5.15.2 (commit reverted) ==
> Full log at: https://gist.github.com/stgraber/e489b7e55dd7ffaac9f77dd8634ca2ff
>
> root@entak:~# dmesg | grep -Ei "nvme|pci"
> [    0.094146] PCI: CLS 0 bytes, default 64
> [    0.130573] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> [    0.131324] xgene-pcie 1f2b0000.pcie: host bridge /soc/pcie@1f2b0000 ranges:
> [    0.131344] xgene-pcie 1f2b0000.pcie:   No bus range found for
> /soc/pcie@1f2b0000, using [bus 00-ff]
> [    0.131365] xgene-pcie 1f2b0000.pcie:       IO
> 0xc010000000..0xc01000ffff -> 0x0000000000
> [    0.131388] xgene-pcie 1f2b0000.pcie:      MEM
> 0xc120000000..0xc13fffffff -> 0x0020000000
> [    0.131401] xgene-pcie 1f2b0000.pcie:      MEM
> 0xe000000000..0xffffffffff -> 0xe000000000
> [    0.131416] xgene-pcie 1f2b0000.pcie:   IB MEM
> 0x8000000000..0x807fffffff -> 0x8000000000
> [    0.131427] xgene-pcie 1f2b0000.pcie:   IB MEM
> 0x0000000000..0x7fffffffff -> 0x0000000000

My best guess is while the above is the parsed order of 'IB MEM'
regions, we sort the entries by address now and that changes which
inbound registers get used for each region. And one doesn't handle >
32-bit addresses. Can you try out this change? It's not what I want
for a final change because the code is just as fragile:

diff --git a/drivers/pci/controller/pci-xgene.c
b/drivers/pci/controller/pci-xgene.c
index 56d0d50338c8..18f05b65439e 100644
--- a/drivers/pci/controller/pci-xgene.c
+++ b/drivers/pci/controller/pci-xgene.c
@@ -465,16 +465,16 @@ static int xgene_pcie_select_ib_reg(u8
*ib_reg_mask, u64 size)
                return 1;
        }

-       if ((size > SZ_1K) && (size < SZ_1T) && !(*ib_reg_mask & (1 << 0))) {
-               *ib_reg_mask |= (1 << 0);
-               return 0;
-       }
-
        if ((size > SZ_1M) && (size < SZ_1T) && !(*ib_reg_mask & (1 << 2))) {
                *ib_reg_mask |= (1 << 2);
                return 2;
        }

+       if ((size > SZ_1K) && (size < SZ_1T) && !(*ib_reg_mask & (1 << 0))) {
+               *ib_reg_mask |= (1 << 0);
+               return 0;
+       }
+
        return -EINVAL;
 }

  reply	other threads:[~2021-11-18 21:21 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-18 18:10 Stéphane Graber
2021-11-18 21:20 ` Rob Herring [this message]
2021-11-18 22:03   ` Rob Herring
2021-11-19  4:43     ` Stéphane Graber
2021-11-21  9:43 ` Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL_JsqL8ofiJv4gyP+rnDEKxjd5+ANL3XwhxaSF13aznvGpE9g@mail.gmail.com \
    --to=robh@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=stgraber@ubuntu.com \
    --subject='Re: PCIe regression on APM Merlin (aarch64 dev platform) preventing NVME initialization' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).