From: "Stéphane Graber" <stgraber@ubuntu.com>
To: Rob Herring <robh@kernel.org>
Cc: PCI <linux-pci@vger.kernel.org>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Subject: Re: PCIe regression on APM Merlin (aarch64 dev platform) preventing NVME initialization
Date: Thu, 18 Nov 2021 23:43:54 -0500
Message-ID: <CA+enf=uZex3hC+HxahV25cSFyp9Hz7bLC-h=PnUKEUydDh1Tmw@mail.gmail.com>
In-Reply-To: <CAL_JsqKrfpDtQZMMuhA_tURit6fO82FzPbKA40o6_8jWRewm8g@mail.gmail.com>

On Thu, Nov 18, 2021 at 5:03 PM Rob Herring <robh@kernel.org> wrote:
>
> On Thu, Nov 18, 2021 at 3:20 PM Rob Herring <robh@kernel.org> wrote:
> >
> > On Thu, Nov 18, 2021 at 12:10 PM Stéphane Graber <stgraber@ubuntu.com> wrote:
> > >
> > > Hello,
> > >
> > > I've recently been given access to a set of 4 APM X-Gene2 Merlin
> > > boards (old-ish development platform).
> > > Running them on Ubuntu 20.04's stock 5.4 kernel worked fine but trying
> > > to run anything else would fail to boot due to a NVME initialization
> > > timeout preventing the main drive from showing up at all.
> > >
> > > Tracking this issue, I first moved to clean mainline kernels and then
> > > isolated the issue to be somewhere between 5.4.0 and 5.5.0-rc1, which
> > > sadly meant the merge window (so much for a quick bisect...). I've
> > > then bisected between those two points and came up with:
> > >
> > >   6dce5aa59e0bf2430733d7a8b11c205ec10f408e (refs/bisect/bad) PCI: xgene: Use inbound resources for setup
> > >
> > > I finally switched to the latest 5.15.2 tree, reverted that one
> > > commit, built a new kernel and confirmed that those boards now work
> > > flawlessly.
> > >
> > > Unfortunately that's about the extent of my abilities with kernel
> > > debugging and I won't pretend to understand what that commit does or
> > > how it may be breaking PCIe initialization on those systems.
> > >
> > > I'm not technically blocked on this, I can manually build my own
> > > kernels by reverting that one commit every time, but that's obviously
> > > not ideal and I'd much rather have this fixed upstream :)
> >
> > Doesn't this platform have ACPI f/w you can use? From the log, it
> > looks like ACPI tables are passed to the kernel, but since a full DT
> > is passed it is used by default. Does 'acpi=on' not work?

I gave that a try with a clean 5.15.2 and unfortunately it doesn't boot
at all. All I get is:

Loading Linux 5.15.2 ...
Loading initial ramdisk ...
EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable
EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 64k boundary
EFI stub: ERROR: FIRMWARE BUG: Image BSS overlaps adjacent EFI memory region
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
L3C: 8MB

> > Given no one noticed the breakage for 2 years, I'd really like to
> > remove these dts and binding files otherwise someone needs to convert
> > bindings to schema and fix warnings. Current stats look like this:
> > Processing apm:
> > warnings: 240
> > undocumented compat: 114
> >
> > For example, I noticed that dma-ranges declares the entries are 32-bit
> > (0x42000000 is 32-bit prefetch), yet the PCI bus address and sizes
> > are >32-bit. AFAICT, that isn't part of the problem here.
> >
> > > == Good boot on 5.15.2 (commit reverted) ==
> > > Full log at: https://gist.github.com/stgraber/e489b7e55dd7ffaac9f77dd8634ca2ff
> > >
> > > root@entak:~# dmesg | grep -Ei "nvme|pci"
> > > [    0.094146] PCI: CLS 0 bytes, default 64
> > > [    0.130573] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> > > [    0.131324] xgene-pcie 1f2b0000.pcie: host bridge /soc/pcie@1f2b0000 ranges:
> > > [    0.131344] xgene-pcie 1f2b0000.pcie:   No bus range found for /soc/pcie@1f2b0000, using [bus 00-ff]
> > > [    0.131365] xgene-pcie 1f2b0000.pcie:       IO 0xc010000000..0xc01000ffff -> 0x0000000000
> > > [    0.131388] xgene-pcie 1f2b0000.pcie:      MEM 0xc120000000..0xc13fffffff -> 0x0020000000
> > > [    0.131401] xgene-pcie 1f2b0000.pcie:      MEM 0xe000000000..0xffffffffff -> 0xe000000000
> > > [    0.131416] xgene-pcie 1f2b0000.pcie:   IB MEM 0x8000000000..0x807fffffff -> 0x8000000000
> > > [    0.131427] xgene-pcie 1f2b0000.pcie:   IB MEM 0x0000000000..0x7fffffffff -> 0x0000000000
> >
> > My best guess is while the above is the parsed order of 'IB MEM'
> > regions, we sort the entries by address now and that changes which
> > inbound registers get used for each region. And one doesn't handle >
> > 32-bit addresses. Can you try out this change? It's not what I want
> > for a final change because the code is just as fragile:
>
> Actually, a better change is this:
>
> diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
> index 56d0d50338c8..d83dbd977418 100644
> --- a/drivers/pci/controller/pci-xgene.c
> +++ b/drivers/pci/controller/pci-xgene.c
> @@ -465,7 +465,7 @@ static int xgene_pcie_select_ib_reg(u8 *ib_reg_mask, u64 size)
>                 return 1;
>         }
>
> -       if ((size > SZ_1K) && (size < SZ_1T) && !(*ib_reg_mask & (1 << 0))) {
> +       if ((size > SZ_1K) && (size < SZ_4G) && !(*ib_reg_mask & (1 << 0))) {
>                 *ib_reg_mask |= (1 << 0);
>                 return 0;
>         }

Just tested it, and it booted just fine!

Full boot log: https://gist.github.com/stgraber/41b2419ef88611ab7a2b4dceb028b4f7
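
For anyone following along, here is a rough user-space sketch of why the
ordering guess above would explain the regression. It is not the driver
code. Only the register 0 branch is taken from the quoted diff; the
fallback branch for register 2, the name select_ib_reg() and the
reg0_limit parameter are assumptions made up for illustration. The two
region sizes come from the 'IB MEM' lines in the log (2 GiB and 512 GiB).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1K (1ULL << 10)
#define SZ_1M (1ULL << 20)
#define SZ_4G (1ULL << 32)
#define SZ_1T (1ULL << 40)

/*
 * Simplified model of inbound register selection.
 * reg0_limit is the exclusive upper size bound for register 0:
 * SZ_1T before the patch, SZ_4G after it.
 * Returns the chosen register index, or -1 if none fits.
 */
int select_ib_reg(uint8_t *ib_reg_mask, uint64_t size, uint64_t reg0_limit)
{
	assert(ib_reg_mask != NULL);

	/* Branch shown in the quoted diff: register 0. */
	if (size > SZ_1K && size < reg0_limit && !(*ib_reg_mask & (1 << 0))) {
		*ib_reg_mask |= (1 << 0);
		return 0;
	}

	/* ASSUMPTION: a fallback register for large regions (not in this thread). */
	if (size > SZ_1M && size < SZ_1T && !(*ib_reg_mask & (1 << 2))) {
		*ib_reg_mask |= (1 << 2);
		return 2;
	}

	return -1;
}
```

Under these assumptions, feeding the regions in address-sorted order
(512 GiB first, then 2 GiB) with the old SZ_1T bound puts the 512 GiB
region in register 0 and pushes the 2 GiB one to the fallback. With the
SZ_4G bound the 512 GiB region skips register 0 regardless of order, and
the 2 GiB region gets it.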

Stéphane
