linux-kernel.vger.kernel.org archive mirror
* Re: SPARC64: getting "no compatible bridge window" errors :/
       [not found] <NCp_h9j--3-2@tutanota.com>
@ 2022-09-26 17:11 ` Bjorn Helgaas
  2022-10-03 22:35   ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: Bjorn Helgaas @ 2022-09-26 17:11 UTC (permalink / raw)
  To: Richard Rogalski
  Cc: Linux Pci, Alex Deucher, David S. Miller, sparclinux, linux-kernel

[+cc Alex, David, sparclinux, LKML]

On Sun, Sep 25, 2022 at 06:59:23PM +0200, Richard Rogalski wrote:
> I hope this is the right place for this.

This is great, thanks a lot for your report!  Is this a regression?
If so, what's the most recent kernel that worked?  

> In my dmesg output, I get things like:
> 
> pci 0000:04:00.0: can't claim VGA legacy [mem 0x000a0000-0x000bffff]: no compatible bridge window
> pci 0000:06:00.0: can't claim VGA legacy [mem 0x000a0000-0x000bffff]: no compatible bridge window
> pci 0000:06:00.1: can't claim BAR 0 [mem 0x84110200000-0x84110203fff 64bit]: no compatible bridge window
> 
> I opened a bug for amdgpu [here](https://gitlab.freedesktop.org/drm/amd/-/issues/2169) but looking further into it I think it is caused by deeper PCIe problems :\
> 
> https://gitlab.freedesktop.org/drm/amd/uploads/cbf47807972c8a990bb2a8cdbb39ad9e/8C7CA9QNG dmesg log
> https://gitlab.freedesktop.org/drm/amd/uploads/6a799425dea50febd82f8bc11e54433a/ll.txt lspci -vv
> https://gitlab.freedesktop.org/drm/amd/uploads/7d4a794b1f7d67a1ffcdee5dfdec3ad6/config.txt kernel .config

Your error output attachment [1] contains an address that looks like
it's in 06:00.0 BAR 5:

  pci 0000:06:00.0: reg 0x24: [mem 0x84001200000-0x8400123ffff]
  NON-RESUMABLE ERROR: insn effective address [0x0000084001201410]
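
As a quick sanity check of that reading, here is a minimal sketch (not kernel code; the helper names are made up for illustration) that maps the "reg 0x24" config offset to a BAR number and tests whether the faulting address falls inside that BAR. The constants are copied from the two log lines above.

```python
# Values copied from the dmesg lines above; helper names are illustrative only.
BAR_START, BAR_END = 0x84001200000, 0x8400123ffff
FAULT_ADDR = 0x0000084001201410


def bar_index(config_offset: int) -> int:
    """Map a BAR's config-space offset (0x10..0x24) to its BAR number (0..5)."""
    return (config_offset - 0x10) // 4


def in_range(addr: int, start: int, end: int) -> bool:
    """True if addr lies in the inclusive range [start, end]."""
    return start <= addr <= end


print(bar_index(0x24))                          # 5
print(in_range(FAULT_ADDR, BAR_START, BAR_END))  # True
print(hex(FAULT_ADDR - BAR_START))              # 0x1410
```

So the fault is 0x1410 bytes into the region that dmesg reports at config offset 0x24, which is where the "BAR 5" reading comes from.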

This looks like an amdgpu issue.  There have been recent changes like
c1c39032a074 ("drm/amdgpu: make sure to init common IP before gmc")
and dd6aeb4e5f59 ("drm/amdgpu: Don't enable LTR if not supported")
that could be related.

The PCI "no compatible bridge window" warnings are definitely an
issue, but I don't think they're related to the amdgpu crash:

  pci@400: PCI MEM64 [mem 0x84100000000-0x84dffffffff] offset 80000000000
  pci_bus 0000:00: root bus resource [mem 0x84100000000-0x84dffffffff] (bus address [0x4100000000-0x4dffffffff])
  pci 0000:09:00.0: can't claim BAR 0 [mem 0x84120000000-0x8412007ffff 64bit]: no compatible bridge window

Those and this from lspci:

  0000:01:00.0 bridge to [bus 02-09] window [mem 0x4100000000-0x412fffffff pref]
  0000:02:0c.0 bridge to [bus 09]    window [mem 0x4120000000-0x412fffffff pref]
  0000:09:00.0 Intel 82599ES NIC Region 0: Memory at 0x84120000000

are telling us there's something wrong with how the resource-to-bus
offset is being applied.  It looks like the offset was applied to the
NIC BAR, but didn't get applied to the bridge windows.
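
To make the mismatch concrete, here is a rough sketch of the arithmetic, assuming the MEM64 offset of 0x80000000000 from the pci@400 line. It is only a model of the containment test, not the kernel's actual resource code, and the function names are hypothetical.

```python
# Addresses copied from the dmesg and lspci excerpts above.
MEM64_OFFSET = 0x80000000000                  # "PCI MEM64 ... offset 80000000000"
NIC_BAR = (0x84120000000, 0x8412007ffff)      # 09:00.0 BAR 0 as shown by dmesg
WINDOW_02_0C = (0x4120000000, 0x412fffffff)   # 02:0c.0 window as shown by lspci


def contains(outer, inner):
    """True if the inclusive range `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def bus_to_cpu(region, offset):
    """Translate a PCI bus address range to a CPU physical address range."""
    return (region[0] + offset, region[1] + offset)


# Compared as printed, the BAR is nowhere near the bridge window.
print(contains(WINDOW_02_0C, NIC_BAR))                            # False
# With the offset applied to the window, the BAR fits.
print(contains(bus_to_cpu(WINDOW_02_0C, MEM64_OFFSET), NIC_BAR))  # True
```

The two sets of numbers only line up once the same offset is applied to both, which is why the raw lspci and dmesg values look inconsistent side by side.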

Could you start a new thread here (linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, and sparclinux@vger.kernel.org) for this
issue and attach the dmesg log when booting with "ofpci_debug=1"?

Do the devices we complain about (NICs and storage HBAs 09:00.0,
09:00.1, 0d:00.0, 0d:00.1, 0e:00.0, 0f:00.0, 0001:03:00.0,
0001:03:00.1, 0001:0a:00.0, 0001:0a:00.1) work?

Bjorn

[1] https://gitlab.freedesktop.org/drm/amd/uploads/b51f4d6783eeebf90de9a400525d07d6/qq


* Re: SPARC64: getting "no compatible bridge window" errors :/
  2022-09-26 17:11 ` SPARC64: getting "no compatible bridge window" errors :/ Bjorn Helgaas
@ 2022-10-03 22:35   ` Bjorn Helgaas
  2022-10-10 21:36     ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: Bjorn Helgaas @ 2022-10-03 22:35 UTC (permalink / raw)
  To: Richard Rogalski
  Cc: Linux Pci, Alex Deucher, David S. Miller, sparclinux, linux-kernel

On Mon, Sep 26, 2022 at 12:11:06PM -0500, Bjorn Helgaas wrote:
> [+cc Alex, David, sparclinux, LKML]
> 
> On Sun, Sep 25, 2022 at 06:59:23PM +0200, Richard Rogalski wrote:
> > I hope this is the right place for this.
> 
> This is great, thanks a lot for your report!  Is this a regression?
> If so, what's the most recent kernel that worked?  
> 
> > In my dmesg output, I get things like:
> > 
> > pci 0000:04:00.0: can't claim VGA legacy [mem 0x000a0000-0x000bffff]: no compatible bridge window
> > pci 0000:06:00.0: can't claim VGA legacy [mem 0x000a0000-0x000bffff]: no compatible bridge window
> > pci 0000:06:00.1: can't claim BAR 0 [mem 0x84110200000-0x84110203fff 64bit]: no compatible bridge window
> > 
> > I opened a bug for amdgpu [here](https://gitlab.freedesktop.org/drm/amd/-/issues/2169) but looking further into it I think it is caused by deeper PCIe problems :\
> > 
> > https://gitlab.freedesktop.org/drm/amd/uploads/cbf47807972c8a990bb2a8cdbb39ad9e/8C7CA9QNG dmesg log
> > https://gitlab.freedesktop.org/drm/amd/uploads/6a799425dea50febd82f8bc11e54433a/ll.txt lspci -vv
> > https://gitlab.freedesktop.org/drm/amd/uploads/7d4a794b1f7d67a1ffcdee5dfdec3ad6/config.txt kernel .config
> 
> Your error output attachment [1] contains an address that looks like
> it's in 06:00.0 BAR 5:
> 
>   pci 0000:06:00.0: reg 0x24: [mem 0x84001200000-0x8400123ffff]
>   NON-RESUMABLE ERROR: insn effective address [0x0000084001201410]
> 
> This looks like an amdgpu issue.  There have been recent changes like
> c1c39032a074 ("drm/amdgpu: make sure to init common IP before gmc")
> and dd6aeb4e5f59 ("drm/amdgpu: Don't enable LTR if not supported")
> that could be related.
> 
> The PCI "no compatible bridge window" warnings are definitely an
> issue, but I don't think they're related to the amdgpu crash:
> 
>   pci@400: PCI MEM64 [mem 0x84100000000-0x84dffffffff] offset 80000000000
>   pci_bus 0000:00: root bus resource [mem 0x84100000000-0x84dffffffff] (bus address [0x4100000000-0x4dffffffff])
>   pci 0000:09:00.0: can't claim BAR 0 [mem 0x84120000000-0x8412007ffff 64bit]: no compatible bridge window
> 
> Those and this from lspci:
> 
>   0000:01:00.0 bridge to [bus 02-09] window [mem 0x4100000000-0x412fffffff pref]
>   0000:02:0c.0 bridge to [bus 09]    window [mem 0x4120000000-0x412fffffff pref]
>   0000:09:00.0 Intel 82599ES NIC Region 0: Memory at 0x84120000000
> 
> are telling us there's something wrong with how the resource-to-bus
> offset is being applied.  It looks like the offset was applied to the
> NIC BAR, but didn't get applied to the bridge windows.
> 
> Could you start a new thread here (linux-kernel@vger.kernel.org,
> linux-pci@vger.kernel.org, and sparclinux@vger.kernel.org) for this
> issue and attach the dmesg log when booting with "ofpci_debug=1"?

Any chance you could collect a dmesg log with "ofpci_debug=1"?

I'd like to look at the resource-to-bus offset issue.

> Do the devices we complain about (NICs and storage HBAs 09:00.0,
> 09:00.1, 0d:00.0, 0d:00.1, 0e:00.0, 0f:00.0, 0001:03:00.0,
> 0001:03:00.1, 0001:0a:00.0, 0001:0a:00.1) work?
> 
> Bjorn
> 
> [1] https://gitlab.freedesktop.org/drm/amd/uploads/b51f4d6783eeebf90de9a400525d07d6/qq


* Re: SPARC64: getting "no compatible bridge window" errors :/
  2022-10-03 22:35   ` Bjorn Helgaas
@ 2022-10-10 21:36     ` Bjorn Helgaas
  0 siblings, 0 replies; 5+ messages in thread
From: Bjorn Helgaas @ 2022-10-10 21:36 UTC (permalink / raw)
  To: Richard Rogalski
  Cc: Linux Pci, Alex Deucher, David S. Miller, sparclinux,
	linux-kernel, Lijo Lazar

[+cc Lijo]

On Mon, Oct 03, 2022 at 05:35:02PM -0500, Bjorn Helgaas wrote:
> On Mon, Sep 26, 2022 at 12:11:06PM -0500, Bjorn Helgaas wrote:
> > [+cc Alex, David, sparclinux, LKML]
> > 
> > On Sun, Sep 25, 2022 at 06:59:23PM +0200, Richard Rogalski wrote:
> > > I hope this is the right place for this.
> > 
> > This is great, thanks a lot for your report!  Is this a regression?
> > If so, what's the most recent kernel that worked?  
> > 
> > > In my dmesg output, I get things like:
> > > 
> > > pci 0000:04:00.0: can't claim VGA legacy [mem 0x000a0000-0x000bffff]: no compatible bridge window
> > > pci 0000:06:00.0: can't claim VGA legacy [mem 0x000a0000-0x000bffff]: no compatible bridge window
> > > pci 0000:06:00.1: can't claim BAR 0 [mem 0x84110200000-0x84110203fff 64bit]: no compatible bridge window
> > > 
> > > I opened a bug for amdgpu [here](https://gitlab.freedesktop.org/drm/amd/-/issues/2169) but looking further into it I think it is caused by deeper PCIe problems :\
> > > 
> > > https://gitlab.freedesktop.org/drm/amd/uploads/cbf47807972c8a990bb2a8cdbb39ad9e/8C7CA9QNG dmesg log
> > > https://gitlab.freedesktop.org/drm/amd/uploads/6a799425dea50febd82f8bc11e54433a/ll.txt lspci -vv
> > > https://gitlab.freedesktop.org/drm/amd/uploads/7d4a794b1f7d67a1ffcdee5dfdec3ad6/config.txt kernel .config
> > 
> > Your error output attachment [1] contains an address that looks like
> > it's in 06:00.0 BAR 5:
> > 
> >   pci 0000:06:00.0: reg 0x24: [mem 0x84001200000-0x8400123ffff]
> >   NON-RESUMABLE ERROR: insn effective address [0x0000084001201410]
> > 
> > This looks like an amdgpu issue.  There have been recent changes like
> > c1c39032a074 ("drm/amdgpu: make sure to init common IP before gmc")
> > and dd6aeb4e5f59 ("drm/amdgpu: Don't enable LTR if not supported")
> > that could be related.

Ping for any updates?  Added Lijo, who fixed the LTR issue.

> > The PCI "no compatible bridge window" warnings are definitely an
> > issue, but I don't think they're related to the amdgpu crash:
> > 
> >   pci@400: PCI MEM64 [mem 0x84100000000-0x84dffffffff] offset 80000000000
> >   pci_bus 0000:00: root bus resource [mem 0x84100000000-0x84dffffffff] (bus address [0x4100000000-0x4dffffffff])
> >   pci 0000:09:00.0: can't claim BAR 0 [mem 0x84120000000-0x8412007ffff 64bit]: no compatible bridge window
> > 
> > Those and this from lspci:
> > 
> >   0000:01:00.0 bridge to [bus 02-09] window [mem 0x4100000000-0x412fffffff pref]
> >   0000:02:0c.0 bridge to [bus 09]    window [mem 0x4120000000-0x412fffffff pref]
> >   0000:09:00.0 Intel 82599ES NIC Region 0: Memory at 0x84120000000
> > 
> > are telling us there's something wrong with how the resource-to-bus
> > offset is being applied.  It looks like the offset was applied to the
> > NIC BAR, but didn't get applied to the bridge windows.
> > 
> > Could you start a new thread here (linux-kernel@vger.kernel.org,
> > linux-pci@vger.kernel.org, and sparclinux@vger.kernel.org) for this
> > issue and attach the dmesg log when booting with "ofpci_debug=1"?
> 
> Any chance you could collect a dmesg log with "ofpci_debug=1"?
> 
> I'd like to look at the resource-to-bus offset issue.

I would still like to see this dmesg log if possible.

> > Do the devices we complain about (NICs and storage HBAs 09:00.0,
> > 09:00.1, 0d:00.0, 0d:00.1, 0e:00.0, 0f:00.0, 0001:03:00.0,
> > 0001:03:00.1, 0001:0a:00.0, 0001:0a:00.1) work?
> > 
> > Bjorn
> > 
> > [1] https://gitlab.freedesktop.org/drm/amd/uploads/b51f4d6783eeebf90de9a400525d07d6/qq


* Re: SPARC64: getting "no compatible bridge window" errors :/
  2022-10-21  3:47 Richard Rogalski
@ 2022-10-24 18:14 ` Bjorn Helgaas
  0 siblings, 0 replies; 5+ messages in thread
From: Bjorn Helgaas @ 2022-10-24 18:14 UTC (permalink / raw)
  To: Richard Rogalski
  Cc: alexander.deucher, davem, lijo.lazar, linux-kernel, linux-pci,
	sparclinux

On Fri, Oct 21, 2022 at 05:47:59AM +0200, Richard Rogalski wrote:
> Hello, very very sorry about the late reply. Life has been hectic. Also, not sure if this is how I reply to one of these, sorry if I screwed it up :)
> 
> > This is great, thanks a lot for your report!  Is this a regression?
> 
> Believe it or not, I am a brand new SPARC user :). So I can't say
> right now. Should I try a few old kernel releases to check?

I wouldn't bother trying older kernels.  In fact, I just noticed that
you're running a 5.15 kernel, which is about a year old.  It would be
much more interesting to try to reproduce the problem on a current
kernel, e.g., v6.0.

At https://packages.gentoo.org/packages/sys-kernel/gentoo-kernel, it
doesn't look like sparc gets much attention ;)

> > Any chance you could collect a dmesg log with "ofpci_debug=1"?
> 
> https://gitlab.freedesktop.org/drm/amd/uploads/0ed3c92921d7f88b06654b5f46e9756d/dmesg
> 
> > Do the devices we complain about (NICs and storage HBAs 09:00.0, 
> > 09:00.1, 0d:00.0, 0d:00.1, 0e:00.0, 0f:00.0, 0001:03:00.0, 
> > 0001:03:00.1, 0001:0:00.0, 0001:0a:00.1) work?
> 
> Well, I don't have any fiber optic equipment: these just came with
> the server. Also it has wayy too many NICs. I can't quite say.
> However... for the HBAs, that's where my root is :O. This is mildly
> concerning :D.

I spent way too long looking at these PCI resource weirdnesses.
Bottom line: ignore them.

From your ofpci_debug dmesg log (annotated with logging the PCI core
would do if it were doing this instead of the sparc OF code):

  pci@400: PCI MEM   [mem 0x84000100000-0x8407f7fffff] offset 84000000000
  pci@400: PCI MEM64 [mem 0x84100000000-0x84dffffffff] offset 80000000000
  pci_bus 0000:00: root bus resource [mem 0x84000100000-0x8407f7fffff] (bus address [0x00100000-0x7f7fffff])
  pci_bus 0000:00: root bus resource [mem 0x84100000000-0x84dffffffff] (bus address [0x4100000000-0x4dffffffff])

  pci 0000:04:00.0: can't claim VGA legacy [mem 0x000a0000-0x000bffff]: no compatible bridge window

    This one happens because according to OF, there is no bridge
    aperture to the PCI bus 0xa0000-0xbffff region.  The only accessible
    PCI bus regions are [0x00100000-0x7f7fffff] and
    [0x4100000000-0x4dffffffff].  Probably an OF defect.
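
    A small sketch of that check, using the two root bus apertures
    listed above (bus addresses).  This only models the containment
    test; the names are illustrative and not taken from the kernel:

```python
# Root bus apertures as PCI bus addresses, copied from the log above.
ROOT_APERTURES = [
    (0x00100000, 0x7f7fffff),        # PCI MEM
    (0x4100000000, 0x4dffffffff),    # PCI MEM64
]
VGA_LEGACY = (0x000a0000, 0x000bffff)


def contains(outer, inner):
    """True if the inclusive range `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def find_aperture(region, apertures):
    """Return the first aperture that contains `region`, or None."""
    for aperture in apertures:
        if contains(aperture, region):
            return aperture
    return None


print(find_aperture(VGA_LEGACY, ROOT_APERTURES))  # None
```

    The first aperture starts at 0x00100000, just above the legacy VGA
    range, so nothing OF describes covers 0xa0000-0xbffff.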

  pci 0000:02:0c.0: PCI bridge to [bus 09]
  pci 0000:02:0c.0:       Using flags[0010220c] start[0000004120000000] size[0000000010000000]
  pci 0000:02:0c.0:   bridge window [mem 0x84120000000-0x8412fffffff 64bit pref]
  pci 0000:09:00.0: can't claim BAR 0 [mem 0x84120000000-0x8412007ffff 64bit]: no compatible bridge window

    These and similar warnings happen because OF says the upstream
    bridge window is prefetchable, but this is a non-prefetchable BAR.
    These likely work fine because in most cases prefetching will not
    occur on PCIe, even though the bridge window allows it.
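
    For reference, a sketch that decodes the flags[0010220c] value and
    recomputes the window from start/size.  The IORESOURCE_* values are
    the ones in include/linux/ioport.h; can_claim() is a deliberately
    simplified, hypothetical model of the prefetch rule, and BAR_FLAGS
    is an assumption inferred from the "64bit" (no "pref") text in the
    warning:

```python
# Flag values as defined in the kernel's include/linux/ioport.h.
IORESOURCE_MEM = 0x00000200
IORESOURCE_PREFETCH = 0x00002000
IORESOURCE_MEM_64 = 0x00100000

# Values copied from the ofpci_debug output above.
WINDOW_FLAGS = 0x0010220c
WINDOW_START = 0x4120000000
WINDOW_SIZE = 0x10000000
MEM64_OFFSET = 0x80000000000

# Assumption: BAR 0 is a 64-bit, non-prefetchable memory BAR.
BAR_FLAGS = IORESOURCE_MEM | IORESOURCE_MEM_64


def decode(flags):
    """List the names of the resource flag bits set in `flags`."""
    table = [
        ("MEM", IORESOURCE_MEM),
        ("PREFETCH", IORESOURCE_PREFETCH),
        ("MEM_64", IORESOURCE_MEM_64),
    ]
    return [name for name, bit in table if flags & bit]


def window_cpu_range(start, size, offset):
    """CPU physical range of a window given its bus start, size, and offset."""
    return (start + offset, start + size - 1 + offset)


def can_claim(window_flags, bar_flags):
    """Simplified rule: a non-prefetchable BAR may not use a prefetchable window."""
    window_pref = bool(window_flags & IORESOURCE_PREFETCH)
    bar_pref = bool(bar_flags & IORESOURCE_PREFETCH)
    return not (window_pref and not bar_pref)


print(decode(WINDOW_FLAGS))           # ['MEM', 'PREFETCH', 'MEM_64']
print([hex(a) for a in window_cpu_range(WINDOW_START, WINDOW_SIZE, MEM64_OFFSET)])
print(can_claim(WINDOW_FLAGS, BAR_FLAGS))  # False
```

    The address ranges match; only the prefetchable attribute differs,
    which is what triggers the warning.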

So the warnings above are mostly harmless.  If you were to hot-add
something, there could be issues because we aren't keeping track of
the space these devices use.

lspci on sparc is unusual: it shows PCI bus addresses, not CPU
physical addresses like other arches [1], which means we see things
like this in dmesg, which shows the CPU physical address:

  pci_bus 0000:00: root bus resource [mem 0x84000100000-0x8407f7fffff] (bus address [0x00100000-0x7f7fffff])
  pci 0000:04:00.0: reg 0x10: [mem 0x84000800000-0x84000ffffff]

and this in lspci, which is the PCI bus address:

  0000:04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 10) (prog-if 00 [VGA controller])
      Region 0: Memory at 00800000 (32-bit, non-prefetchable) [size=8M]

Annoying but harmless.
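
If you want to line the two views up by hand, the conversion is just a subtraction of the aperture offset.  A minimal sketch, assuming the 32-bit PCI MEM offset of 0x84000000000 from the pci@400 line (function names are made up for illustration):

```python
# Offset of the 32-bit PCI MEM aperture, from "PCI MEM ... offset 84000000000".
MEM_OFFSET = 0x84000000000

# 04:00.0 "reg 0x10" as printed by dmesg (CPU physical addresses).
BAR_CPU = (0x84000800000, 0x84000ffffff)


def cpu_to_bus(region, offset):
    """Translate a CPU physical address range to a PCI bus address range."""
    return (region[0] - offset, region[1] - offset)


def size_of(region):
    """Size in bytes of an inclusive address range."""
    return region[1] - region[0] + 1


bus = cpu_to_bus(BAR_CPU, MEM_OFFSET)
print(format(bus[0], "08x"))           # 00800000, as shown by lspci
print(size_of(BAR_CPU) // (1 << 20))   # 8 (MiB), matching [size=8M]
```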

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/?id=v5.18#n1


* Re: SPARC64: getting "no compatible bridge window" errors :/
@ 2022-10-21  3:47 Richard Rogalski
  2022-10-24 18:14 ` Bjorn Helgaas
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Rogalski @ 2022-10-21  3:47 UTC (permalink / raw)
  To: helgaas
  Cc: alexander.deucher, davem, lijo.lazar, linux-kernel, linux-pci,
	rrogalski, sparclinux

Hello, very very sorry about the late reply. Life has been hectic. Also, not sure if this is how I reply to one of these, sorry if I screwed it up :)

> This is great, thanks a lot for your report!  Is this a regression?

Believe it or not, I am a brand new SPARC user :). So I can't say right now. Should I try a few old kernel releases to check?

> Any chance you could collect a dmesg log with "ofpci_debug=1"?

https://gitlab.freedesktop.org/drm/amd/uploads/0ed3c92921d7f88b06654b5f46e9756d/dmesg

> Do the devices we complain about (NICs and storage HBAs 09:00.0, 
> 09:00.1, 0d:00.0, 0d:00.1, 0e:00.0, 0f:00.0, 0001:03:00.0, 
> 0001:03:00.1, 0001:0a:00.0, 0001:0a:00.1) work?

Well, I don't have any fiber optic equipment: these just came with the server. Also it has wayy too many NICs. I can't quite say.
However... for the HBAs, that's where my root is :O. This is mildly concerning :D.


