From: Bjorn Helgaas <helgaas@kernel.org>
To: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: PCI resource allocation mismatch with BIOS
Date: Mon, 28 Nov 2022 14:39:32 -0600
Message-ID: <20221128203932.GA644781@bhelgaas>
In-Reply-To: <Y4SYBtaP1hTWGsYn@black.fi.intel.com>

[+cc Alex]

Hi Mika,

On Mon, Nov 28, 2022 at 01:14:14PM +0200, Mika Westerberg wrote:
> Hi Bjorn,
> 
> There is another PCI resource allocation issue with some Intel GPUs, and it
> probably applies to other similar devices as well. It shows up in data
> centers, where the GPUs are reset (secondary bus reset) when a hang or
> similar condition is detected. Basically they do something like:
> 
>   1. Unbind the graphics driver(s) through sysfs.
>   2. Remove the PCIe devices under the root port or the PCIe switch
>      upstream port through sysfs (echo 1 > ../remove).
>   3. Trigger reset through config space or use the sysfs reset attribute.
>   4. Run rescan on the root bus (echo 1 > /sys/bus/pci/rescan) 
> 
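
The four steps above can be sketched as a dry run that only prints the sysfs
writes instead of performing them. This is a minimal sketch, not the
procedure actually used in those data centers: the helper name
plan_reset_sequence is hypothetical, the device addresses are taken from the
example further down, and it is an assumption that the reset in step 3 is
issued through a "reset" attribute on the port above the removed devices
(the report only says "config space or the sysfs reset attribute").

```python
from pathlib import PurePosixPath

SYSFS_PCI = PurePosixPath("/sys/bus/pci")


def plan_reset_sequence(gpu, removed, reset_port):
    """Return the (path, value) sysfs writes for steps 1-4 without doing them.

    gpu        -- device whose driver is unbound (step 1)
    removed    -- devices removed through their 'remove' attribute (step 2)
    reset_port -- device assumed to expose a 'reset' attribute (step 3)
    """
    dev = SYSFS_PCI / "devices"
    plan = [(str(dev / gpu / "driver" / "unbind"), gpu)]     # 1. unbind
    plan += [(str(dev / d / "remove"), "1") for d in removed]  # 2. remove
    plan.append((str(dev / reset_port / "reset"), "1"))       # 3. reset (assumed)
    plan.append((str(SYSFS_PCI / "rescan"), "1"))             # 4. rescan
    return plan


plan = plan_reset_sequence(
    gpu="0000:53:00.0",
    removed=["0000:53:00.0"],
    reset_port="0000:52:01.0",
)
for path, value in plan:
    print(f"echo {value} > {path}")
```

Building the list first and printing it keeps the sketch safe to run on any
machine; actually writing to these files needs root and will remove devices.
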
> The expectation is that the devices come back the same way as before the
> reset, but what actually happens is that Linux PCI resource allocation
> fails to allocate space for some of the resources. In this case it is
> the IOV BARs.
> 
> BIOS allocates resources for all of these at boot time. After the rescan,
> Linux re-allocates them, but because its allocation algorithm consumes
> more address space, some of the BARs do not fit into the available
> resource space.

Thanks for the report!  Definitely sounds like an issue.  I doubt that
I'll have time to work on it myself in the near future.

Is the "remove" before the reset actually necessary?  If we could
avoid the removal, maybe the config space save/restore we already do
around reset would avoid the issue?

Bjorn

> Here is an example. The devices involved are:
> 
> 53:00.0		GPU with IOV BARs
> 52:01.0		PCIe switch downstream port
> 
> PF = Physical Function
> VF = Virtual Function
> 
> BIOS allocation (dmesg)
> -----------------------
> pci 0000:52:01.0: scanning [bus 53-54] behind bridge, pass 0
> pci 0000:53:00.0: [8086:56c0] type 00 class 0x038000
> pci 0000:53:00.0: reg 0x10: [mem 0x205e1f000000-0x205e1fffffff 64bit pref]
> pci 0000:53:00.0: reg 0x18: [mem 0x201c00000000-0x201fffffffff 64bit pref]
> pci 0000:53:00.0: reg 0x30: [mem 0xffe00000-0xffffffff pref]
> pci 0000:53:00.0: reg 0x344: [mem 0x205e00000000-0x205e00ffffff 64bit pref]
> pci 0000:53:00.0: VF(n) BAR0 space: [mem 0x205e00000000-0x205e1effffff 64bit pref] (contains BAR0 for 31 VFs)
> pci 0000:53:00.0: reg 0x34c: [mem 0x202000000000-0x2021ffffffff 64bit pref]
> pci 0000:53:00.0: VF(n) BAR2 space: [mem 0x202000000000-0x205dffffffff 64bit pref] (contains BAR2 for 31 VFs)
> pci 0000:52:01.0: PCI bridge to [bus 53-54]
> pci 0000:52:01.0:   bridge window [mem 0x201c00000000-0x205e1fffffff 64bit pref]
> 
> GPU
> ~~~
> 0x201c00000000-0x201fffffffff	PF BAR2 16384M
> 0x202000000000-0x205dffffffff	VF BAR2	253952M (31 * 8G)
> 0x205e00000000-0x205e1effffff	VF BAR0 496M (31 * 16M)
> 0x205e1f000000-0x205e1fffffff 	PF BAR0 16M
> 					270848M
> 
> PCIe downstream port
> ~~~~~~~~~~~~~~~~~~~~
> 0x201c00000000-0x205e1fffffff		270848M
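
The BIOS layout above can be checked with a few lines of arithmetic. The
sketch below uses only the address ranges from the dmesg output; the names
(bios, size_mib and so on) are made up for illustration. It confirms that the
four regions are packed back to back and that their sizes add up to exactly
the bridge window.

```python
MIB = 1 << 20


def size_mib(start, end):
    """Size in MiB of an inclusive [start, end] address range."""
    return (end - start + 1) // MIB


# Ranges as reported by BIOS in the dmesg output above, in address order.
bios = [
    ("PF BAR2", 0x201c00000000, 0x201fffffffff),
    ("VF BAR2", 0x202000000000, 0x205dffffffff),
    ("VF BAR0", 0x205e00000000, 0x205e1effffff),
    ("PF BAR0", 0x205e1f000000, 0x205e1fffffff),
]

sizes = {name: size_mib(start, end) for name, start, end in bios}
total = sum(sizes.values())
window = size_mib(0x201c00000000, 0x205e1fffffff)

# True when every region starts right where the previous one ends.
contiguous = all(
    bios[i][2] + 1 == bios[i + 1][1] for i in range(len(bios) - 1)
)

for name, mib in sizes.items():
    print(f"{name}: {mib}M")
print(f"total {total}M, window {window}M, contiguous: {contiguous}")
```
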
> 
> Linux allocation (dmesg)
> ------------------------
> pci 0000:52:01.0: [8086:4fa4] type 01 class 0x060400
> pci_bus 0000:52: fixups for bus
> pci 0000:51:00.0: PCI bridge to [bus 52-54]
> pci 0000:51:00.0:   bridge window [io  0x0000-0x0fff]
> pci 0000:51:00.0:   bridge window [mem 0x00000000-0x000fffff]
> pci 0000:51:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
> pci 0000:52:01.0: scanning [bus 00-00] behind bridge, pass 0
> pci 0000:52:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> pci 0000:52:01.0: scanning [bus 00-00] behind bridge, pass 1
> pci_bus 0000:53: scanning bus
> pci 0000:53:00.0: [8086:56c0] type 00 class 0x038000
> pci 0000:53:00.0: reg 0x10: [mem 0x00000000-0x00ffffff 64bit pref]
> pci 0000:53:00.0: reg 0x18: [mem 0x00000000-0x3ffffffff 64bit pref]
> pci 0000:53:00.0: reg 0x30: [mem 0x00000000-0x001fffff pref]
> pci 0000:53:00.0: reg 0x344: [mem 0x00000000-0x00ffffff 64bit pref]
> pci 0000:53:00.0: VF(n) BAR0 space: [mem 0x00000000-0x1effffff 64bit pref] (contains BAR0 for 31 VFs)
> pci 0000:53:00.0: reg 0x34c: [mem 0x00000000-0x1ffffffff 64bit pref]
> pci 0000:53:00.0: VF(n) BAR2 space: [mem 0x00000000-0x3dffffffff 64bit pref] (contains BAR2 for 31 VFs)
> pci_bus 0000:53: fixups for bus
> pci 0000:52:01.0: PCI bridge to [bus 53-54]
> pci 0000:52:01.0:   bridge window [io  0x0000-0x0fff]
> pci 0000:52:01.0:   bridge window [mem 0x00000000-0x000fffff]
> pci 0000:52:01.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
> pci 0000:52:01.0: bridge window [mem 0x200000000-0x7ffffffff 64bit pref] to [bus 53] add_size 3e00000000 add_align 200000000
> pci 0000:51:00.0: bridge window [mem 0x200000000-0x7ffffffff 64bit pref] to [bus 52-53] add_size 3e00000000 add_align 200000000
> pcieport 0000:50:02.0: BAR 13: assigned [io  0x8000-0x8fff]
> pci 0000:51:00.0: BAR 15: no space for [mem size 0x4400000000 64bit pref]
> pci 0000:51:00.0: BAR 15: failed to assign [mem size 0x4400000000 64bit pref]
> pci 0000:51:00.0: BAR 0: assigned [mem 0x201c00000000-0x201c007fffff 64bit pref]
> pci 0000:51:00.0: BAR 14: assigned [mem 0xbb800000-0xbb9fffff]
> pci 0000:51:00.0: BAR 13: assigned [io  0x8000-0x8fff]
> pci 0000:51:00.0: BAR 15: assigned [mem 0x201c00000000-0x2021ffffffff 64bit pref]
> pci 0000:51:00.0: BAR 0: assigned [mem 0x202200000000-0x2022007fffff 64bit pref]
> pci 0000:51:00.0: BAR 14: assigned [mem 0xbb800000-0xbb9fffff]
> pci 0000:51:00.0: BAR 15: [mem 0x201c00000000-0x2021ffffffff 64bit pref] (failed to expand by 0x3e00000000)
> pci 0000:51:00.0: failed to add 3e00000000 res[15]=[mem 0x201c00000000-0x2021ffffffff 64bit pref]
> pci 0000:52:01.0: BAR 15: no space for [mem size 0x4400000000 64bit pref]
> pci 0000:52:01.0: BAR 15: failed to assign [mem size 0x4400000000 64bit pref]
> pci 0000:52:01.0: BAR 14: assigned [mem 0xbb800000-0xbb9fffff]
> pci 0000:52:01.0: BAR 13: assigned [io  0x8000-0x8fff]
> pci 0000:52:01.0: BAR 15: assigned [mem 0x201c00000000-0x2021ffffffff 64bit pref]
> pci 0000:52:01.0: BAR 14: assigned [mem 0xbb800000-0xbb9fffff]
> pci 0000:52:01.0: BAR 15: [mem 0x201c00000000-0x2021ffffffff 64bit pref] (failed to expand by 0x3e00000000)
> pci 0000:52:01.0: failed to add 3e00000000 res[15]=[mem 0x201c00000000-0x2021ffffffff 64bit pref]
> pci 0000:53:00.0: BAR 2: assigned [mem 0x201c00000000-0x201fffffffff 64bit pref]
> pci 0000:53:00.0: BAR 9: no space for [mem size 0x3e00000000 64bit pref]
> pci 0000:53:00.0: BAR 9: failed to assign [mem size 0x3e00000000 64bit pref]
> pci 0000:53:00.0: BAR 0: assigned [mem 0x202000000000-0x202000ffffff 64bit pref]
> pci 0000:53:00.0: BAR 7: assigned [mem 0x202001000000-0x20201fffffff 64bit pref]
> pci 0000:53:00.0: BAR 6: assigned [mem 0xbb800000-0xbb9fffff pref]
> pci 0000:53:00.0: BAR 2: assigned [mem 0x201c00000000-0x201fffffffff 64bit pref]
> pci 0000:53:00.0: BAR 0: assigned [mem 0x202000000000-0x202000ffffff 64bit pref]
> pci 0000:53:00.0: BAR 6: assigned [mem 0xbb800000-0xbb9fffff pref]
> pci 0000:53:00.0: BAR 9: no space for [mem size 0x3e00000000 64bit pref]
> pci 0000:53:00.0: BAR 9: failed to assign [mem size 0x3e00000000 64bit pref]
> pci 0000:53:00.0: BAR 7: assigned [mem 0x202001000000-0x20201fffffff 64bit pref]
> pci 0000:52:01.0: PCI bridge to [bus 53]
> pci 0000:52:01.0:   bridge window [io  0x8000-0x8fff]
> pci 0000:52:01.0:   bridge window [mem 0xbb800000-0xbb9fffff]
> pci 0000:52:01.0:   bridge window [mem 0x201c00000000-0x2021ffffffff 64bit pref]
> 
> GPU
> ~~~
> 0x201c00000000-0x201fffffffff	PF BAR2 16384M
> 0x202000000000-0x202000ffffff	PF BAR0	16M
> 0x202001000000-0x20201fffffff 	VF BAR0	496M (31 * 16M)
> FAIL				VF BAR2 253952M (31 * 8G)
> 
> PCIe downstream port
> ~~~~~~~~~~~~~~~~~~~~
> 0x201c00000000-0x2021ffffffff		24576M
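
The numbers in the Linux log can be compared with the BIOS window in the same
way. This is only a sketch built from values printed in the two logs (the
variable names are illustrative); it does not model pbus_size_mem() itself.
It shows that the bridge window Linux asks for (base window plus add_size) is
larger than the window BIOS had programmed, and that the fallback window
Linux ends up with cannot hold VF BAR2.

```python
MIB = 1 << 20

# "bridge window [mem 0x200000000-0x7ffffffff 64bit pref] ... add_size 3e00000000"
base_window = 0x7ffffffff - 0x200000000 + 1
add_size = 0x3e00000000
requested = base_window + add_size          # size of "BAR 15" in the log

# Window BIOS had programmed on 52:01.0.
bios_window = 0x205e1fffffff - 0x201c00000000 + 1

# Window Linux actually assigned after the expansion failed.
assigned = 0x2021ffffffff - 0x201c00000000 + 1

# VF BAR2 space ("BAR 9: no space for [mem size 0x3e00000000 ...]").
vf_bar2 = 0x3e00000000

print(f"requested   {requested:#x} ({requested // MIB}M)")
print(f"BIOS window {bios_window:#x} ({bios_window // MIB}M)")
print(f"excess      {(requested - bios_window) // MIB}M")
print(f"assigned    {assigned // MIB}M, VF BAR2 needs {vf_bar2 // MIB}M")
```

The request matches the 0x4400000000 size that the log reports as "no space",
and it exceeds what BIOS used for the same set of BARs.
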
> 
> Now, if I hack the allocation algorithm (in pbus_size_mem()) to "mimic"
> the BIOS allocation, then these fit fine. However, if the BIOS allocation
> ever changes, we may end up with a similar issue. Also, the Linux PCI
> resource allocation code has been like that for aeons, so changing it
> would likely cause regressions.
> 
> Let me know if more information is needed. I have one of these cards
> locally and have remote access to a similar system where the above
> example was taken, so I can run additional testing.
> 
> Also let me know if you want me to file a bug in kernel.org bugzilla.
> 
> Thanks in advance!
