From: Bjorn Helgaas <helgaas@kernel.org>
To: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Lukas Wunner <lukas@wunner.de>,
Chris Chiu <chris.chiu@canonical.com>,
linux-pci@vger.kernel.org
Subject: Re: [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources
Date: Fri, 2 Dec 2022 17:34:24 -0600
Message-ID: <20221202233424.GA1070935@bhelgaas>
In-Reply-To: <20221130112221.66612-2-mika.westerberg@linux.intel.com>
Hi Mika,
On Wed, Nov 30, 2022 at 01:22:20PM +0200, Mika Westerberg wrote:
> A PCI bridge may reside on a bus with other devices as well. The
> resource distribution code does not take this into account properly and
> therefore it expands the bridge resource windows too much, not leaving
> space for the other devices (or functions a multifunction device) and
functions *of* a
> this leads to an issue that Jonathan reported. He runs QEMU with the
> following topoology (QEMU parameters):
topology
> -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2 \
> -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on \
> -device e1000,bus=root_port13,addr=0.1 \
> -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3 \
> -device e1000,bus=fun1
If you use spaces instead of tabs above, the "\" will stay lined up
when git log indents.
> The first e1000 NIC here is another function in the switch upstream
> port. This leads to following errors:
>
> pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
> pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
> pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
> e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
>
> Fix this by taking into account the possible multifunction devices when
> uptream port resources are distributed.
"upstream", although I think I would word this so it's less
PCIe-centric. IIUC, we just want to account for all the BARs on the
bus, whether they're in bridges, peers in a multi-function device, or
other devices.
> Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
> drivers/pci/setup-bus.c | 66 ++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 62 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index b4096598dbcb..d456175ddc4f 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1830,10 +1830,68 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	 * bridges below.
>  	 */
>  	if (hotplug_bridges + normal_bridges == 1) {
> -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -		if (dev->subordinate)
> -			pci_bus_distribute_available_resources(dev->subordinate,
> -				add_list, io, mmio, mmio_pref);
> +		bridge = NULL;
> +
> +		/* Find the single bridge on this bus first */
> +		for_each_pci_bridge(dev, bus) {
> +			bridge = dev;
> +			break;
> +		}
If we just remember "bridge" in the loop before this hunk, could we
get rid of the loop here? E.g.,
  bridge = NULL;
  for_each_pci_bridge(dev, bus) {
    bridge = dev;
    if (dev->is_hotplug_bridge)
      hotplug_bridges++;
    else
      normal_bridges++;
  }
> +
> +		if (WARN_ON_ONCE(!bridge))
> +			return;
Then I think this would be superfluous.
> +		if (!bridge->subordinate)
> +			return;
> +
> +		/*
> +		 * Reduce the space available for distribution by the
> +		 * amount required by the other devices on the same bus
> +		 * as this bridge.
> +		 */
> +		list_for_each_entry(dev, &bus->devices, bus_list) {
> +			int i;
> +
> +			if (dev == bridge)
> +				continue;
Why do we skip "bridge"? Bridges are allowed to have two BARs
themselves, and it seems like they should be included here.
> +			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +				const struct resource *dev_res = &dev->resource[i];
> +				resource_size_t dev_sz;
> +				struct resource *b_res;
> +
> +				if (dev_res->flags & IORESOURCE_IO) {
> +					b_res = &io;
> +				} else if (dev_res->flags & IORESOURCE_MEM) {
> +					if (dev_res->flags & IORESOURCE_PREFETCH)
> +						b_res = &mmio_pref;
> +					else
> +						b_res = &mmio;
> +				} else {
> +					continue;
> +				}
> +
> +				/* Size aligned to bridge window */
> +				align = pci_resource_alignment(bridge, b_res);
> +				dev_sz = ALIGN(resource_size(dev_res), align);
> +				if (!dev_sz)
> +					continue;
> +
> +				pci_dbg(dev, "resource %pR aligned to %#llx\n",
> +					dev_res, (unsigned long long)dev_sz);
> +
> +				if (dev_sz > resource_size(b_res))
> +					memset(b_res, 0, sizeof(*b_res));
> +				else
> +					b_res->end -= dev_sz;
> +
> +				pci_dbg(bridge, "updated available resources to %pR\n",
> +					b_res);
> +			}
> +		}
This only happens for buses with a single bridge. Shouldn't it happen
regardless of how many bridges there are?
This block feels like something that could be split out to a separate
function. It looks like it only needs "bus", "io", "mmio",
"mmio_pref", and maybe "bridge".
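
To make the split-out idea concrete, here is a rough userspace sketch of
what such a helper might look like. It is not kernel code and not part of
the patch: "struct window", "struct bar", "struct device",
"reduce_available()" and the explicit "empty" flag are all hypothetical
stand-ins (the patch zeroes the struct resource with memset() instead), and
the per-type alignment is simply passed in by the caller rather than
derived from "bridge".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; names are illustrative only. */
enum win_type { WIN_IO, WIN_MEM, WIN_MEM_PREF, WIN_COUNT };

struct window {			/* inclusive range, like struct resource */
	uint64_t start;
	uint64_t end;
	int empty;		/* set when nothing is left to distribute */
};

struct bar {
	enum win_type type;
	uint64_t size;		/* 0 means the BAR is not implemented */
};

struct device {
	const struct bar *bars;
	size_t nr_bars;
};

static uint64_t window_size(const struct window *w)
{
	return w->empty ? 0 : w->end - w->start + 1;
}

static uint64_t align_up(uint64_t x, uint64_t align)
{
	/* Only power-of-two alignments are supported here. */
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Shrink each available window by the aligned BAR sizes of the devices
 * passed in.  The caller decides which devices are included.
 */
static void reduce_available(const struct device *devs, size_t nr_devs,
			     struct window avail[WIN_COUNT],
			     const uint64_t align[WIN_COUNT])
{
	for (size_t d = 0; d < nr_devs; d++) {
		for (size_t i = 0; i < devs[d].nr_bars; i++) {
			const struct bar *b = &devs[d].bars[i];
			struct window *w = &avail[b->type];
			uint64_t sz = align_up(b->size, align[b->type]);

			if (!sz)
				continue;

			if (sz >= window_size(w))
				w->empty = 1;	/* BAR consumes the whole window */
			else
				w->end -= sz;
		}
	}
}
```

With the numbers from the log above (a 2M memory window at
0x10200000-0x103fffff and a 0x20000 BAR rounded up to 1M), this model
leaves 0x10200000-0x102fffff for distribution below the bridge.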
I don't understand the "bridge" part; it looks like that's basically
to use 4K alignment for I/O windows and 1M for memory windows?
Using "bridge" seems like a clunky way to figure that out.
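
As a sketch of the alternative, assuming the only goal is the standard
bridge window granularity (4K for I/O, 1M for memory), the alignment could
be picked from the resource type alone. The flag values and the "ex_"
names below are made up for this userspace illustration and are not the
kernel's IORESOURCE_* definitions; whether pci_resource_alignment() covers
cases this ignores is not something this sketch settles.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits; NOT the kernel's IORESOURCE_* values. */
#define EX_RES_IO	0x1u
#define EX_RES_MEM	0x2u
#define EX_RES_PREF	0x4u

#define EX_IO_WINDOW_ALIGN	0x1000ull	/* 4K I/O window granularity */
#define EX_MEM_WINDOW_ALIGN	0x100000ull	/* 1M memory window granularity */

/* Pick the bridge window alignment from the resource type alone. */
static uint64_t ex_window_align(unsigned int flags)
{
	if (flags & EX_RES_IO)
		return EX_IO_WINDOW_ALIGN;
	if (flags & EX_RES_MEM)
		return EX_MEM_WINDOW_ALIGN;
	return 0;	/* neither I/O nor memory: nothing to account for */
}

/* Size of a BAR rounded up to the granularity of the window it lands in. */
static uint64_t ex_aligned_size(uint64_t size, unsigned int flags)
{
	uint64_t align = ex_window_align(flags);

	if (!align || !size)
		return 0;

	assert((align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

For example, ex_aligned_size(0x20000, EX_RES_MEM) rounds the e1000's
0x20000 BAR up to 0x100000 without looking at any bridge.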
Bjorn