From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: Alexander Motin <mav@ixsystems.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Lukas Wunner <lukas@wunner.de>,
Chris Chiu <chris.chiu@canonical.com>,
Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
linux-pci@vger.kernel.org
Subject: Re: [PATCH v4 1/2] PCI: Take other bus devices into account when distributing resources
Date: Tue, 10 Jan 2023 11:54:28 +0200 [thread overview]
Message-ID: <Y7011Eg89E1cnoSU@black.fi.intel.com> (raw)
In-Reply-To: <7bdef8bb-7a5e-2b5d-35ba-56cefb38d91f@ixsystems.com>
Hi,
On Mon, Jan 09, 2023 at 02:33:35PM -0500, Alexander Motin wrote:
> On 04.01.2023 04:16, Mika Westerberg wrote:
> > A PCI bridge may reside on a bus with other devices as well. The
> > resource distribution code does not take this into account properly and
> > therefore it expands the bridge resource windows too much, not leaving
> > space for the other devices (or functions of a multifunction device) and
> > this leads to an issue that Jonathan reported. He runs QEMU with the
> > following topology (QEMU parameters):
> >
> > -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2 \
> > -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on \
> > -device e1000,bus=root_port13,addr=0.1 \
> > -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3 \
> > -device e1000,bus=fun1
> >
> > The first e1000 NIC here is another function in the switch upstream
> > port. This leads to following errors:
> >
> > pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
> > pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
> > pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
> > e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> >
> > Fix this by taking into account the device BARs on the bus, including
> > the ones belonging to bridges themselves.
> >
> > Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
> > Link: https://lore.kernel.org/linux-pci/6053736d-1923-41e7-def9-7585ce1772d9@ixsystems.com/
> > Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Reported-by: Alexander Motin <mav@ixsystems.com>
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> > drivers/pci/setup-bus.c | 205 +++++++++++++++++++++++++---------------
> > 1 file changed, 129 insertions(+), 76 deletions(-)
> >
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index b4096598dbcb..cf6a7bdf2427 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1750,6 +1750,7 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> > resource_size_t new_size)
> > {
> > resource_size_t add_size, size = resource_size(res);
> > + resource_size_t min_size;
> > if (res->parent)
> > return;
> > @@ -1757,30 +1758,87 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> > if (!new_size)
> > return;
> > + /* Minimum granularity of a bridge bridge window */
> > + min_size = window_alignment(bridge->bus, res->flags);
> > +
> > if (new_size > size) {
> > add_size = new_size - size;
> > + if (add_size < min_size)
> > + return;
> > pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> > &add_size);
> > } else if (new_size < size) {
> > add_size = size - new_size;
> > + if (add_size < min_size)
> > + return;
>
> May be I don't understand something, but in what situation it may happen,
> and won't it be a problem if you silently do nothing here, while the calling
> code will use the passed new_size as-is?
Good point.
I saw this happening when I run QEMU but now that I started to look at
it again the reason seems to be that there is a 32-bit prefetchable
option ROM BAR for the e1000 NIC and that gets reduced incorrectly by
the reduce_dev_resources() which causes the "imbalance".
> > pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> > &add_size);
> > + } else {
> > + return;
> > }
> > res->end = res->start + new_size - 1;
> > remove_from_list(add_list, res);
> > }
> > +static void reduce_dev_resources(struct pci_dev *dev, struct resource *io,
> > + struct resource *mmio,
> > + struct resource *mmio_pref)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> > + struct resource *res = &dev->resource[i];
> > + resource_size_t align, tmp, size;
> > +
> > + size = resource_size(res);
> > + if (!size)
> > + continue;
> > +
> > + align = pci_resource_alignment(dev, res);
> > +
> > + if (resource_type(res) == IORESOURCE_IO) {
> > + align = align ? ALIGN(io->start, align) - io->start : 0;
> > + tmp = align + size;
> > + io->start = min(io->start + tmp, io->end + 1);
> > + } else if (resource_type(res) == IORESOURCE_MEM) {
> > + if (res->flags & IORESOURCE_PREFETCH) {
> > + align = align ? ALIGN(mmio_pref->start, align) -
> > + mmio_pref->start : 0;
> > + tmp = align + size;
> > + mmio_pref->start = min(mmio_pref->start + tmp,
> > + mmio_pref->end + 1);
> > + } else {
> > + align = align ? ALIGN(mmio->start, align) -
> > + mmio->start : 0;
> > + tmp = align + size;
> > + mmio->start = min(mmio->start + tmp,
> > + mmio->end + 1);
> > + }
> > + }
> > + }
> > +}
> > +
> > +/*
> > + * io, mmio and mmio_pref contain the total amount of bridge window
> > + * space available. This includes the minimal space needed to cover all
> > + * the existing devices on the bus and the possible extra space that can
> > + * be shared with the bridges.
> > + *
> > + * The resource space consumed by bus->self (the bridge) is already
> > + * reduced.
> > + */
> > static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> > struct list_head *add_list,
> > struct resource io,
> > struct resource mmio,
> > struct resource mmio_pref)
> > {
> > + resource_size_t io_align, mmio_align, mmio_pref_align;
> > + resource_size_t io_per_b, mmio_per_b, mmio_pref_per_b;
> > unsigned int normal_bridges = 0, hotplug_bridges = 0;
> > struct resource *io_res, *mmio_res, *mmio_pref_res;
> > struct pci_dev *dev, *bridge = bus->self;
> > - resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
> > io_res = &bridge->resource[PCI_BRIDGE_IO_WINDOW];
> > mmio_res = &bridge->resource[PCI_BRIDGE_MEM_WINDOW];
> > @@ -1790,17 +1848,17 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> > * The alignment of this bridge is yet to be considered, hence it must
> > * be done now before extending its bridge window.
> > */
> > - align = pci_resource_alignment(bridge, io_res);
> > - if (!io_res->parent && align)
> > - io.start = min(ALIGN(io.start, align), io.end + 1);
> > + io_align = pci_resource_alignment(bridge, io_res);
> > + if (!io_res->parent && io_align)
> > + io.start = min(ALIGN(io.start, io_align), io.end + 1);
> > - align = pci_resource_alignment(bridge, mmio_res);
> > - if (!mmio_res->parent && align)
> > - mmio.start = min(ALIGN(mmio.start, align), mmio.end + 1);
> > + mmio_align = pci_resource_alignment(bridge, mmio_res);
> > + if (!mmio_res->parent && mmio_align)
> > + mmio.start = min(ALIGN(mmio.start, mmio_align), mmio.end + 1);
> > - align = pci_resource_alignment(bridge, mmio_pref_res);
> > - if (!mmio_pref_res->parent && align)
> > - mmio_pref.start = min(ALIGN(mmio_pref.start, align),
> > + mmio_pref_align = pci_resource_alignment(bridge, mmio_pref_res);
> > + if (!mmio_pref_res->parent && mmio_pref_align)
> > + mmio_pref.start = min(ALIGN(mmio_pref.start, mmio_pref_align),
> > mmio_pref.end + 1);
> > /*
> > @@ -1824,94 +1882,89 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> > normal_bridges++;
> > }
> > - /*
> > - * There is only one bridge on the bus so it gets all available
> > - * resources which it can then distribute to the possible hotplug
> > - * bridges below.
> > - */
> > - if (hotplug_bridges + normal_bridges == 1) {
> > - dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > - if (dev->subordinate)
> > - pci_bus_distribute_available_resources(dev->subordinate,
> > - add_list, io, mmio, mmio_pref);
> > - return;
> > - }
> > -
> > - if (hotplug_bridges == 0)
> > + if (!(hotplug_bridges + normal_bridges))
> > return;
> > /*
> > * Calculate the total amount of extra resource space we can
> > - * pass to bridges below this one. This is basically the
> > - * extra space reduced by the minimal required space for the
> > - * non-hotplug bridges.
> > + * pass to bridges below this one. This is basically the extra
> > + * space reduced by the minimal required space for the bridge
> > + * windows and device BARs on this bus.
> > */
> > - for_each_pci_bridge(dev, bus) {
> > - resource_size_t used_size;
> > - struct resource *res;
> > -
> > - if (dev->is_hotplug_bridge)
> > - continue;
> > -
> > - /*
> > - * Reduce the available resource space by what the
> > - * bridge and devices below it occupy.
> > - */
> > - res = &dev->resource[PCI_BRIDGE_IO_WINDOW];
> > - align = pci_resource_alignment(dev, res);
> > - align = align ? ALIGN(io.start, align) - io.start : 0;
> > - used_size = align + resource_size(res);
> > - if (!res->parent)
> > - io.start = min(io.start + used_size, io.end + 1);
> > -
> > - res = &dev->resource[PCI_BRIDGE_MEM_WINDOW];
> > - align = pci_resource_alignment(dev, res);
> > - align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> > - used_size = align + resource_size(res);
> > - if (!res->parent)
> > - mmio.start = min(mmio.start + used_size, mmio.end + 1);
> > + list_for_each_entry(dev, &bus->devices, bus_list)
> > + reduce_dev_resources(dev, &io, &mmio, &mmio_pref);
> > - res = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW];
> > - align = pci_resource_alignment(dev, res);
> > - align = align ? ALIGN(mmio_pref.start, align) -
> > - mmio_pref.start : 0;
> > - used_size = align + resource_size(res);
> > - if (!res->parent)
> > - mmio_pref.start = min(mmio_pref.start + used_size,
> > - mmio_pref.end + 1);
> > + /*
> > + * If there is at least one hotplug bridge on this bus it gets
> > + * all the extra resource space that was left after the
> > + * reductions above.
> > + *
> > + * If there are no hotplug bridges the extra resource space is
> > + * split between non-hotplug bridges. This is to allow possible
> > + * hotplug bridges below them to get the extra space as well.
> > + */
> > + if (hotplug_bridges) {
> > + io_per_b = div64_ul(resource_size(&io), hotplug_bridges);
> > + mmio_per_b = div64_ul(resource_size(&mmio), hotplug_bridges);
> > + mmio_pref_per_b = div64_ul(resource_size(&mmio_pref),
> > + hotplug_bridges);
> > + } else {
> > + io_per_b = div64_ul(resource_size(&io), normal_bridges);
> > + mmio_per_b = div64_ul(resource_size(&mmio), normal_bridges);
> > + mmio_pref_per_b = div64_ul(resource_size(&mmio_pref),
> > + normal_bridges);
> > }
> > - io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
> > - mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
> > - mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> > - hotplug_bridges);
> > -
> > /*
> > - * Go over devices on this bus and distribute the remaining
> > - * resource space between hotplug bridges.
> > + * Make sure the split resource space is properly aligned for
> > + * bridge windows (align it down to avoid going above what is
> > + * available).
> > */
> > + if (io_align)
> > + io_per_b = ALIGN_DOWN(io_per_b, io_align);
> > + if (mmio_align)
> > + mmio_per_b = ALIGN_DOWN(mmio_per_b, mmio_align);
> > + if (mmio_pref_align)
> > + mmio_pref_per_b = ALIGN_DOWN(mmio_pref_per_b, mmio_align);
>
> If I understand it right, you are applying alignment requirements of the
> parent bridge to the windows of its children. I don't have examples of any
> bridge with different alignment, but shouldn't we better get and use proper
> alignment inside the loop below?
Yes, I agree.
next prev parent reply other threads:[~2023-01-10 9:54 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-04 9:16 [PATCH v4 0/2] PCI: distribute resources for root buses Mika Westerberg
2023-01-04 9:16 ` [PATCH v4 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
2023-01-04 22:48 ` Bjorn Helgaas
2023-01-05 9:12 ` Mika Westerberg
2023-01-05 13:43 ` Mika Westerberg
2023-01-05 17:04 ` Bjorn Helgaas
2023-01-09 11:11 ` Mika Westerberg
2023-01-09 18:27 ` Alexander Motin
2023-01-10 10:07 ` Mika Westerberg
2023-01-09 19:33 ` Alexander Motin
2023-01-10 9:54 ` Mika Westerberg [this message]
2023-01-04 9:16 ` [PATCH v4 2/2] PCI: Distribute available resources for root buses too Mika Westerberg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y7011Eg89E1cnoSU@black.fi.intel.com \
--to=mika.westerberg@linux.intel.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=andriy.shevchenko@linux.intel.com \
--cc=bhelgaas@google.com \
--cc=chris.chiu@canonical.com \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mav@ixsystems.com \
--cc=nicholas.johnson-opensource@outlook.com.au \
--cc=rafael.j.wysocki@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).