From: Bjorn Helgaas <helgaas@kernel.org>
To: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Lukas Wunner <lukas@wunner.de>,
	Chris Chiu <chris.chiu@canonical.com>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH v3 1/2] PCI: Take other bus devices into account when distributing resources
Date: Fri, 2 Dec 2022 17:34:24 -0600
Message-ID: <20221202233424.GA1070935@bhelgaas>
In-Reply-To: <20221130112221.66612-2-mika.westerberg@linux.intel.com>

Hi Mika,

On Wed, Nov 30, 2022 at 01:22:20PM +0200, Mika Westerberg wrote:
> A PCI bridge may reside on a bus with other devices as well. The
> resource distribution code does not take this into account properly and
> therefore it expands the bridge resource windows too much, not leaving
> space for the other devices (or functions a multifunction device) and

functions *of* a 

> this leads to an issue that Jonathan reported. He runs QEMU with the
> following topoology (QEMU parameters):

topology

>  -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
>  -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
>  -device e1000,bus=root_port13,addr=0.1 			\
>  -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
>  -device e1000,bus=fun1

If you use spaces instead of tabs above, the "\" will stay lined up
when git log indents.

> The first e1000 NIC here is another function in the switch upstream
> port. This leads to following errors:
> 
>   pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
>   pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
>   pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
>   e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> 
> Fix this by taking into account the possible multifunction devices when
> uptream port resources are distributed.

"upstream", although I think I would word this so it's less
PCIe-centric.  IIUC, we just want to account for all the BARs on the
bus, whether they're in bridges, peers in a multi-function device, or
other devices.

> Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/setup-bus.c | 66 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index b4096598dbcb..d456175ddc4f 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1830,10 +1830,68 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	 * bridges below.
>  	 */
>  	if (hotplug_bridges + normal_bridges == 1) {
> -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -		if (dev->subordinate)
> -			pci_bus_distribute_available_resources(dev->subordinate,
> -				add_list, io, mmio, mmio_pref);
> +		bridge = NULL;
> +
> +		/* Find the single bridge on this bus first */
> +		for_each_pci_bridge(dev, bus) {
> +			bridge = dev;
> +			break;
> +		}

If we just remember "bridge" in the loop before this hunk, could we
get rid of the loop here?  E.g.,

  bridge = NULL;
  for_each_pci_bridge(dev, bus) {
    bridge = dev;
    if (dev->is_hotplug_bridge)
      hotplug_bridges++;
    else
      normal_bridges++;
  }

> +
> +		if (WARN_ON_ONCE(!bridge))
> +			return;

Then I think this WARN_ON_ONCE() check would be superfluous.

> +		if (!bridge->subordinate)
> +			return;
> +
> +		/*
> +		 * Reduce the space available for distribution by the
> +		 * amount required by the other devices on the same bus
> +		 * as this bridge.
> +		 */
> +		list_for_each_entry(dev, &bus->devices, bus_list) {
> +			int i;
> +
> +			if (dev == bridge)
> +				continue;

Why do we skip "bridge"?  Bridges are allowed to have two BARs
themselves, and it seems like they should be included here.

> +			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +				const struct resource *dev_res = &dev->resource[i];
> +				resource_size_t dev_sz;
> +				struct resource *b_res;
> +
> +				if (dev_res->flags & IORESOURCE_IO) {
> +					b_res = &io;
> +				} else if (dev_res->flags & IORESOURCE_MEM) {
> +					if (dev_res->flags & IORESOURCE_PREFETCH)
> +						b_res = &mmio_pref;
> +					else
> +						b_res = &mmio;
> +				} else {
> +					continue;
> +				}
> +
> +				/* Size aligned to bridge window */
> +				align = pci_resource_alignment(bridge, b_res);
> +				dev_sz = ALIGN(resource_size(dev_res), align);
> +				if (!dev_sz)
> +					continue;
> +
> +				pci_dbg(dev, "resource %pR aligned to %#llx\n",
> +					dev_res, (unsigned long long)dev_sz);
> +
> +				if (dev_sz > resource_size(b_res))
> +					memset(b_res, 0, sizeof(*b_res));
> +				else
> +					b_res->end -= dev_sz;
> +
> +				pci_dbg(bridge, "updated available resources to %pR\n",
> +					b_res);
> +			}
> +		}

This only happens for buses with a single bridge.  Shouldn't it happen
regardless of how many bridges there are?

This block feels like something that could be split out to a separate
function.  It looks like it only needs "bus", "io", "mmio",
"mmio_pref", and maybe "bridge".

I don't understand the "bridge" part; it looks like that's basically
to use 4K alignment for I/O windows and 1M for memory windows?
Using "bridge" seems like a clunky way to figure that out.
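
To make the last two points concrete, here is a rough sketch of what
such a helper might look like.  It is a userspace model, not kernel
code: "struct model_res", the MODEL_* flags, and every function name
below are hypothetical stand-ins, and the 4K/1M window granularity is
just the assumption mentioned above.  The alignment is picked from the
resource type alone, so no "bridge" argument is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's IORESOURCE_* flags. */
#define MODEL_IO	0x1u
#define MODEL_MEM	0x2u
#define MODEL_PREFETCH	0x4u

/* Assumed bridge window granularity: 4K for I/O, 1M for memory. */
#define MODEL_IO_WINDOW_ALIGN	0x1000ull
#define MODEL_MEM_WINDOW_ALIGN	0x100000ull

/* Simplified stand-in for struct resource; "end" is inclusive. */
struct model_res {
	uint64_t start;
	uint64_t end;
	unsigned int flags;	/* 0 means "unused" */
};

static uint64_t model_res_size(const struct model_res *res)
{
	return res->flags ? res->end - res->start + 1 : 0;
}

/* Window alignment derived from the resource type, without a bridge. */
static uint64_t model_window_align(unsigned int flags)
{
	return (flags & MODEL_IO) ? MODEL_IO_WINDOW_ALIGN :
				    MODEL_MEM_WINDOW_ALIGN;
}

static uint64_t model_align_up(uint64_t size, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);	/* power of two */
	return (size + align - 1) & ~(align - 1);
}

/*
 * Shrink the io/mmio/mmio_pref windows by the space the given device
 * resources need, each rounded up to the granularity of its window.
 */
static void model_remove_dev_resources(const struct model_res *dev_res,
				       size_t count,
				       struct model_res *io,
				       struct model_res *mmio,
				       struct model_res *mmio_pref)
{
	size_t i;

	for (i = 0; i < count; i++) {
		unsigned int flags = dev_res[i].flags;
		struct model_res *win;
		uint64_t size;

		if (flags & MODEL_IO)
			win = io;
		else if (flags & MODEL_MEM)
			win = (flags & MODEL_PREFETCH) ? mmio_pref : mmio;
		else
			continue;

		size = model_align_up(model_res_size(&dev_res[i]),
				      model_window_align(flags));
		if (!size)
			continue;

		if (size > model_res_size(win))
			memset(win, 0, sizeof(*win));	/* nothing left */
		else
			win->end -= size;
	}
}
```

With the numbers from the log above, a 0x20000 BAR rounds up to 1M, so
the [mem 0x10200000-0x103fffff] window would shrink to
[mem 0x10200000-0x102fffff] before being handed to the bridge below.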

Bjorn
