linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mika Westerberg <mika.westerberg@linux.intel.com>
To: Alexander Motin <mav@ixsystems.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Lukas Wunner <lukas@wunner.de>,
	Chris Chiu <chris.chiu@canonical.com>,
	Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH v4 1/2] PCI: Take other bus devices into account when distributing resources
Date: Tue, 10 Jan 2023 11:54:28 +0200	[thread overview]
Message-ID: <Y7011Eg89E1cnoSU@black.fi.intel.com> (raw)
In-Reply-To: <7bdef8bb-7a5e-2b5d-35ba-56cefb38d91f@ixsystems.com>

Hi,

On Mon, Jan 09, 2023 at 02:33:35PM -0500, Alexander Motin wrote:
> On 04.01.2023 04:16, Mika Westerberg wrote:
> > A PCI bridge may reside on a bus with other devices as well. The
> > resource distribution code does not take this into account properly and
> > therefore it expands the bridge resource windows too much, not leaving
> > space for the other devices (or functions of a multifunction device) and
> > this leads to an issue that Jonathan reported. He runs QEMU with the
> > following topology (QEMU parameters):
> > 
> >   -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2  \
> >   -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on \
> >   -device e1000,bus=root_port13,addr=0.1                         \
> >   -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3    \
> >   -device e1000,bus=fun1
> > 
> > The first e1000 NIC here is another function in the switch upstream
> > port. This leads to following errors:
> > 
> >    pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
> >    pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
> >    pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
> >    e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> > 
> > Fix this by taking into account the device BARs on the bus, including
> > the ones belonging to bridges themselves.
> > 
> > Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
> > Link: https://lore.kernel.org/linux-pci/6053736d-1923-41e7-def9-7585ce1772d9@ixsystems.com/
> > Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Reported-by: Alexander Motin <mav@ixsystems.com>
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> >   drivers/pci/setup-bus.c | 205 +++++++++++++++++++++++++---------------
> >   1 file changed, 129 insertions(+), 76 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index b4096598dbcb..cf6a7bdf2427 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1750,6 +1750,7 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> >   				 resource_size_t new_size)
> >   {
> >   	resource_size_t add_size, size = resource_size(res);
> > +	resource_size_t min_size;
> >   	if (res->parent)
> >   		return;
> > @@ -1757,30 +1758,87 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> >   	if (!new_size)
> >   		return;
> > +	/* Minimum granularity of a bridge bridge window */
> > +	min_size = window_alignment(bridge->bus, res->flags);
> > +
> >   	if (new_size > size) {
> >   		add_size = new_size - size;
> > +		if (add_size < min_size)
> > +			return;
> >   		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> >   			&add_size);
> >   	} else if (new_size < size) {
> >   		add_size = size - new_size;
> > +		if (add_size < min_size)
> > +			return;
> 
> May be I don't understand something, but in what situation it may happen,
> and won't it be a problem if you silently do nothing here, while the calling
> code will use the passed new_size as-is?

Good point.

I saw this happening when I run QEMU but now that I started to look at
it again the reason seems to be that there is a 32-bit prefetchable
option ROM BAR for the e1000 NIC and that gets reduced incorrectly by
the reduce_dev_resources() which causes the "imbalance".

> >   		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> >   			&add_size);
> > +	} else {
> > +		return;
> >   	}
> >   	res->end = res->start + new_size - 1;
> >   	remove_from_list(add_list, res);
> >   }
> > +static void reduce_dev_resources(struct pci_dev *dev, struct resource *io,
> > +				 struct resource *mmio,
> > +				 struct resource *mmio_pref)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> > +		struct resource *res = &dev->resource[i];
> > +		resource_size_t align, tmp, size;
> > +
> > +		size = resource_size(res);
> > +		if (!size)
> > +			continue;
> > +
> > +		align = pci_resource_alignment(dev, res);
> > +
> > +		if (resource_type(res) == IORESOURCE_IO) {
> > +			align = align ? ALIGN(io->start, align) - io->start : 0;
> > +			tmp = align + size;
> > +			io->start = min(io->start + tmp, io->end + 1);
> > +		} else if (resource_type(res) == IORESOURCE_MEM) {
> > +			if (res->flags & IORESOURCE_PREFETCH) {
> > +				align = align ? ALIGN(mmio_pref->start, align) -
> > +						mmio_pref->start : 0;
> > +				tmp = align + size;
> > +				mmio_pref->start = min(mmio_pref->start + tmp,
> > +						       mmio_pref->end + 1);
> > +			} else {
> > +				align = align ? ALIGN(mmio->start, align) -
> > +						mmio->start : 0;
> > +				tmp = align + size;
> > +				mmio->start = min(mmio->start + tmp,
> > +						  mmio->end + 1);
> > +			}
> > +		}
> > +	}
> > +}
> > +
> > +/*
> > + * io, mmio and mmio_pref contain the total amount of bridge window
> > + * space available. This includes the minimal space needed to cover all
> > + * the existing devices on the bus and the possible extra space that can
> > + * be shared with the bridges.
> > + *
> > + * The resource space consumed by bus->self (the bridge) is already
> > + * reduced.
> > + */
> >   static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >   					    struct list_head *add_list,
> >   					    struct resource io,
> >   					    struct resource mmio,
> >   					    struct resource mmio_pref)
> >   {
> > +	resource_size_t io_align, mmio_align, mmio_pref_align;
> > +	resource_size_t io_per_b, mmio_per_b, mmio_pref_per_b;
> >   	unsigned int normal_bridges = 0, hotplug_bridges = 0;
> >   	struct resource *io_res, *mmio_res, *mmio_pref_res;
> >   	struct pci_dev *dev, *bridge = bus->self;
> > -	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
> >   	io_res = &bridge->resource[PCI_BRIDGE_IO_WINDOW];
> >   	mmio_res = &bridge->resource[PCI_BRIDGE_MEM_WINDOW];
> > @@ -1790,17 +1848,17 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >   	 * The alignment of this bridge is yet to be considered, hence it must
> >   	 * be done now before extending its bridge window.
> >   	 */
> > -	align = pci_resource_alignment(bridge, io_res);
> > -	if (!io_res->parent && align)
> > -		io.start = min(ALIGN(io.start, align), io.end + 1);
> > +	io_align = pci_resource_alignment(bridge, io_res);
> > +	if (!io_res->parent && io_align)
> > +		io.start = min(ALIGN(io.start, io_align), io.end + 1);
> > -	align = pci_resource_alignment(bridge, mmio_res);
> > -	if (!mmio_res->parent && align)
> > -		mmio.start = min(ALIGN(mmio.start, align), mmio.end + 1);
> > +	mmio_align = pci_resource_alignment(bridge, mmio_res);
> > +	if (!mmio_res->parent && mmio_align)
> > +		mmio.start = min(ALIGN(mmio.start, mmio_align), mmio.end + 1);
> > -	align = pci_resource_alignment(bridge, mmio_pref_res);
> > -	if (!mmio_pref_res->parent && align)
> > -		mmio_pref.start = min(ALIGN(mmio_pref.start, align),
> > +	mmio_pref_align = pci_resource_alignment(bridge, mmio_pref_res);
> > +	if (!mmio_pref_res->parent && mmio_pref_align)
> > +		mmio_pref.start = min(ALIGN(mmio_pref.start, mmio_pref_align),
> >   			mmio_pref.end + 1);
> >   	/*
> > @@ -1824,94 +1882,89 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >   			normal_bridges++;
> >   	}
> > -	/*
> > -	 * There is only one bridge on the bus so it gets all available
> > -	 * resources which it can then distribute to the possible hotplug
> > -	 * bridges below.
> > -	 */
> > -	if (hotplug_bridges + normal_bridges == 1) {
> > -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > -		if (dev->subordinate)
> > -			pci_bus_distribute_available_resources(dev->subordinate,
> > -				add_list, io, mmio, mmio_pref);
> > -		return;
> > -	}
> > -
> > -	if (hotplug_bridges == 0)
> > +	if (!(hotplug_bridges + normal_bridges))
> >   		return;
> >   	/*
> >   	 * Calculate the total amount of extra resource space we can
> > -	 * pass to bridges below this one.  This is basically the
> > -	 * extra space reduced by the minimal required space for the
> > -	 * non-hotplug bridges.
> > +	 * pass to bridges below this one. This is basically the extra
> > +	 * space reduced by the minimal required space for the bridge
> > +	 * windows and device BARs on this bus.
> >   	 */
> > -	for_each_pci_bridge(dev, bus) {
> > -		resource_size_t used_size;
> > -		struct resource *res;
> > -
> > -		if (dev->is_hotplug_bridge)
> > -			continue;
> > -
> > -		/*
> > -		 * Reduce the available resource space by what the
> > -		 * bridge and devices below it occupy.
> > -		 */
> > -		res = &dev->resource[PCI_BRIDGE_IO_WINDOW];
> > -		align = pci_resource_alignment(dev, res);
> > -		align = align ? ALIGN(io.start, align) - io.start : 0;
> > -		used_size = align + resource_size(res);
> > -		if (!res->parent)
> > -			io.start = min(io.start + used_size, io.end + 1);
> > -
> > -		res = &dev->resource[PCI_BRIDGE_MEM_WINDOW];
> > -		align = pci_resource_alignment(dev, res);
> > -		align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> > -		used_size = align + resource_size(res);
> > -		if (!res->parent)
> > -			mmio.start = min(mmio.start + used_size, mmio.end + 1);
> > +	list_for_each_entry(dev, &bus->devices, bus_list)
> > +		reduce_dev_resources(dev, &io, &mmio, &mmio_pref);
> > -		res = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW];
> > -		align = pci_resource_alignment(dev, res);
> > -		align = align ? ALIGN(mmio_pref.start, align) -
> > -			mmio_pref.start : 0;
> > -		used_size = align + resource_size(res);
> > -		if (!res->parent)
> > -			mmio_pref.start = min(mmio_pref.start + used_size,
> > -				mmio_pref.end + 1);
> > +	/*
> > +	 * If there is at least one hotplug bridge on this bus it gets
> > +	 * all the extra resource space that was left after the
> > +	 * reductions above.
> > +	 *
> > +	 * If there are no hotplug bridges the extra resource space is
> > +	 * split between non-hotplug bridges. This is to allow possible
> > +	 * hotplug bridges below them to get the extra space as well.
> > +	 */
> > +	if (hotplug_bridges) {
> > +		io_per_b = div64_ul(resource_size(&io), hotplug_bridges);
> > +		mmio_per_b = div64_ul(resource_size(&mmio), hotplug_bridges);
> > +		mmio_pref_per_b = div64_ul(resource_size(&mmio_pref),
> > +					   hotplug_bridges);
> > +	} else {
> > +		io_per_b = div64_ul(resource_size(&io), normal_bridges);
> > +		mmio_per_b = div64_ul(resource_size(&mmio), normal_bridges);
> > +		mmio_pref_per_b = div64_ul(resource_size(&mmio_pref),
> > +					   normal_bridges);
> >   	}
> > -	io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
> > -	mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
> > -	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> > -		hotplug_bridges);
> > -
> >   	/*
> > -	 * Go over devices on this bus and distribute the remaining
> > -	 * resource space between hotplug bridges.
> > +	 * Make sure the split resource space is properly aligned for
> > +	 * bridge windows (align it down to avoid going above what is
> > +	 * available).
> >   	 */
> > +	if (io_align)
> > +		io_per_b = ALIGN_DOWN(io_per_b, io_align);
> > +	if (mmio_align)
> > +		mmio_per_b = ALIGN_DOWN(mmio_per_b, mmio_align);
> > +	if (mmio_pref_align)
> > +		mmio_pref_per_b = ALIGN_DOWN(mmio_pref_per_b, mmio_align);
> 
> If I understand it right, you are applying alignment requirements of the
> parent bridge to the windows of its children.  I don't have examples of any
> bridge with different alignment, but shouldn't we better get and use proper
> alignment inside the loop below?

Yes, I agree.

  reply	other threads:[~2023-01-10  9:54 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-04  9:16 [PATCH v4 0/2] PCI: distribute resources for root buses Mika Westerberg
2023-01-04  9:16 ` [PATCH v4 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
2023-01-04 22:48   ` Bjorn Helgaas
2023-01-05  9:12     ` Mika Westerberg
2023-01-05 13:43       ` Mika Westerberg
2023-01-05 17:04         ` Bjorn Helgaas
2023-01-09 11:11           ` Mika Westerberg
2023-01-09 18:27             ` Alexander Motin
2023-01-10 10:07               ` Mika Westerberg
2023-01-09 19:33   ` Alexander Motin
2023-01-10  9:54     ` Mika Westerberg [this message]
2023-01-04  9:16 ` [PATCH v4 2/2] PCI: Distribute available resources for root buses too Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y7011Eg89E1cnoSU@black.fi.intel.com \
    --to=mika.westerberg@linux.intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=chris.chiu@canonical.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mav@ixsystems.com \
    --cc=nicholas.johnson-opensource@outlook.com.au \
    --cc=rafael.j.wysocki@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).