linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Motin <mav@ixsystems.com>
To: Mika Westerberg <mika.westerberg@linux.intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Lukas Wunner <lukas@wunner.de>,
	Chris Chiu <chris.chiu@canonical.com>,
	Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH v4 1/2] PCI: Take other bus devices into account when distributing resources
Date: Mon, 9 Jan 2023 14:33:35 -0500	[thread overview]
Message-ID: <7bdef8bb-7a5e-2b5d-35ba-56cefb38d91f@ixsystems.com> (raw)
In-Reply-To: <20230104091635.63331-2-mika.westerberg@linux.intel.com>

On 04.01.2023 04:16, Mika Westerberg wrote:
> A PCI bridge may reside on a bus with other devices as well. The
> resource distribution code does not take this into account properly and
> therefore it expands the bridge resource windows too much, not leaving
> space for the other devices (or functions of a multifunction device) and
> this leads to an issue that Jonathan reported. He runs QEMU with the
> following topology (QEMU parameters):
> 
>   -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2  \
>   -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on \
>   -device e1000,bus=root_port13,addr=0.1                         \
>   -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3    \
>   -device e1000,bus=fun1
> 
> The first e1000 NIC here is another function in the switch upstream
> port. This leads to following errors:
> 
>    pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
>    pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
>    pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
>    e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> 
> Fix this by taking into account the device BARs on the bus, including
> the ones belonging to bridges themselves.
> 
> Link: https://lore.kernel.org/linux-pci/20221014124553.0000696f@huawei.com/
> Link: https://lore.kernel.org/linux-pci/6053736d-1923-41e7-def9-7585ce1772d9@ixsystems.com/
> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reported-by: Alexander Motin <mav@ixsystems.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>   drivers/pci/setup-bus.c | 205 +++++++++++++++++++++++++---------------
>   1 file changed, 129 insertions(+), 76 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index b4096598dbcb..cf6a7bdf2427 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1750,6 +1750,7 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
>   				 resource_size_t new_size)
>   {
>   	resource_size_t add_size, size = resource_size(res);
> +	resource_size_t min_size;
>   
>   	if (res->parent)
>   		return;
> @@ -1757,30 +1758,87 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
>   	if (!new_size)
>   		return;
>   
> +	/* Minimum granularity of a bridge bridge window */
> +	min_size = window_alignment(bridge->bus, res->flags);
> +
>   	if (new_size > size) {
>   		add_size = new_size - size;
> +		if (add_size < min_size)
> +			return;
>   		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
>   			&add_size);
>   	} else if (new_size < size) {
>   		add_size = size - new_size;
> +		if (add_size < min_size)
> +			return;

May be I don't understand something, but in what situation it may 
happen, and won't it be a problem if you silently do nothing here, while 
the calling code will use the passed new_size as-is?

>   		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
>   			&add_size);
> +	} else {
> +		return;
>   	}
>   
>   	res->end = res->start + new_size - 1;
>   	remove_from_list(add_list, res);
>   }
>   
> +static void reduce_dev_resources(struct pci_dev *dev, struct resource *io,
> +				 struct resource *mmio,
> +				 struct resource *mmio_pref)
> +{
> +	int i;
> +
> +	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +		struct resource *res = &dev->resource[i];
> +		resource_size_t align, tmp, size;
> +
> +		size = resource_size(res);
> +		if (!size)
> +			continue;
> +
> +		align = pci_resource_alignment(dev, res);
> +
> +		if (resource_type(res) == IORESOURCE_IO) {
> +			align = align ? ALIGN(io->start, align) - io->start : 0;
> +			tmp = align + size;
> +			io->start = min(io->start + tmp, io->end + 1);
> +		} else if (resource_type(res) == IORESOURCE_MEM) {
> +			if (res->flags & IORESOURCE_PREFETCH) {
> +				align = align ? ALIGN(mmio_pref->start, align) -
> +						mmio_pref->start : 0;
> +				tmp = align + size;
> +				mmio_pref->start = min(mmio_pref->start + tmp,
> +						       mmio_pref->end + 1);
> +			} else {
> +				align = align ? ALIGN(mmio->start, align) -
> +						mmio->start : 0;
> +				tmp = align + size;
> +				mmio->start = min(mmio->start + tmp,
> +						  mmio->end + 1);
> +			}
> +		}
> +	}
> +}
> +
> +/*
> + * io, mmio and mmio_pref contain the total amount of bridge window
> + * space available. This includes the minimal space needed to cover all
> + * the existing devices on the bus and the possible extra space that can
> + * be shared with the bridges.
> + *
> + * The resource space consumed by bus->self (the bridge) is already
> + * reduced.
> + */
>   static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>   					    struct list_head *add_list,
>   					    struct resource io,
>   					    struct resource mmio,
>   					    struct resource mmio_pref)
>   {
> +	resource_size_t io_align, mmio_align, mmio_pref_align;
> +	resource_size_t io_per_b, mmio_per_b, mmio_pref_per_b;
>   	unsigned int normal_bridges = 0, hotplug_bridges = 0;
>   	struct resource *io_res, *mmio_res, *mmio_pref_res;
>   	struct pci_dev *dev, *bridge = bus->self;
> -	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
>   
>   	io_res = &bridge->resource[PCI_BRIDGE_IO_WINDOW];
>   	mmio_res = &bridge->resource[PCI_BRIDGE_MEM_WINDOW];
> @@ -1790,17 +1848,17 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>   	 * The alignment of this bridge is yet to be considered, hence it must
>   	 * be done now before extending its bridge window.
>   	 */
> -	align = pci_resource_alignment(bridge, io_res);
> -	if (!io_res->parent && align)
> -		io.start = min(ALIGN(io.start, align), io.end + 1);
> +	io_align = pci_resource_alignment(bridge, io_res);
> +	if (!io_res->parent && io_align)
> +		io.start = min(ALIGN(io.start, io_align), io.end + 1);
>   
> -	align = pci_resource_alignment(bridge, mmio_res);
> -	if (!mmio_res->parent && align)
> -		mmio.start = min(ALIGN(mmio.start, align), mmio.end + 1);
> +	mmio_align = pci_resource_alignment(bridge, mmio_res);
> +	if (!mmio_res->parent && mmio_align)
> +		mmio.start = min(ALIGN(mmio.start, mmio_align), mmio.end + 1);
>   
> -	align = pci_resource_alignment(bridge, mmio_pref_res);
> -	if (!mmio_pref_res->parent && align)
> -		mmio_pref.start = min(ALIGN(mmio_pref.start, align),
> +	mmio_pref_align = pci_resource_alignment(bridge, mmio_pref_res);
> +	if (!mmio_pref_res->parent && mmio_pref_align)
> +		mmio_pref.start = min(ALIGN(mmio_pref.start, mmio_pref_align),
>   			mmio_pref.end + 1);
>   
>   	/*
> @@ -1824,94 +1882,89 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>   			normal_bridges++;
>   	}
>   
> -	/*
> -	 * There is only one bridge on the bus so it gets all available
> -	 * resources which it can then distribute to the possible hotplug
> -	 * bridges below.
> -	 */
> -	if (hotplug_bridges + normal_bridges == 1) {
> -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -		if (dev->subordinate)
> -			pci_bus_distribute_available_resources(dev->subordinate,
> -				add_list, io, mmio, mmio_pref);
> -		return;
> -	}
> -
> -	if (hotplug_bridges == 0)
> +	if (!(hotplug_bridges + normal_bridges))
>   		return;
>   
>   	/*
>   	 * Calculate the total amount of extra resource space we can
> -	 * pass to bridges below this one.  This is basically the
> -	 * extra space reduced by the minimal required space for the
> -	 * non-hotplug bridges.
> +	 * pass to bridges below this one. This is basically the extra
> +	 * space reduced by the minimal required space for the bridge
> +	 * windows and device BARs on this bus.
>   	 */
> -	for_each_pci_bridge(dev, bus) {
> -		resource_size_t used_size;
> -		struct resource *res;
> -
> -		if (dev->is_hotplug_bridge)
> -			continue;
> -
> -		/*
> -		 * Reduce the available resource space by what the
> -		 * bridge and devices below it occupy.
> -		 */
> -		res = &dev->resource[PCI_BRIDGE_IO_WINDOW];
> -		align = pci_resource_alignment(dev, res);
> -		align = align ? ALIGN(io.start, align) - io.start : 0;
> -		used_size = align + resource_size(res);
> -		if (!res->parent)
> -			io.start = min(io.start + used_size, io.end + 1);
> -
> -		res = &dev->resource[PCI_BRIDGE_MEM_WINDOW];
> -		align = pci_resource_alignment(dev, res);
> -		align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> -		used_size = align + resource_size(res);
> -		if (!res->parent)
> -			mmio.start = min(mmio.start + used_size, mmio.end + 1);
> +	list_for_each_entry(dev, &bus->devices, bus_list)
> +		reduce_dev_resources(dev, &io, &mmio, &mmio_pref);
>   
> -		res = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW];
> -		align = pci_resource_alignment(dev, res);
> -		align = align ? ALIGN(mmio_pref.start, align) -
> -			mmio_pref.start : 0;
> -		used_size = align + resource_size(res);
> -		if (!res->parent)
> -			mmio_pref.start = min(mmio_pref.start + used_size,
> -				mmio_pref.end + 1);
> +	/*
> +	 * If there is at least one hotplug bridge on this bus it gets
> +	 * all the extra resource space that was left after the
> +	 * reductions above.
> +	 *
> +	 * If there are no hotplug bridges the extra resource space is
> +	 * split between non-hotplug bridges. This is to allow possible
> +	 * hotplug bridges below them to get the extra space as well.
> +	 */
> +	if (hotplug_bridges) {
> +		io_per_b = div64_ul(resource_size(&io), hotplug_bridges);
> +		mmio_per_b = div64_ul(resource_size(&mmio), hotplug_bridges);
> +		mmio_pref_per_b = div64_ul(resource_size(&mmio_pref),
> +					   hotplug_bridges);
> +	} else {
> +		io_per_b = div64_ul(resource_size(&io), normal_bridges);
> +		mmio_per_b = div64_ul(resource_size(&mmio), normal_bridges);
> +		mmio_pref_per_b = div64_ul(resource_size(&mmio_pref),
> +					   normal_bridges);
>   	}
>   
> -	io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
> -	mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
> -	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> -		hotplug_bridges);
> -
>   	/*
> -	 * Go over devices on this bus and distribute the remaining
> -	 * resource space between hotplug bridges.
> +	 * Make sure the split resource space is properly aligned for
> +	 * bridge windows (align it down to avoid going above what is
> +	 * available).
>   	 */
> +	if (io_align)
> +		io_per_b = ALIGN_DOWN(io_per_b, io_align);
> +	if (mmio_align)
> +		mmio_per_b = ALIGN_DOWN(mmio_per_b, mmio_align);
> +	if (mmio_pref_align)
> +		mmio_pref_per_b = ALIGN_DOWN(mmio_pref_per_b, mmio_align);

If I understand it right, you are applying alignment requirements of the 
parent bridge to the windows of its children.  I don't have examples of 
any bridge with different alignment, but shouldn't we better get and use 
proper alignment inside the loop below?

> +
>   	for_each_pci_bridge(dev, bus) {
> +		resource_size_t allocated_io, allocated_mmio, allocated_mmio_pref;
> +		const struct resource *res;
>   		struct pci_bus *b;
>   
>   		b = dev->subordinate;
> -		if (!b || !dev->is_hotplug_bridge)
> +		if (!b)
> +			continue;
> +		if (hotplug_bridges && !dev->is_hotplug_bridge)
>   			continue;
>   
> +		io.end = io.start + io_per_b - 1;
>   		/*
> -		 * Distribute available extra resources equally between
> -		 * hotplug-capable downstream ports taking alignment into
> -		 * account.
> +		 * The x_per_b holds the extra resource space that can
> +		 * be added for each bridge but there is the minimal
> +		 * already reserved as well so adjust x.start down
> +		 * accordingly to cover the whole space.
>   		 */
> -		io.end = io.start + io_per_hp - 1;
> -		mmio.end = mmio.start + mmio_per_hp - 1;
> -		mmio_pref.end = mmio_pref.start + mmio_pref_per_hp - 1;
> +		res = &dev->resource[PCI_BRIDGE_IO_WINDOW];
> +		allocated_io = resource_size(res);
> +		io.start -= allocated_io;
> +
> +		mmio.end = mmio.start + mmio_per_b - 1;
> +		res = &dev->resource[PCI_BRIDGE_MEM_WINDOW];
> +		allocated_mmio = resource_size(res);
> +		mmio.start -= allocated_mmio;
> +
> +		mmio_pref.end = mmio_pref.start + mmio_pref_per_b - 1;
> +		res = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW];
> +		allocated_mmio_pref = resource_size(res);
> +		mmio_pref.start -= allocated_mmio_pref;
>   
>   		pci_bus_distribute_available_resources(b, add_list, io, mmio,
>   						       mmio_pref);
>   
> -		io.start += io_per_hp;
> -		mmio.start += mmio_per_hp;
> -		mmio_pref.start += mmio_pref_per_hp;
> +		io.start += allocated_io + io_per_b;
> +		mmio.start += allocated_mmio + mmio_per_b;
> +		mmio_pref.start += allocated_mmio_pref + mmio_pref_per_b;
>   	}
>   }
>   

-- 
Alexander Motin

  parent reply	other threads:[~2023-01-09 19:33 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-04  9:16 [PATCH v4 0/2] PCI: distribute resources for root buses Mika Westerberg
2023-01-04  9:16 ` [PATCH v4 1/2] PCI: Take other bus devices into account when distributing resources Mika Westerberg
2023-01-04 22:48   ` Bjorn Helgaas
2023-01-05  9:12     ` Mika Westerberg
2023-01-05 13:43       ` Mika Westerberg
2023-01-05 17:04         ` Bjorn Helgaas
2023-01-09 11:11           ` Mika Westerberg
2023-01-09 18:27             ` Alexander Motin
2023-01-10 10:07               ` Mika Westerberg
2023-01-09 19:33   ` Alexander Motin [this message]
2023-01-10  9:54     ` Mika Westerberg
2023-01-04  9:16 ` [PATCH v4 2/2] PCI: Distribute available resources for root buses too Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7bdef8bb-7a5e-2b5d-35ba-56cefb38d91f@ixsystems.com \
    --to=mav@ixsystems.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=chris.chiu@canonical.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=nicholas.johnson-opensource@outlook.com.au \
    --cc=rafael.j.wysocki@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).