linux-pci.vger.kernel.org archive mirror
* [PATCH 1/2] PCI: Take multifunction devices into account when distributing resources
@ 2022-11-03 10:32 Mika Westerberg
  2022-11-03 10:32 ` [PATCH 2/2] Revert "Revert "PCI: Distribute available resources for root buses, too"" Mika Westerberg
  2022-11-08 21:11 ` [PATCH 1/2] PCI: Take multifunction devices into account when distributing resources Bjorn Helgaas
  0 siblings, 2 replies; 4+ messages in thread
From: Mika Westerberg @ 2022-11-03 10:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J . Wysocki, Andy Shevchenko, Jonathan Cameron,
	Lukas Wunner, Chris Chiu, linux-pci, Mika Westerberg

It is possible to have PCIe switch upstream port a multifunction device.
The resource distribution code does not take this into account properly
and therefore it expands the upstream port resource windows too much,
not leaving space for the other functions (in the multifunction device)
and this leads to an issue that Jonathan reported. He runs QEMU with
the following topology (QEMU parameters):

 -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
 -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
 -device e1000,bus=root_port13,addr=0.1 			\
 -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
 -device e1000,bus=fun1

The first e1000 NIC here is another function in the switch upstream
port. This leads to the following errors:

  pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
  pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
  pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
  e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]

Fix this by taking into account the possible multifunction devices when
upstream port resources are distributed.

Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
Hi,

This is the formal patch that resulted from the discussion here:

https://lore.kernel.org/linux-pci/20220905080232.36087-5-mika.westerberg@linux.intel.com/T/#m724289d0ee0c1ae07628744c283116e60efaeaf1

Only change from that version is that we loop through all resources of
the multifunction device.

 drivers/pci/setup-bus.c | 63 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index b4096598dbcb..c8787b187ee4 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1830,10 +1830,65 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 	 * bridges below.
 	 */
 	if (hotplug_bridges + normal_bridges == 1) {
-		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
-		if (dev->subordinate)
-			pci_bus_distribute_available_resources(dev->subordinate,
-				add_list, io, mmio, mmio_pref);
+		/* Upstream port must be the first */
+		bridge = list_first_entry(&bus->devices, struct pci_dev, bus_list);
+		if (!bridge->subordinate)
+			return;
+
+		/*
+		 * It is possible to have switch upstream port as a part
+		 * of a multifunction device. For this reason reduce the
+		 * resources occupied by the other functions before
+		 * distributing the rest.
+		 */
+		list_for_each_entry(dev, &bus->devices, bus_list) {
+			int i;
+
+			if (dev == bridge)
+				continue;
+
+			/*
+			 * It should be multifunction but if not stop
+			 * the distribution and bail out.
+			 */
+			if (!dev->multifunction)
+				return;
+
+			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+				const struct resource *dev_res = &dev->resource[i];
+				resource_size_t dev_sz;
+				struct resource *b_res;
+
+				if (dev_res->flags & IORESOURCE_IO) {
+					b_res = &io;
+				} else if (dev_res->flags & IORESOURCE_MEM) {
+					if (dev_res->flags & IORESOURCE_PREFETCH)
+						b_res = &mmio_pref;
+					else
+						b_res = &mmio;
+				} else {
+					continue;
+				}
+
+				/* Size aligned to bridge window */
+				align = pci_resource_alignment(bridge, b_res);
+				dev_sz = ALIGN(resource_size(dev_res), align);
+
+				pci_dbg(dev, "%pR aligned to %llx\n", dev_res,
+					(unsigned long long)dev_sz);
+
+				if (dev_sz >= resource_size(b_res))
+					memset(b_res, 0, sizeof(*b_res));
+				else
+					b_res->end -= dev_sz;
+
+				pci_dbg(bridge, "updated available to %pR\n", b_res);
+			}
+		}
+
+		pci_bus_distribute_available_resources(bridge->subordinate,
+						       add_list, io, mmio,
+						       mmio_pref);
 		return;
 	}
 
-- 
2.35.1



* [PATCH 2/2] Revert "Revert "PCI: Distribute available resources for root buses, too""
  2022-11-03 10:32 [PATCH 1/2] PCI: Take multifunction devices into account when distributing resources Mika Westerberg
@ 2022-11-03 10:32 ` Mika Westerberg
  2022-11-08 21:11 ` [PATCH 1/2] PCI: Take multifunction devices into account when distributing resources Bjorn Helgaas
  1 sibling, 0 replies; 4+ messages in thread
From: Mika Westerberg @ 2022-11-03 10:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J . Wysocki, Andy Shevchenko, Jonathan Cameron,
	Lukas Wunner, Chris Chiu, linux-pci, Mika Westerberg

This reverts commit 5632e2beaf9d5dda694c0572684dea783d8a9492.

Now that pci_bridge_distribute_available_resources() takes multifunction
devices into account we can revert this revert to fix the original issue.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
Hi Bjorn,

Let me know if you prefer re-sending the original patch over the revert.

 drivers/pci/setup-bus.c | 62 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index c8787b187ee4..e512f9ecb9d0 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1768,7 +1768,10 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
 	}
 
 	res->end = res->start + new_size - 1;
-	remove_from_list(add_list, res);
+
+	/* If the resource is part of the add_list remove it now */
+	if (add_list)
+		remove_from_list(add_list, res);
 }
 
 static void pci_bus_distribute_available_resources(struct pci_bus *bus,
@@ -1978,6 +1981,8 @@ static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
 	if (!bridge->is_hotplug_bridge)
 		return;
 
+	pci_dbg(bridge, "distributing available resources\n");
+
 	/* Take the initial extra resources from the hotplug port */
 	available_io = bridge->resource[PCI_BRIDGE_IO_WINDOW];
 	available_mmio = bridge->resource[PCI_BRIDGE_MEM_WINDOW];
@@ -1989,6 +1994,59 @@ static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
 					       available_mmio_pref);
 }
 
+static bool pci_bridge_resources_not_assigned(struct pci_dev *dev)
+{
+	const struct resource *r;
+
+	/*
+	 * Check the child device's resources and if they are not yet
+	 * assigned it means we are configuring them (not the boot
+	 * firmware) so we should be able to extend the upstream
+	 * bridge's (that's the hotplug downstream PCIe port) resources
+	 * in the same way we do with the normal hotplug case.
+	 */
+	r = &dev->resource[PCI_BRIDGE_IO_WINDOW];
+	if (!r->flags || !(r->flags & IORESOURCE_STARTALIGN))
+		return false;
+	r = &dev->resource[PCI_BRIDGE_MEM_WINDOW];
+	if (!r->flags || !(r->flags & IORESOURCE_STARTALIGN))
+		return false;
+	r = &dev->resource[PCI_BRIDGE_PREF_MEM_WINDOW];
+	if (!r->flags || !(r->flags & IORESOURCE_STARTALIGN))
+		return false;
+
+	return true;
+}
+
+static void pci_root_bus_distribute_available_resources(struct pci_bus *bus,
+							struct list_head *add_list)
+{
+	struct pci_dev *dev, *bridge = bus->self;
+
+	for_each_pci_bridge(dev, bus) {
+		struct pci_bus *b;
+
+		b = dev->subordinate;
+		if (!b)
+			continue;
+
+		/*
+		 * Need to check "bridge" here too because it is NULL
+		 * in case of root bus.
+		 */
+		if (bridge && pci_bridge_resources_not_assigned(dev)) {
+			pci_bridge_distribute_available_resources(bridge, add_list);
+			/*
+			 * There is only PCIe upstream port on the bus
+			 * so we don't need to go further.
+			 */
+			return;
+		}
+
+		pci_root_bus_distribute_available_resources(b, add_list);
+	}
+}
+
 /*
  * First try will not touch PCI bridge res.
  * Second and later try will clear small leaf bridge res.
@@ -2028,6 +2086,8 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
 	 */
 	__pci_bus_size_bridges(bus, add_list);
 
+	pci_root_bus_distribute_available_resources(bus, add_list);
+
 	/* Depth last, allocate resources and update the hardware. */
 	__pci_bus_assign_resources(bus, add_list, &fail_head);
 	if (add_list)
-- 
2.35.1



* Re: [PATCH 1/2] PCI: Take multifunction devices into account when distributing resources
  2022-11-03 10:32 [PATCH 1/2] PCI: Take multifunction devices into account when distributing resources Mika Westerberg
  2022-11-03 10:32 ` [PATCH 2/2] Revert "Revert "PCI: Distribute available resources for root buses, too"" Mika Westerberg
@ 2022-11-08 21:11 ` Bjorn Helgaas
  2022-11-09 12:41   ` Mika Westerberg
  1 sibling, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2022-11-08 21:11 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Andy Shevchenko,
	Jonathan Cameron, Lukas Wunner, Chris Chiu, linux-pci

On Thu, Nov 03, 2022 at 12:32:53PM +0200, Mika Westerberg wrote:
> It is possible to have PCIe switch upstream port a multifunction device.

I can't quite parse this.  I guess the point is that a Switch Upstream
Port may be one of the functions of a multifunction device?

> The resource distribution code does not take this into account properly
> and therefore it expands the upstream port resource windows too much,
> not leaving space for the other functions (in the multifunction device)
> and this leads to an issue that Jonathan reported. He runs QEMU with
> the following topology (QEMU parameters):
> 
>  -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
>  -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
>  -device e1000,bus=root_port13,addr=0.1 			\
>  -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
>  -device e1000,bus=fun1
> 
> The first e1000 NIC here is another function in the switch upstream
> port. This leads to the following errors:
> 
>   pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
>   pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
>   pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
>   e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> 
> Fix this by taking into account the possible multifunction devices when
> upstream port resources are distributed.

Can you include the link to Jonathan's report?

> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
> Hi,
> 
> This is the formal patch that resulted from the discussion here:
> 
> https://lore.kernel.org/linux-pci/20220905080232.36087-5-mika.westerberg@linux.intel.com/T/#m724289d0ee0c1ae07628744c283116e60efaeaf1
> 
> Only change from that version is that we loop through all resources of
> the multifunction device.
> 
>  drivers/pci/setup-bus.c | 63 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 59 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index b4096598dbcb..c8787b187ee4 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1830,10 +1830,65 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	 * bridges below.
>  	 */
>  	if (hotplug_bridges + normal_bridges == 1) {
> -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -		if (dev->subordinate)
> -			pci_bus_distribute_available_resources(dev->subordinate,
> -				add_list, io, mmio, mmio_pref);
> +		/* Upstream port must be the first */

Do you have any citation or reasoning for this handy?  We had this
assumption before, and it seems true that an Upstream Port must be
Function 0 because a variety of Link-related things have to be in
Function 0, e.g., ARI ASPM Control, ARI Clock PM, Autonomous Width
Disable, Flit Mode Disable, LTR Enable, OBFF Enable, etc.  But those
are all pretty oblique.

I guess it's better to have the comment than not, but it is the sort of
assertion that makes one wonder why it is true.

> +		bridge = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> +		if (!bridge->subordinate)
> +			return;
> +
> +		/*
> +		 * It is possible to have switch upstream port as a part
> +		 * of a multifunction device. For this reason reduce the
> +		 * resources occupied by the other functions before
> +		 * distributing the rest.

The space consumed by the peer functions of the Switch Upstream Port
is determined by their BAR sizes, so I don't think we actually reduce
that.

I *think* the point here is to reduce the space available for
distribution by the amount required by the peers of the Switch
Upstream Port, right?  I.e., "mmio" is the amount of space we have to
distribute, and before splitting it across devices on the secondary
bus, we need to save out the space required for peers on the primary
bus.
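
As a rough illustration of the reservation described above (a sketch,
not the kernel code itself): the block below uses a hypothetical
struct window in place of struct resource, and hypothetical helpers
window_size(), align_up() and reserve_for_peer(). The alignment passed
in is an assumption of the example; the patch takes the real value from
pci_resource_alignment().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/end pair of struct resource. */
struct window {
	uint64_t start;
	uint64_t end;	/* inclusive, as in struct resource */
};

/* Size of an inclusive range, like resource_size(). */
static uint64_t window_size(const struct window *w)
{
	return w->end - w->start + 1;
}

/* Round size up to a power-of-two alignment, like ALIGN(). */
static uint64_t align_up(uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/*
 * Set aside room for one peer BAR on the primary bus before the
 * remaining space is distributed to the secondary bus. If the aligned
 * BAR size would consume the whole window, the window is zeroed
 * (the patch does this with memset()), so nothing is left to hand out.
 */
static void reserve_for_peer(struct window *avail, uint64_t bar_size,
			     uint64_t align)
{
	uint64_t sz = align_up(bar_size, align);

	if (sz >= window_size(avail)) {
		avail->start = 0;
		avail->end = 0;
	} else {
		avail->end -= sz;
	}
}
```

With the window from the log above, [mem 0x10200000-0x103fffff], a peer
BAR of 0x20000 bytes and an assumed alignment of 0x100000, calling
reserve_for_peer() shrinks the distributable range to
0x10200000-0x102fffff, which leaves room at the top of the window for
the peer function.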

> +		 */
> +		list_for_each_entry(dev, &bus->devices, bus_list) {
> +			int i;
> +
> +			if (dev == bridge)
> +				continue;
> +
> +			/*
> +			 * It should be multifunction but if not stop
> +			 * the distribution and bail out.
> +			 */
> +			if (!dev->multifunction)
> +				return;

Why do we bother with this?  If there are multiple devices on the bus,
don't we want to consider them all, regardless of whether
dev->multifunction is set?  It seems like a gratuitous check.

> +			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> +				const struct resource *dev_res = &dev->resource[i];
> +				resource_size_t dev_sz;
> +				struct resource *b_res;
> +
> +				if (dev_res->flags & IORESOURCE_IO) {
> +					b_res = &io;
> +				} else if (dev_res->flags & IORESOURCE_MEM) {
> +					if (dev_res->flags & IORESOURCE_PREFETCH)
> +						b_res = &mmio_pref;
> +					else
> +						b_res = &mmio;
> +				} else {
> +					continue;
> +				}
> +
> +				/* Size aligned to bridge window */
> +				align = pci_resource_alignment(bridge, b_res);
> +				dev_sz = ALIGN(resource_size(dev_res), align);
> +
> +				pci_dbg(dev, "%pR aligned to %llx\n", dev_res,

%#llx to avoid confusion and match other output.

> +					(unsigned long long)dev_sz);
> +
> +				if (dev_sz >= resource_size(b_res))
> +					memset(b_res, 0, sizeof(*b_res));
> +				else
> +					b_res->end -= dev_sz;
> +
> +				pci_dbg(bridge, "updated available to %pR\n", b_res);
> +			}
> +		}
> +
> +		pci_bus_distribute_available_resources(bridge->subordinate,
> +						       add_list, io, mmio,
> +						       mmio_pref);
>  		return;
>  	}
>  
> -- 
> 2.35.1
> 


* Re: [PATCH 1/2] PCI: Take multifunction devices into account when distributing resources
  2022-11-08 21:11 ` [PATCH 1/2] PCI: Take multifunction devices into account when distributing resources Bjorn Helgaas
@ 2022-11-09 12:41   ` Mika Westerberg
  0 siblings, 0 replies; 4+ messages in thread
From: Mika Westerberg @ 2022-11-09 12:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Rafael J . Wysocki, Andy Shevchenko,
	Jonathan Cameron, Lukas Wunner, Chris Chiu, linux-pci

On Tue, Nov 08, 2022 at 03:11:30PM -0600, Bjorn Helgaas wrote:
> On Thu, Nov 03, 2022 at 12:32:53PM +0200, Mika Westerberg wrote:
> > It is possible to have PCIe switch upstream port a multifunction device.
> 
> I can't quite parse this.  I guess the point is that a Switch Upstream
> Port may be one of the functions of a multifunction device?

Yes.

> > The resource distribution code does not take this into account properly
> > and therefore it expands the upstream port resource windows too much,
> > not leaving space for the other functions (in the multifunction device)
> > and this leads to an issue that Jonathan reported. He runs QEMU with
> > the following topology (QEMU parameters):
> > 
> >  -device pcie-root-port,port=0,id=root_port13,chassis=0,slot=2	\
> >  -device x3130-upstream,id=sw1,bus=root_port13,multifunction=on	\
> >  -device e1000,bus=root_port13,addr=0.1 			\
> >  -device xio3130-downstream,id=fun1,bus=sw1,chassis=0,slot=3	\
> >  -device e1000,bus=fun1
> > 
> > The first e1000 NIC here is another function in the switch upstream
> > port. This leads to the following errors:
> > 
> >   pci 0000:00:04.0: bridge window [mem 0x10200000-0x103fffff] to [bus 02-04]
> >   pci 0000:02:00.0: bridge window [mem 0x10200000-0x103fffff] to [bus 03-04]
> >   pci 0000:02:00.1: BAR 0: failed to assign [mem size 0x00020000]
> >   e1000 0000:02:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> > 
> > Fix this by taking into account the possible multifunction devices when
> > upstream port resources are distributed.
> 
> Can you include the link to Jonathan's report?

Sure I will.

> > Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> > Hi,
> > 
> > This is the formal patch that resulted from the discussion here:
> > 
> > https://lore.kernel.org/linux-pci/20220905080232.36087-5-mika.westerberg@linux.intel.com/T/#m724289d0ee0c1ae07628744c283116e60efaeaf1
> > 
> > Only change from that version is that we loop through all resources of
> > the multifunction device.
> > 
> >  drivers/pci/setup-bus.c | 63 ++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 59 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index b4096598dbcb..c8787b187ee4 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1830,10 +1830,65 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  	 * bridges below.
> >  	 */
> >  	if (hotplug_bridges + normal_bridges == 1) {
> > -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > -		if (dev->subordinate)
> > -			pci_bus_distribute_available_resources(dev->subordinate,
> > -				add_list, io, mmio, mmio_pref);
> > +		/* Upstream port must be the first */
> 
> Do you have any citation or reasoning for this handy?  We had this
> assumption before, and it seems true that an Upstream Port must be
> Function 0 because a variety of Link-related things have to be in
> Function 0, e.g., ARI ASPM Control, ARI Clock PM, Autonomous Width
> Disable, Flit Mode Disable, LTR Enable, OBFF Enable, etc.  But those
> are all pretty oblique.
> 
> I guess it's better to have the comment than not, but it is the sort of
> assertion that makes one wonder why it is true.

Unfortunately I was not able to find such reference from the PCIe spec :(

> > +		bridge = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > +		if (!bridge->subordinate)
> > +			return;
> > +
> > +		/*
> > +		 * It is possible to have switch upstream port as a part
> > +		 * of a multifunction device. For this reason reduce the
> > +		 * resources occupied by the other functions before
> > +		 * distributing the rest.
> 
> The space consumed by the peer functions of the Switch Upstream Port
> is determined by their BAR sizes, so I don't think we actually reduce
> that.
> 
> I *think* the point here is to reduce the space available for
> distribution by the amount required by the peers of the Switch
> Upstream Port, right?  I.e., "mmio" is the amount of space we have to
> distribute, and before splitting it across devices on the secondary
> bus, we need to save out the space required for peers on the primary
> bus.

Yes, I will update the comment accordingly.

> > +		 */
> > +		list_for_each_entry(dev, &bus->devices, bus_list) {
> > +			int i;
> > +
> > +			if (dev == bridge)
> > +				continue;
> > +
> > +			/*
> > +			 * It should be multifunction but if not stop
> > +			 * the distribution and bail out.
> > +			 */
> > +			if (!dev->multifunction)
> > +				return;
> 
> Why do we bother with this?  If there are multiple devices on the bus,
> don't we want to consider them all, regardless of whether
> dev->multifunction is set?  It seems like a gratuitous check.

Agreed, I will remove it.

> 
> > +			for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> > +				const struct resource *dev_res = &dev->resource[i];
> > +				resource_size_t dev_sz;
> > +				struct resource *b_res;
> > +
> > +				if (dev_res->flags & IORESOURCE_IO) {
> > +					b_res = &io;
> > +				} else if (dev_res->flags & IORESOURCE_MEM) {
> > +					if (dev_res->flags & IORESOURCE_PREFETCH)
> > +						b_res = &mmio_pref;
> > +					else
> > +						b_res = &mmio;
> > +				} else {
> > +					continue;
> > +				}
> > +
> > +				/* Size aligned to bridge window */
> > +				align = pci_resource_alignment(bridge, b_res);
> > +				dev_sz = ALIGN(resource_size(dev_res), align);
> > +
> > +				pci_dbg(dev, "%pR aligned to %llx\n", dev_res,
> 
> %#llx to avoid confusion and match other output.

OK.

