linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources
       [not found] <20190522222928.2964-1-nicholas.johnson-opensource@outlook.com.au>
@ 2019-05-22 14:30 ` Nicholas Johnson
  2019-06-15 19:56   ` Bjorn Helgaas
  2019-06-17  9:35   ` mika.westerberg
  2019-05-22 14:30 ` [PATCH v6 2/4] PCI: Modify extend_bridge_window() to set resource size directly Nicholas Johnson
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Nicholas Johnson @ 2019-05-22 14:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pci, bhelgaas, mika.westerberg, corbet, Nicholas Johnson

Rewrite pci_bus_distribute_available_resources to better handle bridges
with different resource alignment requirements. Pass more details
arguments recursively to track the resource start and end addresses
relative to the initial hotplug bridge. This is especially useful for
Thunderbolt with native PCI enumeration, enabling external graphics
cards and other devices with bridge alignment higher than 0x100000
bytes.

Change extend_bridge_window to resize the actual resource, rather than
using add_list and dev_res->add_size. If an additional resource entry
exists for the given resource, zero out the add_size field to avoid it
interfering. Because add_size is considered optional when allocating,
using add_size could cause issues in some cases, because successful
resource distribution requires sizes to be guaranteed. Such cases
include hot-adding nested hotplug bridges in one enumeration, and
potentially others which are yet to be encountered.

Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
---
 drivers/pci/setup-bus.c | 169 ++++++++++++++++++++--------------------
 1 file changed, 84 insertions(+), 85 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0cdd5ff38..1b5b851ca 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1835,12 +1835,10 @@ static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
 }
 
 static void pci_bus_distribute_available_resources(struct pci_bus *bus,
-					    struct list_head *add_list,
-					    resource_size_t available_io,
-					    resource_size_t available_mmio,
-					    resource_size_t available_mmio_pref)
+	struct list_head *add_list, struct resource io,
+	struct resource mmio, struct resource mmio_pref)
 {
-	resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
+	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
 	unsigned int normal_bridges = 0, hotplug_bridges = 0;
 	struct resource *io_res, *mmio_res, *mmio_pref_res;
 	struct pci_dev *dev, *bridge = bus->self;
@@ -1850,29 +1848,36 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 	mmio_pref_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
 
 	/*
-	 * Update additional resource list (add_list) to fill all the
-	 * extra resource space available for this port except the space
-	 * calculated in __pci_bus_size_bridges() which covers all the
-	 * devices currently connected to the port and below.
+	 * The alignment of this bridge is yet to be considered, hence it must
+	 * be done now before extending its bridge window. A single bridge
+	 * might not be able to occupy the whole parent region if the alignment
+	 * differs - for example, an external GPU at the end of a Thunderbolt
+	 * daisy chain.
 	 */
-	extend_bridge_window(bridge, io_res, add_list, available_io);
-	extend_bridge_window(bridge, mmio_res, add_list, available_mmio);
-	extend_bridge_window(bridge, mmio_pref_res, add_list,
-			     available_mmio_pref);
+	align = pci_resource_alignment(bridge, io_res);
+	if (!io_res->parent && align)
+		io.start = ALIGN(io.start, align);
+
+	align = pci_resource_alignment(bridge, mmio_res);
+	if (!mmio_res->parent && align)
+		mmio.start = ALIGN(mmio.start, align);
+
+	align = pci_resource_alignment(bridge, mmio_pref_res);
+	if (!mmio_pref_res->parent && align)
+		mmio_pref.start = ALIGN(mmio_pref.start, align);
 
 	/*
-	 * Calculate the total amount of extra resource space we can
-	 * pass to bridges below this one.  This is basically the
-	 * extra space reduced by the minimal required space for the
-	 * non-hotplug bridges.
+	 * Update the resources to fill as much remaining resource space in the
+	 * parent bridge as possible, while considering alignment.
 	 */
-	remaining_io = available_io;
-	remaining_mmio = available_mmio;
-	remaining_mmio_pref = available_mmio_pref;
+	extend_bridge_window(bridge, io_res, add_list, resource_size(&io));
+	extend_bridge_window(bridge, mmio_res, add_list, resource_size(&mmio));
+	extend_bridge_window(bridge, mmio_pref_res, add_list,
+		resource_size(&mmio_pref));
 
 	/*
 	 * Calculate how many hotplug bridges and normal bridges there
-	 * are on this bus.  We will distribute the additional available
+	 * are on this bus. We will distribute the additional available
 	 * resources between hotplug bridges.
 	 */
 	for_each_pci_bridge(dev, bus) {
@@ -1882,104 +1887,98 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
 			normal_bridges++;
 	}
 
+	/*
+	 * There is only one bridge on the bus so it gets all possible
+	 * resources which it can then distribute to the possible
+	 * hotplug bridges below.
+	 */
+	if (hotplug_bridges + normal_bridges == 1) {
+		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
+		if (dev->subordinate)
+			pci_bus_distribute_available_resources(dev->subordinate,
+				add_list, io, mmio, mmio_pref);
+		return;
+	}
+
+	/*
+	 * Reduce the available resource space by what the
+	 * bridge and devices below it occupy.
+	 */
 	for_each_pci_bridge(dev, bus) {
-		const struct resource *res;
+		struct resource *res;
+		resource_size_t used_size;
 
 		if (dev->is_hotplug_bridge)
 			continue;
 
-		/*
-		 * Reduce the available resource space by what the
-		 * bridge and devices below it occupy.
-		 */
 		res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
-		if (!res->parent && available_io > resource_size(res))
-			remaining_io -= resource_size(res);
+		align = pci_resource_alignment(dev, res);
+		align = align ? ALIGN(io.start, align) - io.start : 0;
+		used_size = align + resource_size(res);
+		if (!res->parent && used_size <= resource_size(&io))
+			io.start += used_size;
 
 		res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
-		if (!res->parent && available_mmio > resource_size(res))
-			remaining_mmio -= resource_size(res);
+		align = pci_resource_alignment(dev, res);
+		align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
+		used_size = align + resource_size(res);
+		if (!res->parent && used_size <= resource_size(&mmio))
+			mmio.start += used_size;
 
 		res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
-		if (!res->parent && available_mmio_pref > resource_size(res))
-			remaining_mmio_pref -= resource_size(res);
+		align = pci_resource_alignment(dev, res);
+		align = align ? ALIGN(mmio_pref.start, align) -
+				mmio_pref.start : 0;
+		used_size = align + resource_size(res);
+		if (!res->parent && used_size <= resource_size(&mmio_pref))
+			mmio_pref.start += used_size;
 	}
 
-	/*
-	 * There is only one bridge on the bus so it gets all available
-	 * resources which it can then distribute to the possible hotplug
-	 * bridges below.
-	 */
-	if (hotplug_bridges + normal_bridges == 1) {
-		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
-		if (dev->subordinate) {
-			pci_bus_distribute_available_resources(dev->subordinate,
-				add_list, available_io, available_mmio,
-				available_mmio_pref);
-		}
+	if (!hotplug_bridges)
 		return;
-	}
 
 	/*
-	 * Go over devices on this bus and distribute the remaining
-	 * resource space between hotplug bridges.
+	 * Distribute any remaining resources equally between
+	 * the hotplug-capable downstream ports.
 	 */
-	for_each_pci_bridge(dev, bus) {
-		resource_size_t align, io, mmio, mmio_pref;
-		struct pci_bus *b;
+	io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
+	mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
+	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
+		hotplug_bridges);
 
-		b = dev->subordinate;
-		if (!b || !dev->is_hotplug_bridge)
+	for_each_pci_bridge(dev, bus) {
+		if (!dev->subordinate || !dev->is_hotplug_bridge)
 			continue;
 
-		/*
-		 * Distribute available extra resources equally between
-		 * hotplug-capable downstream ports taking alignment into
-		 * account.
-		 *
-		 * Here hotplug_bridges is always != 0.
-		 */
-		align = pci_resource_alignment(bridge, io_res);
-		io = div64_ul(available_io, hotplug_bridges);
-		io = min(ALIGN(io, align), remaining_io);
-		remaining_io -= io;
-
-		align = pci_resource_alignment(bridge, mmio_res);
-		mmio = div64_ul(available_mmio, hotplug_bridges);
-		mmio = min(ALIGN(mmio, align), remaining_mmio);
-		remaining_mmio -= mmio;
+		io.end = io.start + io_per_hp - 1;
+		mmio.end = mmio.start + mmio_per_hp - 1;
+		mmio_pref.end = mmio_pref.start + mmio_pref_per_hp - 1;
 
-		align = pci_resource_alignment(bridge, mmio_pref_res);
-		mmio_pref = div64_ul(available_mmio_pref, hotplug_bridges);
-		mmio_pref = min(ALIGN(mmio_pref, align), remaining_mmio_pref);
-		remaining_mmio_pref -= mmio_pref;
+		pci_bus_distribute_available_resources(dev->subordinate,
+			add_list, io, mmio, mmio_pref);
 
-		pci_bus_distribute_available_resources(b, add_list, io, mmio,
-						       mmio_pref);
+		io.start = io.end + 1;
+		mmio.start = mmio.end + 1;
+		mmio_pref.start = mmio_pref.end + 1;
 	}
 }
 
 static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
 						     struct list_head *add_list)
 {
-	resource_size_t available_io, available_mmio, available_mmio_pref;
-	const struct resource *res;
+	struct resource io_res, mmio_res, mmio_pref_res;
 
 	if (!bridge->is_hotplug_bridge)
 		return;
 
+	io_res = bridge->resource[PCI_BRIDGE_RESOURCES + 0];
+	mmio_res = bridge->resource[PCI_BRIDGE_RESOURCES + 1];
+	mmio_pref_res = bridge->resource[PCI_BRIDGE_RESOURCES + 2];
+
 	/* Take the initial extra resources from the hotplug port */
-	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
-	available_io = resource_size(res);
-	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 1];
-	available_mmio = resource_size(res);
-	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
-	available_mmio_pref = resource_size(res);
 
 	pci_bus_distribute_available_resources(bridge->subordinate,
-					       add_list, available_io,
-					       available_mmio,
-					       available_mmio_pref);
+		add_list, io_res, mmio_res, mmio_pref_res);
 }
 
 void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 2/4] PCI: Modify extend_bridge_window() to set resource size directly
       [not found] <20190522222928.2964-1-nicholas.johnson-opensource@outlook.com.au>
  2019-05-22 14:30 ` [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources Nicholas Johnson
@ 2019-05-22 14:30 ` Nicholas Johnson
  2019-06-15 19:57   ` Bjorn Helgaas
  2019-05-22 14:31 ` [PATCH v6 3/4] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window Nicholas Johnson
  2019-05-22 14:31 ` [PATCH v6 4/4] PCI: Add pci=hpmemprefsize parameter to set MMIO_PREF size independently Nicholas Johnson
  3 siblings, 1 reply; 12+ messages in thread
From: Nicholas Johnson @ 2019-05-22 14:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pci, bhelgaas, mika.westerberg, corbet, Nicholas Johnson

Background
==========================================================================

In the current state, the PCI allocation could fail with Thunderbolt
under certain unusual circumstances, because add_list resources are
"optional". Guaranteed allocation requires guaranteed resource sizes.

It is difficult to give examples of these failures - because without the
previous patch in the series, the symptoms of the problem are hidden by
larger problems. This patch has been split from the previous patch and
makes little sense on its own - as it is almost impossible to see the
effect of this patch without first fixing the problems addressed by the
previous patch. So the evidence I put forward for making this change is
that because add_list resources are "optional", there could be any
number of unforeseen bugs that are yet to be encountered if the kernel
decides not to assign all of the optional size. In kernel development,
we should not play around with chance.

Moving away from add_size also allows for use of pci=hpmemsize to assign
resources. Previously, when using add_size and not allowing the add_size
to shrink, it made it impossible to distribute resources. If a hotplug
bridge has size X, and below it is some devices with non-zero size Y and
a nested hotplug bridge of same size X, fitting X+Y into size X is
mathematically impossible.

This patch solves this by dropping add_size and giving each bridge the
maximum size possible without failing resource assignment. Using
pci=hpmemsize still works as pci_assign_unassigned_root_bus_resources()
does not call pci_bus_distribute_available_resources(). At boot,
pci_assign_unassigned_root_bus_resources() is used, instead of
pci_bridge_distribute_available_resources().

By allowing to use pci=hpmemsize, it removes the reliance on the
firmware to declare the window resources under the root port, and could
pay off in the future with USB4 (which is backward-compatible to
Thunderbolt devices, and not specific to Intel systems). Users of
Thunderbolt hardware on unsupported systems will be able to specify the
resources in the kernel parameters. Users of official systems will be
able to override the default firmware window sizes to allocate much
larger resource sizes, potentially enabling Thunderbolt support for
devices with massive BARs (with a few other problems solved by later
patches in this series).

Patch notes
==========================================================================

Modify extend_bridge_window() to remove the resource from add_list and
change the resource size directly.

Modify extend_bridge_window() to reset resources that are being assigned
zero size. This is required to prevent the bridge not being enabled due
to resources with zero size. This is a direct requirement to prevent the
change away from using add_list from introducing a regression - because
before, it was not possible to end up with zero size.

Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
---
 drivers/pci/setup-bus.c | 42 ++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 1b5b851ca..5675254fa 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1810,28 +1810,40 @@ void __init pci_assign_unassigned_resources(void)
 }
 
 static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
-				 struct list_head *add_list,
-				 resource_size_t available)
+			struct list_head *add_list, resource_size_t new_size)
 {
-	struct pci_dev_resource *dev_res;
+	resource_size_t add_size;
 
 	if (res->parent)
 		return;
 
-	if (resource_size(res) >= available)
-		return;
-
-	dev_res = res_to_dev_res(add_list, res);
-	if (!dev_res)
-		return;
+	if (new_size >= resource_size(res)) {
+		add_size = new_size - resource_size(res);
+		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
+			&add_size);
+	} else {
+		add_size = resource_size(res) - new_size;
+		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
+			&add_size);
+	}
 
-	/* Is there room to extend the window? */
-	if (available - resource_size(res) <= dev_res->add_size)
-		return;
+	/*
+	 * Resources requested using add_size in additional resource lists are
+	 * considered optional when allocated. Guaranteed size of allocation
+	 * is required to guarantee successful resource distribution. Hence,
+	 * the size of the actual resource must be adjusted, and the resource
+	 * removed from add_list to prevent any additional size interfering.
+	 */
+	res->end = res->start + new_size - 1;
+	remove_from_list(add_list, res);
 
-	dev_res->add_size = available - resource_size(res);
-	pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
-		&dev_res->add_size);
+	/*
+	 * If we have run out of bridge resources, we may end up with a
+	 * zero-sized resource which may cause its bridge to not be enabled.
+	 * Disabling the resource prevents any such issues.
+	 */
+	if (!new_size)
+		reset_resource(res);
 }
 
 static void pci_bus_distribute_available_resources(struct pci_bus *bus,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 3/4] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window
       [not found] <20190522222928.2964-1-nicholas.johnson-opensource@outlook.com.au>
  2019-05-22 14:30 ` [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources Nicholas Johnson
  2019-05-22 14:30 ` [PATCH v6 2/4] PCI: Modify extend_bridge_window() to set resource size directly Nicholas Johnson
@ 2019-05-22 14:31 ` Nicholas Johnson
  2019-05-22 14:31 ` [PATCH v6 4/4] PCI: Add pci=hpmemprefsize parameter to set MMIO_PREF size independently Nicholas Johnson
  3 siblings, 0 replies; 12+ messages in thread
From: Nicholas Johnson @ 2019-05-22 14:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pci, bhelgaas, mika.westerberg, corbet, Nicholas Johnson

Background
==========================================================================

Solve bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=203243

Currently, the kernel can sometimes assign the MMIO_PREF window
additional size into the MMIO window, resulting in double the MMIO
additional size, even if the MMIO_PREF window was successful.

This happens if in the first pass, the MMIO_PREF succeeds but the MMIO
fails. In the next pass, because MMIO_PREF is already assigned, the
attempt to assign MMIO_PREF returns an error code instead of success
(nothing more to do, already allocated).

Example of problem (more context can be found in the bug report URL):

Mainline kernel:
pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0xa00fffff] = 256M
pci 0000:06:04.0: BAR 14: assigned [mem 0xa0200000-0xb01fffff] = 256M

Patched kernel:
pci 0000:06:01.0: BAR 14: assigned [mem 0x90100000-0x980fffff] = 128M
pci 0000:06:04.0: BAR 14: assigned [mem 0x98200000-0xa01fffff] = 128M

This was using pci=realloc,hpmemsize=128M,nocrs - on the same machine
with the same configuration, with a Ubuntu mainline kernel and a kernel
patched with this patch series.

This patch is vital for the next patch in the series. The next patch
allows the user to specify MMIO and MMIO_PREF independently. If the
MMIO_PREF is set to be very large, this bug will end up more than
doubling the MMIO size. The bug results in the MMIO_PREF being added to
the MMIO window, which means doubling if MMIO_PREF size == MMIO size.
With a large MMIO_PREF, without this patch, the MMIO window will likely
fail to be assigned altogether due to lack of 32-bit address space.

Patch notes
==========================================================================

Change find_free_bus_resource() to not skip assigned resources with
non-null parent.

Add checks in pbus_size_io() and pbus_size_mem() to return success if
resource returned from find_free_bus_resource() is already allocated.

This avoids pbus_size_io() and pbus_size_mem() returning error code to
__pci_bus_size_bridges() when a resource has been successfully assigned
in a previous pass. This fixes the existing behaviour where space for a
resource could be reserved multiple times in different parent bridge
windows. This also greatly reduces the number of failed BAR messages in
dmesg when Linux assigns resources.

Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
---
 drivers/pci/setup-bus.c | 29 +++++++++++++++++++----------
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 5675254fa..8e1bc7ee7 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -752,13 +752,18 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
 }
 
 /*
- * Helper function for sizing routines: find first available bus resource
- * of a given type.  Note: we intentionally skip the bus resources which
- * have already been assigned (that is, have non-NULL parent resource).
+ * Helper function for sizing routines: find first bus resource of a given
+ * type. Note: we do not skip the bus resources which have already been
+ * assigned (r->parent != NULL). This is because a resource that is already
+ * assigned (nothing more to be done) will be indistinguishable from one that
+ * failed due to lack of space if we skip assigned resources. If the caller
+ * function cannot tell the difference then it might try to place the
+ * resources in a different window, doubling up on resources or causing
+ * unforeseeable issues.
  */
-static struct resource *find_free_bus_resource(struct pci_bus *bus,
-					       unsigned long type_mask,
-					       unsigned long type)
+struct resource *find_bus_resource_of_type(struct pci_bus *bus,
+						  unsigned long type_mask,
+						  unsigned long type)
 {
 	int i;
 	struct resource *r;
@@ -766,7 +771,7 @@ static struct resource *find_free_bus_resource(struct pci_bus *bus,
 	pci_bus_for_each_resource(bus, r, i) {
 		if (r == &ioport_resource || r == &iomem_resource)
 			continue;
-		if (r && (r->flags & type_mask) == type && !r->parent)
+		if (r && (r->flags & type_mask) == type)
 			return r;
 	}
 	return NULL;
@@ -866,14 +871,16 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
 			 struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
-	struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
-							IORESOURCE_IO);
+	struct resource *b_res = find_bus_resource_of_type(bus, IORESOURCE_IO,
+					IORESOURCE_IO);
 	resource_size_t size = 0, size0 = 0, size1 = 0;
 	resource_size_t children_add_size = 0;
 	resource_size_t min_align, align;
 
 	if (!b_res)
 		return;
+	if (b_res->parent)
+		return;
 
 	min_align = window_alignment(bus, IORESOURCE_IO);
 	list_for_each_entry(dev, &bus->devices, bus_list) {
@@ -978,7 +985,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 	resource_size_t min_align, align, size, size0, size1;
 	resource_size_t aligns[18]; /* Alignments from 1MB to 128GB */
 	int order, max_order;
-	struct resource *b_res = find_free_bus_resource(bus,
+	struct resource *b_res = find_bus_resource_of_type(bus,
 					mask | IORESOURCE_PREFETCH, type);
 	resource_size_t children_add_size = 0;
 	resource_size_t children_add_align = 0;
@@ -986,6 +993,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
 
 	if (!b_res)
 		return -ENOSPC;
+	if (b_res->parent)
+		return 0;
 
 	memset(aligns, 0, sizeof(aligns));
 	max_order = 0;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 4/4] PCI: Add pci=hpmemprefsize parameter to set MMIO_PREF size independently
       [not found] <20190522222928.2964-1-nicholas.johnson-opensource@outlook.com.au>
                   ` (2 preceding siblings ...)
  2019-05-22 14:31 ` [PATCH v6 3/4] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window Nicholas Johnson
@ 2019-05-22 14:31 ` Nicholas Johnson
  3 siblings, 0 replies; 12+ messages in thread
From: Nicholas Johnson @ 2019-05-22 14:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pci, bhelgaas, mika.westerberg, corbet, Nicholas Johnson

Add kernel parameter pci=hpmemprefsize=nn[KMG] to control MMIO_PREF size
for PCI hotplug bridges.

Change behaviour of pci=hpmemsize=nn[KMG] to not set MMIO_PREF size if
hpmempref has been specified, rather than controlling both MMIO and
MMIO_PREF sizes unconditionally.

Update kernel-parameters documentation to reflect the above changes.

The effect of the above changes is to allow for MMIO and MMIO_PREF to be
specified independently, whilst ensuring no changes in functionality are
noticed by the user if updating kernel version with hpmemsize specified.

Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
---
 .../admin-guide/kernel-parameters.txt         |  7 +++++-
 drivers/pci/pci.c                             | 18 ++++++++++---
 drivers/pci/setup-bus.c                       | 25 +++++++++++--------
 include/linux/pci.h                           |  3 ++-
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b..fdecff06c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3439,7 +3439,12 @@
 				reserved for hotplug bridge's IO window.
 				Default size is 256 bytes.
 		hpmemsize=nn[KMG]	The fixed amount of bus space which is
-				reserved for hotplug bridge's memory window.
+				reserved for hotplug bridge's MMIO window. If
+				hpmemprefsize is not specified, then the same
+				size is applied to hotplug bridge's MMIO_PREF
+				window. Default size is 2 megabytes.
+		hpmemprefsize=nn[KMG]	The fixed amount of bus space which is
+				reserved for hotplug bridge's MMIO_PREF window.
 				Default size is 2 megabytes.
 		hpbussize=nn	The minimum amount of additional bus numbers
 				reserved for buses below a hotplug bridge.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1..0e6292009 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -85,10 +85,12 @@ unsigned long pci_cardbus_io_size = DEFAULT_CARDBUS_IO_SIZE;
 unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE;
 
 #define DEFAULT_HOTPLUG_IO_SIZE		(256)
-#define DEFAULT_HOTPLUG_MEM_SIZE	(2*1024*1024)
+#define DEFAULT_HOTPLUG_MMIO_SIZE	(2*1024*1024)
+#define DEFAULT_HOTPLUG_MMIO_PREF_SIZE	(2*1024*1024)
 /* pci=hpmemsize=nnM,hpiosize=nn can override this */
 unsigned long pci_hotplug_io_size  = DEFAULT_HOTPLUG_IO_SIZE;
-unsigned long pci_hotplug_mem_size = DEFAULT_HOTPLUG_MEM_SIZE;
+unsigned long pci_hotplug_mmio_size = DEFAULT_HOTPLUG_MMIO_SIZE;
+unsigned long pci_hotplug_mmio_pref_size = DEFAULT_HOTPLUG_MMIO_PREF_SIZE;
 
 #define DEFAULT_HOTPLUG_BUS_SIZE	1
 unsigned long pci_hotplug_bus_size = DEFAULT_HOTPLUG_BUS_SIZE;
@@ -6198,6 +6200,8 @@ EXPORT_SYMBOL(pci_fixup_cardbus);
 
 static int __init pci_setup(char *str)
 {
+	bool mmio_pref_specified = false;
+
 	while (str) {
 		char *k = strchr(str, ',');
 		if (k)
@@ -6232,7 +6236,15 @@ static int __init pci_setup(char *str)
 			} else if (!strncmp(str, "hpiosize=", 9)) {
 				pci_hotplug_io_size = memparse(str + 9, &str);
 			} else if (!strncmp(str, "hpmemsize=", 10)) {
-				pci_hotplug_mem_size = memparse(str + 10, &str);
+				pci_hotplug_mmio_size =
+					memparse(str + 10, &str);
+				if (!mmio_pref_specified)
+					pci_hotplug_mmio_pref_size =
+						pci_hotplug_mmio_size;
+			} else if (!strncmp(str, "hpmemprefsize=", 14)) {
+				mmio_pref_specified = true;
+				pci_hotplug_mmio_pref_size =
+					memparse(str + 14, &str);
 			} else if (!strncmp(str, "hpbussize=", 10)) {
 				pci_hotplug_bus_size =
 					simple_strtoul(str + 10, &str, 0);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 8e1bc7ee7..44c37dda9 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1187,7 +1187,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 {
 	struct pci_dev *dev;
 	unsigned long mask, prefmask, type2 = 0, type3 = 0;
-	resource_size_t additional_mem_size = 0, additional_io_size = 0;
+	resource_size_t additional_io_size = 0, additional_mmio_size = 0,
+		additional_mmio_pref_size = 0;
 	struct resource *b_res;
 	int ret;
 
@@ -1221,7 +1222,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 		pci_bridge_check_ranges(bus);
 		if (bus->self->is_hotplug_bridge) {
 			additional_io_size  = pci_hotplug_io_size;
-			additional_mem_size = pci_hotplug_mem_size;
+			additional_mmio_size = pci_hotplug_mmio_size;
+			additional_mmio_pref_size = pci_hotplug_mmio_pref_size;
 		}
 		/* Fall through */
 	default:
@@ -1239,9 +1241,9 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 		if (b_res[2].flags & IORESOURCE_MEM_64) {
 			prefmask |= IORESOURCE_MEM_64;
 			ret = pbus_size_mem(bus, prefmask, prefmask,
-				  prefmask, prefmask,
-				  realloc_head ? 0 : additional_mem_size,
-				  additional_mem_size, realloc_head);
+				prefmask, prefmask,
+				realloc_head ? 0 : additional_mmio_pref_size,
+				additional_mmio_pref_size, realloc_head);
 
 			/*
 			 * If successful, all non-prefetchable resources
@@ -1263,9 +1265,9 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 		if (!type2) {
 			prefmask &= ~IORESOURCE_MEM_64;
 			ret = pbus_size_mem(bus, prefmask, prefmask,
-					 prefmask, prefmask,
-					 realloc_head ? 0 : additional_mem_size,
-					 additional_mem_size, realloc_head);
+				prefmask, prefmask,
+				realloc_head ? 0 : additional_mmio_pref_size,
+				additional_mmio_pref_size, realloc_head);
 
 			/*
 			 * If successful, only non-prefetchable resources
@@ -1274,7 +1276,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 			if (ret == 0)
 				mask = prefmask;
 			else
-				additional_mem_size += additional_mem_size;
+				additional_mmio_size +=
+					additional_mmio_pref_size;
 
 			type2 = type3 = IORESOURCE_MEM;
 		}
@@ -1294,8 +1297,8 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct list_head *realloc_head)
 		 * prefetchable resource in a 64-bit prefetchable window.
 		 */
 		pbus_size_mem(bus, mask, IORESOURCE_MEM, type2, type3,
-				realloc_head ? 0 : additional_mem_size,
-				additional_mem_size, realloc_head);
+			realloc_head ? 0 : additional_mmio_size,
+			additional_mmio_size, realloc_head);
 		break;
 	}
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7b..7015f0b0f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1972,7 +1972,8 @@ extern u8 pci_dfl_cache_line_size;
 extern u8 pci_cache_line_size;
 
 extern unsigned long pci_hotplug_io_size;
-extern unsigned long pci_hotplug_mem_size;
+extern unsigned long pci_hotplug_mmio_size;
+extern unsigned long pci_hotplug_mmio_pref_size;
 extern unsigned long pci_hotplug_bus_size;
 
 /* Architecture-specific versions may override these (weak) */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources
  2019-05-22 14:30 ` [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources Nicholas Johnson
@ 2019-06-15 19:56   ` Bjorn Helgaas
  2019-06-17  9:21     ` mika.westerberg
  2019-06-19 13:40     ` Nicholas Johnson
  2019-06-17  9:35   ` mika.westerberg
  1 sibling, 2 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2019-06-15 19:56 UTC (permalink / raw)
  To: Nicholas Johnson; +Cc: linux-kernel, linux-pci, mika.westerberg, corbet

Mika, this patch changes code you added in 1a5767725cec ("PCI:
Distribute available resources to hotplug-capable bridges").  Is there
any chance you could help review this?

On Wed, May 22, 2019 at 02:30:44PM +0000, Nicholas Johnson wrote:
> Rewrite pci_bus_distribute_available_resources to better handle bridges
> with different resource alignment requirements. Pass more details
> arguments recursively to track the resource start and end addresses
> relative to the initial hotplug bridge. This is especially useful for
> Thunderbolt with native PCI enumeration, enabling external graphics
> cards and other devices with bridge alignment higher than 0x100000
> bytes.
> 
> Change extend_bridge_window to resize the actual resource, rather than
> using add_list and dev_res->add_size. If an additional resource entry
> exists for the given resource, zero out the add_size field to avoid it
> interfering. Because add_size is considered optional when allocating,
> using add_size could cause issues in some cases, because successful
> resource distribution requires sizes to be guaranteed. Such cases
> include hot-adding nested hotplug bridges in one enumeration, and
> potentially others which are yet to be encountered.
> 
> Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> ---
>  drivers/pci/setup-bus.c | 169 ++++++++++++++++++++--------------------
>  1 file changed, 84 insertions(+), 85 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 0cdd5ff38..1b5b851ca 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1835,12 +1835,10 @@ static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
>  }
>  
>  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> -					    struct list_head *add_list,
> -					    resource_size_t available_io,
> -					    resource_size_t available_mmio,
> -					    resource_size_t available_mmio_pref)
> +	struct list_head *add_list, struct resource io,
> +	struct resource mmio, struct resource mmio_pref)

Follow the parameter indentation style of the rest of the file.

>  {
> -	resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
> +	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
>  	unsigned int normal_bridges = 0, hotplug_bridges = 0;
>  	struct resource *io_res, *mmio_res, *mmio_pref_res;
>  	struct pci_dev *dev, *bridge = bus->self;
> @@ -1850,29 +1848,36 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	mmio_pref_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
>  
>  	/*
> -	 * Update additional resource list (add_list) to fill all the
> -	 * extra resource space available for this port except the space
> -	 * calculated in __pci_bus_size_bridges() which covers all the
> -	 * devices currently connected to the port and below.
> +	 * The alignment of this bridge is yet to be considered, hence it must
> +	 * be done now before extending its bridge window. A single bridge
> +	 * might not be able to occupy the whole parent region if the alignment
> +	 * differs - for example, an external GPU at the end of a Thunderbolt
> +	 * daisy chain.

The example seems needlessly specific.  There isn't anything GPU- or
Thunderbolt-specific about this, is there?

Bridge windows can be aligned to any multiple of 1MB.  But a device
BAR must be aligned on its size, so any BAR larger than 1MB should be
able to cause this, e.g.,

  [mem 0x100000-0x3fffff] (bridge A 3MB window)
    [mem 0x200000-0x3fffff] (bridge B 2MB window)
      [mem 0x200000-0x3fffff] (device 2MB BAR)

Bridge B *could* occupy the the entire 3MB parent region, but it
doesn't need to.  But you say it "might not be *able* to", so maybe
you're thinking of something different?

> -	extend_bridge_window(bridge, io_res, add_list, available_io);
> -	extend_bridge_window(bridge, mmio_res, add_list, available_mmio);
> -	extend_bridge_window(bridge, mmio_pref_res, add_list,
> -			     available_mmio_pref);
> +	align = pci_resource_alignment(bridge, io_res);
> +	if (!io_res->parent && align)
> +		io.start = ALIGN(io.start, align);
> +
> +	align = pci_resource_alignment(bridge, mmio_res);
> +	if (!mmio_res->parent && align)
> +		mmio.start = ALIGN(mmio.start, align);
> +
> +	align = pci_resource_alignment(bridge, mmio_pref_res);
> +	if (!mmio_pref_res->parent && align)
> +		mmio_pref.start = ALIGN(mmio_pref.start, align);
>  
>  	/*
> -	 * Calculate the total amount of extra resource space we can
> -	 * pass to bridges below this one.  This is basically the
> -	 * extra space reduced by the minimal required space for the
> -	 * non-hotplug bridges.
> +	 * Update the resources to fill as much remaining resource space in the
> +	 * parent bridge as possible, while considering alignment.
>  	 */
> -	remaining_io = available_io;
> -	remaining_mmio = available_mmio;
> -	remaining_mmio_pref = available_mmio_pref;
> +	extend_bridge_window(bridge, io_res, add_list, resource_size(&io));
> +	extend_bridge_window(bridge, mmio_res, add_list, resource_size(&mmio));
> +	extend_bridge_window(bridge, mmio_pref_res, add_list,
> +		resource_size(&mmio_pref));
>  
>  	/*
>  	 * Calculate how many hotplug bridges and normal bridges there
> -	 * are on this bus.  We will distribute the additional available
> +	 * are on this bus. We will distribute the additional available

This whitespace change is pointless and distracting.

>  	 * resources between hotplug bridges.
>  	 */
>  	for_each_pci_bridge(dev, bus) {
> @@ -1882,104 +1887,98 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  			normal_bridges++;
>  	}
>  
> +	/*
> +	 * There is only one bridge on the bus so it gets all possible
> +	 * resources which it can then distribute to the possible
> +	 * hotplug bridges below.
> +	 */
> +	if (hotplug_bridges + normal_bridges == 1) {
> +		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> +		if (dev->subordinate)
> +			pci_bus_distribute_available_resources(dev->subordinate,
> +				add_list, io, mmio, mmio_pref);
> +		return;
> +	}

Moving this "single bridge" case up makes sense, and I think it could
be done by a separate patch preceding this one.  Mika, I remember some
discussion about this case, but I can't remember if there's some
reason you didn't do this initially.

The current code is:

  for_each_pci_bridge(dev, bus)
    # compute hotplug_bridges, normal_bridges

  for_each_pci_bridge(dev, bus)
    # compute remaining_io, etc

  if (hotplug_bridges + normal_bridges == 1)
    # handle single bridge case

  for_each_pci_bridge(dev, bus)
    # use remaining_io, etc here

AFAICT the single bridge case has no dependency on the remaining_io
computation.

> +	/*
> +	 * Reduce the available resource space by what the
> +	 * bridge and devices below it occupy.
> +	 */
>  	for_each_pci_bridge(dev, bus) {
> -		const struct resource *res;
> +		struct resource *res;
> +		resource_size_t used_size;
>  
>  		if (dev->is_hotplug_bridge)
>  			continue;
>  
> -		/*
> -		 * Reduce the available resource space by what the
> -		 * bridge and devices below it occupy.
> -		 */
>  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
> -		if (!res->parent && available_io > resource_size(res))
> -			remaining_io -= resource_size(res);
> +		align = pci_resource_alignment(dev, res);
> +		align = align ? ALIGN(io.start, align) - io.start : 0;
> +		used_size = align + resource_size(res);
> +		if (!res->parent && used_size <= resource_size(&io))
> +			io.start += used_size;
>  
>  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
> -		if (!res->parent && available_mmio > resource_size(res))
> -			remaining_mmio -= resource_size(res);
> +		align = pci_resource_alignment(dev, res);
> +		align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> +		used_size = align + resource_size(res);
> +		if (!res->parent && used_size <= resource_size(&mmio))
> +			mmio.start += used_size;
>  
>  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
> -		if (!res->parent && available_mmio_pref > resource_size(res))
> -			remaining_mmio_pref -= resource_size(res);
> +		align = pci_resource_alignment(dev, res);
> +		align = align ? ALIGN(mmio_pref.start, align) -
> +				mmio_pref.start : 0;
> +		used_size = align + resource_size(res);
> +		if (!res->parent && used_size <= resource_size(&mmio_pref))
> +			mmio_pref.start += used_size;
>  	}
>  
> -	/*
> -	 * There is only one bridge on the bus so it gets all available
> -	 * resources which it can then distribute to the possible hotplug
> -	 * bridges below.
> -	 */
> -	if (hotplug_bridges + normal_bridges == 1) {
> -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -		if (dev->subordinate) {
> -			pci_bus_distribute_available_resources(dev->subordinate,
> -				add_list, available_io, available_mmio,
> -				available_mmio_pref);
> -		}
> +	if (!hotplug_bridges)
>  		return;

I like the addition of this early return when there are no hotplug
bridges.  The following loop is a no-op if there are no hotplug
bridges, so it doesn't *fix* anything, but it does make it more
obvious that we don't even have to bother with the loop at all, and
it makes the "Here hotplug_bridges is always != 0" comment
unnecessary.

I think this could be done in a separate patch before this one, too.
Anything we can do to simplify these patches is a win because the code
is so complicated.

>  	/*
> -	 * Go over devices on this bus and distribute the remaining
> -	 * resource space between hotplug bridges.
> +	 * Distribute any remaining resources equally between
> +	 * the hotplug-capable downstream ports.
>  	 */
> -	for_each_pci_bridge(dev, bus) {
> -		resource_size_t align, io, mmio, mmio_pref;
> -		struct pci_bus *b;
> +	io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
> +	mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
> +	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> +		hotplug_bridges);
>  
> -		b = dev->subordinate;
> -		if (!b || !dev->is_hotplug_bridge)
> +	for_each_pci_bridge(dev, bus) {
> +		if (!dev->subordinate || !dev->is_hotplug_bridge)
>  			continue;
>  
> -		/*
> -		 * Distribute available extra resources equally between
> -		 * hotplug-capable downstream ports taking alignment into
> -		 * account.
> -		 *
> -		 * Here hotplug_bridges is always != 0.
> -		 */
> -		align = pci_resource_alignment(bridge, io_res);
> -		io = div64_ul(available_io, hotplug_bridges);
> -		io = min(ALIGN(io, align), remaining_io);
> -		remaining_io -= io;
> -
> -		align = pci_resource_alignment(bridge, mmio_res);
> -		mmio = div64_ul(available_mmio, hotplug_bridges);
> -		mmio = min(ALIGN(mmio, align), remaining_mmio);
> -		remaining_mmio -= mmio;
> +		io.end = io.start + io_per_hp - 1;
> +		mmio.end = mmio.start + mmio_per_hp - 1;
> +		mmio_pref.end = mmio_pref.start + mmio_pref_per_hp - 1;
>  
> -		align = pci_resource_alignment(bridge, mmio_pref_res);
> -		mmio_pref = div64_ul(available_mmio_pref, hotplug_bridges);
> -		mmio_pref = min(ALIGN(mmio_pref, align), remaining_mmio_pref);
> -		remaining_mmio_pref -= mmio_pref;
> +		pci_bus_distribute_available_resources(dev->subordinate,
> +			add_list, io, mmio, mmio_pref);
>  
> -		pci_bus_distribute_available_resources(b, add_list, io, mmio,
> -						       mmio_pref);
> +		io.start = io.end + 1;
> +		mmio.start = mmio.end + 1;
> +		mmio_pref.start = mmio_pref.end + 1;
>  	}

I like the simplification of this loop.

>  }
>  
>  static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
>  						     struct list_head *add_list)
>  {
> -	resource_size_t available_io, available_mmio, available_mmio_pref;
> -	const struct resource *res;
> +	struct resource io_res, mmio_res, mmio_pref_res;
>  
>  	if (!bridge->is_hotplug_bridge)
>  		return;
>  
> +	io_res = bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> +	mmio_res = bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> +	mmio_pref_res = bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> +
>  	/* Take the initial extra resources from the hotplug port */
> -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> -	available_io = resource_size(res);
> -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> -	available_mmio = resource_size(res);
> -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> -	available_mmio_pref = resource_size(res);
>  
>  	pci_bus_distribute_available_resources(bridge->subordinate,
> -					       add_list, available_io,
> -					       available_mmio,
> -					       available_mmio_pref);
> +		add_list, io_res, mmio_res, mmio_pref_res);
>  }
>  
>  void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 2/4] PCI: Modify extend_bridge_window() to set resource size directly
  2019-05-22 14:30 ` [PATCH v6 2/4] PCI: Modify extend_bridge_window() to set resource size directly Nicholas Johnson
@ 2019-06-15 19:57   ` Bjorn Helgaas
  0 siblings, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2019-06-15 19:57 UTC (permalink / raw)
  To: Nicholas Johnson; +Cc: linux-kernel, linux-pci, mika.westerberg, corbet

On Wed, May 22, 2019 at 02:30:57PM +0000, Nicholas Johnson wrote:
> Background
> ==========================================================================
> 
> In the current state, the PCI allocation could fail with Thunderbolt
> under certain unusual circumstances, because add_list resources are
> "optional". Guaranteed allocation requires guaranteed resource sizes.

I don't see anything here specific to Thunderbolt, so this boils down
to "allocation might fail in unusual cases".  That's true, of course,
but in order to fix something we have to identify a failure that could
be avoided.

Part of the reason for "add_size" is for SR-IOV devices where we can
control the amount of space they require.  The PF space is mandatory,
but we can adjust the number of VFs and the space they use.

If we don't have enough space for all the possible VFs, we'd rather
allocate space for some of them than fail completely.

We can never guarantee that there's enough space for all devices, even
if we make all the possible VF space required instead of optional.

> It is difficult to give examples of these failures - because without the
> previous patch in the series, the symptoms of the problem are hidden by
> larger problems. This patch has been split from the previous patch and
> makes little sense on its own - as it is almost impossible to see the
> effect of this patch without first fixing the problems addressed by the
> previous patch. So the evidence I put forward for making this change is
> that because add_list resources are "optional", there could be any
> number of unforeseen bugs that are yet to be encountered if the kernel
> decides not to assign all of the optional size. In kernel development,
> we should not play around with chance.
> 
> Moving away from add_size also allows for use of pci=hpmemsize to assign
> resources. Previously, when using add_size and not allowing the add_size
> to shrink, it made it impossible to distribute resources. If a hotplug
> bridge has size X, and below it is some devices with non-zero size Y and
> a nested hotplug bridge of same size X, fitting X+Y into size X is
> mathematically impossible.
> 
> This patch solves this by dropping add_size and giving each bridge the
> maximum size possible without failing resource assignment. Using
> pci=hpmemsize still works as pci_assign_unassigned_root_bus_resources()
> does not call pci_bus_distribute_available_resources(). At boot,
> pci_assign_unassigned_root_bus_resources() is used, instead of
> pci_bridge_distribute_available_resources().
> 
> By allowing to use pci=hpmemsize, it removes the reliance on the
> firmware to declare the window resources under the root port, and could
> pay off in the future with USB4 (which is backward-compatible to
> Thunderbolt devices, and not specific to Intel systems). Users of
> Thunderbolt hardware on unsupported systems will be able to specify the
> resources in the kernel parameters. Users of official systems will be
> able to override the default firmware window sizes to allocate much
> larger resource sizes, potentially enabling Thunderbolt support for
> devices with massive BARs (with a few other problems solved by later
> patches in this series).
> 
> Patch notes
> ==========================================================================
> 
> Modify extend_bridge_window() to remove the resource from add_list and
> change the resource size directly.
> 
> Modify extend_bridge_window() to reset resources that are being assigned
> zero size. This is required to prevent the bridge not being enabled due
> to resources with zero size. This is a direct requirement to prevent the
> change away from using add_list from introducing a regression - because
> before, it was not possible to end up with zero size.
> 
> Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> ---
>  drivers/pci/setup-bus.c | 42 ++++++++++++++++++++++++++---------------
>  1 file changed, 27 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 1b5b851ca..5675254fa 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1810,28 +1810,40 @@ void __init pci_assign_unassigned_resources(void)
>  }
>  
>  static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
> -				 struct list_head *add_list,
> -				 resource_size_t available)
> +			struct list_head *add_list, resource_size_t new_size)

Follow parameter indentation style of the rest of the file.  It's OK if it
requires another line.

>  {
> -	struct pci_dev_resource *dev_res;
> +	resource_size_t add_size;
>  
>  	if (res->parent)
>  		return;
>  
> -	if (resource_size(res) >= available)
> -		return;
> -
> -	dev_res = res_to_dev_res(add_list, res);
> -	if (!dev_res)
> -		return;
> +	if (new_size >= resource_size(res)) {
> +		add_size = new_size - resource_size(res);
> +		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> +			&add_size);
> +	} else {
> +		add_size = resource_size(res) - new_size;
> +		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> +			&add_size);
> +	}
>  
> -	/* Is there room to extend the window? */
> -	if (available - resource_size(res) <= dev_res->add_size)
> -		return;
> +	/*
> +	 * Resources requested using add_size in additional resource lists are
> +	 * considered optional when allocated. Guaranteed size of allocation
> +	 * is required to guarantee successful resource distribution. Hence,
> +	 * the size of the actual resource must be adjusted, and the resource
> +	 * removed from add_list to prevent any additional size interfering.
> +	 */
> +	res->end = res->start + new_size - 1;
> +	remove_from_list(add_list, res);
>  
> -	dev_res->add_size = available - resource_size(res);
> -	pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> -		&dev_res->add_size);
> +	/*
> +	 * If we have run out of bridge resources, we may end up with a
> +	 * zero-sized resource which may cause its bridge to not be enabled.
> +	 * Disabling the resource prevents any such issues.

I don't understand the concern about the bridge not being enabled; can
you help me out?

When you refer to the bridge not being enabled, I'm guessing you mean
the bridge PCI_COMMAND_IO and PCI_COMMAND_MEMORY bits might not be
set?

"res" refers to a bridge window (io, mmio, or mmio_pref).  Bridge
windows can be individually disabled (by setting limit < base), and
that should not cause the bridge itself to be disabled.
reset_resource() itself doesn't touch the bridge base or limit
registers; is there something later that does?

I don't like reset_resource() because it throws away the flags that
tell us what sort of resource we have (I/O, MMIO, 64-bit).  The only
way to get that back is to re-read the device's config space.  I think
the goal should be to read that *once* during enumeration and keep it.
I know we already use reset_resource() elsewhere, but I'd rather not
add new uses if we can avoid it.

> +	if (!new_size)
> +		reset_resource(res);
>  }
>  
>  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources
  2019-06-15 19:56   ` Bjorn Helgaas
@ 2019-06-17  9:21     ` mika.westerberg
  2019-06-19 13:40     ` Nicholas Johnson
  1 sibling, 0 replies; 12+ messages in thread
From: mika.westerberg @ 2019-06-17  9:21 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Nicholas Johnson, linux-kernel, linux-pci, corbet

On Sat, Jun 15, 2019 at 02:56:36PM -0500, Bjorn Helgaas wrote:
> Mika, this patch changes code you added in 1a5767725cec ("PCI:
> Distribute available resources to hotplug-capable bridges").  Is there
> any chance you could help review this?

Sure, I'll take a look and comment separately.

> On Wed, May 22, 2019 at 02:30:44PM +0000, Nicholas Johnson wrote:
> > Rewrite pci_bus_distribute_available_resources to better handle bridges
> > with different resource alignment requirements. Pass more details
> > arguments recursively to track the resource start and end addresses
> > relative to the initial hotplug bridge. This is especially useful for
> > Thunderbolt with native PCI enumeration, enabling external graphics
> > cards and other devices with bridge alignment higher than 0x100000
> > bytes.
> > 
> > Change extend_bridge_window to resize the actual resource, rather than
> > using add_list and dev_res->add_size. If an additional resource entry
> > exists for the given resource, zero out the add_size field to avoid it
> > interfering. Because add_size is considered optional when allocating,
> > using add_size could cause issues in some cases, because successful
> > resource distribution requires sizes to be guaranteed. Such cases
> > include hot-adding nested hotplug bridges in one enumeration, and
> > potentially others which are yet to be encountered.
> > 
> > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > ---
> >  drivers/pci/setup-bus.c | 169 ++++++++++++++++++++--------------------
> >  1 file changed, 84 insertions(+), 85 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 0cdd5ff38..1b5b851ca 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1835,12 +1835,10 @@ static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
> >  }
> >  
> >  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> > -					    struct list_head *add_list,
> > -					    resource_size_t available_io,
> > -					    resource_size_t available_mmio,
> > -					    resource_size_t available_mmio_pref)
> > +	struct list_head *add_list, struct resource io,
> > +	struct resource mmio, struct resource mmio_pref)
> 
> Follow the parameter indentation style of the rest of the file.
> 
> >  {
> > -	resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
> > +	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
> >  	unsigned int normal_bridges = 0, hotplug_bridges = 0;
> >  	struct resource *io_res, *mmio_res, *mmio_pref_res;
> >  	struct pci_dev *dev, *bridge = bus->self;
> > @@ -1850,29 +1848,36 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  	mmio_pref_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> >  
> >  	/*
> > -	 * Update additional resource list (add_list) to fill all the
> > -	 * extra resource space available for this port except the space
> > -	 * calculated in __pci_bus_size_bridges() which covers all the
> > -	 * devices currently connected to the port and below.
> > +	 * The alignment of this bridge is yet to be considered, hence it must
> > +	 * be done now before extending its bridge window. A single bridge
> > +	 * might not be able to occupy the whole parent region if the alignment
> > +	 * differs - for example, an external GPU at the end of a Thunderbolt
> > +	 * daisy chain.
> 
> The example seems needlessly specific.  There isn't anything GPU- or
> Thunderbolt-specific about this, is there?
> 
> Bridge windows can be aligned to any multiple of 1MB.  But a device
> BAR must be aligned on its size, so any BAR larger than 1MB should be
> able to cause this, e.g.,
> 
>   [mem 0x100000-0x3fffff] (bridge A 3MB window)
>     [mem 0x200000-0x3fffff] (bridge B 2MB window)
>       [mem 0x200000-0x3fffff] (device 2MB BAR)
> 
> Bridge B *could* occupy the the entire 3MB parent region, but it
> doesn't need to.  But you say it "might not be *able* to", so maybe
> you're thinking of something different?
> 
> > -	extend_bridge_window(bridge, io_res, add_list, available_io);
> > -	extend_bridge_window(bridge, mmio_res, add_list, available_mmio);
> > -	extend_bridge_window(bridge, mmio_pref_res, add_list,
> > -			     available_mmio_pref);
> > +	align = pci_resource_alignment(bridge, io_res);
> > +	if (!io_res->parent && align)
> > +		io.start = ALIGN(io.start, align);
> > +
> > +	align = pci_resource_alignment(bridge, mmio_res);
> > +	if (!mmio_res->parent && align)
> > +		mmio.start = ALIGN(mmio.start, align);
> > +
> > +	align = pci_resource_alignment(bridge, mmio_pref_res);
> > +	if (!mmio_pref_res->parent && align)
> > +		mmio_pref.start = ALIGN(mmio_pref.start, align);
> >  
> >  	/*
> > -	 * Calculate the total amount of extra resource space we can
> > -	 * pass to bridges below this one.  This is basically the
> > -	 * extra space reduced by the minimal required space for the
> > -	 * non-hotplug bridges.
> > +	 * Update the resources to fill as much remaining resource space in the
> > +	 * parent bridge as possible, while considering alignment.
> >  	 */
> > -	remaining_io = available_io;
> > -	remaining_mmio = available_mmio;
> > -	remaining_mmio_pref = available_mmio_pref;
> > +	extend_bridge_window(bridge, io_res, add_list, resource_size(&io));
> > +	extend_bridge_window(bridge, mmio_res, add_list, resource_size(&mmio));
> > +	extend_bridge_window(bridge, mmio_pref_res, add_list,
> > +		resource_size(&mmio_pref));
> >  
> >  	/*
> >  	 * Calculate how many hotplug bridges and normal bridges there
> > -	 * are on this bus.  We will distribute the additional available
> > +	 * are on this bus. We will distribute the additional available
> 
> This whitespace change is pointless and distracting.
> 
> >  	 * resources between hotplug bridges.
> >  	 */
> >  	for_each_pci_bridge(dev, bus) {
> > @@ -1882,104 +1887,98 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  			normal_bridges++;
> >  	}
> >  
> > +	/*
> > +	 * There is only one bridge on the bus so it gets all possible
> > +	 * resources which it can then distribute to the possible
> > +	 * hotplug bridges below.
> > +	 */
> > +	if (hotplug_bridges + normal_bridges == 1) {
> > +		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > +		if (dev->subordinate)
> > +			pci_bus_distribute_available_resources(dev->subordinate,
> > +				add_list, io, mmio, mmio_pref);
> > +		return;
> > +	}
> 
> Moving this "single bridge" case up makes sense, and I think it could
> be done by a separate patch preceding this one.  Mika, I remember some
> discussion about this case, but I can't remember if there's some
> reason you didn't do this initially.

The single bridge case was already moved outside of the loop in
14fe5951b667 ("PCI: Move resource distribution for single bridge outside
loop"). But indeed there is no dependency to the code below so probably
just my oversight why it was not moved further up.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources
  2019-05-22 14:30 ` [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources Nicholas Johnson
  2019-06-15 19:56   ` Bjorn Helgaas
@ 2019-06-17  9:35   ` mika.westerberg
  2019-06-17 18:22     ` Bjorn Helgaas
  2019-06-19 13:47     ` Nicholas Johnson
  1 sibling, 2 replies; 12+ messages in thread
From: mika.westerberg @ 2019-06-17  9:35 UTC (permalink / raw)
  To: Nicholas Johnson; +Cc: linux-kernel, linux-pci, bhelgaas, corbet

On Wed, May 22, 2019 at 02:30:44PM +0000, Nicholas Johnson wrote:
> Rewrite pci_bus_distribute_available_resources to better handle bridges
> with different resource alignment requirements. Pass more details
> arguments recursively to track the resource start and end addresses
> relative to the initial hotplug bridge. This is especially useful for
> Thunderbolt with native PCI enumeration, enabling external graphics
> cards and other devices with bridge alignment higher than 0x100000
 
Instead of 0x100000 you could say 1MB here.

> bytes.
> 
> Change extend_bridge_window to resize the actual resource, rather than
> using add_list and dev_res->add_size. If an additional resource entry
> exists for the given resource, zero out the add_size field to avoid it
> interfering. Because add_size is considered optional when allocating,
> using add_size could cause issues in some cases, because successful
> resource distribution requires sizes to be guaranteed. Such cases
> include hot-adding nested hotplug bridges in one enumeration, and
> potentially others which are yet to be encountered.
> 
> Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>

The logic makes sense to me but since you probably need to spin another
revision anyway please find a couple of additional comments below.

> ---
>  drivers/pci/setup-bus.c | 169 ++++++++++++++++++++--------------------
>  1 file changed, 84 insertions(+), 85 deletions(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 0cdd5ff38..1b5b851ca 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1835,12 +1835,10 @@ static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
>  }
>  
>  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> -					    struct list_head *add_list,
> -					    resource_size_t available_io,
> -					    resource_size_t available_mmio,
> -					    resource_size_t available_mmio_pref)
> +	struct list_head *add_list, struct resource io,
> +	struct resource mmio, struct resource mmio_pref)
>  {
> -	resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
> +	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
>  	unsigned int normal_bridges = 0, hotplug_bridges = 0;
>  	struct resource *io_res, *mmio_res, *mmio_pref_res;
>  	struct pci_dev *dev, *bridge = bus->self;
> @@ -1850,29 +1848,36 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  	mmio_pref_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
>  
>  	/*
> -	 * Update additional resource list (add_list) to fill all the
> -	 * extra resource space available for this port except the space
> -	 * calculated in __pci_bus_size_bridges() which covers all the
> -	 * devices currently connected to the port and below.
> +	 * The alignment of this bridge is yet to be considered, hence it must
> +	 * be done now before extending its bridge window. A single bridge
> +	 * might not be able to occupy the whole parent region if the alignment
> +	 * differs - for example, an external GPU at the end of a Thunderbolt
> +	 * daisy chain.

As Bjorn also commented there is nothing Thunderbolt specific here so I
would leave it out of the comment because it is kind of confusing.

>  	 */
> -	extend_bridge_window(bridge, io_res, add_list, available_io);
> -	extend_bridge_window(bridge, mmio_res, add_list, available_mmio);
> -	extend_bridge_window(bridge, mmio_pref_res, add_list,
> -			     available_mmio_pref);
> +	align = pci_resource_alignment(bridge, io_res);
> +	if (!io_res->parent && align)
> +		io.start = ALIGN(io.start, align);
> +
> +	align = pci_resource_alignment(bridge, mmio_res);
> +	if (!mmio_res->parent && align)
> +		mmio.start = ALIGN(mmio.start, align);
> +
> +	align = pci_resource_alignment(bridge, mmio_pref_res);
> +	if (!mmio_pref_res->parent && align)
> +		mmio_pref.start = ALIGN(mmio_pref.start, align);
>  
>  	/*
> -	 * Calculate the total amount of extra resource space we can
> -	 * pass to bridges below this one.  This is basically the
> -	 * extra space reduced by the minimal required space for the
> -	 * non-hotplug bridges.
> +	 * Update the resources to fill as much remaining resource space in the
> +	 * parent bridge as possible, while considering alignment.
>  	 */
> -	remaining_io = available_io;
> -	remaining_mmio = available_mmio;
> -	remaining_mmio_pref = available_mmio_pref;
> +	extend_bridge_window(bridge, io_res, add_list, resource_size(&io));
> +	extend_bridge_window(bridge, mmio_res, add_list, resource_size(&mmio));
> +	extend_bridge_window(bridge, mmio_pref_res, add_list,
> +		resource_size(&mmio_pref));

Please write it like it was originally (e.g align the second line to the
opening bracket):

	extend_bridge_window(bridge, mmio_pref_res, add_list,
			     resource_size(&mmio_pref));

>  
>  	/*
>  	 * Calculate how many hotplug bridges and normal bridges there
> -	 * are on this bus.  We will distribute the additional available
> +	 * are on this bus. We will distribute the additional available
>  	 * resources between hotplug bridges.
>  	 */
>  	for_each_pci_bridge(dev, bus) {
> @@ -1882,104 +1887,98 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
>  			normal_bridges++;
>  	}
>  
> +	/*
> +	 * There is only one bridge on the bus so it gets all possible
> +	 * resources which it can then distribute to the possible
> +	 * hotplug bridges below.
> +	 */
> +	if (hotplug_bridges + normal_bridges == 1) {
> +		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> +		if (dev->subordinate)
> +			pci_bus_distribute_available_resources(dev->subordinate,
> +				add_list, io, mmio, mmio_pref);
> +		return;
> +	}
> +
> +	/*
> +	 * Reduce the available resource space by what the
> +	 * bridge and devices below it occupy.
> +	 */
>  	for_each_pci_bridge(dev, bus) {
> -		const struct resource *res;
> +		struct resource *res;
> +		resource_size_t used_size;

Here order these in "reverse christmas tree" like:

		resource_size_t used_size;
		struct resource *res;

>  
>  		if (dev->is_hotplug_bridge)
>  			continue;
>  
> -		/*
> -		 * Reduce the available resource space by what the
> -		 * bridge and devices below it occupy.
> -		 */
>  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
> -		if (!res->parent && available_io > resource_size(res))
> -			remaining_io -= resource_size(res);
> +		align = pci_resource_alignment(dev, res);
> +		align = align ? ALIGN(io.start, align) - io.start : 0;
> +		used_size = align + resource_size(res);
> +		if (!res->parent && used_size <= resource_size(&io))
> +			io.start += used_size;
>  
>  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
> -		if (!res->parent && available_mmio > resource_size(res))
> -			remaining_mmio -= resource_size(res);
> +		align = pci_resource_alignment(dev, res);
> +		align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> +		used_size = align + resource_size(res);
> +		if (!res->parent && used_size <= resource_size(&mmio))
> +			mmio.start += used_size;
>  
>  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
> -		if (!res->parent && available_mmio_pref > resource_size(res))
> -			remaining_mmio_pref -= resource_size(res);
> +		align = pci_resource_alignment(dev, res);
> +		align = align ? ALIGN(mmio_pref.start, align) -
> +				mmio_pref.start : 0;
> +		used_size = align + resource_size(res);
> +		if (!res->parent && used_size <= resource_size(&mmio_pref))
> +			mmio_pref.start += used_size;
>  	}
>  
> -	/*
> -	 * There is only one bridge on the bus so it gets all available
> -	 * resources which it can then distribute to the possible hotplug
> -	 * bridges below.
> -	 */
> -	if (hotplug_bridges + normal_bridges == 1) {
> -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> -		if (dev->subordinate) {
> -			pci_bus_distribute_available_resources(dev->subordinate,
> -				add_list, available_io, available_mmio,
> -				available_mmio_pref);
> -		}
> +	if (!hotplug_bridges)
>  		return;
> -	}
>  
>  	/*
> -	 * Go over devices on this bus and distribute the remaining
> -	 * resource space between hotplug bridges.
> +	 * Distribute any remaining resources equally between
> +	 * the hotplug-capable downstream ports.
>  	 */
> -	for_each_pci_bridge(dev, bus) {
> -		resource_size_t align, io, mmio, mmio_pref;
> -		struct pci_bus *b;
> +	io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
> +	mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
> +	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> +		hotplug_bridges);

Here also write it like:

	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
				    hotplug_bridges);

>  
> -		b = dev->subordinate;
> -		if (!b || !dev->is_hotplug_bridge)
> +	for_each_pci_bridge(dev, bus) {
> +		if (!dev->subordinate || !dev->is_hotplug_bridge)
>  			continue;
>  
> -		/*
> -		 * Distribute available extra resources equally between
> -		 * hotplug-capable downstream ports taking alignment into
> -		 * account.
> -		 *
> -		 * Here hotplug_bridges is always != 0.
> -		 */
> -		align = pci_resource_alignment(bridge, io_res);
> -		io = div64_ul(available_io, hotplug_bridges);
> -		io = min(ALIGN(io, align), remaining_io);
> -		remaining_io -= io;
> -
> -		align = pci_resource_alignment(bridge, mmio_res);
> -		mmio = div64_ul(available_mmio, hotplug_bridges);
> -		mmio = min(ALIGN(mmio, align), remaining_mmio);
> -		remaining_mmio -= mmio;
> +		io.end = io.start + io_per_hp - 1;
> +		mmio.end = mmio.start + mmio_per_hp - 1;
> +		mmio_pref.end = mmio_pref.start + mmio_pref_per_hp - 1;
>  
> -		align = pci_resource_alignment(bridge, mmio_pref_res);
> -		mmio_pref = div64_ul(available_mmio_pref, hotplug_bridges);
> -		mmio_pref = min(ALIGN(mmio_pref, align), remaining_mmio_pref);
> -		remaining_mmio_pref -= mmio_pref;
> +		pci_bus_distribute_available_resources(dev->subordinate,
> +			add_list, io, mmio, mmio_pref);

Ditto.

>  
> -		pci_bus_distribute_available_resources(b, add_list, io, mmio,
> -						       mmio_pref);
> +		io.start = io.end + 1;
> +		mmio.start = mmio.end + 1;
> +		mmio_pref.start = mmio_pref.end + 1;
>  	}
>  }
>  
>  static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
>  						     struct list_head *add_list)
>  {
> -	resource_size_t available_io, available_mmio, available_mmio_pref;
> -	const struct resource *res;
> +	struct resource io_res, mmio_res, mmio_pref_res;
>  
>  	if (!bridge->is_hotplug_bridge)
>  		return;
>  
> +	io_res = bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> +	mmio_res = bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> +	mmio_pref_res = bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> +
>  	/* Take the initial extra resources from the hotplug port */
> -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> -	available_io = resource_size(res);
> -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> -	available_mmio = resource_size(res);
> -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> -	available_mmio_pref = resource_size(res);
>  
>  	pci_bus_distribute_available_resources(bridge->subordinate,
> -					       add_list, available_io,
> -					       available_mmio,
> -					       available_mmio_pref);
> +		add_list, io_res, mmio_res, mmio_pref_res);
>  }
>  
>  void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources
  2019-06-17  9:35   ` mika.westerberg
@ 2019-06-17 18:22     ` Bjorn Helgaas
  2019-06-19 13:47     ` Nicholas Johnson
  1 sibling, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2019-06-17 18:22 UTC (permalink / raw)
  To: mika.westerberg; +Cc: Nicholas Johnson, linux-kernel, linux-pci, corbet

On Mon, Jun 17, 2019 at 12:35:13PM +0300, mika.westerberg@linux.intel.com wrote:
> On Wed, May 22, 2019 at 02:30:44PM +0000, Nicholas Johnson wrote:
> > Rewrite pci_bus_distribute_available_resources to better handle bridges
> > with different resource alignment requirements. Pass more details
> > arguments recursively to track the resource start and end addresses
> > relative to the initial hotplug bridge. This is especially useful for
> > Thunderbolt with native PCI enumeration, enabling external graphics
> > cards and other devices with bridge alignment higher than 0x100000
>  
> Instead of 0x100000 you could say 1MB here.

And of course, 1MB is the minimum bridge window alignment.  I *guess*
this is actually talking about endpoints with BARs larger than 1MB,
which have to be aligned on their size.  This doesn't actually impose
any requirement on the bridge window alignment, as long as the bridge
window contains the endpoint BARs.

> > bytes.

> >  	for_each_pci_bridge(dev, bus) {
> > -		const struct resource *res;
> > +		struct resource *res;
> > +		resource_size_t used_size;
> 
> Here order these in "reverse christmas tree" like:
> 
> 		resource_size_t used_size;
> 		struct resource *res;

I actually don't enforce "reverse christmas tree", and when I write
code, I order the declarations in order of their use in the code
below, as Nicholas has done.  But either way is fine.

Bjorn

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources
  2019-06-15 19:56   ` Bjorn Helgaas
  2019-06-17  9:21     ` mika.westerberg
@ 2019-06-19 13:40     ` Nicholas Johnson
  2019-06-19 21:53       ` Bjorn Helgaas
  1 sibling, 1 reply; 12+ messages in thread
From: Nicholas Johnson @ 2019-06-19 13:40 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-kernel, linux-pci, mika.westerberg, corbet

This time I am using reply-all "group=g" in Mutt instead of reply 
"reply=r". It looks like I have been doing this incorrectly for a while 
now. At least I have found out that git send-email uses terrible 
encoding and adds the patch as an attachment, so things might start to 
look up.

I copied the body of my previous reply into this reply so below the 
line, it should be exactly the same. I thank Bjorn for his patience 
here.
_______________________________________________________________________________


In addition to my responses to your concerns below, I would like to make 
an initial clarification: does the fact that you have not replied to 
patches 3 and 4 in the series indicate that you are happy with them, or 
have not looked at them yet?

On Sat, Jun 15, 2019 at 02:56:36PM -0500, Bjorn Helgaas wrote:
> Mika, this patch changes code you added in 1a5767725cec ("PCI:
> Distribute available resources to hotplug-capable bridges").  Is there
> any chance you could help review this?
> 
> On Wed, May 22, 2019 at 02:30:44PM +0000, Nicholas Johnson wrote:
> > Rewrite pci_bus_distribute_available_resources to better handle bridges
> > with different resource alignment requirements. Pass more details
> > arguments recursively to track the resource start and end addresses
> > relative to the initial hotplug bridge. This is especially useful for
> > Thunderbolt with native PCI enumeration, enabling external graphics
> > cards and other devices with bridge alignment higher than 0x100000
> > bytes.
> > 
> > Change extend_bridge_window to resize the actual resource, rather than
> > using add_list and dev_res->add_size. If an additional resource entry
> > exists for the given resource, zero out the add_size field to avoid it
> > interfering. Because add_size is considered optional when allocating,
> > using add_size could cause issues in some cases, because successful
> > resource distribution requires sizes to be guaranteed. Such cases
> > include hot-adding nested hotplug bridges in one enumeration, and
> > potentially others which are yet to be encountered.
> > 
> > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > ---
> >  drivers/pci/setup-bus.c | 169 ++++++++++++++++++++--------------------
> >  1 file changed, 84 insertions(+), 85 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 0cdd5ff38..1b5b851ca 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1835,12 +1835,10 @@ static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
> >  }
> >  
> >  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> > -					    struct list_head *add_list,
> > -					    resource_size_t available_io,
> > -					    resource_size_t available_mmio,
> > -					    resource_size_t available_mmio_pref)
> > +	struct list_head *add_list, struct resource io,
> > +	struct resource mmio, struct resource mmio_pref)
> 
> Follow the parameter indentation style of the rest of the file.
I will look at this. I will admit that handling line overflows has been 
tricky for me to get a handle on.

> 
> >  {
> > -	resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
> > +	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
> >  	unsigned int normal_bridges = 0, hotplug_bridges = 0;
> >  	struct resource *io_res, *mmio_res, *mmio_pref_res;
> >  	struct pci_dev *dev, *bridge = bus->self;
> > @@ -1850,29 +1848,36 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  	mmio_pref_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> >  
> >  	/*
> > -	 * Update additional resource list (add_list) to fill all the
> > -	 * extra resource space available for this port except the space
> > -	 * calculated in __pci_bus_size_bridges() which covers all the
> > -	 * devices currently connected to the port and below.
> > +	 * The alignment of this bridge is yet to be considered, hence it must
> > +	 * be done now before extending its bridge window. A single bridge
> > +	 * might not be able to occupy the whole parent region if the alignment
> > +	 * differs - for example, an external GPU at the end of a Thunderbolt
> > +	 * daisy chain.
> 
> The example seems needlessly specific.  There isn't anything GPU- or
> Thunderbolt-specific about this, is there?
> 
> Bridge windows can be aligned to any multiple of 1MB.  But a device
> BAR must be aligned on its size, so any BAR larger than 1MB should be
> able to cause this, e.g.,
> 
>   [mem 0x100000-0x3fffff] (bridge A 3MB window)
>     [mem 0x200000-0x3fffff] (bridge B 2MB window)
>       [mem 0x200000-0x3fffff] (device 2MB BAR)
> 
> Bridge B *could* occupy the the entire 3MB parent region, but it
> doesn't need to.  But you say it "might not be *able* to", so maybe
> you're thinking of something different?
Under some circumstances it may be possible to occupy the entire region, 
but not always. If the start address of the entire parent region is not 
aligned to the boundary required by the child then the start address of 
the child needs to be bumped up to the next boundary, leaving some 
unused space. In Mika's take on this, he suggested that I just remove 
this comment. I agreed. Does this solve the issue in your mind?

> 
> > -	extend_bridge_window(bridge, io_res, add_list, available_io);
> > -	extend_bridge_window(bridge, mmio_res, add_list, available_mmio);
> > -	extend_bridge_window(bridge, mmio_pref_res, add_list,
> > -			     available_mmio_pref);
> > +	align = pci_resource_alignment(bridge, io_res);
> > +	if (!io_res->parent && align)
> > +		io.start = ALIGN(io.start, align);
> > +
> > +	align = pci_resource_alignment(bridge, mmio_res);
> > +	if (!mmio_res->parent && align)
> > +		mmio.start = ALIGN(mmio.start, align);
> > +
> > +	align = pci_resource_alignment(bridge, mmio_pref_res);
> > +	if (!mmio_pref_res->parent && align)
> > +		mmio_pref.start = ALIGN(mmio_pref.start, align);
> >  
> >  	/*
> > -	 * Calculate the total amount of extra resource space we can
> > -	 * pass to bridges below this one.  This is basically the
> > -	 * extra space reduced by the minimal required space for the
> > -	 * non-hotplug bridges.
> > +	 * Update the resources to fill as much remaining resource space in the
> > +	 * parent bridge as possible, while considering alignment.
> >  	 */
> > -	remaining_io = available_io;
> > -	remaining_mmio = available_mmio;
> > -	remaining_mmio_pref = available_mmio_pref;
> > +	extend_bridge_window(bridge, io_res, add_list, resource_size(&io));
> > +	extend_bridge_window(bridge, mmio_res, add_list, resource_size(&mmio));
> > +	extend_bridge_window(bridge, mmio_pref_res, add_list,
> > +		resource_size(&mmio_pref));
> >  
> >  	/*
> >  	 * Calculate how many hotplug bridges and normal bridges there
> > -	 * are on this bus.  We will distribute the additional available
> > +	 * are on this bus. We will distribute the additional available
> 
> This whitespace change is pointless and distracting.
Okay. I will read this as "remove it".

> 
> >  	 * resources between hotplug bridges.
> >  	 */
> >  	for_each_pci_bridge(dev, bus) {
> > @@ -1882,104 +1887,98 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  			normal_bridges++;
> >  	}
> >  
> > +	/*
> > +	 * There is only one bridge on the bus so it gets all possible
> > +	 * resources which it can then distribute to the possible
> > +	 * hotplug bridges below.
> > +	 */
> > +	if (hotplug_bridges + normal_bridges == 1) {
> > +		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > +		if (dev->subordinate)
> > +			pci_bus_distribute_available_resources(dev->subordinate,
> > +				add_list, io, mmio, mmio_pref);
> > +		return;
> > +	}
> 
> Moving this "single bridge" case up makes sense, and I think it could
> be done by a separate patch preceding this one.  Mika, I remember some
> discussion about this case, but I can't remember if there's some
> reason you didn't do this initially.
> 
> The current code is:
> 
>   for_each_pci_bridge(dev, bus)
>     # compute hotplug_bridges, normal_bridges
> 
>   for_each_pci_bridge(dev, bus)
>     # compute remaining_io, etc
> 
>   if (hotplug_bridges + normal_bridges == 1)
>     # handle single bridge case
> 
>   for_each_pci_bridge(dev, bus)
>     # use remaining_io, etc here
> 
> AFAICT the single bridge case has no dependency on the remaining_io
> computation.
If you want another patch then please explicitly request it with how you 
want it to be done. In the past I have misinterpreted some of your 
requests as suggestions, and other times have gone off with a 
misinterpretation and produced undesirable patches. I want to be sure 
this time.

My take on this is that although the patch is based on the skeleton of 
Mika's existing functions, it is not a minor change, which does not 
leave a lot of lines the same. So re-ordering the existing code in a 
preliminary patch may not reduce the number of diff lines in this.

Regardless, I will happily make the change if asked, but I will need a 
request with little room for misinterpretation.

My current interpretation is to make a preliminary patch that does 
nothing except move the single bridge loop up one.

> 
> > +	/*
> > +	 * Reduce the available resource space by what the
> > +	 * bridge and devices below it occupy.
> > +	 */
> >  	for_each_pci_bridge(dev, bus) {
> > -		const struct resource *res;
> > +		struct resource *res;
> > +		resource_size_t used_size;
> >  
> >  		if (dev->is_hotplug_bridge)
> >  			continue;
> >  
> > -		/*
> > -		 * Reduce the available resource space by what the
> > -		 * bridge and devices below it occupy.
> > -		 */
> >  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
> > -		if (!res->parent && available_io > resource_size(res))
> > -			remaining_io -= resource_size(res);
> > +		align = pci_resource_alignment(dev, res);
> > +		align = align ? ALIGN(io.start, align) - io.start : 0;
> > +		used_size = align + resource_size(res);
> > +		if (!res->parent && used_size <= resource_size(&io))
> > +			io.start += used_size;
> >  
> >  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
> > -		if (!res->parent && available_mmio > resource_size(res))
> > -			remaining_mmio -= resource_size(res);
> > +		align = pci_resource_alignment(dev, res);
> > +		align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> > +		used_size = align + resource_size(res);
> > +		if (!res->parent && used_size <= resource_size(&mmio))
> > +			mmio.start += used_size;
> >  
> >  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
> > -		if (!res->parent && available_mmio_pref > resource_size(res))
> > -			remaining_mmio_pref -= resource_size(res);
> > +		align = pci_resource_alignment(dev, res);
> > +		align = align ? ALIGN(mmio_pref.start, align) -
> > +				mmio_pref.start : 0;
> > +		used_size = align + resource_size(res);
> > +		if (!res->parent && used_size <= resource_size(&mmio_pref))
> > +			mmio_pref.start += used_size;
> >  	}
> >  
> > -	/*
> > -	 * There is only one bridge on the bus so it gets all available
> > -	 * resources which it can then distribute to the possible hotplug
> > -	 * bridges below.
> > -	 */
> > -	if (hotplug_bridges + normal_bridges == 1) {
> > -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > -		if (dev->subordinate) {
> > -			pci_bus_distribute_available_resources(dev->subordinate,
> > -				add_list, available_io, available_mmio,
> > -				available_mmio_pref);
> > -		}
> > +	if (!hotplug_bridges)
> >  		return;
> 
> I like the addition of this early return when there are no hotplug
> bridges.  The following loop is a no-op if there are no hotplug
> bridges, so it doesn't *fix* anything, but it does make it more
> obvious that we don't even have to bother with the loop at all, and
> it makes the "Here hotplug_bridges is always != 0" comment
> unnecessary.
It fixes the division by zero on the next line of code, but I love to 
shoot two birds with one stone. Moving the div64_ul calls out of the 
loop is the price for the simplification of the loop.

> 
> I think this could be done in a separate patch before this one, too.
> Anything we can do to simplify these patches is a win because the code
> is so complicated.
Again, please provide guidance on what exactly you wish to be separated, 
and whether the patch should be before or after the other preliminary 
patch to move the loop up by one.

My initial interpretation is that you want me to add the following in 
Mika's code before his equivalent loop as a two line patch:

if (!hotplug_bridges)
	return;

Should I take out the following comment at the same time?
"Here hotplug_bridges is always != 0."

> 
> >  	/*
> > -	 * Go over devices on this bus and distribute the remaining
> > -	 * resource space between hotplug bridges.
> > +	 * Distribute any remaining resources equally between
> > +	 * the hotplug-capable downstream ports.
> >  	 */
> > -	for_each_pci_bridge(dev, bus) {
> > -		resource_size_t align, io, mmio, mmio_pref;
> > -		struct pci_bus *b;
> > +	io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
> > +	mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
> > +	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> > +		hotplug_bridges);
> >  
> > -		b = dev->subordinate;
> > -		if (!b || !dev->is_hotplug_bridge)
> > +	for_each_pci_bridge(dev, bus) {
> > +		if (!dev->subordinate || !dev->is_hotplug_bridge)
> >  			continue;
> >  
> > -		/*
> > -		 * Distribute available extra resources equally between
> > -		 * hotplug-capable downstream ports taking alignment into
> > -		 * account.
> > -		 *
> > -		 * Here hotplug_bridges is always != 0.
> > -		 */
> > -		align = pci_resource_alignment(bridge, io_res);
> > -		io = div64_ul(available_io, hotplug_bridges);
> > -		io = min(ALIGN(io, align), remaining_io);
> > -		remaining_io -= io;
> > -
> > -		align = pci_resource_alignment(bridge, mmio_res);
> > -		mmio = div64_ul(available_mmio, hotplug_bridges);
> > -		mmio = min(ALIGN(mmio, align), remaining_mmio);
> > -		remaining_mmio -= mmio;
> > +		io.end = io.start + io_per_hp - 1;
> > +		mmio.end = mmio.start + mmio_per_hp - 1;
> > +		mmio_pref.end = mmio_pref.start + mmio_pref_per_hp - 1;
> >  
> > -		align = pci_resource_alignment(bridge, mmio_pref_res);
> > -		mmio_pref = div64_ul(available_mmio_pref, hotplug_bridges);
> > -		mmio_pref = min(ALIGN(mmio_pref, align), remaining_mmio_pref);
> > -		remaining_mmio_pref -= mmio_pref;
> > +		pci_bus_distribute_available_resources(dev->subordinate,
> > +			add_list, io, mmio, mmio_pref);
> >  
> > -		pci_bus_distribute_available_resources(b, add_list, io, mmio,
> > -						       mmio_pref);
> > +		io.start = io.end + 1;
> > +		mmio.start = mmio.end + 1;
> > +		mmio_pref.start = mmio_pref.end + 1;
> >  	}
> 
> I like the simplification of this loop.
Thanks.

> 
> >  }
> >  
> >  static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
> >  						     struct list_head *add_list)
> >  {
> > -	resource_size_t available_io, available_mmio, available_mmio_pref;
> > -	const struct resource *res;
> > +	struct resource io_res, mmio_res, mmio_pref_res;
> >  
> >  	if (!bridge->is_hotplug_bridge)
> >  		return;
> >  
> > +	io_res = bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> > +	mmio_res = bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> > +	mmio_pref_res = bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> > +
> >  	/* Take the initial extra resources from the hotplug port */
> > -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> > -	available_io = resource_size(res);
> > -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> > -	available_mmio = resource_size(res);
> > -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> > -	available_mmio_pref = resource_size(res);
> >  
> >  	pci_bus_distribute_available_resources(bridge->subordinate,
> > -					       add_list, available_io,
> > -					       available_mmio,
> > -					       available_mmio_pref);
> > +		add_list, io_res, mmio_res, mmio_pref_res);
> >  }
> >  
> >  void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources
  2019-06-17  9:35   ` mika.westerberg
  2019-06-17 18:22     ` Bjorn Helgaas
@ 2019-06-19 13:47     ` Nicholas Johnson
  1 sibling, 0 replies; 12+ messages in thread
From: Nicholas Johnson @ 2019-06-19 13:47 UTC (permalink / raw)
  To: mika.westerberg; +Cc: linux-kernel, linux-pci, bhelgaas, corbet

Again, I am re-sending my previous response with reply-all instead of 
plain reply. Mika, you can ignore this as you have already seen it. For 
everybody else, everything below the line should be exactly the same as 
I copied it over. 
_______________________________________________________________________________


On Mon, Jun 17, 2019 at 12:35:13PM +0300, mika.westerberg@linux.intel.com wrote:
> On Wed, May 22, 2019 at 02:30:44PM +0000, Nicholas Johnson wrote:
> > Rewrite pci_bus_distribute_available_resources to better handle bridges
> > with different resource alignment requirements. Pass more details
> > arguments recursively to track the resource start and end addresses
> > relative to the initial hotplug bridge. This is especially useful for
> > Thunderbolt with native PCI enumeration, enabling external graphics
> > cards and other devices with bridge alignment higher than 0x100000
>  
> Instead of 0x100000 you could say 1MB here.
My train of thought is 1MB is sometimes means base-10 but then again, 
that is probably outside of the kernel world. I can change it to 1MB.

> 
> > bytes.
> > 
> > Change extend_bridge_window to resize the actual resource, rather than
> > using add_list and dev_res->add_size. If an additional resource entry
> > exists for the given resource, zero out the add_size field to avoid it
> > interfering. Because add_size is considered optional when allocating,
> > using add_size could cause issues in some cases, because successful
> > resource distribution requires sizes to be guaranteed. Such cases
> > include hot-adding nested hotplug bridges in one enumeration, and
> > potentially others which are yet to be encountered.
> > 
> > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> 
> The logic makes sense to me but since you probably need to spin another
> revision anyway please find a couple of additional comments below.
> 
> > ---
> >  drivers/pci/setup-bus.c | 169 ++++++++++++++++++++--------------------
> >  1 file changed, 84 insertions(+), 85 deletions(-)
> > 
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 0cdd5ff38..1b5b851ca 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1835,12 +1835,10 @@ static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
> >  }
> >  
> >  static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> > -					    struct list_head *add_list,
> > -					    resource_size_t available_io,
> > -					    resource_size_t available_mmio,
> > -					    resource_size_t available_mmio_pref)
> > +	struct list_head *add_list, struct resource io,
> > +	struct resource mmio, struct resource mmio_pref)
> >  {
> > -	resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
> > +	resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
> >  	unsigned int normal_bridges = 0, hotplug_bridges = 0;
> >  	struct resource *io_res, *mmio_res, *mmio_pref_res;
> >  	struct pci_dev *dev, *bridge = bus->self;
> > @@ -1850,29 +1848,36 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  	mmio_pref_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> >  
> >  	/*
> > -	 * Update additional resource list (add_list) to fill all the
> > -	 * extra resource space available for this port except the space
> > -	 * calculated in __pci_bus_size_bridges() which covers all the
> > -	 * devices currently connected to the port and below.
> > +	 * The alignment of this bridge is yet to be considered, hence it must
> > +	 * be done now before extending its bridge window. A single bridge
> > +	 * might not be able to occupy the whole parent region if the alignment
> > +	 * differs - for example, an external GPU at the end of a Thunderbolt
> > +	 * daisy chain.
> 
> As Bjorn also commented there is nothing Thunderbolt specific here so I
> would leave it out of the comment because it is kind of confusing.
Okay, I can remove "A single bridge.... daisy chain".
Please correct me if that is not what you meant.

> 
> >  	 */
> > -	extend_bridge_window(bridge, io_res, add_list, available_io);
> > -	extend_bridge_window(bridge, mmio_res, add_list, available_mmio);
> > -	extend_bridge_window(bridge, mmio_pref_res, add_list,
> > -			     available_mmio_pref);
> > +	align = pci_resource_alignment(bridge, io_res);
> > +	if (!io_res->parent && align)
> > +		io.start = ALIGN(io.start, align);
> > +
> > +	align = pci_resource_alignment(bridge, mmio_res);
> > +	if (!mmio_res->parent && align)
> > +		mmio.start = ALIGN(mmio.start, align);
> > +
> > +	align = pci_resource_alignment(bridge, mmio_pref_res);
> > +	if (!mmio_pref_res->parent && align)
> > +		mmio_pref.start = ALIGN(mmio_pref.start, align);
> >  
> >  	/*
> > -	 * Calculate the total amount of extra resource space we can
> > -	 * pass to bridges below this one.  This is basically the
> > -	 * extra space reduced by the minimal required space for the
> > -	 * non-hotplug bridges.
> > +	 * Update the resources to fill as much remaining resource space in the
> > +	 * parent bridge as possible, while considering alignment.
> >  	 */
> > -	remaining_io = available_io;
> > -	remaining_mmio = available_mmio;
> > -	remaining_mmio_pref = available_mmio_pref;
> > +	extend_bridge_window(bridge, io_res, add_list, resource_size(&io));
> > +	extend_bridge_window(bridge, mmio_res, add_list, resource_size(&mmio));
> > +	extend_bridge_window(bridge, mmio_pref_res, add_list,
> > +		resource_size(&mmio_pref));
> 
> Please write it like it was originally (e.g align the second line to the
> opening bracket):
> 
> 	extend_bridge_window(bridge, mmio_pref_res, add_list, resource_size(&mmio_pref));
Can do. The style of handling line overflows is still throwing me.

> 
> >  
> >  	/*
> >  	 * Calculate how many hotplug bridges and normal bridges there
> > -	 * are on this bus.  We will distribute the additional available
> > +	 * are on this bus. We will distribute the additional available
> >  	 * resources between hotplug bridges.
> >  	 */
> >  	for_each_pci_bridge(dev, bus) {
> > @@ -1882,104 +1887,98 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
> >  			normal_bridges++;
> >  	}
> >  
> > +	/*
> > +	 * There is only one bridge on the bus so it gets all possible
> > +	 * resources which it can then distribute to the possible
> > +	 * hotplug bridges below.
> > +	 */
> > +	if (hotplug_bridges + normal_bridges == 1) {
> > +		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > +		if (dev->subordinate)
> > +			pci_bus_distribute_available_resources(dev->subordinate,
> > +				add_list, io, mmio, mmio_pref);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * Reduce the available resource space by what the
> > +	 * bridge and devices below it occupy.
> > +	 */
> >  	for_each_pci_bridge(dev, bus) {
> > -		const struct resource *res;
> > +		struct resource *res;
> > +		resource_size_t used_size;
> 
> Here order these in "reverse christmas tree" like:
> 
> 		resource_size_t used_size;
> 		struct resource *res;
That seems reasonable.

> 
> >  
> >  		if (dev->is_hotplug_bridge)
> >  			continue;
> >  
> > -		/*
> > -		 * Reduce the available resource space by what the
> > -		 * bridge and devices below it occupy.
> > -		 */
> >  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
> > -		if (!res->parent && available_io > resource_size(res))
> > -			remaining_io -= resource_size(res);
> > +		align = pci_resource_alignment(dev, res);
> > +		align = align ? ALIGN(io.start, align) - io.start : 0;
> > +		used_size = align + resource_size(res);
> > +		if (!res->parent && used_size <= resource_size(&io))
> > +			io.start += used_size;
> >  
> >  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
> > -		if (!res->parent && available_mmio > resource_size(res))
> > -			remaining_mmio -= resource_size(res);
> > +		align = pci_resource_alignment(dev, res);
> > +		align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
> > +		used_size = align + resource_size(res);
> > +		if (!res->parent && used_size <= resource_size(&mmio))
> > +			mmio.start += used_size;
> >  
> >  		res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
> > -		if (!res->parent && available_mmio_pref > resource_size(res))
> > -			remaining_mmio_pref -= resource_size(res);
> > +		align = pci_resource_alignment(dev, res);
> > +		align = align ? ALIGN(mmio_pref.start, align) -
> > +				mmio_pref.start : 0;
> > +		used_size = align + resource_size(res);
> > +		if (!res->parent && used_size <= resource_size(&mmio_pref))
> > +			mmio_pref.start += used_size;
> >  	}
> >  
> > -	/*
> > -	 * There is only one bridge on the bus so it gets all available
> > -	 * resources which it can then distribute to the possible hotplug
> > -	 * bridges below.
> > -	 */
> > -	if (hotplug_bridges + normal_bridges == 1) {
> > -		dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
> > -		if (dev->subordinate) {
> > -			pci_bus_distribute_available_resources(dev->subordinate,
> > -				add_list, available_io, available_mmio,
> > -				available_mmio_pref);
> > -		}
> > +	if (!hotplug_bridges)
> >  		return;
> > -	}
> >  
> >  	/*
> > -	 * Go over devices on this bus and distribute the remaining
> > -	 * resource space between hotplug bridges.
> > +	 * Distribute any remaining resources equally between
> > +	 * the hotplug-capable downstream ports.
> >  	 */
> > -	for_each_pci_bridge(dev, bus) {
> > -		resource_size_t align, io, mmio, mmio_pref;
> > -		struct pci_bus *b;
> > +	io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
> > +	mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
> > +	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> > +		hotplug_bridges);
> 
> Here also write it like:
> 
> 	mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
> 				    hotplug_bridges);
Okay.

> 
> >  
> > -		b = dev->subordinate;
> > -		if (!b || !dev->is_hotplug_bridge)
> > +	for_each_pci_bridge(dev, bus) {
> > +		if (!dev->subordinate || !dev->is_hotplug_bridge)
> >  			continue;
> >  
> > -		/*
> > -		 * Distribute available extra resources equally between
> > -		 * hotplug-capable downstream ports taking alignment into
> > -		 * account.
> > -		 *
> > -		 * Here hotplug_bridges is always != 0.
> > -		 */
> > -		align = pci_resource_alignment(bridge, io_res);
> > -		io = div64_ul(available_io, hotplug_bridges);
> > -		io = min(ALIGN(io, align), remaining_io);
> > -		remaining_io -= io;
> > -
> > -		align = pci_resource_alignment(bridge, mmio_res);
> > -		mmio = div64_ul(available_mmio, hotplug_bridges);
> > -		mmio = min(ALIGN(mmio, align), remaining_mmio);
> > -		remaining_mmio -= mmio;
> > +		io.end = io.start + io_per_hp - 1;
> > +		mmio.end = mmio.start + mmio_per_hp - 1;
> > +		mmio_pref.end = mmio_pref.start + mmio_pref_per_hp - 1;
> >  
> > -		align = pci_resource_alignment(bridge, mmio_pref_res);
> > -		mmio_pref = div64_ul(available_mmio_pref, hotplug_bridges);
> > -		mmio_pref = min(ALIGN(mmio_pref, align), remaining_mmio_pref);
> > -		remaining_mmio_pref -= mmio_pref;
> > +		pci_bus_distribute_available_resources(dev->subordinate,
> > +			add_list, io, mmio, mmio_pref);
> 
> Ditto.
Okay.

> 
> >  
> > -		pci_bus_distribute_available_resources(b, add_list, io, mmio,
> > -						       mmio_pref);
> > +		io.start = io.end + 1;
> > +		mmio.start = mmio.end + 1;
> > +		mmio_pref.start = mmio_pref.end + 1;
> >  	}
> >  }
> >  
> >  static void pci_bridge_distribute_available_resources(struct pci_dev *bridge,
> >  						     struct list_head *add_list)
> >  {
> > -	resource_size_t available_io, available_mmio, available_mmio_pref;
> > -	const struct resource *res;
> > +	struct resource io_res, mmio_res, mmio_pref_res;
> >  
> >  	if (!bridge->is_hotplug_bridge)
> >  		return;
> >  
> > +	io_res = bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> > +	mmio_res = bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> > +	mmio_pref_res = bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> > +
> >  	/* Take the initial extra resources from the hotplug port */
> > -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
> > -	available_io = resource_size(res);
> > -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 1];
> > -	available_mmio = resource_size(res);
> > -	res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
> > -	available_mmio_pref = resource_size(res);
> >  
> >  	pci_bus_distribute_available_resources(bridge->subordinate,
> > -					       add_list, available_io,
> > -					       available_mmio,
> > -					       available_mmio_pref);
> > +		add_list, io_res, mmio_res, mmio_pref_res);
> >  }
> >  
> >  void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources
  2019-06-19 13:40     ` Nicholas Johnson
@ 2019-06-19 21:53       ` Bjorn Helgaas
  0 siblings, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2019-06-19 21:53 UTC (permalink / raw)
  To: Nicholas Johnson; +Cc: linux-kernel, linux-pci, mika.westerberg, corbet

On Wed, Jun 19, 2019 at 01:40:50PM +0000, Nicholas Johnson wrote:
> At least I have found out that git send-email uses terrible 
> encoding and adds the patch as an attachment

This must be configurable somehow because many people use git
send-email to send plain-text patches.

> On Sat, Jun 15, 2019 at 02:56:36PM -0500, Bjorn Helgaas wrote:
> > On Wed, May 22, 2019 at 02:30:44PM +0000, Nicholas Johnson wrote:

> > > +	 * The alignment of this bridge is yet to be considered, hence it must
> > > +	 * be done now before extending its bridge window. A single bridge
> > > +	 * might not be able to occupy the whole parent region if the alignment
> > > +	 * differs - for example, an external GPU at the end of a Thunderbolt
> > > +	 * daisy chain.
> > 
> > The example seems needlessly specific.  There isn't anything GPU- or
> > Thunderbolt-specific about this, is there?
> > 
> > Bridge windows can be aligned to any multiple of 1MB.  But a device
> > BAR must be aligned on its size, so any BAR larger than 1MB should be
> > able to cause this, e.g.,
> > 
> >   [mem 0x100000-0x3fffff] (bridge A 3MB window)
> >     [mem 0x200000-0x3fffff] (bridge B 2MB window)
> >       [mem 0x200000-0x3fffff] (device 2MB BAR)
> > 
> > Bridge B *could* occupy the the entire 3MB parent region, but it
> > doesn't need to.  But you say it "might not be *able* to", so maybe
> > you're thinking of something different?
>
> Under some circumstances it may be possible to occupy the entire region, 
> but not always. If the start address of the entire parent region is not 
> aligned to the boundary required by the child then the start address of 
> the child needs to be bumped up to the next boundary, leaving some 
> unused space. In Mika's take on this, he suggested that I just remove 
> this comment. I agreed. Does this solve the issue in your mind?

"The alignment of this bridge ..." sentence might be useful.  But I
don't think the "A single bridge ..." sentence makes sense.  Even if
the parent region isn't sufficiently aligned for descendents of the
child, the child's window *could* start at the parent's start address.
Then some space routed to the child's secondary would be unused.

Maybe when you said the bridge "might not be able to occupy" you meant
that not all the space would be usable by descendants of the child.  I
certainly agree with that.  I read it as "the child's window might not
be able to consume the whole parent window", which is not true.

> > > -	 * are on this bus.  We will distribute the additional available
> > > +	 * are on this bus. We will distribute the additional available
> > 
> > This whitespace change is pointless and distracting.
>
> Okay. I will read this as "remove it".

Yes, please.  We always try to minimize the size of a patch, so it's
worthwhile to read the patch before sending it.

> > Moving this "single bridge" case up makes sense, and I think it could
> > be done by a separate patch preceding this one.  Mika, I remember some
> > discussion about this case, but I can't remember if there's some
> > reason you didn't do this initially.
> > 
> > The current code is:
> > 
> >   for_each_pci_bridge(dev, bus)
> >     # compute hotplug_bridges, normal_bridges
> > 
> >   for_each_pci_bridge(dev, bus)
> >     # compute remaining_io, etc
> > 
> >   if (hotplug_bridges + normal_bridges == 1)
> >     # handle single bridge case
> > 
> >   for_each_pci_bridge(dev, bus)
> >     # use remaining_io, etc here
> > 
> > AFAICT the single bridge case has no dependency on the remaining_io
> > computation.
>
> If you want another patch then please explicitly request it with how you 
> want it to be done.

Yes, please make it a separate patch.  The advantages are:

  - each patch becomes smaller
  - each patch is easier to review by itself
  - if the series causes a problem, bisection gives a more specific
    result
  - if we need to revert a patch, we may be able to keep the other
    patches

> > > +	if (!hotplug_bridges)
> > >  		return;
> > 
> > I like the addition of this early return when there are no hotplug
> > bridges.  The following loop is a no-op if there are no hotplug
> > bridges, so it doesn't *fix* anything, but it does make it more
> > obvious that we don't even have to bother with the loop at all, and
> > it makes the "Here hotplug_bridges is always != 0" comment
> > unnecessary.
>
> It fixes the division by zero on the next line of code, but I love to 
> shoot two birds with one stone. Moving the div64_ul calls out of the 
> loop is the price for the simplification of the loop.
> 
> > I think this could be done in a separate patch before this one, too.
> > Anything we can do to simplify these patches is a win because the code
> > is so complicated.
>
> Again, please provide guidance on what exactly you wish to be separated, 
> and whether the patch should be before or after the other preliminary 
> patch to move the loop up by one.
> 
> My initial interpretation is that you want me to add the following in 
> Mika's code before his equivalent loop as a two line patch:
> 
> if (!hotplug_bridges)
> 	return;
> 
> Should I take out the following comment at the same time?
> "Here hotplug_bridges is always != 0."

Yes, please make this a separate patch that adds the
"if (!hotplug_bridges)" and removes the comment.  It doesn't really
matter whether this is before or after the other preliminary patch,
since they are unrelated.

Bjorn

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-06-19 21:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190522222928.2964-1-nicholas.johnson-opensource@outlook.com.au>
2019-05-22 14:30 ` [PATCH v6 1/4] PCI: Consider alignment of hot-added bridges when distributing available resources Nicholas Johnson
2019-06-15 19:56   ` Bjorn Helgaas
2019-06-17  9:21     ` mika.westerberg
2019-06-19 13:40     ` Nicholas Johnson
2019-06-19 21:53       ` Bjorn Helgaas
2019-06-17  9:35   ` mika.westerberg
2019-06-17 18:22     ` Bjorn Helgaas
2019-06-19 13:47     ` Nicholas Johnson
2019-05-22 14:30 ` [PATCH v6 2/4] PCI: Modify extend_bridge_window() to set resource size directly Nicholas Johnson
2019-06-15 19:57   ` Bjorn Helgaas
2019-05-22 14:31 ` [PATCH v6 3/4] PCI: Fix bug resulting in double hpmemsize being assigned to MMIO window Nicholas Johnson
2019-05-22 14:31 ` [PATCH v6 4/4] PCI: Add pci=hpmemprefsize parameter to set MMIO_PREF size independently Nicholas Johnson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).