All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] PCI: Rework pci_scan_slot() and isolated PCI functions
@ 2022-04-19 10:27 Niklas Schnelle
  2022-04-19 10:28 ` [PATCH v3 1/4] PCI: Clean up pci_scan_slot() Niklas Schnelle
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Niklas Schnelle @ 2022-04-19 10:27 UTC (permalink / raw)
  To: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel
  Cc: linux-kernel, virtualization, linux-s390, linux-pci

Hi Bjorn, Hi Jan,

In an earlier version[0], I sought to apply the existing jailhouse special case
for isolated PCI functions to s390. As Bjorn noted in[1] there appears to be
some potential for cleaning things up and removing duplication though.

This series attempts to do this cleanup (Patches 1 and 2) followed by enabling
isolated PCI functions for s390 (Patches 3 and 4). If need be I can of course
split the cleanup off but for now I kept it as one as that's what I have
been testing.

Thanks,
Niklas

Changes v2 -> v3:
- Removed now unused nr_devs variable (kernel test robot)

[0] https://lore.kernel.org/linux-pci/20220404095346.2324666-1-schnelle@linux.ibm.com/
[1] https://lore.kernel.org/linux-pci/20220408224514.GA353445@bhelgaas/
 
Niklas Schnelle (4):
  PCI: Clean up pci_scan_slot()
  PCI: Move jailhouse's isolated function handling to pci_scan_slot()
  PCI: Extend isolated function probing to s390
  s390/pci: allow zPCI zbus without a function zero

 arch/s390/pci/pci_bus.c    | 82 ++++++++++----------------------------
 drivers/pci/probe.c        | 63 +++++++++++------------------
 include/linux/hypervisor.h |  9 +++++
 3 files changed, 52 insertions(+), 102 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
  2022-04-19 10:27 [PATCH v3 0/4] PCI: Rework pci_scan_slot() and isolated PCI functions Niklas Schnelle
@ 2022-04-19 10:28 ` Niklas Schnelle
  2022-04-21  2:14     ` Bjorn Helgaas
  2022-04-19 10:28 ` [PATCH v3 2/4] PCI: Move jailhouse's isolated function handling to pci_scan_slot() Niklas Schnelle
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Niklas Schnelle @ 2022-04-19 10:28 UTC (permalink / raw)
  To: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel
  Cc: linux-kernel, virtualization, linux-s390, linux-pci

While determining the next PCI function is factored out of
pci_scan_slot() into next_fn() the former still handles the first
function as a special case duplicating the code from the scan loop and
splitting the condition that the first function exits from it being
multifunction which is tested in next_fn().

Furthermore the non ARI branch of next_fn() mixes the case that
multifunction devices may have non-contiguous function ranges and dev
may thus be NULL with the multifunction requirement. It also signals
that no further functions need to be scanned by returning 0 which is
a valid function number.

Improve upon this by moving all conditions for having to scan for more
functions into next_fn() and make them obvious and commented.

By changing next_fn() to return -ENODEV instead of 0 when there is no
next function we can then handle the initial function inside the loop
and deduplicate the shared handling.

No functional change is intended.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 17a969942d37..389aa1f9cb2c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
 }
 EXPORT_SYMBOL(pci_scan_single_device);
 
-static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
-			    unsigned int fn)
+static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
 {
 	int pos;
 	u16 cap = 0;
 	unsigned int next_fn;
 
-	if (pci_ari_enabled(bus)) {
-		if (!dev)
-			return 0;
+	if (dev && pci_ari_enabled(bus)) {
 		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI);
 		if (!pos)
-			return 0;
+			return -ENODEV;
 
 		pci_read_config_word(dev, pos + PCI_ARI_CAP, &cap);
 		next_fn = PCI_ARI_CAP_NFN(cap);
 		if (next_fn <= fn)
-			return 0;	/* protect against malformed list */
+			return -ENODEV;	/* protect against malformed list */
 
 		return next_fn;
 	}
 
-	/* dev may be NULL for non-contiguous multifunction devices */
-	if (!dev || dev->multifunction)
-		return (fn + 1) % 8;
-
-	return 0;
+	/* only multifunction devices may have more functions */
+	if (dev && !dev->multifunction)
+		return -ENODEV;
+	/*
+	 * A function 0 is required but multifunction devices may
+	 * be non-contiguous so dev can be NULL otherwise.
+	 */
+	if (!fn && !dev)
+		return -ENODEV;
+	return (fn <= 6) ? fn + 1 : -ENODEV;
 }
 
 static int only_one_child(struct pci_bus *bus)
@@ -2643,24 +2645,19 @@ static int only_one_child(struct pci_bus *bus)
  */
 int pci_scan_slot(struct pci_bus *bus, int devfn)
 {
-	unsigned int fn, nr = 0;
-	struct pci_dev *dev;
+	int fn, nr = 0;
+	struct pci_dev *dev = NULL;
 
 	if (only_one_child(bus) && (devfn > 0))
 		return 0; /* Already scanned the entire slot */
 
-	dev = pci_scan_single_device(bus, devfn);
-	if (!dev)
-		return 0;
-	if (!pci_dev_is_added(dev))
-		nr++;
-
-	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
+	for (fn = 0; fn >= 0; fn = next_fn(bus, dev, fn)) {
 		dev = pci_scan_single_device(bus, devfn + fn);
 		if (dev) {
 			if (!pci_dev_is_added(dev))
 				nr++;
-			dev->multifunction = 1;
+			if (nr > 1)
+				dev->multifunction = 1;
 		}
 	}
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 2/4] PCI: Move jailhouse's isolated function handling to pci_scan_slot()
  2022-04-19 10:27 [PATCH v3 0/4] PCI: Rework pci_scan_slot() and isolated PCI functions Niklas Schnelle
  2022-04-19 10:28 ` [PATCH v3 1/4] PCI: Clean up pci_scan_slot() Niklas Schnelle
@ 2022-04-19 10:28 ` Niklas Schnelle
  2022-04-19 10:28 ` [PATCH v3 3/4] PCI: Extend isolated function probing to s390 Niklas Schnelle
  2022-04-19 10:28 ` [PATCH v3 4/4] s390/pci: allow zPCI zbus without a function zero Niklas Schnelle
  3 siblings, 0 replies; 12+ messages in thread
From: Niklas Schnelle @ 2022-04-19 10:28 UTC (permalink / raw)
  To: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel
  Cc: linux-kernel, virtualization, linux-s390, linux-pci

The special case of the jailhouse hypervisor passing through individual
PCI functions handles scanning for PCI functions even if function 0 does
not exist. Currently this is done with an extra loop duplicating the one
in pci_scan_slot(). By incorporating the check for jailhouse_paravirt()
into next_fn() we can instead do this as part of the normal
pci_scan_slot(). The only functional change is that we now call
pcie_aspm_init_link_state() for these functions but this already
happened if function 0 was passed through and should not be a problem.

Link: https://lore.kernel.org/linux-pci/20220408224514.GA353445@bhelgaas/
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 drivers/pci/probe.c | 28 +++++++---------------------
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 389aa1f9cb2c..a1e8f1e14c3d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2602,10 +2602,11 @@ static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
 	if (dev && !dev->multifunction)
 		return -ENODEV;
 	/*
-	 * A function 0 is required but multifunction devices may
-	 * be non-contiguous so dev can be NULL otherwise.
+	 * Usually a function 0 is required but the jailhouse hypervisor may
+	 * pass individual functions. For non-contiguous multifunction devices
+	 * some functions may also be missing.
 	 */
-	if (!fn && !dev)
+	if (!fn && !dev && !jailhouse_paravirt())
 		return -ENODEV;
 	return (fn <= 6) ? fn + 1 : -ENODEV;
 }
@@ -2855,29 +2856,14 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 {
 	unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
 	unsigned int start = bus->busn_res.start;
-	unsigned int devfn, fn, cmax, max = start;
+	unsigned int devfn, cmax, max = start;
 	struct pci_dev *dev;
-	int nr_devs;
 
 	dev_dbg(&bus->dev, "scanning bus\n");
 
 	/* Go find them, Rover! */
-	for (devfn = 0; devfn < 256; devfn += 8) {
-		nr_devs = pci_scan_slot(bus, devfn);
-
-		/*
-		 * The Jailhouse hypervisor may pass individual functions of a
-		 * multi-function device to a guest without passing function 0.
-		 * Look for them as well.
-		 */
-		if (jailhouse_paravirt() && nr_devs == 0) {
-			for (fn = 1; fn < 8; fn++) {
-				dev = pci_scan_single_device(bus, devfn + fn);
-				if (dev)
-					dev->multifunction = 1;
-			}
-		}
-	}
+	for (devfn = 0; devfn < 256; devfn += 8)
+		pci_scan_slot(bus, devfn);
 
 	/* Reserve buses for SR-IOV capability */
 	used_buses = pci_iov_bus_range(bus);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 3/4] PCI: Extend isolated function probing to s390
  2022-04-19 10:27 [PATCH v3 0/4] PCI: Rework pci_scan_slot() and isolated PCI functions Niklas Schnelle
  2022-04-19 10:28 ` [PATCH v3 1/4] PCI: Clean up pci_scan_slot() Niklas Schnelle
  2022-04-19 10:28 ` [PATCH v3 2/4] PCI: Move jailhouse's isolated function handling to pci_scan_slot() Niklas Schnelle
@ 2022-04-19 10:28 ` Niklas Schnelle
  2022-04-19 10:28 ` [PATCH v3 4/4] s390/pci: allow zPCI zbus without a function zero Niklas Schnelle
  3 siblings, 0 replies; 12+ messages in thread
From: Niklas Schnelle @ 2022-04-19 10:28 UTC (permalink / raw)
  To: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel
  Cc: linux-kernel, virtualization, linux-s390, linux-pci

Like the jailhouse hypervisor s390's PCI architecture allows passing
isolated PCI functions to an OS instance. As of now this is was not
utilized even with multi-function support as the s390 PCI code makes
sure that only virtual PCI busses including a function with devfn 0 are
presented to the PCI subsystem. A subsequent change will remove this
restriction.

Allow probing such functions by replacing the existing check for
jailhouse_paravirt() with a new hypervisor_isolated_pci_functions()
helper.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 drivers/pci/probe.c        | 8 ++++----
 include/linux/hypervisor.h | 9 +++++++++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a1e8f1e14c3d..09b46746d48e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2602,11 +2602,11 @@ static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
 	if (dev && !dev->multifunction)
 		return -ENODEV;
 	/*
-	 * Usually a function 0 is required but the jailhouse hypervisor may
-	 * pass individual functions. For non-contiguous multifunction devices
-	 * some functions may also be missing.
+	 * Usually a function 0 is required but some hypervisors may pass
+	 * individual functions. For non-contiguous multifunction devices some
+	 * functions may also be missing.
 	 */
-	if (!fn && !dev && !jailhouse_paravirt())
+	if (!fn && !dev && !hypervisor_isolated_pci_functions())
 		return -ENODEV;
 	return (fn <= 6) ? fn + 1 : -ENODEV;
 }
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index fc08b433c856..52abd459f9a3 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -32,4 +32,13 @@ static inline bool jailhouse_paravirt(void)
 
 #endif /* !CONFIG_X86 */
 
+static inline bool hypervisor_isolated_pci_functions(void)
+{
+	if (IS_ENABLED(CONFIG_S390))
+		return true;
+	else
+		return jailhouse_paravirt();
+}
+
+
 #endif /* __LINUX_HYPEVISOR_H */
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 4/4] s390/pci: allow zPCI zbus without a function zero
  2022-04-19 10:27 [PATCH v3 0/4] PCI: Rework pci_scan_slot() and isolated PCI functions Niklas Schnelle
                   ` (2 preceding siblings ...)
  2022-04-19 10:28 ` [PATCH v3 3/4] PCI: Extend isolated function probing to s390 Niklas Schnelle
@ 2022-04-19 10:28 ` Niklas Schnelle
  3 siblings, 0 replies; 12+ messages in thread
From: Niklas Schnelle @ 2022-04-19 10:28 UTC (permalink / raw)
  To: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel
  Cc: linux-kernel, virtualization, linux-s390, linux-pci

Currently the zPCI code block PCI bus creation and probing of a zPCI
zbus unless there is a PCI function with devfn 0. This is always the
case for the PCI functions with hidden RID but may keep PCI functions
from a multi-function PCI device with RID information invisible until
the function 0 becomes visible. Worse as a PCI bus is necessary to even
present a PCI hotplug slot even that remains invisible.

With the probing of these so called isolated PCI functions enabled for
s390 in common code this restriction is no longer necessary. On network
cards with multiple ports and a PF per port this also allows using each
port on its own while still providing the physical PCI topology
information in the devfn needed to associate VFs with their parent PF.

Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 arch/s390/pci/pci_bus.c | 82 ++++++++++-------------------------------
 1 file changed, 20 insertions(+), 62 deletions(-)

diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
index 5d77acbd1c87..6a8da1b742ae 100644
--- a/arch/s390/pci/pci_bus.c
+++ b/arch/s390/pci/pci_bus.c
@@ -145,9 +145,6 @@ int zpci_bus_scan_bus(struct zpci_bus *zbus)
 	struct zpci_dev *zdev;
 	int devfn, rc, ret = 0;
 
-	if (!zbus->function[0])
-		return 0;
-
 	for (devfn = 0; devfn < ZPCI_FUNCTIONS_PER_BUS; devfn++) {
 		zdev = zbus->function[devfn];
 		if (zdev && zdev->state == ZPCI_FN_STATE_CONFIGURED) {
@@ -184,26 +181,26 @@ void zpci_bus_scan_busses(void)
 
 /* zpci_bus_create_pci_bus - Create the PCI bus associated with this zbus
  * @zbus: the zbus holding the zdevices
- * @f0: function 0 of the bus
+ * @fr: PCI root function that will determine the bus's domain, and bus speeed
  * @ops: the pci operations
  *
- * Function zero is taken as a parameter as this is used to determine the
- * domain, multifunction property and maximum bus speed of the entire bus.
+ * The PCI function @fr determines the domain (its UID), multifunction property
+ * and maximum bus speed of the entire bus.
  *
  * Return: 0 on success, an error code otherwise
  */
-static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *f0, struct pci_ops *ops)
+static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, struct pci_ops *ops)
 {
 	struct pci_bus *bus;
 	int domain;
 
-	domain = zpci_alloc_domain((u16)f0->uid);
+	domain = zpci_alloc_domain((u16)fr->uid);
 	if (domain < 0)
 		return domain;
 
 	zbus->domain_nr = domain;
-	zbus->multifunction = f0->rid_available;
-	zbus->max_bus_speed = f0->max_bus_speed;
+	zbus->multifunction = fr->rid_available;
+	zbus->max_bus_speed = fr->max_bus_speed;
 
 	/*
 	 * Note that the zbus->resources are taken over and zbus->resources
@@ -303,47 +300,6 @@ void pcibios_bus_add_device(struct pci_dev *pdev)
 	}
 }
 
-/* zpci_bus_create_hotplug_slots - Add hotplug slot(s) for device added to bus
- * @zdev: the zPCI device that was newly added
- *
- * Add the hotplug slot(s) for the newly added PCI function. Normally this is
- * simply the slot for the function itself. If however we are adding the
- * function 0 on a zbus, it might be that we already registered functions on
- * that zbus but could not create their hotplug slots yet so add those now too.
- *
- * Return: 0 on success, an error code otherwise
- */
-static int zpci_bus_create_hotplug_slots(struct zpci_dev *zdev)
-{
-	struct zpci_bus *zbus = zdev->zbus;
-	int devfn, rc = 0;
-
-	rc = zpci_init_slot(zdev);
-	if (rc)
-		return rc;
-	zdev->has_hp_slot = 1;
-
-	if (zdev->devfn == 0 && zbus->multifunction) {
-		/* Now that function 0 is there we can finally create the
-		 * hotplug slots for those functions with devfn != 0 that have
-		 * been parked in zbus->function[] waiting for us to be able to
-		 * create the PCI bus.
-		 */
-		for  (devfn = 1; devfn < ZPCI_FUNCTIONS_PER_BUS; devfn++) {
-			zdev = zbus->function[devfn];
-			if (zdev && !zdev->has_hp_slot) {
-				rc = zpci_init_slot(zdev);
-				if (rc)
-					return rc;
-				zdev->has_hp_slot = 1;
-			}
-		}
-
-	}
-
-	return rc;
-}
-
 static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev)
 {
 	int rc = -EINVAL;
@@ -352,21 +308,19 @@ static int zpci_bus_add_device(struct zpci_bus *zbus, struct zpci_dev *zdev)
 		pr_err("devfn %04x is already assigned\n", zdev->devfn);
 		return rc;
 	}
+
 	zdev->zbus = zbus;
 	zbus->function[zdev->devfn] = zdev;
 	zpci_nb_devices++;
 
-	if (zbus->bus) {
-		if (zbus->multifunction && !zdev->rid_available) {
-			WARN_ONCE(1, "rid_available not set for multifunction\n");
-			goto error;
-		}
-
-		zpci_bus_create_hotplug_slots(zdev);
-	} else {
-		/* Hotplug slot will be created once function 0 appears */
-		zbus->multifunction = 1;
+	if (zbus->multifunction && !zdev->rid_available) {
+		WARN_ONCE(1, "rid_available not set for multifunction\n");
+		goto error;
 	}
+	rc = zpci_init_slot(zdev);
+	if (rc)
+		goto error;
+	zdev->has_hp_slot = 1;
 
 	return 0;
 
@@ -400,7 +354,11 @@ int zpci_bus_device_register(struct zpci_dev *zdev, struct pci_ops *ops)
 			return -ENOMEM;
 	}
 
-	if (zdev->devfn == 0) {
+	if (!zbus->bus) {
+		/* The UID of the first PCI function registered with a zpci_bus
+		 * is used as the domain number for that bus. Currently there
+		 * is exactly one zpci_bus per domain.
+		 */
 		rc = zpci_bus_create_pci_bus(zbus, zdev, ops);
 		if (rc)
 			goto error;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
  2022-04-19 10:28 ` [PATCH v3 1/4] PCI: Clean up pci_scan_slot() Niklas Schnelle
@ 2022-04-21  2:14     ` Bjorn Helgaas
  0 siblings, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2022-04-21  2:14 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel,
	linux-kernel, virtualization, linux-s390, linux-pci

Hi Niklas,

I'm sure this makes good sense, but I need a little more hand-holding.
Sorry this is long and rambling.

On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote:
> While determining the next PCI function is factored out of
> pci_scan_slot() into next_fn() the former still handles the first
> function as a special case duplicating the code from the scan loop and
> splitting the condition that the first function exits from it being
> multifunction which is tested in next_fn().
> 
> Furthermore the non ARI branch of next_fn() mixes the case that
> multifunction devices may have non-contiguous function ranges and dev
> may thus be NULL with the multifunction requirement. It also signals
> that no further functions need to be scanned by returning 0 which is
> a valid function number.
> 
> Improve upon this by moving all conditions for having to scan for more
> functions into next_fn() and make them obvious and commented.
> 
> By changing next_fn() to return -ENODEV instead of 0 when there is no
> next function we can then handle the initial function inside the loop
> and deduplicate the shared handling.
> 
> No functional change is intended.
> 
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
>  drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
>  1 file changed, 19 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 17a969942d37..389aa1f9cb2c 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
>  }
>  EXPORT_SYMBOL(pci_scan_single_device);
>  
> -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
> -			    unsigned int fn)
> +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
>  {
>  	int pos;
>  	u16 cap = 0;
>  	unsigned int next_fn;
>  
> -	if (pci_ari_enabled(bus)) {
> -		if (!dev)
> -			return 0;
> +	if (dev && pci_ari_enabled(bus)) {

I think this would be easier to verify if we kept the explicit error
return, e.g.,

  if (pci_ari_enabled(bus)) {
    if (!dev)
      return -ENODEV;
    pos = pci_find_ext_capability(...);

Otherwise we have to sort through the !dev cases below.  I guess
-ENODEV would come from either the "!fn && !dev" case or the "fn > 6"
case, but it's not obvious to me that those are equivalent to the
previous code.

>  		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI);
>  		if (!pos)
> -			return 0;
> +			return -ENODEV;
>  
>  		pci_read_config_word(dev, pos + PCI_ARI_CAP, &cap);
>  		next_fn = PCI_ARI_CAP_NFN(cap);
>  		if (next_fn <= fn)
> -			return 0;	/* protect against malformed list */
> +			return -ENODEV;	/* protect against malformed list */
>  
>  		return next_fn;
>  	}
>  
> -	/* dev may be NULL for non-contiguous multifunction devices */
> -	if (!dev || dev->multifunction)
> -		return (fn + 1) % 8;
> -
> -	return 0;
> +	/* only multifunction devices may have more functions */
> +	if (dev && !dev->multifunction)
> +		return -ENODEV;

I don't understand why the "!dev || dev->multifunction" test needs to
change.  Isn't that valid even in the hypervisor case?  IIUC, you want
to return success in some cases that currently return failure, so this
case that was already success should be fine as it was.

Is this because "(fn + 1) % 8" may be zero, which previously
terminated the loop, but now it doesn't because "fn == 0" is the
*first* execution of the loop?

If so, I wonder if we could avoid that case by adding:

  if (fn >= 7)
    return -ENODEV;

at the very beginning.  Maybe that would allow a more trivial patch
that just changed the error return from 0 to -ENODEV, i.e., leaving
all the logic in next_fn() unchanged?

I'm wondering if this could end up like:

    if (fn >= 7)
      return -ENODEV;

    if (pci_ari_enabled(bus)) {
      if (!dev)
	return -ENODEV;
      ...
      return next_fn;
    }

    if (!dev || dev->multifunction)
      return (fn + 1) % 8;

 +  if (hypervisor_isolated_pci_functions())
 +    return (fn + 1) % 8;

    return -ENODEV;

(The hypervisor part being added in a subsequent patch, and I'm not
sure exactly what logic you need there -- the point being that it's
just an additional success case.)

The "% 8" seems possibly superfluous then, since previously that
caused a zero return that terminated the loop.  If we're using -ENODEV
to terminate the loop, we probably don't care about the mod 8.

> +	/*
> +	 * A function 0 is required but multifunction devices may
> +	 * be non-contiguous so dev can be NULL otherwise.

I understood the original "dev may be NULL ..." comment, but I can't
quite parse this.  "dev can be NULL" for non-zero functions?  That's
basically what it said before, but it's not clear what "otherwise"
refers to.

> +	 */
> +	if (!fn && !dev)
> +		return -ENODEV;

This part isn't obvious to me yet, partly because of the "!fn && !dev"
construction.  The negatives make it hard to parse.

Since "fn" isn't a boolean or a pointer, I think "fn == 0" is easier
to read than "!fn".  I would test "dev" first since it logically
precedes "fn".

IIUC !dev means we haven't found a function at this device number yet.
So this:

  if (!dev && fn == 0)
    return -ENODEV;

means we called pci_scan_single_device(bus, devfn + 0) the first time
through the loop, and it didn't find a device so it returned NULL.

> +	return (fn <= 6) ? fn + 1 : -ENODEV;
>  }
>  
>  static int only_one_child(struct pci_bus *bus)
> @@ -2643,24 +2645,19 @@ static int only_one_child(struct pci_bus *bus)
>   */
>  int pci_scan_slot(struct pci_bus *bus, int devfn)
>  {
> -	unsigned int fn, nr = 0;
> -	struct pci_dev *dev;
> +	int fn, nr = 0;
> +	struct pci_dev *dev = NULL;
>  
>  	if (only_one_child(bus) && (devfn > 0))
>  		return 0; /* Already scanned the entire slot */
>  
> -	dev = pci_scan_single_device(bus, devfn);
> -	if (!dev)
> -		return 0;
> -	if (!pci_dev_is_added(dev))
> -		nr++;
> -
> -	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
> +	for (fn = 0; fn >= 0; fn = next_fn(bus, dev, fn)) {
>  		dev = pci_scan_single_device(bus, devfn + fn);

"devfn + fn" (in the existing, unchanged code) is a little bit weird.
In almost all cases, devfn is the result of "PCI_DEVFN(slot, 0)", so
we could make the interface:

  pci_scan_slot(struct pci_bus *bus, int dev)

where "dev" is 0-31.

The only exceptions are a couple hotplug drivers where the fn probably
is or should be 0, too, but I haven't verified that.

But this would be scope creep, so possibly something we could consider
in the future, but not for this series.

>  		if (dev) {
>  			if (!pci_dev_is_added(dev))
>  				nr++;
> -			dev->multifunction = 1;
> +			if (nr > 1)
> +				dev->multifunction = 1;
>  		}
>  	}
>  
> -- 
> 2.32.0
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
@ 2022-04-21  2:14     ` Bjorn Helgaas
  0 siblings, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2022-04-21  2:14 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: linux-s390, Matthew Rosato, Pierre Morel, Jan Kiszka,
	linux-kernel, virtualization, linux-pci, Bjorn Helgaas

Hi Niklas,

I'm sure this makes good sense, but I need a little more hand-holding.
Sorry this is long and rambling.

On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote:
> While determining the next PCI function is factored out of
> pci_scan_slot() into next_fn() the former still handles the first
> function as a special case duplicating the code from the scan loop and
> splitting the condition that the first function exits from it being
> multifunction which is tested in next_fn().
> 
> Furthermore the non ARI branch of next_fn() mixes the case that
> multifunction devices may have non-contiguous function ranges and dev
> may thus be NULL with the multifunction requirement. It also signals
> that no further functions need to be scanned by returning 0 which is
> a valid function number.
> 
> Improve upon this by moving all conditions for having to scan for more
> functions into next_fn() and make them obvious and commented.
> 
> By changing next_fn() to return -ENODEV instead of 0 when there is no
> next function we can then handle the initial function inside the loop
> and deduplicate the shared handling.
> 
> No functional change is intended.
> 
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
>  drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
>  1 file changed, 19 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 17a969942d37..389aa1f9cb2c 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
>  }
>  EXPORT_SYMBOL(pci_scan_single_device);
>  
> -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
> -			    unsigned int fn)
> +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
>  {
>  	int pos;
>  	u16 cap = 0;
>  	unsigned int next_fn;
>  
> -	if (pci_ari_enabled(bus)) {
> -		if (!dev)
> -			return 0;
> +	if (dev && pci_ari_enabled(bus)) {

I think this would be easier to verify if we kept the explicit error
return, e.g.,

  if (pci_ari_enabled(bus)) {
    if (!dev)
      return -ENODEV;
    pos = pci_find_ext_capability(...);

Otherwise we have to sort through the !dev cases below.  I guess
-ENODEV would come from either the "!fn && !dev" case or the "fn > 6"
case, but it's not obvious to me that those are equivalent to the
previous code.

>  		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI);
>  		if (!pos)
> -			return 0;
> +			return -ENODEV;
>  
>  		pci_read_config_word(dev, pos + PCI_ARI_CAP, &cap);
>  		next_fn = PCI_ARI_CAP_NFN(cap);
>  		if (next_fn <= fn)
> -			return 0;	/* protect against malformed list */
> +			return -ENODEV;	/* protect against malformed list */
>  
>  		return next_fn;
>  	}
>  
> -	/* dev may be NULL for non-contiguous multifunction devices */
> -	if (!dev || dev->multifunction)
> -		return (fn + 1) % 8;
> -
> -	return 0;
> +	/* only multifunction devices may have more functions */
> +	if (dev && !dev->multifunction)
> +		return -ENODEV;

I don't understand why the "!dev || dev->multifunction" test needs to
change.  Isn't that valid even in the hypervisor case?  IIUC, you want
to return success in some cases that currently return failure, so this
case that was already success should be fine as it was.

Is this because "(fn + 1) % 8" may be zero, which previously
terminated the loop, but now it doesn't because "fn == 0" is the
*first* execution of the loop?

If so, I wonder if we could avoid that case by adding:

  if (fn >= 7)
    return -ENODEV;

at the very beginning.  Maybe that would allow a more trivial patch
that just changed the error return from 0 to -ENODEV, i.e., leaving
all the logic in next_fn() unchanged?

I'm wondering if this could end up like:

    if (fn >= 7)
      return -ENODEV;

    if (pci_ari_enabled(bus)) {
      if (!dev)
	return -ENODEV;
      ...
      return next_fn;
    }

    if (!dev || dev->multifunction)
      return (fn + 1) % 8;

 +  if (hypervisor_isolated_pci_functions())
 +    return (fn + 1) % 8;

    return -ENODEV;

(The hypervisor part being added in a subsequent patch, and I'm not
sure exactly what logic you need there -- the point being that it's
just an additional success case.)

The "% 8" seems possibly superfluous then, since previously that
caused a zero return that terminated the loop.  If we're using -ENODEV
to terminate the loop, we probably don't care about the mod 8.

> +	/*
> +	 * A function 0 is required but multifunction devices may
> +	 * be non-contiguous so dev can be NULL otherwise.

I understood the original "dev may be NULL ..." comment, but I can't
quite parse this.  "dev can be NULL" for non-zero functions?  That's
basically what it said before, but it's not clear what "otherwise"
refers to.

> +	 */
> +	if (!fn && !dev)
> +		return -ENODEV;

This part isn't obvious to me yet, partly because of the "!fn && !dev"
construction.  The negatives make it hard to parse.

Since "fn" isn't a boolean or a pointer, I think "fn == 0" is easier
to read than "!fn".  I would test "dev" first since it logically
precedes "fn".

IIUC !dev means we haven't found a function at this device number yet.
So this:

  if (!dev && fn == 0)
    return -ENODEV;

means we called pci_scan_single_device(bus, devfn + 0) the first time
through the loop, and it didn't find a device so it returned NULL.

> +	return (fn <= 6) ? fn + 1 : -ENODEV;
>  }
>  
>  static int only_one_child(struct pci_bus *bus)
> @@ -2643,24 +2645,19 @@ static int only_one_child(struct pci_bus *bus)
>   */
>  int pci_scan_slot(struct pci_bus *bus, int devfn)
>  {
> -	unsigned int fn, nr = 0;
> -	struct pci_dev *dev;
> +	int fn, nr = 0;
> +	struct pci_dev *dev = NULL;
>  
>  	if (only_one_child(bus) && (devfn > 0))
>  		return 0; /* Already scanned the entire slot */
>  
> -	dev = pci_scan_single_device(bus, devfn);
> -	if (!dev)
> -		return 0;
> -	if (!pci_dev_is_added(dev))
> -		nr++;
> -
> -	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
> +	for (fn = 0; fn >= 0; fn = next_fn(bus, dev, fn)) {
>  		dev = pci_scan_single_device(bus, devfn + fn);

"devfn + fn" (in the existing, unchanged code) is a little bit weird.
In almost all cases, devfn is the result of "PCI_DEVFN(slot, 0)", so
we could make the interface:

  pci_scan_slot(struct pci_bus *bus, int dev)

where "dev" is 0-31.

The only exceptions are a couple hotplug drivers where the fn probably
is or should be 0, too, but I haven't verified that.

But this would be scope creep, so possibly something we could consider
in the future, but not for this series.

>  		if (dev) {
>  			if (!pci_dev_is_added(dev))
>  				nr++;
> -			dev->multifunction = 1;
> +			if (nr > 1)
> +				dev->multifunction = 1;
>  		}
>  	}
>  
> -- 
> 2.32.0
> 
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
  2022-04-21  2:14     ` Bjorn Helgaas
  (?)
@ 2022-04-21  9:27     ` Niklas Schnelle
  2022-04-21 11:14       ` Niklas Schnelle
  2022-04-21 17:09         ` Bjorn Helgaas
  -1 siblings, 2 replies; 12+ messages in thread
From: Niklas Schnelle @ 2022-04-21  9:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel,
	linux-kernel, virtualization, linux-s390, linux-pci

[-- Attachment #1: Type: text/plain, Size: 10874 bytes --]

On Wed, 2022-04-20 at 21:14 -0500, Bjorn Helgaas wrote:
> Hi Niklas,
> 
> I'm sure this makes good sense, but I need a little more hand-holding.
> Sorry this is long and rambling.
> 
> On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote:
> > While determining the next PCI function is factored out of
> > pci_scan_slot() into next_fn() the former still handles the first
> > function as a special case duplicating the code from the scan loop and
> > splitting the condition that the first function exits from it being
> > multifunction which is tested in next_fn().
> > 
> > Furthermore the non ARI branch of next_fn() mixes the case that
> > multifunction devices may have non-contiguous function ranges and dev
> > may thus be NULL with the multifunction requirement. It also signals
> > that no further functions need to be scanned by returning 0 which is
> > a valid function number.
> > 
> > Improve upon this by moving all conditions for having to scan for more
> > functions into next_fn() and make them obvious and commented.
> > 
> > By changing next_fn() to return -ENODEV instead of 0 when there is no
> > next function we can then handle the initial function inside the loop
> > and deduplicate the shared handling.
> > 
> > No functional change is intended.
> > 
> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > ---
> >  drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
> >  1 file changed, 19 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 17a969942d37..389aa1f9cb2c 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> >  }
> >  EXPORT_SYMBOL(pci_scan_single_device);
> >  
> > -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
> > -			    unsigned int fn)
> > +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
> >  {
> >  	int pos;
> >  	u16 cap = 0;
> >  	unsigned int next_fn;
> >  
> > -	if (pci_ari_enabled(bus)) {
> > -		if (!dev)
> > -			return 0;
> > +	if (dev && pci_ari_enabled(bus)) {
> 
> I think this would be easier to verify if we kept the explicit error
> return, e.g.,
> 
>   if (pci_ari_enabled(bus)) {
>     if (!dev)
>       return -ENODEV;
>     pos = pci_find_ext_capability(...);
> 
> Otherwise we have to sort through the !dev cases below.  I guess
> -ENODEV would come from either the "!fn && !dev" case or the "fn > 6"
> case, but it's not obvious to me that those are equivalent to the
> previous code.

We could keep this the same for this patch but I think for jailhouse
(patch 2) we need the "!dev" case not to fail here such that we can
handle the missing function 0 below even if ARI is enabled. For s390
this doesn't currently matter because pci_ari_enabled(bus) is always
false but I assumed that this isn't necessarily so for jailhouse. I
sent a follow up mail on a slight behavior change I can think of for
this case for v2 but forgot to send it also for v3. Quoted below:

"This part here theoretically changes the behavior slightly. If the ARI
information is wrong/lands us in a "hole" we may look for more
functions via the non-ARI path. Not sure if that is relevant though as
in the worst case we might find functions that we otherwise wouldn't
have seen. Seems rather obsure to me but I might be wrong, we currently
don't see the ARI capability in Linux on IBM Z so I have less
experience with this. I did of course boot test on my x86_64
workstation."

> 
> >  		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI);
> >  		if (!pos)
> > -			return 0;
> > +			return -ENODEV;
> >  
> >  		pci_read_config_word(dev, pos + PCI_ARI_CAP, &cap);
> >  		next_fn = PCI_ARI_CAP_NFN(cap);
> >  		if (next_fn <= fn)
> > -			return 0;	/* protect against malformed list */
> > +			return -ENODEV;	/* protect against malformed list */
> >  
> >  		return next_fn;
> >  	}
> >  
> > -	/* dev may be NULL for non-contiguous multifunction devices */
> > -	if (!dev || dev->multifunction)
> > -		return (fn + 1) % 8;
> > -
> > -	return 0;
> > +	/* only multifunction devices may have more functions */
> > +	if (dev && !dev->multifunction)
> > +		return -ENODEV;
> 
> I don't understand why the "!dev || dev->multifunction" test needs to
> change.  Isn't that valid even in the hypervisor case?  IIUC, you want
> to return success in some cases that currently return failure, so this
> case that was already success should be fine as it was.

This isn't a change to the test. It's the negation of the logical
condition *and* a switch of the branches i.e. keeps the overall
behavior exactly the same. The equivalence is !(!A || B) == (A && !B).
There are two reasons I did this.

1. I find (!dev || dev->multifunction) to be much harder to grasp than
(dev && !dev->multifunction).

2. The whole next_fn() in my opinion becomes easier to read if it bails
for all bad cases early and the "this is the next fn" is the final
return if we didn't bail. This becomes even more true as another
condition is added in patch 2.

> 
> Is this because "(fn + 1) % 8" may be zero, which previously
> terminated the loop, but now it doesn't because "fn == 0" is the
> *first* execution of the loop?

Yes with function 0 handled in the loop we can't use 0 as the
termination indication. Also I find it generally weird to use a wrap
around for this.

> 
> If so, I wonder if we could avoid that case by adding:
> 
>   if (fn >= 7)
>     return -ENODEV;
> 
> at the very beginning.  Maybe that would allow a more trivial patch
> that just changed the error return from 0 to -ENODEV, i.e., leaving
> all the logic in next_fn() unchanged?

I think this is equivalent to the ternary at the return. Both return
-ENODEV for fn >= 7. I do like your idea better though as it keeps with
the scheme of my point 2 above and ternaries are ever so slightly
harder to read.

> 
> I'm wondering if this could end up like:
> 
>     if (fn >= 7)
>       return -ENODEV;
> 
>     if (pci_ari_enabled(bus)) {
>       if (!dev)
> 	return -ENODEV;
>       ...
>       return next_fn;
>     }
> 
>     if (!dev || dev->multifunction)
>       return (fn + 1) % 8;
> 
>  +  if (hypervisor_isolated_pci_functions())
>  +    return (fn + 1) % 8;
> 
>     return -ENODEV;
> 
> (The hypervisor part being added in a subsequent patch, and I'm not
> sure exactly what logic you need there -- the point being that it's
> just an additional success case.)

Yes pretty much only that by negating the success case and switching
the branches we end up with a list of fail/bail checks and a single
success return even with the hyperisor check added. Also not sure if
the "fn >= 7" check should rather go after the ARI path to keep them
separate doesn't really matter of course.

> 
> The "% 8" seems possibly superfluous then, since previously that
> caused a zero return that terminated the loop.  If we're using -ENODEV
> to terminate the loop, we probably don't care about the mod 8.

Yes

> 
> > +	/*
> > +	 * A function 0 is required but multifunction devices may
> > +	 * be non-contiguous so dev can be NULL otherwise.
> 
> I understood the original "dev may be NULL ..." comment, but I can't
> quite parse this.  "dev can be NULL" for non-zero functions?  That's
> basically what it said before, but it's not clear what "otherwise"
> refers to.

I agree this can probably be improved. I'm trying to say that dev can
be NULL if it is not function 0 which must exist. Maybe:

"dev may be NULL as multifunction devices may be non-contiguous but a
function 0 is required"

> 
> > +	 */
> > +	if (!fn && !dev)
> > +		return -ENODEV;
> 
> This part isn't obvious to me yet, partly because of the "!fn && !dev"
> construction.  The negatives make it hard to parse.
> 
> Since "fn" isn't a boolean or a pointer, I think "fn == 0" is easier
> to read than "!fn".  I would test "dev" first since it logically
> precedes "fn".

I agree about the "fn == 0", I only used "!fn" because I remember
getting checkpatch warnings for "foo == 0" in the past. I'll change to
fn == 0. As for the order see below.

> 
> IIUC !dev means we haven't found a function at this device number yet.
> So this:
> 
>   if (!dev && fn == 0)
>     return -ENODEV;
> 
> means we called pci_scan_single_device(bus, devfn + 0) the first time
> through the loop, and it didn't find a device so it returned NULL.

Yes. This is "dev may be NULL unless we're looking at function 0". The
fn came before dev because I wrote it as "function 0 must not be NULL"
but it could also be "dev is NULL and we're looking at function 0",
I have no clear preference.

This is also the case that gets changed by patch 2 to become:

"function 0 must not be NULL unless we have isolated PCI functions"

or with the order switched:

"dev is NULL and we're looking at function 0 and don't have isolated
PCI functions"

> 
> > +	return (fn <= 6) ? fn + 1 : -ENODEV;
> >  }
> >  
> >  static int only_one_child(struct pci_bus *bus)
> > @@ -2643,24 +2645,19 @@ static int only_one_child(struct pci_bus *bus)
> >   */
> >  int pci_scan_slot(struct pci_bus *bus, int devfn)
> >  {
> > -	unsigned int fn, nr = 0;
> > -	struct pci_dev *dev;
> > +	int fn, nr = 0;
> > +	struct pci_dev *dev = NULL;
> >  
> >  	if (only_one_child(bus) && (devfn > 0))
> >  		return 0; /* Already scanned the entire slot */
> >  
> > -	dev = pci_scan_single_device(bus, devfn);
> > -	if (!dev)
> > -		return 0;
> > -	if (!pci_dev_is_added(dev))
> > -		nr++;
> > -
> > -	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
> > +	for (fn = 0; fn >= 0; fn = next_fn(bus, dev, fn)) {
> >  		dev = pci_scan_single_device(bus, devfn + fn);
> 
> "devfn + fn" (in the existing, unchanged code) is a little bit weird.
> In almost all cases, devfn is the result of "PCI_DEVFN(slot, 0)", so
> we could make the interface:
> 
>   pci_scan_slot(struct pci_bus *bus, int dev)
> 
> where "dev" is 0-31.
> 
> The only exceptions are a couple hotplug drivers where the fn probably
> is or should be 0, too, but I haven't verified that.
> 
> But this would be scope creep, so possibly something we could consider
> in the future, but not for this series.

Hmm, I see your point. It makes little sense to have a devfn that isn't
from PCI_DEVFN(slot, 0) and not use pci_scan_single_device() instead.

> 
> >  		if (dev) {
> >  			if (!pci_dev_is_added(dev))
> >  				nr++;
> > -			dev->multifunction = 1;
> > +			if (nr > 1)
> > +				dev->multifunction = 1;
> >  		}
> >  	}
> >  
> > -- 
> > 2.32.0
> > 


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
  2022-04-21  9:27     ` Niklas Schnelle
@ 2022-04-21 11:14       ` Niklas Schnelle
  2022-04-21 17:09         ` Bjorn Helgaas
  1 sibling, 0 replies; 12+ messages in thread
From: Niklas Schnelle @ 2022-04-21 11:14 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel,
	linux-kernel, virtualization, linux-s390, linux-pci

[-- Attachment #1: Type: text/plain, Size: 1253 bytes --]

> > > +	return (fn <= 6) ? fn + 1 : -ENODEV;
> > >  }
> > >  
> > >  static int only_one_child(struct pci_bus *bus)
> > > @@ -2643,24 +2645,19 @@ static int only_one_child(struct pci_bus *bus)
> > >   */
> > >  int pci_scan_slot(struct pci_bus *bus, int devfn)
> > >  {
> > > -	unsigned int fn, nr = 0;
> > > -	struct pci_dev *dev;
> > > +	int fn, nr = 0;
> > > +	struct pci_dev *dev = NULL;
> > >  
> > >  	if (only_one_child(bus) && (devfn > 0))
> > >  		return 0; /* Already scanned the entire slot */
> > >  
> > > -	dev = pci_scan_single_device(bus, devfn);
> > > -	if (!dev)
> > > -		return 0;
> > > 

As it might not have been clear in my previous mail. The above !dev
test just for the "devfn + 0" case is equivalent to the new:

if (!dev && fn == 0)
	return -ENODEV;

As fn doesn't wrap around anymore fn == 0 is true only for the first
iteration. Both in the existing and in the changed code the first
pci_scan_single_device() happens before the first next_fn() call though
with the new code that then breaks the loop instead of the above
return. This makes sense in my mind because if the first function
doesn't exist there are no more functions except if we have isolated
PCI functions due to a hypervisor.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
  2022-04-21  9:27     ` Niklas Schnelle
@ 2022-04-21 17:09         ` Bjorn Helgaas
  2022-04-21 17:09         ` Bjorn Helgaas
  1 sibling, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2022-04-21 17:09 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel,
	linux-kernel, virtualization, linux-s390, linux-pci

On Thu, Apr 21, 2022 at 11:27:42AM +0200, Niklas Schnelle wrote:
> On Wed, 2022-04-20 at 21:14 -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote:
> > > While determining the next PCI function is factored out of
> > > pci_scan_slot() into next_fn() the former still handles the first
> > > function as a special case duplicating the code from the scan loop and
> > > splitting the condition that the first function exits from it being
> > > multifunction which is tested in next_fn().
> > > 
> > > Furthermore the non ARI branch of next_fn() mixes the case that
> > > multifunction devices may have non-contiguous function ranges and dev
> > > may thus be NULL with the multifunction requirement. It also signals
> > > that no further functions need to be scanned by returning 0 which is
> > > a valid function number.
> > > 
> > > Improve upon this by moving all conditions for having to scan for more
> > > functions into next_fn() and make them obvious and commented.
> > > 
> > > By changing next_fn() to return -ENODEV instead of 0 when there is no
> > > next function we can then handle the initial function inside the loop
> > > and deduplicate the shared handling.
> > > 
> > > No functional change is intended.
> > > 
> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > > ---
> > >  drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
> > >  1 file changed, 19 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > index 17a969942d37..389aa1f9cb2c 100644
> > > --- a/drivers/pci/probe.c
> > > +++ b/drivers/pci/probe.c
> > > @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> > >  }
> > >  EXPORT_SYMBOL(pci_scan_single_device);
> > >  
> > > -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
> > > -			    unsigned int fn)
> > > +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
> > >  {
> > >  	int pos;
> > >  	u16 cap = 0;
> > >  	unsigned int next_fn;
> > >  
> > > -	if (pci_ari_enabled(bus)) {
> > > -		if (!dev)
> > > -			return 0;
> > > +	if (dev && pci_ari_enabled(bus)) {
> > 
> > I think this would be easier to verify if we kept the explicit error
> > return, e.g.,
> > 
> >   if (pci_ari_enabled(bus)) {
> >     if (!dev)
> >       return -ENODEV;
> >     pos = pci_find_ext_capability(...);
> > 
> > Otherwise we have to sort through the !dev cases below.  I guess
> > -ENODEV would come from either the "!fn && !dev" case or the "fn > 6"
> > case, but it's not obvious to me that those are equivalent to the
> > previous code.
> 
> We could keep this the same for this patch but I think for jailhouse
> (patch 2) we need the "!dev" case not to fail here such that we can
> handle the missing function 0 below even if ARI is enabled. For s390
> this doesn't currently matter because pci_ari_enabled(bus) is always
> false but I assumed that this isn't necessarily so for jailhouse. I
> sent a follow up mail on a slight behavior change I can think of for
> this case for v2 but forgot to send it also for v3. Quoted below:

I think it would be good to make the first patch change as little as
possible to make it easier to analyze, then possibly test for
hypervisor when changing this behavior.

> > > -	/* dev may be NULL for non-contiguous multifunction devices */
> > > -	if (!dev || dev->multifunction)
> > > -		return (fn + 1) % 8;
> > > -
> > > -	return 0;
> > > +	/* only multifunction devices may have more functions */
> > > +	if (dev && !dev->multifunction)
> > > +		return -ENODEV;
> > 
> > I don't understand why the "!dev || dev->multifunction" test needs to
> > change.  Isn't that valid even in the hypervisor case?  IIUC, you want
> > to return success in some cases that currently return failure, so this
> > case that was already success should be fine as it was.
> 
> This isn't a change to the test. It's the negation of the logical
> condition *and* a switch of the branches i.e. keeps the overall
> behavior exactly the same. The equivalence is !(!A || B) == (A && !B).

I see the Boolean equivalence, but it's difficult to verify that the
consequences are equivalent because the new code has the extra "!fn &&
!dev" test in the middle.

> There are two reasons I did this.
> 
> 1. I find (!dev || dev->multifunction) to be much harder to grasp than
> (dev && !dev->multifunction).
> 
> 2. The whole next_fn() in my opinion becomes easier to read if it bails
> for all bad cases early and the "this is the next fn" is the final
> return if we didn't bail. This becomes even more true as another
> condition is added in patch 2.

Fair enough, and I agree that "this is the next fn" is a nice final
return.  In general I think it's good to return either an error or the
next fn as soon as it is known.  It makes it harder to analyze if the
return value has already been determined but we have to mentally pass
over subsequent tests that don't affect it.

> > Is this because "(fn + 1) % 8" may be zero, which previously
> > terminated the loop, but now it doesn't because "fn == 0" is the
> > *first* execution of the loop?
> 
> Yes with function 0 handled in the loop we can't use 0 as the
> termination indication. Also I find it generally weird to use a wrap
> around for this.

Yes, I agree that's weird.  Usually I prefer "for" loops over
"do ...  while", but this might be a case where it makes sense --
we *always* want to call pci_scan_single_device() once, and
"do ... while" would accomplish that without any fuss.  It might even
allow us to keep the 0 return value as the termination condition,
which would be nice because fn could stay unsigned and it would reduce
the size of this patch.

I'm hoping we can end up with something like this:

  unsigned int next_fn(bus, dev, fn, mf)
  {
    if (ari(bus)) {
      if (!dev)
	return 0;
      return PCI_ARI_CAP_NFN();
    }

    if (fn >= 7)
      return 0;

    if (mf)
      return fn + 1;

    if (hypervisor())
      return fn + 1;

    return 0;
  }

  int pci_scan_slot(...)
  {
    unsigned int fn = 0, mf = 0;

    do {
      dev = pci_scan_single_device(bus, devfn + fn);
      if (dev && dev->multifunction)
	mf = 1;
      fn = next_fn(dev, fn, mf);
    } while (fn);
  }

This would be minimal change to next_fn(): just add the "mf"
parameter, which removes a lot of the confusing "dev" and "!dev"
testing, and add the "fn >= 7" to remove the implicit "% 8 == 0"
failure case.

Then the jailhouse/s390 patch would trivially add the new hypervisor
case, which is clearly separated from everything else.

> > If so, I wonder if we could avoid that case by adding:
> > 
> >   if (fn >= 7)
> >     return -ENODEV;
> > 
> > at the very beginning.  Maybe that would allow a more trivial patch
> > that just changed the error return from 0 to -ENODEV, i.e., leaving
> > all the logic in next_fn() unchanged?
> 
> I think this is equivalent to the ternary at the return. Both return
> -ENODEV for fn >= 7. I do like your idea better though as it keeps with
> the scheme of my point 2 above and ternaries are ever so slightly
> harder to read.

Oops, I don't think we can do this directly because in the ARI case,
fn is basically 8 bits wide so can be 0-255.

Bjorn

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
@ 2022-04-21 17:09         ` Bjorn Helgaas
  0 siblings, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2022-04-21 17:09 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: linux-s390, Matthew Rosato, Pierre Morel, Jan Kiszka,
	linux-kernel, virtualization, linux-pci, Bjorn Helgaas

On Thu, Apr 21, 2022 at 11:27:42AM +0200, Niklas Schnelle wrote:
> On Wed, 2022-04-20 at 21:14 -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote:
> > > While determining the next PCI function is factored out of
> > > pci_scan_slot() into next_fn() the former still handles the first
> > > function as a special case duplicating the code from the scan loop and
> > > splitting the condition that the first function exits from it being
> > > multifunction which is tested in next_fn().
> > > 
> > > Furthermore the non ARI branch of next_fn() mixes the case that
> > > multifunction devices may have non-contiguous function ranges and dev
> > > may thus be NULL with the multifunction requirement. It also signals
> > > that no further functions need to be scanned by returning 0 which is
> > > a valid function number.
> > > 
> > > Improve upon this by moving all conditions for having to scan for more
> > > functions into next_fn() and make them obvious and commented.
> > > 
> > > By changing next_fn() to return -ENODEV instead of 0 when there is no
> > > next function we can then handle the initial function inside the loop
> > > and deduplicate the shared handling.
> > > 
> > > No functional change is intended.
> > > 
> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > > ---
> > >  drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
> > >  1 file changed, 19 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > index 17a969942d37..389aa1f9cb2c 100644
> > > --- a/drivers/pci/probe.c
> > > +++ b/drivers/pci/probe.c
> > > @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> > >  }
> > >  EXPORT_SYMBOL(pci_scan_single_device);
> > >  
> > > -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
> > > -			    unsigned int fn)
> > > +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
> > >  {
> > >  	int pos;
> > >  	u16 cap = 0;
> > >  	unsigned int next_fn;
> > >  
> > > -	if (pci_ari_enabled(bus)) {
> > > -		if (!dev)
> > > -			return 0;
> > > +	if (dev && pci_ari_enabled(bus)) {
> > 
> > I think this would be easier to verify if we kept the explicit error
> > return, e.g.,
> > 
> >   if (pci_ari_enabled(bus)) {
> >     if (!dev)
> >       return -ENODEV;
> >     pos = pci_find_ext_capability(...);
> > 
> > Otherwise we have to sort through the !dev cases below.  I guess
> > -ENODEV would come from either the "!fn && !dev" case or the "fn > 6"
> > case, but it's not obvious to me that those are equivalent to the
> > previous code.
> 
> We could keep this the same for this patch but I think for jailhouse
> (patch 2) we need the "!dev" case not to fail here such that we can
> handle the missing function 0 below even if ARI is enabled. For s390
> this doesn't currently matter because pci_ari_enabled(bus) is always
> false but I assumed that this isn't necessarily so for jailhouse. I
> sent a follow up mail on a slight behavior change I can think of for
> this case for v2 but forgot to send it also for v3. Quoted below:

I think it would be good to make the first patch change as little as
possible to make it easier to analyze, then possibly test for
hypervisor when changing this behavior.

> > > -	/* dev may be NULL for non-contiguous multifunction devices */
> > > -	if (!dev || dev->multifunction)
> > > -		return (fn + 1) % 8;
> > > -
> > > -	return 0;
> > > +	/* only multifunction devices may have more functions */
> > > +	if (dev && !dev->multifunction)
> > > +		return -ENODEV;
> > 
> > I don't understand why the "!dev || dev->multifunction" test needs to
> > change.  Isn't that valid even in the hypervisor case?  IIUC, you want
> > to return success in some cases that currently return failure, so this
> > case that was already success should be fine as it was.
> 
> This isn't a change to the test. It's the negation of the logical
> condition *and* a switch of the branches i.e. keeps the overall
> behavior exactly the same. The equivalence is !(!A || B) == (A && !B).

I see the Boolean equivalence, but it's difficult to verify that the
consequences are equivalent because the new code has the extra "!fn &&
!dev" test in the middle.

> There are two reasons I did this.
> 
> 1. I find (!dev || dev->multifunction) to be much harder to grasp than
> (dev && !dev->multifunction).
> 
> 2. The whole next_fn() in my opinion becomes easier to read if it bails
> for all bad cases early and the "this is the next fn" is the final
> return if we didn't bail. This becomes even more true as another
> condition is added in patch 2.

Fair enough, and I agree that "this is the next fn" is a nice final
return.  In general I think it's good to return either an error or the
next fn as soon as it is known.  It makes it harder to analyze if the
return value has already been determined but we have to mentally pass
over subsequent tests that don't affect it.

> > Is this because "(fn + 1) % 8" may be zero, which previously
> > terminated the loop, but now it doesn't because "fn == 0" is the
> > *first* execution of the loop?
> 
> Yes with function 0 handled in the loop we can't use 0 as the
> termination indication. Also I find it generally weird to use a wrap
> around for this.

Yes, I agree that's weird.  Usually I prefer "for" loops over
"do ...  while", but this might be a case where it makes sense --
we *always* want to call pci_scan_single_device() once, and
"do ... while" would accomplish that without any fuss.  It might even
allow us to keep the 0 return value as the termination condition,
which would be nice because fn could stay unsigned and it would reduce
the size of this patch.

I'm hoping we can end up with something like this:

  unsigned int next_fn(bus, dev, fn, mf)
  {
    if (ari(bus)) {
      if (!dev)
	return 0;
      return PCI_ARI_CAP_NFN();
    }

    if (fn >= 7)
      return 0;

    if (mf)
      return fn + 1;

    if (hypervisor())
      return fn + 1;

    return 0;
  }

  int pci_scan_slot(...)
  {
    unsigned int fn = 0, mf = 0;

    do {
      dev = pci_scan_single_device(bus, devfn + fn);
      if (dev && dev->multifunction)
	mf = 1;
      fn = next_fn(dev, fn, mf);
    } while (fn);
  }

This would be minimal change to next_fn(): just add the "mf"
parameter, which removes a lot of the confusing "dev" and "!dev"
testing, and add the "fn >= 7" to remove the implicit "% 8 == 0"
failure case.

Then the jailhouse/s390 patch would trivially add the new hypervisor
case, which is clearly separated from everything else.

> > If so, I wonder if we could avoid that case by adding:
> > 
> >   if (fn >= 7)
> >     return -ENODEV;
> > 
> > at the very beginning.  Maybe that would allow a more trivial patch
> > that just changed the error return from 0 to -ENODEV, i.e., leaving
> > all the logic in next_fn() unchanged?
> 
> I think this is equivalent to the ternary at the return. Both return
> -ENODEV for fn >= 7. I do like your idea better though as it keeps with
> the scheme of my point 2 above and ternaries are ever so slightly
> harder to read.

Oops, I don't think we can do this directly because in the ARI case,
fn is basically 8 bits wide so can be 0-255.

Bjorn
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
  2022-04-21 17:09         ` Bjorn Helgaas
  (?)
@ 2022-04-22 11:16         ` Niklas Schnelle
  -1 siblings, 0 replies; 12+ messages in thread
From: Niklas Schnelle @ 2022-04-22 11:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Jan Kiszka, Matthew Rosato, Pierre Morel,
	linux-kernel, virtualization, linux-s390, linux-pci

[-- Attachment #1: Type: text/plain, Size: 10837 bytes --]

On Thu, 2022-04-21 at 12:09 -0500, Bjorn Helgaas wrote:
> On Thu, Apr 21, 2022 at 11:27:42AM +0200, Niklas Schnelle wrote:
> > On Wed, 2022-04-20 at 21:14 -0500, Bjorn Helgaas wrote:
> > > On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote:
> > > > While determining the next PCI function is factored out of
> > > > pci_scan_slot() into next_fn() the former still handles the first
> > > > function as a special case duplicating the code from the scan loop and
> > > > splitting the condition that the first function exits from it being
> > > > multifunction which is tested in next_fn().
> > > > 
> > > > Furthermore the non ARI branch of next_fn() mixes the case that
> > > > multifunction devices may have non-contiguous function ranges and dev
> > > > may thus be NULL with the multifunction requirement. It also signals
> > > > that no further functions need to be scanned by returning 0 which is
> > > > a valid function number.
> > > > 
> > > > Improve upon this by moving all conditions for having to scan for more
> > > > functions into next_fn() and make them obvious and commented.
> > > > 
> > > > By changing next_fn() to return -ENODEV instead of 0 when there is no
> > > > next function we can then handle the initial function inside the loop
> > > > and deduplicate the shared handling.
> > > > 
> > > > No functional change is intended.
> > > > 
> > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > > > ---
> > > >  drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
> > > >  1 file changed, 19 insertions(+), 22 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > > index 17a969942d37..389aa1f9cb2c 100644
> > > > --- a/drivers/pci/probe.c
> > > > +++ b/drivers/pci/probe.c
> > > > @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> > > >  }
> > > >  EXPORT_SYMBOL(pci_scan_single_device);
> > > >  
> > > > -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
> > > > -			    unsigned int fn)
> > > > +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
> > > >  {
> > > >  	int pos;
> > > >  	u16 cap = 0;
> > > >  	unsigned int next_fn;
> > > >  
> > > > -	if (pci_ari_enabled(bus)) {
> > > > -		if (!dev)
> > > > -			return 0;
> > > > +	if (dev && pci_ari_enabled(bus)) {
> > > 
> > > I think this would be easier to verify if we kept the explicit error
> > > return, e.g.,
> > > 
> > >   if (pci_ari_enabled(bus)) {
> > >     if (!dev)
> > >       return -ENODEV;
> > >     pos = pci_find_ext_capability(...);
> > > 
> > > Otherwise we have to sort through the !dev cases below.  I guess
> > > -ENODEV would come from either the "!fn && !dev" case or the "fn > 6"
> > > case, but it's not obvious to me that those are equivalent to the
> > > previous code.
> > 
> > We could keep this the same for this patch but I think for jailhouse
> > (patch 2) we need the "!dev" case not to fail here such that we can
> > handle the missing function 0 below even if ARI is enabled. For s390
> > this doesn't currently matter because pci_ari_enabled(bus) is always
> > false but I assumed that this isn't necessarily so for jailhouse. I
> > sent a follow up mail on a slight behavior change I can think of for
> > this case for v2 but forgot to send it also for v3. Quoted below:
> 
> I think it would be good to make the first patch change as little as
> possible to make it easier to analyze, then possibly test for
> hypervisor when changing this behavior.

Yes makes sense, in my current local version I've kept the direct
return here. I think I've also found a way to keep this even for the
isolated PCI function case. See below.

> 
> > > > -	/* dev may be NULL for non-contiguous multifunction devices */
> > > > -	if (!dev || dev->multifunction)
> > > > -		return (fn + 1) % 8;
> > > > -
> > > > -	return 0;
> > > > +	/* only multifunction devices may have more functions */
> > > > +	if (dev && !dev->multifunction)
> > > > +		return -ENODEV;
> > > 
> > > I don't understand why the "!dev || dev->multifunction" test needs to
> > > change.  Isn't that valid even in the hypervisor case?  IIUC, you want
> > > to return success in some cases that currently return failure, so this
> > > case that was already success should be fine as it was.
> > 
> > This isn't a change to the test. It's the negation of the logical
> > condition *and* a switch of the branches i.e. keeps the overall
> > behavior exactly the same. The equivalence is !(!A || B) == (A && !B).
> 
> I see the Boolean equivalence, but it's difficult to verify that the
> consequences are equivalent because the new code has the extra "!fn &&
> !dev" test in the middle.

Ok. I would argue that the "fn == 0 && !dev" is just the moved "!dev"
check for the initial pci_scan_single_device() that previously happened
outside the loop. With the modulo gone I can't think of any other way
to get fn == 0 but in the first iteration. 

But you are right, the extra test introduces some extra churn in
next_fn(). I think we can get rid of that new condition such that
next_fn() is more easily verifiable. See below.

> 
> > There are two reasons I did this.
> > 
> > 1. I find (!dev || dev->multifunction) to be much harder to grasp than
> > (dev && !dev->multifunction).
> > 
> > 2. The whole next_fn() in my opinion becomes easier to read if it bails
> > for all bad cases early and the "this is the next fn" is the final
> > return if we didn't bail. This becomes even more true as another
> > condition is added in patch 2.
> 
> Fair enough, and I agree that "this is the next fn" is a nice final
> return.  In general I think it's good to return either an error or the
> next fn as soon as it is known.  It makes it harder to analyze if the
> return value has already been determined but we have to mentally pass
> over subsequent tests that don't affect it.

I agree and I think we can get this with the transformed cases too.

> 
> > > Is this because "(fn + 1) % 8" may be zero, which previously
> > > terminated the loop, but now it doesn't because "fn == 0" is the
> > > *first* execution of the loop?
> > 
> > Yes with function 0 handled in the loop we can't use 0 as the
> > termination indication. Also I find it generally weird to use a wrap
> > around for this.
> 
> Yes, I agree that's weird.  Usually I prefer "for" loops over
> "do ...  while", but this might be a case where it makes sense --
> we *always* want to call pci_scan_single_device() once, and
> "do ... while" would accomplish that without any fuss.  It might even
> allow us to keep the 0 return value as the termination condition,
> which would be nice because fn could stay unsigned and it would reduce
> the size of this patch.

To me the 0 return is part of the weirdness as it is a valid fn value,
so returning it from next_fn() would naturally communicate that the
next fn is fn 0 not that there are no more functions.

It feels like making next_fn() more obvious is worth the larger patch.
I think us having to convince ourselves of these details is testament
that it currently is very hard to understand the interactions here
while the new conditions for stopping the scan are each almost obvious.
So I think if we can convince ourselves that the new code is exactly
equivalent which I believe it is when keeping the check in the ARI
path, then that is what makes the patch save.

> 
> I'm hoping we can end up with something like this:
> 
>   unsigned int next_fn(bus, dev, fn, mf)
>   {
>     if (ari(bus)) {
>       if (!dev)
> 	return 0;
>       return PCI_ARI_CAP_NFN();
>     }
> 
>     if (fn >= 7)
>       return 0;
> 
>     if (mf)
>       return fn + 1;
> 
>     if (hypervisor())
>       return fn + 1;

Ooh, just realized that my series changes the behavior for jailhouse
when the passed through device is not multifunction. In the existing
code pci_scan_single_device() is called for all devfn irrespective of
whether the first function found has dev->multifunction set after
scanning it. I'm not sure if that would happen if e.g. we have multiple
SR-IOV VFs but not the PF.

> 
>     return 0;
>   }
> 
>   int pci_scan_slot(...)
>   {
>     unsigned int fn = 0, mf = 0;
> 
>     do {
>       dev = pci_scan_single_device(bus, devfn + fn);
>       if (dev && dev->multifunction)
> 	mf = 1;
>       fn = next_fn(dev, fn, mf);
>     } while (fn);
>   }
> 
> This would be minimal change to next_fn(): just add the "mf"
> parameter, which removes a lot of the confusing "dev" and "!dev"
> testing, and add the "fn >= 7" to remove the implicit "% 8 == 0"
> failure case.

The extra mf parameter feels a bit superflous as we already have dev-
>multifunction and then would just move the "dev && (!)dev-
>multifunction" test out of next_fn(). 

To me this doesn't look like less of a change to next_fn() either. That
said, it gave me an idea. One way to change next_fn() less is to keep
the "fn == 0 && !dev" test out of it and in pci_scan_slot(). That way
there are no new conditions in next_fn() and the existing conditions
can be transformed as proposed without mixing in new stuff.

With that the scan loop would look something like:

  int pci_scan_slot(...)
  {
    int fn = 0, nr =0 ;

    do {
      dev = pci_scan_single_device(bus, devfn + fn);
      if (dev) {
         ...
      } else if (fn == 0) {
        /* missing function 0*/
        break;
      }
      fn = next_fn(bus, dev, fn);
    } while (fn);
  }

Even better this allows us to keep the "!dev" check in the ARI case as
we don't have to handle the missing function 0 in next_fn().

Let me sent you this variant before we abandon the -ENODEV return and
condition transforms.

> 
> Then the jailhouse/s390 patch would trivially add the new hypervisor
> case, which is clearly separated from everything else.
> 
> > > If so, I wonder if we could avoid that case by adding:
> > > 
> > >   if (fn >= 7)
> > >     return -ENODEV;
> > > 
> > > at the very beginning.  Maybe that would allow a more trivial patch
> > > that just changed the error return from 0 to -ENODEV, i.e., leaving
> > > all the logic in next_fn() unchanged?
> > 
> > I think this is equivalent to the ternary at the return. Both return
> > -ENODEV for fn >= 7. I do like your idea better though as it keeps with
> > the scheme of my point 2 above and ternaries are ever so slightly
> > harder to read.
> 
> Oops, I don't think we can do this directly because in the ARI case,
> fn is basically 8 bits wide so can be 0-255.
> 
> Bjorn

True, we really do need to keep the ARI case separate.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-04-22 11:16 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-19 10:27 [PATCH v3 0/4] PCI: Rework pci_scan_slot() and isolated PCI functions Niklas Schnelle
2022-04-19 10:28 ` [PATCH v3 1/4] PCI: Clean up pci_scan_slot() Niklas Schnelle
2022-04-21  2:14   ` Bjorn Helgaas
2022-04-21  2:14     ` Bjorn Helgaas
2022-04-21  9:27     ` Niklas Schnelle
2022-04-21 11:14       ` Niklas Schnelle
2022-04-21 17:09       ` Bjorn Helgaas
2022-04-21 17:09         ` Bjorn Helgaas
2022-04-22 11:16         ` Niklas Schnelle
2022-04-19 10:28 ` [PATCH v3 2/4] PCI: Move jailhouse's isolated function handling to pci_scan_slot() Niklas Schnelle
2022-04-19 10:28 ` [PATCH v3 3/4] PCI: Extend isolated function probing to s390 Niklas Schnelle
2022-04-19 10:28 ` [PATCH v3 4/4] s390/pci: allow zPCI zbus without a function zero Niklas Schnelle

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.