All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Pierre Morel <pmorel@linux.ibm.com>,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	linux-s390@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
Date: Wed, 20 Apr 2022 21:14:46 -0500	[thread overview]
Message-ID: <20220421021446.GA1356365@bhelgaas> (raw)
In-Reply-To: <20220419102803.3430139-2-schnelle@linux.ibm.com>

Hi Niklas,

I'm sure this makes good sense, but I need a little more hand-holding.
Sorry this is long and rambling.

On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote:
> While determining the next PCI function is factored out of
> pci_scan_slot() into next_fn() the former still handles the first
> function as a special case duplicating the code from the scan loop and
> splitting the condition that the first function exits from it being
> multifunction which is tested in next_fn().
> 
> Furthermore the non ARI branch of next_fn() mixes the case that
> multifunction devices may have non-contiguous function ranges and dev
> may thus be NULL with the multifunction requirement. It also signals
> that no further functions need to be scanned by returning 0 which is
> a valid function number.
> 
> Improve upon this by moving all conditions for having to scan for more
> functions into next_fn() and make them obvious and commented.
> 
> By changing next_fn() to return -ENODEV instead of 0 when there is no
> next function we can then handle the initial function inside the loop
> and deduplicate the shared handling.
> 
> No functional change is intended.
> 
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
>  drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
>  1 file changed, 19 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 17a969942d37..389aa1f9cb2c 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
>  }
>  EXPORT_SYMBOL(pci_scan_single_device);
>  
> -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
> -			    unsigned int fn)
> +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
>  {
>  	int pos;
>  	u16 cap = 0;
>  	unsigned int next_fn;
>  
> -	if (pci_ari_enabled(bus)) {
> -		if (!dev)
> -			return 0;
> +	if (dev && pci_ari_enabled(bus)) {

I think this would be easier to verify if we kept the explicit error
return, e.g.,

  if (pci_ari_enabled(bus)) {
    if (!dev)
      return -ENODEV;
    pos = pci_find_ext_capability(...);

Otherwise we have to sort through the !dev cases below.  I guess
-ENODEV would come from either the "!fn && !dev" case or the "fn > 6"
case, but it's not obvious to me that those are equivalent to the
previous code.

>  		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI);
>  		if (!pos)
> -			return 0;
> +			return -ENODEV;
>  
>  		pci_read_config_word(dev, pos + PCI_ARI_CAP, &cap);
>  		next_fn = PCI_ARI_CAP_NFN(cap);
>  		if (next_fn <= fn)
> -			return 0;	/* protect against malformed list */
> +			return -ENODEV;	/* protect against malformed list */
>  
>  		return next_fn;
>  	}
>  
> -	/* dev may be NULL for non-contiguous multifunction devices */
> -	if (!dev || dev->multifunction)
> -		return (fn + 1) % 8;
> -
> -	return 0;
> +	/* only multifunction devices may have more functions */
> +	if (dev && !dev->multifunction)
> +		return -ENODEV;

I don't understand why the "!dev || dev->multifunction" test needs to
change.  Isn't that valid even in the hypervisor case?  IIUC, you want
to return success in some cases that currently return failure, so this
case that was already success should be fine as it was.

Is this because "(fn + 1) % 8" may be zero, which previously
terminated the loop, but now it doesn't because "fn == 0" is the
*first* execution of the loop?

If so, I wonder if we could avoid that case by adding:

  if (fn >= 7)
    return -ENODEV;

at the very beginning.  Maybe that would allow a more trivial patch
that just changed the error return from 0 to -ENODEV, i.e., leaving
all the logic in next_fn() unchanged?

I'm wondering if this could end up like:

    if (fn >= 7)
      return -ENODEV;

    if (pci_ari_enabled(bus)) {
      if (!dev)
	return -ENODEV;
      ...
      return next_fn;
    }

    if (!dev || dev->multifunction)
      return (fn + 1) % 8;

 +  if (hypervisor_isolated_pci_functions())
 +    return (fn + 1) % 8;

    return -ENODEV;

(The hypervisor part being added in a subsequent patch, and I'm not
sure exactly what logic you need there -- the point being that it's
just an additional success case.)

The "% 8" seems possibly superfluous then, since previously that
caused a zero return that terminated the loop.  If we're using -ENODEV
to terminate the loop, we probably don't care about the mod 8.

> +	/*
> +	 * A function 0 is required but multifunction devices may
> +	 * be non-contiguous so dev can be NULL otherwise.

I understood the original "dev may be NULL ..." comment, but I can't
quite parse this.  "dev can be NULL" for non-zero functions?  That's
basically what it said before, but it's not clear what "otherwise"
refers to.

> +	 */
> +	if (!fn && !dev)
> +		return -ENODEV;

This part isn't obvious to me yet, partly because of the "!fn && !dev"
construction.  The negatives make it hard to parse.

Since "fn" isn't a boolean or a pointer, I think "fn == 0" is easier
to read than "!fn".  I would test "dev" first since it logically
precedes "fn".

IIUC !dev means we haven't found a function at this device number yet.
So this:

  if (!dev && fn == 0)
    return -ENODEV;

means we called pci_scan_single_device(bus, devfn + 0) the first time
through the loop, and it didn't find a device so it returned NULL.

> +	return (fn <= 6) ? fn + 1 : -ENODEV;
>  }
>  
>  static int only_one_child(struct pci_bus *bus)
> @@ -2643,24 +2645,19 @@ static int only_one_child(struct pci_bus *bus)
>   */
>  int pci_scan_slot(struct pci_bus *bus, int devfn)
>  {
> -	unsigned int fn, nr = 0;
> -	struct pci_dev *dev;
> +	int fn, nr = 0;
> +	struct pci_dev *dev = NULL;
>  
>  	if (only_one_child(bus) && (devfn > 0))
>  		return 0; /* Already scanned the entire slot */
>  
> -	dev = pci_scan_single_device(bus, devfn);
> -	if (!dev)
> -		return 0;
> -	if (!pci_dev_is_added(dev))
> -		nr++;
> -
> -	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
> +	for (fn = 0; fn >= 0; fn = next_fn(bus, dev, fn)) {
>  		dev = pci_scan_single_device(bus, devfn + fn);

"devfn + fn" (in the existing, unchanged code) is a little bit weird.
In almost all cases, devfn is the result of "PCI_DEVFN(slot, 0)", so
we could make the interface:

  pci_scan_slot(struct pci_bus *bus, int dev)

where "dev" is 0-31.

The only exceptions are a couple hotplug drivers where the fn probably
is or should be 0, too, but I haven't verified that.

But this would be scope creep, so possibly something we could consider
in the future, but not for this series.

>  		if (dev) {
>  			if (!pci_dev_is_added(dev))
>  				nr++;
> -			dev->multifunction = 1;
> +			if (nr > 1)
> +				dev->multifunction = 1;
>  		}
>  	}
>  
> -- 
> 2.32.0
> 

WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <helgaas@kernel.org>
To: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: linux-s390@vger.kernel.org,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Pierre Morel <pmorel@linux.ibm.com>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>
Subject: Re: [PATCH v3 1/4] PCI: Clean up pci_scan_slot()
Date: Wed, 20 Apr 2022 21:14:46 -0500	[thread overview]
Message-ID: <20220421021446.GA1356365@bhelgaas> (raw)
In-Reply-To: <20220419102803.3430139-2-schnelle@linux.ibm.com>

Hi Niklas,

I'm sure this makes good sense, but I need a little more hand-holding.
Sorry this is long and rambling.

On Tue, Apr 19, 2022 at 12:28:00PM +0200, Niklas Schnelle wrote:
> While determining the next PCI function is factored out of
> pci_scan_slot() into next_fn() the former still handles the first
> function as a special case duplicating the code from the scan loop and
> splitting the condition that the first function exits from it being
> multifunction which is tested in next_fn().
> 
> Furthermore the non ARI branch of next_fn() mixes the case that
> multifunction devices may have non-contiguous function ranges and dev
> may thus be NULL with the multifunction requirement. It also signals
> that no further functions need to be scanned by returning 0 which is
> a valid function number.
> 
> Improve upon this by moving all conditions for having to scan for more
> functions into next_fn() and make them obvious and commented.
> 
> By changing next_fn() to return -ENODEV instead of 0 when there is no
> next function we can then handle the initial function inside the loop
> and deduplicate the shared handling.
> 
> No functional change is intended.
> 
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
>  drivers/pci/probe.c | 41 +++++++++++++++++++----------------------
>  1 file changed, 19 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 17a969942d37..389aa1f9cb2c 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2579,33 +2579,35 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
>  }
>  EXPORT_SYMBOL(pci_scan_single_device);
>  
> -static unsigned int next_fn(struct pci_bus *bus, struct pci_dev *dev,
> -			    unsigned int fn)
> +static int next_fn(struct pci_bus *bus, struct pci_dev *dev, int fn)
>  {
>  	int pos;
>  	u16 cap = 0;
>  	unsigned int next_fn;
>  
> -	if (pci_ari_enabled(bus)) {
> -		if (!dev)
> -			return 0;
> +	if (dev && pci_ari_enabled(bus)) {

I think this would be easier to verify if we kept the explicit error
return, e.g.,

  if (pci_ari_enabled(bus)) {
    if (!dev)
      return -ENODEV;
    pos = pci_find_ext_capability(...);

Otherwise we have to sort through the !dev cases below.  I guess
-ENODEV would come from either the "!fn && !dev" case or the "fn > 6"
case, but it's not obvious to me that those are equivalent to the
previous code.

>  		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ARI);
>  		if (!pos)
> -			return 0;
> +			return -ENODEV;
>  
>  		pci_read_config_word(dev, pos + PCI_ARI_CAP, &cap);
>  		next_fn = PCI_ARI_CAP_NFN(cap);
>  		if (next_fn <= fn)
> -			return 0;	/* protect against malformed list */
> +			return -ENODEV;	/* protect against malformed list */
>  
>  		return next_fn;
>  	}
>  
> -	/* dev may be NULL for non-contiguous multifunction devices */
> -	if (!dev || dev->multifunction)
> -		return (fn + 1) % 8;
> -
> -	return 0;
> +	/* only multifunction devices may have more functions */
> +	if (dev && !dev->multifunction)
> +		return -ENODEV;

I don't understand why the "!dev || dev->multifunction" test needs to
change.  Isn't that valid even in the hypervisor case?  IIUC, you want
to return success in some cases that currently return failure, so this
case that was already success should be fine as it was.

Is this because "(fn + 1) % 8" may be zero, which previously
terminated the loop, but now it doesn't because "fn == 0" is the
*first* execution of the loop?

If so, I wonder if we could avoid that case by adding:

  if (fn >= 7)
    return -ENODEV;

at the very beginning.  Maybe that would allow a more trivial patch
that just changed the error return from 0 to -ENODEV, i.e., leaving
all the logic in next_fn() unchanged?

I'm wondering if this could end up like:

    if (fn >= 7)
      return -ENODEV;

    if (pci_ari_enabled(bus)) {
      if (!dev)
	return -ENODEV;
      ...
      return next_fn;
    }

    if (!dev || dev->multifunction)
      return (fn + 1) % 8;

 +  if (hypervisor_isolated_pci_functions())
 +    return (fn + 1) % 8;

    return -ENODEV;

(The hypervisor part being added in a subsequent patch, and I'm not
sure exactly what logic you need there -- the point being that it's
just an additional success case.)

The "% 8" seems possibly superfluous then, since previously that
caused a zero return that terminated the loop.  If we're using -ENODEV
to terminate the loop, we probably don't care about the mod 8.

> +	/*
> +	 * A function 0 is required but multifunction devices may
> +	 * be non-contiguous so dev can be NULL otherwise.

I understood the original "dev may be NULL ..." comment, but I can't
quite parse this.  "dev can be NULL" for non-zero functions?  That's
basically what it said before, but it's not clear what "otherwise"
refers to.

> +	 */
> +	if (!fn && !dev)
> +		return -ENODEV;

This part isn't obvious to me yet, partly because of the "!fn && !dev"
construction.  The negatives make it hard to parse.

Since "fn" isn't a boolean or a pointer, I think "fn == 0" is easier
to read than "!fn".  I would test "dev" first since it logically
precedes "fn".

IIUC !dev means we haven't found a function at this device number yet.
So this:

  if (!dev && fn == 0)
    return -ENODEV;

means we called pci_scan_single_device(bus, devfn + 0) the first time
through the loop, and it didn't find a device so it returned NULL.

> +	return (fn <= 6) ? fn + 1 : -ENODEV;
>  }
>  
>  static int only_one_child(struct pci_bus *bus)
> @@ -2643,24 +2645,19 @@ static int only_one_child(struct pci_bus *bus)
>   */
>  int pci_scan_slot(struct pci_bus *bus, int devfn)
>  {
> -	unsigned int fn, nr = 0;
> -	struct pci_dev *dev;
> +	int fn, nr = 0;
> +	struct pci_dev *dev = NULL;
>  
>  	if (only_one_child(bus) && (devfn > 0))
>  		return 0; /* Already scanned the entire slot */
>  
> -	dev = pci_scan_single_device(bus, devfn);
> -	if (!dev)
> -		return 0;
> -	if (!pci_dev_is_added(dev))
> -		nr++;
> -
> -	for (fn = next_fn(bus, dev, 0); fn > 0; fn = next_fn(bus, dev, fn)) {
> +	for (fn = 0; fn >= 0; fn = next_fn(bus, dev, fn)) {
>  		dev = pci_scan_single_device(bus, devfn + fn);

"devfn + fn" (in the existing, unchanged code) is a little bit weird.
In almost all cases, devfn is the result of "PCI_DEVFN(slot, 0)", so
we could make the interface:

  pci_scan_slot(struct pci_bus *bus, int dev)

where "dev" is 0-31.

The only exceptions are a couple hotplug drivers where the fn probably
is or should be 0, too, but I haven't verified that.

But this would be scope creep, so possibly something we could consider
in the future, but not for this series.

>  		if (dev) {
>  			if (!pci_dev_is_added(dev))
>  				nr++;
> -			dev->multifunction = 1;
> +			if (nr > 1)
> +				dev->multifunction = 1;
>  		}
>  	}
>  
> -- 
> 2.32.0
> 
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  reply	other threads:[~2022-04-21  2:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-19 10:27 [PATCH v3 0/4] PCI: Rework pci_scan_slot() and isolated PCI functions Niklas Schnelle
2022-04-19 10:28 ` [PATCH v3 1/4] PCI: Clean up pci_scan_slot() Niklas Schnelle
2022-04-21  2:14   ` Bjorn Helgaas [this message]
2022-04-21  2:14     ` Bjorn Helgaas
2022-04-21  9:27     ` Niklas Schnelle
2022-04-21 11:14       ` Niklas Schnelle
2022-04-21 17:09       ` Bjorn Helgaas
2022-04-21 17:09         ` Bjorn Helgaas
2022-04-22 11:16         ` Niklas Schnelle
2022-04-19 10:28 ` [PATCH v3 2/4] PCI: Move jailhouse's isolated function handling to pci_scan_slot() Niklas Schnelle
2022-04-19 10:28 ` [PATCH v3 3/4] PCI: Extend isolated function probing to s390 Niklas Schnelle
2022-04-19 10:28 ` [PATCH v3 4/4] s390/pci: allow zPCI zbus without a function zero Niklas Schnelle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220421021446.GA1356365@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=jan.kiszka@siemens.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=pmorel@linux.ibm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.