linux-kernel.vger.kernel.org archive mirror
From: <Alex_Gagniuc@Dellteam.com>
To: <alex.williamson@redhat.com>, <bhelgaas@google.com>,
	<helgaas@kernel.org>, <mr.nuke.me@gmail.com>,
	<linux-pci@vger.kernel.org>
Cc: <Austin.Bolen@dell.com>, <keith.busch@intel.com>,
	<Shyam.Iyer@dell.com>, <lukas@wunner.de>, <okaya@kernel.org>,
	<torvalds@linux-foundation.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] PCI: Add link_change error handler and vfio-pci user
Date: Wed, 24 Apr 2019 16:45:45 +0000
Message-ID: <44c43b8c1739488181930c074bb6eddb@ausx13mps321.AMER.DELL.COM>
In-Reply-To: 155605909349.3575.13433421148215616375.stgit@gimli.home

On 4/23/2019 5:42 PM, Alex Williamson wrote:
> The PCIe bandwidth notification service generates logging any time a
> link changes speed or width to a state that is considered downgraded.
> Unfortunately, it cannot differentiate signal integrity related link
> changes from those intentionally initiated by an endpoint driver,
> including drivers that may live in userspace or VMs when making use
> of vfio-pci.  Therefore, allow the driver to have a say in whether
> the link is indeed downgraded and worth noting in the log, or if the
> change is perhaps intentional.
> 
> For vfio-pci, we don't know the intentions of the user/guest driver
> either, but we do know that GPU drivers in guests actively manage
> the link state and therefore trigger the bandwidth notification for
> what appear to be entirely intentional link changes.
> 
> Fixes: e8303bb7a75c PCI/LINK: Report degraded links via link bandwidth notification
> Link: https://lore.kernel.org/linux-pci/155597243666.19387.1205950870601742062.stgit@gimli.home/T/#u
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> Changing to pci_dbg() logging is not super usable, so let's try the
> previous idea of letting the driver handle link change events as they
> see fit.  Ideally this might be two patches, but for easier handling,
> folding the pci and vfio-pci bits together.  Comments?  Thanks,

I think this callback opens up a can of worms, where drivers can ad hoc 
silence a number of what would otherwise be indicators of problems. But 
I don't have to like it to review it :).

>   drivers/pci/probe.c         |   13 +++++++++++++
>   drivers/vfio/pci/vfio_pci.c |   10 ++++++++++
>   include/linux/pci.h         |    3 +++
>   3 files changed, 26 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 7e12d0163863..233cd4b5b6e8 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2403,6 +2403,19 @@ void pcie_report_downtraining(struct pci_dev *dev)

I don't think you want to change pcie_report_downtraining(). By its 
name, the function advertises that it "reports" something, but with this 
change it also goes and calls a notification callback. It is also used 
during probe, so you've just killed your chance to notice that you've 
booted with a degraded link.
If what you want is to silence the bandwidth notification, you want to 
modify the threaded interrupt handler that calls this function.
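
To make the distinction concrete, here is a rough userspace model of the two call paths, not kernel code. The struct layouts are cut down to the fields used, report_downtraining() stands in for pcie_report_downtraining(), and probe_device() and bw_notification_thread() are hypothetical names for the probe path and the threaded interrupt handler. Locking is left out entirely. The point is only that the driver opt-out sits in the interrupt path, so the probe-time report is untouched.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumptions, not the real layouts). */
struct pci_dev;
struct pci_error_handlers { void (*link_change)(struct pci_dev *dev); };
struct pci_driver { const struct pci_error_handlers *err_handler; };
struct pci_dev { struct pci_driver *driver; int reports; int notified; };

/* Stand-in for pcie_report_downtraining(): left unchanged, it only reports. */
static void report_downtraining(struct pci_dev *dev)
{
	dev->reports++;
}

/* Probe path: no driver opt-out, so a link degraded at boot is still logged. */
static void probe_device(struct pci_dev *dev)
{
	report_downtraining(dev);
}

/* Example driver callback that just counts how often it was invoked. */
static void count_link_change(struct pci_dev *dev)
{
	dev->notified++;
}

static const struct pci_error_handlers counting_handlers = {
	.link_change = count_link_change,
};

/*
 * Model of the bandwidth notification threaded handler for one downstream
 * device: defer to the driver if it has a link_change callback, otherwise
 * fall back to the normal report.
 */
static void bw_notification_thread(struct pci_dev *dev)
{
	const struct pci_error_handlers *eh =
		dev->driver ? dev->driver->err_handler : NULL;

	if (eh && eh->link_change) {
		eh->link_change(dev);
		return;
	}
	report_downtraining(dev);
}
```

With a driver that provides link_change, the probe path still reports once, while a later bandwidth notification goes to the driver and adds no further log entry.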

>   	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
>   		return;
>   
> +	/*
> +	 * If driver handles link_change event, defer to driver.  PCIe drivers
> +	 * can call pcie_print_link_status() to print current link info.
> +	 */
> +	device_lock(&dev->dev);
> +	if (dev->driver && dev->driver->err_handler &&
> +	    dev->driver->err_handler->link_change) {
> +		dev->driver->err_handler->link_change(dev);
> +		device_unlock(&dev->dev);
> +		return;
> +	}
> +	device_unlock(&dev->dev);

Can we write this such that there is a single lock()/unlock() pair?
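
One possible shape, sketched as a userspace mock-up rather than a tested kernel patch: record whether the driver consumed the event in a local flag, unlock once, and decide afterwards. The structs below are cut-down stand-ins, mock_device_lock()/mock_device_unlock() and print_link_status() are placeholders for device_lock()/device_unlock() and __pcie_print_link_status(), and notify_link_change() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumptions, not the real layouts). */
struct pci_dev;
struct pci_error_handlers { void (*link_change)(struct pci_dev *dev); };
struct pci_driver { const struct pci_error_handlers *err_handler; };
struct pci_dev { struct pci_driver *driver; int lock_depth; int reports; };

/* Placeholders for device_lock()/device_unlock(); they only track nesting. */
static void mock_device_lock(struct pci_dev *dev)   { dev->lock_depth++; }
static void mock_device_unlock(struct pci_dev *dev) { dev->lock_depth--; }

/* Placeholder for __pcie_print_link_status(dev, false). */
static void print_link_status(struct pci_dev *dev)  { dev->reports++; }

/* Empty callback, mirroring what the vfio-pci hunk does. */
static void ignore_link_change(struct pci_dev *dev) { (void)dev; }

static const struct pci_error_handlers quiet_handlers = {
	.link_change = ignore_link_change,
};

/* Hypothetical helper: returns true if the driver consumed the event. */
static bool notify_link_change(struct pci_dev *dev)
{
	bool handled = false;

	mock_device_lock(dev);
	if (dev->driver && dev->driver->err_handler &&
	    dev->driver->err_handler->link_change) {
		dev->driver->err_handler->link_change(dev);
		handled = true;
	}
	mock_device_unlock(dev);	/* the only unlock site */

	return handled;
}

static void report_link_change(struct pci_dev *dev)
{
	if (notify_link_change(dev))
		return;

	/* The lock is already released on every path that reaches here. */
	assert(dev->lock_depth == 0);
	print_link_status(dev);
}
```

The behavior matches the quoted hunk: a driver with a link_change callback suppresses the print, and everything else falls through to it. The difference is that there is exactly one unlock call to keep balanced.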

> +
>   	/* Print link status only if the device is constrained by the fabric */
>   	__pcie_print_link_status(dev, false);
>   }
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index cab71da46f4a..c9ffc0ccabb3 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1418,8 +1418,18 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>   	return PCI_ERS_RESULT_CAN_RECOVER;
>   }
>   
> +/*
> + * Ignore link change notification, we can't differentiate signal related
> + * link changes from user driver power management type operations, so do
> + * nothing.  Potentially this could be routed out to the user.
> + */
> +static void vfio_pci_link_change(struct pci_dev *pdev)
> +{
> +}
> +
>   static const struct pci_error_handlers vfio_err_handlers = {
>   	.error_detected = vfio_pci_aer_err_detected,
> +	.link_change = vfio_pci_link_change,
>   };
>   
>   static struct pci_driver vfio_pci_driver = {
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 27854731afc4..e9194bc03f9e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -763,6 +763,9 @@ struct pci_error_handlers {
>   
>   	/* Device driver may resume normal operations */
>   	void (*resume)(struct pci_dev *dev);
> +
> +	/* PCIe link change notification */
> +	void (*link_change)(struct pci_dev *dev);
>   };
>   
>   
> 
> 


