All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Sean V Kelley <sean.v.kelley@intel.com>
Cc: <bhelgaas@google.com>, <rjw@rjwysocki.net>,
	<ashok.raj@kernel.org>, <tony.luck@intel.com>,
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	<linux-pci@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Subject: Re: [RFC PATCH 6/9] PCI: Add 'rcec' field to pci_dev for associated RCiEPs
Date: Mon, 27 Jul 2020 17:11:04 +0100	[thread overview]
Message-ID: <20200727171104.000053f8@huawei.com> (raw)
In-Reply-To: <20200727122358.00006c23@Huawei.com>

On Mon, 27 Jul 2020 12:23:58 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Fri, 24 Jul 2020 10:22:20 -0700
> Sean V Kelley <sean.v.kelley@intel.com> wrote:
> 
> > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > 
> > When attempting error recovery for an RCiEP associated with an RCEC device,
> > there needs to be a way to update the Root Error Status, the Uncorrectable
> > Error Status and the Uncorrectable Error Severity of the parent RCEC.
> > So add the 'rcec' field to the pci_dev structure and provide a hook for the
> > Root Port Driver to associate RCiEPs with their respective parent RCEC.
> > 
> > Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>  
> 
> I haven't tested yet, but I think there is one path in here that breaks
> my case (no OS visible rcec / all done in firmware GHESv2 / APEI)
> 
> Jonathan
> 
> > ---
> >  drivers/pci/pcie/aer.c         |  9 +++++----
> >  drivers/pci/pcie/err.c         |  9 +++++++++
> >  drivers/pci/pcie/portdrv_pci.c | 15 +++++++++++++++
> >  include/linux/pci.h            |  3 +++
> >  4 files changed, 32 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index 3acf56683915..f1bf06be449e 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -1335,17 +1335,18 @@ static int aer_probe(struct pcie_device *dev)
> >  static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> >  {
> >  	int aer = dev->aer_cap;
> > +	int rc = 0;
> >  	u32 reg32;
> > -	int rc;
> > -
> >  
> >  	/* Disable Root's interrupt in response to error messages */
> >  	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> >  	reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> >  	pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> >  
> > -	rc = pci_bus_error_reset(dev);
> > -	pci_info(dev, "Root Port link has been reset\n");
> > +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC) {
> > +		rc = pci_bus_error_reset(dev);
> > +		pci_info(dev, "Root Port link has been reset\n");
> > +	}
> >  
> >  	/* Clear Root Error Status */
> >  	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index 9b3ec94bdf1d..0aae7643132e 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -203,6 +203,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >  		pci_walk_dev_affected(dev, report_frozen_detected, &status);
> >  		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> >  			status = flr_on_rciep(dev);
> > +			/*
> > +			 * The callback only clears the Root Error Status
> > +			 * of the RCEC (see aer.c).
> > +			 */
> > +			reset_link(dev->rcec);  
> 
> This looks dangerous for my case where APEI / GHESv2 is used.  In that case
> we don't expose an RCEC at all.   I don't think the reset_link callback
> is safe to a null pointer here.  Fix may be as simple as
> if (dev->rcec)
> 	reset_link(dev->rcec);
> 
> 
> >  			if (status != PCI_ERS_RESULT_RECOVERED) {
> >  				pci_warn(dev, "function level reset failed\n");
> >  				goto failed;
> > @@ -246,7 +251,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >  	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
> >  		pci_aer_clear_device_status(dev);
> >  		pci_aer_clear_nonfatal_status(dev);
> > +	} else if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> > +		pci_aer_clear_device_status(dev->rcec);
This needs updating as well to match current approach in Bjorn's tree

(reworked version of my fix for this in th !is_native case)

> > +		pci_aer_clear_nonfatal_status(dev->rcec);  
> 
> These may be safe as in my both now have protections for !pcie_aer_is_native.
Except I'm wrong.  That function performs a local check based
on the pci_dev passed in, so blows up anyway.

So this whole block needs an if(dev->rcec), or potentially protect it with
a call to pci_aer_is_native(dev) on the basis we shouldn't be able to get
here unless we are not doing native aer or we have an rcec.

Jonathan

> 
> >  	}
> > +
> >  	pci_info(dev, "device recovery successful\n");
> >  	return status;
> >  
> > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > index d5b109499b10..f9409a0110c2 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -90,6 +90,18 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
> >  #define PCIE_PORTDRV_PM_OPS	NULL
> >  #endif /* !PM */
> >  
> > +static int pcie_hook_rcec(struct pci_dev *pdev, void *data)
> > +{
> > +	struct pci_dev *rcec = (struct pci_dev *)data;
> > +
> > +	pdev->rcec = rcec;
> > +	pci_info(rcec, "RCiEP(under an RCEC) %04x:%02x:%02x.%d\n",
> > +		 pci_domain_nr(pdev->bus), pdev->bus->number,
> > +		 PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));  
> 
> We may want to make this debug info at somepoint if we have a way
> of discovering it from userspace.   The PCI boot up is extremely
> verbose already!
> 
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * pcie_portdrv_probe - Probe PCI-Express port devices
> >   * @dev: PCI-Express port device being probed
> > @@ -110,6 +122,9 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
> >  	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
> >  		return -ENODEV;
> >  
> > +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
> > +		pcie_walk_rcec(dev, pcie_hook_rcec, dev);
> > +
> >  	status = pcie_port_device_register(dev);
> >  	if (status)
> >  		return status;
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 34c1c4f45288..e920f29df40b 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -326,6 +326,9 @@ struct pci_dev {
> >  #ifdef CONFIG_PCIEAER
> >  	u16		aer_cap;	/* AER capability offset */
> >  	struct aer_stats *aer_stats;	/* AER stats for this device */
> > +#endif
> > +#ifdef CONFIG_PCIEPORTBUS
> > +	struct pci_dev	*rcec;		/* Associated RCEC device */
> >  #endif
> >  	u8		pcie_cap;	/* PCIe capability offset */
> >  	u8		msi_cap;	/* MSI capability offset */  
> 


  parent reply	other threads:[~2020-07-27 16:11 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-24 17:22 [RFC PATCH 0/9] Add RCEC handling to PCI/AER Sean V Kelley
2020-07-24 17:22 ` [RFC PATCH 1/9] pci_ids: Add class code and extended capability for RCEC Sean V Kelley
2020-07-27 10:00   ` Jonathan Cameron
2020-07-27 10:21     ` Jonathan Cameron
2020-07-27 15:22       ` Sean V Kelley
2020-07-24 17:22 ` [RFC PATCH 2/9] PCI: Extend Root Port Driver to support RCEC Sean V Kelley
2020-07-27 12:30   ` Jonathan Cameron
2020-07-27 15:05     ` Sean V Kelley
2020-07-24 17:22 ` [RFC PATCH 3/9] PCI/portdrv: Add pcie_walk_rcec() to walk RCiEPs associated with RCEC Sean V Kelley
2020-07-27 10:49   ` Jonathan Cameron
2020-07-27 15:21     ` Sean V Kelley
2020-07-24 17:22 ` [RFC PATCH 4/9] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
2020-07-27 11:00   ` Jonathan Cameron
2020-07-27 14:58     ` Sean V Kelley
2020-07-27 14:04   ` Jonathan Cameron
2020-07-27 15:00     ` Sean V Kelley
2020-07-24 17:22 ` [RFC PATCH 5/9] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
2020-07-25 10:05   ` kernel test robot
2020-07-27 11:17   ` Jonathan Cameron
2020-07-28 13:27     ` Zhuo, Qiuxu
2020-07-28 16:14       ` Sean V Kelley
2020-07-28 17:02         ` Jonathan Cameron
2020-07-28 17:42           ` Sean V Kelley
2020-07-24 17:22 ` [RFC PATCH 6/9] PCI: Add 'rcec' field to pci_dev for associated RCiEPs Sean V Kelley
2020-07-27 11:23   ` Jonathan Cameron
2020-07-27 15:39     ` Sean V Kelley
2020-07-27 16:11     ` Jonathan Cameron [this message]
2020-07-27 16:28       ` Sean V Kelley
2020-07-24 17:22 ` [RFC PATCH 7/9] PCI/AER: Add RCEC AER handling Sean V Kelley
2020-07-27 12:22   ` Jonathan Cameron
2020-07-27 15:19     ` Sean V Kelley
2020-07-27 17:14       ` Jonathan Cameron
2020-07-24 17:22 ` [RFC PATCH 8/9] PCI/PME: Add RCEC PME handling Sean V Kelley
2020-08-04  8:35   ` Jay Fang
2020-08-04  9:47     ` Jonathan Cameron
2020-07-24 17:22 ` [RFC PATCH 9/9] PCI/AER: Add RCEC AER error injection support Sean V Kelley
2020-07-27 12:37 ` [RFC PATCH 0/9] Add RCEC handling to PCI/AER Jonathan Cameron
2020-07-27 14:56   ` Sean V Kelley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200727171104.000053f8@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=ashok.raj@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=qiuxu.zhuo@intel.com \
    --cc=rjw@rjwysocki.net \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=sean.v.kelley@intel.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.