linux-kernel.vger.kernel.org archive mirror
From: "Kelley, Sean V" <sean.v.kelley@intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "bhelgaas@google.com" <bhelgaas@google.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	"xerces.zhao@gmail.com" <xerces.zhao@gmail.com>,
	"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"Kuppuswamy,
	Sathyanarayanan" <sathyanarayanan.kuppuswamy@intel.com>,
	"Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>,
	Linux PCI <linux-pci@vger.kernel.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v12 12/15] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
Date: Thu, 3 Dec 2020 00:51:40 +0000
Message-ID: <6E339ABE-2F55-486B-833A-BDDAF27A114D@intel.com>
In-Reply-To: <20201202234425.GA1486740@bjorn-Precision-5520>



> On Dec 2, 2020, at 3:44 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> On Fri, Nov 20, 2020 at 04:10:33PM -0800, Sean V Kelley wrote:
>> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> 
>> When attempting error recovery for an RCiEP associated with an RCEC device,
>> there needs to be a way to update the Root Error Status, the Uncorrectable
>> Error Status and the Uncorrectable Error Severity of the parent RCEC.  In
>> some non-native cases in which there is no OS-visible device associated
>> with the RCiEP, there is nothing to act upon as the firmware is acting
>> before the OS.
>> 
>> Add handling for the linked RCEC in AER/ERR while taking into account
>> non-native cases.
>> 
>> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
>> Link: https://lore.kernel.org/r/20201002184735.1229220-12-seanvk.dev@oregontracks.org
>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/pci/pcie/aer.c | 46 +++++++++++++++++++++++++++++++-----------
>> drivers/pci/pcie/err.c | 20 +++++++++---------
>> 2 files changed, 44 insertions(+), 22 deletions(-)
>> 
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index 0ba0b47ae751..51389a6ee4ca 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1358,29 +1358,51 @@ static int aer_probe(struct pcie_device *dev)
>>  */
>> static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>> {
>> -	int aer = dev->aer_cap;
>> +	int type = pci_pcie_type(dev);
>> +	struct pci_dev *root;
>> +	int aer = 0;
>> +	int rc = 0;
>> 	u32 reg32;
>> -	int rc;
>> 
>> -	if (pcie_aer_is_native(dev)) {
>> +	if (type == PCI_EXP_TYPE_RC_END)
>> +		/*
>> +		 * The reset should only clear the Root Error Status
>> +		 * of the RCEC. Only perform this for the
>> +		 * native case, i.e., an RCEC is present.
>> +		 */
>> +		root = dev->rcec;
>> +	else
>> +		root = dev;
>> +
>> +	if (root)
>> +		aer = dev->aer_cap;
>> +
>> +	if ((aer) && pcie_aer_is_native(dev)) {
>> 		/* Disable Root's interrupt in response to error messages */
>> -		pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>> +		pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>> 		reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>> -		pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>> +		pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
>> 	}
>> 
>> -	rc = pci_bus_error_reset(dev);
>> -	pci_info(dev, "Root Port link has been reset (%d)\n", rc);
>> +	if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) {
>> +		if (pcie_has_flr(dev)) {
>> +			rc = pcie_flr(dev);
>> +			pci_info(dev, "has been reset (%d)\n", rc);
> 
> Maybe:
> 
>  +             } else {
>  +                     rc = -ENOTTY;
>  +                     pci_info(dev, "not reset (no FLR support)\n");
> 
> Or do we want to pretend the device was reset and return
> PCI_ERS_RESULT_RECOVERED?

We currently do the latter: rc defaults to 0 above, so a device without FLR support is reported as PCI_ERS_RESULT_RECOVERED. I'm not sure an extra message about the absence of FLR support adds much value here.
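The two options differ only in what aer_root_reset() ends up returning. A minimal userspace model (not kernel code) of just the RCEC/RCiEP branch; `rc_reset_result()`, its parameters and `enum ers_result` are hypothetical stand-ins for pcie_has_flr(), the return value of pcie_flr() and pci_ers_result_t:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two pci_ers_result_t values used here. */
enum ers_result {
	ERS_RECOVERED,
	ERS_DISCONNECT,
};

/*
 * Models only the RCEC/RCiEP branch of aer_root_reset() in the quoted
 * patch.  has_flr and flr_rc stand in for pcie_has_flr() and the return
 * value of pcie_flr(); report_no_flr selects the suggested -ENOTTY path.
 */
static enum ers_result rc_reset_result(bool has_flr, int flr_rc,
				       bool report_no_flr)
{
	int rc = 0;	/* same default as the patch */

	if (has_flr)
		rc = flr_rc;		/* result of the Function Level Reset */
	else if (report_no_flr)
		rc = -ENOTTY;		/* no FLR: report "not reset" */

	/* Same mapping as the final return in aer_root_reset(). */
	return rc ? ERS_DISCONNECT : ERS_RECOVERED;
}
```

With the current default (report_no_flr false), a device without FLR is reported as recovered even though nothing was reset; with the suggested -ENOTTY it is reported as disconnected instead.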


> 
>> +	} else {
>> +		rc = pci_bus_error_reset(dev);
>> +		pci_info(dev, "Root Port link has been reset (%d)\n", rc);
>> +	}
>> 
>> -	if (pcie_aer_is_native(dev)) {
>> +	if ((aer) && pcie_aer_is_native(dev)) {
>> 		/* Clear Root Error Status */
>> -		pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
>> -		pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
>> +		pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
>> +		pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
>> 
>> 		/* Enable Root Port's interrupt in response to error messages */
>> -		pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>> +		pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>> 		reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>> -		pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>> +		pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
>> 	}
>> 
>> 	return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index 7883c9791562..cbc5abfe767b 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -148,10 +148,10 @@ static int report_resume(struct pci_dev *dev, void *data)
>> 
>> /**
>>  * pci_walk_bridge - walk bridges potentially AER affected
>> - * @bridge:	bridge which may be a Port, an RCEC with associated RCiEPs,
>> - *		or an RCiEP associated with an RCEC
>> - * @cb:		callback to be called for each device found
>> - * @userdata:	arbitrary pointer to be passed to callback
>> + * @bridge   bridge which may be an RCEC with associated RCiEPs,
>> + *           or a Port.
>> + * @cb       callback to be called for each device found
>> + * @userdata arbitrary pointer to be passed to callback.
>>  *
>>  * If the device provided is a bridge, walk the subordinate bus, including
>>  * any bridged devices on buses under this bus.  Call the provided callback
>> @@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev *bridge,
>> 			    int (*cb)(struct pci_dev *, void *),
>> 			    void *userdata)
>> {
>> +	/*
>> +	 * In a non-native case where there is no OS-visible reporting
>> +	 * device the bridge will be NULL, i.e., no RCEC, no Downstream Port.
> 
> I don't quite understand this comment.  I see that in the non-native
> case, the reporting device may not be OS-visible.  But I don't
> understand why the comment is *here*.
> 
> If "bridge" can be NULL here, we should test that before dereferencing
> "bridge->subordinate".

The comment is wrongly worded. It is the subordinate bus or the associated RCEC that may be NULL, not the "bridge".
However, as you point out below, we should not be invoking report_frozen_detected(), report_mmio_enabled(), etc.
on the associated RCEC (and thus its driver), but on the RCiEP itself.

Going back to this conversation,

https://lore.kernel.org/linux-pci/20201016172210.GA86168@bjorn-Precision-5520/

"Looks like *this* is the patch where the "no subordinate bus" case
becomes possible?  If you agree, I can just move the test here, no
need to repost."

Indeed, the only case we are dealing with here is the absence of a subordinate bus.
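For the reset path, the NULL case that matters is an RCiEP without an OS-visible RCEC. A minimal userspace sketch (not kernel code) of how aer_root_reset() picks the device owning the Root Error registers; `struct rr_dev`, the `RR_TYPE_*` values and `root_aer_offset()` are hypothetical stand-ins for struct pci_dev, PCI_EXP_TYPE_* and the inline logic, and the pcie_aer_is_native() check is omitted. Note that the sketch takes the AER offset from `root`, the device whose config space is accessed; the quoted patch reads `dev->aer_cap`, which for an RCiEP is the RCiEP's own offset and is only right if both devices happen to place AER at the same offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device types; only the distinction matters here. */
enum rr_type { RR_TYPE_ROOT_PORT, RR_TYPE_RC_END, RR_TYPE_RC_EC };

/* Hypothetical, cut-down stand-in for struct pci_dev. */
struct rr_dev {
	enum rr_type type;
	int aer_cap;		/* AER capability offset, 0 if none */
	struct rr_dev *rcec;	/* linked RCEC of an RCiEP, may be NULL */
};

/*
 * Pick the device holding the Root Error Command/Status registers and
 * return the AER offset to use, or 0 if the OS must leave them alone
 * (e.g. firmware owns AER and no RCEC is visible).
 */
static int root_aer_offset(struct rr_dev *dev, struct rr_dev **root_out)
{
	struct rr_dev *root;

	/* An RCiEP's Root Error registers live in its linked RCEC. */
	root = (dev->type == RR_TYPE_RC_END) ? dev->rcec : dev;
	*root_out = root;

	/* Offset comes from the device whose registers are accessed. */
	return root ? root->aer_cap : 0;
}
```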

> 
>> 	if (bridge->subordinate)
>> 		pci_walk_bus(bridge->subordinate, cb, userdata);
>> +	else if (bridge->rcec)
>> +		cb(bridge->rcec, userdata);
> 
> And I don't understand what's going on here.  In this case, I *think*
> "bridge" is an RCiEP and "bridge->rcec" is the related RCEC, so it
> looks like we'll call report_frozen_detected(), report_mmio_enabled(),
> etc for the RCEC driver.  I would think we'd want the RCiEP driver.

Indeed, bridge->rcec here is dev->rcec, where dev is the RCiEP.

And we don't need that conditional here; the callback should simply run on the RCiEP so that its own driver's routines are called.

This is an unfortunate side effect of how the devices are linked: the RCiEP is subordinate to the RCEC, but the
dev->rcec link from the RCiEP gives the impression that it is the other way around.
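Dropping the bridge->rcec branch leaves pci_walk_bridge() with two cases. A minimal userspace model (not kernel code), assuming hypothetical `struct walk_dev`/`struct walk_bus` in place of struct pci_dev/struct pci_bus and a flat `walk_bus()` in place of pci_walk_bus() (no recursion into nested bridges):

```c
#include <assert.h>
#include <stddef.h>

struct walk_bus;

/* Hypothetical stand-in for struct pci_dev. */
struct walk_dev {
	const char *name;
	struct walk_bus *subordinate;	/* NULL for an RCiEP */
	struct walk_dev *rcec;		/* linked RCEC; not used by the walk */
};

/* Hypothetical stand-in for struct pci_bus: a flat list of devices. */
struct walk_bus {
	struct walk_dev **devs;
	size_t ndevs;
};

/* Flat pci_walk_bus(): stop early if the callback returns nonzero. */
static void walk_bus(struct walk_bus *bus,
		     int (*cb)(struct walk_dev *, void *), void *userdata)
{
	for (size_t i = 0; i < bus->ndevs; i++)
		if (cb(bus->devs[i], userdata))
			break;
}

/*
 * Bridge with a subordinate bus: walk that bus.  Anything else (e.g. an
 * RCiEP): run the callback on the device itself, so its own driver's
 * error handlers are used rather than the linked RCEC's.
 */
static void walk_bridge(struct walk_dev *bridge,
			int (*cb)(struct walk_dev *, void *), void *userdata)
{
	if (bridge->subordinate)
		walk_bus(bridge->subordinate, cb, userdata);
	else
		cb(bridge, userdata);
}

/* Example callback: remember the last device visited and count visits. */
struct visit_log {
	struct walk_dev *last;
	int count;
};

static int record_visit(struct walk_dev *dev, void *data)
{
	struct visit_log *log = data;

	log->last = dev;
	log->count++;
	return 0;	/* keep walking */
}
```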


> 
> Sorry if I'm missing the obvious.

Actually, your observations are on point.

Thanks,

Sean

> 
>> 	else
>> 		cb(bridge, userdata);
>> }
>> @@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>> 	pci_dbg(bridge, "broadcast error_detected message\n");
>> 	if (state == pci_channel_io_frozen) {
>> 		pci_walk_bridge(bridge, report_frozen_detected, &status);
>> -		if (type == PCI_EXP_TYPE_RC_END) {
>> -			pci_warn(dev, "subordinate device reset not possible for RCiEP\n");
>> -			status = PCI_ERS_RESULT_NONE;
>> -			goto failed;
>> -		}
>> -
>> 		status = reset_subordinates(bridge);
>> 		if (status != PCI_ERS_RESULT_RECOVERED) {
>> 			pci_warn(bridge, "subordinate device reset failed\n");
>> -- 
>> 2.29.2
>> 

