Linux-PCI Archive on lore.kernel.org
 help / color / Atom feed
From: Sean V Kelley <seanvk.dev@oregontracks.org>
To: bhelgaas@google.com, Jonathan.Cameron@huawei.com,
	rafael.j.wysocki@intel.com, ashok.raj@intel.com,
	tony.luck@intel.com, sathyanarayanan.kuppuswamy@intel.com,
	qiuxu.zhuo@intel.com
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Sean V Kelley <sean.v.kelley@intel.com>
Subject: [PATCH v8 08/14] PCI/AER: Extend AER error handling to RCECs
Date: Fri,  2 Oct 2020 11:47:29 -0700
Message-ID: <20201002184735.1229220-9-seanvk.dev@oregontracks.org> (raw)
In-Reply-To: <20201002184735.1229220-1-seanvk.dev@oregontracks.org>

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Currently the kernel does not handle AER errors for Root Complex
integrated End Points (RCiEPs)[0]. These devices sit on a root bus within
the Root Complex (RC). AER handling is performed by a Root Complex Event
Collector (RCEC) [1] which is a effectively a type of RCiEP on the same
root bus.

For an RCEC (technically not a Bridge), error messages "received" from
associated RCiEPs must be enabled for "transmission" in order to cause a
System Error via the Root Control register or (when the Advanced Error
Reporting Capability is present) reporting via the Root Error Command
register and logging in the Root Error Status register and Error Source
Identification register.

In addition to the defined OS level handling of the reset flow for the
associated RCiEPs of an RCEC, it is possible to also have non-native
handling. In that case there is no need to take any actions on the RCEC
because the firmware is responsible for them. This is true where APEI [2]
is used to report the AER errors via a GHES[v2] HEST entry [3] and
relevant AER CPER record [4] and non-native handling is in use.

We effectively end up with two different types of discovery for
purposes of handling AER errors:

1) Normal bus walk - we pass the downstream port above a bus to which
the device is attached and it walks everything below that point.

2) An RCiEP with no visible association with an RCEC as there is no need
to walk devices. In that case, the flow is to just call the callbacks for
the actual device, which in turn references its associated RCEC.

Modify pci_walk_bridge() to handle devices which lack a subordinate bus.
If the device does not then it will call the function on that device
alone.

[0] ACPI PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex
Integrated Endpoint Rules.
[1] ACPI PCI Express Base Specification 5.0-1 6.2 Error Signalling and
Logging
[2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
[3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
[4] UEFI Specification 2.8, N.2.7 PCI Express Error Section

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pcie/err.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 5ff1afa4763d..c4ceca42a3bf 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -148,19 +148,25 @@ static int report_resume(struct pci_dev *dev, void *data)
 
 /**
  * pci_walk_bridge - walk bridges potentially AER affected
- * @bridge   bridge which may be a Port.
+ * @bridge   bridge which may be an RCEC with associated RCiEPs,
+ *           an RCiEP associated with an RCEC, or a Port.
  * @cb       callback to be called for each device found
  * @userdata arbitrary pointer to be passed to callback.
  *
  * If the device provided is a bridge, walk the subordinate bus,
  * including any bridged devices on buses under this bus.
  * Call the provided callback on each device found.
+ *
+ * If the device provided has no subordinate bus, call the provided
+ * callback on the device itself.
  */
 static void pci_walk_bridge(struct pci_dev *bridge, int (*cb)(struct pci_dev *, void *),
 			    void *userdata)
 {
 	if (bridge->subordinate)
 		pci_walk_bus(bridge->subordinate, cb, userdata);
+	else
+		cb(bridge, userdata);
 }
 
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
@@ -174,11 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	/*
 	 * Error recovery runs on all subordinates of the first downstream
 	 * bridge. If the downstream bridge detected the error, it is
-	 * cleared at the end.
+	 * cleared at the end. For RCiEPs we should reset just the RCiEP itself.
 	 */
 	type = pci_pcie_type(dev);
 	if (type == PCI_EXP_TYPE_ROOT_PORT ||
-	    type == PCI_EXP_TYPE_DOWNSTREAM)
+	    type == PCI_EXP_TYPE_DOWNSTREAM ||
+	    type == PCI_EXP_TYPE_RC_EC ||
+	    type == PCI_EXP_TYPE_RC_END)
 		bridge = dev;
 	else
 		bridge = pci_upstream_bridge(dev);
@@ -186,7 +194,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_dbg(dev, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
 		pci_walk_bridge(bridge, report_frozen_detected, &status);
-		status = reset_subordinate_device(bridge);
+		if (type == PCI_EXP_TYPE_RC_END) {
+			pci_warn(dev, "subordinate device reset not possible for RCiEP\n");
+			status = PCI_ERS_RESULT_NONE;
+			goto failed;
+		}
+
+		status = reset_subordinate_devices(bridge);
 		if (status != PCI_ERS_RESULT_RECOVERED) {
 			pci_warn(dev, "subordinate device reset failed\n");
 			goto failed;
@@ -219,7 +233,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	pci_walk_bridge(bridge, report_resume, &status);
 
 	if (type == PCI_EXP_TYPE_ROOT_PORT ||
-	    type == PCI_EXP_TYPE_DOWNSTREAM) {
+	    type == PCI_EXP_TYPE_DOWNSTREAM ||
+	    type == PCI_EXP_TYPE_RC_EC) {
 		if (pcie_aer_is_native(bridge))
 			pcie_clear_device_status(bridge);
 		pci_aer_clear_nonfatal_status(bridge);
-- 
2.28.0


  parent reply index

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-02 18:47 [PATCH v8 00/14] Add RCEC handling to PCI/AER Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 01/14] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 02/14] PCI/RCEC: Bind RCEC devices to the Root Port driver Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 03/14] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities() Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 04/14] PCI/ERR: Rename reset_link() to reset_subordinate_device() Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 05/14] PCI/ERR: Use "bridge" for clarity in pcie_do_recovery() Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 06/14] PCI/ERR: Add pci_walk_bridge() to pcie_do_recovery() Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 07/14] PCI/ERR: Limit AER resets in pcie_do_recovery() Sean V Kelley
2020-10-02 18:47 ` Sean V Kelley [this message]
2020-10-09 21:27   ` [PATCH v8 08/14] PCI/AER: Extend AER error handling to RCECs Bjorn Helgaas
2020-10-09 21:54     ` Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 09/14] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 10/14] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 11/14] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR Sean V Kelley
2020-10-09 17:57   ` Bjorn Helgaas
2020-10-09 18:26     ` Sean V Kelley
2020-10-09 18:34       ` Sean V Kelley
2020-10-09 18:53         ` Sean V Kelley
2020-10-09 19:48           ` Sean V Kelley
2020-10-09 21:30     ` Bjorn Helgaas
2020-10-09 22:07       ` Sean V Kelley
2020-10-09 23:51         ` Kelley, Sean V
2020-10-12 22:58           ` Bjorn Helgaas
2020-10-13 16:55             ` Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 12/14] PCI/AER: Add pcie_walk_rcec() to RCEC AER handling Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 13/14] PCI/PME: Add pcie_walk_rcec() to RCEC PME handling Sean V Kelley
2020-10-02 18:47 ` [PATCH v8 14/14] PCI/AER: Add RCEC AER error injection support Sean V Kelley
2020-10-09 15:53 ` [PATCH v8 00/14] Add RCEC handling to PCI/AER Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201002184735.1229220-9-seanvk.dev@oregontracks.org \
    --to=seanvk.dev@oregontracks.org \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=ashok.raj@intel.com \
    --cc=bhelgaas@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=qiuxu.zhuo@intel.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=sathyanarayanan.kuppuswamy@intel.com \
    --cc=sean.v.kelley@intel.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org
	public-inbox-index linux-pci

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git