linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 00/10] Add RCEC handling to PCI/AER
@ 2020-09-22 21:38 Sean V Kelley
  2020-09-22 21:38 ` [PATCH v6 01/10] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Sean V Kelley <sean.v.kelley@intel.com>

Changes since v5 and v4:

- Note that not all of the patches in v4 made it to the lists due
possible errors on the smtp server side. It's being investigated.
- Removed two Reviewed-by tags on the patches for pcie_walk_rcec()
due to significant functionality changes.
- Made walk_rcec() static.
- Bumped to v6, with hope this makes it completely to the lists.

Changes since v3 [1]:

- Merge usage patch and API (pcie_walk_rcec).
(Sathyanarayanan Kuppuswamy)

- Considering the possible ways to call pcie_do_recovery(), make
the flow more understandable through assignments of the incoming
devices in terms of 'bridges'.
- reset_link() may be misnamed. The point is really to reset any devices
below 'dev', which is a bridge. Using reset_subordinate_devices() makes it
more clear that we pass a bridge and reset the devices *below* it.
- In pcie_walk_rcec() an RCEC bus number does not indicate association
in the Assocated Bus Numbers register so skip it (7.9.10.3)
- Potential for a lot of churning to call pci_get_slot() in
pcie_walk_rciep_Devfn(). Attempt to reduce the number of calls by identifying
the associations through bus walks.
- Change pr_dbg of link RCiEP to "PME & error events reported via.."
(Bjorn Helgaas)

- In some cases (AER via APEI) there may not exist an RCEC. In these cases
the bridge is NULL. Account for that in the bridge walk.
(Jonathan Cameron)


[1] https://lore.kernel.org/linux-pci/20200812164659.1118946-1-sean.v.kelley@intel.com/



Jonathan Cameron (1):
  PCI/AER: Extend AER error handling to RCECs

Qiuxu Zhuo (5):
  PCI/RCEC: Add RCEC class code and extended capability
  PCI/RCEC: Bind RCEC devices to the Root Port driver
  PCI/AER: Apply function level reset to RCiEP on fatal error
  PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
  PCI/AER: Add RCEC AER error injection support

Sean V Kelley (4):
  PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities()
  PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs
  PCI/AER: Add pcie_walk_rcec() to RCEC AER handling
  PCI/PME: Add pcie_walk_rcec() to RCEC PME handling

 drivers/pci/pci.h               |  26 ++++-
 drivers/pci/pcie/Makefile       |   2 +-
 drivers/pci/pcie/aer.c          |  36 +++++--
 drivers/pci/pcie/aer_inject.c   |   5 +-
 drivers/pci/pcie/err.c          | 102 ++++++++++++++----
 drivers/pci/pcie/pme.c          |  15 ++-
 drivers/pci/pcie/portdrv_core.c |   8 +-
 drivers/pci/pcie/portdrv_pci.c  |   8 +-
 drivers/pci/pcie/rcec.c         | 185 ++++++++++++++++++++++++++++++++
 drivers/pci/probe.c             |   3 +-
 include/linux/pci.h             |   5 +
 include/linux/pci_ids.h         |   1 +
 include/uapi/linux/pci_regs.h   |   7 ++
 13 files changed, 359 insertions(+), 44 deletions(-)
 create mode 100644 drivers/pci/pcie/rcec.c

--
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v6 01/10] PCI/RCEC: Add RCEC class code and extended capability
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  2020-09-22 21:38 ` [PATCH v6 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver Sean V Kelley
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

A PCIe Root Complex Event Collector(RCEC) has the base class 0x08,
sub-class 0x07, and programming interface 0x00. Add the class code
0x0807 to identify RCEC devices and add the defines for the RCEC
Endpoint Association Extended Capability.

See PCI Express Base Specification, version 5.0-1, section "1.3.4
Root Complex Event Collector" and section "7.9.10 Root Complex
Event Collector Endpoint Association Extended Capability"

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 include/linux/pci_ids.h       | 1 +
 include/uapi/linux/pci_regs.h | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 1ab1e24bcbce..d8156a5dbee8 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -81,6 +81,7 @@
 #define PCI_CLASS_SYSTEM_RTC		0x0803
 #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
 #define PCI_CLASS_SYSTEM_SDHCI		0x0805
+#define PCI_CLASS_SYSTEM_RCEC		0x0807
 #define PCI_CLASS_SYSTEM_OTHER		0x0880
 
 #define PCI_BASE_CLASS_INPUT		0x09
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index f9701410d3b5..f335f65f65d6 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -828,6 +828,13 @@
 #define  PCI_PWR_CAP_BUDGET(x)	((x) & 1)	/* Included in system budget */
 #define PCI_EXT_CAP_PWR_SIZEOF	16
 
+/* Root Complex Event Collector Endpoint Association  */
+#define PCI_RCEC_RCIEP_BITMAP	4	/* Associated Bitmap for RCiEPs */
+#define PCI_RCEC_BUSN		8	/* RCEC Associated Bus Numbers */
+#define  PCI_RCEC_BUSN_REG_VER	0x02	/* Least capability version that BUSN present */
+#define  PCI_RCEC_BUSN_NEXT(x)	(((x) >> 8) & 0xff)
+#define  PCI_RCEC_BUSN_LAST(x)	(((x) >> 16) & 0xff)
+
 /* Vendor-Specific (VSEC, PCI_EXT_CAP_ID_VNDR) */
 #define PCI_VNDR_HEADER		4	/* Vendor-Specific Header */
 #define  PCI_VNDR_HEADER_ID(x)	((x) & 0xffff)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v6 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
  2020-09-22 21:38 ` [PATCH v6 01/10] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  2020-09-25 19:59   ` Bjorn Helgaas
  2020-09-22 21:38 ` [PATCH v6 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities() Sean V Kelley
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

If a Root Complex Integrated Endpoint (RCiEP) is implemented, errors may
optionally be sent to a corresponding Root Complex Event Collector (RCEC).
Each RCiEP must be associated with no more than one RCEC. Interface errors
are reported to the OS by RCECs.

For an RCEC (technically not a Bridge), error messages "received" from
associated RCiEPs must be enabled for "transmission" in order to cause a
System Error via the Root Control register or (when the Advanced Error
Reporting Capability is present) reporting via the Root Error Command
register and logging in the Root Error Status register and Error Source
Identification register.

Given the commonality with Root Ports and the need to also support AER
and PME services for RCECs, extend the Root Port driver to support RCEC
devices through the addition of the RCEC Class ID to the driver
structure.

Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pcie/portdrv_core.c | 8 ++++----
 drivers/pci/pcie/portdrv_pci.c  | 5 ++++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 50a9522ab07d..99769c636775 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -234,11 +234,11 @@ static int get_port_device_capability(struct pci_dev *dev)
 #endif
 
 	/*
-	 * Root ports are capable of generating PME too.  Root Complex
-	 * Event Collectors can also generate PMEs, but we don't handle
-	 * those yet.
+	 * Root ports and Root Complex Event Collectors are capable
+	 * of generating PME.
 	 */
-	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
+	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) &&
 	    (pcie_ports_native || host->native_pme)) {
 		services |= PCIE_PORT_SERVICE_PME;
 
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 3a3ce40ae1ab..4d880679b9b1 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -106,7 +106,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 	if (!pci_is_pcie(dev) ||
 	    ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
 	     (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
-	     (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)))
+	     (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM) &&
+	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
 		return -ENODEV;
 
 	status = pcie_port_device_register(dev);
@@ -195,6 +196,8 @@ static const struct pci_device_id port_pci_ids[] = {
 	{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x00), ~0) },
 	/* subtractive decode PCI-to-PCI bridge, class type is 060401h */
 	{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x01), ~0) },
+	/* handle any Root Complex Event Collector */
+	{ PCI_DEVICE_CLASS(((PCI_CLASS_SYSTEM_RCEC << 8) | 0x00), ~0) },
 	{ },
 };
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v6 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities()
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
  2020-09-22 21:38 ` [PATCH v6 01/10] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
  2020-09-22 21:38 ` [PATCH v6 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  2020-09-25 20:13   ` Bjorn Helgaas
  2020-09-22 21:38 ` [PATCH v6 04/10] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Sean V Kelley <sean.v.kelley@intel.com>

Extend support for Root Complex Event Collectors by decoding and
caching the RCEC Endpoint Association Extended Capabilities when
enumerating. Use that cached information for later error source
reporting. See PCI Express Base Specification, version 5.0-1,
section 7.9.10.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pci.h         | 18 ++++++++++++++
 drivers/pci/pcie/Makefile |  2 +-
 drivers/pci/pcie/rcec.c   | 52 +++++++++++++++++++++++++++++++++++++++
 drivers/pci/probe.c       |  3 ++-
 include/linux/pci.h       |  4 +++
 5 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 drivers/pci/pcie/rcec.c

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fa12f7cbc1a0..83670a6425d8 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -449,6 +449,16 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 #endif	/* CONFIG_PCIEAER */
 
+#ifdef CONFIG_PCIEPORTBUS
+/* Cached RCEC Associated Endpoint Extended Capabilities */
+struct rcec_ext {
+	u8		ver;
+	u8		nextbusn;
+	u8		lastbusn;
+	u32		bitmap;
+};
+#endif
+
 #ifdef CONFIG_PCIE_DPC
 void pci_save_dpc_state(struct pci_dev *dev);
 void pci_restore_dpc_state(struct pci_dev *dev);
@@ -461,6 +471,14 @@ static inline void pci_restore_dpc_state(struct pci_dev *dev) {}
 static inline void pci_dpc_init(struct pci_dev *pdev) {}
 #endif
 
+#ifdef CONFIG_PCIEPORTBUS
+void pci_rcec_init(struct pci_dev *dev);
+void pci_rcec_exit(struct pci_dev *dev);
+#else
+static inline void pci_rcec_init(struct pci_dev *dev) {}
+static inline void pci_rcec_exit(struct pci_dev *dev) {}
+#endif
+
 #ifdef CONFIG_PCI_ATS
 /* Address Translation Service */
 void pci_ats_init(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 68da9280ff11..d9697892fa3e 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -2,7 +2,7 @@
 #
 # Makefile for PCI Express features and port driver
 
-pcieportdrv-y			:= portdrv_core.o portdrv_pci.o err.o
+pcieportdrv-y			:= portdrv_core.o portdrv_pci.o err.o rcec.o
 
 obj-$(CONFIG_PCIEPORTBUS)	+= pcieportdrv.o
 
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
new file mode 100644
index 000000000000..519ae086ff41
--- /dev/null
+++ b/drivers/pci/pcie/rcec.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Root Complex Event Collector Support
+ *
+ * Authors:
+ *  Sean V Kelley <sean.v.kelley@intel.com>
+ *  Qiuxu Zhuo <qiuxu.zhuo@intel.com>
+ *
+ * Copyright (C) 2020 Intel Corp.
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/bitops.h>
+#include <linux/pci.h>
+#include <linux/pci_regs.h>
+
+#include "../pci.h"
+
+void pci_rcec_init(struct pci_dev *dev)
+{
+	u32 rcec, hdr, busn;
+
+	/* Only for Root Complex Event Collectors */
+	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)
+		return;
+
+	dev->rcec_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_RCEC);
+	if (!dev->rcec_cap)
+		return;
+
+	dev->rcec_ext = kzalloc(sizeof(*dev->rcec_ext), GFP_KERNEL);
+
+	rcec = dev->rcec_cap;
+	pci_read_config_dword(dev, rcec + PCI_RCEC_RCIEP_BITMAP, &dev->rcec_ext->bitmap);
+
+	/* Check whether RCEC BUSN register is present */
+	pci_read_config_dword(dev, rcec, &hdr);
+	dev->rcec_ext->ver = PCI_EXT_CAP_VER(hdr);
+	if (dev->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
+		return;
+
+	pci_read_config_dword(dev, rcec + PCI_RCEC_BUSN, &busn);
+	dev->rcec_ext->nextbusn = PCI_RCEC_BUSN_NEXT(busn);
+	dev->rcec_ext->lastbusn = PCI_RCEC_BUSN_LAST(busn);
+}
+
+void pci_rcec_exit(struct pci_dev *dev)
+{
+	kfree(dev->rcec_ext);
+	dev->rcec_ext = NULL;
+}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 03d37128a24f..16bc651fecb7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2201,6 +2201,7 @@ static void pci_configure_device(struct pci_dev *dev)
 static void pci_release_capabilities(struct pci_dev *dev)
 {
 	pci_aer_exit(dev);
+	pci_rcec_exit(dev);
 	pci_vpd_release(dev);
 	pci_iov_release(dev);
 	pci_free_cap_save_buffers(dev);
@@ -2400,7 +2401,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	pci_ptm_init(dev);		/* Precision Time Measurement */
 	pci_aer_init(dev);		/* Advanced Error Reporting */
 	pci_dpc_init(dev);		/* Downstream Port Containment */
-
+	pci_rcec_init(dev);		/* Root Complex Event Collector */
 	pcie_report_downtraining(dev);
 
 	if (pci_probe_reset_function(dev) == 0)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 835530605c0d..5c5c4eb642b6 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -326,6 +326,10 @@ struct pci_dev {
 #ifdef CONFIG_PCIEAER
 	u16		aer_cap;	/* AER capability offset */
 	struct aer_stats *aer_stats;	/* AER stats for this device */
+#endif
+#ifdef CONFIG_PCIEPORTBUS
+	u16		rcec_cap;	/* RCEC capability offset */
+	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint extended capabilities */
 #endif
 	u8		pcie_cap;	/* PCIe capability offset */
 	u8		msi_cap;	/* MSI capability offset */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v6 04/10] PCI/AER: Extend AER error handling to RCECs
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (2 preceding siblings ...)
  2020-09-22 21:38 ` [PATCH v6 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities() Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  2020-09-25 21:14   ` Bjorn Helgaas
  2020-09-22 21:38 ` [PATCH v6 05/10] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Currently the kernel does not handle AER errors for Root Complex
integrated End Points (RCiEPs)[0]. These devices sit on a root bus within
the Root Complex (RC). AER handling is performed by a Root Complex Event
Collector (RCEC) [1] which is a effectively a type of RCiEP on the same
root bus.

For an RCEC (technically not a Bridge), error messages "received" from
associated RCiEPs must be enabled for "transmission" in order to cause a
System Error via the Root Control register or (when the Advanced Error
Reporting Capability is present) reporting via the Root Error Command
register and logging in the Root Error Status register and Error Source
Identification register.

In addition to the defined OS level handling of the reset flow for the
associated RCiEPs of an RCEC, it is possible to also have non-native
handling. In that case there is no need to take any actions on the RCEC
because the firmware is responsible for them. This is true where APEI [2]
is used to report the AER errors via a GHES[v2] HEST entry [3] and
relevant AER CPER record [4] and non-native handling is in use.

We effectively end up with two different types of discovery for
purposes of handling AER errors:

1) Normal bus walk - we pass the downstream port above a bus to which
the device is attached and it walks everything below that point.

2) An RCiEP with no visible association with an RCEC as there is no need
to walk devices. In that case, the flow is to just call the callbacks for
the actual device, which in turn references its associated RCEC.

A new walk function pci_bridge_walk(), similar to pci_bus_walk(),
is provided that takes a pci_dev instead of a bus. If that bridge
corresponds to a downstream port it will walk the subordinate bus of
that bridge. If the device does not then it will call the function on
that device alone.

[0] ACPI PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex
Integrated Endpoint Rules.
[1] ACPI PCI Express Base Specification 5.0-1 6.2 Error Signalling and
Logging
[2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
[3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
[4] UEFI Specification 2.8, N.2.7 PCI Express Error Section

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pci.h      |  2 +-
 drivers/pci/pcie/err.c | 77 +++++++++++++++++++++++++++++++-----------
 2 files changed, 59 insertions(+), 20 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 83670a6425d8..7b547fc3679a 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -575,7 +575,7 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
 /* PCI error reporting and recovery */
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 			pci_channel_state_t state,
-			pci_ers_result_t (*reset_link)(struct pci_dev *pdev));
+			pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev *pdev));
 
 bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
 #ifdef CONFIG_PCIEASPM
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index c543f419d8f9..e575fa6cee63 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -146,38 +146,73 @@ static int report_resume(struct pci_dev *dev, void *data)
 	return 0;
 }
 
+/**
+ * pci_bridge_walk - walk bridges potentially AER affected
+ * @bridge   bridge which may be an RCEC with associated RCiEPs,
+ *           an RCiEP associated with an RCEC, or a Port.
+ * @cb       callback to be called for each device found
+ * @userdata arbitrary pointer to be passed to callback.
+ *
+ * If the device provided is a bridge, walk the subordinate bus,
+ * including any bridged devices on buses under this bus.
+ * Call the provided callback on each device found.
+ *
+ * If the device provided has no subordinate bus, call the provided
+ * callback on the device itself.
+ */
+static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct pci_dev *, void *),
+			    void *userdata)
+{
+	if (bridge->subordinate)
+		pci_walk_bus(bridge->subordinate, cb, userdata);
+	else
+		cb(bridge, userdata);
+}
+
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 			pci_channel_state_t state,
-			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
+			pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev *pdev))
 {
 	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-	struct pci_bus *bus;
+	struct pci_dev *bridge;
+	int type;
 
 	/*
-	 * Error recovery runs on all subordinates of the first downstream port.
-	 * If the downstream port detected the error, it is cleared at the end.
+	 * Error recovery runs on all subordinates of the first downstream
+	 * bridge. If the downstream bridge detected the error, it is
+	 * cleared at the end. For RCiEPs we should reset just the RCiEP itself.
 	 */
-	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
-	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
-		dev = dev->bus->self;
-	bus = dev->subordinate;
+	type = pci_pcie_type(dev);
+	if (type == PCI_EXP_TYPE_ROOT_PORT ||
+	    type == PCI_EXP_TYPE_DOWNSTREAM ||
+	    type == PCI_EXP_TYPE_RC_EC ||
+	    type == PCI_EXP_TYPE_RC_END)
+		bridge = dev;
+	else
+		bridge = pci_upstream_bridge(dev);
 
 	pci_dbg(dev, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
-		pci_walk_bus(bus, report_frozen_detected, &status);
-		status = reset_link(dev);
+		pci_bridge_walk(bridge, report_frozen_detected, &status);
+		if (type == PCI_EXP_TYPE_RC_END) {
+			pci_warn(dev, "link reset not possible for RCiEP\n");
+			status = PCI_ERS_RESULT_NONE;
+			goto failed;
+		}
+
+		status = reset_subordinate_devices(bridge);
 		if (status != PCI_ERS_RESULT_RECOVERED) {
-			pci_warn(dev, "link reset failed\n");
+			pci_warn(dev, "subordinate device reset failed\n");
 			goto failed;
 		}
 	} else {
-		pci_walk_bus(bus, report_normal_detected, &status);
+		pci_bridge_walk(bridge, report_normal_detected, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast mmio_enabled message\n");
-		pci_walk_bus(bus, report_mmio_enabled, &status);
+		pci_bridge_walk(bridge, report_mmio_enabled, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
@@ -188,18 +223,22 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 		 */
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast slot_reset message\n");
-		pci_walk_bus(bus, report_slot_reset, &status);
+		pci_bridge_walk(bridge, report_slot_reset, &status);
 	}
 
 	if (status != PCI_ERS_RESULT_RECOVERED)
 		goto failed;
 
 	pci_dbg(dev, "broadcast resume message\n");
-	pci_walk_bus(bus, report_resume, &status);
-
-	if (pcie_aer_is_native(dev))
-		pcie_clear_device_status(dev);
-	pci_aer_clear_nonfatal_status(dev);
+	pci_bridge_walk(bridge, report_resume, &status);
+
+	if (type == PCI_EXP_TYPE_ROOT_PORT ||
+	    type == PCI_EXP_TYPE_DOWNSTREAM ||
+	    type == PCI_EXP_TYPE_RC_EC) {
+		if (pcie_aer_is_native(bridge))
+			pcie_clear_device_status(bridge);
+		pci_aer_clear_nonfatal_status(bridge);
+	}
 	pci_info(dev, "device recovery successful\n");
 	return status;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v6 05/10] PCI/AER: Apply function level reset to RCiEP on fatal error
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (3 preceding siblings ...)
  2020-09-22 21:38 ` [PATCH v6 04/10] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  2020-09-25 21:58   ` Bjorn Helgaas
  2020-09-22 21:38 ` [PATCH v6 06/10] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs Sean V Kelley
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

Attempt to do function level reset for an RCiEP associated with an
RCEC device on fatal error.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 drivers/pci/pcie/err.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index e575fa6cee63..5380ecc41506 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -169,6 +169,17 @@ static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct pci_dev *,
 		cb(bridge, userdata);
 }
 
+static pci_ers_result_t flr_on_rciep(struct pci_dev *dev)
+{
+	if (!pcie_has_flr(dev))
+		return PCI_ERS_RESULT_NONE;
+
+	if (pcie_flr(dev))
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 			pci_channel_state_t state,
 			pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev *pdev))
@@ -195,15 +206,17 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	if (state == pci_channel_io_frozen) {
 		pci_bridge_walk(bridge, report_frozen_detected, &status);
 		if (type == PCI_EXP_TYPE_RC_END) {
-			pci_warn(dev, "link reset not possible for RCiEP\n");
-			status = PCI_ERS_RESULT_NONE;
-			goto failed;
-		}
-
-		status = reset_subordinate_devices(bridge);
-		if (status != PCI_ERS_RESULT_RECOVERED) {
-			pci_warn(dev, "subordinate device reset failed\n");
-			goto failed;
+			status = flr_on_rciep(dev);
+			if (status != PCI_ERS_RESULT_RECOVERED) {
+				pci_warn(dev, "function level reset failed\n");
+				goto failed;
+			}
+		} else {
+			status = reset_subordinate_devices(bridge);
+			if (status != PCI_ERS_RESULT_RECOVERED) {
+				pci_warn(dev, "subordinate device reset failed\n");
+				goto failed;
+			}
 		}
 	} else {
 		pci_bridge_walk(bridge, report_normal_detected, &status);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v6 06/10] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (4 preceding siblings ...)
  2020-09-22 21:38 ` [PATCH v6 05/10] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  2020-09-25 22:15   ` Bjorn Helgaas
  2020-09-22 21:38 ` [PATCH v6 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR Sean V Kelley
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Sean V Kelley <sean.v.kelley@intel.com>

A Root Complex Event Collector provides support for
terminating error and PME messages from associated RCiEPs.

Make use of the RCEC Endpoint Association Extended Capability
to identify associated RCiEPs. Link the associated RCiEPs as
the RCECs are enumerated.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pci.h              |  2 +
 drivers/pci/pcie/portdrv_pci.c |  3 ++
 drivers/pci/pcie/rcec.c        | 96 ++++++++++++++++++++++++++++++++++
 include/linux/pci.h            |  1 +
 4 files changed, 102 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 7b547fc3679a..ddb5872466fb 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -474,9 +474,11 @@ static inline void pci_dpc_init(struct pci_dev *pdev) {}
 #ifdef CONFIG_PCIEPORTBUS
 void pci_rcec_init(struct pci_dev *dev);
 void pci_rcec_exit(struct pci_dev *dev);
+void pcie_link_rcec(struct pci_dev *rcec);
 #else
 static inline void pci_rcec_init(struct pci_dev *dev) {}
 static inline void pci_rcec_exit(struct pci_dev *dev) {}
+static inline void pcie_link_rcec(struct pci_dev *rcec) {}
 #endif
 
 #ifdef CONFIG_PCI_ATS
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 4d880679b9b1..dbeb0155c2c3 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -110,6 +110,9 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
 		return -ENODEV;
 
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
+		pcie_link_rcec(dev);
+
 	status = pcie_port_device_register(dev);
 	if (status)
 		return status;
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
index 519ae086ff41..5630480a6659 100644
--- a/drivers/pci/pcie/rcec.c
+++ b/drivers/pci/pcie/rcec.c
@@ -17,6 +17,102 @@
 
 #include "../pci.h"
 
+struct walk_rcec_data {
+	struct pci_dev *rcec;
+	int (*user_callback)(struct pci_dev *dev, void *data);
+	void *user_data;
+};
+
+static bool rcec_assoc_rciep(struct pci_dev *rcec, struct pci_dev *rciep)
+{
+	unsigned long bitmap = rcec->rcec_ext->bitmap;
+	unsigned int devn;
+
+	/* An RCiEP found on bus in range */
+	if (rcec->bus->number != rciep->bus->number)
+		return true;
+
+	/* Same bus, so check bitmap */
+	for_each_set_bit(devn, &bitmap, 32)
+		if (devn == rciep->devfn)
+			return true;
+
+	return false;
+}
+
+static int link_rcec_helper(struct pci_dev *dev, void *data)
+{
+	struct walk_rcec_data *rcec_data = data;
+	struct pci_dev *rcec = rcec_data->rcec;
+
+	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) && rcec_assoc_rciep(rcec, dev)) {
+		dev->rcec = rcec;
+		pci_dbg(dev, "PME & error events reported via %s\n", pci_name(rcec));
+	}
+
+	return 0;
+}
+
+void walk_rcec(int (*cb)(struct pci_dev *dev, void *data), void *userdata)
+{
+	struct walk_rcec_data *rcec_data = userdata;
+	struct pci_dev *rcec = rcec_data->rcec;
+	u8 nextbusn, lastbusn;
+	struct pci_bus *bus;
+	unsigned int bnr;
+
+	if (!rcec->rcec_cap)
+		return;
+
+	/* Walk own bus for bitmap based association */
+	pci_walk_bus(rcec->bus, cb, rcec_data);
+
+	/* Check whether RCEC BUSN register is present */
+	if (rcec->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
+		return;
+
+	nextbusn = rcec->rcec_ext->nextbusn;
+	lastbusn = rcec->rcec_ext->lastbusn;
+
+	/* All RCiEP devices are on the same bus as the RCEC */
+	if (nextbusn == 0xff && lastbusn == 0x00)
+		return;
+
+	for (bnr = nextbusn; bnr <= lastbusn; bnr++) {
+		/* No association indicated (PCIe 5.0-1, 7.9.10.3) */
+		if (bnr == rcec->bus->number)
+			continue;
+
+		bus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
+		if (!bus)
+			continue;
+
+		/* Find RCiEP devices on the given bus ranges */
+		pci_walk_bus(bus, cb, rcec_data);
+	}
+}
+
+/**
+ * pcie_link_rcec - Link RCiEP devices associating with RCEC.
+ * @rcec     RCEC whose RCiEP devices should be linked.
+ *
+ * Link the given RCEC to each RCiEP device found.
+ *
+ */
+void pcie_link_rcec(struct pci_dev *rcec)
+{
+	struct walk_rcec_data rcec_data;
+
+	if (!rcec->rcec_cap)
+		return;
+
+	rcec_data.rcec = rcec;
+	rcec_data.user_callback = NULL;
+	rcec_data.user_data = NULL;
+
+	walk_rcec(link_rcec_helper, &rcec_data);
+}
+
 void pci_rcec_init(struct pci_dev *dev)
 {
 	u32 rcec, hdr, busn;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5c5c4eb642b6..ad382a9484ea 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -330,6 +330,7 @@ struct pci_dev {
 #ifdef CONFIG_PCIEPORTBUS
 	u16		rcec_cap;	/* RCEC capability offset */
 	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint extended capabilities */
+	struct pci_dev  *rcec;          /* Associated RCEC device */
 #endif
 	u8		pcie_cap;	/* PCIe capability offset */
 	u8		msi_cap;	/* MSI capability offset */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v6 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (5 preceding siblings ...)
  2020-09-22 21:38 ` [PATCH v6 06/10] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  2020-09-27 23:45   ` Bjorn Helgaas
  2020-09-22 21:38 ` [PATCH v6 08/10] PCI/AER: Add pcie_walk_rcec() to RCEC AER handling Sean V Kelley
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

When attempting error recovery for an RCiEP associated with an RCEC device,
there needs to be a way to update the Root Error Status, the Uncorrectable
Error Status and the Uncorrectable Error Severity of the parent RCEC.
In some non-native cases in which there is no OS visible device
associated with the RCiEP, there is nothing to act upon as the firmware
is acting before the OS. So add handling for the linked 'rcec' in AER/ERR
while taking into account non-native cases.

Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 drivers/pci/pcie/aer.c |  9 +++++----
 drivers/pci/pcie/err.c | 38 ++++++++++++++++++++++++--------------
 2 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 65dff5f3457a..dccdba60b5d9 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1358,17 +1358,18 @@ static int aer_probe(struct pcie_device *dev)
 static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 {
 	int aer = dev->aer_cap;
+	int rc = 0;
 	u32 reg32;
-	int rc;
-
 
 	/* Disable Root's interrupt in response to error messages */
 	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
 	reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
 	pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
 
-	rc = pci_bus_error_reset(dev);
-	pci_info(dev, "Root Port link has been reset\n");
+	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC) {
+		rc = pci_bus_error_reset(dev);
+		pci_info(dev, "Root Port link has been reset\n");
+	}
 
 	/* Clear Root Error Status */
 	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 5380ecc41506..a61a2518163a 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -149,7 +149,8 @@ static int report_resume(struct pci_dev *dev, void *data)
 /**
  * pci_bridge_walk - walk bridges potentially AER affected
  * @bridge   bridge which may be an RCEC with associated RCiEPs,
- *           an RCiEP associated with an RCEC, or a Port.
+ *           or a Port.
+ * @dev      an RCiEP lacking an associated RCEC.
  * @cb       callback to be called for each device found
  * @userdata arbitrary pointer to be passed to callback.
  *
@@ -160,13 +161,16 @@ static int report_resume(struct pci_dev *dev, void *data)
  * If the device provided has no subordinate bus, call the provided
  * callback on the device itself.
  */
-static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct pci_dev *, void *),
+static void pci_bridge_walk(struct pci_dev *bridge, struct pci_dev *dev,
+			    int (*cb)(struct pci_dev *, void *),
 			    void *userdata)
 {
-	if (bridge->subordinate)
+	if (bridge && bridge->subordinate)
 		pci_walk_bus(bridge->subordinate, cb, userdata);
-	else
+	else if (bridge)
 		cb(bridge, userdata);
+	else
+		cb(dev, userdata);
 }
 
 static pci_ers_result_t flr_on_rciep(struct pci_dev *dev)
@@ -196,16 +200,24 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	type = pci_pcie_type(dev);
 	if (type == PCI_EXP_TYPE_ROOT_PORT ||
 	    type == PCI_EXP_TYPE_DOWNSTREAM ||
-	    type == PCI_EXP_TYPE_RC_EC ||
-	    type == PCI_EXP_TYPE_RC_END)
+	    type == PCI_EXP_TYPE_RC_EC)
 		bridge = dev;
+	else if (type == PCI_EXP_TYPE_RC_END)
+		bridge = dev->rcec;
 	else
 		bridge = pci_upstream_bridge(dev);
 
 	pci_dbg(dev, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
-		pci_bridge_walk(bridge, report_frozen_detected, &status);
+		pci_bridge_walk(bridge, dev, report_frozen_detected, &status);
 		if (type == PCI_EXP_TYPE_RC_END) {
+			/*
+			 * The callback only clears the Root Error Status
+			 * of the RCEC (see aer.c).
+			 */
+			if (bridge)
+				reset_subordinate_devices(bridge);
+
 			status = flr_on_rciep(dev);
 			if (status != PCI_ERS_RESULT_RECOVERED) {
 				pci_warn(dev, "function level reset failed\n");
@@ -219,13 +231,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 			}
 		}
 	} else {
-		pci_bridge_walk(bridge, report_normal_detected, &status);
+		pci_bridge_walk(bridge, dev, report_normal_detected, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast mmio_enabled message\n");
-		pci_bridge_walk(bridge, report_mmio_enabled, &status);
+		pci_bridge_walk(bridge, dev, report_mmio_enabled, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
@@ -236,18 +248,16 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 		 */
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast slot_reset message\n");
-		pci_bridge_walk(bridge, report_slot_reset, &status);
+		pci_bridge_walk(bridge, dev, report_slot_reset, &status);
 	}
 
 	if (status != PCI_ERS_RESULT_RECOVERED)
 		goto failed;
 
 	pci_dbg(dev, "broadcast resume message\n");
-	pci_bridge_walk(bridge, report_resume, &status);
+	pci_bridge_walk(bridge, dev, report_resume, &status);
 
-	if (type == PCI_EXP_TYPE_ROOT_PORT ||
-	    type == PCI_EXP_TYPE_DOWNSTREAM ||
-	    type == PCI_EXP_TYPE_RC_EC) {
+	if (bridge) {
 		if (pcie_aer_is_native(bridge))
 			pcie_clear_device_status(bridge);
 		pci_aer_clear_nonfatal_status(bridge);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v6 08/10] PCI/AER: Add pcie_walk_rcec() to RCEC AER handling
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (6 preceding siblings ...)
  2020-09-22 21:38 ` [PATCH v6 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  2020-09-22 21:38 ` [PATCH v6 09/10] PCI/PME: Add pcie_walk_rcec() to RCEC PME handling Sean V Kelley
  2020-09-22 21:38 ` [PATCH v6 10/10] PCI/AER: Add RCEC AER error injection support Sean V Kelley
  9 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Sean V Kelley <sean.v.kelley@intel.com>

The Root Complex Event Collectors(RCEC) appear as peers to Root Ports
and also have the AER capability. In addition, actions need to be taken
for assocated RCiEPs. In such cases the RCECs will need to be walked in
order to find and act upon their respective RCiEPs.  Extend the existing
ability to link the RCECs with a walking function pcie_walk_rcec(). Add
RCEC support to the current AER service driver and attach the AER service
driver to the RCEC device.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pci.h       |  4 ++++
 drivers/pci/pcie/aer.c  | 27 ++++++++++++++++++++-------
 drivers/pci/pcie/rcec.c | 39 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index ddb5872466fb..e8535a7d4b53 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -475,10 +475,14 @@ static inline void pci_dpc_init(struct pci_dev *pdev) {}
 void pci_rcec_init(struct pci_dev *dev);
 void pci_rcec_exit(struct pci_dev *dev);
 void pcie_link_rcec(struct pci_dev *rcec);
+void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
+		    void *userdata);
 #else
 static inline void pci_rcec_init(struct pci_dev *dev) {}
 static inline void pci_rcec_exit(struct pci_dev *dev) {}
 static inline void pcie_link_rcec(struct pci_dev *rcec) {}
+static inline void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
+				  void *userdata) {}
 #endif
 
 #ifdef CONFIG_PCI_ATS
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index dccdba60b5d9..43772bfc134e 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -300,7 +300,7 @@ int pci_aer_raw_clear_status(struct pci_dev *dev)
 		return -EIO;
 
 	port_type = pci_pcie_type(dev);
-	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
+	if (port_type == PCI_EXP_TYPE_ROOT_PORT || port_type == PCI_EXP_TYPE_RC_EC) {
 		pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &status);
 		pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, status);
 	}
@@ -595,7 +595,8 @@ static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
 	if ((a == &dev_attr_aer_rootport_total_err_cor.attr ||
 	     a == &dev_attr_aer_rootport_total_err_fatal.attr ||
 	     a == &dev_attr_aer_rootport_total_err_nonfatal.attr) &&
-	    pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT)
+	    ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+	    (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_EC)))
 		return 0;
 
 	return a->mode;
@@ -916,7 +917,10 @@ static bool find_source_device(struct pci_dev *parent,
 	if (result)
 		return true;
 
-	pci_walk_bus(parent->subordinate, find_device_iter, e_info);
+	if (pci_pcie_type(parent) == PCI_EXP_TYPE_RC_EC)
+		pcie_walk_rcec(parent, find_device_iter, e_info);
+	else
+		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
 
 	if (!e_info->error_dev_num) {
 		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
@@ -1053,6 +1057,7 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 		if (!(info->status & ~info->mask))
 			return 0;
 	} else if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+		   pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
 	           pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
 		   info->severity == AER_NONFATAL) {
 
@@ -1205,6 +1210,7 @@ static int set_device_error_reporting(struct pci_dev *dev, void *data)
 	int type = pci_pcie_type(dev);
 
 	if ((type == PCI_EXP_TYPE_ROOT_PORT) ||
+	    (type == PCI_EXP_TYPE_RC_EC) ||
 	    (type == PCI_EXP_TYPE_UPSTREAM) ||
 	    (type == PCI_EXP_TYPE_DOWNSTREAM)) {
 		if (enable)
@@ -1229,9 +1235,11 @@ static void set_downstream_devices_error_reporting(struct pci_dev *dev,
 {
 	set_device_error_reporting(dev, &enable);
 
-	if (!dev->subordinate)
-		return;
-	pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable);
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
+		pcie_walk_rcec(dev, set_device_error_reporting, &enable);
+	else if (dev->subordinate)
+		pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable);
+
 }
 
 /**
@@ -1329,6 +1337,11 @@ static int aer_probe(struct pcie_device *dev)
 	struct device *device = &dev->device;
 	struct pci_dev *port = dev->port;
 
+	/* Limit to Root Ports or Root Complex Event Collectors */
+	if ((pci_pcie_type(port) != PCI_EXP_TYPE_RC_EC) &&
+	    (pci_pcie_type(port) != PCI_EXP_TYPE_ROOT_PORT))
+		return -ENODEV;
+
 	rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
 	if (!rpc)
 		return -ENOMEM;
@@ -1385,7 +1398,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 
 static struct pcie_port_service_driver aerdriver = {
 	.name		= "aer",
-	.port_type	= PCI_EXP_TYPE_ROOT_PORT,
+	.port_type	= PCIE_ANY_PORT,
 	.service	= PCIE_PORT_SERVICE_AER,
 
 	.probe		= aer_probe,
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
index 5630480a6659..e6d20131b578 100644
--- a/drivers/pci/pcie/rcec.c
+++ b/drivers/pci/pcie/rcec.c
@@ -53,7 +53,18 @@ static int link_rcec_helper(struct pci_dev *dev, void *data)
 	return 0;
 }
 
-void walk_rcec(int (*cb)(struct pci_dev *dev, void *data), void *userdata)
+static int walk_rcec_helper(struct pci_dev *dev, void *data)
+{
+	struct walk_rcec_data *rcec_data = data;
+	struct pci_dev *rcec = rcec_data->rcec;
+
+	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) && rcec_assoc_rciep(rcec, dev))
+		rcec_data->user_callback(dev, rcec_data->user_data);
+
+	return 0;
+}
+
+static void walk_rcec(int (*cb)(struct pci_dev *dev, void *data), void *userdata)
 {
 	struct walk_rcec_data *rcec_data = userdata;
 	struct pci_dev *rcec = rcec_data->rcec;
@@ -113,6 +124,32 @@ void pcie_link_rcec(struct pci_dev *rcec)
 	walk_rcec(link_rcec_helper, &rcec_data);
 }
 
+/**
+ * pcie_walk_rcec - Walk RCiEP devices associating with RCEC and call callback.
+ * @rcec     RCEC whose RCiEP devices should be walked.
+ * @cb       Callback to be called for each RCiEP device found.
+ * @userdata Arbitrary pointer to be passed to callback.
+ *
+ * Walk the given RCEC. Call the provided callback on each RCiEP device found.
+ *
+ * We check the return of @cb each time. If it returns anything
+ * other than 0, we break out.
+ */
+void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
+		    void *userdata)
+{
+	struct walk_rcec_data rcec_data;
+
+	if (!rcec->rcec_cap)
+		return;
+
+	rcec_data.rcec = rcec;
+	rcec_data.user_callback = cb;
+	rcec_data.user_data = userdata;
+
+	walk_rcec(walk_rcec_helper, &rcec_data);
+}
+
 void pci_rcec_init(struct pci_dev *dev)
 {
 	u32 rcec, hdr, busn;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v6 09/10] PCI/PME: Add pcie_walk_rcec() to RCEC PME handling
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (7 preceding siblings ...)
  2020-09-22 21:38 ` [PATCH v6 08/10] PCI/AER: Add pcie_walk_rcec() to RCEC AER handling Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  2020-09-22 21:38 ` [PATCH v6 10/10] PCI/AER: Add RCEC AER error injection support Sean V Kelley
  9 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Sean V Kelley <sean.v.kelley@intel.com>

The Root Complex Event Collectors(RCEC) appear as peers of Root Ports
and also have the PME capability. As with AER, there is a need to be
able to walk the RCiEPs associated with their RCEC for purposes of
acting upon them with callbacks. So add RCEC support through the use
of pcie_walk_rcec() to the current PME service driver and attach the
PME service driver to the RCEC device.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pcie/pme.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c
index 6a32970bb731..87799166c96a 100644
--- a/drivers/pci/pcie/pme.c
+++ b/drivers/pci/pcie/pme.c
@@ -310,7 +310,10 @@ static int pcie_pme_can_wakeup(struct pci_dev *dev, void *ign)
 static void pcie_pme_mark_devices(struct pci_dev *port)
 {
 	pcie_pme_can_wakeup(port, NULL);
-	if (port->subordinate)
+
+	if (pci_pcie_type(port) == PCI_EXP_TYPE_RC_EC)
+		pcie_walk_rcec(port, pcie_pme_can_wakeup, NULL);
+	else if (port->subordinate)
 		pci_walk_bus(port->subordinate, pcie_pme_can_wakeup, NULL);
 }
 
@@ -320,10 +323,15 @@ static void pcie_pme_mark_devices(struct pci_dev *port)
  */
 static int pcie_pme_probe(struct pcie_device *srv)
 {
-	struct pci_dev *port;
+	struct pci_dev *port = srv->port;
 	struct pcie_pme_service_data *data;
 	int ret;
 
+	/* Limit to Root Ports or Root Complex Event Collectors */
+	if ((pci_pcie_type(port) != PCI_EXP_TYPE_RC_EC) &&
+	    (pci_pcie_type(port) != PCI_EXP_TYPE_ROOT_PORT))
+		return -ENODEV;
+
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return -ENOMEM;
@@ -333,7 +341,6 @@ static int pcie_pme_probe(struct pcie_device *srv)
 	data->srv = srv;
 	set_service_data(srv, data);
 
-	port = srv->port;
 	pcie_pme_interrupt_enable(port, false);
 	pcie_clear_root_pme_status(port);
 
@@ -445,7 +452,7 @@ static void pcie_pme_remove(struct pcie_device *srv)
 
 static struct pcie_port_service_driver pcie_pme_driver = {
 	.name		= "pcie_pme",
-	.port_type	= PCI_EXP_TYPE_ROOT_PORT,
+	.port_type	= PCIE_ANY_PORT,
 	.service	= PCIE_PORT_SERVICE_PME,
 
 	.probe		= pcie_pme_probe,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v6 10/10] PCI/AER: Add RCEC AER error injection support
  2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (8 preceding siblings ...)
  2020-09-22 21:38 ` [PATCH v6 09/10] PCI/PME: Add pcie_walk_rcec() to RCEC PME handling Sean V Kelley
@ 2020-09-22 21:38 ` Sean V Kelley
  9 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-22 21:38 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

The Root Complex Event Collectors(RCEC) appear as peers to Root Ports
and also have the AER capability. So add RCEC support to the current AER
error injection driver.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pcie/aer_inject.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index c2cbf425afc5..011a6c54b4e3 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -333,8 +333,11 @@ static int aer_inject(struct aer_error_inj *einj)
 	if (!dev)
 		return -ENODEV;
 	rpdev = pcie_find_root_port(dev);
+	/* If Root port not found, try to find an RCEC */
+	if (!rpdev)
+		rpdev = dev->rcec;
 	if (!rpdev) {
-		pci_err(dev, "Root port not found\n");
+		pci_err(dev, "Neither root port nor RCEC found\n");
 		ret = -ENODEV;
 		goto out_put;
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver
  2020-09-22 21:38 ` [PATCH v6 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver Sean V Kelley
@ 2020-09-25 19:59   ` Bjorn Helgaas
  2020-09-25 21:39     ` Sean V Kelley
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2020-09-25 19:59 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci,
	linux-kernel, Sean V Kelley

On Tue, Sep 22, 2020 at 02:38:51PM -0700, Sean V Kelley wrote:
> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> 
> If a Root Complex Integrated Endpoint (RCiEP) is implemented, errors may
> optionally be sent to a corresponding Root Complex Event Collector (RCEC).
> Each RCiEP must be associated with no more than one RCEC. Interface errors
> are reported to the OS by RCECs.
> 
> For an RCEC (technically not a Bridge), error messages "received" from
> associated RCiEPs must be enabled for "transmission" in order to cause a
> System Error via the Root Control register or (when the Advanced Error
> Reporting Capability is present) reporting via the Root Error Command
> register and logging in the Root Error Status register and Error Source
> Identification register.
> 
> Given the commonality with Root Ports and the need to also support AER
> and PME services for RCECs, extend the Root Port driver to support RCEC
> devices through the addition of the RCEC Class ID to the driver
> structure.
> 
> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/pci/pcie/portdrv_core.c | 8 ++++----
>  drivers/pci/pcie/portdrv_pci.c  | 5 ++++-
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 50a9522ab07d..99769c636775 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -234,11 +234,11 @@ static int get_port_device_capability(struct pci_dev *dev)
>  #endif
>  
>  	/*
> -	 * Root ports are capable of generating PME too.  Root Complex
> -	 * Event Collectors can also generate PMEs, but we don't handle
> -	 * those yet.
> +	 * Root ports and Root Complex Event Collectors are capable
> +	 * of generating PME.
>  	 */
> -	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) &&

It seems like this change belongs in patch 09/10, where we change
pme.c so it claims both Root Ports and RCECs.  Does this hunk
accomplish anything before 09/10?

>  	    (pcie_ports_native || host->native_pme)) {
>  		services |= PCIE_PORT_SERVICE_PME;
>  
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 3a3ce40ae1ab..4d880679b9b1 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -106,7 +106,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  	if (!pci_is_pcie(dev) ||
>  	    ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
>  	     (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
> -	     (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)))
> +	     (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM) &&
> +	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
>  		return -ENODEV;
>  
>  	status = pcie_port_device_register(dev);
> @@ -195,6 +196,8 @@ static const struct pci_device_id port_pci_ids[] = {
>  	{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x00), ~0) },
>  	/* subtractive decode PCI-to-PCI bridge, class type is 060401h */
>  	{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x01), ~0) },
> +	/* handle any Root Complex Event Collector */
> +	{ PCI_DEVICE_CLASS(((PCI_CLASS_SYSTEM_RCEC << 8) | 0x00), ~0) },
>  	{ },
>  };
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities()
  2020-09-22 21:38 ` [PATCH v6 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities() Sean V Kelley
@ 2020-09-25 20:13   ` Bjorn Helgaas
  2020-09-25 21:49     ` Sean V Kelley
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2020-09-25 20:13 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci,
	linux-kernel, Sean V Kelley

On Tue, Sep 22, 2020 at 02:38:52PM -0700, Sean V Kelley wrote:
> From: Sean V Kelley <sean.v.kelley@intel.com>
> 
> Extend support for Root Complex Event Collectors by decoding and
> caching the RCEC Endpoint Association Extended Capabilities when
> enumerating. Use that cached information for later error source
> reporting. See PCI Express Base Specification, version 5.0-1,
> section 7.9.10.
> 
> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
> Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> ---
>  drivers/pci/pci.h         | 18 ++++++++++++++
>  drivers/pci/pcie/Makefile |  2 +-
>  drivers/pci/pcie/rcec.c   | 52 +++++++++++++++++++++++++++++++++++++++
>  drivers/pci/probe.c       |  3 ++-
>  include/linux/pci.h       |  4 +++
>  5 files changed, 77 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/pci/pcie/rcec.c
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index fa12f7cbc1a0..83670a6425d8 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -449,6 +449,16 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
>  void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
>  #endif	/* CONFIG_PCIEAER */
>  
> +#ifdef CONFIG_PCIEPORTBUS
> +/* Cached RCEC Associated Endpoint Extended Capabilities */
> +struct rcec_ext {
> +	u8		ver;
> +	u8		nextbusn;
> +	u8		lastbusn;
> +	u32		bitmap;
> +};
> +#endif
> +
>  #ifdef CONFIG_PCIE_DPC
>  void pci_save_dpc_state(struct pci_dev *dev);
>  void pci_restore_dpc_state(struct pci_dev *dev);
> @@ -461,6 +471,14 @@ static inline void pci_restore_dpc_state(struct pci_dev *dev) {}
>  static inline void pci_dpc_init(struct pci_dev *pdev) {}
>  #endif
>  
> +#ifdef CONFIG_PCIEPORTBUS
> +void pci_rcec_init(struct pci_dev *dev);
> +void pci_rcec_exit(struct pci_dev *dev);
> +#else
> +static inline void pci_rcec_init(struct pci_dev *dev) {}
> +static inline void pci_rcec_exit(struct pci_dev *dev) {}
> +#endif
> +
>  #ifdef CONFIG_PCI_ATS
>  /* Address Translation Service */
>  void pci_ats_init(struct pci_dev *dev);
> diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
> index 68da9280ff11..d9697892fa3e 100644
> --- a/drivers/pci/pcie/Makefile
> +++ b/drivers/pci/pcie/Makefile
> @@ -2,7 +2,7 @@
>  #
>  # Makefile for PCI Express features and port driver
>  
> -pcieportdrv-y			:= portdrv_core.o portdrv_pci.o err.o
> +pcieportdrv-y			:= portdrv_core.o portdrv_pci.o err.o rcec.o
>  
>  obj-$(CONFIG_PCIEPORTBUS)	+= pcieportdrv.o
>  
> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> new file mode 100644
> index 000000000000..519ae086ff41
> --- /dev/null
> +++ b/drivers/pci/pcie/rcec.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Root Complex Event Collector Support
> + *
> + * Authors:
> + *  Sean V Kelley <sean.v.kelley@intel.com>
> + *  Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> + *
> + * Copyright (C) 2020 Intel Corp.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/bitops.h>
> +#include <linux/pci.h>
> +#include <linux/pci_regs.h>

Do we really need all the above?  I don't see any errno or bitops
here.

> +#include "../pci.h"
> +
> +void pci_rcec_init(struct pci_dev *dev)
> +{
> +	u32 rcec, hdr, busn;
> +
> +	/* Only for Root Complex Event Collectors */
> +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)
> +		return;
> +
> +	dev->rcec_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_RCEC);
> +	if (!dev->rcec_cap)
> +		return;
> +
> +	dev->rcec_ext = kzalloc(sizeof(*dev->rcec_ext), GFP_KERNEL);
> +
> +	rcec = dev->rcec_cap;
> +	pci_read_config_dword(dev, rcec + PCI_RCEC_RCIEP_BITMAP, &dev->rcec_ext->bitmap);
> +
> +	/* Check whether RCEC BUSN register is present */
> +	pci_read_config_dword(dev, rcec, &hdr);
> +	dev->rcec_ext->ver = PCI_EXT_CAP_VER(hdr);
> +	if (dev->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
> +		return;
> +
> +	pci_read_config_dword(dev, rcec + PCI_RCEC_BUSN, &busn);
> +	dev->rcec_ext->nextbusn = PCI_RCEC_BUSN_NEXT(busn);
> +	dev->rcec_ext->lastbusn = PCI_RCEC_BUSN_LAST(busn);
> +}
> +
> +void pci_rcec_exit(struct pci_dev *dev)
> +{
> +	kfree(dev->rcec_ext);
> +	dev->rcec_ext = NULL;
> +}
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 03d37128a24f..16bc651fecb7 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2201,6 +2201,7 @@ static void pci_configure_device(struct pci_dev *dev)
>  static void pci_release_capabilities(struct pci_dev *dev)
>  {
>  	pci_aer_exit(dev);
> +	pci_rcec_exit(dev);
>  	pci_vpd_release(dev);
>  	pci_iov_release(dev);
>  	pci_free_cap_save_buffers(dev);
> @@ -2400,7 +2401,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
>  	pci_ptm_init(dev);		/* Precision Time Measurement */
>  	pci_aer_init(dev);		/* Advanced Error Reporting */
>  	pci_dpc_init(dev);		/* Downstream Port Containment */
> -
> +	pci_rcec_init(dev);		/* Root Complex Event Collector */
>  	pcie_report_downtraining(dev);
>  
>  	if (pci_probe_reset_function(dev) == 0)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 835530605c0d..5c5c4eb642b6 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -326,6 +326,10 @@ struct pci_dev {
>  #ifdef CONFIG_PCIEAER
>  	u16		aer_cap;	/* AER capability offset */
>  	struct aer_stats *aer_stats;	/* AER stats for this device */
> +#endif
> +#ifdef CONFIG_PCIEPORTBUS
> +	u16		rcec_cap;	/* RCEC capability offset */

Looking through the whole series, I think rcec_cap is used (a) in
pci_rcec_init(), where we actually read the RCEC EA capability, and
(b) in walk_rcec() and pcie_link_rcec(), where we only use it to test
whether the device has an RCEC EA capability.

Couldn't we accomplish (b) just by testing "dev->rcec_ext"?  Then we
wouldn't need to save rcec_cap at all.

> +	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint extended capabilities */

Maybe "rcec_ea"?  The important part is that this is the Endpoint
Association information.  The fact that it happens to be an "extended"
capability isn't very interesting.

>  #endif
>  	u8		pcie_cap;	/* PCIe capability offset */
>  	u8		msi_cap;	/* MSI capability offset */
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 04/10] PCI/AER: Extend AER error handling to RCECs
  2020-09-22 21:38 ` [PATCH v6 04/10] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
@ 2020-09-25 21:14   ` Bjorn Helgaas
  2020-09-25 22:08     ` Sean V Kelley
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2020-09-25 21:14 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci,
	linux-kernel, Sean V Kelley, Sinan Kaya

[+cc Sinan, who's been reviewing changes in this area (thanks, Sinan!)]

On Tue, Sep 22, 2020 at 02:38:53PM -0700, Sean V Kelley wrote:
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Currently the kernel does not handle AER errors for Root Complex
> integrated End Points (RCiEPs)[0]. These devices sit on a root bus within
> the Root Complex (RC). AER handling is performed by a Root Complex Event
> Collector (RCEC) [1] which is a effectively a type of RCiEP on the same
> root bus.
> 
> For an RCEC (technically not a Bridge), error messages "received" from
> associated RCiEPs must be enabled for "transmission" in order to cause a
> System Error via the Root Control register or (when the Advanced Error
> Reporting Capability is present) reporting via the Root Error Command
> register and logging in the Root Error Status register and Error Source
> Identification register.
> 
> In addition to the defined OS level handling of the reset flow for the
> associated RCiEPs of an RCEC, it is possible to also have non-native
> handling. In that case there is no need to take any actions on the RCEC
> because the firmware is responsible for them. This is true where APEI [2]
> is used to report the AER errors via a GHES[v2] HEST entry [3] and
> relevant AER CPER record [4] and non-native handling is in use.
> 
> We effectively end up with two different types of discovery for
> purposes of handling AER errors:
> 
> 1) Normal bus walk - we pass the downstream port above a bus to which
> the device is attached and it walks everything below that point.
> 
> 2) An RCiEP with no visible association with an RCEC as there is no need
> to walk devices. In that case, the flow is to just call the callbacks for
> the actual device, which in turn references its associated RCEC.
> 
> A new walk function pci_bridge_walk(), similar to pci_bus_walk(),
> is provided that takes a pci_dev instead of a bus. If that bridge
> corresponds to a downstream port it will walk the subordinate bus of
> that bridge. If the device does not then it will call the function on
> that device alone.
> 
> [0] ACPI PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex
> Integrated Endpoint Rules.
> [1] ACPI PCI Express Base Specification 5.0-1 6.2 Error Signalling and
> Logging
> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
> [3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>

I like this patch.  I think there are a few things that could be
peeled off as "no functional change" preliminary patches to make the
important changes more obvious in the "money patch".

> ---
>  drivers/pci/pci.h      |  2 +-
>  drivers/pci/pcie/err.c | 77 +++++++++++++++++++++++++++++++-----------
>  2 files changed, 59 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 83670a6425d8..7b547fc3679a 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -575,7 +575,7 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
>  /* PCI error reporting and recovery */
>  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  			pci_channel_state_t state,
> -			pci_ers_result_t (*reset_link)(struct pci_dev *pdev));
> +			pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev *pdev));
>  
>  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
>  #ifdef CONFIG_PCIEASPM
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index c543f419d8f9..e575fa6cee63 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -146,38 +146,73 @@ static int report_resume(struct pci_dev *dev, void *data)
>  	return 0;
>  }
>  
> +/**
> + * pci_bridge_walk - walk bridges potentially AER affected
> + * @bridge   bridge which may be an RCEC with associated RCiEPs,
> + *           an RCiEP associated with an RCEC, or a Port.
> + * @cb       callback to be called for each device found
> + * @userdata arbitrary pointer to be passed to callback.
> + *
> + * If the device provided is a bridge, walk the subordinate bus,
> + * including any bridged devices on buses under this bus.
> + * Call the provided callback on each device found.
> + *
> + * If the device provided has no subordinate bus, call the provided
> + * callback on the device itself.
> + */
> +static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct pci_dev *, void *),

Maybe call this pci_walk_bridge() so it's the same order as the
existing pci_walk_bus(), unless there's some reason to be different.

> +			    void *userdata)
> +{
> +	if (bridge->subordinate)
> +		pci_walk_bus(bridge->subordinate, cb, userdata);
> +	else
> +		cb(bridge, userdata);
> +}
> +
>  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  			pci_channel_state_t state,
> -			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
> +			pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev *pdev))

The rename to "reset_subordinate_devices" seems separable, since it
doesn't change the interface.

>  {
>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> -	struct pci_bus *bus;
> +	struct pci_dev *bridge;
> +	int type;
>  
>  	/*
> -	 * Error recovery runs on all subordinates of the first downstream port.
> -	 * If the downstream port detected the error, it is cleared at the end.
> +	 * Error recovery runs on all subordinates of the first downstream
> +	 * bridge. If the downstream bridge detected the error, it is
> +	 * cleared at the end. For RCiEPs we should reset just the RCiEP itself.
>  	 */
> -	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> -		dev = dev->bus->self;
> -	bus = dev->subordinate;
> +	type = pci_pcie_type(dev);
> +	if (type == PCI_EXP_TYPE_ROOT_PORT ||
> +	    type == PCI_EXP_TYPE_DOWNSTREAM ||
> +	    type == PCI_EXP_TYPE_RC_EC ||
> +	    type == PCI_EXP_TYPE_RC_END)
> +		bridge = dev;
> +	else
> +		bridge = pci_upstream_bridge(dev);

This makes it much easier to read, thank you.  I think the addition of
"type", rename of "dev" to "bridge", the inversion of the condition
(major improvement, thanks again), and use of pci_upstream_bridge()
instead of dev->bus->self might also be separable?

Of course, you'd have to add the RC_EC and RC_END cases later, in the
money patch, but that's a good thing because it won't get lost in all
the other things being changed.

>  	pci_dbg(dev, "broadcast error_detected message\n");
>  	if (state == pci_channel_io_frozen) {
> -		pci_walk_bus(bus, report_frozen_detected, &status);
> -		status = reset_link(dev);
> +		pci_bridge_walk(bridge, report_frozen_detected, &status);
> +		if (type == PCI_EXP_TYPE_RC_END) {
> +			pci_warn(dev, "link reset not possible for RCiEP\n");
> +			status = PCI_ERS_RESULT_NONE;
> +			goto failed;
> +		}
> +
> +		status = reset_subordinate_devices(bridge);
>  		if (status != PCI_ERS_RESULT_RECOVERED) {
> -			pci_warn(dev, "link reset failed\n");
> +			pci_warn(dev, "subordinate device reset failed\n");
>  			goto failed;
>  		}
>  	} else {
> -		pci_walk_bus(bus, report_normal_detected, &status);
> +		pci_bridge_walk(bridge, report_normal_detected, &status);
>  	}
>  
>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>  		status = PCI_ERS_RESULT_RECOVERED;
>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
> -		pci_walk_bus(bus, report_mmio_enabled, &status);
> +		pci_bridge_walk(bridge, report_mmio_enabled, &status);
>  	}
>  
>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> @@ -188,18 +223,22 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  		 */
>  		status = PCI_ERS_RESULT_RECOVERED;
>  		pci_dbg(dev, "broadcast slot_reset message\n");
> -		pci_walk_bus(bus, report_slot_reset, &status);
> +		pci_bridge_walk(bridge, report_slot_reset, &status);
>  	}
>  
>  	if (status != PCI_ERS_RESULT_RECOVERED)
>  		goto failed;
>  
>  	pci_dbg(dev, "broadcast resume message\n");
> -	pci_walk_bus(bus, report_resume, &status);
> -
> -	if (pcie_aer_is_native(dev))
> -		pcie_clear_device_status(dev);
> -	pci_aer_clear_nonfatal_status(dev);
> +	pci_bridge_walk(bridge, report_resume, &status);
> +
> +	if (type == PCI_EXP_TYPE_ROOT_PORT ||
> +	    type == PCI_EXP_TYPE_DOWNSTREAM ||
> +	    type == PCI_EXP_TYPE_RC_EC) {

Addition of this check also seems worthy of a separate patch (for just
root ports and downstream ports first, then RC_EC being added later).
That would make a convenient place to explain why we need the change.
I think it's *correct*; it just gets lost in the noise and not even
mentioned when it's done as part of one big patch.

> +		if (pcie_aer_is_native(bridge))
> +			pcie_clear_device_status(bridge);
> +		pci_aer_clear_nonfatal_status(bridge);
> +	}
>  	pci_info(dev, "device recovery successful\n");
>  	return status;
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver
  2020-09-25 19:59   ` Bjorn Helgaas
@ 2020-09-25 21:39     ` Sean V Kelley
  0 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-25 21:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sean V Kelley, bhelgaas, Jonathan.Cameron, rafael.j.wysocki,
	ashok.raj, tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo,
	linux-pci, linux-kernel

On 25 Sep 2020, at 12:59, Bjorn Helgaas wrote:

> On Tue, Sep 22, 2020 at 02:38:51PM -0700, Sean V Kelley wrote:
>> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>
>> If a Root Complex Integrated Endpoint (RCiEP) is implemented, errors 
>> may
>> optionally be sent to a corresponding Root Complex Event Collector 
>> (RCEC).
>> Each RCiEP must be associated with no more than one RCEC. Interface 
>> errors
>> are reported to the OS by RCECs.
>>
>> For an RCEC (technically not a Bridge), error messages "received" 
>> from
>> associated RCiEPs must be enabled for "transmission" in order to 
>> cause a
>> System Error via the Root Control register or (when the Advanced 
>> Error
>> Reporting Capability is present) reporting via the Root Error Command
>> register and logging in the Root Error Status register and Error 
>> Source
>> Identification register.
>>
>> Given the commonality with Root Ports and the need to also support 
>> AER
>> and PME services for RCECs, extend the Root Port driver to support 
>> RCEC
>> devices through the addition of the RCEC Class ID to the driver
>> structure.
>>
>> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>>  drivers/pci/pcie/portdrv_core.c | 8 ++++----
>>  drivers/pci/pcie/portdrv_pci.c  | 5 ++++-
>>  2 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/portdrv_core.c 
>> b/drivers/pci/pcie/portdrv_core.c
>> index 50a9522ab07d..99769c636775 100644
>> --- a/drivers/pci/pcie/portdrv_core.c
>> +++ b/drivers/pci/pcie/portdrv_core.c
>> @@ -234,11 +234,11 @@ static int get_port_device_capability(struct 
>> pci_dev *dev)
>>  #endif
>>
>>  	/*
>> -	 * Root ports are capable of generating PME too.  Root Complex
>> -	 * Event Collectors can also generate PMEs, but we don't handle
>> -	 * those yet.
>> +	 * Root ports and Root Complex Event Collectors are capable
>> +	 * of generating PME.
>>  	 */
>> -	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
>> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) &&
>
> It seems like this change belongs in patch 09/10, where we change
> pme.c so it claims both Root Ports and RCECs.  Does this hunk
> accomplish anything before 09/10?

You are right.  It’s not needed until 09/10.  Will move.

Thanks,

Sean

>
>>  	    (pcie_ports_native || host->native_pme)) {
>>  		services |= PCIE_PORT_SERVICE_PME;
>>
>> diff --git a/drivers/pci/pcie/portdrv_pci.c 
>> b/drivers/pci/pcie/portdrv_pci.c
>> index 3a3ce40ae1ab..4d880679b9b1 100644
>> --- a/drivers/pci/pcie/portdrv_pci.c
>> +++ b/drivers/pci/pcie/portdrv_pci.c
>> @@ -106,7 +106,8 @@ static int pcie_portdrv_probe(struct pci_dev 
>> *dev,
>>  	if (!pci_is_pcie(dev) ||
>>  	    ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
>>  	     (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
>> -	     (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)))
>> +	     (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM) &&
>> +	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
>>  		return -ENODEV;
>>
>>  	status = pcie_port_device_register(dev);
>> @@ -195,6 +196,8 @@ static const struct pci_device_id port_pci_ids[] 
>> = {
>>  	{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x00), ~0) },
>>  	/* subtractive decode PCI-to-PCI bridge, class type is 060401h */
>>  	{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x01), ~0) },
>> +	/* handle any Root Complex Event Collector */
>> +	{ PCI_DEVICE_CLASS(((PCI_CLASS_SYSTEM_RCEC << 8) | 0x00), ~0) },
>>  	{ },
>>  };
>>
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities()
  2020-09-25 20:13   ` Bjorn Helgaas
@ 2020-09-25 21:49     ` Sean V Kelley
  0 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-25 21:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sean V Kelley, bhelgaas, Jonathan.Cameron, rafael.j.wysocki,
	ashok.raj, tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo,
	linux-pci, linux-kernel

On 25 Sep 2020, at 13:13, Bjorn Helgaas wrote:

> On Tue, Sep 22, 2020 at 02:38:52PM -0700, Sean V Kelley wrote:
>> From: Sean V Kelley <sean.v.kelley@intel.com>
>>
>> Extend support for Root Complex Event Collectors by decoding and
>> caching the RCEC Endpoint Association Extended Capabilities when
>> enumerating. Use that cached information for later error source
>> reporting. See PCI Express Base Specification, version 5.0-1,
>> section 7.9.10.
>>
>> Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
>> Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>> ---
>>  drivers/pci/pci.h         | 18 ++++++++++++++
>>  drivers/pci/pcie/Makefile |  2 +-
>>  drivers/pci/pcie/rcec.c   | 52 
>> +++++++++++++++++++++++++++++++++++++++
>>  drivers/pci/probe.c       |  3 ++-
>>  include/linux/pci.h       |  4 +++
>>  5 files changed, 77 insertions(+), 2 deletions(-)
>>  create mode 100644 drivers/pci/pcie/rcec.c
>>
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index fa12f7cbc1a0..83670a6425d8 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -449,6 +449,16 @@ int aer_get_device_error_info(struct pci_dev 
>> *dev, struct aer_err_info *info);
>>  void aer_print_error(struct pci_dev *dev, struct aer_err_info 
>> *info);
>>  #endif	/* CONFIG_PCIEAER */
>>
>> +#ifdef CONFIG_PCIEPORTBUS
>> +/* Cached RCEC Associated Endpoint Extended Capabilities */
>> +struct rcec_ext {
>> +	u8		ver;
>> +	u8		nextbusn;
>> +	u8		lastbusn;
>> +	u32		bitmap;
>> +};
>> +#endif
>> +
>>  #ifdef CONFIG_PCIE_DPC
>>  void pci_save_dpc_state(struct pci_dev *dev);
>>  void pci_restore_dpc_state(struct pci_dev *dev);
>> @@ -461,6 +471,14 @@ static inline void pci_restore_dpc_state(struct 
>> pci_dev *dev) {}
>>  static inline void pci_dpc_init(struct pci_dev *pdev) {}
>>  #endif
>>
>> +#ifdef CONFIG_PCIEPORTBUS
>> +void pci_rcec_init(struct pci_dev *dev);
>> +void pci_rcec_exit(struct pci_dev *dev);
>> +#else
>> +static inline void pci_rcec_init(struct pci_dev *dev) {}
>> +static inline void pci_rcec_exit(struct pci_dev *dev) {}
>> +#endif
>> +
>>  #ifdef CONFIG_PCI_ATS
>>  /* Address Translation Service */
>>  void pci_ats_init(struct pci_dev *dev);
>> diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
>> index 68da9280ff11..d9697892fa3e 100644
>> --- a/drivers/pci/pcie/Makefile
>> +++ b/drivers/pci/pcie/Makefile
>> @@ -2,7 +2,7 @@
>>  #
>>  # Makefile for PCI Express features and port driver
>>
>> -pcieportdrv-y			:= portdrv_core.o portdrv_pci.o err.o
>> +pcieportdrv-y			:= portdrv_core.o portdrv_pci.o err.o rcec.o
>>
>>  obj-$(CONFIG_PCIEPORTBUS)	+= pcieportdrv.o
>>
>> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
>> new file mode 100644
>> index 000000000000..519ae086ff41
>> --- /dev/null
>> +++ b/drivers/pci/pcie/rcec.c
>> @@ -0,0 +1,52 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Root Complex Event Collector Support
>> + *
>> + * Authors:
>> + *  Sean V Kelley <sean.v.kelley@intel.com>
>> + *  Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> + *
>> + * Copyright (C) 2020 Intel Corp.
>> + */
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/errno.h>
>> +#include <linux/bitops.h>
>> +#include <linux/pci.h>
>> +#include <linux/pci_regs.h>
>
> Do we really need all the above?  I don't see any errno or bitops
> here.

Will remove unused.

>
>> +#include "../pci.h"
>> +
>> +void pci_rcec_init(struct pci_dev *dev)
>> +{
>> +	u32 rcec, hdr, busn;
>> +
>> +	/* Only for Root Complex Event Collectors */
>> +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)
>> +		return;
>> +
>> +	dev->rcec_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_RCEC);
>> +	if (!dev->rcec_cap)
>> +		return;
>> +
>> +	dev->rcec_ext = kzalloc(sizeof(*dev->rcec_ext), GFP_KERNEL);
>> +
>> +	rcec = dev->rcec_cap;
>> +	pci_read_config_dword(dev, rcec + PCI_RCEC_RCIEP_BITMAP, 
>> &dev->rcec_ext->bitmap);
>> +
>> +	/* Check whether RCEC BUSN register is present */
>> +	pci_read_config_dword(dev, rcec, &hdr);
>> +	dev->rcec_ext->ver = PCI_EXT_CAP_VER(hdr);
>> +	if (dev->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
>> +		return;
>> +
>> +	pci_read_config_dword(dev, rcec + PCI_RCEC_BUSN, &busn);
>> +	dev->rcec_ext->nextbusn = PCI_RCEC_BUSN_NEXT(busn);
>> +	dev->rcec_ext->lastbusn = PCI_RCEC_BUSN_LAST(busn);
>> +}
>> +
>> +void pci_rcec_exit(struct pci_dev *dev)
>> +{
>> +	kfree(dev->rcec_ext);
>> +	dev->rcec_ext = NULL;
>> +}
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 03d37128a24f..16bc651fecb7 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2201,6 +2201,7 @@ static void pci_configure_device(struct pci_dev 
>> *dev)
>>  static void pci_release_capabilities(struct pci_dev *dev)
>>  {
>>  	pci_aer_exit(dev);
>> +	pci_rcec_exit(dev);
>>  	pci_vpd_release(dev);
>>  	pci_iov_release(dev);
>>  	pci_free_cap_save_buffers(dev);
>> @@ -2400,7 +2401,7 @@ static void pci_init_capabilities(struct 
>> pci_dev *dev)
>>  	pci_ptm_init(dev);		/* Precision Time Measurement */
>>  	pci_aer_init(dev);		/* Advanced Error Reporting */
>>  	pci_dpc_init(dev);		/* Downstream Port Containment */
>> -
>> +	pci_rcec_init(dev);		/* Root Complex Event Collector */
>>  	pcie_report_downtraining(dev);
>>
>>  	if (pci_probe_reset_function(dev) == 0)
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 835530605c0d..5c5c4eb642b6 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -326,6 +326,10 @@ struct pci_dev {
>>  #ifdef CONFIG_PCIEAER
>>  	u16		aer_cap;	/* AER capability offset */
>>  	struct aer_stats *aer_stats;	/* AER stats for this device */
>> +#endif
>> +#ifdef CONFIG_PCIEPORTBUS
>> +	u16		rcec_cap;	/* RCEC capability offset */
>
> Looking through the whole series, I think rcec_cap is used (a) in
> pci_rcec_init(), where we actually read the RCEC EA capability, and
> (b) in walk_rcec() and pcie_link_rcec(), where we only use it to test
> whether the device has an RCEC EA capability.
>
> Couldn't we accomplish (b) just by testing "dev->rcec_ext"?  Then we
> wouldn't need to save rcec_cap at all.

Yes, rcec_ext really implies that the cap is there so the explicit cap 
would not be necessary.  Will remove and make use of the renamed rcec_ea 
(per below).

>
>> +	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint extended 
>> capabilities */
>
> Maybe "rcec_ea"?  The important part is that this is the Endpoint
> Association information.  The fact that it happens to be an "extended"
> capability isn't very interesting.

Right, better to be more specific here.  Will change.

Thanks,

Sean

>
>>  #endif
>>  	u8		pcie_cap;	/* PCIe capability offset */
>>  	u8		msi_cap;	/* MSI capability offset */
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 05/10] PCI/AER: Apply function level reset to RCiEP on fatal error
  2020-09-22 21:38 ` [PATCH v6 05/10] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
@ 2020-09-25 21:58   ` Bjorn Helgaas
  2020-09-25 22:54     ` Sean V Kelley
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2020-09-25 21:58 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci,
	linux-kernel

On Tue, Sep 22, 2020 at 02:38:54PM -0700, Sean V Kelley wrote:
> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> 
> Attempt to do function level reset for an RCiEP associated with an
> RCEC device on fatal error.
> 
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> ---
>  drivers/pci/pcie/err.c | 31 ++++++++++++++++++++++---------
>  1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index e575fa6cee63..5380ecc41506 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -169,6 +169,17 @@ static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct pci_dev *,
>  		cb(bridge, userdata);
>  }
>  
> +static pci_ers_result_t flr_on_rciep(struct pci_dev *dev)
> +{
> +	if (!pcie_has_flr(dev))
> +		return PCI_ERS_RESULT_NONE;
> +
> +	if (pcie_flr(dev))
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	return PCI_ERS_RESULT_RECOVERED;
> +}

Either we reset the device or we didn't; there is no third option.

If I read it correctly, nothing in pcie_do_recovery() cares about the
difference between PCI_ERS_RESULT_NONE and PCI_ERS_RESULT_DISCONNECT.
The status does get returned, but only edr_handle_event() looks at it,
and it doesn't care about the difference either.

So I think this should just return PCI_ERS_RESULT_RECOVERED or
PCI_ERS_RESULT_DISCONNECT.

>  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  			pci_channel_state_t state,
>  			pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev *pdev))
> @@ -195,15 +206,17 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	if (state == pci_channel_io_frozen) {
>  		pci_bridge_walk(bridge, report_frozen_detected, &status);
>  		if (type == PCI_EXP_TYPE_RC_END) {
> -			pci_warn(dev, "link reset not possible for RCiEP\n");
> -			status = PCI_ERS_RESULT_NONE;
> -			goto failed;
> -		}
> -
> -		status = reset_subordinate_devices(bridge);
> -		if (status != PCI_ERS_RESULT_RECOVERED) {
> -			pci_warn(dev, "subordinate device reset failed\n");
> -			goto failed;
> +			status = flr_on_rciep(dev);
> +			if (status != PCI_ERS_RESULT_RECOVERED) {
> +				pci_warn(dev, "function level reset failed\n");
> +				goto failed;
> +			}
> +		} else {
> +			status = reset_subordinate_devices(bridge);
> +			if (status != PCI_ERS_RESULT_RECOVERED) {
> +				pci_warn(dev, "subordinate device reset failed\n");
> +				goto failed;
> +			}
>  		}
>  	} else {
>  		pci_bridge_walk(bridge, report_normal_detected, &status);
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 04/10] PCI/AER: Extend AER error handling to RCECs
  2020-09-25 21:14   ` Bjorn Helgaas
@ 2020-09-25 22:08     ` Sean V Kelley
  0 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-25 22:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sean V Kelley, bhelgaas, Jonathan.Cameron, rafael.j.wysocki,
	ashok.raj, tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo,
	linux-pci, linux-kernel, Sinan Kaya

On 25 Sep 2020, at 14:14, Bjorn Helgaas wrote:

> [+cc Sinan, who's been reviewing changes in this area (thanks, 
> Sinan!)]
>
> On Tue, Sep 22, 2020 at 02:38:53PM -0700, Sean V Kelley wrote:
>> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>
>> Currently the kernel does not handle AER errors for Root Complex
>> integrated End Points (RCiEPs)[0]. These devices sit on a root bus 
>> within
>> the Root Complex (RC). AER handling is performed by a Root Complex 
>> Event
>> Collector (RCEC) [1] which is a effectively a type of RCiEP on the 
>> same
>> root bus.
>>
>> For an RCEC (technically not a Bridge), error messages "received" 
>> from
>> associated RCiEPs must be enabled for "transmission" in order to 
>> cause a
>> System Error via the Root Control register or (when the Advanced 
>> Error
>> Reporting Capability is present) reporting via the Root Error Command
>> register and logging in the Root Error Status register and Error 
>> Source
>> Identification register.
>>
>> In addition to the defined OS level handling of the reset flow for 
>> the
>> associated RCiEPs of an RCEC, it is possible to also have non-native
>> handling. In that case there is no need to take any actions on the 
>> RCEC
>> because the firmware is responsible for them. This is true where APEI 
>> [2]
>> is used to report the AER errors via a GHES[v2] HEST entry [3] and
>> relevant AER CPER record [4] and non-native handling is in use.
>>
>> We effectively end up with two different types of discovery for
>> purposes of handling AER errors:
>>
>> 1) Normal bus walk - we pass the downstream port above a bus to which
>> the device is attached and it walks everything below that point.
>>
>> 2) An RCiEP with no visible association with an RCEC as there is no 
>> need
>> to walk devices. In that case, the flow is to just call the callbacks 
>> for
>> the actual device, which in turn references its associated RCEC.
>>
>> A new walk function pci_bridge_walk(), similar to pci_bus_walk(),
>> is provided that takes a pci_dev instead of a bus. If that bridge
>> corresponds to a downstream port it will walk the subordinate bus of
>> that bridge. If the device does not then it will call the function on
>> that device alone.
>>
>> [0] ACPI PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex
>> Integrated Endpoint Rules.
>> [1] ACPI PCI Express Base Specification 5.0-1 6.2 Error Signalling 
>> and
>> Logging
>> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface 
>> (APEI)
>> [3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
>> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
>>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>
> I like this patch.  I think there are a few things that could be
> peeled off as "no functional change" preliminary patches to make the
> important changes more obvious in the "money patch".

Great, it was helpful to discuss at LPC to give it more clarity.
>
>> ---
>>  drivers/pci/pci.h      |  2 +-
>>  drivers/pci/pcie/err.c | 77 
>> +++++++++++++++++++++++++++++++-----------
>>  2 files changed, 59 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index 83670a6425d8..7b547fc3679a 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -575,7 +575,7 @@ static inline int 
>> pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
>>  /* PCI error reporting and recovery */
>>  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>  			pci_channel_state_t state,
>> -			pci_ers_result_t (*reset_link)(struct pci_dev *pdev));
>> +			pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev 
>> *pdev));
>>
>>  bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
>>  #ifdef CONFIG_PCIEASPM
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index c543f419d8f9..e575fa6cee63 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -146,38 +146,73 @@ static int report_resume(struct pci_dev *dev, 
>> void *data)
>>  	return 0;
>>  }
>>
>> +/**
>> + * pci_bridge_walk - walk bridges potentially AER affected
>> + * @bridge   bridge which may be an RCEC with associated RCiEPs,
>> + *           an RCiEP associated with an RCEC, or a Port.
>> + * @cb       callback to be called for each device found
>> + * @userdata arbitrary pointer to be passed to callback.
>> + *
>> + * If the device provided is a bridge, walk the subordinate bus,
>> + * including any bridged devices on buses under this bus.
>> + * Call the provided callback on each device found.
>> + *
>> + * If the device provided has no subordinate bus, call the provided
>> + * callback on the device itself.
>> + */
>> +static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct 
>> pci_dev *, void *),
>
> Maybe call this pci_walk_bridge() so it's the same order as the
> existing pci_walk_bus(), unless there's some reason to be different.

Yes, I was wanting to distinguish it from pci_walk_bus() in some way 
because it incorporated it and I wanted to put emphasis on the bridge 
first. But in retrospect, I’m saying the same thing so might as well 
be consistent! Will change.

>
>> +			    void *userdata)
>> +{
>> +	if (bridge->subordinate)
>> +		pci_walk_bus(bridge->subordinate, cb, userdata);
>> +	else
>> +		cb(bridge, userdata);
>> +}
>> +
>>  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>  			pci_channel_state_t state,
>> -			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>> +			pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev 
>> *pdev))
>
> The rename to "reset_subordinate_devices" seems separable, since it
> doesn't change the interface.

Agree, it’s rather independent. Also changed a warning string output. 
Will make separate.

>
>>  {
>>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>> -	struct pci_bus *bus;
>> +	struct pci_dev *bridge;
>> +	int type;
>>
>>  	/*
>> -	 * Error recovery runs on all subordinates of the first downstream 
>> port.
>> -	 * If the downstream port detected the error, it is cleared at the 
>> end.
>> +	 * Error recovery runs on all subordinates of the first downstream
>> +	 * bridge. If the downstream bridge detected the error, it is
>> +	 * cleared at the end. For RCiEPs we should reset just the RCiEP 
>> itself.
>>  	 */
>> -	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
>> -		dev = dev->bus->self;
>> -	bus = dev->subordinate;
>> +	type = pci_pcie_type(dev);
>> +	if (type == PCI_EXP_TYPE_ROOT_PORT ||
>> +	    type == PCI_EXP_TYPE_DOWNSTREAM ||
>> +	    type == PCI_EXP_TYPE_RC_EC ||
>> +	    type == PCI_EXP_TYPE_RC_END)
>> +		bridge = dev;
>> +	else
>> +		bridge = pci_upstream_bridge(dev);
>
> This makes it much easier to read, thank you.  I think the addition of
> "type", rename of "dev" to "bridge", the inversion of the condition
> (major improvement, thanks again), and use of pci_upstream_bridge()
> instead of dev->bus->self might also be separable?

Agree, it’s separable. Will do.

>
> Of course, you'd have to add the RC_EC and RC_END cases later, in the
> money patch, but that's a good thing because it won't get lost in all
> the other things being changed.

Makes sense to me.

>
>>  	pci_dbg(dev, "broadcast error_detected message\n");
>>  	if (state == pci_channel_io_frozen) {
>> -		pci_walk_bus(bus, report_frozen_detected, &status);
>> -		status = reset_link(dev);
>> +		pci_bridge_walk(bridge, report_frozen_detected, &status);
>> +		if (type == PCI_EXP_TYPE_RC_END) {
>> +			pci_warn(dev, "link reset not possible for RCiEP\n");
>> +			status = PCI_ERS_RESULT_NONE;
>> +			goto failed;
>> +		}
>> +
>> +		status = reset_subordinate_devices(bridge);
>>  		if (status != PCI_ERS_RESULT_RECOVERED) {
>> -			pci_warn(dev, "link reset failed\n");
>> +			pci_warn(dev, "subordinate device reset failed\n");
>>  			goto failed;
>>  		}
>>  	} else {
>> -		pci_walk_bus(bus, report_normal_detected, &status);
>> +		pci_bridge_walk(bridge, report_normal_detected, &status);
>>  	}
>>
>>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>  		status = PCI_ERS_RESULT_RECOVERED;
>>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
>> -		pci_walk_bus(bus, report_mmio_enabled, &status);
>> +		pci_bridge_walk(bridge, report_mmio_enabled, &status);
>>  	}
>>
>>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
>> @@ -188,18 +223,22 @@ pci_ers_result_t pcie_do_recovery(struct 
>> pci_dev *dev,
>>  		 */
>>  		status = PCI_ERS_RESULT_RECOVERED;
>>  		pci_dbg(dev, "broadcast slot_reset message\n");
>> -		pci_walk_bus(bus, report_slot_reset, &status);
>> +		pci_bridge_walk(bridge, report_slot_reset, &status);
>>  	}
>>
>>  	if (status != PCI_ERS_RESULT_RECOVERED)
>>  		goto failed;
>>
>>  	pci_dbg(dev, "broadcast resume message\n");
>> -	pci_walk_bus(bus, report_resume, &status);
>> -
>> -	if (pcie_aer_is_native(dev))
>> -		pcie_clear_device_status(dev);
>> -	pci_aer_clear_nonfatal_status(dev);
>> +	pci_bridge_walk(bridge, report_resume, &status);
>> +
>> +	if (type == PCI_EXP_TYPE_ROOT_PORT ||
>> +	    type == PCI_EXP_TYPE_DOWNSTREAM ||
>> +	    type == PCI_EXP_TYPE_RC_EC) {
>
> Addition of this check also seems worthy of a separate patch (for just
> root ports and downstream ports first, then RC_EC being added later).
> That would make a convenient place to explain why we need the change.
> I think it's *correct*; it just gets lost in the noise and not even
> mentioned when it's done as part of one big patch.

It does feel a bit out of place here and having the space to also 
explain why this change is needed would be worthy of the separation. 
Will do.

Thanks!

Sean


>
>> +		if (pcie_aer_is_native(bridge))
>> +			pcie_clear_device_status(bridge);
>> +		pci_aer_clear_nonfatal_status(bridge);
>> +	}
>>  	pci_info(dev, "device recovery successful\n");
>>  	return status;
>>
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 06/10] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs
  2020-09-22 21:38 ` [PATCH v6 06/10] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs Sean V Kelley
@ 2020-09-25 22:15   ` Bjorn Helgaas
  2020-09-28 16:20     ` Sean V Kelley
  0 siblings, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2020-09-25 22:15 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci,
	linux-kernel, Sean V Kelley

On Tue, Sep 22, 2020 at 02:38:55PM -0700, Sean V Kelley wrote:
> From: Sean V Kelley <sean.v.kelley@intel.com>
> 
> A Root Complex Event Collector provides support for
> terminating error and PME messages from associated RCiEPs.
> 
> Make use of the RCEC Endpoint Association Extended Capability
> to identify associated RCiEPs. Link the associated RCiEPs as
> the RCECs are enumerated.
> 
> Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> ---
>  drivers/pci/pci.h              |  2 +
>  drivers/pci/pcie/portdrv_pci.c |  3 ++
>  drivers/pci/pcie/rcec.c        | 96 ++++++++++++++++++++++++++++++++++
>  include/linux/pci.h            |  1 +
>  4 files changed, 102 insertions(+)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 7b547fc3679a..ddb5872466fb 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -474,9 +474,11 @@ static inline void pci_dpc_init(struct pci_dev *pdev) {}
>  #ifdef CONFIG_PCIEPORTBUS
>  void pci_rcec_init(struct pci_dev *dev);
>  void pci_rcec_exit(struct pci_dev *dev);
> +void pcie_link_rcec(struct pci_dev *rcec);
>  #else
>  static inline void pci_rcec_init(struct pci_dev *dev) {}
>  static inline void pci_rcec_exit(struct pci_dev *dev) {}
> +static inline void pcie_link_rcec(struct pci_dev *rcec) {}
>  #endif
>  
>  #ifdef CONFIG_PCI_ATS
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 4d880679b9b1..dbeb0155c2c3 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -110,6 +110,9 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
>  		return -ENODEV;
>  
> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
> +		pcie_link_rcec(dev);

Nice solution.  One day we'll get rid of pcie_portdrv_probe() and
integrate this stuff better into the PCI core, and we'll have to
figure out a little different solution then.  But we'll be smarter
then so it should be possible :)

>  	status = pcie_port_device_register(dev);
>  	if (status)
>  		return status;
> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> index 519ae086ff41..5630480a6659 100644
> --- a/drivers/pci/pcie/rcec.c
> +++ b/drivers/pci/pcie/rcec.c
> @@ -17,6 +17,102 @@
>  
>  #include "../pci.h"
>  
> +struct walk_rcec_data {
> +	struct pci_dev *rcec;
> +	int (*user_callback)(struct pci_dev *dev, void *data);
> +	void *user_data;
> +};
> +
> +static bool rcec_assoc_rciep(struct pci_dev *rcec, struct pci_dev *rciep)
> +{
> +	unsigned long bitmap = rcec->rcec_ext->bitmap;
> +	unsigned int devn;
> +
> +	/* An RCiEP found on bus in range */
> +	if (rcec->bus->number != rciep->bus->number)
> +		return true;
> +
> +	/* Same bus, so check bitmap */
> +	for_each_set_bit(devn, &bitmap, 32)
> +		if (devn == rciep->devfn)
> +			return true;
> +
> +	return false;
> +}
> +
> +static int link_rcec_helper(struct pci_dev *dev, void *data)
> +{
> +	struct walk_rcec_data *rcec_data = data;
> +	struct pci_dev *rcec = rcec_data->rcec;
> +
> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) && rcec_assoc_rciep(rcec, dev)) {
> +		dev->rcec = rcec;
> +		pci_dbg(dev, "PME & error events reported via %s\n", pci_name(rcec));
> +	}
> +
> +	return 0;
> +}
> +
> +void walk_rcec(int (*cb)(struct pci_dev *dev, void *data), void *userdata)
> +{
> +	struct walk_rcec_data *rcec_data = userdata;
> +	struct pci_dev *rcec = rcec_data->rcec;
> +	u8 nextbusn, lastbusn;
> +	struct pci_bus *bus;
> +	unsigned int bnr;
> +
> +	if (!rcec->rcec_cap)
> +		return;
> +
> +	/* Walk own bus for bitmap based association */
> +	pci_walk_bus(rcec->bus, cb, rcec_data);
> +
> +	/* Check whether RCEC BUSN register is present */
> +	if (rcec->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
> +		return;
> +
> +	nextbusn = rcec->rcec_ext->nextbusn;
> +	lastbusn = rcec->rcec_ext->lastbusn;
> +
> +	/* All RCiEP devices are on the same bus as the RCEC */
> +	if (nextbusn == 0xff && lastbusn == 0x00)
> +		return;
> +
> +	for (bnr = nextbusn; bnr <= lastbusn; bnr++) {
> +		/* No association indicated (PCIe 5.0-1, 7.9.10.3) */
> +		if (bnr == rcec->bus->number)
> +			continue;
> +
> +		bus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
> +		if (!bus)
> +			continue;
> +
> +		/* Find RCiEP devices on the given bus ranges */
> +		pci_walk_bus(bus, cb, rcec_data);
> +	}
> +}
> +
> +/**
> + * pcie_link_rcec - Link RCiEP devices associating with RCEC.
> + * @rcec     RCEC whose RCiEP devices should be linked.
> + *
> + * Link the given RCEC to each RCiEP device found.
> + *
> + */
> +void pcie_link_rcec(struct pci_dev *rcec)
> +{
> +	struct walk_rcec_data rcec_data;
> +
> +	if (!rcec->rcec_cap)
> +		return;
> +
> +	rcec_data.rcec = rcec;
> +	rcec_data.user_callback = NULL;
> +	rcec_data.user_data = NULL;
> +
> +	walk_rcec(link_rcec_helper, &rcec_data);
> +}
> +
>  void pci_rcec_init(struct pci_dev *dev)
>  {
>  	u32 rcec, hdr, busn;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 5c5c4eb642b6..ad382a9484ea 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -330,6 +330,7 @@ struct pci_dev {
>  #ifdef CONFIG_PCIEPORTBUS
>  	u16		rcec_cap;	/* RCEC capability offset */
>  	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint extended capabilities */
> +	struct pci_dev  *rcec;          /* Associated RCEC device */

Wondering if we can put this pointer inside the struct rcec_ext (or
whatever we call it) so we don't have to pay *two* pointers for every
PCI device?  Maybe it should be something like this:

  struct rcec {
    u8			ea_ver;
    u8			ea_nextbusn;
    u8			ea_lastbusn;
    u32			ea_bitmap;
    struct pci_dev	*rcec;
  }

I dunno.  Not sure if that would be better or worse, since we'd be
mixing RCEC stuff (the EA capability info) with RCiEP stuff (the
pointer to the related RCEC).  Could even be a union, since any given
device only needs one of them, but I'm pretty sure that would be
worse.

BTW, 03/10 didn't add a forward declaration, e.g.,

  struct rcec_ext;

to include/linux/pci.h, and the actual definition is in
drivers/pci/pci.h.  It seems like you should need that?

>  #endif
>  	u8		pcie_cap;	/* PCIe capability offset */
>  	u8		msi_cap;	/* MSI capability offset */
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 05/10] PCI/AER: Apply function level reset to RCiEP on fatal error
  2020-09-25 21:58   ` Bjorn Helgaas
@ 2020-09-25 22:54     ` Sean V Kelley
  0 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-25 22:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sean V Kelley, bhelgaas, Jonathan.Cameron, rafael.j.wysocki,
	ashok.raj, tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo,
	linux-pci, linux-kernel

On 25 Sep 2020, at 14:58, Bjorn Helgaas wrote:

> On Tue, Sep 22, 2020 at 02:38:54PM -0700, Sean V Kelley wrote:
>> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>
>> Attempt to do function level reset for an RCiEP associated with an
>> RCEC device on fatal error.
>>
>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> ---
>>  drivers/pci/pcie/err.c | 31 ++++++++++++++++++++++---------
>>  1 file changed, 22 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index e575fa6cee63..5380ecc41506 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -169,6 +169,17 @@ static void pci_bridge_walk(struct pci_dev 
>> *bridge, int (*cb)(struct pci_dev *,
>>  		cb(bridge, userdata);
>>  }
>>
>> +static pci_ers_result_t flr_on_rciep(struct pci_dev *dev)
>> +{
>> +	if (!pcie_has_flr(dev))
>> +		return PCI_ERS_RESULT_NONE;
>> +
>> +	if (pcie_flr(dev))
>> +		return PCI_ERS_RESULT_DISCONNECT;
>> +
>> +	return PCI_ERS_RESULT_RECOVERED;
>> +}
>
> Either we reset the device or we didn't; there is no third option.
>
> If I read it correctly, nothing in pcie_do_recovery() cares about the
> difference between PCI_ERS_RESULT_NONE and PCI_ERS_RESULT_DISCONNECT.
> The status does get returned, but only edr_handle_event() looks at it,
> and it doesn't care about the difference either.
>
> So I think this should just return PCI_ERS_RESULT_RECOVERED or
> PCI_ERS_RESULT_DISCONNECT.

You raise a good point. As currently implemented, the usage requires 
either a reset or not result. Will change.

Thanks,

Sean

>
>>  pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>  			pci_channel_state_t state,
>>  			pci_ers_result_t (*reset_subordinate_devices)(struct pci_dev 
>> *pdev))
>> @@ -195,15 +206,17 @@ pci_ers_result_t pcie_do_recovery(struct 
>> pci_dev *dev,
>>  	if (state == pci_channel_io_frozen) {
>>  		pci_bridge_walk(bridge, report_frozen_detected, &status);
>>  		if (type == PCI_EXP_TYPE_RC_END) {
>> -			pci_warn(dev, "link reset not possible for RCiEP\n");
>> -			status = PCI_ERS_RESULT_NONE;
>> -			goto failed;
>> -		}
>> -
>> -		status = reset_subordinate_devices(bridge);
>> -		if (status != PCI_ERS_RESULT_RECOVERED) {
>> -			pci_warn(dev, "subordinate device reset failed\n");
>> -			goto failed;
>> +			status = flr_on_rciep(dev);
>> +			if (status != PCI_ERS_RESULT_RECOVERED) {
>> +				pci_warn(dev, "function level reset failed\n");
>> +				goto failed;
>> +			}
>> +		} else {
>> +			status = reset_subordinate_devices(bridge);
>> +			if (status != PCI_ERS_RESULT_RECOVERED) {
>> +				pci_warn(dev, "subordinate device reset failed\n");
>> +				goto failed;
>> +			}
>>  		}
>>  	} else {
>>  		pci_bridge_walk(bridge, report_normal_detected, &status);
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
  2020-09-22 21:38 ` [PATCH v6 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR Sean V Kelley
@ 2020-09-27 23:45   ` Bjorn Helgaas
  2020-09-28  1:47     ` Bjorn Helgaas
  2020-09-28 17:27     ` Sean V Kelley
  0 siblings, 2 replies; 25+ messages in thread
From: Bjorn Helgaas @ 2020-09-27 23:45 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci,
	linux-kernel, Sean V Kelley

On Tue, Sep 22, 2020 at 02:38:56PM -0700, Sean V Kelley wrote:
> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> 
> When attempting error recovery for an RCiEP associated with an RCEC device,
> there needs to be a way to update the Root Error Status, the Uncorrectable
> Error Status and the Uncorrectable Error Severity of the parent RCEC.
> In some non-native cases in which there is no OS visible device
> associated with the RCiEP, there is nothing to act upon as the firmware
> is acting before the OS. So add handling for the linked 'rcec' in AER/ERR
> while taking into account non-native cases.
> 
> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> ---
>  drivers/pci/pcie/aer.c |  9 +++++----
>  drivers/pci/pcie/err.c | 38 ++++++++++++++++++++++++--------------
>  2 files changed, 29 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 65dff5f3457a..dccdba60b5d9 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1358,17 +1358,18 @@ static int aer_probe(struct pcie_device *dev)
>  static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>  {
>  	int aer = dev->aer_cap;
> +	int rc = 0;
>  	u32 reg32;
> -	int rc;
> -
>  
>  	/* Disable Root's interrupt in response to error messages */
>  	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>  	reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>  	pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>  
> -	rc = pci_bus_error_reset(dev);
> -	pci_info(dev, "Root Port link has been reset\n");
> +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC) {
> +		rc = pci_bus_error_reset(dev);
> +		pci_info(dev, "Root Port link has been reset\n");
> +	}
>  
>  	/* Clear Root Error Status */
>  	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 5380ecc41506..a61a2518163a 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -149,7 +149,8 @@ static int report_resume(struct pci_dev *dev, void *data)
>  /**
>   * pci_bridge_walk - walk bridges potentially AER affected
>   * @bridge   bridge which may be an RCEC with associated RCiEPs,
> - *           an RCiEP associated with an RCEC, or a Port.
> + *           or a Port.
> + * @dev      an RCiEP lacking an associated RCEC.
>   * @cb       callback to be called for each device found
>   * @userdata arbitrary pointer to be passed to callback.
>   *
> @@ -160,13 +161,16 @@ static int report_resume(struct pci_dev *dev, void *data)
>   * If the device provided has no subordinate bus, call the provided
>   * callback on the device itself.
>   */
> -static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct pci_dev *, void *),
> +static void pci_bridge_walk(struct pci_dev *bridge, struct pci_dev *dev,
> +			    int (*cb)(struct pci_dev *, void *),
>  			    void *userdata)
>  {
> -	if (bridge->subordinate)
> +	if (bridge && bridge->subordinate)

I *guess* this means there's a possibility that bridge is NULL?  And I
guess that case is when pci_upstream_bridge(dev) in pcie_do_recovery()
was NULL?

I can't figure out what that means in practice.  Oh, wait, I bet I
know -- this is the non-native case where there's no OS-visible
reporting device.

We need some sort of comment because this is really not a top-of-mind
situation unless you happen to be working on one of the few systems
like that.

Not too sure this scenario is really a good fit for the
pci_bridge_walk() model, but I think it's OK for now with a hint so we
can remember this no RCEC, no Root Port model.

>  		pci_walk_bus(bridge->subordinate, cb, userdata);
> -	else
> +	else if (bridge)
>  		cb(bridge, userdata);
> +	else
> +		cb(dev, userdata);
>  }
>  
>  static pci_ers_result_t flr_on_rciep(struct pci_dev *dev)
> @@ -196,16 +200,24 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	type = pci_pcie_type(dev);
>  	if (type == PCI_EXP_TYPE_ROOT_PORT ||
>  	    type == PCI_EXP_TYPE_DOWNSTREAM ||
> -	    type == PCI_EXP_TYPE_RC_EC ||
> -	    type == PCI_EXP_TYPE_RC_END)
> +	    type == PCI_EXP_TYPE_RC_EC)
>  		bridge = dev;
> +	else if (type == PCI_EXP_TYPE_RC_END)
> +		bridge = dev->rcec;
>  	else
>  		bridge = pci_upstream_bridge(dev);
>  
>  	pci_dbg(dev, "broadcast error_detected message\n");
>  	if (state == pci_channel_io_frozen) {
> -		pci_bridge_walk(bridge, report_frozen_detected, &status);
> +		pci_bridge_walk(bridge, dev, report_frozen_detected, &status);
>  		if (type == PCI_EXP_TYPE_RC_END) {
> +			/*
> +			 * The callback only clears the Root Error Status
> +			 * of the RCEC (see aer.c).
> +			 */
> +			if (bridge)
> +				reset_subordinate_devices(bridge);

It's unfortunate to add yet another special case in this code, and I'm
not thrilled about the one in aer_root_reset() either.  It's just not
obvious why they should be there.  I'm sure if I spent 30 minutes
decoding things, it would all make sense.  Guess I'm just griping
because I don't have a better suggestion.

>  			status = flr_on_rciep(dev);
>  			if (status != PCI_ERS_RESULT_RECOVERED) {
>  				pci_warn(dev, "function level reset failed\n");
> @@ -219,13 +231,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  			}
>  		}
>  	} else {
> -		pci_bridge_walk(bridge, report_normal_detected, &status);
> +		pci_bridge_walk(bridge, dev, report_normal_detected, &status);
>  	}
>  
>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>  		status = PCI_ERS_RESULT_RECOVERED;
>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
> -		pci_bridge_walk(bridge, report_mmio_enabled, &status);
> +		pci_bridge_walk(bridge, dev, report_mmio_enabled, &status);
>  	}
>  
>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> @@ -236,18 +248,16 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  		 */
>  		status = PCI_ERS_RESULT_RECOVERED;
>  		pci_dbg(dev, "broadcast slot_reset message\n");
> -		pci_bridge_walk(bridge, report_slot_reset, &status);
> +		pci_bridge_walk(bridge, dev, report_slot_reset, &status);
>  	}
>  
>  	if (status != PCI_ERS_RESULT_RECOVERED)
>  		goto failed;
>  
>  	pci_dbg(dev, "broadcast resume message\n");
> -	pci_bridge_walk(bridge, report_resume, &status);
> +	pci_bridge_walk(bridge, dev, report_resume, &status);
>  
> -	if (type == PCI_EXP_TYPE_ROOT_PORT ||
> -	    type == PCI_EXP_TYPE_DOWNSTREAM ||
> -	    type == PCI_EXP_TYPE_RC_EC) {
> +	if (bridge) {
>  		if (pcie_aer_is_native(bridge))
>  			pcie_clear_device_status(bridge);
>  		pci_aer_clear_nonfatal_status(bridge);
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
  2020-09-27 23:45   ` Bjorn Helgaas
@ 2020-09-28  1:47     ` Bjorn Helgaas
  2020-09-28 21:36       ` Sean V Kelley
  2020-09-28 17:27     ` Sean V Kelley
  1 sibling, 1 reply; 25+ messages in thread
From: Bjorn Helgaas @ 2020-09-28  1:47 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: bhelgaas, Jonathan.Cameron, rafael.j.wysocki, ashok.raj,
	tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci,
	linux-kernel, Sean V Kelley

On Sun, Sep 27, 2020 at 06:45:45PM -0500, Bjorn Helgaas wrote:
> On Tue, Sep 22, 2020 at 02:38:56PM -0700, Sean V Kelley wrote:
> > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

> >  	pci_dbg(dev, "broadcast error_detected message\n");
> >  	if (state == pci_channel_io_frozen) {
> > -		pci_bridge_walk(bridge, report_frozen_detected, &status);
> > +		pci_bridge_walk(bridge, dev, report_frozen_detected, &status);
> >  		if (type == PCI_EXP_TYPE_RC_END) {
> > +			/*
> > +			 * The callback only clears the Root Error Status
> > +			 * of the RCEC (see aer.c).
> > +			 */
> > +			if (bridge)
> > +				reset_subordinate_devices(bridge);
> 
> It's unfortunate to add yet another special case in this code, and I'm
> not thrilled about the one in aer_root_reset() either.  It's just not
> obvious why they should be there.  I'm sure if I spent 30 minutes
> decoding things, it would all make sense.  Guess I'm just griping
> because I don't have a better suggestion.

I'm sorry, this was unkind of me.  If I don't have a constructive
idea, there's no reason for me to complain about this.  I apologize.

Bjorn

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 06/10] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs
  2020-09-25 22:15   ` Bjorn Helgaas
@ 2020-09-28 16:20     ` Sean V Kelley
  0 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-28 16:20 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sean V Kelley, bhelgaas, Jonathan.Cameron, rafael.j.wysocki,
	ashok.raj, tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo,
	linux-pci, linux-kernel

On 25 Sep 2020, at 15:15, Bjorn Helgaas wrote:

> On Tue, Sep 22, 2020 at 02:38:55PM -0700, Sean V Kelley wrote:
>> From: Sean V Kelley <sean.v.kelley@intel.com>
>>
>> A Root Complex Event Collector provides support for
>> terminating error and PME messages from associated RCiEPs.
>>
>> Make use of the RCEC Endpoint Association Extended Capability
>> to identify associated RCiEPs. Link the associated RCiEPs as
>> the RCECs are enumerated.
>>
>> Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>> ---
>>  drivers/pci/pci.h              |  2 +
>>  drivers/pci/pcie/portdrv_pci.c |  3 ++
>>  drivers/pci/pcie/rcec.c        | 96 
>> ++++++++++++++++++++++++++++++++++
>>  include/linux/pci.h            |  1 +
>>  4 files changed, 102 insertions(+)
>>
>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>> index 7b547fc3679a..ddb5872466fb 100644
>> --- a/drivers/pci/pci.h
>> +++ b/drivers/pci/pci.h
>> @@ -474,9 +474,11 @@ static inline void pci_dpc_init(struct pci_dev 
>> *pdev) {}
>>  #ifdef CONFIG_PCIEPORTBUS
>>  void pci_rcec_init(struct pci_dev *dev);
>>  void pci_rcec_exit(struct pci_dev *dev);
>> +void pcie_link_rcec(struct pci_dev *rcec);
>>  #else
>>  static inline void pci_rcec_init(struct pci_dev *dev) {}
>>  static inline void pci_rcec_exit(struct pci_dev *dev) {}
>> +static inline void pcie_link_rcec(struct pci_dev *rcec) {}
>>  #endif
>>
>>  #ifdef CONFIG_PCI_ATS
>> diff --git a/drivers/pci/pcie/portdrv_pci.c 
>> b/drivers/pci/pcie/portdrv_pci.c
>> index 4d880679b9b1..dbeb0155c2c3 100644
>> --- a/drivers/pci/pcie/portdrv_pci.c
>> +++ b/drivers/pci/pcie/portdrv_pci.c
>> @@ -110,6 +110,9 @@ static int pcie_portdrv_probe(struct pci_dev 
>> *dev,
>>  	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
>>  		return -ENODEV;
>>
>> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
>> +		pcie_link_rcec(dev);
>
> Nice solution.  One day we'll get rid of pcie_portdrv_probe() and
> integrate this stuff better into the PCI core, and we'll have to
> figure out a little different solution then.  But we'll be smarter
> then so it should be possible :)

Indeed!

>
>>  	status = pcie_port_device_register(dev);
>>  	if (status)
>>  		return status;
>> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
>> index 519ae086ff41..5630480a6659 100644
>> --- a/drivers/pci/pcie/rcec.c
>> +++ b/drivers/pci/pcie/rcec.c
>> @@ -17,6 +17,102 @@
>>
>>  #include "../pci.h"
>>
>> +struct walk_rcec_data {
>> +	struct pci_dev *rcec;
>> +	int (*user_callback)(struct pci_dev *dev, void *data);
>> +	void *user_data;
>> +};
>> +
>> +static bool rcec_assoc_rciep(struct pci_dev *rcec, struct pci_dev 
>> *rciep)
>> +{
>> +	unsigned long bitmap = rcec->rcec_ext->bitmap;
>> +	unsigned int devn;
>> +
>> +	/* An RCiEP found on bus in range */
>> +	if (rcec->bus->number != rciep->bus->number)
>> +		return true;
>> +
>> +	/* Same bus, so check bitmap */
>> +	for_each_set_bit(devn, &bitmap, 32)
>> +		if (devn == rciep->devfn)
>> +			return true;
>> +
>> +	return false;
>> +}
>> +
>> +static int link_rcec_helper(struct pci_dev *dev, void *data)
>> +{
>> +	struct walk_rcec_data *rcec_data = data;
>> +	struct pci_dev *rcec = rcec_data->rcec;
>> +
>> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) && 
>> rcec_assoc_rciep(rcec, dev)) {
>> +		dev->rcec = rcec;
>> +		pci_dbg(dev, "PME & error events reported via %s\n", 
>> pci_name(rcec));
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +void walk_rcec(int (*cb)(struct pci_dev *dev, void *data), void 
>> *userdata)
>> +{
>> +	struct walk_rcec_data *rcec_data = userdata;
>> +	struct pci_dev *rcec = rcec_data->rcec;
>> +	u8 nextbusn, lastbusn;
>> +	struct pci_bus *bus;
>> +	unsigned int bnr;
>> +
>> +	if (!rcec->rcec_cap)
>> +		return;
>> +
>> +	/* Walk own bus for bitmap based association */
>> +	pci_walk_bus(rcec->bus, cb, rcec_data);
>> +
>> +	/* Check whether RCEC BUSN register is present */
>> +	if (rcec->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
>> +		return;
>> +
>> +	nextbusn = rcec->rcec_ext->nextbusn;
>> +	lastbusn = rcec->rcec_ext->lastbusn;
>> +
>> +	/* All RCiEP devices are on the same bus as the RCEC */
>> +	if (nextbusn == 0xff && lastbusn == 0x00)
>> +		return;
>> +
>> +	for (bnr = nextbusn; bnr <= lastbusn; bnr++) {
>> +		/* No association indicated (PCIe 5.0-1, 7.9.10.3) */
>> +		if (bnr == rcec->bus->number)
>> +			continue;
>> +
>> +		bus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
>> +		if (!bus)
>> +			continue;
>> +
>> +		/* Find RCiEP devices on the given bus ranges */
>> +		pci_walk_bus(bus, cb, rcec_data);
>> +	}
>> +}
>> +
>> +/**
>> + * pcie_link_rcec - Link RCiEP devices associating with RCEC.
>> + * @rcec     RCEC whose RCiEP devices should be linked.
>> + *
>> + * Link the given RCEC to each RCiEP device found.
>> + *
>> + */
>> +void pcie_link_rcec(struct pci_dev *rcec)
>> +{
>> +	struct walk_rcec_data rcec_data;
>> +
>> +	if (!rcec->rcec_cap)
>> +		return;
>> +
>> +	rcec_data.rcec = rcec;
>> +	rcec_data.user_callback = NULL;
>> +	rcec_data.user_data = NULL;
>> +
>> +	walk_rcec(link_rcec_helper, &rcec_data);
>> +}
>> +
>>  void pci_rcec_init(struct pci_dev *dev)
>>  {
>>  	u32 rcec, hdr, busn;
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 5c5c4eb642b6..ad382a9484ea 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -330,6 +330,7 @@ struct pci_dev {
>>  #ifdef CONFIG_PCIEPORTBUS
>>  	u16		rcec_cap;	/* RCEC capability offset */
>>  	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint extended 
>> capabilities */
>> +	struct pci_dev  *rcec;          /* Associated RCEC device */
>
> Wondering if we can put this pointer inside the struct rcec_ext (or
> whatever we call it) so we don't have to pay *two* pointers for every
> PCI device?  Maybe it should be something like this:

It’s definitely unfortunate to need these two pointers. Here we have 
the spec implying an association, but leaving it to software to fill in 
the gaps! The spec provides a bitmap, but the reverse mapping is also 
needed from the RCiEPs to the RCEC.

>
>   struct rcec {
>     u8			ea_ver;
>     u8			ea_nextbusn;
>     u8			ea_lastbusn;
>     u32			ea_bitmap;
>     struct pci_dev	*rcec;
>   }
>
> I dunno.  Not sure if that would be better or worse, since we'd be
> mixing RCEC stuff (the EA capability info) with RCiEP stuff (the
> pointer to the related RCEC).  Could even be a union, since any given
> device only needs one of them, but I'm pretty sure that would be
> worse.

Well, I think the pointer savings would be offset by lack of clarity on 
what is going on here due to this nesting. Then again, *rcec_ea and 
*rcec are not themselves revealing in terms of the relationship with 
RCiEPs either.  I’ll think on it some more. I do like the ea_ prefix.

>
> BTW, 03/10 didn't add a forward declaration, e.g.,
>
>   struct rcec_ext;
>
> to include/linux/pci.h, and the actual definition is in
> drivers/pci/pci.h.  It seems like you should need that?

Yes, it should be added.  Will do.

Thanks,

Sean
>
>>  #endif
>>  	u8		pcie_cap;	/* PCIe capability offset */
>>  	u8		msi_cap;	/* MSI capability offset */
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
  2020-09-27 23:45   ` Bjorn Helgaas
  2020-09-28  1:47     ` Bjorn Helgaas
@ 2020-09-28 17:27     ` Sean V Kelley
  1 sibling, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-28 17:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sean V Kelley, bhelgaas, Jonathan.Cameron, rafael.j.wysocki,
	ashok.raj, tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo,
	linux-pci, linux-kernel

On 27 Sep 2020, at 16:45, Bjorn Helgaas wrote:

> On Tue, Sep 22, 2020 at 02:38:56PM -0700, Sean V Kelley wrote:
>> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>
>> When attempting error recovery for an RCiEP associated with an RCEC 
>> device,
>> there needs to be a way to update the Root Error Status, the 
>> Uncorrectable
>> Error Status and the Uncorrectable Error Severity of the parent RCEC.
>> In some non-native cases in which there is no OS visible device
>> associated with the RCiEP, there is nothing to act upon as the 
>> firmware
>> is acting before the OS. So add handling for the linked 'rcec' in 
>> AER/ERR
>> while taking into account non-native cases.
>>
>> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>> ---
>>  drivers/pci/pcie/aer.c |  9 +++++----
>>  drivers/pci/pcie/err.c | 38 ++++++++++++++++++++++++--------------
>>  2 files changed, 29 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index 65dff5f3457a..dccdba60b5d9 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1358,17 +1358,18 @@ static int aer_probe(struct pcie_device *dev)
>>  static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>>  {
>>  	int aer = dev->aer_cap;
>> +	int rc = 0;
>>  	u32 reg32;
>> -	int rc;
>> -
>>
>>  	/* Disable Root's interrupt in response to error messages */
>>  	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>>  	reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>>  	pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>>
>> -	rc = pci_bus_error_reset(dev);
>> -	pci_info(dev, "Root Port link has been reset\n");
>> +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC) {
>> +		rc = pci_bus_error_reset(dev);
>> +		pci_info(dev, "Root Port link has been reset\n");
>> +	}
>>
>>  	/* Clear Root Error Status */
>>  	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index 5380ecc41506..a61a2518163a 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -149,7 +149,8 @@ static int report_resume(struct pci_dev *dev, 
>> void *data)
>>  /**
>>   * pci_bridge_walk - walk bridges potentially AER affected
>>   * @bridge   bridge which may be an RCEC with associated RCiEPs,
>> - *           an RCiEP associated with an RCEC, or a Port.
>> + *           or a Port.
>> + * @dev      an RCiEP lacking an associated RCEC.
>>   * @cb       callback to be called for each device found
>>   * @userdata arbitrary pointer to be passed to callback.
>>   *
>> @@ -160,13 +161,16 @@ static int report_resume(struct pci_dev *dev, 
>> void *data)
>>   * If the device provided has no subordinate bus, call the provided
>>   * callback on the device itself.
>>   */
>> -static void pci_bridge_walk(struct pci_dev *bridge, int (*cb)(struct 
>> pci_dev *, void *),
>> +static void pci_bridge_walk(struct pci_dev *bridge, struct pci_dev 
>> *dev,
>> +			    int (*cb)(struct pci_dev *, void *),
>>  			    void *userdata)
>>  {
>> -	if (bridge->subordinate)
>> +	if (bridge && bridge->subordinate)
>
> I *guess* this means there's a possibility that bridge is NULL?  And I
> guess that case is when pci_upstream_bridge(dev) in pcie_do_recovery()
> was NULL?
>
> I can't figure out what that means in practice.  Oh, wait, I bet I
> know -- this is the non-native case where there's no OS-visible
> reporting device.

Yes, this is for the non-native case.

>
> We need some sort of comment because this is really not a top-of-mind
> situation unless you happen to be working on one of the few systems
> like that.

Will do.

>
> Not too sure this scenario is really a good fit for the
> pci_bridge_walk() model, but I think it's OK for now with a hint so we
> can remember this no RCEC, no Root Port model.

Understood.

>
>>  		pci_walk_bus(bridge->subordinate, cb, userdata);
>> -	else
>> +	else if (bridge)
>>  		cb(bridge, userdata);
>> +	else
>> +		cb(dev, userdata);
>>  }
>>
>>  static pci_ers_result_t flr_on_rciep(struct pci_dev *dev)
>> @@ -196,16 +200,24 @@ pci_ers_result_t pcie_do_recovery(struct 
>> pci_dev *dev,
>>  	type = pci_pcie_type(dev);
>>  	if (type == PCI_EXP_TYPE_ROOT_PORT ||
>>  	    type == PCI_EXP_TYPE_DOWNSTREAM ||
>> -	    type == PCI_EXP_TYPE_RC_EC ||
>> -	    type == PCI_EXP_TYPE_RC_END)
>> +	    type == PCI_EXP_TYPE_RC_EC)
>>  		bridge = dev;
>> +	else if (type == PCI_EXP_TYPE_RC_END)
>> +		bridge = dev->rcec;
>>  	else
>>  		bridge = pci_upstream_bridge(dev);
>>
>>  	pci_dbg(dev, "broadcast error_detected message\n");
>>  	if (state == pci_channel_io_frozen) {
>> -		pci_bridge_walk(bridge, report_frozen_detected, &status);
>> +		pci_bridge_walk(bridge, dev, report_frozen_detected, &status);
>>  		if (type == PCI_EXP_TYPE_RC_END) {
>> +			/*
>> +			 * The callback only clears the Root Error Status
>> +			 * of the RCEC (see aer.c).
>> +			 */
>> +			if (bridge)
>> +				reset_subordinate_devices(bridge);
>
> It's unfortunate to add yet another special case in this code, and I'm
> not thrilled about the one in aer_root_reset() either.  It's just not
> obvious why they should be there.  I'm sure if I spent 30 minutes
> decoding things, it would all make sense.  Guess I'm just griping
> because I don't have a better suggestion.

I could also add some more detail here referencing the non-native case 
where there may not even exist an RCEC. Aside from comments, I could 
employ a descriptive flag such as “native_bridge”.

Thanks,

Sean

>
>>  			status = flr_on_rciep(dev);
>>  			if (status != PCI_ERS_RESULT_RECOVERED) {
>>  				pci_warn(dev, "function level reset failed\n");
>> @@ -219,13 +231,13 @@ pci_ers_result_t pcie_do_recovery(struct 
>> pci_dev *dev,
>>  			}
>>  		}
>>  	} else {
>> -		pci_bridge_walk(bridge, report_normal_detected, &status);
>> +		pci_bridge_walk(bridge, dev, report_normal_detected, &status);
>>  	}
>>
>>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>  		status = PCI_ERS_RESULT_RECOVERED;
>>  		pci_dbg(dev, "broadcast mmio_enabled message\n");
>> -		pci_bridge_walk(bridge, report_mmio_enabled, &status);
>> +		pci_bridge_walk(bridge, dev, report_mmio_enabled, &status);
>>  	}
>>
>>  	if (status == PCI_ERS_RESULT_NEED_RESET) {
>> @@ -236,18 +248,16 @@ pci_ers_result_t pcie_do_recovery(struct 
>> pci_dev *dev,
>>  		 */
>>  		status = PCI_ERS_RESULT_RECOVERED;
>>  		pci_dbg(dev, "broadcast slot_reset message\n");
>> -		pci_bridge_walk(bridge, report_slot_reset, &status);
>> +		pci_bridge_walk(bridge, dev, report_slot_reset, &status);
>>  	}
>>
>>  	if (status != PCI_ERS_RESULT_RECOVERED)
>>  		goto failed;
>>
>>  	pci_dbg(dev, "broadcast resume message\n");
>> -	pci_bridge_walk(bridge, report_resume, &status);
>> +	pci_bridge_walk(bridge, dev, report_resume, &status);
>>
>> -	if (type == PCI_EXP_TYPE_ROOT_PORT ||
>> -	    type == PCI_EXP_TYPE_DOWNSTREAM ||
>> -	    type == PCI_EXP_TYPE_RC_EC) {
>> +	if (bridge) {
>>  		if (pcie_aer_is_native(bridge))
>>  			pcie_clear_device_status(bridge);
>>  		pci_aer_clear_nonfatal_status(bridge);
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v6 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
  2020-09-28  1:47     ` Bjorn Helgaas
@ 2020-09-28 21:36       ` Sean V Kelley
  0 siblings, 0 replies; 25+ messages in thread
From: Sean V Kelley @ 2020-09-28 21:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Sean V Kelley, bhelgaas, Jonathan.Cameron, rafael.j.wysocki,
	ashok.raj, tony.luck, sathyanarayanan.kuppuswamy, qiuxu.zhuo,
	linux-pci, linux-kernel

On 27 Sep 2020, at 18:47, Bjorn Helgaas wrote:

> On Sun, Sep 27, 2020 at 06:45:45PM -0500, Bjorn Helgaas wrote:
>> On Tue, Sep 22, 2020 at 02:38:56PM -0700, Sean V Kelley wrote:
>>> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>
>>>  	pci_dbg(dev, "broadcast error_detected message\n");
>>>  	if (state == pci_channel_io_frozen) {
>>> -		pci_bridge_walk(bridge, report_frozen_detected, &status);
>>> +		pci_bridge_walk(bridge, dev, report_frozen_detected, &status);
>>>  		if (type == PCI_EXP_TYPE_RC_END) {
>>> +			/*
>>> +			 * The callback only clears the Root Error Status
>>> +			 * of the RCEC (see aer.c).
>>> +			 */
>>> +			if (bridge)
>>> +				reset_subordinate_devices(bridge);
>>
>> It's unfortunate to add yet another special case in this code, and 
>> I'm
>> not thrilled about the one in aer_root_reset() either.  It's just not
>> obvious why they should be there.  I'm sure if I spent 30 minutes
>> decoding things, it would all make sense.  Guess I'm just griping
>> because I don't have a better suggestion.
>
> I'm sorry, this was unkind of me.  If I don't have a constructive
> idea, there's no reason for me to complain about this.  I apologize.
>
> Bjorn

No worries at all. The unfortunate handling of the Spec for RCEC/RCiEP 
associations and the added needs for native versus non-native is 
frustrating.

Sean

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-09-28 21:36 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-22 21:38 [PATCH v6 00/10] Add RCEC handling to PCI/AER Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 01/10] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver Sean V Kelley
2020-09-25 19:59   ` Bjorn Helgaas
2020-09-25 21:39     ` Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities() Sean V Kelley
2020-09-25 20:13   ` Bjorn Helgaas
2020-09-25 21:49     ` Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 04/10] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
2020-09-25 21:14   ` Bjorn Helgaas
2020-09-25 22:08     ` Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 05/10] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
2020-09-25 21:58   ` Bjorn Helgaas
2020-09-25 22:54     ` Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 06/10] PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs Sean V Kelley
2020-09-25 22:15   ` Bjorn Helgaas
2020-09-28 16:20     ` Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 07/10] PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR Sean V Kelley
2020-09-27 23:45   ` Bjorn Helgaas
2020-09-28  1:47     ` Bjorn Helgaas
2020-09-28 21:36       ` Sean V Kelley
2020-09-28 17:27     ` Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 08/10] PCI/AER: Add pcie_walk_rcec() to RCEC AER handling Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 09/10] PCI/PME: Add pcie_walk_rcec() to RCEC PME handling Sean V Kelley
2020-09-22 21:38 ` [PATCH v6 10/10] PCI/AER: Add RCEC AER error injection support Sean V Kelley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).