All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/10] Add RCEC handling to PCI/AER
@ 2020-08-12 16:46 Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 01/10] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
                   ` (9 more replies)
  0 siblings, 10 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley


Ongoing discussion on patch 4/9 [1]:

[1] https://lore.kernel.org/linux-pci/20200810103252.00000318@Huawei.com/

Changes since v2 [2]:

- Fix comment header for get_port_device_capability().
- Remove pcie_aer_is_native() check in pcie_do_recovery().
- Fix wording of error message in aer_inject().
- Use "non-native handling" in place of "firmware first"
(Jonathan Cameron)

- Prefer patches based on pci/master.
- Correct order of tags, especially when using Co-developed-by.
- Fix two patches ensuring that most important words appear early in one-liner.
- Check on type for bitmap in pcie_walk_rciep_devfn(). However, due to use
of for_each_set_bit() macro, const unsigned long is needed. Unchanged.
- Matching pci_dev_put() needed where pci_get_slot() is used.
- Read the RCEC Associated Endpoint Capabilities and cache them at enumeration
time, e.g., pci_init_capabilities().
- Remove unnecessary use of pci_find_bus() and corresponding check on value.
- Replace use of pdev/pbus with dev/bus (preferred naming style).
- Mention name of new function pci_walk_dev_affected() in 4/9.
- Use "()" after function names in log for 4/9.
- Fix width to wrap within 75 col for 4/9.
- Use word "bridge" in place of "port" in function comment.
- Fix style in conditional in pci_walk_dev_affected().
(Bjorn Helgaas)

[2] https://lore.kernel.org/linux-pci/20200804194052.193272-1-sean.v.kelley@intel.com/

Root Complex Event Collectors (RCEC) provide support for terminating error
and PME messages from Root Complex Integrated Endpoints (RCiEPs).  An RCEC
resides on a Bus in the Root Complex. Multiple RCECs can in fact reside on
a single bus. An RCEC will explicitly declare supported RCiEPs through the
Root Complex Endpoint Association Extended Capability.

(See PCIe 5.0-1, sections 1.3.2.3 (RCiEP), and 7.9.10 (RCEC Ext. Cap.))

The kernel lacks handling for these RCECs and the error messages received
from their respective associated RCiEPs. More recently, a new CPU
interconnect, Compute eXpress Link (CXL) depends on RCEC capabilities for
purposes of error messaging from CXL 1.1 supported RCiEP devices.

DocLink: https://www.computeexpresslink.org/

This use case is not limited to CXL. Existing hardware today includes
support for RCECs, such as the Denverton microserver product
family. Future hardware will be forthcoming.

(See Intel Document, Order number: 33061-003US)

So services such as AER or PME could be associated with an RCEC driver.
In the case of CXL, if an RCiEP (i.e., CXL 1.1 device) is associated with a
platform's RCEC it shall signal PME and AER error conditions through that
RCEC.

Towards the above use cases, add the missing RCEC class and extend the
PCIe Root Port and service drivers to allow association of RCiEPs to their
respective parent RCEC and facilitate handling of terminating error and PME
messages.


Jonathan Cameron (1):
  PCI/AER: Extend AER error handling to RCECs

Qiuxu Zhuo (6):
  PCI/RCEC: Add RCEC class code and extended capability
  PCI/RCEC: Bind RCEC devices to the Root Port driver
  PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  PCI/AER: Apply function level reset to RCiEP on fatal error
  PCI: Add 'rcec' field to pci_dev for associated RCiEPs
  PCI/AER: Add RCEC AER error injection support

Sean V Kelley (3):
  PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities()
  PCI/AER: Add RCEC AER handling
  PCI/PME: Add RCEC PME handling

 drivers/pci/pci.h               |  22 ++++++
 drivers/pci/pcie/Makefile       |   2 +-
 drivers/pci/pcie/aer.c          |  36 ++++++---
 drivers/pci/pcie/aer_inject.c   |   5 +-
 drivers/pci/pcie/err.c          |  85 +++++++++++++++++----
 drivers/pci/pcie/pme.c          |  15 +++-
 drivers/pci/pcie/portdrv_core.c |   8 +-
 drivers/pci/pcie/portdrv_pci.c  |  20 ++++-
 drivers/pci/pcie/rcec.c         | 128 ++++++++++++++++++++++++++++++++
 drivers/pci/probe.c             |   2 +
 include/linux/pci.h             |   5 ++
 include/linux/pci_ids.h         |   1 +
 include/uapi/linux/pci_regs.h   |   7 ++
 13 files changed, 300 insertions(+), 36 deletions(-)
 create mode 100644 drivers/pci/pcie/rcec.c

--
2.28.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 01/10] PCI/RCEC: Add RCEC class code and extended capability
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver Sean V Kelley
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

A PCIe Root Complex Event Collector(RCEC) has the base class 0x08,
sub-class 0x07, and programming interface 0x00. Add the class code
0x0807 to identify RCEC devices and add the defines for the RCEC
Endpoint Association Extended Capability.

See PCI Express Base Specification, version 5.0-1, section "1.3.4
Root Complex Event Collector" and section "7.9.10 Root Complex
Event Collector Endpoint Association Extended Capability"

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 include/linux/pci_ids.h       | 1 +
 include/uapi/linux/pci_regs.h | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 0ad57693f392..de8dff1fb176 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -81,6 +81,7 @@
 #define PCI_CLASS_SYSTEM_RTC		0x0803
 #define PCI_CLASS_SYSTEM_PCI_HOTPLUG	0x0804
 #define PCI_CLASS_SYSTEM_SDHCI		0x0805
+#define PCI_CLASS_SYSTEM_RCEC		0x0807
 #define PCI_CLASS_SYSTEM_OTHER		0x0880
 
 #define PCI_BASE_CLASS_INPUT		0x09
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index f9701410d3b5..f335f65f65d6 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -828,6 +828,13 @@
 #define  PCI_PWR_CAP_BUDGET(x)	((x) & 1)	/* Included in system budget */
 #define PCI_EXT_CAP_PWR_SIZEOF	16
 
+/* Root Complex Event Collector Endpoint Association  */
+#define PCI_RCEC_RCIEP_BITMAP	4	/* Associated Bitmap for RCiEPs */
+#define PCI_RCEC_BUSN		8	/* RCEC Associated Bus Numbers */
+#define  PCI_RCEC_BUSN_REG_VER	0x02	/* Least capability version that BUSN present */
+#define  PCI_RCEC_BUSN_NEXT(x)	(((x) >> 8) & 0xff)
+#define  PCI_RCEC_BUSN_LAST(x)	(((x) >> 16) & 0xff)
+
 /* Vendor-Specific (VSEC, PCI_EXT_CAP_ID_VNDR) */
 #define PCI_VNDR_HEADER		4	/* Vendor-Specific Header */
 #define  PCI_VNDR_HEADER_ID(x)	((x) & 0xffff)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 01/10] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities() Sean V Kelley
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

If a Root Complex Integrated Endpoint (RCiEP) is implemented, errors may
optionally be sent to a corresponding Root Complex Event Collector (RCEC).
Each RCiEP must be associated with no more than one RCEC. Interface errors
are reported to the OS by RCECs.

For an RCEC (technically not a Bridge), error messages "received" from
associated RCiEPs must be enabled for "transmission" in order to cause a
System Error via the Root Control register or (when the Advanced Error
Reporting Capability is present) reporting via the Root Error Command
register and logging in the Root Error Status register and Error Source
Identification register.

Given the commonality with Root Ports and the need to also support AER
and PME services for RCECs, extend the Root Port driver to support RCEC
devices through the addition of the RCEC Class ID to the driver
structure.

Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pcie/portdrv_core.c | 8 ++++----
 drivers/pci/pcie/portdrv_pci.c  | 5 ++++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 50a9522ab07d..99769c636775 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -234,11 +234,11 @@ static int get_port_device_capability(struct pci_dev *dev)
 #endif
 
 	/*
-	 * Root ports are capable of generating PME too.  Root Complex
-	 * Event Collectors can also generate PMEs, but we don't handle
-	 * those yet.
+	 * Root ports and Root Complex Event Collectors are capable
+	 * of generating PME.
 	 */
-	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
+	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) &&
 	    (pcie_ports_native || host->native_pme)) {
 		services |= PCIE_PORT_SERVICE_PME;
 
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 3acf151ae015..d5b109499b10 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -106,7 +106,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 	if (!pci_is_pcie(dev) ||
 	    ((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
 	     (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
-	     (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)))
+	     (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM) &&
+	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
 		return -ENODEV;
 
 	status = pcie_port_device_register(dev);
@@ -195,6 +196,8 @@ static const struct pci_device_id port_pci_ids[] = {
 	{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x00), ~0) },
 	/* subtractive decode PCI-to-PCI bridge, class type is 060401h */
 	{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x01), ~0) },
+	/* handle any Root Complex Event Collector */
+	{ PCI_DEVICE_CLASS(((PCI_CLASS_SYSTEM_RCEC << 8) | 0x00), ~0) },
 	{ },
 };
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities()
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 01/10] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs Sean V Kelley
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

Extend support for Root Complex Event Collectors by decoding and
caching the RCEC Endpoint Association Extended Capabilities when
enumerating. Use that cached information for later error source
reporting. See PCI Express Base Specification, version 5.0-1,
section 7.9.10.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pci.h         | 18 ++++++++++++++
 drivers/pci/pcie/Makefile |  2 +-
 drivers/pci/pcie/rcec.c   | 52 +++++++++++++++++++++++++++++++++++++++
 drivers/pci/probe.c       |  2 ++
 include/linux/pci.h       |  4 +++
 5 files changed, 77 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/pcie/rcec.c

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6d3f75867106..bd25e6047b54 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -448,6 +448,16 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
 #endif	/* CONFIG_PCIEAER */
 
+#ifdef CONFIG_PCIEPORTBUS
+/* Cached RCEC Associated Endpoint Extended Capabilities */
+struct rcec_ext {
+	u8		ver;
+	u8		nextbusn;
+	u8		lastbusn;
+	u32		bitmap;
+};
+#endif
+
 #ifdef CONFIG_PCIE_DPC
 void pci_save_dpc_state(struct pci_dev *dev);
 void pci_restore_dpc_state(struct pci_dev *dev);
@@ -460,6 +470,14 @@ static inline void pci_restore_dpc_state(struct pci_dev *dev) {}
 static inline void pci_dpc_init(struct pci_dev *pdev) {}
 #endif
 
+#ifdef CONFIG_PCIEPORTBUS
+void pci_rcec_init(struct pci_dev *dev);
+void pci_rcec_exit(struct pci_dev *dev);
+#else
+static inline void pci_rcec_init(struct pci_dev *dev) {}
+static inline void pci_rcec_exit(struct pci_dev *dev) {}
+#endif
+
 #ifdef CONFIG_PCI_ATS
 /* Address Translation Service */
 void pci_ats_init(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 68da9280ff11..d9697892fa3e 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -2,7 +2,7 @@
 #
 # Makefile for PCI Express features and port driver
 
-pcieportdrv-y			:= portdrv_core.o portdrv_pci.o err.o
+pcieportdrv-y			:= portdrv_core.o portdrv_pci.o err.o rcec.o
 
 obj-$(CONFIG_PCIEPORTBUS)	+= pcieportdrv.o
 
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
new file mode 100644
index 000000000000..519ae086ff41
--- /dev/null
+++ b/drivers/pci/pcie/rcec.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Root Complex Event Collector Support
+ *
+ * Authors:
+ *  Sean V Kelley <sean.v.kelley@intel.com>
+ *  Qiuxu Zhuo <qiuxu.zhuo@intel.com>
+ *
+ * Copyright (C) 2020 Intel Corp.
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/bitops.h>
+#include <linux/pci.h>
+#include <linux/pci_regs.h>
+
+#include "../pci.h"
+
+void pci_rcec_init(struct pci_dev *dev)
+{
+	u32 rcec, hdr, busn;
+
+	/* Only for Root Complex Event Collectors */
+	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)
+		return;
+
+	dev->rcec_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_RCEC);
+	if (!dev->rcec_cap)
+		return;
+
+	dev->rcec_ext = kzalloc(sizeof(*dev->rcec_ext), GFP_KERNEL);
+
+	rcec = dev->rcec_cap;
+	pci_read_config_dword(dev, rcec + PCI_RCEC_RCIEP_BITMAP, &dev->rcec_ext->bitmap);
+
+	/* Check whether RCEC BUSN register is present */
+	pci_read_config_dword(dev, rcec, &hdr);
+	dev->rcec_ext->ver = PCI_EXT_CAP_VER(hdr);
+	if (dev->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
+		return;
+
+	pci_read_config_dword(dev, rcec + PCI_RCEC_BUSN, &busn);
+	dev->rcec_ext->nextbusn = PCI_RCEC_BUSN_NEXT(busn);
+	dev->rcec_ext->lastbusn = PCI_RCEC_BUSN_LAST(busn);
+}
+
+void pci_rcec_exit(struct pci_dev *dev)
+{
+	kfree(dev->rcec_ext);
+	dev->rcec_ext = NULL;
+}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2f66988cea25..2ad1e3e4ee6a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2195,6 +2195,7 @@ static void pci_configure_device(struct pci_dev *dev)
 static void pci_release_capabilities(struct pci_dev *dev)
 {
 	pci_aer_exit(dev);
+	pci_rcec_exit(dev);
 	pci_vpd_release(dev);
 	pci_iov_release(dev);
 	pci_free_cap_save_buffers(dev);
@@ -2394,6 +2395,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	pci_ptm_init(dev);		/* Precision Time Measurement */
 	pci_aer_init(dev);		/* Advanced Error Reporting */
 	pci_dpc_init(dev);		/* Downstream Port Containment */
+	pci_rcec_init(dev);		/* Root Complex Event Collector */
 
 	pcie_report_downtraining(dev);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c79d83304e52..c7fc5726872c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -326,6 +326,10 @@ struct pci_dev {
 #ifdef CONFIG_PCIEAER
 	u16		aer_cap;	/* AER capability offset */
 	struct aer_stats *aer_stats;	/* AER stats for this device */
+#endif
+#ifdef CONFIG_PCIEPORTBUS
+	u16		rcec_cap;	/* RCEC capability offset */
+	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint extended capabilities */
 #endif
 	u8		pcie_cap;	/* PCIe capability offset */
 	u8		msi_cap;	/* MSI capability offset */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (2 preceding siblings ...)
  2020-08-12 16:46 ` [PATCH v3 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities() Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  2020-09-02 19:00   ` Bjorn Helgaas
  2020-08-12 16:46 ` [PATCH v3 05/10] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

When an RCEC device signals error(s) to a CPU core, the CPU core
needs to walk all the RCiEPs associated with that RCEC to check
errors. So add the function pcie_walk_rcec() to walk all RCiEPs
associated with the RCEC device.

Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pci.h       |  4 +++
 drivers/pci/pcie/rcec.c | 76 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index bd25e6047b54..8bd7528d6977 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct pci_dev *pdev) {}
 #ifdef CONFIG_PCIEPORTBUS
 void pci_rcec_init(struct pci_dev *dev);
 void pci_rcec_exit(struct pci_dev *dev);
+void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
+		    void *userdata);
 #else
 static inline void pci_rcec_init(struct pci_dev *dev) {}
 static inline void pci_rcec_exit(struct pci_dev *dev) {}
+static inline void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
+				  void *userdata) {}
 #endif
 
 #ifdef CONFIG_PCI_ATS
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
index 519ae086ff41..405f92fcdf7f 100644
--- a/drivers/pci/pcie/rcec.c
+++ b/drivers/pci/pcie/rcec.c
@@ -17,6 +17,82 @@
 
 #include "../pci.h"
 
+static int pcie_walk_rciep_devfn(struct pci_bus *bus, int (*cb)(struct pci_dev *, void *),
+				 void *userdata, const unsigned long bitmap)
+{
+	unsigned int devn, fn;
+	struct pci_dev *dev;
+	int retval;
+
+	for_each_set_bit(devn, &bitmap, 32) {
+		for (fn = 0; fn < 8; fn++) {
+			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
+
+			if (!dev)
+				continue;
+
+			if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END) {
+				pci_dev_put(dev);
+				continue;
+			}
+
+			retval = cb(dev, userdata);
+			pci_dev_put(dev);
+			if (retval)
+				return retval;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * pcie_walk_rcec - Walk RCiEP devices associating with RCEC and call callback.
+ * @rcec     RCEC whose RCiEP devices should be walked.
+ * @cb       Callback to be called for each RCiEP device found.
+ * @userdata Arbitrary pointer to be passed to callback.
+ *
+ * Walk the given RCEC. Call the provided callback on each RCiEP device found.
+ *
+ * We check the return of @cb each time. If it returns anything
+ * other than 0, we break out.
+ */
+void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
+		    void *userdata)
+{
+	u8 nextbusn, lastbusn;
+	struct pci_bus *bus;
+	unsigned int bnr;
+
+	if (!rcec->rcec_cap)
+		return;
+
+	/* Find RCiEP devices on the same bus as the RCEC */
+	if (pcie_walk_rciep_devfn(rcec->bus, cb, userdata, rcec->rcec_ext->bitmap))
+		return;
+
+	/* Check whether RCEC BUSN register is present */
+	if (rcec->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
+		return;
+
+	nextbusn = rcec->rcec_ext->nextbusn;
+	lastbusn = rcec->rcec_ext->lastbusn;
+
+	/* All RCiEP devices are on the same bus as the RCEC */
+	if (nextbusn == 0xff && lastbusn == 0x00)
+		return;
+
+	for (bnr = nextbusn; bnr <= lastbusn; bnr++) {
+		bus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
+		if (!bus)
+			continue;
+
+		/* Find RCiEP devices on the given bus */
+		if (pcie_walk_rciep_devfn(bus, cb, userdata, 0xffffffff))
+			return;
+	}
+}
+
 void pci_rcec_init(struct pci_dev *dev)
 {
 	u32 rcec, hdr, busn;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 05/10] PCI/AER: Extend AER error handling to RCECs
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (3 preceding siblings ...)
  2020-08-12 16:46 ` [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  2020-08-26 17:26   ` Kuppuswamy, Sathyanarayanan
  2020-08-12 16:46 ` [PATCH v3 06/10] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Currently the kernel does not handle AER errors for Root Complex
integrated End Points (RCiEPs)[0]. These devices sit on a root bus within
the Root Complex (RC). AER handling is performed by a Root Complex Event
Collector (RCEC) [1] which is a effectively a type of RCiEP on the same
root bus.

For an RCEC (technically not a Bridge), error messages "received" from
associated RCiEPs must be enabled for "transmission" in order to cause a
System Error via the Root Control register or (when the Advanced Error
Reporting Capability is present) reporting via the Root Error Command
register and logging in the Root Error Status register and Error Source
Identification register.

In addition to the defined OS level handling of the reset flow for the
associated RCiEPs of an RCEC, it is possible to also have non-native
handling. In that case there is no need to take any actions on the RCEC
because the firmware is responsible for them. This is true where APEI [2]
is used to report the AER errors via a GHES[v2] HEST entry [3] and
relevant AER CPER record [4] and non-native handling is in use.

We effectively end up with two different types of discovery for
purposes of handling AER errors:

1) Normal bus walk - we pass the downstream port above a bus to which
the device is attached and it walks everything below that point.

2) An RCiEP with no visible association with an RCEC as there is no need
to walk devices. In that case, the flow is to just call the callbacks for
the actual device.

A new walk function pci_walk_dev_affected(), similar to pci_bus_walk(),
is provided that takes a pci_dev instead of a bus. If that dev corresponds
to a downstream port it will walk the subordinate bus of that downstream
port. If the dev does not then it will call the function on that device
alone.

[0] ACPI PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex
Integrated Endpoint Rules.
[1] ACPI PCI Express Base Specification 5.0-1 6.2 Error Signalling and
Logging
[2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
[3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
[4] UEFI Specification 2.8, N.2.7 PCI Express Error Section

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pcie/err.c | 54 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 14bb8f54723e..f4cfb37c26c1 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -146,38 +146,68 @@ static int report_resume(struct pci_dev *dev, void *data)
 	return 0;
 }
 
+/**
+ * pci_walk_dev_affected - walk devices potentially AER affected
+ * @dev      device which may be an RCEC with associated RCiEPs,
+ *           an RCiEP associated with an RCEC, or a Port.
+ * @cb       callback to be called for each device found
+ * @userdata arbitrary pointer to be passed to callback.
+ *
+ * If the device provided is a bridge, walk the subordinate bus,
+ * including any bridged devices on buses under this bus.
+ * Call the provided callback on each device found.
+ *
+ * If the device provided has no subordinate bus, call the provided
+ * callback on the device itself.
+ */
+static void pci_walk_dev_affected(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
+				  void *userdata)
+{
+	if (dev->subordinate)
+		pci_walk_bus(dev->subordinate, cb, userdata);
+	else
+		cb(dev, userdata);
+}
+
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 			enum pci_channel_state state,
 			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
 {
 	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
-	struct pci_bus *bus;
 
 	/*
 	 * Error recovery runs on all subordinates of the first downstream port.
 	 * If the downstream port detected the error, it is cleared at the end.
+	 * For RCiEPs we should reset just the RCiEP itself.
 	 */
 	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
-	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
+	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END ||
+	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC))
 		dev = dev->bus->self;
-	bus = dev->subordinate;
 
 	pci_dbg(dev, "broadcast error_detected message\n");
 	if (state == pci_channel_io_frozen) {
-		pci_walk_bus(bus, report_frozen_detected, &status);
+		pci_walk_dev_affected(dev, report_frozen_detected, &status);
+		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
+			pci_warn(dev, "link reset not possible for RCiEP\n");
+			status = PCI_ERS_RESULT_NONE;
+			goto failed;
+		}
+
 		status = reset_link(dev);
 		if (status != PCI_ERS_RESULT_RECOVERED) {
 			pci_warn(dev, "link reset failed\n");
 			goto failed;
 		}
 	} else {
-		pci_walk_bus(bus, report_normal_detected, &status);
+		pci_walk_dev_affected(dev, report_normal_detected, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast mmio_enabled message\n");
-		pci_walk_bus(bus, report_mmio_enabled, &status);
+		pci_walk_dev_affected(dev, report_mmio_enabled, &status);
 	}
 
 	if (status == PCI_ERS_RESULT_NEED_RESET) {
@@ -188,17 +218,21 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 		 */
 		status = PCI_ERS_RESULT_RECOVERED;
 		pci_dbg(dev, "broadcast slot_reset message\n");
-		pci_walk_bus(bus, report_slot_reset, &status);
+		pci_walk_dev_affected(dev, report_slot_reset, &status);
 	}
 
 	if (status != PCI_ERS_RESULT_RECOVERED)
 		goto failed;
 
 	pci_dbg(dev, "broadcast resume message\n");
-	pci_walk_bus(bus, report_resume, &status);
+	pci_walk_dev_affected(dev, report_resume, &status);
 
-	pci_aer_clear_device_status(dev);
-	pci_aer_clear_nonfatal_status(dev);
+	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
+		pci_aer_clear_device_status(dev);
+		pci_aer_clear_nonfatal_status(dev);
+	}
 	pci_info(dev, "device recovery successful\n");
 	return status;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 06/10] PCI/AER: Apply function level reset to RCiEP on fatal error
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (4 preceding siblings ...)
  2020-08-12 16:46 ` [PATCH v3 05/10] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 07/10] PCI: Add 'rcec' field to pci_dev for associated RCiEPs Sean V Kelley
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

Attempt to do function level reset for an RCiEP associated with an
RCEC device on fatal error.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 drivers/pci/pcie/err.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index f4cfb37c26c1..04528b57876b 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -169,6 +169,17 @@ static void pci_walk_dev_affected(struct pci_dev *dev, int (*cb)(struct pci_dev
 		cb(dev, userdata);
 }
 
+static enum pci_ers_result flr_on_rciep(struct pci_dev *dev)
+{
+	if (!pcie_has_flr(dev))
+		return PCI_ERS_RESULT_NONE;
+
+	if (pcie_flr(dev))
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
 pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 			enum pci_channel_state state,
 			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
@@ -190,15 +201,17 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	if (state == pci_channel_io_frozen) {
 		pci_walk_dev_affected(dev, report_frozen_detected, &status);
 		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
-			pci_warn(dev, "link reset not possible for RCiEP\n");
-			status = PCI_ERS_RESULT_NONE;
-			goto failed;
-		}
-
-		status = reset_link(dev);
-		if (status != PCI_ERS_RESULT_RECOVERED) {
-			pci_warn(dev, "link reset failed\n");
-			goto failed;
+			status = flr_on_rciep(dev);
+			if (status != PCI_ERS_RESULT_RECOVERED) {
+				pci_warn(dev, "function level reset failed\n");
+				goto failed;
+			}
+		} else {
+			status = reset_link(dev);
+			if (status != PCI_ERS_RESULT_RECOVERED) {
+				pci_warn(dev, "link reset failed\n");
+				goto failed;
+			}
 		}
 	} else {
 		pci_walk_dev_affected(dev, report_normal_detected, &status);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 07/10] PCI: Add 'rcec' field to pci_dev for associated RCiEPs
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (5 preceding siblings ...)
  2020-08-12 16:46 ` [PATCH v3 06/10] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  2020-09-02 16:35   ` Bjorn Helgaas
  2020-08-12 16:46 ` [PATCH v3 08/10] PCI/AER: Add RCEC AER handling Sean V Kelley
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

When attempting error recovery for an RCiEP associated with an RCEC device,
there needs to be a way to update the Root Error Status, the Uncorrectable
Error Status and the Uncorrectable Error Severity of the parent RCEC.
So add the 'rcec' field to the pci_dev structure and provide a hook for the
Root Port Driver to associate RCiEPs with their respective parent RCEC.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pcie/aer.c         |  9 +++++----
 drivers/pci/pcie/err.c         | 10 ++++++++++
 drivers/pci/pcie/portdrv_pci.c | 15 +++++++++++++++
 include/linux/pci.h            |  1 +
 4 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 3acf56683915..f1bf06be449e 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1335,17 +1335,18 @@ static int aer_probe(struct pcie_device *dev)
 static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 {
 	int aer = dev->aer_cap;
+	int rc = 0;
 	u32 reg32;
-	int rc;
-
 
 	/* Disable Root's interrupt in response to error messages */
 	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
 	reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
 	pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
 
-	rc = pci_bus_error_reset(dev);
-	pci_info(dev, "Root Port link has been reset\n");
+	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC) {
+		rc = pci_bus_error_reset(dev);
+		pci_info(dev, "Root Port link has been reset\n");
+	}
 
 	/* Clear Root Error Status */
 	pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 04528b57876b..a5b7e708079d 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -202,6 +202,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 		pci_walk_dev_affected(dev, report_frozen_detected, &status);
 		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
 			status = flr_on_rciep(dev);
+			/*
+			 * The callback only clears the Root Error Status
+			 * of the RCEC (see aer.c).
+			 */
+			if (dev->rcec)
+				reset_link(dev->rcec);
 			if (status != PCI_ERS_RESULT_RECOVERED) {
 				pci_warn(dev, "function level reset failed\n");
 				goto failed;
@@ -245,7 +251,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
 		pci_aer_clear_device_status(dev);
 		pci_aer_clear_nonfatal_status(dev);
+	} else if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END && dev->rcec) {
+		pci_aer_clear_device_status(dev->rcec);
+		pci_aer_clear_nonfatal_status(dev->rcec);
 	}
+
 	pci_info(dev, "device recovery successful\n");
 	return status;
 
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index d5b109499b10..a64e88b7166b 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -90,6 +90,18 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 #define PCIE_PORTDRV_PM_OPS	NULL
 #endif /* !PM */
 
+static int pcie_hook_rcec(struct pci_dev *pdev, void *data)
+{
+	struct pci_dev *rcec = (struct pci_dev *)data;
+
+	pdev->rcec = rcec;
+	pci_dbg(rcec, "RCiEP(under an RCEC) %04x:%02x:%02x.%d\n",
+		pci_domain_nr(pdev->bus), pdev->bus->number,
+		PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+
+	return 0;
+}
+
 /*
  * pcie_portdrv_probe - Probe PCI-Express port devices
  * @dev: PCI-Express port device being probed
@@ -110,6 +122,9 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
 	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
 		return -ENODEV;
 
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
+		pcie_walk_rcec(dev, pcie_hook_rcec, dev);
+
 	status = pcie_port_device_register(dev);
 	if (status)
 		return status;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c7fc5726872c..ba29816c827b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -330,6 +330,7 @@ struct pci_dev {
 #ifdef CONFIG_PCIEPORTBUS
 	u16		rcec_cap;	/* RCEC capability offset */
 	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint extended capabilities */
+	struct pci_dev	*rcec;		/* Associated RCEC device */
 #endif
 	u8		pcie_cap;	/* PCIe capability offset */
 	u8		msi_cap;	/* MSI capability offset */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 08/10] PCI/AER: Add RCEC AER handling
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (6 preceding siblings ...)
  2020-08-12 16:46 ` [PATCH v3 07/10] PCI: Add 'rcec' field to pci_dev for associated RCiEPs Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 09/10] PCI/PME: Add RCEC PME handling Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 10/10] PCI/AER: Add RCEC AER error injection support Sean V Kelley
  9 siblings, 0 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

The Root Complex Event Collectors(RCEC) appear as peers to Root Ports
and also have the AER capability. So add RCEC support to the current AER
service driver and attach the AER service driver to the RCEC device.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pcie/aer.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f1bf06be449e..b6c7bbbc8f97 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -303,7 +303,7 @@ int pci_aer_raw_clear_status(struct pci_dev *dev)
 		return -EIO;
 
 	port_type = pci_pcie_type(dev);
-	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
+	if (port_type == PCI_EXP_TYPE_ROOT_PORT || port_type == PCI_EXP_TYPE_RC_EC) {
 		pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &status);
 		pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, status);
 	}
@@ -577,7 +577,8 @@ static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
 	if ((a == &dev_attr_aer_rootport_total_err_cor.attr ||
 	     a == &dev_attr_aer_rootport_total_err_fatal.attr ||
 	     a == &dev_attr_aer_rootport_total_err_nonfatal.attr) &&
-	    pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT)
+	    ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+	    (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_EC)))
 		return 0;
 
 	return a->mode;
@@ -894,7 +895,10 @@ static bool find_source_device(struct pci_dev *parent,
 	if (result)
 		return true;
 
-	pci_walk_bus(parent->subordinate, find_device_iter, e_info);
+	if (pci_pcie_type(parent) == PCI_EXP_TYPE_RC_EC)
+		pcie_walk_rcec(parent, find_device_iter, e_info);
+	else
+		pci_walk_bus(parent->subordinate, find_device_iter, e_info);
 
 	if (!e_info->error_dev_num) {
 		pci_info(parent, "can't find device of ID%04x\n", e_info->id);
@@ -1030,6 +1034,7 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 		if (!(info->status & ~info->mask))
 			return 0;
 	} else if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+		   pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
 	           pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
 		   info->severity == AER_NONFATAL) {
 
@@ -1182,6 +1187,7 @@ static int set_device_error_reporting(struct pci_dev *dev, void *data)
 	int type = pci_pcie_type(dev);
 
 	if ((type == PCI_EXP_TYPE_ROOT_PORT) ||
+	    (type == PCI_EXP_TYPE_RC_EC) ||
 	    (type == PCI_EXP_TYPE_UPSTREAM) ||
 	    (type == PCI_EXP_TYPE_DOWNSTREAM)) {
 		if (enable)
@@ -1206,9 +1212,11 @@ static void set_downstream_devices_error_reporting(struct pci_dev *dev,
 {
 	set_device_error_reporting(dev, &enable);
 
-	if (!dev->subordinate)
-		return;
-	pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable);
+	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
+		pcie_walk_rcec(dev, set_device_error_reporting, &enable);
+	else if (dev->subordinate)
+		pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable);
+
 }
 
 /**
@@ -1306,6 +1314,11 @@ static int aer_probe(struct pcie_device *dev)
 	struct device *device = &dev->device;
 	struct pci_dev *port = dev->port;
 
+	/* Limit to Root Ports or Root Complex Event Collectors */
+	if ((pci_pcie_type(port) != PCI_EXP_TYPE_RC_EC) &&
+	    (pci_pcie_type(port) != PCI_EXP_TYPE_ROOT_PORT))
+		return -ENODEV;
+
 	rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
 	if (!rpc)
 		return -ENOMEM;
@@ -1362,7 +1375,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
 
 static struct pcie_port_service_driver aerdriver = {
 	.name		= "aer",
-	.port_type	= PCI_EXP_TYPE_ROOT_PORT,
+	.port_type	= PCIE_ANY_PORT,
 	.service	= PCIE_PORT_SERVICE_AER,
 
 	.probe		= aer_probe,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 09/10] PCI/PME: Add RCEC PME handling
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (7 preceding siblings ...)
  2020-08-12 16:46 ` [PATCH v3 08/10] PCI/AER: Add RCEC AER handling Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  2020-08-12 16:46 ` [PATCH v3 10/10] PCI/AER: Add RCEC AER error injection support Sean V Kelley
  9 siblings, 0 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

The Root Complex Event Collectors(RCEC) appear as peers of Root Ports
and also have the PME capability. So add RCEC support to the current PME
service driver and attach the PME service driver to the RCEC device.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/pci/pcie/pme.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c
index 6a32970bb731..87799166c96a 100644
--- a/drivers/pci/pcie/pme.c
+++ b/drivers/pci/pcie/pme.c
@@ -310,7 +310,10 @@ static int pcie_pme_can_wakeup(struct pci_dev *dev, void *ign)
 static void pcie_pme_mark_devices(struct pci_dev *port)
 {
 	pcie_pme_can_wakeup(port, NULL);
-	if (port->subordinate)
+
+	if (pci_pcie_type(port) == PCI_EXP_TYPE_RC_EC)
+		pcie_walk_rcec(port, pcie_pme_can_wakeup, NULL);
+	else if (port->subordinate)
 		pci_walk_bus(port->subordinate, pcie_pme_can_wakeup, NULL);
 }
 
@@ -320,10 +323,15 @@ static void pcie_pme_mark_devices(struct pci_dev *port)
  */
 static int pcie_pme_probe(struct pcie_device *srv)
 {
-	struct pci_dev *port;
+	struct pci_dev *port = srv->port;
 	struct pcie_pme_service_data *data;
 	int ret;
 
+	/* Limit to Root Ports or Root Complex Event Collectors */
+	if ((pci_pcie_type(port) != PCI_EXP_TYPE_RC_EC) &&
+	    (pci_pcie_type(port) != PCI_EXP_TYPE_ROOT_PORT))
+		return -ENODEV;
+
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return -ENOMEM;
@@ -333,7 +341,6 @@ static int pcie_pme_probe(struct pcie_device *srv)
 	data->srv = srv;
 	set_service_data(srv, data);
 
-	port = srv->port;
 	pcie_pme_interrupt_enable(port, false);
 	pcie_clear_root_pme_status(port);
 
@@ -445,7 +452,7 @@ static void pcie_pme_remove(struct pcie_device *srv)
 
 static struct pcie_port_service_driver pcie_pme_driver = {
 	.name		= "pcie_pme",
-	.port_type	= PCI_EXP_TYPE_ROOT_PORT,
+	.port_type	= PCIE_ANY_PORT,
 	.service	= PCIE_PORT_SERVICE_PME,
 
 	.probe		= pcie_pme_probe,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 10/10] PCI/AER: Add RCEC AER error injection support
  2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
                   ` (8 preceding siblings ...)
  2020-08-12 16:46 ` [PATCH v3 09/10] PCI/PME: Add RCEC PME handling Sean V Kelley
@ 2020-08-12 16:46 ` Sean V Kelley
  9 siblings, 0 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-08-12 16:46 UTC (permalink / raw)
  To: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo
  Cc: linux-pci, linux-kernel, Sean V Kelley

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

The Root Complex Event Collectors(RCEC) appear as peers to Root Ports
and also have the AER capability. So add RCEC support to the current AER
error injection driver.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
---
 drivers/pci/pcie/aer_inject.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index 21cc3d3387f7..8107e782546a 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -333,8 +333,11 @@ static int aer_inject(struct aer_error_inj *einj)
 	if (!dev)
 		return -ENODEV;
 	rpdev = pcie_find_root_port(dev);
+	/* If Root port not found, try to find an RCEC */
+	if (!rpdev)
+		rpdev = dev->rcec;
 	if (!rpdev) {
-		pci_err(dev, "Root port not found\n");
+		pci_err(dev, "Neither root port nor RCEC found\n");
 		ret = -ENODEV;
 		goto out_put;
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 05/10] PCI/AER: Extend AER error handling to RCECs
  2020-08-12 16:46 ` [PATCH v3 05/10] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
@ 2020-08-26 17:26   ` Kuppuswamy, Sathyanarayanan
  2020-08-26 18:55     ` sean.v.kelley
  0 siblings, 1 reply; 26+ messages in thread
From: Kuppuswamy, Sathyanarayanan @ 2020-08-26 17:26 UTC (permalink / raw)
  To: Sean V Kelley, bhelgaas, Jonathan.Cameron, rjw, ashok.raj,
	tony.luck, qiuxu.zhuo
  Cc: linux-pci, linux-kernel



On 8/12/20 9:46 AM, Sean V Kelley wrote:
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Currently the kernel does not handle AER errors for Root Complex
> integrated End Points (RCiEPs)[0]. These devices sit on a root bus within
> the Root Complex (RC). AER handling is performed by a Root Complex Event
> Collector (RCEC) [1] which is a effectively a type of RCiEP on the same
> root bus.
> 
> For an RCEC (technically not a Bridge), error messages "received" from
> associated RCiEPs must be enabled for "transmission" in order to cause a
> System Error via the Root Control register or (when the Advanced Error
> Reporting Capability is present) reporting via the Root Error Command
> register and logging in the Root Error Status register and Error Source
> Identification register.
> 
> In addition to the defined OS level handling of the reset flow for the
> associated RCiEPs of an RCEC, it is possible to also have non-native
> handling. In that case there is no need to take any actions on the RCEC
> because the firmware is responsible for them. This is true where APEI [2]
> is used to report the AER errors via a GHES[v2] HEST entry [3] and
> relevant AER CPER record [4] and non-native handling is in use.
> 
> We effectively end up with two different types of discovery for
> purposes of handling AER errors:
> 
> 1) Normal bus walk - we pass the downstream port above a bus to which
> the device is attached and it walks everything below that point.
> 
> 2) An RCiEP with no visible association with an RCEC as there is no need
> to walk devices. In that case, the flow is to just call the callbacks for
> the actual device.
> 
> A new walk function pci_walk_dev_affected(), similar to pci_bus_walk(),
> is provided that takes a pci_dev instead of a bus. If that dev corresponds
> to a downstream port it will walk the subordinate bus of that downstream
> port. If the dev does not then it will call the function on that device
> alone.
> 
> [0] ACPI PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex
> Integrated Endpoint Rules.
> [1] ACPI PCI Express Base Specification 5.0-1 6.2 Error Signalling and
> Logging
> [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface (APEI)
> [3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
> [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> ---
>   drivers/pci/pcie/err.c | 54 ++++++++++++++++++++++++++++++++++--------
>   1 file changed, 44 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 14bb8f54723e..f4cfb37c26c1 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -146,38 +146,68 @@ static int report_resume(struct pci_dev *dev, void *data)
>   	return 0;
>   }
>   
> +/**
> + * pci_walk_dev_affected - walk devices potentially AER affected
> + * @dev      device which may be an RCEC with associated RCiEPs,
> + *           an RCiEP associated with an RCEC, or a Port.
> + * @cb       callback to be called for each device found
> + * @userdata arbitrary pointer to be passed to callback.
> + *
> + * If the device provided is a bridge, walk the subordinate bus,
> + * including any bridged devices on buses under this bus.
> + * Call the provided callback on each device found.
> + *
> + * If the device provided has no subordinate bus, call the provided
> + * callback on the device itself.
> + */
> +static void pci_walk_dev_affected(struct pci_dev *dev, int (*cb)(struct pci_dev *, void *),
> +				  void *userdata)
> +{
> +	if (dev->subordinate)
> +		pci_walk_bus(dev->subordinate, cb, userdata);
> +	else
> +		cb(dev, userdata);
> +}
> +
>   pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   			enum pci_channel_state state,
>   			pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
>   {
>   	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> -	struct pci_bus *bus;
>   
>   	/*
>   	 * Error recovery runs on all subordinates of the first downstream port.
>   	 * If the downstream port detected the error, it is cleared at the end.
> +	 * For RCiEPs we should reset just the RCiEP itself.
>   	 */
>   	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END ||
> +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC))
>   		dev = dev->bus->self;
> -	bus = dev->subordinate;
>   
>   	pci_dbg(dev, "broadcast error_detected message\n");
>   	if (state == pci_channel_io_frozen) {
> -		pci_walk_bus(bus, report_frozen_detected, &status);
> +		pci_walk_dev_affected(dev, report_frozen_detected, &status);
> +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> +			pci_warn(dev, "link reset not possible for RCiEP\n");
> +			status = PCI_ERS_RESULT_NONE;
> +			goto failed;
reset_link is not applicable for RC_END, but why do you want to fail it?
> +		}
> +
>   		status = reset_link(dev);
>   		if (status != PCI_ERS_RESULT_RECOVERED) {
>   			pci_warn(dev, "link reset failed\n");
>   			goto failed;
>   		}
>   	} else {
> -		pci_walk_bus(bus, report_normal_detected, &status);
> +		pci_walk_dev_affected(dev, report_normal_detected, &status);
>   	}
>   
>   	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>   		status = PCI_ERS_RESULT_RECOVERED;
>   		pci_dbg(dev, "broadcast mmio_enabled message\n");
> -		pci_walk_bus(bus, report_mmio_enabled, &status);
> +		pci_walk_dev_affected(dev, report_mmio_enabled, &status);
>   	}
>   
>   	if (status == PCI_ERS_RESULT_NEED_RESET) {
> @@ -188,17 +218,21 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>   		 */
>   		status = PCI_ERS_RESULT_RECOVERED;
>   		pci_dbg(dev, "broadcast slot_reset message\n");
> -		pci_walk_bus(bus, report_slot_reset, &status);
> +		pci_walk_dev_affected(dev, report_slot_reset, &status);
>   	}
>   
>   	if (status != PCI_ERS_RESULT_RECOVERED)
>   		goto failed;
>   
>   	pci_dbg(dev, "broadcast resume message\n");
> -	pci_walk_bus(bus, report_resume, &status);
> +	pci_walk_dev_affected(dev, report_resume, &status);
>   
> -	pci_aer_clear_device_status(dev);
> -	pci_aer_clear_nonfatal_status(dev);
you want to prevent clearing status for RC_END ? Can you explain?
> +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> +	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
> +		pci_aer_clear_device_status(dev);
> +		pci_aer_clear_nonfatal_status(dev);
> +	}
>   	pci_info(dev, "device recovery successful\n");
>   	return status;
>   
> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 05/10] PCI/AER: Extend AER error handling to RCECs
  2020-08-26 17:26   ` Kuppuswamy, Sathyanarayanan
@ 2020-08-26 18:55     ` sean.v.kelley
  0 siblings, 0 replies; 26+ messages in thread
From: sean.v.kelley @ 2020-08-26 18:55 UTC (permalink / raw)
  To: Kuppuswamy, Sathyanarayanan, bhelgaas, Jonathan.Cameron, rjw,
	ashok.raj, tony.luck, qiuxu.zhuo
  Cc: linux-pci, linux-kernel

Hi Sathya,

On Wed, 2020-08-26 at 10:26 -0700, Kuppuswamy, Sathyanarayanan wrote:
> 
> On 8/12/20 9:46 AM, Sean V Kelley wrote:
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > Currently the kernel does not handle AER errors for Root Complex
> > integrated End Points (RCiEPs)[0]. These devices sit on a root bus
> > within
> > the Root Complex (RC). AER handling is performed by a Root Complex
> > Event
> > Collector (RCEC) [1] which is a effectively a type of RCiEP on the
> > same
> > root bus.
> > 
> > For an RCEC (technically not a Bridge), error messages "received"
> > from
> > associated RCiEPs must be enabled for "transmission" in order to
> > cause a
> > System Error via the Root Control register or (when the Advanced
> > Error
> > Reporting Capability is present) reporting via the Root Error
> > Command
> > register and logging in the Root Error Status register and Error
> > Source
> > Identification register.
> > 
> > In addition to the defined OS level handling of the reset flow for
> > the
> > associated RCiEPs of an RCEC, it is possible to also have non-
> > native
> > handling. In that case there is no need to take any actions on the
> > RCEC
> > because the firmware is responsible for them. This is true where
> > APEI [2]
> > is used to report the AER errors via a GHES[v2] HEST entry [3] and
> > relevant AER CPER record [4] and non-native handling is in use.
> > 
> > We effectively end up with two different types of discovery for
> > purposes of handling AER errors:
> > 
> > 1) Normal bus walk - we pass the downstream port above a bus to
> > which
> > the device is attached and it walks everything below that point.
> > 
> > 2) An RCiEP with no visible association with an RCEC as there is no
> > need
> > to walk devices. In that case, the flow is to just call the
> > callbacks for
> > the actual device.
> > 
> > A new walk function pci_walk_dev_affected(), similar to
> > pci_bus_walk(),
> > is provided that takes a pci_dev instead of a bus. If that dev
> > corresponds
> > to a downstream port it will walk the subordinate bus of that
> > downstream
> > port. If the dev does not then it will call the function on that
> > device
> > alone.
> > 
> > [0] ACPI PCI Express Base Specification 5.0-1 1.3.2.3 Root Complex
> > Integrated Endpoint Rules.
> > [1] ACPI PCI Express Base Specification 5.0-1 6.2 Error Signalling
> > and
> > Logging
> > [2] ACPI Specification 6.3 Chapter 18 ACPI Platform Error Interface
> > (APEI)
> > [3] ACPI Specification 6.3 18.2.3.7 Generic Hardware Error Source
> > [4] UEFI Specification 2.8, N.2.7 PCI Express Error Section
> > 
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> > ---
> >   drivers/pci/pcie/err.c | 54 ++++++++++++++++++++++++++++++++++---
> > -----
> >   1 file changed, 44 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index 14bb8f54723e..f4cfb37c26c1 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -146,38 +146,68 @@ static int report_resume(struct pci_dev *dev,
> > void *data)
> >   	return 0;
> >   }
> >   
> > +/**
> > + * pci_walk_dev_affected - walk devices potentially AER affected
> > + * @dev      device which may be an RCEC with associated RCiEPs,
> > + *           an RCiEP associated with an RCEC, or a Port.
> > + * @cb       callback to be called for each device found
> > + * @userdata arbitrary pointer to be passed to callback.
> > + *
> > + * If the device provided is a bridge, walk the subordinate bus,
> > + * including any bridged devices on buses under this bus.
> > + * Call the provided callback on each device found.
> > + *
> > + * If the device provided has no subordinate bus, call the
> > provided
> > + * callback on the device itself.
> > + */
> > +static void pci_walk_dev_affected(struct pci_dev *dev, int
> > (*cb)(struct pci_dev *, void *),
> > +				  void *userdata)
> > +{
> > +	if (dev->subordinate)
> > +		pci_walk_bus(dev->subordinate, cb, userdata);
> > +	else
> > +		cb(dev, userdata);
> > +}
> > +
> >   pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >   			enum pci_channel_state state,
> >   			pci_ers_result_t (*reset_link)(struct pci_dev
> > *pdev))
> >   {
> >   	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> > -	struct pci_bus *bus;
> >   
> >   	/*
> >   	 * Error recovery runs on all subordinates of the first
> > downstream port.
> >   	 * If the downstream port detected the error, it is cleared at
> > the end.
> > +	 * For RCiEPs we should reset just the RCiEP itself.
> >   	 */
> >   	if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > -	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
> > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END ||
> > +	      pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC))
> >   		dev = dev->bus->self;
> > -	bus = dev->subordinate;
> >   
> >   	pci_dbg(dev, "broadcast error_detected message\n");
> >   	if (state == pci_channel_io_frozen) {
> > -		pci_walk_bus(bus, report_frozen_detected, &status);
> > +		pci_walk_dev_affected(dev, report_frozen_detected,
> > &status);
> > +		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> > +			pci_warn(dev, "link reset not possible for
> > RCiEP\n");
> > +			status = PCI_ERS_RESULT_NONE;
> > +			goto failed;
> reset_link is not applicable for RC_END, but why do you want to fail
> it?


This patch is incorporated prior to the addition of the dev->rcec link
for actually handling the RC_END case.  This is the first part before I
bring in the rest and is the basis also of Jonathan's original work.

See subsequent patches on top of err.c in this v3 series.


> > +		}
> > +
> >   		status = reset_link(dev);
> >   		if (status != PCI_ERS_RESULT_RECOVERED) {
> >   			pci_warn(dev, "link reset failed\n");
> >   			goto failed;
> >   		}
> >   	} else {
> > -		pci_walk_bus(bus, report_normal_detected, &status);
> > +		pci_walk_dev_affected(dev, report_normal_detected,
> > &status);
> >   	}
> >   
> >   	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> >   		status = PCI_ERS_RESULT_RECOVERED;
> >   		pci_dbg(dev, "broadcast mmio_enabled message\n");
> > -		pci_walk_bus(bus, report_mmio_enabled, &status);
> > +		pci_walk_dev_affected(dev, report_mmio_enabled,
> > &status);
> >   	}
> >   
> >   	if (status == PCI_ERS_RESULT_NEED_RESET) {
> > @@ -188,17 +218,21 @@ pci_ers_result_t pcie_do_recovery(struct
> > pci_dev *dev,
> >   		 */
> >   		status = PCI_ERS_RESULT_RECOVERED;
> >   		pci_dbg(dev, "broadcast slot_reset message\n");
> > -		pci_walk_bus(bus, report_slot_reset, &status);
> > +		pci_walk_dev_affected(dev, report_slot_reset, &status);
> >   	}
> >   
> >   	if (status != PCI_ERS_RESULT_RECOVERED)
> >   		goto failed;
> >   
> >   	pci_dbg(dev, "broadcast resume message\n");
> > -	pci_walk_bus(bus, report_resume, &status);
> > +	pci_walk_dev_affected(dev, report_resume, &status);
> >   
> > -	pci_aer_clear_device_status(dev);
> > -	pci_aer_clear_nonfatal_status(dev);
> you want to prevent clearing status for RC_END ? Can you explain?

It's the RC_EC of the associated RC_END which is to be cleared.
However, in this original patch from Jonathan prior to my subsequent
addition of dev->rcec it is not possible. The important thing is not to
attempt to clear the RC_END without the association.

See subsequent patches on top of err.c in this v3 series.

Thanks,

Sean

> > +	if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > +	     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> > +	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
> > +		pci_aer_clear_device_status(dev);
> > +		pci_aer_clear_nonfatal_status(dev);
> > +	}
> >   	pci_info(dev, "device recovery successful\n");
> >   	return status;
> >   
> > 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 07/10] PCI: Add 'rcec' field to pci_dev for associated RCiEPs
  2020-08-12 16:46 ` [PATCH v3 07/10] PCI: Add 'rcec' field to pci_dev for associated RCiEPs Sean V Kelley
@ 2020-09-02 16:35   ` Bjorn Helgaas
  2020-09-02 22:24     ` Sean V Kelley
  0 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2020-09-02 16:35 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci, linux-kernel

On Wed, Aug 12, 2020 at 09:46:56AM -0700, Sean V Kelley wrote:
> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> 
> When attempting error recovery for an RCiEP associated with an RCEC device,
> there needs to be a way to update the Root Error Status, the Uncorrectable
> Error Status and the Uncorrectable Error Severity of the parent RCEC.
> So add the 'rcec' field to the pci_dev structure and provide a hook for the
> Root Port Driver to associate RCiEPs with their respective parent RCEC.

> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -202,6 +202,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  		pci_walk_dev_affected(dev, report_frozen_detected, &status);
>  		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
>  			status = flr_on_rciep(dev);
> +			/*
> +			 * The callback only clears the Root Error Status
> +			 * of the RCEC (see aer.c).
> +			 */
> +			if (dev->rcec)
> +				reset_link(dev->rcec);
>  			if (status != PCI_ERS_RESULT_RECOVERED) {
>  				pci_warn(dev, "function level reset failed\n");
>  				goto failed;
> @@ -245,7 +251,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
>  		pci_aer_clear_device_status(dev);
>  		pci_aer_clear_nonfatal_status(dev);
> +	} else if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END && dev->rcec) {
> +		pci_aer_clear_device_status(dev->rcec);
> +		pci_aer_clear_nonfatal_status(dev->rcec);

Conceptually, I'm not sure why we need the dev->rcec link.  The error
was *reported* via the RCEC, so don't we know the RCEC up front,
before we even identify the RCiEP?  Can't we just remember that and
dispense with dev->rcec?

I'm also concerned that if we fail to identify the RCiEP (i.e., we
don't have a valid "dev" to use dev->rcec), we will fail to clear the
error status bits.  I think it's possible that the RCEC will report an
error, but the RCiEP that generated the error message is not
responding so we can't find it.

>  	}
> +
>  	pci_info(dev, "device recovery successful\n");
>  	return status;
>  
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index d5b109499b10..a64e88b7166b 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -90,6 +90,18 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
>  #define PCIE_PORTDRV_PM_OPS	NULL
>  #endif /* !PM */
>  
> +static int pcie_hook_rcec(struct pci_dev *pdev, void *data)
> +{
> +	struct pci_dev *rcec = (struct pci_dev *)data;
> +
> +	pdev->rcec = rcec;
> +	pci_dbg(rcec, "RCiEP(under an RCEC) %04x:%02x:%02x.%d\n",
> +		pci_domain_nr(pdev->bus), pdev->bus->number,
> +		PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));

If we do need dev->rcec, this should use pci_name() for the second
device instead of formatting the name manually.  I think I would
connect this with the RCiEP instead of the RCEC, e.g.,

  pci_dbg(pdev, "PME & error events reported via %s\n", pci_name(rcec));

> +
> +	return 0;
> +}
> +
>  /*
>   * pcie_portdrv_probe - Probe PCI-Express port devices
>   * @dev: PCI-Express port device being probed
> @@ -110,6 +122,9 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
>  		return -ENODEV;
>  
> +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
> +		pcie_walk_rcec(dev, pcie_hook_rcec, dev);
> +
>  	status = pcie_port_device_register(dev);
>  	if (status)
>  		return status;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c7fc5726872c..ba29816c827b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -330,6 +330,7 @@ struct pci_dev {
>  #ifdef CONFIG_PCIEPORTBUS
>  	u16		rcec_cap;	/* RCEC capability offset */
>  	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint extended capabilities */
> +	struct pci_dev	*rcec;		/* Associated RCEC device */
>  #endif
>  	u8		pcie_cap;	/* PCIe capability offset */
>  	u8		msi_cap;	/* MSI capability offset */
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-08-12 16:46 ` [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs Sean V Kelley
@ 2020-09-02 19:00   ` Bjorn Helgaas
  2020-09-02 21:55     ` Sean V Kelley
  0 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2020-09-02 19:00 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci, linux-kernel

On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> 
> When an RCEC device signals error(s) to a CPU core, the CPU core
> needs to walk all the RCiEPs associated with that RCEC to check
> errors. So add the function pcie_walk_rcec() to walk all RCiEPs
> associated with the RCEC device.
> 
> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/pci/pci.h       |  4 +++
>  drivers/pci/pcie/rcec.c | 76 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index bd25e6047b54..8bd7528d6977 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct pci_dev *pdev) {}
>  #ifdef CONFIG_PCIEPORTBUS
>  void pci_rcec_init(struct pci_dev *dev);
>  void pci_rcec_exit(struct pci_dev *dev);
> +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
> +		    void *userdata);
>  #else
>  static inline void pci_rcec_init(struct pci_dev *dev) {}
>  static inline void pci_rcec_exit(struct pci_dev *dev) {}
> +static inline void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
> +				  void *userdata) {}
>  #endif
>  
>  #ifdef CONFIG_PCI_ATS
> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> index 519ae086ff41..405f92fcdf7f 100644
> --- a/drivers/pci/pcie/rcec.c
> +++ b/drivers/pci/pcie/rcec.c
> @@ -17,6 +17,82 @@
>  
>  #include "../pci.h"
>  
> +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int (*cb)(struct pci_dev *, void *),
> +				 void *userdata, const unsigned long bitmap)
> +{
> +	unsigned int devn, fn;
> +	struct pci_dev *dev;
> +	int retval;
> +
> +	for_each_set_bit(devn, &bitmap, 32) {
> +		for (fn = 0; fn < 8; fn++) {
> +			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));

Wow, this is a lot of churning to call pci_get_slot() 256 times per
bus for the "associated bus numbers" case where we pass a bitmap of
0xffffffff.  They didn't really make it easy for software when they
added the next/last bus number thing.

Just thinking out loud here.  What if we could set dev->rcec during
enumeration, and then use that to build pcie_walk_rcec()?

  bool rcec_assoc_rciep(rcec, rciep)
  {
    if (rcec->bus == rciep->bus)
      return (rcec->bitmap contains rciep->devfn);

    return (rcec->next/last contains rciep->bus);
  }

  link_rcec(dev, data)
  {
    struct pci_dev *rcec = data;

    if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
      dev->rcec = rcec;
  }

  find_rcec(dev, data)
  {
    struct pci_dev *rciep = data;

    if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
      rciep->rcec = dev;
  }

  pci_setup_device
    ...
      if (dev is RCEC) {
	pci_rcec_init(dev)		# cache bitmap, next/last bus #
	pci_walk_bus(root_bus, link_rcec, dev); # link any RCiEP already found
      }
      if (dev is RCiEP) {
	pci_walk_bus(root_bus, find_rcec, dev); # link any RCEC already found
      }

Now we should have a valid dev->rcec for everything we've enumerated.

  struct walk_rcec_data {
    struct pci_dev *rcec;
    ... user_callback;
    void *user_data;
  };

  pcie_rcec_helper(dev, callback, data)
  {
    struct walk_rcec_data *rcec_data = data;

    if (dev->rcec == rcec_data->rcec)
      rcec_data->user_callback(dev, rcec_data->user_data);
  }

  pcie_walk_rcec(rcec, callback, data)
  {
    struct walk_rcec_data rcec_data;
    ...
    rcec_data.rcec = rcec;
    rcec_data.user_callback = callback;
    rcec_data.user_data = data;
    pci_walk_bus(root_bus, pcie_rcec_helper, &rcec_data);
  }

I hate having to walk the bus so many times, but I do like the fact
that in the runtime path (pcie_walk_rcec()) we don't have to do 256
pci_get_slot() calls per bus, almost all of which are going to fail.

> +			if (!dev)
> +				continue;
> +
> +			if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END) {
> +				pci_dev_put(dev);
> +				continue;
> +			}
> +
> +			retval = cb(dev, userdata);
> +			pci_dev_put(dev);
> +			if (retval)
> +				return retval;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * pcie_walk_rcec - Walk RCiEP devices associating with RCEC and call callback.
> + * @rcec     RCEC whose RCiEP devices should be walked.
> + * @cb       Callback to be called for each RCiEP device found.
> + * @userdata Arbitrary pointer to be passed to callback.
> + *
> + * Walk the given RCEC. Call the provided callback on each RCiEP device found.
> + *
> + * We check the return of @cb each time. If it returns anything
> + * other than 0, we break out.
> + */
> +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
> +		    void *userdata)
> +{
> +	u8 nextbusn, lastbusn;
> +	struct pci_bus *bus;
> +	unsigned int bnr;
> +
> +	if (!rcec->rcec_cap)
> +		return;
> +
> +	/* Find RCiEP devices on the same bus as the RCEC */
> +	if (pcie_walk_rciep_devfn(rcec->bus, cb, userdata, rcec->rcec_ext->bitmap))
> +		return;
> +
> +	/* Check whether RCEC BUSN register is present */
> +	if (rcec->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
> +		return;
> +
> +	nextbusn = rcec->rcec_ext->nextbusn;
> +	lastbusn = rcec->rcec_ext->lastbusn;
> +
> +	/* All RCiEP devices are on the same bus as the RCEC */
> +	if (nextbusn == 0xff && lastbusn == 0x00)
> +		return;
> +
> +	for (bnr = nextbusn; bnr <= lastbusn; bnr++) {

I think we also need to skip the RCEC bus here, don't we?  7.9.10.3
says the Associated Bus Numbers register does not indicate association
between an EC and any Function on the same bus number as the EC
itself.  Something like this:

  if (bnr == rcec->bus->number)
    continue;

> +		bus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
> +		if (!bus)
> +			continue;
> +
> +		/* Find RCiEP devices on the given bus */
> +		if (pcie_walk_rciep_devfn(bus, cb, userdata, 0xffffffff))
> +			return;
> +	}
> +}
> +
>  void pci_rcec_init(struct pci_dev *dev)
>  {
>  	u32 rcec, hdr, busn;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-02 19:00   ` Bjorn Helgaas
@ 2020-09-02 21:55     ` Sean V Kelley
  2020-09-04 22:18       ` Kelley, Sean V
  0 siblings, 1 reply; 26+ messages in thread
From: Sean V Kelley @ 2020-09-02 21:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci, linux-kernel

Hi Bjorn,

On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:
> On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
> > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > 
> > When an RCEC device signals error(s) to a CPU core, the CPU core
> > needs to walk all the RCiEPs associated with that RCEC to check
> > errors. So add the function pcie_walk_rcec() to walk all RCiEPs
> > associated with the RCEC device.
> > 
> > Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
> > Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  drivers/pci/pci.h       |  4 +++
> >  drivers/pci/pcie/rcec.c | 76
> > +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 80 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index bd25e6047b54..8bd7528d6977 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct pci_dev
> > *pdev) {}
> >  #ifdef CONFIG_PCIEPORTBUS
> >  void pci_rcec_init(struct pci_dev *dev);
> >  void pci_rcec_exit(struct pci_dev *dev);
> > +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev
> > *, void *),
> > +		    void *userdata);
> >  #else
> >  static inline void pci_rcec_init(struct pci_dev *dev) {}
> >  static inline void pci_rcec_exit(struct pci_dev *dev) {}
> > +static inline void pcie_walk_rcec(struct pci_dev *rcec, int
> > (*cb)(struct pci_dev *, void *),
> > +				  void *userdata) {}
> >  #endif
> >  
> >  #ifdef CONFIG_PCI_ATS
> > diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> > index 519ae086ff41..405f92fcdf7f 100644
> > --- a/drivers/pci/pcie/rcec.c
> > +++ b/drivers/pci/pcie/rcec.c
> > @@ -17,6 +17,82 @@
> >  
> >  #include "../pci.h"
> >  
> > +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
> > (*cb)(struct pci_dev *, void *),
> > +				 void *userdata, const unsigned long
> > bitmap)
> > +{
> > +	unsigned int devn, fn;
> > +	struct pci_dev *dev;
> > +	int retval;
> > +
> > +	for_each_set_bit(devn, &bitmap, 32) {
> > +		for (fn = 0; fn < 8; fn++) {
> > +			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
> 
> Wow, this is a lot of churning to call pci_get_slot() 256 times per
> bus for the "associated bus numbers" case where we pass a bitmap of
> 0xffffffff.  They didn't really make it easy for software when they
> added the next/last bus number thing.
> 
> Just thinking out loud here.  What if we could set dev->rcec during
> enumeration, and then use that to build pcie_walk_rcec()?


I think follow what you are doing.

As we enumerate an RCEC, use the time to discover RCiEPs and associate
each RCiEP's dev->rcec. Although BIOS already set the bitmap for this
specific RCEC, it's more efficient to simply discover the devices
through the bus walk and verify each one found against the bitmap. 

Further, while we can be certain that an RCiEP found with a matching
device no. in a bitmap for an associated RCEC is correct, we cannot be
certain that any RCiEP found on another bus range is correct unless we
verify the bus is within that next/last bus range. 

Finally, that's where find_rcec() callback for rcec_assoc_rciep() does
double duty by also checking on the "on-a-separate-bus" case captured
potentially by find_rcec() during an RCiEP's bus walk.

 
> 
>   bool rcec_assoc_rciep(rcec, rciep)
>   {
>     if (rcec->bus == rciep->bus)
>       return (rcec->bitmap contains rciep->devfn);
> 
>     return (rcec->next/last contains rciep->bus);
>   }
> 
>   link_rcec(dev, data)
>   {
>     struct pci_dev *rcec = data;
> 
>     if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
>       dev->rcec = rcec;
>   }
> 
>   find_rcec(dev, data)
>   {
>     struct pci_dev *rciep = data;
> 
>     if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
>       rciep->rcec = dev;
>   }
> 
>   pci_setup_device
>     ...
>       if (dev is RCEC) {
> 	pci_rcec_init(dev)		# cache bitmap, next/last bus
> #
> 	pci_walk_bus(root_bus, link_rcec, dev); # link any RCiEP
> already found
>       }
>       if (dev is RCiEP) {
> 	pci_walk_bus(root_bus, find_rcec, dev); # link any RCEC already
> found
>       }
> 
> Now we should have a valid dev->rcec for everything we've enumerated.
> 
>   struct walk_rcec_data {
>     struct pci_dev *rcec;
>     ... user_callback;
>     void *user_data;
>   };
> 
>   pcie_rcec_helper(dev, callback, data)
>   {
>     struct walk_rcec_data *rcec_data = data;
> 
>     if (dev->rcec == rcec_data->rcec)
>       rcec_data->user_callback(dev, rcec_data->user_data);
>   }
> 
>   pcie_walk_rcec(rcec, callback, data)
>   {
>     struct walk_rcec_data rcec_data;
>     ...
>     rcec_data.rcec = rcec;
>     rcec_data.user_callback = callback;
>     rcec_data.user_data = data;
>     pci_walk_bus(root_bus, pcie_rcec_helper, &rcec_data);
>   }
> 
> I hate having to walk the bus so many times, but I do like the fact
> that in the runtime path (pcie_walk_rcec()) we don't have to do 256
> pci_get_slot() calls per bus, almost all of which are going to fail.


That's really the trade-off and it's slightly harder to follow.

Will implement and see how it looks / tests.


> 
> > +			if (!dev)
> > +				continue;
> > +
> > +			if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END)
> > {
> > +				pci_dev_put(dev);
> > +				continue;
> > +			}
> > +
> > +			retval = cb(dev, userdata);
> > +			pci_dev_put(dev);
> > +			if (retval)
> > +				return retval;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * pcie_walk_rcec - Walk RCiEP devices associating with RCEC and
> > call callback.
> > + * @rcec     RCEC whose RCiEP devices should be walked.
> > + * @cb       Callback to be called for each RCiEP device found.
> > + * @userdata Arbitrary pointer to be passed to callback.
> > + *
> > + * Walk the given RCEC. Call the provided callback on each RCiEP
> > device found.
> > + *
> > + * We check the return of @cb each time. If it returns anything
> > + * other than 0, we break out.
> > + */
> > +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev
> > *, void *),
> > +		    void *userdata)
> > +{
> > +	u8 nextbusn, lastbusn;
> > +	struct pci_bus *bus;
> > +	unsigned int bnr;
> > +
> > +	if (!rcec->rcec_cap)
> > +		return;
> > +
> > +	/* Find RCiEP devices on the same bus as the RCEC */
> > +	if (pcie_walk_rciep_devfn(rcec->bus, cb, userdata, rcec-
> > >rcec_ext->bitmap))
> > +		return;
> > +
> > +	/* Check whether RCEC BUSN register is present */
> > +	if (rcec->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
> > +		return;
> > +
> > +	nextbusn = rcec->rcec_ext->nextbusn;
> > +	lastbusn = rcec->rcec_ext->lastbusn;
> > +
> > +	/* All RCiEP devices are on the same bus as the RCEC */
> > +	if (nextbusn == 0xff && lastbusn == 0x00)
> > +		return;
> > +
> > +	for (bnr = nextbusn; bnr <= lastbusn; bnr++) {
> 
> I think we also need to skip the RCEC bus here, don't we?  7.9.10.3
> says the Associated Bus Numbers register does not indicate
> association
> between an EC and any Function on the same bus number as the EC
> itself.  Something like this:
> 
>   if (bnr == rcec->bus->number)
>     continue;

Correct. Although it is permitted to include the bus number for the
RCEC in that range (not sure why), skipping the RCEC's should be done.

Will do.

> 
> > +		bus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
> > +		if (!bus)
> > +			continue;
> > +
> > +		/* Find RCiEP devices on the given bus */
> > +		if (pcie_walk_rciep_devfn(bus, cb, userdata,
> > 0xffffffff))
> > +			return;
> > +	}
> > +}
> > +
> >  void pci_rcec_init(struct pci_dev *dev)
> >  {
> >  	u32 rcec, hdr, busn;
> > -- 
> > 2.28.0
> > 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 07/10] PCI: Add 'rcec' field to pci_dev for associated RCiEPs
  2020-09-02 16:35   ` Bjorn Helgaas
@ 2020-09-02 22:24     ` Sean V Kelley
  0 siblings, 0 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-09-02 22:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, Jonathan.Cameron, rjw, ashok.raj, tony.luck,
	sathyanarayanan.kuppuswamy, qiuxu.zhuo, linux-pci, linux-kernel

Hi Bjorn,

On Wed, 2020-09-02 at 11:35 -0500, Bjorn Helgaas wrote:
> On Wed, Aug 12, 2020 at 09:46:56AM -0700, Sean V Kelley wrote:
> > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > 
> > When attempting error recovery for an RCiEP associated with an RCEC
> > device,
> > there needs to be a way to update the Root Error Status, the
> > Uncorrectable
> > Error Status and the Uncorrectable Error Severity of the parent
> > RCEC.
> > So add the 'rcec' field to the pci_dev structure and provide a hook
> > for the
> > Root Port Driver to associate RCiEPs with their respective parent
> > RCEC.
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -202,6 +202,12 @@ pci_ers_result_t pcie_do_recovery(struct
> > pci_dev *dev,
> >  		pci_walk_dev_affected(dev, report_frozen_detected,
> > &status);
> >  		if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) {
> >  			status = flr_on_rciep(dev);
> > +			/*
> > +			 * The callback only clears the Root Error
> > Status
> > +			 * of the RCEC (see aer.c).
> > +			 */
> > +			if (dev->rcec)
> > +				reset_link(dev->rcec);
> >  			if (status != PCI_ERS_RESULT_RECOVERED) {
> >  				pci_warn(dev, "function level reset
> > failed\n");
> >  				goto failed;
> > @@ -245,7 +251,11 @@ pci_ers_result_t pcie_do_recovery(struct
> > pci_dev *dev,
> >  	     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)) {
> >  		pci_aer_clear_device_status(dev);
> >  		pci_aer_clear_nonfatal_status(dev);
> > +	} else if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END && dev-
> > >rcec) {
> > +		pci_aer_clear_device_status(dev->rcec);
> > +		pci_aer_clear_nonfatal_status(dev->rcec);
> 
> Conceptually, I'm not sure why we need the dev->rcec link.  The error
> was *reported* via the RCEC, so don't we know the RCEC up front,
> before we even identify the RCiEP?  Can't we just remember that and
> dispense with dev->rcec?

However, we can also get errors reported by that same RCEC that are not
related to the associated RCiEPs. Further, an RCiEP in reporting an
error will trigger logging to the Root Error Command/Status and Error
Source Identification registers of the associated RCEC. The assumption
in pcie_do_recovery() here is that I can cover both scenarios.
 
In a new revision of this patch series...

        if (type != PCI_EXP_TYPE_RC_END) {
                if (pcie_aer_is_native(bridge))
                        pcie_clear_device_status(bridge);
                pci_aer_clear_nonfatal_status(bridge);
        }

I've a new revision that makes use of the earlier thread discussing
'bridge' assignment conditionally and when it makes sense to refer to
the 'dev'.

.

> 
> I'm also concerned that if we fail to identify the RCiEP (i.e., we
> don't have a valid "dev" to use dev->rcec), we will fail to clear the
> error status bits.  I think it's possible that the RCEC will report
> an
> error, but the RCiEP that generated the error message is not
> responding so we can't find it.

That's the association which is enumerated at boot by BIOS either
through the bitmap or through the next/last bus range. Are you
describing a scenario in which during enumeration of the RCECs/RCiEPs
the relationships are not discovered?

The second bit you describe is where the error is reported and the
RCiEP is not responding. I wonder if that will eventually trigger an
error from the RCEC.  But you are right such a scenario could happen so
you are getting the error but the device is not present or may not even
have been present from BIOS perspective?

Sean


> 
> >  	}
> > +
> >  	pci_info(dev, "device recovery successful\n");
> >  	return status;
> >  
> > diff --git a/drivers/pci/pcie/portdrv_pci.c
> > b/drivers/pci/pcie/portdrv_pci.c
> > index d5b109499b10..a64e88b7166b 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -90,6 +90,18 @@ static const struct dev_pm_ops
> > pcie_portdrv_pm_ops = {
> >  #define PCIE_PORTDRV_PM_OPS	NULL
> >  #endif /* !PM */
> >  
> > +static int pcie_hook_rcec(struct pci_dev *pdev, void *data)
> > +{
> > +	struct pci_dev *rcec = (struct pci_dev *)data;
> > +
> > +	pdev->rcec = rcec;
> > +	pci_dbg(rcec, "RCiEP(under an RCEC) %04x:%02x:%02x.%d\n",
> > +		pci_domain_nr(pdev->bus), pdev->bus->number,
> > +		PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> 
> If we do need dev->rcec, this should use pci_name() for the second
> device instead of formatting the name manually.  I think I would
> connect this with the RCiEP instead of the RCEC, e.g.,
> 
>   pci_dbg(pdev, "PME & error events reported via %s\n",
> pci_name(rcec));
> 
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * pcie_portdrv_probe - Probe PCI-Express port devices
> >   * @dev: PCI-Express port device being probed
> > @@ -110,6 +122,9 @@ static int pcie_portdrv_probe(struct pci_dev
> > *dev,
> >  	     (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
> >  		return -ENODEV;
> >  
> > +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
> > +		pcie_walk_rcec(dev, pcie_hook_rcec, dev);
> > +
> >  	status = pcie_port_device_register(dev);
> >  	if (status)
> >  		return status;
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index c7fc5726872c..ba29816c827b 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -330,6 +330,7 @@ struct pci_dev {
> >  #ifdef CONFIG_PCIEPORTBUS
> >  	u16		rcec_cap;	/* RCEC capability offset */
> >  	struct rcec_ext *rcec_ext;	/* RCEC cached assoc. endpoint
> > extended capabilities */
> > +	struct pci_dev	*rcec;		/* Associated RCEC device
> > */
> >  #endif
> >  	u8		pcie_cap;	/* PCIe capability offset */
> >  	u8		msi_cap;	/* MSI capability offset */
> > -- 
> > 2.28.0
> > 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-02 21:55     ` Sean V Kelley
@ 2020-09-04 22:18       ` Kelley, Sean V
  2020-09-05  2:23         ` Bjorn Helgaas
  0 siblings, 1 reply; 26+ messages in thread
From: Kelley, Sean V @ 2020-09-04 22:18 UTC (permalink / raw)
  To: helgaas
  Cc: Zhuo, Qiuxu, Jonathan.Cameron, rjw, sathyanarayanan.kuppuswamy,
	Raj, Ashok, Luck, Tony, linux-pci, bhelgaas, linux-kernel

Hi Bjorn,

Quick question below...

On Wed, 2020-09-02 at 14:55 -0700, Sean V Kelley wrote:
> Hi Bjorn,
> 
> On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:
> > On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
> > > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > > 
> > > When an RCEC device signals error(s) to a CPU core, the CPU core
> > > needs to walk all the RCiEPs associated with that RCEC to check
> > > errors. So add the function pcie_walk_rcec() to walk all RCiEPs
> > > associated with the RCEC device.
> > > 
> > > Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
> > > Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> > > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > ---
> > >  drivers/pci/pci.h       |  4 +++
> > >  drivers/pci/pcie/rcec.c | 76
> > > +++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 80 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index bd25e6047b54..8bd7528d6977 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct
> > > pci_dev
> > > *pdev) {}
> > >  #ifdef CONFIG_PCIEPORTBUS
> > >  void pci_rcec_init(struct pci_dev *dev);
> > >  void pci_rcec_exit(struct pci_dev *dev);
> > > +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
> > > pci_dev
> > > *, void *),
> > > +		    void *userdata);
> > >  #else
> > >  static inline void pci_rcec_init(struct pci_dev *dev) {}
> > >  static inline void pci_rcec_exit(struct pci_dev *dev) {}
> > > +static inline void pcie_walk_rcec(struct pci_dev *rcec, int
> > > (*cb)(struct pci_dev *, void *),
> > > +				  void *userdata) {}
> > >  #endif
> > >  
> > >  #ifdef CONFIG_PCI_ATS
> > > diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> > > index 519ae086ff41..405f92fcdf7f 100644
> > > --- a/drivers/pci/pcie/rcec.c
> > > +++ b/drivers/pci/pcie/rcec.c
> > > @@ -17,6 +17,82 @@
> > >  
> > >  #include "../pci.h"
> > >  
> > > +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
> > > (*cb)(struct pci_dev *, void *),
> > > +				 void *userdata, const unsigned long
> > > bitmap)
> > > +{
> > > +	unsigned int devn, fn;
> > > +	struct pci_dev *dev;
> > > +	int retval;
> > > +
> > > +	for_each_set_bit(devn, &bitmap, 32) {
> > > +		for (fn = 0; fn < 8; fn++) {
> > > +			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
> > 
> > Wow, this is a lot of churning to call pci_get_slot() 256 times per
> > bus for the "associated bus numbers" case where we pass a bitmap of
> > 0xffffffff.  They didn't really make it easy for software when they
> > added the next/last bus number thing.
> > 
> > Just thinking out loud here.  What if we could set dev->rcec during
> > enumeration, and then use that to build pcie_walk_rcec()?
> 
> I think follow what you are doing.
> 
> As we enumerate an RCEC, use the time to discover RCiEPs and
> associate
> each RCiEP's dev->rcec. Although BIOS already set the bitmap for this
> specific RCEC, it's more efficient to simply discover the devices
> through the bus walk and verify each one found against the bitmap. 
> 
> Further, while we can be certain that an RCiEP found with a matching
> device no. in a bitmap for an associated RCEC is correct, we cannot
> be
> certain that any RCiEP found on another bus range is correct unless
> we
> verify the bus is within that next/last bus range. 
> 
> Finally, that's where find_rcec() callback for rcec_assoc_rciep()
> does
> double duty by also checking on the "on-a-separate-bus" case captured
> potentially by find_rcec() during an RCiEP's bus walk.
> 
>  
> >   bool rcec_assoc_rciep(rcec, rciep)
> >   {
> >     if (rcec->bus == rciep->bus)
> >       return (rcec->bitmap contains rciep->devfn);
> > 
> >     return (rcec->next/last contains rciep->bus);
> >   }
> > 
> >   link_rcec(dev, data)
> >   {
> >     struct pci_dev *rcec = data;
> > 
> >     if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
> >       dev->rcec = rcec;
> >   }
> > 
> >   find_rcec(dev, data)
> >   {
> >     struct pci_dev *rciep = data;
> > 
> >     if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
> >       rciep->rcec = dev;
> >   }
> > 
> >   pci_setup_device
> >     ...

I just noticed your use of pci_setup_device(). Are you suggesting
moving the call to pci_rcec_init() out of pci_init_capabilities() and
move it into pci_setup_device()?  If so, would pci_rcec_exit() still
remain in pci_release_capabilities()?

I'm just wondering if it could just remain in pci_init_capabilities().


Thanks,

Sean

> >       if (dev is RCEC) {
> > 	pci_rcec_init(dev)		# cache bitmap, next/last bus
> > #
> > 	pci_walk_bus(root_bus, link_rcec, dev); # link any RCiEP
> > already found
> >       }
> >       if (dev is RCiEP) {
> > 	pci_walk_bus(root_bus, find_rcec, dev); # link any RCEC already
> > found
> >       }
> > 
> > Now we should have a valid dev->rcec for everything we've
> > enumerated.
> > 
> >   struct walk_rcec_data {
> >     struct pci_dev *rcec;
> >     ... user_callback;
> >     void *user_data;
> >   };
> > 
> >   pcie_rcec_helper(dev, callback, data)
> >   {
> >     struct walk_rcec_data *rcec_data = data;
> > 
> >     if (dev->rcec == rcec_data->rcec)
> >       rcec_data->user_callback(dev, rcec_data->user_data);
> >   }
> > 
> >   pcie_walk_rcec(rcec, callback, data)
> >   {
> >     struct walk_rcec_data rcec_data;
> >     ...
> >     rcec_data.rcec = rcec;
> >     rcec_data.user_callback = callback;
> >     rcec_data.user_data = data;
> >     pci_walk_bus(root_bus, pcie_rcec_helper, &rcec_data);
> >   }
> > 
> > I hate having to walk the bus so many times, but I do like the fact
> > that in the runtime path (pcie_walk_rcec()) we don't have to do 256
> > pci_get_slot() calls per bus, almost all of which are going to
> > fail.
> 
> That's really the trade-off and it's slightly harder to follow.
> 
> Will implement and see how it looks / tests.
> 
> 
> > > +			if (!dev)
> > > +				continue;
> > > +
> > > +			if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_END)
> > > {
> > > +				pci_dev_put(dev);
> > > +				continue;
> > > +			}
> > > +
> > > +			retval = cb(dev, userdata);
> > > +			pci_dev_put(dev);
> > > +			if (retval)
> > > +				return retval;
> > > +		}
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * pcie_walk_rcec - Walk RCiEP devices associating with RCEC and
> > > call callback.
> > > + * @rcec     RCEC whose RCiEP devices should be walked.
> > > + * @cb       Callback to be called for each RCiEP device found.
> > > + * @userdata Arbitrary pointer to be passed to callback.
> > > + *
> > > + * Walk the given RCEC. Call the provided callback on each RCiEP
> > > device found.
> > > + *
> > > + * We check the return of @cb each time. If it returns anything
> > > + * other than 0, we break out.
> > > + */
> > > +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
> > > pci_dev
> > > *, void *),
> > > +		    void *userdata)
> > > +{
> > > +	u8 nextbusn, lastbusn;
> > > +	struct pci_bus *bus;
> > > +	unsigned int bnr;
> > > +
> > > +	if (!rcec->rcec_cap)
> > > +		return;
> > > +
> > > +	/* Find RCiEP devices on the same bus as the RCEC */
> > > +	if (pcie_walk_rciep_devfn(rcec->bus, cb, userdata, rcec-
> > > > rcec_ext->bitmap))
> > > +		return;
> > > +
> > > +	/* Check whether RCEC BUSN register is present */
> > > +	if (rcec->rcec_ext->ver < PCI_RCEC_BUSN_REG_VER)
> > > +		return;
> > > +
> > > +	nextbusn = rcec->rcec_ext->nextbusn;
> > > +	lastbusn = rcec->rcec_ext->lastbusn;
> > > +
> > > +	/* All RCiEP devices are on the same bus as the RCEC */
> > > +	if (nextbusn == 0xff && lastbusn == 0x00)
> > > +		return;
> > > +
> > > +	for (bnr = nextbusn; bnr <= lastbusn; bnr++) {
> > 
> > I think we also need to skip the RCEC bus here, don't we?  7.9.10.3
> > says the Associated Bus Numbers register does not indicate
> > association
> > between an EC and any Function on the same bus number as the EC
> > itself.  Something like this:
> > 
> >   if (bnr == rcec->bus->number)
> >     continue;
> 
> Correct. Although it is permitted to include the bus number for the
> RCEC in that range (not sure why), skipping the RCEC's should be
> done.
> 
> Will do.
> 
> > > +		bus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
> > > +		if (!bus)
> > > +			continue;
> > > +
> > > +		/* Find RCiEP devices on the given bus */
> > > +		if (pcie_walk_rciep_devfn(bus, cb, userdata,
> > > 0xffffffff))
> > > +			return;
> > > +	}
> > > +}
> > > +
> > >  void pci_rcec_init(struct pci_dev *dev)
> > >  {
> > >  	u32 rcec, hdr, busn;
> > > -- 
> > > 2.28.0
> > > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-04 22:18       ` Kelley, Sean V
@ 2020-09-05  2:23         ` Bjorn Helgaas
  2020-09-11 23:16           ` Sean V Kelley
  0 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2020-09-05  2:23 UTC (permalink / raw)
  To: Kelley, Sean V
  Cc: Zhuo, Qiuxu, Jonathan.Cameron, rjw, sathyanarayanan.kuppuswamy,
	Raj, Ashok, Luck, Tony, linux-pci, bhelgaas, linux-kernel

On Fri, Sep 04, 2020 at 10:18:30PM +0000, Kelley, Sean V wrote:
> Hi Bjorn,
> 
> Quick question below...
> 
> On Wed, 2020-09-02 at 14:55 -0700, Sean V Kelley wrote:
> > Hi Bjorn,
> > 
> > On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:
> > > On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
> > > > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > > > 
> > > > When an RCEC device signals error(s) to a CPU core, the CPU core
> > > > needs to walk all the RCiEPs associated with that RCEC to check
> > > > errors. So add the function pcie_walk_rcec() to walk all RCiEPs
> > > > associated with the RCEC device.
> > > > 
> > > > Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
> > > > Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> > > > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > ---
> > > >  drivers/pci/pci.h       |  4 +++
> > > >  drivers/pci/pcie/rcec.c | 76
> > > > +++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 80 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > > index bd25e6047b54..8bd7528d6977 100644
> > > > --- a/drivers/pci/pci.h
> > > > +++ b/drivers/pci/pci.h
> > > > @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct
> > > > pci_dev
> > > > *pdev) {}
> > > >  #ifdef CONFIG_PCIEPORTBUS
> > > >  void pci_rcec_init(struct pci_dev *dev);
> > > >  void pci_rcec_exit(struct pci_dev *dev);
> > > > +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
> > > > pci_dev
> > > > *, void *),
> > > > +		    void *userdata);
> > > >  #else
> > > >  static inline void pci_rcec_init(struct pci_dev *dev) {}
> > > >  static inline void pci_rcec_exit(struct pci_dev *dev) {}
> > > > +static inline void pcie_walk_rcec(struct pci_dev *rcec, int
> > > > (*cb)(struct pci_dev *, void *),
> > > > +				  void *userdata) {}
> > > >  #endif
> > > >  
> > > >  #ifdef CONFIG_PCI_ATS
> > > > diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> > > > index 519ae086ff41..405f92fcdf7f 100644
> > > > --- a/drivers/pci/pcie/rcec.c
> > > > +++ b/drivers/pci/pcie/rcec.c
> > > > @@ -17,6 +17,82 @@
> > > >  
> > > >  #include "../pci.h"
> > > >  
> > > > +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
> > > > (*cb)(struct pci_dev *, void *),
> > > > +				 void *userdata, const unsigned long
> > > > bitmap)
> > > > +{
> > > > +	unsigned int devn, fn;
> > > > +	struct pci_dev *dev;
> > > > +	int retval;
> > > > +
> > > > +	for_each_set_bit(devn, &bitmap, 32) {
> > > > +		for (fn = 0; fn < 8; fn++) {
> > > > +			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
> > > 
> > > Wow, this is a lot of churning to call pci_get_slot() 256 times per
> > > bus for the "associated bus numbers" case where we pass a bitmap of
> > > 0xffffffff.  They didn't really make it easy for software when they
> > > added the next/last bus number thing.
> > > 
> > > Just thinking out loud here.  What if we could set dev->rcec during
> > > enumeration, and then use that to build pcie_walk_rcec()?
> > 
> > I think follow what you are doing.
> > 
> > As we enumerate an RCEC, use the time to discover RCiEPs and
> > associate
> > each RCiEP's dev->rcec. Although BIOS already set the bitmap for this
> > specific RCEC, it's more efficient to simply discover the devices
> > through the bus walk and verify each one found against the bitmap. 
> > 
> > Further, while we can be certain that an RCiEP found with a matching
> > device no. in a bitmap for an associated RCEC is correct, we cannot
> > be
> > certain that any RCiEP found on another bus range is correct unless
> > we
> > verify the bus is within that next/last bus range. 
> > 
> > Finally, that's where find_rcec() callback for rcec_assoc_rciep()
> > does
> > double duty by also checking on the "on-a-separate-bus" case captured
> > potentially by find_rcec() during an RCiEP's bus walk.
> > 
> >  
> > >   bool rcec_assoc_rciep(rcec, rciep)
> > >   {
> > >     if (rcec->bus == rciep->bus)
> > >       return (rcec->bitmap contains rciep->devfn);
> > > 
> > >     return (rcec->next/last contains rciep->bus);
> > >   }
> > > 
> > >   link_rcec(dev, data)
> > >   {
> > >     struct pci_dev *rcec = data;
> > > 
> > >     if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
> > >       dev->rcec = rcec;
> > >   }
> > > 
> > >   find_rcec(dev, data)
> > >   {
> > >     struct pci_dev *rciep = data;
> > > 
> > >     if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
> > >       rciep->rcec = dev;
> > >   }
> > > 
> > >   pci_setup_device
> > >     ...
> 
> I just noticed your use of pci_setup_device(). Are you suggesting
> moving the call to pci_rcec_init() out of pci_init_capabilities() and
> move it into pci_setup_device()?  If so, would pci_rcec_exit() still
> remain in pci_release_capabilities()?
> 
> I'm just wondering if it could just remain in pci_init_capabilities().

Yeah, I didn't mean in pci_setup_device() specifically, just somewhere
in the callchain of pci_setup_device().  But you're right, it probably
would make more sense in pci_init_capabilities(), so I *should* have
said pci_scan_single_device() to be a little less specific.

Bjorn

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-05  2:23         ` Bjorn Helgaas
@ 2020-09-11 23:16           ` Sean V Kelley
  2020-09-12  0:50             ` Bjorn Helgaas
  0 siblings, 1 reply; 26+ messages in thread
From: Sean V Kelley @ 2020-09-11 23:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Zhuo, Qiuxu, Jonathan.Cameron, rjw, sathyanarayanan.kuppuswamy,
	Raj, Ashok, Luck, Tony, linux-pci, bhelgaas, linux-kernel

On 4 Sep 2020, at 19:23, Bjorn Helgaas wrote:

> On Fri, Sep 04, 2020 at 10:18:30PM +0000, Kelley, Sean V wrote:
>> Hi Bjorn,
>>
>> Quick question below...
>>
>> On Wed, 2020-09-02 at 14:55 -0700, Sean V Kelley wrote:
>>> Hi Bjorn,
>>>
>>> On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:
>>>> On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
>>>>> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>>>>
>>>>> When an RCEC device signals error(s) to a CPU core, the CPU core
>>>>> needs to walk all the RCiEPs associated with that RCEC to check
>>>>> errors. So add the function pcie_walk_rcec() to walk all RCiEPs
>>>>> associated with the RCEC device.
>>>>>
>>>>> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
>>>>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>>>>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>>> ---
>>>>>  drivers/pci/pci.h       |  4 +++
>>>>>  drivers/pci/pcie/rcec.c | 76
>>>>> +++++++++++++++++++++++++++++++++++++++++
>>>>>  2 files changed, 80 insertions(+)
>>>>>
>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>> index bd25e6047b54..8bd7528d6977 100644
>>>>> --- a/drivers/pci/pci.h
>>>>> +++ b/drivers/pci/pci.h
>>>>> @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct
>>>>> pci_dev
>>>>> *pdev) {}
>>>>>  #ifdef CONFIG_PCIEPORTBUS
>>>>>  void pci_rcec_init(struct pci_dev *dev);
>>>>>  void pci_rcec_exit(struct pci_dev *dev);
>>>>> +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
>>>>> pci_dev
>>>>> *, void *),
>>>>> +		    void *userdata);
>>>>>  #else
>>>>>  static inline void pci_rcec_init(struct pci_dev *dev) {}
>>>>>  static inline void pci_rcec_exit(struct pci_dev *dev) {}
>>>>> +static inline void pcie_walk_rcec(struct pci_dev *rcec, int
>>>>> (*cb)(struct pci_dev *, void *),
>>>>> +				  void *userdata) {}
>>>>>  #endif
>>>>>
>>>>>  #ifdef CONFIG_PCI_ATS
>>>>> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
>>>>> index 519ae086ff41..405f92fcdf7f 100644
>>>>> --- a/drivers/pci/pcie/rcec.c
>>>>> +++ b/drivers/pci/pcie/rcec.c
>>>>> @@ -17,6 +17,82 @@
>>>>>
>>>>>  #include "../pci.h"
>>>>>
>>>>> +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
>>>>> (*cb)(struct pci_dev *, void *),
>>>>> +				 void *userdata, const unsigned long
>>>>> bitmap)
>>>>> +{
>>>>> +	unsigned int devn, fn;
>>>>> +	struct pci_dev *dev;
>>>>> +	int retval;
>>>>> +
>>>>> +	for_each_set_bit(devn, &bitmap, 32) {
>>>>> +		for (fn = 0; fn < 8; fn++) {
>>>>> +			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
>>>>
>>>> Wow, this is a lot of churning to call pci_get_slot() 256 times per
>>>> bus for the "associated bus numbers" case where we pass a bitmap of
>>>> 0xffffffff.  They didn't really make it easy for software when they
>>>> added the next/last bus number thing.
>>>>
>>>> Just thinking out loud here.  What if we could set dev->rcec during
>>>> enumeration, and then use that to build pcie_walk_rcec()?
>>>
>>> I think follow what you are doing.
>>>
>>> As we enumerate an RCEC, use the time to discover RCiEPs and
>>> associate
>>> each RCiEP's dev->rcec. Although BIOS already set the bitmap for 
>>> this
>>> specific RCEC, it's more efficient to simply discover the devices
>>> through the bus walk and verify each one found against the bitmap.
>>>
>>> Further, while we can be certain that an RCiEP found with a matching
>>> device no. in a bitmap for an associated RCEC is correct, we cannot
>>> be
>>> certain that any RCiEP found on another bus range is correct unless
>>> we
>>> verify the bus is within that next/last bus range.
>>>
>>> Finally, that's where find_rcec() callback for rcec_assoc_rciep()
>>> does
>>> double duty by also checking on the "on-a-separate-bus" case 
>>> captured
>>> potentially by find_rcec() during an RCiEP's bus walk.
>>>
>>>
>>>>   bool rcec_assoc_rciep(rcec, rciep)
>>>>   {
>>>>     if (rcec->bus == rciep->bus)
>>>>       return (rcec->bitmap contains rciep->devfn);
>>>>
>>>>     return (rcec->next/last contains rciep->bus);
>>>>   }
>>>>
>>>>   link_rcec(dev, data)
>>>>   {
>>>>     struct pci_dev *rcec = data;
>>>>
>>>>     if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
>>>>       dev->rcec = rcec;
>>>>   }
>>>>
>>>>   find_rcec(dev, data)
>>>>   {
>>>>     struct pci_dev *rciep = data;
>>>>
>>>>     if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
>>>>       rciep->rcec = dev;
>>>>   }
>>>>
>>>>   pci_setup_device
>>>>     ...
>>
>> I just noticed your use of pci_setup_device(). Are you suggesting
>> moving the call to pci_rcec_init() out of pci_init_capabilities() and
>> move it into pci_setup_device()?  If so, would pci_rcec_exit() still
>> remain in pci_release_capabilities()?
>>
>> I'm just wondering if it could just remain in 
>> pci_init_capabilities().
>
> Yeah, I didn't mean in pci_setup_device() specifically, just somewhere
> in the callchain of pci_setup_device().  But you're right, it probably
> would make more sense in pci_init_capabilities(), so I *should* have
> said pci_scan_single_device() to be a little less specific.
>
> Bjorn

I’ve done some experimenting with this approach, and I think there may 
be a problem of just walking the busses during enumeration 
pci_init_capabilities(). One problem is where one has an RCEC on a root 
bus: 6a(00.4) and an RCiEP on another root bus: 6b(00.0).  They will 
never find each other in this approach through a normal pci_bus_walk() 
call using their respective root_bus.

>  +-[0000:6b]-+-00.0
>  |           +-00.1
>  |           +-00.2
>  |           \-00.3
>  +-[0000:6a]-+-00.0
>  |           +-00.1
>  |           +-00.2
>  |           \-00.4

While having a lot of slot calls per bus is unfortunate, unless I’m 
mistaken you would have to walk every peer root_bus with your RCiEP in 
this example until you hit on the right RCEC, unless of course you have 
a bitmap associated RCEC on dev->bus.

Conversely, if you are enumerating the above RCEC at 6a(00.4) and you 
attempt to link_rcec() through calls to pci_walk_bus(), the walk will 
still be limited to 6a and below; never encountering 6b(00.0).  So you 
would then need an additional walk for each of the associated bus 
ranges, excluding the same bus as the RCEC.

pci_init_capabilities()
…
pci_init_rcec() // Cached

if (RCEC)
  Walk the dev->bus for bitmap associated RCiEP
  Walk all associated bus ranges for RCiEP

else if (RCiEP)
  Walk the dev->bus for bitmap associated RCEC
  Walk all peer root_bus for RCEC, confirm if own dev->bus falls within 
discovered RCEC associated ranges

The other problem here is temporal. I’m wondering if we may be trying 
to find associated devices at the pci_init_capabilities() stage prior to 
them being fully enumerated, i.e., RCEC has not been cached but we are 
searching with a future associated RCiEP.  So one could encounter a race 
condition where one is checking on an RCiEP whose associated RCEC has 
not been enumerated yet.

So let’s say one throws out RCiEP entirely and just relies upon RCEC 
to find the associations because one knows that an encountered RCEC (in 
pci_init_capabilities()) has already been cached. In that case you end 
up with the original implementation being done with this patch series…

if (RCEC)
  Walk the dev->bus for bitmap associated RCiEP
  Walk all associated bus ranges for RCiEP

Perhaps I’ve muddled some things here but it doesn’t look like the 
twain will meet unless I cover multiple peer root_bus and even then you 
may have an issue because the devices don’t yet fully exist from the 
perspective of the OS.

Thanks,

Sean

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-11 23:16           ` Sean V Kelley
@ 2020-09-12  0:50             ` Bjorn Helgaas
  2020-09-14 16:55               ` Sean V Kelley
  0 siblings, 1 reply; 26+ messages in thread
From: Bjorn Helgaas @ 2020-09-12  0:50 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: Zhuo, Qiuxu, Jonathan.Cameron, rjw, sathyanarayanan.kuppuswamy,
	Raj, Ashok, Luck, Tony, linux-pci, bhelgaas, linux-kernel

On Fri, Sep 11, 2020 at 04:16:03PM -0700, Sean V Kelley wrote:
> On 4 Sep 2020, at 19:23, Bjorn Helgaas wrote:
> > On Fri, Sep 04, 2020 at 10:18:30PM +0000, Kelley, Sean V wrote:
> > > Hi Bjorn,
> > > 
> > > Quick question below...
> > > 
> > > On Wed, 2020-09-02 at 14:55 -0700, Sean V Kelley wrote:
> > > > Hi Bjorn,
> > > > 
> > > > On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:
> > > > > On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
> > > > > > From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > > > > > 
> > > > > > When an RCEC device signals error(s) to a CPU core, the CPU core
> > > > > > needs to walk all the RCiEPs associated with that RCEC to check
> > > > > > errors. So add the function pcie_walk_rcec() to walk all RCiEPs
> > > > > > associated with the RCEC device.
> > > > > > 
> > > > > > Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
> > > > > > Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
> > > > > > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> > > > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > > > ---
> > > > > >  drivers/pci/pci.h       |  4 +++
> > > > > >  drivers/pci/pcie/rcec.c | 76
> > > > > > +++++++++++++++++++++++++++++++++++++++++
> > > > > >  2 files changed, 80 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > > > > index bd25e6047b54..8bd7528d6977 100644
> > > > > > --- a/drivers/pci/pci.h
> > > > > > +++ b/drivers/pci/pci.h
> > > > > > @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct
> > > > > > pci_dev
> > > > > > *pdev) {}
> > > > > >  #ifdef CONFIG_PCIEPORTBUS
> > > > > >  void pci_rcec_init(struct pci_dev *dev);
> > > > > >  void pci_rcec_exit(struct pci_dev *dev);
> > > > > > +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
> > > > > > pci_dev
> > > > > > *, void *),
> > > > > > +		    void *userdata);
> > > > > >  #else
> > > > > >  static inline void pci_rcec_init(struct pci_dev *dev) {}
> > > > > >  static inline void pci_rcec_exit(struct pci_dev *dev) {}
> > > > > > +static inline void pcie_walk_rcec(struct pci_dev *rcec, int
> > > > > > (*cb)(struct pci_dev *, void *),
> > > > > > +				  void *userdata) {}
> > > > > >  #endif
> > > > > > 
> > > > > >  #ifdef CONFIG_PCI_ATS
> > > > > > diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
> > > > > > index 519ae086ff41..405f92fcdf7f 100644
> > > > > > --- a/drivers/pci/pcie/rcec.c
> > > > > > +++ b/drivers/pci/pcie/rcec.c
> > > > > > @@ -17,6 +17,82 @@
> > > > > > 
> > > > > >  #include "../pci.h"
> > > > > > 
> > > > > > +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
> > > > > > (*cb)(struct pci_dev *, void *),
> > > > > > +				 void *userdata, const unsigned long
> > > > > > bitmap)
> > > > > > +{
> > > > > > +	unsigned int devn, fn;
> > > > > > +	struct pci_dev *dev;
> > > > > > +	int retval;
> > > > > > +
> > > > > > +	for_each_set_bit(devn, &bitmap, 32) {
> > > > > > +		for (fn = 0; fn < 8; fn++) {
> > > > > > +			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
> > > > > 
> > > > > Wow, this is a lot of churning to call pci_get_slot() 256 times per
> > > > > bus for the "associated bus numbers" case where we pass a bitmap of
> > > > > 0xffffffff.  They didn't really make it easy for software when they
> > > > > added the next/last bus number thing.
> > > > > 
> > > > > Just thinking out loud here.  What if we could set dev->rcec during
> > > > > enumeration, and then use that to build pcie_walk_rcec()?
> > > > 
> > > > I think follow what you are doing.
> > > > 
> > > > As we enumerate an RCEC, use the time to discover RCiEPs and
> > > > associate
> > > > each RCiEP's dev->rcec. Although BIOS already set the bitmap for
> > > > this
> > > > specific RCEC, it's more efficient to simply discover the devices
> > > > through the bus walk and verify each one found against the bitmap.
> > > > 
> > > > Further, while we can be certain that an RCiEP found with a matching
> > > > device no. in a bitmap for an associated RCEC is correct, we cannot
> > > > be
> > > > certain that any RCiEP found on another bus range is correct unless
> > > > we
> > > > verify the bus is within that next/last bus range.
> > > > 
> > > > Finally, that's where find_rcec() callback for rcec_assoc_rciep()
> > > > does
> > > > double duty by also checking on the "on-a-separate-bus" case
> > > > captured
> > > > potentially by find_rcec() during an RCiEP's bus walk.
> > > > 
> > > > 
> > > > >   bool rcec_assoc_rciep(rcec, rciep)
> > > > >   {
> > > > >     if (rcec->bus == rciep->bus)
> > > > >       return (rcec->bitmap contains rciep->devfn);
> > > > > 
> > > > >     return (rcec->next/last contains rciep->bus);
> > > > >   }
> > > > > 
> > > > >   link_rcec(dev, data)
> > > > >   {
> > > > >     struct pci_dev *rcec = data;
> > > > > 
> > > > >     if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
> > > > >       dev->rcec = rcec;
> > > > >   }
> > > > > 
> > > > >   find_rcec(dev, data)
> > > > >   {
> > > > >     struct pci_dev *rciep = data;
> > > > > 
> > > > >     if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
> > > > >       rciep->rcec = dev;
> > > > >   }
> > > > > 
> > > > >   pci_setup_device
> > > > >     ...
> > > 
> > > I just noticed your use of pci_setup_device(). Are you suggesting
> > > moving the call to pci_rcec_init() out of pci_init_capabilities() and
> > > move it into pci_setup_device()?  If so, would pci_rcec_exit() still
> > > remain in pci_release_capabilities()?
> > > 
> > > I'm just wondering if it could just remain in
> > > pci_init_capabilities().
> > 
> > Yeah, I didn't mean in pci_setup_device() specifically, just somewhere
> > in the callchain of pci_setup_device().  But you're right, it probably
> > would make more sense in pci_init_capabilities(), so I *should* have
> > said pci_scan_single_device() to be a little less specific.
> 
> I’ve done some experimenting with this approach, and I think there may be a
> problem of just walking the busses during enumeration
> pci_init_capabilities(). One problem is where one has an RCEC on a root bus:
> 6a(00.4) and an RCiEP on another root bus: 6b(00.0).  They will never find
> each other in this approach through a normal pci_bus_walk() call using their
> respective root_bus.
> 
> >  +-[0000:6b]-+-00.0
> >  |           +-00.1
> >  |           +-00.2
> >  |           \-00.3
> >  +-[0000:6a]-+-00.0
> >  |           +-00.1
> >  |           +-00.2
> >  |           \-00.4

Wow, is that even allowed?  

There's no bridge from 0000:6a to 0000:6b, so we will not scan 0000:6b
unless we find a host bridge with _CRS where 6b is the first bus
number below the bridge.  I think that means this would have to be
described in ACPI as two separate root bridges:

  ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 6a])
  ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 6b])

I *guess* maybe it's allowed by the PCIe spec to have an RCEC and
associated RCiEPs on separate root buses?  It seems awfully strange
and not in character for PCIe, but I guess I can't point to language
that prohibits it.

> While having a lot of slot calls per bus is unfortunate, unless I’m mistaken
> you would have to walk every peer root_bus with your RCiEP in this example
> until you hit on the right RCEC, unless of course you have a bitmap
> associated RCEC on dev->bus.

I really despise pci_find_bus(), pci_get_slot(), and related
functions, but maybe they can't be avoided.

I briefly hoped we could forget about connecting them at
enumeration-time and just build a list of RCECs in the host bridge.
Then when we handle an event for an RCiEP, we could search the list to
find the right RCEC (and potentially cache it then).  But I don't
think that would work either, because in the example above, they will
be under different host bridges.  I guess we could maybe have a global
(or maybe per-domain) list of RCECs.

> Conversely, if you are enumerating the above RCEC at 6a(00.4) and you
> attempt to link_rcec() through calls to pci_walk_bus(), the walk will still
> be limited to 6a and below; never encountering 6b(00.0).  So you would then
> need an additional walk for each of the associated bus ranges, excluding the
> same bus as the RCEC.
> 
> pci_init_capabilities()
> …
> pci_init_rcec() // Cached
> 
> if (RCEC)
>  Walk the dev->bus for bitmap associated RCiEP
>  Walk all associated bus ranges for RCiEP
> 
> else if (RCiEP)
>  Walk the dev->bus for bitmap associated RCEC
>  Walk all peer root_bus for RCEC, confirm if own dev->bus falls within
> discovered RCEC associated ranges
> 
> The other problem here is temporal. I’m wondering if we may be trying to
> find associated devices at the pci_init_capabilities() stage prior to them
> being fully enumerated, i.e., RCEC has not been cached but we are searching
> with a future associated RCiEP.  So one could encounter a race condition
> where one is checking on an RCiEP whose associated RCEC has not been
> enumerated yet.

Maybe I'm misunderstanding this problem, but I think my idea would
handle this: If we find the RCEC first, we cache its info and do
nothing else.  When we subsequently discover an RCiEP, we walk the
tree, find the RCEC, and connect them.

If we find an RCiEP first, we do nothing.  When we subsequently
discover an RCEC, we walk the tree, find any associated RCiEPs, and
connect them.

The discovery can happen in different orders based on the
bus/device/function numbers, but it's not really a race because
enumeration is single-threaded.  But if we're talking about two
separate host bridges (PNP0A03 devices), we *should* allow them to be
enumerated in parallel, even though we don't do that today.

I think this RCEC association design is really kind of problematic.

We don't really *need* the association until some RCiEP event (PME,
error, etc) occurs, so it's reasonable to defer making the RCEC
connection until then.  But it seems like things should be designed so
we're guaranteed to enumerate the RCEC before the RCiEPs.  Otherwise,
we don't really know when we can enable RCiEP events.  If we discover
an RCiEP first, enable error reporting, and an error occurs before we
find the RCEC, we're in a bit of a pickle.

> So let’s say one throws out RCiEP entirely and just relies upon RCEC to find
> the associations because one knows that an encountered RCEC (in
> pci_init_capabilities()) has already been cached. In that case you end up
> with the original implementation being done with this patch series…
> 
> if (RCEC)
>  Walk the dev->bus for bitmap associated RCiEP
>  Walk all associated bus ranges for RCiEP
> 
> Perhaps I’ve muddled some things here but it doesn’t look like the twain
> will meet unless I cover multiple peer root_bus and even then you may have
> an issue because the devices don’t yet fully exist from the perspective of
> the OS.
> 
> Thanks,
> 
> Sean

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-12  0:50             ` Bjorn Helgaas
@ 2020-09-14 16:55               ` Sean V Kelley
  2020-09-15 16:09                 ` Sean V Kelley
  2020-09-16 22:49                 ` Bjorn Helgaas
  0 siblings, 2 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-09-14 16:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Zhuo, Qiuxu, Jonathan.Cameron, rjw, sathyanarayanan.kuppuswamy,
	Raj, Ashok, Luck, Tony, linux-pci, bhelgaas, linux-kernel

On 11 Sep 2020, at 17:50, Bjorn Helgaas wrote:

> On Fri, Sep 11, 2020 at 04:16:03PM -0700, Sean V Kelley wrote:
>> On 4 Sep 2020, at 19:23, Bjorn Helgaas wrote:
>>> On Fri, Sep 04, 2020 at 10:18:30PM +0000, Kelley, Sean V wrote:
>>>> Hi Bjorn,
>>>>
>>>> Quick question below...
>>>>
>>>> On Wed, 2020-09-02 at 14:55 -0700, Sean V Kelley wrote:
>>>>> Hi Bjorn,
>>>>>
>>>>> On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:
>>>>>> On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
>>>>>>> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>>>>>>
>>>>>>> When an RCEC device signals error(s) to a CPU core, the CPU core
>>>>>>> needs to walk all the RCiEPs associated with that RCEC to check
>>>>>>> errors. So add the function pcie_walk_rcec() to walk all RCiEPs
>>>>>>> associated with the RCEC device.
>>>>>>>
>>>>>>> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
>>>>>>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>>>>>>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>>>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>>>>> ---
>>>>>>>  drivers/pci/pci.h       |  4 +++
>>>>>>>  drivers/pci/pcie/rcec.c | 76
>>>>>>> +++++++++++++++++++++++++++++++++++++++++
>>>>>>>  2 files changed, 80 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>>>> index bd25e6047b54..8bd7528d6977 100644
>>>>>>> --- a/drivers/pci/pci.h
>>>>>>> +++ b/drivers/pci/pci.h
>>>>>>> @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct
>>>>>>> pci_dev
>>>>>>> *pdev) {}
>>>>>>>  #ifdef CONFIG_PCIEPORTBUS
>>>>>>>  void pci_rcec_init(struct pci_dev *dev);
>>>>>>>  void pci_rcec_exit(struct pci_dev *dev);
>>>>>>> +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
>>>>>>> pci_dev
>>>>>>> *, void *),
>>>>>>> +		    void *userdata);
>>>>>>>  #else
>>>>>>>  static inline void pci_rcec_init(struct pci_dev *dev) {}
>>>>>>>  static inline void pci_rcec_exit(struct pci_dev *dev) {}
>>>>>>> +static inline void pcie_walk_rcec(struct pci_dev *rcec, int
>>>>>>> (*cb)(struct pci_dev *, void *),
>>>>>>> +				  void *userdata) {}
>>>>>>>  #endif
>>>>>>>
>>>>>>>  #ifdef CONFIG_PCI_ATS
>>>>>>> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
>>>>>>> index 519ae086ff41..405f92fcdf7f 100644
>>>>>>> --- a/drivers/pci/pcie/rcec.c
>>>>>>> +++ b/drivers/pci/pcie/rcec.c
>>>>>>> @@ -17,6 +17,82 @@
>>>>>>>
>>>>>>>  #include "../pci.h"
>>>>>>>
>>>>>>> +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
>>>>>>> (*cb)(struct pci_dev *, void *),
>>>>>>> +				 void *userdata, const unsigned long
>>>>>>> bitmap)
>>>>>>> +{
>>>>>>> +	unsigned int devn, fn;
>>>>>>> +	struct pci_dev *dev;
>>>>>>> +	int retval;
>>>>>>> +
>>>>>>> +	for_each_set_bit(devn, &bitmap, 32) {
>>>>>>> +		for (fn = 0; fn < 8; fn++) {
>>>>>>> +			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
>>>>>>
>>>>>> Wow, this is a lot of churning to call pci_get_slot() 256 times 
>>>>>> per
>>>>>> bus for the "associated bus numbers" case where we pass a bitmap 
>>>>>> of
>>>>>> 0xffffffff.  They didn't really make it easy for software when 
>>>>>> they
>>>>>> added the next/last bus number thing.
>>>>>>
>>>>>> Just thinking out loud here.  What if we could set dev->rcec 
>>>>>> during
>>>>>> enumeration, and then use that to build pcie_walk_rcec()?
>>>>>
>>>>> I think follow what you are doing.
>>>>>
>>>>> As we enumerate an RCEC, use the time to discover RCiEPs and
>>>>> associate
>>>>> each RCiEP's dev->rcec. Although BIOS already set the bitmap for
>>>>> this
>>>>> specific RCEC, it's more efficient to simply discover the devices
>>>>> through the bus walk and verify each one found against the bitmap.
>>>>>
>>>>> Further, while we can be certain that an RCiEP found with a 
>>>>> matching
>>>>> device no. in a bitmap for an associated RCEC is correct, we 
>>>>> cannot
>>>>> be
>>>>> certain that any RCiEP found on another bus range is correct 
>>>>> unless
>>>>> we
>>>>> verify the bus is within that next/last bus range.
>>>>>
>>>>> Finally, that's where find_rcec() callback for rcec_assoc_rciep()
>>>>> does
>>>>> double duty by also checking on the "on-a-separate-bus" case
>>>>> captured
>>>>> potentially by find_rcec() during an RCiEP's bus walk.
>>>>>
>>>>>
>>>>>>   bool rcec_assoc_rciep(rcec, rciep)
>>>>>>   {
>>>>>>     if (rcec->bus == rciep->bus)
>>>>>>       return (rcec->bitmap contains rciep->devfn);
>>>>>>
>>>>>>     return (rcec->next/last contains rciep->bus);
>>>>>>   }
>>>>>>
>>>>>>   link_rcec(dev, data)
>>>>>>   {
>>>>>>     struct pci_dev *rcec = data;
>>>>>>
>>>>>>     if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
>>>>>>       dev->rcec = rcec;
>>>>>>   }
>>>>>>
>>>>>>   find_rcec(dev, data)
>>>>>>   {
>>>>>>     struct pci_dev *rciep = data;
>>>>>>
>>>>>>     if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
>>>>>>       rciep->rcec = dev;
>>>>>>   }
>>>>>>
>>>>>>   pci_setup_device
>>>>>>     ...
>>>>
>>>> I just noticed your use of pci_setup_device(). Are you suggesting
>>>> moving the call to pci_rcec_init() out of pci_init_capabilities() 
>>>> and
>>>> move it into pci_setup_device()?  If so, would pci_rcec_exit() 
>>>> still
>>>> remain in pci_release_capabilities()?
>>>>
>>>> I'm just wondering if it could just remain in
>>>> pci_init_capabilities().
>>>
>>> Yeah, I didn't mean in pci_setup_device() specifically, just 
>>> somewhere
>>> in the callchain of pci_setup_device().  But you're right, it 
>>> probably
>>> would make more sense in pci_init_capabilities(), so I *should* have
>>> said pci_scan_single_device() to be a little less specific.
>>
>> I’ve done some experimenting with this approach, and I think there 
>> may be a
>> problem of just walking the busses during enumeration
>> pci_init_capabilities(). One problem is where one has an RCEC on a 
>> root bus:
>> 6a(00.4) and an RCiEP on another root bus: 6b(00.0).  They will never 
>> find
>> each other in this approach through a normal pci_bus_walk() call 
>> using their
>> respective root_bus.
>>
>>>  +-[0000:6b]-+-00.0
>>>  |           +-00.1
>>>  |           +-00.2
>>>  |           \-00.3
>>>  +-[0000:6a]-+-00.0
>>>  |           +-00.1
>>>  |           +-00.2
>>>  |           \-00.4
>
> Wow, is that even allowed?
>
> There's no bridge from 0000:6a to 0000:6b, so we will not scan 0000:6b
> unless we find a host bridge with _CRS where 6b is the first bus
> number below the bridge.  I think that means this would have to be
> described in ACPI as two separate root bridges:
>
>   ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 6a])
>   ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 6b])

Otherwise, the RCEC Associated Endpoint Extended Capabilities would have 
to have explicitly mentioned a bridge?

>
> I *guess* maybe it's allowed by the PCIe spec to have an RCEC and
> associated RCiEPs on separate root buses?  It seems awfully strange
> and not in character for PCIe, but I guess I can't point to language
> that prohibits it.

Yes, it should be possible.

>
>> While having a lot of slot calls per bus is unfortunate, unless I’m 
>> mistaken
>> you would have to walk every peer root_bus with your RCiEP in this 
>> example
>> until you hit on the right RCEC, unless of course you have a bitmap
>> associated RCEC on dev->bus.
>
> I really despise pci_find_bus(), pci_get_slot(), and related
> functions, but maybe they can't be avoided.
>
> I briefly hoped we could forget about connecting them at
> enumeration-time and just build a list of RCECs in the host bridge.
> Then when we handle an event for an RCiEP, we could search the list to
> find the right RCEC (and potentially cache it then).  But I don't
> think that would work either, because in the example above, they will
> be under different host bridges.  I guess we could maybe have a global
> (or maybe per-domain) list of RCECs.

Right, we could just have a list and only need to search the list when 
we need to handle an event.

>
>> Conversely, if you are enumerating the above RCEC at 6a(00.4) and you
>> attempt to link_rcec() through calls to pci_walk_bus(), the walk will 
>> still
>> be limited to 6a and below; never encountering 6b(00.0).  So you 
>> would then
>> need an additional walk for each of the associated bus ranges, 
>> excluding the
>> same bus as the RCEC.
>>
>> pci_init_capabilities()
>> …
>> pci_init_rcec() // Cached
>>
>> if (RCEC)
>>  Walk the dev->bus for bitmap associated RCiEP
>>  Walk all associated bus ranges for RCiEP
>>
>> else if (RCiEP)
>>  Walk the dev->bus for bitmap associated RCEC
>>  Walk all peer root_bus for RCEC, confirm if own dev->bus falls 
>> within
>> discovered RCEC associated ranges
>>
>> The other problem here is temporal. I’m wondering if we may be 
>> trying to
>> find associated devices at the pci_init_capabilities() stage prior to 
>> them
>> being fully enumerated, i.e., RCEC has not been cached but we are 
>> searching
>> with a future associated RCiEP.  So one could encounter a race 
>> condition
>> where one is checking on an RCiEP whose associated RCEC has not been
>> enumerated yet.
>
> Maybe I'm misunderstanding this problem, but I think my idea would
> handle this: If we find the RCEC first, we cache its info and do
> nothing else.  When we subsequently discover an RCiEP, we walk the
> tree, find the RCEC, and connect them.
>
> If we find an RCiEP first, we do nothing.  When we subsequently
> discover an RCEC, we walk the tree, find any associated RCiEPs, and
> connect them.


Your approach makes sense. In retrospect, I don’t think there can be a 
race condition here because of the fallback from one RCEC to the other 
when they enumerate. You have two chances of finding the relationship.

>
> The discovery can happen in different orders based on the
> bus/device/function numbers, but it's not really a race because
> enumeration is single-threaded.  But if we're talking about two
> separate host bridges (PNP0A03 devices), we *should* allow them to be
> enumerated in parallel, even though we don't do that today.
>
> I think this RCEC association design is really kind of problematic.
>
> We don't really *need* the association until some RCiEP event (PME,
> error, etc) occurs, so it's reasonable to defer making the RCEC
> connection until then.  But it seems like things should be designed so
> we're guaranteed to enumerate the RCEC before the RCiEPs.  Otherwise,
> we don't really know when we can enable RCiEP events.  If we discover
> an RCiEP first, enable error reporting, and an error occurs before we
> find the RCEC, we're in a bit of a pickle.

So we could have a global list (RCiEP or RCEC) in which we wait to 
identify the associations only when we need to respond to events. Or in 
your original suggestion we could walk the RCEC bus ranges in addition 
to its root_bus.

i.e., We bus walk an enumerating RCEC with link_rcec() callback for not 
only the root_bus but each bus number in the associated ranges if the 
extended cap exists.

RCiEPs will simply continue to invoke their callback via find_rcec() and 
only check on their own bus for the encountered RCEC’s associated 
bitmap case.


Sean

>
>> So let’s say one throws out RCiEP entirely and just relies upon 
>> RCEC to find
>> the associations because one knows that an encountered RCEC (in
>> pci_init_capabilities()) has already been cached. In that case you 
>> end up
>> with the original implementation being done with this patch series…
>>
>> if (RCEC)
>>  Walk the dev->bus for bitmap associated RCiEP
>>  Walk all associated bus ranges for RCiEP
>>
>> Perhaps I’ve muddled some things here but it doesn’t look like 
>> the twain
>> will meet unless I cover multiple peer root_bus and even then you may 
>> have
>> an issue because the devices don’t yet fully exist from the 
>> perspective of
>> the OS.
>>
>> Thanks,
>>
>> Sean

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-14 16:55               ` Sean V Kelley
@ 2020-09-15 16:09                 ` Sean V Kelley
  2020-09-15 17:47                   ` Sean V Kelley
  2020-09-16 23:06                   ` Bjorn Helgaas
  2020-09-16 22:49                 ` Bjorn Helgaas
  1 sibling, 2 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-09-15 16:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Zhuo, Qiuxu, Jonathan.Cameron, rjw, sathyanarayanan.kuppuswamy,
	Raj, Ashok, Luck, Tony, linux-pci, bhelgaas, linux-kernel

On 14 Sep 2020, at 9:55, Sean V Kelley wrote:

> On 11 Sep 2020, at 17:50, Bjorn Helgaas wrote:
>
>> On Fri, Sep 11, 2020 at 04:16:03PM -0700, Sean V Kelley wrote:
>>> On 4 Sep 2020, at 19:23, Bjorn Helgaas wrote:
>>>> On Fri, Sep 04, 2020 at 10:18:30PM +0000, Kelley, Sean V wrote:
>>>>> Hi Bjorn,
>>>>>
>>>>> Quick question below...
>>>>>
>>>>> On Wed, 2020-09-02 at 14:55 -0700, Sean V Kelley wrote:
>>>>>> Hi Bjorn,
>>>>>>
>>>>>> On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:
>>>>>>> On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
>>>>>>>> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>>>>>>>
>>>>>>>> When an RCEC device signals error(s) to a CPU core, the CPU 
>>>>>>>> core
>>>>>>>> needs to walk all the RCiEPs associated with that RCEC to check
>>>>>>>> errors. So add the function pcie_walk_rcec() to walk all RCiEPs
>>>>>>>> associated with the RCEC device.
>>>>>>>>
>>>>>>>> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
>>>>>>>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>>>>>>>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>>>>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>>>>>> ---
>>>>>>>>  drivers/pci/pci.h       |  4 +++
>>>>>>>>  drivers/pci/pcie/rcec.c | 76
>>>>>>>> +++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  2 files changed, 80 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>>>>> index bd25e6047b54..8bd7528d6977 100644
>>>>>>>> --- a/drivers/pci/pci.h
>>>>>>>> +++ b/drivers/pci/pci.h
>>>>>>>> @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct
>>>>>>>> pci_dev
>>>>>>>> *pdev) {}
>>>>>>>>  #ifdef CONFIG_PCIEPORTBUS
>>>>>>>>  void pci_rcec_init(struct pci_dev *dev);
>>>>>>>>  void pci_rcec_exit(struct pci_dev *dev);
>>>>>>>> +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
>>>>>>>> pci_dev
>>>>>>>> *, void *),
>>>>>>>> +		    void *userdata);
>>>>>>>>  #else
>>>>>>>>  static inline void pci_rcec_init(struct pci_dev *dev) {}
>>>>>>>>  static inline void pci_rcec_exit(struct pci_dev *dev) {}
>>>>>>>> +static inline void pcie_walk_rcec(struct pci_dev *rcec, int
>>>>>>>> (*cb)(struct pci_dev *, void *),
>>>>>>>> +				  void *userdata) {}
>>>>>>>>  #endif
>>>>>>>>
>>>>>>>>  #ifdef CONFIG_PCI_ATS
>>>>>>>> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
>>>>>>>> index 519ae086ff41..405f92fcdf7f 100644
>>>>>>>> --- a/drivers/pci/pcie/rcec.c
>>>>>>>> +++ b/drivers/pci/pcie/rcec.c
>>>>>>>> @@ -17,6 +17,82 @@
>>>>>>>>
>>>>>>>>  #include "../pci.h"
>>>>>>>>
>>>>>>>> +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
>>>>>>>> (*cb)(struct pci_dev *, void *),
>>>>>>>> +				 void *userdata, const unsigned long
>>>>>>>> bitmap)
>>>>>>>> +{
>>>>>>>> +	unsigned int devn, fn;
>>>>>>>> +	struct pci_dev *dev;
>>>>>>>> +	int retval;
>>>>>>>> +
>>>>>>>> +	for_each_set_bit(devn, &bitmap, 32) {
>>>>>>>> +		for (fn = 0; fn < 8; fn++) {
>>>>>>>> +			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
>>>>>>>
>>>>>>> Wow, this is a lot of churning to call pci_get_slot() 256 times 
>>>>>>> per
>>>>>>> bus for the "associated bus numbers" case where we pass a bitmap 
>>>>>>> of
>>>>>>> 0xffffffff.  They didn't really make it easy for software when 
>>>>>>> they
>>>>>>> added the next/last bus number thing.
>>>>>>>
>>>>>>> Just thinking out loud here.  What if we could set dev->rcec 
>>>>>>> during
>>>>>>> enumeration, and then use that to build pcie_walk_rcec()?
>>>>>>
>>>>>> I think follow what you are doing.
>>>>>>
>>>>>> As we enumerate an RCEC, use the time to discover RCiEPs and
>>>>>> associate
>>>>>> each RCiEP's dev->rcec. Although BIOS already set the bitmap for
>>>>>> this
>>>>>> specific RCEC, it's more efficient to simply discover the devices
>>>>>> through the bus walk and verify each one found against the 
>>>>>> bitmap.
>>>>>>
>>>>>> Further, while we can be certain that an RCiEP found with a 
>>>>>> matching
>>>>>> device no. in a bitmap for an associated RCEC is correct, we 
>>>>>> cannot
>>>>>> be
>>>>>> certain that any RCiEP found on another bus range is correct 
>>>>>> unless
>>>>>> we
>>>>>> verify the bus is within that next/last bus range.
>>>>>>
>>>>>> Finally, that's where find_rcec() callback for rcec_assoc_rciep()
>>>>>> does
>>>>>> double duty by also checking on the "on-a-separate-bus" case
>>>>>> captured
>>>>>> potentially by find_rcec() during an RCiEP's bus walk.
>>>>>>
>>>>>>
>>>>>>>   bool rcec_assoc_rciep(rcec, rciep)
>>>>>>>   {
>>>>>>>     if (rcec->bus == rciep->bus)
>>>>>>>       return (rcec->bitmap contains rciep->devfn);
>>>>>>>
>>>>>>>     return (rcec->next/last contains rciep->bus);
>>>>>>>   }
>>>>>>>
>>>>>>>   link_rcec(dev, data)
>>>>>>>   {
>>>>>>>     struct pci_dev *rcec = data;
>>>>>>>
>>>>>>>     if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
>>>>>>>       dev->rcec = rcec;
>>>>>>>   }
>>>>>>>
>>>>>>>   find_rcec(dev, data)
>>>>>>>   {
>>>>>>>     struct pci_dev *rciep = data;
>>>>>>>
>>>>>>>     if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
>>>>>>>       rciep->rcec = dev;
>>>>>>>   }
>>>>>>>
>>>>>>>   pci_setup_device
>>>>>>>     ...
>>>>>
>>>>> I just noticed your use of pci_setup_device(). Are you suggesting
>>>>> moving the call to pci_rcec_init() out of pci_init_capabilities() 
>>>>> and
>>>>> move it into pci_setup_device()?  If so, would pci_rcec_exit() 
>>>>> still
>>>>> remain in pci_release_capabilities()?
>>>>>
>>>>> I'm just wondering if it could just remain in
>>>>> pci_init_capabilities().
>>>>
>>>> Yeah, I didn't mean in pci_setup_device() specifically, just 
>>>> somewhere
>>>> in the callchain of pci_setup_device().  But you're right, it 
>>>> probably
>>>> would make more sense in pci_init_capabilities(), so I *should* 
>>>> have
>>>> said pci_scan_single_device() to be a little less specific.
>>>
>>> I’ve done some experimenting with this approach, and I think there 
>>> may be a
>>> problem of just walking the busses during enumeration
>>> pci_init_capabilities(). One problem is where one has an RCEC on a 
>>> root bus:
>>> 6a(00.4) and an RCiEP on another root bus: 6b(00.0).  They will 
>>> never find
>>> each other in this approach through a normal pci_bus_walk() call 
>>> using their
>>> respective root_bus.
>>>
>>>>  +-[0000:6b]-+-00.0
>>>>  |           +-00.1
>>>>  |           +-00.2
>>>>  |           \-00.3
>>>>  +-[0000:6a]-+-00.0
>>>>  |           +-00.1
>>>>  |           +-00.2
>>>>  |           \-00.4
>>
>> Wow, is that even allowed?
>>
>> There's no bridge from 0000:6a to 0000:6b, so we will not scan 
>> 0000:6b
>> unless we find a host bridge with _CRS where 6b is the first bus
>> number below the bridge.  I think that means this would have to be
>> described in ACPI as two separate root bridges:
>>
>>   ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 6a])
>>   ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 6b])
>
> Otherwise, the RCEC Associated Endpoint Extended Capabilities would 
> have to have explicitly mentioned a bridge?
>
>>
>> I *guess* maybe it's allowed by the PCIe spec to have an RCEC and
>> associated RCiEPs on separate root buses?  It seems awfully strange
>> and not in character for PCIe, but I guess I can't point to language
>> that prohibits it.
>
> Yes, it should be possible.
>
>>
>>> While having a lot of slot calls per bus is unfortunate, unless 
>>> I’m mistaken
>>> you would have to walk every peer root_bus with your RCiEP in this 
>>> example
>>> until you hit on the right RCEC, unless of course you have a bitmap
>>> associated RCEC on dev->bus.
>>
>> I really despise pci_find_bus(), pci_get_slot(), and related
>> functions, but maybe they can't be avoided.
>>
>> I briefly hoped we could forget about connecting them at
>> enumeration-time and just build a list of RCECs in the host bridge.
>> Then when we handle an event for an RCiEP, we could search the list 
>> to
>> find the right RCEC (and potentially cache it then).  But I don't
>> think that would work either, because in the example above, they will
>> be under different host bridges.  I guess we could maybe have a 
>> global
>> (or maybe per-domain) list of RCECs.
>
> Right, we could just have a list and only need to search the list when 
> we need to handle an event.
>
>>
>>> Conversely, if you are enumerating the above RCEC at 6a(00.4) and 
>>> you
>>> attempt to link_rcec() through calls to pci_walk_bus(), the walk 
>>> will still
>>> be limited to 6a and below; never encountering 6b(00.0).  So you 
>>> would then
>>> need an additional walk for each of the associated bus ranges, 
>>> excluding the
>>> same bus as the RCEC.
>>>
>>> pci_init_capabilities()
>>> …
>>> pci_init_rcec() // Cached
>>>
>>> if (RCEC)
>>>  Walk the dev->bus for bitmap associated RCiEP
>>>  Walk all associated bus ranges for RCiEP
>>>
>>> else if (RCiEP)
>>>  Walk the dev->bus for bitmap associated RCEC
>>>  Walk all peer root_bus for RCEC, confirm if own dev->bus falls 
>>> within
>>> discovered RCEC associated ranges
>>>
>>> The other problem here is temporal. I’m wondering if we may be 
>>> trying to
>>> find associated devices at the pci_init_capabilities() stage prior 
>>> to them
>>> being fully enumerated, i.e., RCEC has not been cached but we are 
>>> searching
>>> with a future associated RCiEP.  So one could encounter a race 
>>> condition
>>> where one is checking on an RCiEP whose associated RCEC has not been
>>> enumerated yet.
>>
>> Maybe I'm misunderstanding this problem, but I think my idea would
>> handle this: If we find the RCEC first, we cache its info and do
>> nothing else.  When we subsequently discover an RCiEP, we walk the
>> tree, find the RCEC, and connect them.
>>
>> If we find an RCiEP first, we do nothing.  When we subsequently
>> discover an RCEC, we walk the tree, find any associated RCiEPs, and
>> connect them.
>
>
> Your approach makes sense. In retrospect, I don’t think there can be 
> a race condition here because of the fallback from one RCEC to the 
> other when they enumerate. You have two chances of finding the 
> relationship.
>
>>
>> The discovery can happen in different orders based on the
>> bus/device/function numbers, but it's not really a race because
>> enumeration is single-threaded.  But if we're talking about two
>> separate host bridges (PNP0A03 devices), we *should* allow them to be
>> enumerated in parallel, even though we don't do that today.
>>
>> I think this RCEC association design is really kind of problematic.
>>
>> We don't really *need* the association until some RCiEP event (PME,
>> error, etc) occurs, so it's reasonable to defer making the RCEC
>> connection until then.  But it seems like things should be designed 
>> so
>> we're guaranteed to enumerate the RCEC before the RCiEPs.  Otherwise,
>> we don't really know when we can enable RCiEP events.  If we discover
>> an RCiEP first, enable error reporting, and an error occurs before we
>> find the RCEC, we're in a bit of a pickle.
>
> So we could have a global list (RCiEP or RCEC) in which we wait to 
> identify the associations only when we need to respond to events. Or 
> in your original suggestion we could walk the RCEC bus ranges in 
> addition to its root_bus.
>
> i.e., We bus walk an enumerating RCEC with link_rcec() callback for 
> not only the root_bus but each bus number in the associated ranges if 
> the extended cap exists.
>
> RCiEPs will simply continue to invoke their callback via find_rcec() 
> and only check on their own bus for the encountered RCEC’s 
> associated bitmap case.

Walking the bus with an RCEC as it is probed in the portdrv_pci.c can be 
done with both its own bus (bitmap) and with supported associated bus 
ranges.  In that walk I’m able to find all the associated endpoints 
via both bitmap on own bus and the bus ranges. No pci_get_slot() is 
needed as per your original suggestion.

The suggsted approach to the rcec_helper() seems to imply that we either 
walk it again or have cached all the associated RCiEP.  When we do the 
walk above, we are merely, finding the RCiEPs and linking them to the 
RCEC’s structure. There is no caching done of a list of RCiEPs per 
RCEC.

i.e., in aer.c : set_downstream_devices_error_reporting() wants to 
enable each associated RCiEP via pcie_walk_rcec().

Looking into changes to the pice_walk_rcec()

Sean

>
>
> Sean
>
>>
>>> So let’s say one throws out RCiEP entirely and just relies upon 
>>> RCEC to find
>>> the associations because one knows that an encountered RCEC (in
>>> pci_init_capabilities()) has already been cached. In that case you 
>>> end up
>>> with the original implementation being done with this patch 
>>> series…
>>>
>>> if (RCEC)
>>>  Walk the dev->bus for bitmap associated RCiEP
>>>  Walk all associated bus ranges for RCiEP
>>>
>>> Perhaps I’ve muddled some things here but it doesn’t look like 
>>> the twain
>>> will meet unless I cover multiple peer root_bus and even then you 
>>> may have
>>> an issue because the devices don’t yet fully exist from the 
>>> perspective of
>>> the OS.
>>>
>>> Thanks,
>>>
>>> Sean

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-15 16:09                 ` Sean V Kelley
@ 2020-09-15 17:47                   ` Sean V Kelley
  2020-09-16 23:06                   ` Bjorn Helgaas
  1 sibling, 0 replies; 26+ messages in thread
From: Sean V Kelley @ 2020-09-15 17:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Zhuo, Qiuxu, Jonathan.Cameron, rjw, sathyanarayanan.kuppuswamy,
	Raj, Ashok, Luck, Tony, linux-pci, bhelgaas, linux-kernel

On 15 Sep 2020, at 9:09, Sean V Kelley wrote:

> On 14 Sep 2020, at 9:55, Sean V Kelley wrote:
>
>> On 11 Sep 2020, at 17:50, Bjorn Helgaas wrote:
>>
>>> On Fri, Sep 11, 2020 at 04:16:03PM -0700, Sean V Kelley wrote:
>>>> On 4 Sep 2020, at 19:23, Bjorn Helgaas wrote:
>>>>> On Fri, Sep 04, 2020 at 10:18:30PM +0000, Kelley, Sean V wrote:
>>>>>> Hi Bjorn,
>>>>>>
>>>>>> Quick question below...
>>>>>>
>>>>>> On Wed, 2020-09-02 at 14:55 -0700, Sean V Kelley wrote:
>>>>>>> Hi Bjorn,
>>>>>>>
>>>>>>> On Wed, 2020-09-02 at 14:00 -0500, Bjorn Helgaas wrote:
>>>>>>>> On Wed, Aug 12, 2020 at 09:46:53AM -0700, Sean V Kelley wrote:
>>>>>>>>> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>>>>>>>>
>>>>>>>>> When an RCEC device signals error(s) to a CPU core, the CPU 
>>>>>>>>> core
>>>>>>>>> needs to walk all the RCiEPs associated with that RCEC to 
>>>>>>>>> check
>>>>>>>>> errors. So add the function pcie_walk_rcec() to walk all 
>>>>>>>>> RCiEPs
>>>>>>>>> associated with the RCEC device.
>>>>>>>>>
>>>>>>>>> Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
>>>>>>>>> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
>>>>>>>>> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>>>>>>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>>>>>>> ---
>>>>>>>>>  drivers/pci/pci.h       |  4 +++
>>>>>>>>>  drivers/pci/pcie/rcec.c | 76
>>>>>>>>> +++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>  2 files changed, 80 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
>>>>>>>>> index bd25e6047b54..8bd7528d6977 100644
>>>>>>>>> --- a/drivers/pci/pci.h
>>>>>>>>> +++ b/drivers/pci/pci.h
>>>>>>>>> @@ -473,9 +473,13 @@ static inline void pci_dpc_init(struct
>>>>>>>>> pci_dev
>>>>>>>>> *pdev) {}
>>>>>>>>>  #ifdef CONFIG_PCIEPORTBUS
>>>>>>>>>  void pci_rcec_init(struct pci_dev *dev);
>>>>>>>>>  void pci_rcec_exit(struct pci_dev *dev);
>>>>>>>>> +void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct
>>>>>>>>> pci_dev
>>>>>>>>> *, void *),
>>>>>>>>> +		    void *userdata);
>>>>>>>>>  #else
>>>>>>>>>  static inline void pci_rcec_init(struct pci_dev *dev) {}
>>>>>>>>>  static inline void pci_rcec_exit(struct pci_dev *dev) {}
>>>>>>>>> +static inline void pcie_walk_rcec(struct pci_dev *rcec, int
>>>>>>>>> (*cb)(struct pci_dev *, void *),
>>>>>>>>> +				  void *userdata) {}
>>>>>>>>>  #endif
>>>>>>>>>
>>>>>>>>>  #ifdef CONFIG_PCI_ATS
>>>>>>>>> diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
>>>>>>>>> index 519ae086ff41..405f92fcdf7f 100644
>>>>>>>>> --- a/drivers/pci/pcie/rcec.c
>>>>>>>>> +++ b/drivers/pci/pcie/rcec.c
>>>>>>>>> @@ -17,6 +17,82 @@
>>>>>>>>>
>>>>>>>>>  #include "../pci.h"
>>>>>>>>>
>>>>>>>>> +static int pcie_walk_rciep_devfn(struct pci_bus *bus, int
>>>>>>>>> (*cb)(struct pci_dev *, void *),
>>>>>>>>> +				 void *userdata, const unsigned long
>>>>>>>>> bitmap)
>>>>>>>>> +{
>>>>>>>>> +	unsigned int devn, fn;
>>>>>>>>> +	struct pci_dev *dev;
>>>>>>>>> +	int retval;
>>>>>>>>> +
>>>>>>>>> +	for_each_set_bit(devn, &bitmap, 32) {
>>>>>>>>> +		for (fn = 0; fn < 8; fn++) {
>>>>>>>>> +			dev = pci_get_slot(bus, PCI_DEVFN(devn, fn));
>>>>>>>>
>>>>>>>> Wow, this is a lot of churning to call pci_get_slot() 256 times 
>>>>>>>> per
>>>>>>>> bus for the "associated bus numbers" case where we pass a 
>>>>>>>> bitmap of
>>>>>>>> 0xffffffff.  They didn't really make it easy for software when 
>>>>>>>> they
>>>>>>>> added the next/last bus number thing.
>>>>>>>>
>>>>>>>> Just thinking out loud here.  What if we could set dev->rcec 
>>>>>>>> during
>>>>>>>> enumeration, and then use that to build pcie_walk_rcec()?
>>>>>>>
>>>>>>> I think follow what you are doing.
>>>>>>>
>>>>>>> As we enumerate an RCEC, use the time to discover RCiEPs and
>>>>>>> associate
>>>>>>> each RCiEP's dev->rcec. Although BIOS already set the bitmap for
>>>>>>> this
>>>>>>> specific RCEC, it's more efficient to simply discover the 
>>>>>>> devices
>>>>>>> through the bus walk and verify each one found against the 
>>>>>>> bitmap.
>>>>>>>
>>>>>>> Further, while we can be certain that an RCiEP found with a 
>>>>>>> matching
>>>>>>> device no. in a bitmap for an associated RCEC is correct, we 
>>>>>>> cannot
>>>>>>> be
>>>>>>> certain that any RCiEP found on another bus range is correct 
>>>>>>> unless
>>>>>>> we
>>>>>>> verify the bus is within that next/last bus range.
>>>>>>>
>>>>>>> Finally, that's where find_rcec() callback for 
>>>>>>> rcec_assoc_rciep()
>>>>>>> does
>>>>>>> double duty by also checking on the "on-a-separate-bus" case
>>>>>>> captured
>>>>>>> potentially by find_rcec() during an RCiEP's bus walk.
>>>>>>>
>>>>>>>
>>>>>>>>   bool rcec_assoc_rciep(rcec, rciep)
>>>>>>>>   {
>>>>>>>>     if (rcec->bus == rciep->bus)
>>>>>>>>       return (rcec->bitmap contains rciep->devfn);
>>>>>>>>
>>>>>>>>     return (rcec->next/last contains rciep->bus);
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   link_rcec(dev, data)
>>>>>>>>   {
>>>>>>>>     struct pci_dev *rcec = data;
>>>>>>>>
>>>>>>>>     if ((dev is RCiEP) && rcec_assoc_rciep(rcec, dev))
>>>>>>>>       dev->rcec = rcec;
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   find_rcec(dev, data)
>>>>>>>>   {
>>>>>>>>     struct pci_dev *rciep = data;
>>>>>>>>
>>>>>>>>     if ((dev is RCEC) && rcec_assoc_rciep(dev, rciep))
>>>>>>>>       rciep->rcec = dev;
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   pci_setup_device
>>>>>>>>     ...
>>>>>>
>>>>>> I just noticed your use of pci_setup_device(). Are you suggesting
>>>>>> moving the call to pci_rcec_init() out of pci_init_capabilities() 
>>>>>> and
>>>>>> move it into pci_setup_device()?  If so, would pci_rcec_exit() 
>>>>>> still
>>>>>> remain in pci_release_capabilities()?
>>>>>>
>>>>>> I'm just wondering if it could just remain in
>>>>>> pci_init_capabilities().
>>>>>
>>>>> Yeah, I didn't mean in pci_setup_device() specifically, just 
>>>>> somewhere
>>>>> in the callchain of pci_setup_device().  But you're right, it 
>>>>> probably
>>>>> would make more sense in pci_init_capabilities(), so I *should* 
>>>>> have
>>>>> said pci_scan_single_device() to be a little less specific.
>>>>
>>>> I’ve done some experimenting with this approach, and I think 
>>>> there may be a
>>>> problem of just walking the busses during enumeration
>>>> pci_init_capabilities(). One problem is where one has an RCEC on a 
>>>> root bus:
>>>> 6a(00.4) and an RCiEP on another root bus: 6b(00.0).  They will 
>>>> never find
>>>> each other in this approach through a normal pci_bus_walk() call 
>>>> using their
>>>> respective root_bus.
>>>>
>>>>>  +-[0000:6b]-+-00.0
>>>>>  |           +-00.1
>>>>>  |           +-00.2
>>>>>  |           \-00.3
>>>>>  +-[0000:6a]-+-00.0
>>>>>  |           +-00.1
>>>>>  |           +-00.2
>>>>>  |           \-00.4
>>>
>>> Wow, is that even allowed?
>>>
>>> There's no bridge from 0000:6a to 0000:6b, so we will not scan 
>>> 0000:6b
>>> unless we find a host bridge with _CRS where 6b is the first bus
>>> number below the bridge.  I think that means this would have to be
>>> described in ACPI as two separate root bridges:
>>>
>>>   ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 6a])
>>>   ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 6b])
>>
>> Otherwise, the RCEC Associated Endpoint Extended Capabilities would 
>> have to have explicitly mentioned a bridge?
>>
>>>
>>> I *guess* maybe it's allowed by the PCIe spec to have an RCEC and
>>> associated RCiEPs on separate root buses?  It seems awfully strange
>>> and not in character for PCIe, but I guess I can't point to language
>>> that prohibits it.
>>
>> Yes, it should be possible.
>>
>>>
>>>> While having a lot of slot calls per bus is unfortunate, unless 
>>>> I’m mistaken
>>>> you would have to walk every peer root_bus with your RCiEP in this 
>>>> example
>>>> until you hit on the right RCEC, unless of course you have a bitmap
>>>> associated RCEC on dev->bus.
>>>
>>> I really despise pci_find_bus(), pci_get_slot(), and related
>>> functions, but maybe they can't be avoided.
>>>
>>> I briefly hoped we could forget about connecting them at
>>> enumeration-time and just build a list of RCECs in the host bridge.
>>> Then when we handle an event for an RCiEP, we could search the list 
>>> to
>>> find the right RCEC (and potentially cache it then).  But I don't
>>> think that would work either, because in the example above, they 
>>> will
>>> be under different host bridges.  I guess we could maybe have a 
>>> global
>>> (or maybe per-domain) list of RCECs.
>>
>> Right, we could just have a list and only need to search the list 
>> when we need to handle an event.
>>
>>>
>>>> Conversely, if you are enumerating the above RCEC at 6a(00.4) and 
>>>> you
>>>> attempt to link_rcec() through calls to pci_walk_bus(), the walk 
>>>> will still
>>>> be limited to 6a and below; never encountering 6b(00.0).  So you 
>>>> would then
>>>> need an additional walk for each of the associated bus ranges, 
>>>> excluding the
>>>> same bus as the RCEC.
>>>>
>>>> pci_init_capabilities()
>>>> …
>>>> pci_init_rcec() // Cached
>>>>
>>>> if (RCEC)
>>>>  Walk the dev->bus for bitmap associated RCiEP
>>>>  Walk all associated bus ranges for RCiEP
>>>>
>>>> else if (RCiEP)
>>>>  Walk the dev->bus for bitmap associated RCEC
>>>>  Walk all peer root_bus for RCEC, confirm if own dev->bus falls 
>>>> within
>>>> discovered RCEC associated ranges
>>>>
>>>> The other problem here is temporal. I’m wondering if we may be 
>>>> trying to
>>>> find associated devices at the pci_init_capabilities() stage prior 
>>>> to them
>>>> being fully enumerated, i.e., RCEC has not been cached but we are 
>>>> searching
>>>> with a future associated RCiEP.  So one could encounter a race 
>>>> condition
>>>> where one is checking on an RCiEP whose associated RCEC has not 
>>>> been
>>>> enumerated yet.
>>>
>>> Maybe I'm misunderstanding this problem, but I think my idea would
>>> handle this: If we find the RCEC first, we cache its info and do
>>> nothing else.  When we subsequently discover an RCiEP, we walk the
>>> tree, find the RCEC, and connect them.
>>>
>>> If we find an RCiEP first, we do nothing.  When we subsequently
>>> discover an RCEC, we walk the tree, find any associated RCiEPs, and
>>> connect them.
>>
>>
>> Your approach makes sense. In retrospect, I don’t think there can 
>> be a race condition here because of the fallback from one RCEC to the 
>> other when they enumerate. You have two chances of finding the 
>> relationship.
>>
>>>
>>> The discovery can happen in different orders based on the
>>> bus/device/function numbers, but it's not really a race because
>>> enumeration is single-threaded.  But if we're talking about two
>>> separate host bridges (PNP0A03 devices), we *should* allow them to 
>>> be
>>> enumerated in parallel, even though we don't do that today.
>>>
>>> I think this RCEC association design is really kind of problematic.
>>>
>>> We don't really *need* the association until some RCiEP event (PME,
>>> error, etc) occurs, so it's reasonable to defer making the RCEC
>>> connection until then.  But it seems like things should be designed 
>>> so
>>> we're guaranteed to enumerate the RCEC before the RCiEPs.  
>>> Otherwise,
>>> we don't really know when we can enable RCiEP events.  If we 
>>> discover
>>> an RCiEP first, enable error reporting, and an error occurs before 
>>> we
>>> find the RCEC, we're in a bit of a pickle.
>>
>> So we could have a global list (RCiEP or RCEC) in which we wait to 
>> identify the associations only when we need to respond to events. Or 
>> in your original suggestion we could walk the RCEC bus ranges in 
>> addition to its root_bus.
>>
>> i.e., We bus walk an enumerating RCEC with link_rcec() callback for 
>> not only the root_bus but each bus number in the associated ranges if 
>> the extended cap exists.
>>
>> RCiEPs will simply continue to invoke their callback via find_rcec() 
>> and only check on their own bus for the encountered RCEC’s 
>> associated bitmap case.
>
> Walking the bus with an RCEC as it is probed in the portdrv_pci.c can 
> be done with both its own bus (bitmap) and with supported associated 
> bus ranges.  In that walk I’m able to find all the associated 
> endpoints via both bitmap on own bus and the bus ranges. No 
> pci_get_slot() is needed as per your original suggestion.
>
> The suggsted approach to the rcec_helper() seems to imply that we 
> either walk it again or have cached all the associated RCiEP.  When we 
> do the walk above, we are merely, finding the RCiEPs and linking them 
> to the RCEC’s structure. There is no caching done of a list of 
> RCiEPs per RCEC.
>
> i.e., in aer.c : set_downstream_devices_error_reporting() wants to 
> enable each associated RCiEP via pcie_walk_rcec().
>
> Looking into changes to the pice_walk_rcec()
>
> Sean


Okay, got it working doing the bus walk with helpers for both linking 
(enumeration case) and call backs (other such calls that need to act on 
associated RCiEPs as in above aer.c example). I’ll do more testing and 
queue up V4 for review.

Thanks,

Sean



>
>>
>>
>> Sean
>>
>>>
>>>> So let’s say one throws out RCiEP entirely and just relies upon 
>>>> RCEC to find
>>>> the associations because one knows that an encountered RCEC (in
>>>> pci_init_capabilities()) has already been cached. In that case you 
>>>> end up
>>>> with the original implementation being done with this patch 
>>>> series…
>>>>
>>>> if (RCEC)
>>>>  Walk the dev->bus for bitmap associated RCiEP
>>>>  Walk all associated bus ranges for RCiEP
>>>>
>>>> Perhaps I’ve muddled some things here but it doesn’t look like 
>>>> the twain
>>>> will meet unless I cover multiple peer root_bus and even then you 
>>>> may have
>>>> an issue because the devices don’t yet fully exist from the 
>>>> perspective of
>>>> the OS.
>>>>
>>>> Thanks,
>>>>
>>>> Sean

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-14 16:55               ` Sean V Kelley
  2020-09-15 16:09                 ` Sean V Kelley
@ 2020-09-16 22:49                 ` Bjorn Helgaas
  1 sibling, 0 replies; 26+ messages in thread
From: Bjorn Helgaas @ 2020-09-16 22:49 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: Zhuo, Qiuxu, Jonathan.Cameron, rjw, sathyanarayanan.kuppuswamy,
	Raj, Ashok, Luck, Tony, linux-pci, bhelgaas, linux-kernel

On Mon, Sep 14, 2020 at 09:55:53AM -0700, Sean V Kelley wrote:
> On 11 Sep 2020, at 17:50, Bjorn Helgaas wrote:
> > On Fri, Sep 11, 2020 at 04:16:03PM -0700, Sean V Kelley wrote:

> > > I’ve done some experimenting with this approach, and I think
> > > there may be a problem of just walking the busses during
> > > enumeration pci_init_capabilities(). One problem is where one
> > > has an RCEC on a root bus: 6a(00.4) and an RCiEP on another root
> > > bus: 6b(00.0).  They will never find each other in this approach
> > > through a normal pci_bus_walk() call using their respective
> > > root_bus.
> > > 
> > > >  +-[0000:6b]-+-00.0
> > > >  |           +-00.1
> > > >  |           +-00.2
> > > >  |           \-00.3
> > > >  +-[0000:6a]-+-00.0
> > > >  |           +-00.1
> > > >  |           +-00.2
> > > >  |           \-00.4
> > 
> > Wow, is that even allowed?
> > 
> > There's no bridge from 0000:6a to 0000:6b, so we will not scan 0000:6b
> > unless we find a host bridge with _CRS where 6b is the first bus
> > number below the bridge.  I think that means this would have to be
> > described in ACPI as two separate root bridges:
> > 
> >   ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 6a])
> >   ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 6b])
> 
> Otherwise, the RCEC Associated Endpoint Extended Capabilities would have to
> have explicitly mentioned a bridge?

I just meant that the enumeration algorithm starts with a PNP0A03
device and searches the root bus from its _CRS, descending under any
bridges it finds.  There's no PCI-to-PCI bridge from 6a to 6b (if
there *were* such a bridge, 6b would not be a root bridge).

> > I *guess* maybe it's allowed by the PCIe spec to have an RCEC and
> > associated RCiEPs on separate root buses?  It seems awfully strange
> > and not in character for PCIe, but I guess I can't point to language
> > that prohibits it.
> 
> Yes, it should be possible.

Ugh :)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs
  2020-09-15 16:09                 ` Sean V Kelley
  2020-09-15 17:47                   ` Sean V Kelley
@ 2020-09-16 23:06                   ` Bjorn Helgaas
  1 sibling, 0 replies; 26+ messages in thread
From: Bjorn Helgaas @ 2020-09-16 23:06 UTC (permalink / raw)
  To: Sean V Kelley
  Cc: Zhuo, Qiuxu, Jonathan.Cameron, rjw, sathyanarayanan.kuppuswamy,
	Raj, Ashok, Luck, Tony, linux-pci, bhelgaas, linux-kernel

On Tue, Sep 15, 2020 at 09:09:20AM -0700, Sean V Kelley wrote:

> Walking the bus with an RCEC as it is probed in the portdrv_pci.c can be
> done with both its own bus (bitmap) and with supported associated bus
> ranges.  In that walk I’m able to find all the associated endpoints via both
> bitmap on own bus and the bus ranges. No pci_get_slot() is needed as per
> your original suggestion.

OK.  That seems OK for now.

> The suggsted approach to the rcec_helper() seems to imply that we either
> walk it again or have cached all the associated RCiEP.  When we do the walk
> above, we are merely, finding the RCiEPs and linking them to the RCEC’s
> structure. There is no caching done of a list of RCiEPs per RCEC.

pcie_portdrv_init() is a device_initcall, so it should be called after
we enumerate all the PCI devices (at least those present at boot).  So
you don't have to worry about finding an RCiEP before its associated
RCEC -- we should have found them *all* by the time we get to
pcie_portdrv_probe().

Bjorn

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-09-16 23:06 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-12 16:46 [PATCH v3 00/10] Add RCEC handling to PCI/AER Sean V Kelley
2020-08-12 16:46 ` [PATCH v3 01/10] PCI/RCEC: Add RCEC class code and extended capability Sean V Kelley
2020-08-12 16:46 ` [PATCH v3 02/10] PCI/RCEC: Bind RCEC devices to the Root Port driver Sean V Kelley
2020-08-12 16:46 ` [PATCH v3 03/10] PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities() Sean V Kelley
2020-08-12 16:46 ` [PATCH v3 04/10] PCI/RCEC: Add pcie_walk_rcec() to walk associated RCiEPs Sean V Kelley
2020-09-02 19:00   ` Bjorn Helgaas
2020-09-02 21:55     ` Sean V Kelley
2020-09-04 22:18       ` Kelley, Sean V
2020-09-05  2:23         ` Bjorn Helgaas
2020-09-11 23:16           ` Sean V Kelley
2020-09-12  0:50             ` Bjorn Helgaas
2020-09-14 16:55               ` Sean V Kelley
2020-09-15 16:09                 ` Sean V Kelley
2020-09-15 17:47                   ` Sean V Kelley
2020-09-16 23:06                   ` Bjorn Helgaas
2020-09-16 22:49                 ` Bjorn Helgaas
2020-08-12 16:46 ` [PATCH v3 05/10] PCI/AER: Extend AER error handling to RCECs Sean V Kelley
2020-08-26 17:26   ` Kuppuswamy, Sathyanarayanan
2020-08-26 18:55     ` sean.v.kelley
2020-08-12 16:46 ` [PATCH v3 06/10] PCI/AER: Apply function level reset to RCiEP on fatal error Sean V Kelley
2020-08-12 16:46 ` [PATCH v3 07/10] PCI: Add 'rcec' field to pci_dev for associated RCiEPs Sean V Kelley
2020-09-02 16:35   ` Bjorn Helgaas
2020-09-02 22:24     ` Sean V Kelley
2020-08-12 16:46 ` [PATCH v3 08/10] PCI/AER: Add RCEC AER handling Sean V Kelley
2020-08-12 16:46 ` [PATCH v3 09/10] PCI/PME: Add RCEC PME handling Sean V Kelley
2020-08-12 16:46 ` [PATCH v3 10/10] PCI/AER: Add RCEC AER error injection support Sean V Kelley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.