All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types
@ 2023-11-28 16:06 Ben Cheatham
  2023-11-28 16:06 ` [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support Ben Cheatham
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Ben Cheatham @ 2023-11-28 16:06 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

v7 Changes:
	- Fixed a bug a permissions bug with einj_inject file (was using
	  0x200 instead of 0200)
	- Fixed a kernel test bot error
	- Removed a "magic" number in cxl_einj_available_error_type()
	- Bumped kernel version in debugfs documentation entries
	- Added Jonathan's Reviewed-by

v6 Changes:
	- Reworked to have CXL error types under /sys/kernel/debug/cxl (Dan)
		- Removed CXL error types from legacy EINJ interface in favor of
		new interface
	- Removed cxl_rcrb_addr file
	- Added optional patch for CXL error type #defines (patch 2/5)
	- Changes to documentation updates to match rework
	- Change base to cxl-fixes branch

The new CXL error types will use the Memory Address field in the
SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
compliant memory-mapped downstream port. The value of the memory address
will be in the port's MMIO range, and it will not represent physical
(normal or persistent) memory.

Add the functionality for injecting CXL 1.1 errors to the EINJ module,
but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
Instead, make the error types available under /sys/kernel/debug/cxl.
This allows for validating the MMIO address for a CXL 1.1 error type
while also not making the user responsible for finding it.

Ben Cheatham (5):
  cxl/port: Add EINJ debugfs files and callback support
  ACPI: Add CXL protocol error defines
  EINJ: Separate CXL errors from other EINJ errors
  cxl/port, EINJ: Add CXL EINJ callback functions
  EINJ: Update EINJ documentation

 Documentation/ABI/testing/debugfs-cxl         |  27 ++++
 .../firmware-guide/acpi/apei/einj.rst         |  12 ++
 drivers/acpi/apei/Kconfig                     |   3 +
 drivers/acpi/apei/einj.c                      | 151 ++++++++++++++++--
 drivers/cxl/core/port.c                       |  84 ++++++++++
 drivers/cxl/cxl.h                             |  10 ++
 include/acpi/actbl1.h                         |   6 +
 7 files changed, 283 insertions(+), 10 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support
  2023-11-28 16:06 [PATCH v7 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
@ 2023-11-28 16:06 ` Ben Cheatham
  2023-12-07 23:13   ` Dan Williams
  2023-11-28 16:06 ` [PATCH v7 2/5] ACPI: Add CXL protocol error defines Ben Cheatham
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Ben Cheatham @ 2023-11-28 16:06 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

Add creation of debugfs directories for ports and dports under
/sys/kernel/debug/cxl when EINJ support is enabled. The dport
directories will contain files for injecting CXL protocol errors.
These files are only usable once the EINJ module has loaded and
registered callback functions with the CXL core module, before that
occurs (or if the EINJ module isn't loaded) the files will do nothing
and return an ENODEV error.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 Documentation/ABI/testing/debugfs-cxl | 27 +++++++++
 drivers/cxl/core/port.c               | 84 +++++++++++++++++++++++++++
 drivers/cxl/cxl.h                     | 10 ++++
 3 files changed, 121 insertions(+)

diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
index fe61d372e3fa..782a1bb78884 100644
--- a/Documentation/ABI/testing/debugfs-cxl
+++ b/Documentation/ABI/testing/debugfs-cxl
@@ -33,3 +33,30 @@ Description:
 		device cannot clear poison from the address, -ENXIO is returned.
 		The clear_poison attribute is only visible for devices
 		supporting the capability.
+
+What:		/sys/kernel/debug/cxl/portX/dportY/einj_types
+Date:		November, 2023
+KernelVersion:	v6.8
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) Prints the CXL protocol error types made available by
+		the platform in the format "0x<error number>	<error type>".
+		The <error number> can be written to einj_inject to inject
+		<error type> into dportY. This file is only visible if
+		CONFIG_ACPI_APEI_EINJ is enabled, and the EINJ module must
+		be able to reach one (or both) of the CXL_ACPI or CXL_PORT
+		modules to be functional.
+
+What:		/sys/kernel/debug/cxl/portX/dportY/einj_inject
+Date:		November, 2023
+KernelVersion:	v6.8
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) Writing an integer to this file injects the corresponding
+		CXL protocol error into dportY (integer to type mapping is
+		available by reading from einj_types). If the dport was
+		enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
+		a CXL 2.0 error is injected. This file is only visible if
+		CONFIG_ACPI_APEI_EINJ is enabled, and the EINJ module must
+		be able to reach one (or both) of the CXL_ACPI or CXL_PORT
+		modules to be functional.
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 38441634e4c6..acf10415a174 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -783,6 +783,72 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
 	return rc;
 }
 
+static struct cxl_einj_ops einj_ops;
+void cxl_einj_set_ops_cbs(struct cxl_einj_ops *ops)
+{
+	if (!IS_REACHABLE(CONFIG_ACPI_APEI_EINJ) || !ops)
+		return;
+
+	einj_ops = *ops;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_einj_set_ops_cbs, CXL);
+
+static int cxl_einj_type_show(struct seq_file *f, void *data)
+{
+	if (!einj_ops.einj_type)
+		return -ENODEV;
+
+	return einj_ops.einj_type(f, data);
+}
+
+static int cxl_einj_inject(void *data, u64 type)
+{
+	struct cxl_dport *dport = data;
+
+	if (!einj_ops.einj_inject)
+		return -ENODEV;
+
+	return einj_ops.einj_inject(dport, type);
+}
+DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
+
+static int cxl_debugfs_create_dport_dir(struct dentry *port_dir,
+						   struct cxl_dport *dport)
+{
+	struct dentry *dir;
+	char dir_name[32];
+
+	snprintf(dir_name, 31, "dport%d", dport->port_id);
+	dir = debugfs_create_dir(dir_name, port_dir);
+	if (IS_ERR(dir))
+		return PTR_ERR(dir);
+
+	debugfs_create_devm_seqfile(dport->dport_dev, "einj_types", dir,
+				    cxl_einj_type_show);
+
+	debugfs_create_file("einj_inject", 0200, dir, dport,
+			    &cxl_einj_inject_fops);
+	return 0;
+}
+
+static struct dentry *cxl_debugfs_create_port_dir(struct cxl_port *port)
+{
+	const char *dir_name = dev_name(&port->dev);
+	struct dentry *dir;
+
+	if (!IS_ENABLED(CONFIG_ACPI_APEI_EINJ))
+		return ERR_PTR(-ENODEV);
+
+	dir = cxl_debugfs_create_dir(dir_name);
+	if (IS_ERR(dir)) {
+		dev_dbg(&port->dev, "Failed to create port debugfs dir: %ld\n",
+			PTR_ERR(dir));
+		return dir;
+	}
+
+	return dir;
+}
+
 static struct cxl_port *__devm_cxl_add_port(struct device *host,
 					    struct device *uport_dev,
 					    resource_size_t component_reg_phys,
@@ -861,6 +927,7 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
 				   struct cxl_dport *parent_dport)
 {
 	struct cxl_port *port, *parent_port;
+	struct dentry *dir;
 
 	port = __devm_cxl_add_port(host, uport_dev, component_reg_phys,
 				   parent_dport);
@@ -878,6 +945,10 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
 			parent_port ? " to " : "",
 			parent_port ? dev_name(&parent_port->dev) : "",
 			parent_port ? "" : " (root port)");
+
+		dir = cxl_debugfs_create_port_dir(port);
+		if (!IS_ERR(dir))
+			port->debug_dir = dir;
 	}
 
 	return port;
@@ -1127,6 +1198,7 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
 				     resource_size_t component_reg_phys)
 {
 	struct cxl_dport *dport;
+	int rc;
 
 	dport = __devm_cxl_add_dport(port, dport_dev, port_id,
 				     component_reg_phys, CXL_RESOURCE_NONE);
@@ -1136,6 +1208,11 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
 	} else {
 		dev_dbg(dport_dev, "dport added to %s\n",
 			dev_name(&port->dev));
+
+		rc = cxl_debugfs_create_dport_dir(port->debug_dir, dport);
+		if (rc)
+			dev_dbg(dport_dev,
+				"Failed to create dport debugfs dir: %d\n", rc);
 	}
 
 	return dport;
@@ -1156,6 +1233,7 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
 					 resource_size_t rcrb)
 {
 	struct cxl_dport *dport;
+	int rc;
 
 	if (rcrb == CXL_RESOURCE_NONE) {
 		dev_dbg(&port->dev, "failed to add RCH dport, missing RCRB\n");
@@ -1170,6 +1248,12 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
 	} else {
 		dev_dbg(dport_dev, "RCH dport added to %s\n",
 			dev_name(&port->dev));
+
+		rc = cxl_debugfs_create_dport_dir(port->debug_dir, dport);
+		if (rc)
+			dev_dbg(dport_dev,
+				"Failed to create rch dport debugfs dir: %d\n",
+				rc);
 	}
 
 	return dport;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 687043ece101..3c7744fc3106 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -590,6 +590,7 @@ struct cxl_dax_region {
  * @depth: How deep this port is relative to the root. depth 0 is the root.
  * @cdat: Cached CDAT data
  * @cdat_available: Should a CDAT attribute be available in sysfs
+ * @debug_dir: dentry for port in cxl debugfs (optional)
  */
 struct cxl_port {
 	struct device dev;
@@ -612,6 +613,7 @@ struct cxl_port {
 		size_t length;
 	} cdat;
 	bool cdat_available;
+	struct dentry *debug_dir;
 };
 
 static inline struct cxl_dport *
@@ -813,6 +815,14 @@ bool is_cxl_nvdimm_bridge(struct device *dev);
 int devm_cxl_add_nvdimm(struct cxl_memdev *cxlmd);
 struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_memdev *cxlmd);
 
+struct cxl_einj_ops {
+	int (*einj_type)(struct seq_file *f, void *data);
+	int (*einj_inject)(struct cxl_dport *dport, u64 type);
+};
+
+void cxl_einj_set_ops_cbs(struct cxl_einj_ops *ops);
+
+
 #ifdef CONFIG_CXL_REGION
 bool is_cxl_pmem_region(struct device *dev);
 struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v7 2/5] ACPI: Add CXL protocol error defines
  2023-11-28 16:06 [PATCH v7 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
  2023-11-28 16:06 ` [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support Ben Cheatham
@ 2023-11-28 16:06 ` Ben Cheatham
  2023-12-07 23:28   ` Dan Williams
  2023-11-28 16:06 ` [PATCH v7 3/5] EINJ: Separate CXL errors from other EINJ errors Ben Cheatham
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Ben Cheatham @ 2023-11-28 16:06 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

Add CXL protocol error defines to include/actbl1.h.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>

---
I made a pull request for adding this support into
the acpica project that has already been merged [1],
Feel free to discard this commit and use the acpica
changes instead.

[1]:
Link: https://github.com/acpica/acpica/pull/884

---
 include/acpi/actbl1.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index a33375e055ad..1f58c5d86869 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -1096,6 +1096,12 @@ enum acpi_einj_command_status {
 #define ACPI_EINJ_PLATFORM_CORRECTABLE      (1<<9)
 #define ACPI_EINJ_PLATFORM_UNCORRECTABLE    (1<<10)
 #define ACPI_EINJ_PLATFORM_FATAL            (1<<11)
+#define ACPI_EINJ_CXL_CACHE_CORRECTABLE     (1<<12)
+#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE   (1<<13)
+#define ACPI_EINJ_CXL_CACHE_FATAL           (1<<14)
+#define ACPI_EINJ_CXL_MEM_CORRECTABLE       (1<<15)
+#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE     (1<<16)
+#define ACPI_EINJ_CXL_MEM_FATAL             (1<<17)
 #define ACPI_EINJ_VENDOR_DEFINED            (1<<31)
 
 /*******************************************************************************
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v7 3/5] EINJ: Separate CXL errors from other EINJ errors
  2023-11-28 16:06 [PATCH v7 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
  2023-11-28 16:06 ` [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support Ben Cheatham
  2023-11-28 16:06 ` [PATCH v7 2/5] ACPI: Add CXL protocol error defines Ben Cheatham
@ 2023-11-28 16:06 ` Ben Cheatham
  2023-12-07 23:30   ` Dan Williams
  2023-11-28 16:06 ` [PATCH v7 4/5] cxl/port, EINJ: Add CXL EINJ callback functions Ben Cheatham
  2023-11-28 16:06 ` [PATCH v7 5/5] EINJ: Update EINJ documentation Ben Cheatham
  4 siblings, 1 reply; 16+ messages in thread
From: Ben Cheatham @ 2023-11-28 16:06 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

Separate CXL error types from other EINJ error types and disallow them
in the legacy EINJ interface under /sys/kernel/debug/apei/einj. Support
for the CXL error types will be added under /sys/kernel/debug/cxl in the
next commit.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/acpi/apei/einj.c | 56 +++++++++++++++++++++++++++++-----------
 1 file changed, 41 insertions(+), 15 deletions(-)

diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 013eb621dc92..330329ac2f1f 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -36,6 +36,12 @@
 #define MEM_ERROR_MASK		(ACPI_EINJ_MEMORY_CORRECTABLE | \
 				ACPI_EINJ_MEMORY_UNCORRECTABLE | \
 				ACPI_EINJ_MEMORY_FATAL)
+#define CXL_ERROR_MASK		(ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
+				ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
+				ACPI_EINJ_CXL_CACHE_FATAL | \
+				ACPI_EINJ_CXL_MEM_CORRECTABLE | \
+				ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
+				ACPI_EINJ_CXL_MEM_FATAL)
 
 /*
  * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
@@ -537,8 +543,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	if (type & ACPI5_VENDOR_BIT) {
 		if (vendor_flags != SETWA_FLAGS_MEM)
 			goto inject;
-	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
+	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
 		goto inject;
+	} else if ((type & CXL_ERROR_MASK) && (flags & SETWA_FLAGS_MEM)) {
+		goto inject;
+	}
 
 	/*
 	 * Disallow crazy address masks that give BIOS leeway to pick
@@ -590,6 +599,9 @@ static const char * const einj_error_type_string[] = {
 	"0x00000200\tPlatform Correctable\n",
 	"0x00000400\tPlatform Uncorrectable non-fatal\n",
 	"0x00000800\tPlatform Uncorrectable fatal\n",
+};
+
+static const char * const einj_cxl_error_type_string[] = {
 	"0x00001000\tCXL.cache Protocol Correctable\n",
 	"0x00002000\tCXL.cache Protocol Uncorrectable non-fatal\n",
 	"0x00004000\tCXL.cache Protocol Uncorrectable fatal\n",
@@ -615,29 +627,21 @@ static int available_error_type_show(struct seq_file *m, void *v)
 
 DEFINE_SHOW_ATTRIBUTE(available_error_type);
 
-static int error_type_get(void *data, u64 *val)
-{
-	*val = error_type;
-
-	return 0;
-}
-
-static int error_type_set(void *data, u64 val)
+static int validate_error_type(u64 type)
 {
+	u32 tval, vendor, available_error_type = 0;
 	int rc;
-	u32 available_error_type = 0;
-	u32 tval, vendor;
 
 	/* Only low 32 bits for error type are valid */
-	if (val & GENMASK_ULL(63, 32))
+	if (type & GENMASK_ULL(63, 32))
 		return -EINVAL;
 
 	/*
 	 * Vendor defined types have 0x80000000 bit set, and
 	 * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
 	 */
-	vendor = val & ACPI5_VENDOR_BIT;
-	tval = val & 0x7fffffff;
+	vendor = type & ACPI5_VENDOR_BIT;
+	tval = type & 0x7fffffff;
 
 	/* Only one error type can be specified */
 	if (tval & (tval - 1))
@@ -646,9 +650,31 @@ static int error_type_set(void *data, u64 val)
 		rc = einj_get_available_error_type(&available_error_type);
 		if (rc)
 			return rc;
-		if (!(val & available_error_type))
+		if (!(type & available_error_type))
 			return -EINVAL;
 	}
+
+	return 0;
+}
+
+static int error_type_get(void *data, u64 *val)
+{
+	*val = error_type;
+
+	return 0;
+}
+
+static int error_type_set(void *data, u64 val)
+{
+	int rc;
+
+	if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
+		return -EINVAL;
+
+	rc = validate_error_type(val);
+	if (rc)
+		return rc;
+
 	error_type = val;
 
 	return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v7 4/5] cxl/port, EINJ: Add CXL EINJ callback functions
  2023-11-28 16:06 [PATCH v7 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
                   ` (2 preceding siblings ...)
  2023-11-28 16:06 ` [PATCH v7 3/5] EINJ: Separate CXL errors from other EINJ errors Ben Cheatham
@ 2023-11-28 16:06 ` Ben Cheatham
  2023-12-08  0:03   ` Dan Williams
  2023-11-28 16:06 ` [PATCH v7 5/5] EINJ: Update EINJ documentation Ben Cheatham
  4 siblings, 1 reply; 16+ messages in thread
From: Ben Cheatham @ 2023-11-28 16:06 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

Add functions to the EINJ module to be used in the CXL module for CXL
protocol error injection. The callbacks implement the einj_types and
einj_inject files under /sys/kernel/debug/cxl/portX/dportY. These two
files work in the same way as the available_error_types and error_inject
files under /sys/kernel/debug/apei/einj, but only for CXL error types.
If the dport is enumerated in RCH (CXL 1.1) mode, a CXL 1.1 error is
injected into the dport; a CXL 2.0 error is injected otherwise.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 drivers/acpi/apei/Kconfig |   3 ++
 drivers/acpi/apei/einj.c  | 105 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)

diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 6b18f8bc7be3..100c27beb581 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -11,6 +11,9 @@ config ACPI_APEI
 	select PSTORE
 	select UEFI_CPER
 	depends on HAVE_ACPI_APEI
+	imply CXL_BUS
+	imply CXL_ACPI
+	imply CXL_PORT
 	help
 	  APEI allows to report errors (for example from the chipset)
 	  to the operating system. This improves NMI handling
diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 330329ac2f1f..98d5e6d60a02 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -21,9 +21,11 @@
 #include <linux/nmi.h>
 #include <linux/delay.h>
 #include <linux/mm.h>
+#include <linux/pci.h>
 #include <asm/unaligned.h>
 
 #include "apei-internal.h"
+#include "../../cxl/cxl.h"
 
 #undef pr_fmt
 #define pr_fmt(fmt) "EINJ: " fmt
@@ -627,6 +629,25 @@ static int available_error_type_show(struct seq_file *m, void *v)
 
 DEFINE_SHOW_ATTRIBUTE(available_error_type);
 
+static int cxl_einj_available_error_type(struct seq_file *m, void *v)
+{
+	int cxl_err, rc;
+	u32 available_error_type = 0;
+
+	rc = einj_get_available_error_type(&available_error_type);
+	if (rc)
+		return rc;
+
+	for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
+		cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
+
+		if (available_error_type & cxl_err)
+			seq_puts(m, einj_cxl_error_type_string[pos]);
+	}
+
+	return 0;
+}
+
 static int validate_error_type(u64 type)
 {
 	u32 tval, vendor, available_error_type = 0;
@@ -657,6 +678,68 @@ static int validate_error_type(u64 type)
 	return 0;
 }
 
+static int cxl_dport_get_sbdf(struct cxl_dport *dport, u64 *sbdf)
+{
+	struct pci_dev *pdev;
+	struct pci_bus *pbus;
+	struct pci_host_bridge *bridge;
+	u64 seg = 0, bus;
+
+	if (!dev_is_pci(dport->dport_dev))
+		return -EINVAL;
+
+	pdev = to_pci_dev(dport->dport_dev);
+	pbus = pdev->bus;
+	bridge = pci_find_host_bridge(pbus);
+
+	if (!bridge)
+		return -ENODEV;
+
+	if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
+		seg = bridge->domain_nr << 24;
+
+	bus = pbus->number << 16;
+	*sbdf = seg | bus | pdev->devfn;
+
+	return 0;
+}
+
+static int cxl_einj_inject_error(struct cxl_dport *dport, u64 type)
+{
+	u64 param1 = 0, param2 = 0, param4 = 0;
+	u32 flags;
+	int rc;
+
+	/* Only CXL error types can be specified */
+	if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
+		return -EINVAL;
+
+	rc = validate_error_type(type);
+	if (rc)
+		return rc;
+
+	/*
+	 * If dport is in restricted mode, inject a CXL 1.1 error,
+	 * otherwise inject a CXL 2.0 error
+	 */
+	if (dport->rch) {
+		if (dport->rcrb.base == CXL_RESOURCE_NONE)
+			return -EINVAL;
+
+		param1 = (u64) dport->rcrb.base;
+		param2 = 0xfffffffffffff000;
+		flags = 0x2;
+	} else {
+		rc = cxl_dport_get_sbdf(dport, &param4);
+		if (rc)
+			return rc;
+
+		flags = 0x4;
+	}
+
+	return einj_error_inject(type, flags, param1, param2, 0, param4);
+}
+
 static int error_type_get(void *data, u64 *val)
 {
 	*val = error_type;
@@ -668,6 +751,7 @@ static int error_type_set(void *data, u64 val)
 {
 	int rc;
 
+	/* CXL error types have to be injected from cxl debugfs */
 	if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
 		return -EINVAL;
 
@@ -714,6 +798,7 @@ static int __init einj_init(void)
 {
 	int rc;
 	acpi_status status;
+	struct cxl_einj_ops cxl_ops;
 	struct apei_exec_context ctx;
 
 	if (acpi_disabled) {
@@ -793,6 +878,15 @@ static int __init einj_init(void)
 				   einj_debug_dir, &vendor_flags);
 	}
 
+	if (IS_REACHABLE(CONFIG_CXL_ACPI) || IS_REACHABLE(CONFIG_CXL_PORT)) {
+		cxl_ops = (struct cxl_einj_ops) {
+			.einj_type = cxl_einj_available_error_type,
+			.einj_inject = cxl_einj_inject_error,
+		};
+
+		cxl_einj_set_ops_cbs(&cxl_ops);
+	}
+
 	pr_info("Error INJection is initialized.\n");
 
 	return 0;
@@ -810,8 +904,18 @@ static int __init einj_init(void)
 
 static void __exit einj_exit(void)
 {
+	struct cxl_einj_ops cxl_ops;
 	struct apei_exec_context ctx;
 
+	if (IS_REACHABLE(CONFIG_CXL_ACPI) || IS_REACHABLE(CONFIG_CXL_PORT)) {
+		cxl_ops = (struct cxl_einj_ops) {
+			.einj_type = NULL,
+			.einj_inject = NULL,
+		};
+
+		cxl_einj_set_ops_cbs(&cxl_ops);
+	}
+
 	if (einj_param) {
 		acpi_size size = (acpi5) ?
 			sizeof(struct set_error_type_with_address) :
@@ -832,4 +936,5 @@ module_exit(einj_exit);
 
 MODULE_AUTHOR("Huang Ying");
 MODULE_DESCRIPTION("APEI Error INJection support");
+MODULE_IMPORT_NS(CXL);
 MODULE_LICENSE("GPL");
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v7 5/5] EINJ: Update EINJ documentation
  2023-11-28 16:06 [PATCH v7 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
                   ` (3 preceding siblings ...)
  2023-11-28 16:06 ` [PATCH v7 4/5] cxl/port, EINJ: Add CXL EINJ callback functions Ben Cheatham
@ 2023-11-28 16:06 ` Ben Cheatham
  4 siblings, 0 replies; 16+ messages in thread
From: Ben Cheatham @ 2023-11-28 16:06 UTC (permalink / raw)
  To: dave, jonathan.cameron, dave.jiang, alison.schofield,
	vishal.l.verma, ira.weiny, dan.j.williams, rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

Update EINJ documentation with build requirements for CXL error types
and how to inject CXL error types.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
---
 Documentation/firmware-guide/acpi/apei/einj.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/Documentation/firmware-guide/acpi/apei/einj.rst b/Documentation/firmware-guide/acpi/apei/einj.rst
index d6b61d22f525..83afe3bac793 100644
--- a/Documentation/firmware-guide/acpi/apei/einj.rst
+++ b/Documentation/firmware-guide/acpi/apei/einj.rst
@@ -181,6 +181,18 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+CXL error types are supported from ACPI 6.5 onwards. These error types
+are not available in the legacy interface at /sys/kernel/debug/apei/einj,
+but are instead at /sys/kernel/debug/cxl/portX/dportY. Inside the dportY
+directory are two files, einj_types and einj_inject. These files work the
+same as the available_error_type and error_inject files (read the error
+types from einj_types and write the type to inject to einj_inject).
+
+To use these error types one of (or both) ``CONFIG_CXL_ACPI`` or
+``CONFIG_CXL_PORT`` must be reachable by the EINJ module; if
+``CONFIG_ACPI_APEI_EINJ`` == y/m, then at least one of ``CONFIG_CXL_ACPI``
+or ``CONFIG_CXL_PORT`` must also be set to y/m.
+
 Special notes for injection into SGX enclaves:
 
 There may be a separate BIOS setup option to enable SGX injection.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* RE: [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support
  2023-11-28 16:06 ` [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support Ben Cheatham
@ 2023-12-07 23:13   ` Dan Williams
  2023-12-08 16:22     ` Ben Cheatham
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2023-12-07 23:13 UTC (permalink / raw)
  To: Ben Cheatham, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
	rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

Ben Cheatham wrote:
> Add creation of debugfs directories for ports and dports under
> /sys/kernel/debug/cxl when EINJ support is enabled. The dport
> directories will contain files for injecting CXL protocol errors.
> These files are only usable once the EINJ module has loaded and
> registered callback functions with the CXL core module, before that
> occurs (or if the EINJ module isn't loaded) the files will do nothing
> and return an ENODEV error.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
>  Documentation/ABI/testing/debugfs-cxl | 27 +++++++++
>  drivers/cxl/core/port.c               | 84 +++++++++++++++++++++++++++
>  drivers/cxl/cxl.h                     | 10 ++++
>  3 files changed, 121 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
> index fe61d372e3fa..782a1bb78884 100644
> --- a/Documentation/ABI/testing/debugfs-cxl
> +++ b/Documentation/ABI/testing/debugfs-cxl
> @@ -33,3 +33,30 @@ Description:
>  		device cannot clear poison from the address, -ENXIO is returned.
>  		The clear_poison attribute is only visible for devices
>  		supporting the capability.
> +
> +What:		/sys/kernel/debug/cxl/portX/dportY/einj_types

Given this file is identical contents for all dports it only needs to
exist in one common location

/sys/kernel/debug/cxl/einj_types


> +Date:		November, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(RO) Prints the CXL protocol error types made available by
> +		the platform in the format "0x<error number>	<error type>".
> +		The <error number> can be written to einj_inject to inject
> +		<error type> into dportY. This file is only visible if
> +		CONFIG_ACPI_APEI_EINJ is enabled, and the EINJ module must
> +		be able to reach one (or both) of the CXL_ACPI or CXL_PORT
> +		modules to be functional.

This can be simplified. Have something like:

config CXL_EINJ
	default CXL_BUS
	depends on ACPI_APEI_EINJ && ACPI_APEI_EINJ=CXL_BUS
	...

Then the documentation moves to Kconfig for how to enable this and the
CXL code can directly call into the EINJ module without worry.

It would of course need stubs like these in a shared header:

#ifdef CONFIG_CXL_EINJ
int cxl_einj_available_error_type(struct seq_file *m, void *v);
int cxl_einj_inject_error(struct cxl_dport *dport, u64 type);
#else
static inline int cxl_einj_available_error_type(struct seq_file *m, void *v)
{
	return -ENXIO;
}

int cxl_einj_inject_error(struct cxl_dport *dport, u64 type)
{
	return -ENXIO;
}
#endif

> +
> +What:		/sys/kernel/debug/cxl/portX/dportY/einj_inject

See my comments on cxl_debugfs_create_dport_dir() later on, but I think
the "portX" directory can be eliminated.

> +Date:		November, 2023
> +KernelVersion:	v6.8
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) Writing an integer to this file injects the corresponding
> +		CXL protocol error into dportY (integer to type mapping is
> +		available by reading from einj_types). If the dport was
> +		enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
> +		a CXL 2.0 error is injected. This file is only visible if
> +		CONFIG_ACPI_APEI_EINJ is enabled, and the EINJ module must
> +		be able to reach one (or both) of the CXL_ACPI or CXL_PORT
> +		modules to be functional.

Similar comments about dropping these details that can just be solved in
Kconfig.

This is next comment is on EINJ ABI, but you can skip it just to
maintain momentum with the status quo. Why require the user to do the
string to integer conversion? Seems like a small matter of programming
to allow:

echo "CXL.cache Protocol Correctable" > einj_inject

...to do the right thing. That probably makes scripts more readable as
well.

> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 38441634e4c6..acf10415a174 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -783,6 +783,72 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
>  	return rc;
>  }
>  
> +static struct cxl_einj_ops einj_ops;
> +void cxl_einj_set_ops_cbs(struct cxl_einj_ops *ops)
> +{
> +	if (!IS_REACHABLE(CONFIG_ACPI_APEI_EINJ) || !ops)
> +		return;
> +
> +	einj_ops = *ops;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_einj_set_ops_cbs, CXL);

einj_ops goes away when the CXL code can just call the EINJ module
directly.

> +
> +static int cxl_einj_type_show(struct seq_file *f, void *data)
> +{
> +	if (!einj_ops.einj_type)
> +		return -ENODEV;
> +
> +	return einj_ops.einj_type(f, data);
> +}
> +
> +static int cxl_einj_inject(void *data, u64 type)
> +{
> +	struct cxl_dport *dport = data;
> +
> +	if (!einj_ops.einj_inject)
> +		return -ENODEV;
> +
> +	return einj_ops.einj_inject(dport, type);
> +}
> +DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");

The wrappers go away and DEFINE_DEBUGFS_ATTRIBUTE() can reference the
EINJ symbols directly.

> +
> +static int cxl_debugfs_create_dport_dir(struct dentry *port_dir,
> +						   struct cxl_dport *dport)
> +{
> +	struct dentry *dir;
> +	char dir_name[32];
> +
> +	snprintf(dir_name, 31, "dport%d", dport->port_id);

How about dev_name(dport->dport_dev) rather than the dynamic name?

The other benefit of this is that the dport_dev names are unique, so you
can move the einj_inject file to:

/sys/kernel/debug/cxl/$dport_dev/einj_inject

> +	dir = debugfs_create_dir(dir_name, port_dir);
> +	if (IS_ERR(dir))
> +		return PTR_ERR(dir);
> +
> +	debugfs_create_devm_seqfile(dport->dport_dev, "einj_types", dir,
> +				    cxl_einj_type_show);

Per above, move this to be a top-level file.

> +
> +	debugfs_create_file("einj_inject", 0200, dir, dport,
> +			    &cxl_einj_inject_fops);
> +	return 0;

debugfs is good about failing gracefully when pre-requisites are not
present. This is why none of the debugfs creation helpers have return
codes because failing to setup debugfs is never fatal.

In other words, it is ok to take the output of debugfs_create_dir()
without checking, and this function should not be returning an error.

> +}
> +
> +static struct dentry *cxl_debugfs_create_port_dir(struct cxl_port *port)
> +{
> +	const char *dir_name = dev_name(&port->dev);
> +	struct dentry *dir;
> +
> +	if (!IS_ENABLED(CONFIG_ACPI_APEI_EINJ))
> +		return ERR_PTR(-ENODEV);
> +
> +	dir = cxl_debugfs_create_dir(dir_name);
> +	if (IS_ERR(dir)) {
> +		dev_dbg(&port->dev, "Failed to create port debugfs dir: %ld\n",
> +			PTR_ERR(dir));
> +		return dir;
> +	}
> +
> +	return dir;
> +}
> +
>  static struct cxl_port *__devm_cxl_add_port(struct device *host,
>  					    struct device *uport_dev,
>  					    resource_size_t component_reg_phys,
> @@ -861,6 +927,7 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
>  				   struct cxl_dport *parent_dport)
>  {
>  	struct cxl_port *port, *parent_port;
> +	struct dentry *dir;
>  
>  	port = __devm_cxl_add_port(host, uport_dev, component_reg_phys,
>  				   parent_dport);
> @@ -878,6 +945,10 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
>  			parent_port ? " to " : "",
>  			parent_port ? dev_name(&parent_port->dev) : "",
>  			parent_port ? "" : " (root port)");
> +
> +		dir = cxl_debugfs_create_port_dir(port);
> +		if (!IS_ERR(dir))
> +			port->debug_dir = dir;
>  	}
>  
>  	return port;
> @@ -1127,6 +1198,7 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
>  				     resource_size_t component_reg_phys)
>  {
>  	struct cxl_dport *dport;
> +	int rc;
>  
>  	dport = __devm_cxl_add_dport(port, dport_dev, port_id,
>  				     component_reg_phys, CXL_RESOURCE_NONE);
> @@ -1136,6 +1208,11 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
>  	} else {
>  		dev_dbg(dport_dev, "dport added to %s\n",
>  			dev_name(&port->dev));
> +
> +		rc = cxl_debugfs_create_dport_dir(port->debug_dir, dport);
> +		if (rc)
> +			dev_dbg(dport_dev,
> +				"Failed to create dport debugfs dir: %d\n", rc);

Drop the debug messages about failing to setup debugfs. This follows the
lead of other debugfs setup in CXL.

>  	}
>  
>  	return dport;
> @@ -1156,6 +1233,7 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
>  					 resource_size_t rcrb)
>  {
>  	struct cxl_dport *dport;
> +	int rc;
>  
>  	if (rcrb == CXL_RESOURCE_NONE) {
>  		dev_dbg(&port->dev, "failed to add RCH dport, missing RCRB\n");
> @@ -1170,6 +1248,12 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
>  	} else {
>  		dev_dbg(dport_dev, "RCH dport added to %s\n",
>  			dev_name(&port->dev));
> +
> +		rc = cxl_debugfs_create_dport_dir(port->debug_dir, dport);
> +		if (rc)
> +			dev_dbg(dport_dev,
> +				"Failed to create rch dport debugfs dir: %d\n",
> +				rc);
>  	}
>  
>  	return dport;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 687043ece101..3c7744fc3106 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -590,6 +590,7 @@ struct cxl_dax_region {
>   * @depth: How deep this port is relative to the root. depth 0 is the root.
>   * @cdat: Cached CDAT data
>   * @cdat_available: Should a CDAT attribute be available in sysfs
> + * @debug_dir: dentry for port in cxl debugfs (optional)
>   */
>  struct cxl_port {
>  	struct device dev;
> @@ -612,6 +613,7 @@ struct cxl_port {
>  		size_t length;
>  	} cdat;
>  	bool cdat_available;
> +	struct dentry *debug_dir;

Part of why I asked for the debugfs file rename was to eliminate this
wart on the data structure.

>  };
>  
>  static inline struct cxl_dport *
> @@ -813,6 +815,14 @@ bool is_cxl_nvdimm_bridge(struct device *dev);
>  int devm_cxl_add_nvdimm(struct cxl_memdev *cxlmd);
>  struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_memdev *cxlmd);
>  
> +struct cxl_einj_ops {
> +	int (*einj_type)(struct seq_file *f, void *data);
> +	int (*einj_inject)(struct cxl_dport *dport, u64 type);
> +};
> +
> +void cxl_einj_set_ops_cbs(struct cxl_einj_ops *ops);
> +
> +
>  #ifdef CONFIG_CXL_REGION
>  bool is_cxl_pmem_region(struct device *dev);
>  struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
> -- 
> 2.34.1
> 
> 




^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v7 2/5] ACPI: Add CXL protocol error defines
  2023-11-28 16:06 ` [PATCH v7 2/5] ACPI: Add CXL protocol error defines Ben Cheatham
@ 2023-12-07 23:28   ` Dan Williams
  0 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2023-12-07 23:28 UTC (permalink / raw)
  To: Ben Cheatham, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
	rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

Ben Cheatham wrote:
> Add CXL protocol error defines to include/actbl1.h.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> 
> ---
> I made a pull request for adding this support into
> the acpica project that has already been merged [1],
> Feel free to discard this commit and use the acpica
> changes instead.
> 
> [1]:
> Link: https://github.com/acpica/acpica/pull/884

Yes, once this makes it into the acpica kernel branch then I can move
ahead with a CXL topic branch for this series.

Looks like we're still awaiting that update event.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v7 3/5] EINJ: Separate CXL errors from other EINJ errors
  2023-11-28 16:06 ` [PATCH v7 3/5] EINJ: Separate CXL errors from other EINJ errors Ben Cheatham
@ 2023-12-07 23:30   ` Dan Williams
  0 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2023-12-07 23:30 UTC (permalink / raw)
  To: Ben Cheatham, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
	rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

Ben Cheatham wrote:
> Separate CXL error types from other EINJ error types and disallow them
> in the legacy EINJ interface under /sys/kernel/debug/apei/einj. Support
> for the CXL error types will be added under /sys/kernel/debug/cxl in the
> next commit.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>

Looks good to me. Would be nice to get an ack from ACPI maintainer so I
can call that out to Linus about the cross-subsystem work.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v7 4/5] cxl/port, EINJ: Add CXL EINJ callback functions
  2023-11-28 16:06 ` [PATCH v7 4/5] cxl/port, EINJ: Add CXL EINJ callback functions Ben Cheatham
@ 2023-12-08  0:03   ` Dan Williams
  2023-12-08 16:22     ` Ben Cheatham
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2023-12-08  0:03 UTC (permalink / raw)
  To: Ben Cheatham, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, dan.j.williams,
	rafael
  Cc: benjamin.cheatham, yazen.ghannam, linux-cxl, linux-acpi

Ben Cheatham wrote:
> Add functions to the EINJ module to be used in the CXL module for CXL
> protocol error injection. The callbacks implement the einj_types and
> einj_inject files under /sys/kernel/debug/cxl/portX/dportY. These two
> files work in the same way as the available_error_types and error_inject
> files under /sys/kernel/debug/apei/einj, but only for CXL error types.
> If the dport is enumerated in RCH (CXL 1.1) mode, a CXL 1.1 error is
> injected into the dport; a CXL 2.0 error is injected otherwise.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
> ---
>  drivers/acpi/apei/Kconfig |   3 ++
>  drivers/acpi/apei/einj.c  | 105 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 108 insertions(+)
> 
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index 6b18f8bc7be3..100c27beb581 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -11,6 +11,9 @@ config ACPI_APEI
>  	select PSTORE
>  	select UEFI_CPER
>  	depends on HAVE_ACPI_APEI
> +	imply CXL_BUS
> +	imply CXL_ACPI
> +	imply CXL_PORT

This goes away with the CONFIG_CXL_EINJ scheme proposed on patch1.

>  	help
>  	  APEI allows to report errors (for example from the chipset)
>  	  to the operating system. This improves NMI handling
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 330329ac2f1f..98d5e6d60a02 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -21,9 +21,11 @@
>  #include <linux/nmi.h>
>  #include <linux/delay.h>
>  #include <linux/mm.h>
> +#include <linux/pci.h>
>  #include <asm/unaligned.h>
>  
>  #include "apei-internal.h"
> +#include "../../cxl/cxl.h"

EINJ has no business knowing all of the details in cxl.h. In fact all
EINJ cares about is dport->dport_dev and dport->rch. I think just add
those to the calling convention, see below:

>  #undef pr_fmt
>  #define pr_fmt(fmt) "EINJ: " fmt
> @@ -627,6 +629,25 @@ static int available_error_type_show(struct seq_file *m, void *v)
>  
>  DEFINE_SHOW_ATTRIBUTE(available_error_type);
>  
> +static int cxl_einj_available_error_type(struct seq_file *m, void *v)
> +{
> +	int cxl_err, rc;
> +	u32 available_error_type = 0;
> +
> +	rc = einj_get_available_error_type(&available_error_type);
> +	if (rc)
> +		return rc;
> +
> +	for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
> +		cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
> +
> +		if (available_error_type & cxl_err)
> +			seq_puts(m, einj_cxl_error_type_string[pos]);
> +	}
> +
> +	return 0;
> +}
> +
>  static int validate_error_type(u64 type)
>  {
>  	u32 tval, vendor, available_error_type = 0;
> @@ -657,6 +678,68 @@ static int validate_error_type(u64 type)
>  	return 0;
>  }
>  
> +static int cxl_dport_get_sbdf(struct cxl_dport *dport, u64 *sbdf)
> +{
> +	struct pci_dev *pdev;
> +	struct pci_bus *pbus;
> +	struct pci_host_bridge *bridge;
> +	u64 seg = 0, bus;
> +
> +	if (!dev_is_pci(dport->dport_dev))
> +		return -EINVAL;
> +
> +	pdev = to_pci_dev(dport->dport_dev);
> +	pbus = pdev->bus;
> +	bridge = pci_find_host_bridge(pbus);
> +
> +	if (!bridge)
> +		return -ENODEV;
> +
> +	if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
> +		seg = bridge->domain_nr << 24;
> +
> +	bus = pbus->number << 16;
> +	*sbdf = seg | bus | pdev->devfn;
> +
> +	return 0;
> +}
> +
> +static int cxl_einj_inject_error(struct cxl_dport *dport, u64 type)

If you split this into cxl_einj_inject_error() and
cxl_einj_inject_rch_error() with following prototypes:

static int cxl_einj_inject_error(struct pci_dev *dport_dev, u64 type)
static int cxl_einj_inject_rch_error(u64 rcrb, u64 type)

...then you don't need any of the cxl.h definitions. The dev_is_pci()
and rch determination details can remain private to CXL.

> +{
> +	u64 param1 = 0, param2 = 0, param4 = 0;
> +	u32 flags;
> +	int rc;
> +
> +	/* Only CXL error types can be specified */
> +	if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
> +		return -EINVAL;
> +
> +	rc = validate_error_type(type);
> +	if (rc)
> +		return rc;
> +
> +	/*
> +	 * If dport is in restricted mode, inject a CXL 1.1 error,
> +	 * otherwise inject a CXL 2.0 error
> +	 */
> +	if (dport->rch) {
> +		if (dport->rcrb.base == CXL_RESOURCE_NONE)
> +			return -EINVAL;
> +
> +		param1 = (u64) dport->rcrb.base;
> +		param2 = 0xfffffffffffff000;
> +		flags = 0x2;
> +	} else {
> +		rc = cxl_dport_get_sbdf(dport, &param4);
> +		if (rc)
> +			return rc;
> +
> +		flags = 0x4;
> +	}
> +
> +	return einj_error_inject(type, flags, param1, param2, 0, param4);
> +}
> +
>  static int error_type_get(void *data, u64 *val)
>  {
>  	*val = error_type;
> @@ -668,6 +751,7 @@ static int error_type_set(void *data, u64 val)
>  {
>  	int rc;
>  
> +	/* CXL error types have to be injected from cxl debugfs */
>  	if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
>  		return -EINVAL;
>  
> @@ -714,6 +798,7 @@ static int __init einj_init(void)
>  {
>  	int rc;
>  	acpi_status status;
> +	struct cxl_einj_ops cxl_ops;
>  	struct apei_exec_context ctx;
>  
>  	if (acpi_disabled) {
> @@ -793,6 +878,15 @@ static int __init einj_init(void)
>  				   einj_debug_dir, &vendor_flags);
>  	}
>  
> +	if (IS_REACHABLE(CONFIG_CXL_ACPI) || IS_REACHABLE(CONFIG_CXL_PORT)) {
> +		cxl_ops = (struct cxl_einj_ops) {
> +			.einj_type = cxl_einj_available_error_type,
> +			.einj_inject = cxl_einj_inject_error,
> +		};
> +
> +		cxl_einj_set_ops_cbs(&cxl_ops);
> +	}

This goes away with the new Kconfig dependency scheme.

> +
>  	pr_info("Error INJection is initialized.\n");
>  
>  	return 0;
> @@ -810,8 +904,18 @@ static int __init einj_init(void)
>  
>  static void __exit einj_exit(void)
>  {
> +	struct cxl_einj_ops cxl_ops;
>  	struct apei_exec_context ctx;
>  
> +	if (IS_REACHABLE(CONFIG_CXL_ACPI) || IS_REACHABLE(CONFIG_CXL_PORT)) {
> +		cxl_ops = (struct cxl_einj_ops) {
> +			.einj_type = NULL,
> +			.einj_inject = NULL,
> +		};
> +
> +		cxl_einj_set_ops_cbs(&cxl_ops);
> +	}
> +
>  	if (einj_param) {
>  		acpi_size size = (acpi5) ?
>  			sizeof(struct set_error_type_with_address) :
> @@ -832,4 +936,5 @@ module_exit(einj_exit);
>  
>  MODULE_AUTHOR("Huang Ying");
>  MODULE_DESCRIPTION("APEI Error INJection support");
> +MODULE_IMPORT_NS(CXL);

EINJ never references a CXL symbol in the new proposed scheme.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support
  2023-12-07 23:13   ` Dan Williams
@ 2023-12-08 16:22     ` Ben Cheatham
  2023-12-08 18:01       ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Ben Cheatham @ 2023-12-08 16:22 UTC (permalink / raw)
  To: Dan Williams, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, rafael
  Cc: yazen.ghannam, linux-cxl, linux-acpi

Thanks for taking a look Dan! Replies inline.

On 12/7/23 5:13 PM, Dan Williams wrote:
> Ben Cheatham wrote:
>> Add creation of debugfs directories for ports and dports under
>> /sys/kernel/debug/cxl when EINJ support is enabled. The dport
>> directories will contain files for injecting CXL protocol errors.
>> These files are only usable once the EINJ module has loaded and
>> registered callback functions with the CXL core module, before that
>> occurs (or if the EINJ module isn't loaded) the files will do nothing
>> and return an ENODEV error.
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>>  Documentation/ABI/testing/debugfs-cxl | 27 +++++++++
>>  drivers/cxl/core/port.c               | 84 +++++++++++++++++++++++++++
>>  drivers/cxl/cxl.h                     | 10 ++++
>>  3 files changed, 121 insertions(+)
>>
>> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
>> index fe61d372e3fa..782a1bb78884 100644
>> --- a/Documentation/ABI/testing/debugfs-cxl
>> +++ b/Documentation/ABI/testing/debugfs-cxl
>> @@ -33,3 +33,30 @@ Description:
>>  		device cannot clear poison from the address, -ENXIO is returned.
>>  		The clear_poison attribute is only visible for devices
>>  		supporting the capability.
>> +
>> +What:		/sys/kernel/debug/cxl/portX/dportY/einj_types
> 
> Given this file is identical contents for all dports it only needs to
> exist in one common location
> 
> /sys/kernel/debug/cxl/einj_types
> 

Good point, I'll make that change.

> 
>> +Date:		November, 2023
>> +KernelVersion:	v6.8
>> +Contact:	linux-cxl@vger.kernel.org
>> +Description:
>> +		(RO) Prints the CXL protocol error types made available by
>> +		the platform in the format "0x<error number>	<error type>".
>> +		The <error number> can be written to einj_inject to inject
>> +		<error type> into dportY. This file is only visible if
>> +		CONFIG_ACPI_APEI_EINJ is enabled, and the EINJ module must
>> +		be able to reach one (or both) of the CXL_ACPI or CXL_PORT
>> +		modules to be functional.
> 
> This can be simplified. Have something like:
> 
> config CXL_EINJ
> 	default CXL_BUS
> 	depends on ACPI_APEI_EINJ && ACPI_APEI_EINJ=CXL_BUS
> 	...
> 
> Then the documentation moves to Kconfig for how to enable this and the
> CXL code can directly call into the EINJ module without worry.
> 
> It would of course need stubs like these in a shared header:
> 
> #ifdef CONFIG_CXL_EINJ
> int cxl_einj_available_error_type(struct seq_file *m, void *v);
> int cxl_einj_inject_error(struct cxl_dport *dport, u64 type);
> #else
> static inline int cxl_einj_available_error_type(struct seq_file *m, void *v)
> {
> 	return -ENXIO;
> }
> 
> int cxl_einj_inject_error(struct cxl_dport *dport, u64 type)
> {
> 	return -ENXIO;
> }
> #endif
> 

I had to go back and take a look, but I had a shared header in v5 (link: https://lore.kernel.org/linux-cxl/20230926120418.0000575d@Huawei.com/). 
Jonathan recommended that I instead include cxl.h directly, but that was pretty much a completely different patch set
at the time (and the header was under include/linux/). That being said, I agree that a header under drivers/cxl would
make much more sense here.

>> +
>> +What:		/sys/kernel/debug/cxl/portX/dportY/einj_inject
> 
> See my comments on cxl_debugfs_create_dport_dir() later on, but I think
> the "portX" directory can be eliminated.
> 
>> +Date:		November, 2023
>> +KernelVersion:	v6.8
>> +Contact:	linux-cxl@vger.kernel.org
>> +Description:
>> +		(WO) Writing an integer to this file injects the corresponding
>> +		CXL protocol error into dportY (integer to type mapping is
>> +		available by reading from einj_types). If the dport was
>> +		enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
>> +		a CXL 2.0 error is injected. This file is only visible if
>> +		CONFIG_ACPI_APEI_EINJ is enabled, and the EINJ module must
>> +		be able to reach one (or both) of the CXL_ACPI or CXL_PORT
>> +		modules to be functional.
> 
> Similar comments about dropping these details that can just be solved in
> Kconfig.
> 
> This is next comment is on EINJ ABI, but you can skip it just to
> maintain momentum with the status quo. Why require the user to do the
> string to integer conversion? Seems like a small matter of programming
> to allow:
> 
> echo "CXL.cache Protocol Correctable" > einj_inject
> 
> ...to do the right thing. That probably makes scripts more readable as
> well.
> 

That's a good point. I can do that, but I think it may be better to keep the
consistency with the EINJ module to simplify things for end users. If you feel
that isn't a big enough concern I can go ahead and modify it.

>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 38441634e4c6..acf10415a174 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -783,6 +783,72 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
>>  	return rc;
>>  }
>>  
>> +static struct cxl_einj_ops einj_ops;
>> +void cxl_einj_set_ops_cbs(struct cxl_einj_ops *ops)
>> +{
>> +	if (!IS_REACHABLE(CONFIG_ACPI_APEI_EINJ) || !ops)
>> +		return;
>> +
>> +	einj_ops = *ops;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_einj_set_ops_cbs, CXL);
> 
> einj_ops goes away when the CXL code can just call the EINJ module
> directly.
> 
>> +
>> +static int cxl_einj_type_show(struct seq_file *f, void *data)
>> +{
>> +	if (!einj_ops.einj_type)
>> +		return -ENODEV;
>> +
>> +	return einj_ops.einj_type(f, data);
>> +}
>> +
>> +static int cxl_einj_inject(void *data, u64 type)
>> +{
>> +	struct cxl_dport *dport = data;
>> +
>> +	if (!einj_ops.einj_inject)
>> +		return -ENODEV;
>> +
>> +	return einj_ops.einj_inject(dport, type);
>> +}
>> +DEFINE_DEBUGFS_ATTRIBUTE(cxl_einj_inject_fops, NULL, cxl_einj_inject, "%llx\n");
> 
> The wrappers go away and DEFINE_DEBUGFS_ATTRIBUTE() can reference the
> EINJ symbols directly.
> 
>> +
>> +static int cxl_debugfs_create_dport_dir(struct dentry *port_dir,
>> +						   struct cxl_dport *dport)
>> +{
>> +	struct dentry *dir;
>> +	char dir_name[32];
>> +
>> +	snprintf(dir_name, 31, "dport%d", dport->port_id);
> 
> How about dev_name(dport->dport_dev) rather than the dynamic name?
> 
> The other benefit of this is that the dport_dev names are unique, so you
> can move the einj_inject file to:
> 
> /sys/kernel/debug/cxl/$dport_dev/einj_inject
> 

I didn't realize the dport names were also unique. I'll go ahead and do that instead.

>> +	dir = debugfs_create_dir(dir_name, port_dir);
>> +	if (IS_ERR(dir))
>> +		return PTR_ERR(dir);
>> +
>> +	debugfs_create_devm_seqfile(dport->dport_dev, "einj_types", dir,
>> +				    cxl_einj_type_show);
> 
> Per above, move this to be a top-level file.
> 

Will do.

>> +
>> +	debugfs_create_file("einj_inject", 0200, dir, dport,
>> +			    &cxl_einj_inject_fops);
>> +	return 0;
> 
> debugfs is good about failing gracefully when pre-requisites are not
> present. This is why none of the debugfs creation helpers have return
> codes because failing to setup debugfs is never fatal.
> 
> In other words, it is ok to take the output of debugfs_create_dir()
> without checking, and this function should not be returning an error.
> 

Will do.

>> +}
>> +
>> +static struct dentry *cxl_debugfs_create_port_dir(struct cxl_port *port)
>> +{
>> +	const char *dir_name = dev_name(&port->dev);
>> +	struct dentry *dir;
>> +
>> +	if (!IS_ENABLED(CONFIG_ACPI_APEI_EINJ))
>> +		return ERR_PTR(-ENODEV);
>> +
>> +	dir = cxl_debugfs_create_dir(dir_name);
>> +	if (IS_ERR(dir)) {
>> +		dev_dbg(&port->dev, "Failed to create port debugfs dir: %ld\n",
>> +			PTR_ERR(dir));
>> +		return dir;
>> +	}
>> +
>> +	return dir;
>> +}
>> +
>>  static struct cxl_port *__devm_cxl_add_port(struct device *host,
>>  					    struct device *uport_dev,
>>  					    resource_size_t component_reg_phys,
>> @@ -861,6 +927,7 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
>>  				   struct cxl_dport *parent_dport)
>>  {
>>  	struct cxl_port *port, *parent_port;
>> +	struct dentry *dir;
>>  
>>  	port = __devm_cxl_add_port(host, uport_dev, component_reg_phys,
>>  				   parent_dport);
>> @@ -878,6 +945,10 @@ struct cxl_port *devm_cxl_add_port(struct device *host,
>>  			parent_port ? " to " : "",
>>  			parent_port ? dev_name(&parent_port->dev) : "",
>>  			parent_port ? "" : " (root port)");
>> +
>> +		dir = cxl_debugfs_create_port_dir(port);
>> +		if (!IS_ERR(dir))
>> +			port->debug_dir = dir;
>>  	}
>>  
>>  	return port;
>> @@ -1127,6 +1198,7 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
>>  				     resource_size_t component_reg_phys)
>>  {
>>  	struct cxl_dport *dport;
>> +	int rc;
>>  
>>  	dport = __devm_cxl_add_dport(port, dport_dev, port_id,
>>  				     component_reg_phys, CXL_RESOURCE_NONE);
>> @@ -1136,6 +1208,11 @@ struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
>>  	} else {
>>  		dev_dbg(dport_dev, "dport added to %s\n",
>>  			dev_name(&port->dev));
>> +
>> +		rc = cxl_debugfs_create_dport_dir(port->debug_dir, dport);
>> +		if (rc)
>> +			dev_dbg(dport_dev,
>> +				"Failed to create dport debugfs dir: %d\n", rc);
> 
> Drop the debug messages about failing to setup debugfs. This follows the
> lead of other debugfs setup in CXL.
> 

Will do.

>>  	}
>>  
>>  	return dport;
>> @@ -1156,6 +1233,7 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
>>  					 resource_size_t rcrb)
>>  {
>>  	struct cxl_dport *dport;
>> +	int rc;
>>  
>>  	if (rcrb == CXL_RESOURCE_NONE) {
>>  		dev_dbg(&port->dev, "failed to add RCH dport, missing RCRB\n");
>> @@ -1170,6 +1248,12 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
>>  	} else {
>>  		dev_dbg(dport_dev, "RCH dport added to %s\n",
>>  			dev_name(&port->dev));
>> +
>> +		rc = cxl_debugfs_create_dport_dir(port->debug_dir, dport);
>> +		if (rc)
>> +			dev_dbg(dport_dev,
>> +				"Failed to create rch dport debugfs dir: %d\n",
>> +				rc);
>>  	}
>>  
>>  	return dport;
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 687043ece101..3c7744fc3106 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -590,6 +590,7 @@ struct cxl_dax_region {
>>   * @depth: How deep this port is relative to the root. depth 0 is the root.
>>   * @cdat: Cached CDAT data
>>   * @cdat_available: Should a CDAT attribute be available in sysfs
>> + * @debug_dir: dentry for port in cxl debugfs (optional)
>>   */
>>  struct cxl_port {
>>  	struct device dev;
>> @@ -612,6 +613,7 @@ struct cxl_port {
>>  		size_t length;
>>  	} cdat;
>>  	bool cdat_available;
>> +	struct dentry *debug_dir;
> 
> Part of why I asked for the debugfs file rename was to eliminate this
> wart on the data structure.
> 

Yeah I wasn't that happy about adding it, so I'd be happy to remove it!

>>  };
>>  
>>  static inline struct cxl_dport *
>> @@ -813,6 +815,14 @@ bool is_cxl_nvdimm_bridge(struct device *dev);
>>  int devm_cxl_add_nvdimm(struct cxl_memdev *cxlmd);
>>  struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_memdev *cxlmd);
>>  
>> +struct cxl_einj_ops {
>> +	int (*einj_type)(struct seq_file *f, void *data);
>> +	int (*einj_inject)(struct cxl_dport *dport, u64 type);
>> +};
>> +
>> +void cxl_einj_set_ops_cbs(struct cxl_einj_ops *ops);
>> +
>> +
>>  #ifdef CONFIG_CXL_REGION
>>  bool is_cxl_pmem_region(struct device *dev);
>>  struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
>> -- 
>> 2.34.1
>>
>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 4/5] cxl/port, EINJ: Add CXL EINJ callback functions
  2023-12-08  0:03   ` Dan Williams
@ 2023-12-08 16:22     ` Ben Cheatham
  0 siblings, 0 replies; 16+ messages in thread
From: Ben Cheatham @ 2023-12-08 16:22 UTC (permalink / raw)
  To: Dan Williams, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, rafael
  Cc: yazen.ghannam, linux-cxl, linux-acpi



On 12/7/23 6:03 PM, Dan Williams wrote:
> Ben Cheatham wrote:
>> Add functions to the EINJ module to be used in the CXL module for CXL
>> protocol error injection. The callbacks implement the einj_types and
>> einj_inject files under /sys/kernel/debug/cxl/portX/dportY. These two
>> files work in the same way as the available_error_types and error_inject
>> files under /sys/kernel/debug/apei/einj, but only for CXL error types.
>> If the dport is enumerated in RCH (CXL 1.1) mode, a CXL 1.1 error is
>> injected into the dport; a CXL 2.0 error is injected otherwise.
>>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com>
>> ---
>>  drivers/acpi/apei/Kconfig |   3 ++
>>  drivers/acpi/apei/einj.c  | 105 ++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 108 insertions(+)
>>
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index 6b18f8bc7be3..100c27beb581 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -11,6 +11,9 @@ config ACPI_APEI
>>  	select PSTORE
>>  	select UEFI_CPER
>>  	depends on HAVE_ACPI_APEI
>> +	imply CXL_BUS
>> +	imply CXL_ACPI
>> +	imply CXL_PORT
> 
> This goes away with the CONFIG_CXL_EINJ scheme proposed on patch1.
> 
>>  	help
>>  	  APEI allows to report errors (for example from the chipset)
>>  	  to the operating system. This improves NMI handling
>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>> index 330329ac2f1f..98d5e6d60a02 100644
>> --- a/drivers/acpi/apei/einj.c
>> +++ b/drivers/acpi/apei/einj.c
>> @@ -21,9 +21,11 @@
>>  #include <linux/nmi.h>
>>  #include <linux/delay.h>
>>  #include <linux/mm.h>
>> +#include <linux/pci.h>
>>  #include <asm/unaligned.h>
>>  
>>  #include "apei-internal.h"
>> +#include "../../cxl/cxl.h"
> 
> EINJ has no business knowing all of the details in cxl.h. In fact all
> EINJ cares about is dport->dport_dev and dport->rch. I think just add
> those to the calling convention, see below:
> 
>>  #undef pr_fmt
>>  #define pr_fmt(fmt) "EINJ: " fmt
>> @@ -627,6 +629,25 @@ static int available_error_type_show(struct seq_file *m, void *v)
>>  
>>  DEFINE_SHOW_ATTRIBUTE(available_error_type);
>>  
>> +static int cxl_einj_available_error_type(struct seq_file *m, void *v)
>> +{
>> +	int cxl_err, rc;
>> +	u32 available_error_type = 0;
>> +
>> +	rc = einj_get_available_error_type(&available_error_type);
>> +	if (rc)
>> +		return rc;
>> +
>> +	for (int pos = 0; pos < ARRAY_SIZE(einj_cxl_error_type_string); pos++) {
>> +		cxl_err = ACPI_EINJ_CXL_CACHE_CORRECTABLE << pos;
>> +
>> +		if (available_error_type & cxl_err)
>> +			seq_puts(m, einj_cxl_error_type_string[pos]);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static int validate_error_type(u64 type)
>>  {
>>  	u32 tval, vendor, available_error_type = 0;
>> @@ -657,6 +678,68 @@ static int validate_error_type(u64 type)
>>  	return 0;
>>  }
>>  
>> +static int cxl_dport_get_sbdf(struct cxl_dport *dport, u64 *sbdf)
>> +{
>> +	struct pci_dev *pdev;
>> +	struct pci_bus *pbus;
>> +	struct pci_host_bridge *bridge;
>> +	u64 seg = 0, bus;
>> +
>> +	if (!dev_is_pci(dport->dport_dev))
>> +		return -EINVAL;
>> +
>> +	pdev = to_pci_dev(dport->dport_dev);
>> +	pbus = pdev->bus;
>> +	bridge = pci_find_host_bridge(pbus);
>> +
>> +	if (!bridge)
>> +		return -ENODEV;
>> +
>> +	if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
>> +		seg = bridge->domain_nr << 24;
>> +
>> +	bus = pbus->number << 16;
>> +	*sbdf = seg | bus | pdev->devfn;
>> +
>> +	return 0;
>> +}
>> +
>> +static int cxl_einj_inject_error(struct cxl_dport *dport, u64 type)
> 
> If you split this into cxl_einj_inject_error() and
> cxl_einj_inject_rch_error() with following prototypes:
> 
> static int cxl_einj_inject_error(struct pci_dev *dport_dev, u64 type)
> static int cxl_einj_inject_rch_error(u64 rcrb, u64 type)
> 
> ...then you don't need any of the cxl.h definitions. The dev_is_pci()
> and rch determination details can remain private to CXL.
> 

Good suggestion. Will do!

Thanks,
Ben

>> +{
>> +	u64 param1 = 0, param2 = 0, param4 = 0;
>> +	u32 flags;
>> +	int rc;
>> +
>> +	/* Only CXL error types can be specified */
>> +	if (type & ~CXL_ERROR_MASK || (type & ACPI5_VENDOR_BIT))
>> +		return -EINVAL;
>> +
>> +	rc = validate_error_type(type);
>> +	if (rc)
>> +		return rc;
>> +
>> +	/*
>> +	 * If dport is in restricted mode, inject a CXL 1.1 error,
>> +	 * otherwise inject a CXL 2.0 error
>> +	 */
>> +	if (dport->rch) {
>> +		if (dport->rcrb.base == CXL_RESOURCE_NONE)
>> +			return -EINVAL;
>> +
>> +		param1 = (u64) dport->rcrb.base;
>> +		param2 = 0xfffffffffffff000;
>> +		flags = 0x2;
>> +	} else {
>> +		rc = cxl_dport_get_sbdf(dport, &param4);
>> +		if (rc)
>> +			return rc;
>> +
>> +		flags = 0x4;
>> +	}
>> +
>> +	return einj_error_inject(type, flags, param1, param2, 0, param4);
>> +}
>> +
>>  static int error_type_get(void *data, u64 *val)
>>  {
>>  	*val = error_type;
>> @@ -668,6 +751,7 @@ static int error_type_set(void *data, u64 val)
>>  {
>>  	int rc;
>>  
>> +	/* CXL error types have to be injected from cxl debugfs */
>>  	if (val & CXL_ERROR_MASK && !(val & ACPI5_VENDOR_BIT))
>>  		return -EINVAL;
>>  
>> @@ -714,6 +798,7 @@ static int __init einj_init(void)
>>  {
>>  	int rc;
>>  	acpi_status status;
>> +	struct cxl_einj_ops cxl_ops;
>>  	struct apei_exec_context ctx;
>>  
>>  	if (acpi_disabled) {
>> @@ -793,6 +878,15 @@ static int __init einj_init(void)
>>  				   einj_debug_dir, &vendor_flags);
>>  	}
>>  
>> +	if (IS_REACHABLE(CONFIG_CXL_ACPI) || IS_REACHABLE(CONFIG_CXL_PORT)) {
>> +		cxl_ops = (struct cxl_einj_ops) {
>> +			.einj_type = cxl_einj_available_error_type,
>> +			.einj_inject = cxl_einj_inject_error,
>> +		};
>> +
>> +		cxl_einj_set_ops_cbs(&cxl_ops);
>> +	}
> 
> This goes away with the new Kconfig dependency scheme.
> 
>> +
>>  	pr_info("Error INJection is initialized.\n");
>>  
>>  	return 0;
>> @@ -810,8 +904,18 @@ static int __init einj_init(void)
>>  
>>  static void __exit einj_exit(void)
>>  {
>> +	struct cxl_einj_ops cxl_ops;
>>  	struct apei_exec_context ctx;
>>  
>> +	if (IS_REACHABLE(CONFIG_CXL_ACPI) || IS_REACHABLE(CONFIG_CXL_PORT)) {
>> +		cxl_ops = (struct cxl_einj_ops) {
>> +			.einj_type = NULL,
>> +			.einj_inject = NULL,
>> +		};
>> +
>> +		cxl_einj_set_ops_cbs(&cxl_ops);
>> +	}
>> +
>>  	if (einj_param) {
>>  		acpi_size size = (acpi5) ?
>>  			sizeof(struct set_error_type_with_address) :
>> @@ -832,4 +936,5 @@ module_exit(einj_exit);
>>  
>>  MODULE_AUTHOR("Huang Ying");
>>  MODULE_DESCRIPTION("APEI Error INJection support");
>> +MODULE_IMPORT_NS(CXL);
> 
> EINJ never references a CXL symbol in the new proposed scheme.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support
  2023-12-08 16:22     ` Ben Cheatham
@ 2023-12-08 18:01       ` Dan Williams
  2023-12-08 20:35         ` Ben Cheatham
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2023-12-08 18:01 UTC (permalink / raw)
  To: Ben Cheatham, Dan Williams, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, rafael
  Cc: yazen.ghannam, linux-cxl, linux-acpi

Ben Cheatham wrote:
[..]
> > This can be simplified. Have something like:
> > 
> > config CXL_EINJ
> > 	default CXL_BUS
> > 	depends on ACPI_APEI_EINJ && ACPI_APEI_EINJ=CXL_BUS
> > 	...
> > 
> > Then the documentation moves to Kconfig for how to enable this and the
> > CXL code can directly call into the EINJ module without worry.
> > 
> > It would of course need stubs like these in a shared header:
> > 
> > #ifdef CONFIG_CXL_EINJ
> > int cxl_einj_available_error_type(struct seq_file *m, void *v);
> > int cxl_einj_inject_error(struct cxl_dport *dport, u64 type);
> > #else
> > static inline int cxl_einj_available_error_type(struct seq_file *m, void *v)
> > {
> > 	return -ENXIO;
> > }
> > 
> > int cxl_einj_inject_error(struct cxl_dport *dport, u64 type)
> > {
> > 	return -ENXIO;
> > }
> > #endif
> > 
> 
> I had to go back and take a look, but I had a shared header in v5
> (link:
> https://lore.kernel.org/linux-cxl/20230926120418.0000575d@Huawei.com/).
> Jonathan recommended that I instead include cxl.h directly, but that
> was pretty much a completely different patch set at the time (and the
> header was under include/linux/). That being said, I agree that a
> header under drivers/cxl would make much more sense here.

I agree with Jonathan that there are still cases where it makes sense to
do the relative include thing, but cxl_pmu is an intimate extenstion of
the CXL subsystem that just happens to live in a another directory. This
EINJ support is a handful of functions to communicate between modules
with no need for EINJ to understand all the gory details in cxl.h.

[..]
> >> +Date:               November, 2023
> >> +KernelVersion:      v6.8
> >> +Contact:    linux-cxl@vger.kernel.org
> >> +Description:
> >> +            (WO) Writing an integer to this file injects the corresponding
> >> +            CXL protocol error into dportY (integer to type mapping is
> >> +            available by reading from einj_types). If the dport was
> >> +            enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
> >> +            a CXL 2.0 error is injected. This file is only visible if
> >> +            CONFIG_ACPI_APEI_EINJ is enabled, and the EINJ module must
> >> +            be able to reach one (or both) of the CXL_ACPI or CXL_PORT
> >> +            modules to be functional.
> > 
> > Similar comments about dropping these details that can just be solved in
> > Kconfig.
> > 
> > This is next comment is on EINJ ABI, but you can skip it just to
> > maintain momentum with the status quo. Why require the user to do the
> > string to integer conversion? Seems like a small matter of programming
> > to allow:
> > 
> > echo "CXL.cache Protocol Correctable" > einj_inject
> > 
> > ...to do the right thing. That probably makes scripts more readable as
> > well.
> > 
> 
> That's a good point. I can do that, but I think it may be better to keep the
> consistency with the EINJ module to simplify things for end users. If you feel
> that isn't a big enough concern I can go ahead and modify it.

Oh, definitely keep the old style as well. I was thinking of something
like:

echo "CXL.cache Protocol Correctable" > einj_inject
echo "0x1000" > einj_inject

...having the same result. Up to you though, I will still take the
series if only the integer way works.

[..]
> >> +	snprintf(dir_name, 31, "dport%d", dport->port_id);
> > 
> > How about dev_name(dport->dport_dev) rather than the dynamic name?
> > 
> > The other benefit of this is that the dport_dev names are unique, so you
> > can move the einj_inject file to:
> > 
> > /sys/kernel/debug/cxl/$dport_dev/einj_inject
> > 
> 
> I didn't realize the dport names were also unique. I'll go ahead and do that instead.

Yeah, you can assume that all devices that share a bus must have unique
names so that /sys/bus/$bus/devices can list all of them without
name collisions.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support
  2023-12-08 18:01       ` Dan Williams
@ 2023-12-08 20:35         ` Ben Cheatham
  2023-12-08 22:27           ` Ben Cheatham
  0 siblings, 1 reply; 16+ messages in thread
From: Ben Cheatham @ 2023-12-08 20:35 UTC (permalink / raw)
  To: Dan Williams, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, rafael
  Cc: yazen.ghannam, linux-cxl, linux-acpi



On 12/8/23 12:01 PM, Dan Williams wrote:
> Ben Cheatham wrote:
> [..]
>>> This can be simplified. Have something like:
>>>
>>> config CXL_EINJ
>>> 	default CXL_BUS
>>> 	depends on ACPI_APEI_EINJ && ACPI_APEI_EINJ=CXL_BUS
>>> 	...
>>>
>>> Then the documentation moves to Kconfig for how to enable this and the
>>> CXL code can directly call into the EINJ module without worry.
>>>
>>> It would of course need stubs like these in a shared header:
>>>
>>> #ifdef CONFIG_CXL_EINJ
>>> int cxl_einj_available_error_type(struct seq_file *m, void *v);
>>> int cxl_einj_inject_error(struct cxl_dport *dport, u64 type);
>>> #else
>>> static inline int cxl_einj_available_error_type(struct seq_file *m, void *v)
>>> {
>>> 	return -ENXIO;
>>> }
>>>
>>> int cxl_einj_inject_error(struct cxl_dport *dport, u64 type)
>>> {
>>> 	return -ENXIO;
>>> }
>>> #endif
>>>
>>
>> I had to go back and take a look, but I had a shared header in v5
>> (link:
>> https://lore.kernel.org/linux-cxl/20230926120418.0000575d@Huawei.com/).
>> Jonathan recommended that I instead include cxl.h directly, but that
>> was pretty much a completely different patch set at the time (and the
>> header was under include/linux/). That being said, I agree that a
>> header under drivers/cxl would make much more sense here.
> 
> I agree with Jonathan that there are still cases where it makes sense to
> do the relative include thing, but cxl_pmu is an intimate extenstion of
> the CXL subsystem that just happens to live in a another directory. This
> EINJ support is a handful of functions to communicate between modules
> with no need for EINJ to understand all the gory details in cxl.h.
> 

All right that makes sense. Thanks for the clarification.

> [..]
>>>> +Date:               November, 2023
>>>> +KernelVersion:      v6.8
>>>> +Contact:    linux-cxl@vger.kernel.org
>>>> +Description:
>>>> +            (WO) Writing an integer to this file injects the corresponding
>>>> +            CXL protocol error into dportY (integer to type mapping is
>>>> +            available by reading from einj_types). If the dport was
>>>> +            enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
>>>> +            a CXL 2.0 error is injected. This file is only visible if
>>>> +            CONFIG_ACPI_APEI_EINJ is enabled, and the EINJ module must
>>>> +            be able to reach one (or both) of the CXL_ACPI or CXL_PORT
>>>> +            modules to be functional.
>>>
>>> Similar comments about dropping these details that can just be solved in
>>> Kconfig.
>>>
>>> This is next comment is on EINJ ABI, but you can skip it just to
>>> maintain momentum with the status quo. Why require the user to do the
>>> string to integer conversion? Seems like a small matter of programming
>>> to allow:
>>>
>>> echo "CXL.cache Protocol Correctable" > einj_inject
>>>
>>> ...to do the right thing. That probably makes scripts more readable as
>>> well.
>>>
>>
>> That's a good point. I can do that, but I think it may be better to keep the
>> consistency with the EINJ module to simplify things for end users. If you feel
>> that isn't a big enough concern I can go ahead and modify it.
> 
> Oh, definitely keep the old style as well. I was thinking of something
> like:
> 
> echo "CXL.cache Protocol Correctable" > einj_inject
> echo "0x1000" > einj_inject
> 
> ...having the same result. Up to you though, I will still take the
> series if only the integer way works.
> 

Sounds good. If I get the time I'll add in the string version as well.

> [..]
>>>> +	snprintf(dir_name, 31, "dport%d", dport->port_id);
>>>
>>> How about dev_name(dport->dport_dev) rather than the dynamic name?
>>>
>>> The other benefit of this is that the dport_dev names are unique, so you
>>> can move the einj_inject file to:
>>>
>>> /sys/kernel/debug/cxl/$dport_dev/einj_inject
>>>
>>
>> I didn't realize the dport names were also unique. I'll go ahead and do that instead.
> 
> Yeah, you can assume that all devices that share a bus must have unique
> names so that /sys/bus/$bus/devices can list all of them without
> name collisions.

Yeah I was thinking that dev_name(dport->dport_dev) would give a name like dportX for some reason
and realized what that would actually do about 30 mins after I sent out the email :/.

Thanks,
Ben

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support
  2023-12-08 20:35         ` Ben Cheatham
@ 2023-12-08 22:27           ` Ben Cheatham
  2023-12-09  0:07             ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Ben Cheatham @ 2023-12-08 22:27 UTC (permalink / raw)
  To: Dan Williams, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, rafael
  Cc: yazen.ghannam, linux-cxl, linux-acpi



On 12/8/23 2:35 PM, Ben Cheatham wrote:
> 
> 
> On 12/8/23 12:01 PM, Dan Williams wrote:
>> Ben Cheatham wrote:
>> [..]
>>>> This can be simplified. Have something like:
>>>>
>>>> config CXL_EINJ
>>>> 	default CXL_BUS
>>>> 	depends on ACPI_APEI_EINJ && ACPI_APEI_EINJ=CXL_BUS
>>>> 	...
>>>>
>>>> Then the documentation moves to Kconfig for how to enable this and the
>>>> CXL code can directly call into the EINJ module without worry.
>>>>
>>>> It would of course need stubs like these in a shared header:
>>>>
>>>> #ifdef CONFIG_CXL_EINJ
>>>> int cxl_einj_available_error_type(struct seq_file *m, void *v);
>>>> int cxl_einj_inject_error(struct cxl_dport *dport, u64 type);
>>>> #else
>>>> static inline int cxl_einj_available_error_type(struct seq_file *m, void *v)
>>>> {
>>>> 	return -ENXIO;
>>>> }
>>>>
>>>> int cxl_einj_inject_error(struct cxl_dport *dport, u64 type)
>>>> {
>>>> 	return -ENXIO;
>>>> }
>>>> #endif
>>>>
>>>
>>> I had to go back and take a look, but I had a shared header in v5
>>> (link:
>>> https://lore.kernel.org/linux-cxl/20230926120418.0000575d@Huawei.com/).
>>> Jonathan recommended that I instead include cxl.h directly, but that
>>> was pretty much a completely different patch set at the time (and the
>>> header was under include/linux/). That being said, I agree that a
>>> header under drivers/cxl would make much more sense here.
>>
>> I agree with Jonathan that there are still cases where it makes sense to
>> do the relative include thing, but cxl_pmu is an intimate extenstion of
>> the CXL subsystem that just happens to live in a another directory. This
>> EINJ support is a handful of functions to communicate between modules
>> with no need for EINJ to understand all the gory details in cxl.h.
>>
> 
> All right that makes sense. Thanks for the clarification.
> 

Ok so I took a look at implementing this. I don't think this solution ends up having
the intended behavior. Using a shared header and the Kconfig rules above introduces
a dependency on the EINJ module, which I was trying to avoid by using the callbacks
since I don't think the CXL core should fail to load if the EINJ module fails.
So, unless you have any other suggestions, I'll use the Kconfig rules above but keep
the callbacks (and also change the EINJ module to use IS_REACHABLE(CONFIG_CXL_EINJ) instead
of IS_REACHABLE(CONFIG_CXL_ACPI) || IS_REACHABLE(CONFIG_CXL_PORT)). I may also just
be doing something wrong (most likely due to late Friday brain fog), so let me know if
I got something wrong here.

Thanks,
Ben

>> [..]
>>>>> +Date:               November, 2023
>>>>> +KernelVersion:      v6.8
>>>>> +Contact:    linux-cxl@vger.kernel.org
>>>>> +Description:
>>>>> +            (WO) Writing an integer to this file injects the corresponding
>>>>> +            CXL protocol error into dportY (integer to type mapping is
>>>>> +            available by reading from einj_types). If the dport was
>>>>> +            enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
>>>>> +            a CXL 2.0 error is injected. This file is only visible if
>>>>> +            CONFIG_ACPI_APEI_EINJ is enabled, and the EINJ module must
>>>>> +            be able to reach one (or both) of the CXL_ACPI or CXL_PORT
>>>>> +            modules to be functional.
>>>>
>>>> Similar comments about dropping these details that can just be solved in
>>>> Kconfig.
>>>>
>>>> This is next comment is on EINJ ABI, but you can skip it just to
>>>> maintain momentum with the status quo. Why require the user to do the
>>>> string to integer conversion? Seems like a small matter of programming
>>>> to allow:
>>>>
>>>> echo "CXL.cache Protocol Correctable" > einj_inject
>>>>
>>>> ...to do the right thing. That probably makes scripts more readable as
>>>> well.
>>>>
>>>
>>> That's a good point. I can do that, but I think it may be better to keep the
>>> consistency with the EINJ module to simplify things for end users. If you feel
>>> that isn't a big enough concern I can go ahead and modify it.
>>
>> Oh, definitely keep the old style as well. I was thinking of something
>> like:
>>
>> echo "CXL.cache Protocol Correctable" > einj_inject
>> echo "0x1000" > einj_inject
>>
>> ...having the same result. Up to you though, I will still take the
>> series if only the integer way works.
>>
> 
> Sounds good. If I get the time I'll add in the string version as well.
> 
>> [..]
>>>>> +	snprintf(dir_name, 31, "dport%d", dport->port_id);
>>>>
>>>> How about dev_name(dport->dport_dev) rather than the dynamic name?
>>>>
>>>> The other benefit of this is that the dport_dev names are unique, so you
>>>> can move the einj_inject file to:
>>>>
>>>> /sys/kernel/debug/cxl/$dport_dev/einj_inject
>>>>
>>>
>>> I didn't realize the dport names were also unique. I'll go ahead and do that instead.
>>
>> Yeah, you can assume that all devices that share a bus must have unique
>> names so that /sys/bus/$bus/devices can list all of them without
>> name collisions.
> 
> Yeah I was thinking that dev_name(dport->dport_dev) would give a name like dportX for some reason
> and realized what that would actually do about 30 mins after I sent out the email :/.
> 
> Thanks,
> Ben
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support
  2023-12-08 22:27           ` Ben Cheatham
@ 2023-12-09  0:07             ` Dan Williams
  0 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2023-12-09  0:07 UTC (permalink / raw)
  To: Ben Cheatham, Dan Williams, dave, jonathan.cameron, dave.jiang,
	alison.schofield, vishal.l.verma, ira.weiny, rafael
  Cc: yazen.ghannam, linux-cxl, linux-acpi

Ben Cheatham wrote:
> 
> 
> On 12/8/23 2:35 PM, Ben Cheatham wrote:
> > 
> > 
> > On 12/8/23 12:01 PM, Dan Williams wrote:
> >> Ben Cheatham wrote:
> >> [..]
> >>>> This can be simplified. Have something like:
> >>>>
> >>>> config CXL_EINJ
> >>>> 	default CXL_BUS
> >>>> 	depends on ACPI_APEI_EINJ && ACPI_APEI_EINJ=CXL_BUS
> >>>> 	...
> >>>>
> >>>> Then the documentation moves to Kconfig for how to enable this and the
> >>>> CXL code can directly call into the EINJ module without worry.
> >>>>
> >>>> It would of course need stubs like these in a shared header:
> >>>>
> >>>> #ifdef CONFIG_CXL_EINJ
> >>>> int cxl_einj_available_error_type(struct seq_file *m, void *v);
> >>>> int cxl_einj_inject_error(struct cxl_dport *dport, u64 type);
> >>>> #else
> >>>> static inline int cxl_einj_available_error_type(struct seq_file *m, void *v)
> >>>> {
> >>>> 	return -ENXIO;
> >>>> }
> >>>>
> >>>> int cxl_einj_inject_error(struct cxl_dport *dport, u64 type)
> >>>> {
> >>>> 	return -ENXIO;
> >>>> }
> >>>> #endif
> >>>>
> >>>
> >>> I had to go back and take a look, but I had a shared header in v5
> >>> (link:
> >>> https://lore.kernel.org/linux-cxl/20230926120418.0000575d@Huawei.com/).
> >>> Jonathan recommended that I instead include cxl.h directly, but that
> >>> was pretty much a completely different patch set at the time (and the
> >>> header was under include/linux/). That being said, I agree that a
> >>> header under drivers/cxl would make much more sense here.
> >>
> >> I agree with Jonathan that there are still cases where it makes sense to
> >> do the relative include thing, but cxl_pmu is an intimate extenstion of
> >> the CXL subsystem that just happens to live in a another directory. This
> >> EINJ support is a handful of functions to communicate between modules
> >> with no need for EINJ to understand all the gory details in cxl.h.
> >>
> > 
> > All right that makes sense. Thanks for the clarification.
> > 
> 
> Ok so I took a look at implementing this. I don't think this solution ends up having
> the intended behavior. Using a shared header and the Kconfig rules above introduces
> a dependency on the EINJ module, which I was trying to avoid by using the callbacks
> since I don't think the CXL core should fail to load if the EINJ module fails.

Good looking out for that. However, if EINJ is going to be offering
services to other parts of the kernel it needs to be better behaved as
library module that can export symbols independent of the platform
specific, not software, error conditions that can make einj_init() fail.

> So, unless you have any other suggestions, I'll use the Kconfig rules above but keep
> the callbacks (and also change the EINJ module to use IS_REACHABLE(CONFIG_CXL_EINJ) instead
> of IS_REACHABLE(CONFIG_CXL_ACPI) || IS_REACHABLE(CONFIG_CXL_PORT)). I may also just
> be doing something wrong (most likely due to late Friday brain fog), so let me know if
> I got something wrong here.

I think the dependency is ok, it's the myriad of failure cases in
einj_init() that are at issue.

If einj.ko was loaded relative to a platform device then it would
already be the case that all of those platform specific setup details
would move to something like einj_probe() insted of einj_init().
Unfortuantely, a full refactor along those lines would probably yield
more violence than benefit.

However, something like that can be approximated since einj.ko, before
your changes, is only ever loaded manually. Just require it to be
unloaded manually unless CXL has it pinned, something like the below.

Open to suggestions from other linux-acpi@ denizens.

diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 013eb621dc92..9c179766af88 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -137,6 +137,7 @@ static struct apei_exec_ins_type einj_ins_type[] = {
 static DEFINE_MUTEX(einj_mutex);
 
 static void *einj_param;
+static bool einj_initialized;
 
 static void einj_exec_ctx_init(struct apei_exec_context *ctx)
 {
@@ -684,7 +685,7 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
 	return 0;
 }
 
-static int __init einj_init(void)
+static int __init __einj_init(void)
 {
 	int rc;
 	acpi_status status;
@@ -782,10 +783,30 @@ static int __init einj_init(void)
 	return rc;
 }
 
+static int __init einj_init(void)
+{
+	int rc = __einj_init();
+
+	einj_initialized = (rc == 0);
+
+	/*
+	 * CXL needs to be able to link and call its error injection
+	 * helpers regardless of whether the EINJ table is present and
+	 * initialized correctly. Helpers check @einj_initialized.
+	 */
+	if (IS_ENABLED(CONFIG_CXL_EINJ))
+		return 0;
+
+	return rc;
+}
+
 static void __exit einj_exit(void)
 {
 	struct apei_exec_context ctx;
 
+	if (!einj_initialized)
+		return;
+
 	if (einj_param) {
 		acpi_size size = (acpi5) ?
 			sizeof(struct set_error_type_with_address) :

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-12-09  0:07 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-28 16:06 [PATCH v7 0/5] CXL, ACPI, APEI, EINJ: Update EINJ for CXL error types Ben Cheatham
2023-11-28 16:06 ` [PATCH v7 1/5] cxl/port: Add EINJ debugfs files and callback support Ben Cheatham
2023-12-07 23:13   ` Dan Williams
2023-12-08 16:22     ` Ben Cheatham
2023-12-08 18:01       ` Dan Williams
2023-12-08 20:35         ` Ben Cheatham
2023-12-08 22:27           ` Ben Cheatham
2023-12-09  0:07             ` Dan Williams
2023-11-28 16:06 ` [PATCH v7 2/5] ACPI: Add CXL protocol error defines Ben Cheatham
2023-12-07 23:28   ` Dan Williams
2023-11-28 16:06 ` [PATCH v7 3/5] EINJ: Separate CXL errors from other EINJ errors Ben Cheatham
2023-12-07 23:30   ` Dan Williams
2023-11-28 16:06 ` [PATCH v7 4/5] cxl/port, EINJ: Add CXL EINJ callback functions Ben Cheatham
2023-12-08  0:03   ` Dan Williams
2023-12-08 16:22     ` Ben Cheatham
2023-11-28 16:06 ` [PATCH v7 5/5] EINJ: Update EINJ documentation Ben Cheatham

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.