[PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem

linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem
@ 2021-10-29 19:51 Dan Williams
  2021-10-29 19:51 ` [PATCH 1/6] ACPI: Keep sub-table parsing infrastructure available for modules Dan Williams
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Dan Williams @ 2021-10-29 19:51 UTC (permalink / raw)
  To: rafael.j.wysocki
  Cc: Len Brown, Alison Schofield, Rafael J. Wysocki, linux-cxl, linux-acpi

Hi Rafael,

While reviewing "[PATCH v3] ACPI: NUMA: Add a node and memblk for each
CFMWS not in SRAT" [1]. I noticed that it was open coding CEDT sub-table
parsing in a similar fashion as drivers/cxl/acpi.c. The driver open
coded the parsing because the ACPI sub-table helpers are marked __init.
In order to avoid the ongoing maintenance burden of a split between
"early" and "late" ACPI sub-table parsing this series proposes to make
those helpers available to drivers.

The savings in drivers/cxl/ are:

 drivers/cxl/Kconfig |    1 
 drivers/cxl/acpi.c  |  234 +++++++++++++++++++--------------------------------
 2 files changed, 88 insertions(+), 147 deletions(-)

...and 15 lines new code not added are saved in this new version of
"ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT".

Let me know if this looks ok to you and I can carry it in the CXL tree
(i.e. after the merge window, for v5.17 consideration).

[1]: https://lore.kernel.org/r/20211019050908.449231-1-alison.schofield@intel.com

---

Alison Schofield (1):
      ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT

Dan Williams (5):
      ACPI: Keep sub-table parsing infrastructure available for modules
      ACPI: Teach ACPI table parsing about the CEDT header format
      ACPI: Add a context argument for table parsing handlers
      cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers
      cxl/test: Mock acpi_table_parse_cedt()


 drivers/acpi/Kconfig          |    3 +
 drivers/acpi/numa/srat.c      |   59 ++++++++++
 drivers/acpi/tables.c         |   87 +++++++++++----
 drivers/cxl/Kconfig           |    1 
 drivers/cxl/acpi.c            |  237 ++++++++++++++++-------------------------
 include/linux/acpi.h          |   34 +++++-
 tools/testing/cxl/Kbuild      |    3 -
 tools/testing/cxl/test/cxl.c  |   68 ++++++++----
 tools/testing/cxl/test/mock.c |   30 ++---
 tools/testing/cxl/test/mock.h |    6 +
 10 files changed, 304 insertions(+), 224 deletions(-)

base-commit: c6d7e1341cc99ba49df1384c8c5b3f534a5463b1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/6] ACPI: Keep sub-table parsing infrastructure available for modules
  2021-10-29 19:51 [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Dan Williams
@ 2021-10-29 19:51 ` Dan Williams
  2021-10-29 19:51 ` [PATCH 2/6] ACPI: Teach ACPI table parsing about the CEDT header format Dan Williams
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2021-10-29 19:51 UTC (permalink / raw)
  To: rafael.j.wysocki
  Cc: Rafael J. Wysocki, Len Brown, Alison Schofield, linux-cxl, linux-acpi

The NFIT driver and now the CXL ACPI driver have both open-coded ACPI
table parsing. Before another instance is added arrange for the core
ACPI sub-table parsing to be optionally available to drivers via the
CONFIG_ACPI_TABLE_LIB symbol. If no drivers select the symbol then the
infrastructure reverts back to being tagged __init via the
__init_or_acpilib annotation.

For now, only tag the core sub-table routines and data that the CEDT parsing in
the cxl_acpi driver would want to reuse, a CEDT parsing helper is added
in a later change.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/Kconfig  |    3 +++
 drivers/acpi/tables.c |   27 +++++++++++++--------------
 include/linux/acpi.h  |   22 +++++++++++++++-------
 3 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 1da360c51d66..13ed24414956 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -59,6 +59,9 @@ config ACPI_SYSTEM_POWER_STATES_SUPPORT
 config ACPI_CCA_REQUIRED
 	bool
 
+config ACPI_TABLE_LIB
+	bool
+
 config ACPI_DEBUGGER
 	bool "AML debugger interface"
 	select ACPI_DEBUG
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index f9383736fa0f..29d44fe19ef0 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -34,7 +34,7 @@ static char *mps_inti_flags_trigger[] = { "dfl", "edge", "res", "level" };
 
 static struct acpi_table_desc initial_tables[ACPI_MAX_TABLES] __initdata;
 
-static int acpi_apic_instance __initdata;
+static int acpi_apic_instance __initdata_or_acpilib;
 
 enum acpi_subtable_type {
 	ACPI_SUBTABLE_COMMON,
@@ -51,7 +51,7 @@ struct acpi_subtable_entry {
  * Disable table checksum verification for the early stage due to the size
  * limitation of the current x86 early mapping implementation.
  */
-static bool acpi_verify_table_checksum __initdata = false;
+static bool acpi_verify_table_checksum __initdata_or_acpilib = false;
 
 void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
 {
@@ -215,7 +215,7 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
 	}
 }
 
-static unsigned long __init
+static unsigned long __init_or_acpilib
 acpi_get_entry_type(struct acpi_subtable_entry *entry)
 {
 	switch (entry->type) {
@@ -229,7 +229,7 @@ acpi_get_entry_type(struct acpi_subtable_entry *entry)
 	return 0;
 }
 
-static unsigned long __init
+static unsigned long __init_or_acpilib
 acpi_get_entry_length(struct acpi_subtable_entry *entry)
 {
 	switch (entry->type) {
@@ -243,7 +243,7 @@ acpi_get_entry_length(struct acpi_subtable_entry *entry)
 	return 0;
 }
 
-static unsigned long __init
+static unsigned long __init_or_acpilib
 acpi_get_subtable_header_length(struct acpi_subtable_entry *entry)
 {
 	switch (entry->type) {
@@ -257,7 +257,7 @@ acpi_get_subtable_header_length(struct acpi_subtable_entry *entry)
 	return 0;
 }
 
-static enum acpi_subtable_type __init
+static enum acpi_subtable_type __init_or_acpilib
 acpi_get_subtable_type(char *id)
 {
 	if (strncmp(id, ACPI_SIG_HMAT, 4) == 0)
@@ -290,10 +290,10 @@ acpi_get_subtable_type(char *id)
  * On success returns sum of all matching entries for all proc handlers.
  * Otherwise, -ENODEV or -EINVAL is returned.
  */
-static int __init acpi_parse_entries_array(char *id, unsigned long table_size,
-		struct acpi_table_header *table_header,
-		struct acpi_subtable_proc *proc, int proc_num,
-		unsigned int max_entries)
+static int __init_or_acpilib acpi_parse_entries_array(
+	char *id, unsigned long table_size,
+	struct acpi_table_header *table_header, struct acpi_subtable_proc *proc,
+	int proc_num, unsigned int max_entries)
 {
 	struct acpi_subtable_entry entry;
 	unsigned long table_end, subtable_len, entry_len;
@@ -351,10 +351,9 @@ static int __init acpi_parse_entries_array(char *id, unsigned long table_size,
 	return errs ? -EINVAL : count;
 }
 
-int __init acpi_table_parse_entries_array(char *id,
-			 unsigned long table_size,
-			 struct acpi_subtable_proc *proc, int proc_num,
-			 unsigned int max_entries)
+int __init_or_acpilib acpi_table_parse_entries_array(
+	char *id, unsigned long table_size, struct acpi_subtable_proc *proc,
+	int proc_num, unsigned int max_entries)
 {
 	struct acpi_table_header *table_header = NULL;
 	int count;
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 974d497a897d..509b72676e2e 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -232,14 +232,22 @@ int acpi_locate_initial_tables (void);
 void acpi_reserve_initial_tables (void);
 void acpi_table_init_complete (void);
 int acpi_table_init (void);
+
+#ifdef CONFIG_ACPI_TABLE_LIB
+#define __init_or_acpilib
+#define __initdata_or_acpilib
+#else
+#define __init_or_acpilib __init
+#define __initdata_or_acpilib __initdata
+#endif
+
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
-int __init acpi_table_parse_entries(char *id, unsigned long table_size,
-			      int entry_id,
-			      acpi_tbl_entry_handler handler,
-			      unsigned int max_entries);
-int __init acpi_table_parse_entries_array(char *id, unsigned long table_size,
-			      struct acpi_subtable_proc *proc, int proc_num,
-			      unsigned int max_entries);
+int __init_or_acpilib acpi_table_parse_entries(char *id,
+		unsigned long table_size, int entry_id,
+		acpi_tbl_entry_handler handler, unsigned int max_entries);
+int __init_or_acpilib acpi_table_parse_entries_array(char *id,
+		unsigned long table_size, struct acpi_subtable_proc *proc,
+		int proc_num, unsigned int max_entries);
 int acpi_table_parse_madt(enum acpi_madt_type id,
 			  acpi_tbl_entry_handler handler,
 			  unsigned int max_entries);


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/6] ACPI: Teach ACPI table parsing about the CEDT header format
  2021-10-29 19:51 [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Dan Williams
  2021-10-29 19:51 ` [PATCH 1/6] ACPI: Keep sub-table parsing infrastructure available for modules Dan Williams
@ 2021-10-29 19:51 ` Dan Williams
  2021-10-29 19:51 ` [PATCH 3/6] ACPI: Add a context argument for table parsing handlers Dan Williams
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2021-10-29 19:51 UTC (permalink / raw)
  To: rafael.j.wysocki
  Cc: Rafael J. Wysocki, Len Brown, Alison Schofield, Alison Schofield,
	linux-cxl, linux-acpi

The CEDT adds yet one more unique subtable header type where the length
is a 16-bit value. Extend the subtable helpers to detect this scenario.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/tables.c |    9 +++++++++
 include/linux/acpi.h  |    1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 29d44fe19ef0..b1514fde11bc 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -40,6 +40,7 @@ enum acpi_subtable_type {
 	ACPI_SUBTABLE_COMMON,
 	ACPI_SUBTABLE_HMAT,
 	ACPI_SUBTABLE_PRMT,
+	ACPI_SUBTABLE_CEDT,
 };
 
 struct acpi_subtable_entry {
@@ -225,6 +226,8 @@ acpi_get_entry_type(struct acpi_subtable_entry *entry)
 		return entry->hdr->hmat.type;
 	case ACPI_SUBTABLE_PRMT:
 		return 0;
+	case ACPI_SUBTABLE_CEDT:
+		return entry->hdr->cedt.type;
 	}
 	return 0;
 }
@@ -239,6 +242,8 @@ acpi_get_entry_length(struct acpi_subtable_entry *entry)
 		return entry->hdr->hmat.length;
 	case ACPI_SUBTABLE_PRMT:
 		return entry->hdr->prmt.length;
+	case ACPI_SUBTABLE_CEDT:
+		return entry->hdr->cedt.length;
 	}
 	return 0;
 }
@@ -253,6 +258,8 @@ acpi_get_subtable_header_length(struct acpi_subtable_entry *entry)
 		return sizeof(entry->hdr->hmat);
 	case ACPI_SUBTABLE_PRMT:
 		return sizeof(entry->hdr->prmt);
+	case ACPI_SUBTABLE_CEDT:
+		return sizeof(entry->hdr->cedt);
 	}
 	return 0;
 }
@@ -264,6 +271,8 @@ acpi_get_subtable_type(char *id)
 		return ACPI_SUBTABLE_HMAT;
 	if (strncmp(id, ACPI_SIG_PRMT, 4) == 0)
 		return ACPI_SUBTABLE_PRMT;
+	if (strncmp(id, ACPI_SIG_CEDT, 4) == 0)
+		return ACPI_SUBTABLE_CEDT;
 	return ACPI_SUBTABLE_COMMON;
 }
 
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 509b72676e2e..cbee050f4b1d 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -133,6 +133,7 @@ union acpi_subtable_headers {
 	struct acpi_subtable_header common;
 	struct acpi_hmat_structure hmat;
 	struct acpi_prmt_module_header prmt;
+	struct acpi_cedt_header cedt;
 };
 
 typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/6] ACPI: Add a context argument for table parsing handlers
  2021-10-29 19:51 [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Dan Williams
  2021-10-29 19:51 ` [PATCH 1/6] ACPI: Keep sub-table parsing infrastructure available for modules Dan Williams
  2021-10-29 19:51 ` [PATCH 2/6] ACPI: Teach ACPI table parsing about the CEDT header format Dan Williams
@ 2021-10-29 19:51 ` Dan Williams
  2021-10-29 19:51 ` [PATCH 4/6] cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers Dan Williams
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2021-10-29 19:51 UTC (permalink / raw)
  To: rafael.j.wysocki
  Cc: Rafael J. Wysocki, Len Brown, Alison Schofield, Alison Schofield,
	linux-cxl, linux-acpi

In preparation for drivers reusing the core table parsing
infrastructure, arrange for handlers to take a context argument. This
allows driver table parsing to wrap ACPI table entries in
driver-specific data.

The first consumer of this infrastructure is the CEDT parsing that
happens in the cxl_acpi driver, add a conditional
(CONFIG_ACPI_TABLE_LIB=y) export of a acpi_table_parse_cedt() helper for
this case.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Len Brown <lenb@kernel.org>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/tables.c |   51 ++++++++++++++++++++++++++++++++++++++++++-------
 include/linux/acpi.h  |   11 +++++++++++
 2 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index b1514fde11bc..0a7be92a4f37 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -276,6 +276,22 @@ acpi_get_subtable_type(char *id)
 	return ACPI_SUBTABLE_COMMON;
 }
 
+static __init_or_acpilib bool has_handler(struct acpi_subtable_proc *proc)
+{
+	return proc->handler || proc->handler_arg;
+}
+
+static __init_or_acpilib int call_handler(struct acpi_subtable_proc *proc,
+					  union acpi_subtable_headers *hdr,
+					  unsigned long end)
+{
+	if (proc->handler)
+		return proc->handler(hdr, end);
+	if (proc->handler_arg)
+		return proc->handler_arg(hdr, proc->arg, end);
+	return -EINVAL;
+}
+
 /**
  * acpi_parse_entries_array - for each proc_num find a suitable subtable
  *
@@ -326,8 +342,9 @@ static int __init_or_acpilib acpi_parse_entries_array(
 		for (i = 0; i < proc_num; i++) {
 			if (acpi_get_entry_type(&entry) != proc[i].id)
 				continue;
-			if (!proc[i].handler ||
-			     (!errs && proc[i].handler(entry.hdr, table_end))) {
+			if (!has_handler(&proc[i]) ||
+			    (!errs &&
+			     call_handler(&proc[i], entry.hdr, table_end))) {
 				errs++;
 				continue;
 			}
@@ -393,21 +410,41 @@ int __init_or_acpilib acpi_table_parse_entries_array(
 	return count;
 }
 
-int __init acpi_table_parse_entries(char *id,
-			unsigned long table_size,
-			int entry_id,
-			acpi_tbl_entry_handler handler,
-			unsigned int max_entries)
+static int __init_or_acpilib __acpi_table_parse_entries(
+	char *id, unsigned long table_size, int entry_id,
+	acpi_tbl_entry_handler handler, acpi_tbl_entry_handler_arg handler_arg,
+	void *arg, unsigned int max_entries)
 {
 	struct acpi_subtable_proc proc = {
 		.id		= entry_id,
 		.handler	= handler,
+		.handler_arg	= handler_arg,
+		.arg		= arg,
 	};
 
 	return acpi_table_parse_entries_array(id, table_size, &proc, 1,
 						max_entries);
 }
 
+int __init_or_acpilib
+acpi_table_parse_cedt(enum acpi_cedt_type id,
+		      acpi_tbl_entry_handler_arg handler_arg, void *arg)
+{
+	return __acpi_table_parse_entries(ACPI_SIG_CEDT,
+					  sizeof(struct acpi_table_cedt), id,
+					  NULL, handler_arg, arg, 0);
+}
+EXPORT_SYMBOL_ACPI_LIB(acpi_table_parse_cedt);
+
+int __init acpi_table_parse_entries(char *id, unsigned long table_size,
+				    int entry_id,
+				    acpi_tbl_entry_handler handler,
+				    unsigned int max_entries)
+{
+	return __acpi_table_parse_entries(id, table_size, entry_id, handler,
+					  NULL, NULL, max_entries);
+}
+
 int __init acpi_table_parse_madt(enum acpi_madt_type id,
 		      acpi_tbl_entry_handler handler, unsigned int max_entries)
 {
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index cbee050f4b1d..a077bd91ba4a 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -141,6 +141,9 @@ typedef int (*acpi_tbl_table_handler)(struct acpi_table_header *table);
 typedef int (*acpi_tbl_entry_handler)(union acpi_subtable_headers *header,
 				      const unsigned long end);
 
+typedef int (*acpi_tbl_entry_handler_arg)(union acpi_subtable_headers *header,
+					  void *arg, const unsigned long end);
+
 /* Debugger support */
 
 struct acpi_debugger_ops {
@@ -217,6 +220,8 @@ static inline int acpi_debugger_notify_command_complete(void)
 struct acpi_subtable_proc {
 	int id;
 	acpi_tbl_entry_handler handler;
+	acpi_tbl_entry_handler_arg handler_arg;
+	void *arg;
 	int count;
 };
 
@@ -235,9 +240,11 @@ void acpi_table_init_complete (void);
 int acpi_table_init (void);
 
 #ifdef CONFIG_ACPI_TABLE_LIB
+#define EXPORT_SYMBOL_ACPI_LIB(x) EXPORT_SYMBOL_NS_GPL(x, ACPI)
 #define __init_or_acpilib
 #define __initdata_or_acpilib
 #else
+#define EXPORT_SYMBOL_ACPI_LIB(x)
 #define __init_or_acpilib __init
 #define __initdata_or_acpilib __initdata
 #endif
@@ -252,6 +259,10 @@ int __init_or_acpilib acpi_table_parse_entries_array(char *id,
 int acpi_table_parse_madt(enum acpi_madt_type id,
 			  acpi_tbl_entry_handler handler,
 			  unsigned int max_entries);
+int __init_or_acpilib
+acpi_table_parse_cedt(enum acpi_cedt_type id,
+		      acpi_tbl_entry_handler_arg handler_arg, void *arg);
+
 int acpi_parse_mcfg (struct acpi_table_header *header);
 void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
 


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/6] cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers
  2021-10-29 19:51 [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Dan Williams
                   ` (2 preceding siblings ...)
  2021-10-29 19:51 ` [PATCH 3/6] ACPI: Add a context argument for table parsing handlers Dan Williams
@ 2021-10-29 19:51 ` Dan Williams
  2021-10-29 19:51 ` [PATCH 5/6] cxl/test: Mock acpi_table_parse_cedt() Dan Williams
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2021-10-29 19:51 UTC (permalink / raw)
  To: rafael.j.wysocki; +Cc: Alison Schofield, linux-cxl, linux-acpi

The cxl_acpi driver originally open-coded its table parsing since the
ACPI subtable helpers were marked __init and only used in early NUMA
initialization.  Now that those helpers have been exported for driver
usage replace the open-coded solution with the common one.

Cc: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/Kconfig |    1 
 drivers/cxl/acpi.c  |  234 +++++++++++++++++++--------------------------------
 2 files changed, 88 insertions(+), 147 deletions(-)

diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index e6de221cc568..67c91378f2dd 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -51,6 +51,7 @@ config CXL_ACPI
 	tristate "CXL ACPI: Platform Support"
 	depends on ACPI
 	default CXL_BUS
+	select ACPI_TABLE_LIB
 	help
 	  Enable support for host managed device memory (HDM) resources
 	  published by a platform's ACPI CXL memory layout description.  See
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index dadc7f64b9ff..7820082a2746 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -8,8 +8,6 @@
 #include <linux/pci.h>
 #include "cxl.h"
 
-static struct acpi_table_header *acpi_cedt;
-
 /* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
 #define CFMWS_INTERLEAVE_WAYS(x)	(1 << (x)->interleave_ways)
 #define CFMWS_INTERLEAVE_GRANULARITY(x)	((x)->granularity + 8)
@@ -74,134 +72,63 @@ static int cxl_acpi_cfmws_verify(struct device *dev,
 	return 0;
 }
 
-static void cxl_add_cfmws_decoders(struct device *dev,
-				   struct cxl_port *root_port)
+struct cxl_cfmws_context {
+	struct device *dev;
+	struct cxl_port *root_port;
+};
+
+static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
+			   const unsigned long end)
 {
 	int target_map[CXL_DECODER_MAX_INTERLEAVE];
+	struct cxl_cfmws_context *ctx = arg;
+	struct cxl_port *root_port = ctx->root_port;
+	struct device *dev = ctx->dev;
 	struct acpi_cedt_cfmws *cfmws;
 	struct cxl_decoder *cxld;
-	acpi_size len, cur = 0;
-	void *cedt_subtable;
-	int rc;
-
-	len = acpi_cedt->length - sizeof(*acpi_cedt);
-	cedt_subtable = acpi_cedt + 1;
-
-	while (cur < len) {
-		struct acpi_cedt_header *c = cedt_subtable + cur;
-		int i;
-
-		if (c->type != ACPI_CEDT_TYPE_CFMWS) {
-			cur += c->length;
-			continue;
-		}
+	int rc, i;
 
-		cfmws = cedt_subtable + cur;
+	cfmws = (struct acpi_cedt_cfmws *) header;
 
-		if (cfmws->header.length < sizeof(*cfmws)) {
-			dev_warn_once(dev,
-				      "CFMWS entry skipped:invalid length:%u\n",
-				      cfmws->header.length);
-			cur += c->length;
-			continue;
-		}
-
-		rc = cxl_acpi_cfmws_verify(dev, cfmws);
-		if (rc) {
-			dev_err(dev, "CFMWS range %#llx-%#llx not registered\n",
-				cfmws->base_hpa, cfmws->base_hpa +
-				cfmws->window_size - 1);
-			cur += c->length;
-			continue;
-		}
-
-		for (i = 0; i < CFMWS_INTERLEAVE_WAYS(cfmws); i++)
-			target_map[i] = cfmws->interleave_targets[i];
-
-		cxld = cxl_decoder_alloc(root_port,
-					 CFMWS_INTERLEAVE_WAYS(cfmws));
-		if (IS_ERR(cxld))
-			goto next;
-
-		cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
-		cxld->target_type = CXL_DECODER_EXPANDER;
-		cxld->range = (struct range) {
-			.start = cfmws->base_hpa,
-			.end = cfmws->base_hpa + cfmws->window_size - 1,
-		};
-		cxld->interleave_ways = CFMWS_INTERLEAVE_WAYS(cfmws);
-		cxld->interleave_granularity =
-			CFMWS_INTERLEAVE_GRANULARITY(cfmws);
-
-		rc = cxl_decoder_add(cxld, target_map);
-		if (rc)
-			put_device(&cxld->dev);
-		else
-			rc = cxl_decoder_autoremove(dev, cxld);
-		if (rc) {
-			dev_err(dev, "Failed to add decoder for %#llx-%#llx\n",
-				cfmws->base_hpa, cfmws->base_hpa +
-				cfmws->window_size - 1);
-			goto next;
-		}
-		dev_dbg(dev, "add: %s range %#llx-%#llx\n",
-			dev_name(&cxld->dev), cfmws->base_hpa,
+	rc = cxl_acpi_cfmws_verify(dev, cfmws);
+	if (rc) {
+		dev_err(dev, "CFMWS range %#llx-%#llx not registered\n",
+			cfmws->base_hpa,
 			cfmws->base_hpa + cfmws->window_size - 1);
-next:
-		cur += c->length;
+		return 0;
 	}
-}
-
-static struct acpi_cedt_chbs *cxl_acpi_match_chbs(struct device *dev, u32 uid)
-{
-	struct acpi_cedt_chbs *chbs, *chbs_match = NULL;
-	acpi_size len, cur = 0;
-	void *cedt_subtable;
 
-	len = acpi_cedt->length - sizeof(*acpi_cedt);
-	cedt_subtable = acpi_cedt + 1;
+	for (i = 0; i < CFMWS_INTERLEAVE_WAYS(cfmws); i++)
+		target_map[i] = cfmws->interleave_targets[i];
 
-	while (cur < len) {
-		struct acpi_cedt_header *c = cedt_subtable + cur;
-
-		if (c->type != ACPI_CEDT_TYPE_CHBS) {
-			cur += c->length;
-			continue;
-		}
-
-		chbs = cedt_subtable + cur;
-
-		if (chbs->header.length < sizeof(*chbs)) {
-			dev_warn_once(dev,
-				      "CHBS entry skipped: invalid length:%u\n",
-				      chbs->header.length);
-			cur += c->length;
-			continue;
-		}
-
-		if (chbs->uid != uid) {
-			cur += c->length;
-			continue;
-		}
+	cxld = cxl_decoder_alloc(root_port, CFMWS_INTERLEAVE_WAYS(cfmws));
+	if (IS_ERR(cxld))
+		return 0;
 
-		if (chbs_match) {
-			dev_warn_once(dev,
-				      "CHBS entry skipped: duplicate UID:%u\n",
-				      uid);
-			cur += c->length;
-			continue;
-		}
+	cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
+	cxld->target_type = CXL_DECODER_EXPANDER;
+	cxld->range = (struct range){
+		.start = cfmws->base_hpa,
+		.end = cfmws->base_hpa + cfmws->window_size - 1,
+	};
+	cxld->interleave_ways = CFMWS_INTERLEAVE_WAYS(cfmws);
+	cxld->interleave_granularity = CFMWS_INTERLEAVE_GRANULARITY(cfmws);
 
-		chbs_match = chbs;
-		cur += c->length;
+	rc = cxl_decoder_add(cxld, target_map);
+	if (rc)
+		put_device(&cxld->dev);
+	else
+		rc = cxl_decoder_autoremove(dev, cxld);
+	if (rc) {
+		dev_err(dev, "Failed to add decoder for %#llx-%#llx\n",
+			cfmws->base_hpa,
+			cfmws->base_hpa + cfmws->window_size - 1);
+		return 0;
 	}
+	dev_dbg(dev, "add: %s range %#llx-%#llx\n", dev_name(&cxld->dev),
+		cfmws->base_hpa, cfmws->base_hpa + cfmws->window_size - 1);
 
-	return chbs_match ? chbs_match : ERR_PTR(-ENODEV);
-}
-
-static resource_size_t get_chbcr(struct acpi_cedt_chbs *chbs)
-{
-	return IS_ERR(chbs) ? CXL_RESOURCE_NONE : chbs->base;
+	return 0;
 }
 
 __mock int match_add_root_ports(struct pci_dev *pdev, void *data)
@@ -355,12 +282,35 @@ static int add_host_bridge_uport(struct device *match, void *arg)
 	return rc;
 }
 
+struct cxl_chbs_context {
+	unsigned long long uid;
+	resource_size_t chbcr;
+};
+
+static int cxl_get_chbcr(union acpi_subtable_headers *header, void *arg,
+			 const unsigned long end)
+{
+	struct cxl_chbs_context *ctx = arg;
+	struct acpi_cedt_chbs *chbs;
+
+	if (ctx->chbcr)
+		return 0;
+
+	chbs = (struct acpi_cedt_chbs *) header;
+
+	if (ctx->uid != chbs->uid)
+		return 0;
+	ctx->chbcr = chbs->base;
+
+	return 0;
+}
+
 static int add_host_bridge_dport(struct device *match, void *arg)
 {
 	int rc;
 	acpi_status status;
 	unsigned long long uid;
-	struct acpi_cedt_chbs *chbs;
+	struct cxl_chbs_context ctx;
 	struct cxl_port *root_port = arg;
 	struct device *host = root_port->dev.parent;
 	struct acpi_device *bridge = to_cxl_host_bridge(host, match);
@@ -376,14 +326,18 @@ static int add_host_bridge_dport(struct device *match, void *arg)
 		return -ENODEV;
 	}
 
-	chbs = cxl_acpi_match_chbs(host, uid);
-	if (IS_ERR(chbs)) {
+	ctx = (struct cxl_chbs_context) {
+		.uid = uid,
+	};
+	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, cxl_get_chbcr, &ctx);
+
+	if (ctx.chbcr == 0) {
 		dev_warn(host, "No CHBS found for Host Bridge: %s\n",
 			 dev_name(match));
 		return 0;
 	}
 
-	rc = cxl_add_dport(root_port, match, uid, get_chbcr(chbs));
+	rc = cxl_add_dport(root_port, match, uid, ctx.chbcr);
 	if (rc) {
 		dev_err(host, "failed to add downstream port: %s\n",
 			dev_name(match));
@@ -417,40 +371,29 @@ static int add_root_nvdimm_bridge(struct device *match, void *data)
 	return 1;
 }
 
-static u32 cedt_instance(struct platform_device *pdev)
-{
-	const bool *native_acpi0017 = acpi_device_get_match_data(&pdev->dev);
-
-	if (native_acpi0017 && *native_acpi0017)
-		return 0;
-
-	/* for cxl_test request a non-canonical instance */
-	return U32_MAX;
-}
-
 static int cxl_acpi_probe(struct platform_device *pdev)
 {
 	int rc;
-	acpi_status status;
 	struct cxl_port *root_port;
 	struct device *host = &pdev->dev;
 	struct acpi_device *adev = ACPI_COMPANION(host);
+	struct cxl_cfmws_context ctx;
 
 	root_port = devm_cxl_add_port(host, host, CXL_RESOURCE_NONE, NULL);
 	if (IS_ERR(root_port))
 		return PTR_ERR(root_port);
 	dev_dbg(host, "add: %s\n", dev_name(&root_port->dev));
 
-	status = acpi_get_table(ACPI_SIG_CEDT, cedt_instance(pdev), &acpi_cedt);
-	if (ACPI_FAILURE(status))
-		return -ENXIO;
-
 	rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
 			      add_host_bridge_dport);
-	if (rc)
-		goto out;
+	if (rc < 0)
+		return rc;
 
-	cxl_add_cfmws_decoders(host, root_port);
+	ctx = (struct cxl_cfmws_context) {
+		.dev = host,
+		.root_port = root_port,
+	};
+	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, cxl_parse_cfmws, &ctx);
 
 	/*
 	 * Root level scanned with host-bridge as dports, now scan host-bridges
@@ -458,24 +401,20 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	 */
 	rc = bus_for_each_dev(adev->dev.bus, NULL, root_port,
 			      add_host_bridge_uport);
-	if (rc)
-		goto out;
+	if (rc < 0)
+		return rc;
 
 	if (IS_ENABLED(CONFIG_CXL_PMEM))
 		rc = device_for_each_child(&root_port->dev, root_port,
 					   add_root_nvdimm_bridge);
-
-out:
-	acpi_put_table(acpi_cedt);
 	if (rc < 0)
 		return rc;
+
 	return 0;
 }
 
-static bool native_acpi0017 = true;
-
 static const struct acpi_device_id cxl_acpi_ids[] = {
-	{ "ACPI0017", (unsigned long) &native_acpi0017 },
+	{ "ACPI0017" },
 	{ },
 };
 MODULE_DEVICE_TABLE(acpi, cxl_acpi_ids);
@@ -491,3 +430,4 @@ static struct platform_driver cxl_acpi_driver = {
 module_platform_driver(cxl_acpi_driver);
 MODULE_LICENSE("GPL v2");
 MODULE_IMPORT_NS(CXL);
+MODULE_IMPORT_NS(ACPI);


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/6] cxl/test: Mock acpi_table_parse_cedt()
  2021-10-29 19:51 [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Dan Williams
                   ` (3 preceding siblings ...)
  2021-10-29 19:51 ` [PATCH 4/6] cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers Dan Williams
@ 2021-10-29 19:51 ` Dan Williams
  2021-10-29 19:51 ` [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT Dan Williams
  2021-11-01 12:00 ` [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Jonathan Cameron
  6 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2021-10-29 19:51 UTC (permalink / raw)
  To: rafael.j.wysocki; +Cc: Alison Schofield, linux-cxl, linux-acpi

Now that cxl_acpi has been converted to use the core ACPI CEDT sub-table
parser, update cxl_test to inject CFMWS and CHBS data directly into
cxl_acpi's handlers.

Cc: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/cxl/acpi.c            |    2 +
 tools/testing/cxl/Kbuild      |    3 +-
 tools/testing/cxl/test/cxl.c  |   68 ++++++++++++++++++++++++++++-------------
 tools/testing/cxl/test/mock.c |   30 +++++-------------
 tools/testing/cxl/test/mock.h |    6 ++--
 5 files changed, 61 insertions(+), 48 deletions(-)

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 7820082a2746..91e4072e7649 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -283,6 +283,7 @@ static int add_host_bridge_uport(struct device *match, void *arg)
 }
 
 struct cxl_chbs_context {
+	struct device *dev;
 	unsigned long long uid;
 	resource_size_t chbcr;
 };
@@ -327,6 +328,7 @@ static int add_host_bridge_dport(struct device *match, void *arg)
 	}
 
 	ctx = (struct cxl_chbs_context) {
+		.dev = host,
 		.uid = uid,
 	};
 	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, cxl_get_chbcr, &ctx);
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 86deba8308a1..1acdf2fc31c5 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -1,7 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
+ldflags-y += --wrap=acpi_table_parse_cedt
 ldflags-y += --wrap=is_acpi_device_node
-ldflags-y += --wrap=acpi_get_table
-ldflags-y += --wrap=acpi_put_table
 ldflags-y += --wrap=acpi_evaluate_integer
 ldflags-y += --wrap=acpi_pci_find_root
 ldflags-y += --wrap=pci_walk_bus
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index cb32f9e27d5d..736d99006fb7 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -182,6 +182,13 @@ static struct {
 	},
 };
 
+struct acpi_cedt_cfmws *mock_cfmws[4] = {
+	[0] = &mock_cedt.cfmws0.cfmws,
+	[1] = &mock_cedt.cfmws1.cfmws,
+	[2] = &mock_cedt.cfmws2.cfmws,
+	[3] = &mock_cedt.cfmws3.cfmws,
+};
+
 struct cxl_mock_res {
 	struct list_head list;
 	struct range range;
@@ -232,12 +239,6 @@ static struct cxl_mock_res *alloc_mock_res(resource_size_t size)
 
 static int populate_cedt(void)
 {
-	struct acpi_cedt_cfmws *cfmws[4] = {
-		[0] = &mock_cedt.cfmws0.cfmws,
-		[1] = &mock_cedt.cfmws1.cfmws,
-		[2] = &mock_cedt.cfmws2.cfmws,
-		[3] = &mock_cedt.cfmws3.cfmws,
-	};
 	struct cxl_mock_res *res;
 	int i;
 
@@ -257,8 +258,8 @@ static int populate_cedt(void)
 		chbs->length = size;
 	}
 
-	for (i = 0; i < ARRAY_SIZE(cfmws); i++) {
-		struct acpi_cedt_cfmws *window = cfmws[i];
+	for (i = 0; i < ARRAY_SIZE(mock_cfmws); i++) {
+		struct acpi_cedt_cfmws *window = mock_cfmws[i];
 
 		res = alloc_mock_res(window->window_size);
 		if (!res)
@@ -269,21 +270,44 @@ static int populate_cedt(void)
 	return 0;
 }
 
-static acpi_status mock_acpi_get_table(char *signature, u32 instance,
-				       struct acpi_table_header **out_table)
+/*
+ * WARNING, this hack assumes the format of 'struct
+ * cxl_cfmws_context' and 'struct cxl_chbs_context' share the property that
+ * the first struct member is the device being probed by the cxl_acpi
+ * driver.
+ */
+struct cxl_cedt_context {
+	struct device *dev;
+};
+
+static int mock_acpi_table_parse_cedt(enum acpi_cedt_type id,
+				      acpi_tbl_entry_handler_arg handler_arg,
+				      void *arg)
 {
-	if (instance < U32_MAX || strcmp(signature, ACPI_SIG_CEDT) != 0)
-		return acpi_get_table(signature, instance, out_table);
+	struct cxl_cedt_context *ctx = arg;
+	struct device *dev = ctx->dev;
+	union acpi_subtable_headers *h;
+	unsigned long end;
+	int i;
 
-	*out_table = (struct acpi_table_header *) &mock_cedt;
-	return AE_OK;
-}
+	if (dev != &cxl_acpi->dev)
+		return acpi_table_parse_cedt(id, handler_arg, arg);
 
-static void mock_acpi_put_table(struct acpi_table_header *table)
-{
-	if (table == (struct acpi_table_header *) &mock_cedt)
-		return;
-	acpi_put_table(table);
+	if (id == ACPI_CEDT_TYPE_CHBS)
+		for (i = 0; i < ARRAY_SIZE(mock_cedt.chbs); i++) {
+			h = (union acpi_subtable_headers *)&mock_cedt.chbs[i];
+			end = (unsigned long)&mock_cedt.chbs[i + 1];
+			handler_arg(h, arg, end);
+		}
+
+	if (id == ACPI_CEDT_TYPE_CFMWS)
+		for (i = 0; i < ARRAY_SIZE(mock_cfmws); i++) {
+			h = (union acpi_subtable_headers *) mock_cfmws[i];
+			end = (unsigned long) h + mock_cfmws[i]->header.length;
+			handler_arg(h, arg, end);
+		}
+
+	return 0;
 }
 
 static bool is_mock_bridge(struct device *dev)
@@ -388,8 +412,7 @@ static struct cxl_mock_ops cxl_mock_ops = {
 	.is_mock_port = is_mock_port,
 	.is_mock_dev = is_mock_dev,
 	.mock_port = mock_cxl_root_port,
-	.acpi_get_table = mock_acpi_get_table,
-	.acpi_put_table = mock_acpi_put_table,
+	.acpi_table_parse_cedt = mock_acpi_table_parse_cedt,
 	.acpi_evaluate_integer = mock_acpi_evaluate_integer,
 	.acpi_pci_find_root = mock_acpi_pci_find_root,
 	.list = LIST_HEAD_INIT(cxl_mock_ops.list),
@@ -574,3 +597,4 @@ static __exit void cxl_test_exit(void)
 module_init(cxl_test_init);
 module_exit(cxl_test_exit);
 MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS(ACPI);
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index b8c108abcf07..17408f892df4 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -58,36 +58,23 @@ bool __wrap_is_acpi_device_node(const struct fwnode_handle *fwnode)
 }
 EXPORT_SYMBOL(__wrap_is_acpi_device_node);
 
-acpi_status __wrap_acpi_get_table(char *signature, u32 instance,
-				  struct acpi_table_header **out_table)
+int __wrap_acpi_table_parse_cedt(enum acpi_cedt_type id,
+				 acpi_tbl_entry_handler_arg handler_arg,
+				 void *arg)
 {
-	int index;
+	int index, rc;
 	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
-	acpi_status status;
 
 	if (ops)
-		status = ops->acpi_get_table(signature, instance, out_table);
+		rc = ops->acpi_table_parse_cedt(id, handler_arg, arg);
 	else
-		status = acpi_get_table(signature, instance, out_table);
+		rc = acpi_table_parse_cedt(id, handler_arg, arg);
 
 	put_cxl_mock_ops(index);
 
-	return status;
-}
-EXPORT_SYMBOL(__wrap_acpi_get_table);
-
-void __wrap_acpi_put_table(struct acpi_table_header *table)
-{
-	int index;
-	struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
-
-	if (ops)
-		ops->acpi_put_table(table);
-	else
-		acpi_put_table(table);
-	put_cxl_mock_ops(index);
+	return rc;
 }
-EXPORT_SYMBOL(__wrap_acpi_put_table);
+EXPORT_SYMBOL_NS_GPL(__wrap_acpi_table_parse_cedt, ACPI);
 
 acpi_status __wrap_acpi_evaluate_integer(acpi_handle handle,
 					 acpi_string pathname,
@@ -169,3 +156,4 @@ __wrap_nvdimm_bus_register(struct device *dev,
 EXPORT_SYMBOL_GPL(__wrap_nvdimm_bus_register);
 
 MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS(ACPI);
diff --git a/tools/testing/cxl/test/mock.h b/tools/testing/cxl/test/mock.h
index 805a94cb3fbe..15ed0fd877e4 100644
--- a/tools/testing/cxl/test/mock.h
+++ b/tools/testing/cxl/test/mock.h
@@ -6,9 +6,9 @@
 struct cxl_mock_ops {
 	struct list_head list;
 	bool (*is_mock_adev)(struct acpi_device *dev);
-	acpi_status (*acpi_get_table)(char *signature, u32 instance,
-				      struct acpi_table_header **out_table);
-	void (*acpi_put_table)(struct acpi_table_header *table);
+	int (*acpi_table_parse_cedt)(enum acpi_cedt_type id,
+				     acpi_tbl_entry_handler_arg handler_arg,
+				     void *arg);
 	bool (*is_mock_bridge)(struct device *dev);
 	acpi_status (*acpi_evaluate_integer)(acpi_handle handle,
 					     acpi_string pathname,


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
  2021-10-29 19:51 [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Dan Williams
                   ` (4 preceding siblings ...)
  2021-10-29 19:51 ` [PATCH 5/6] cxl/test: Mock acpi_table_parse_cedt() Dan Williams
@ 2021-10-29 19:51 ` Dan Williams
  2021-11-18 13:12   ` Jonathan Cameron
  2021-11-01 12:00 ` [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Jonathan Cameron
  6 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2021-10-29 19:51 UTC (permalink / raw)
  To: rafael.j.wysocki; +Cc: Alison Schofield, linux-cxl, linux-acpi

From: Alison Schofield <alison.schofield@intel.com>

During NUMA init, CXL memory defined in the SRAT Memory Affinity
subtable may be assigned to a NUMA node. Since there is no
requirement that the SRAT be comprehensive for CXL memory another
mechanism is needed to assign NUMA nodes to CXL memory not identified
in the SRAT.

Use the CXL Fixed Memory Window Structure (CFMWS) of the ACPI CXL
Early Discovery Table (CEDT) to find all CXL memory ranges.
Create a NUMA node for each CFMWS that is not already assigned to
a NUMA node. Add a memblk attaching its host physical address
range to the node.

Note that these ranges may not actually map any memory at boot time.
They may describe persistent capacity or may be present to enable
hot-plug.

Consumers can use phys_to_target_node() to discover the NUMA node.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/numa/srat.c |   59 +++++++++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/acpi.c       |    3 ++
 2 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
index b8795fc49097..66a0142dc78c 100644
--- a/drivers/acpi/numa/srat.c
+++ b/drivers/acpi/numa/srat.c
@@ -298,6 +298,47 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 out_err:
 	return -EINVAL;
 }
+
+static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
+				   void *arg, const unsigned long table_end)
+{
+	struct acpi_cedt_cfmws *cfmws;
+	int *fake_pxm = arg;
+	u64 start, end;
+	int node;
+
+	cfmws = (struct acpi_cedt_cfmws *)header;
+	start = cfmws->base_hpa;
+	end = cfmws->base_hpa + cfmws->window_size;
+
+	/* Skip if the SRAT already described the NUMA details for this HPA */
+	node = phys_to_target_node(start);
+	if (node != NUMA_NO_NODE)
+		return 0;
+
+	node = acpi_map_pxm_to_node(*fake_pxm);
+
+	if (node == NUMA_NO_NODE) {
+		pr_err("ACPI NUMA: Too many proximity domains while processing CFMWS.\n");
+		return -EINVAL;
+	}
+
+	if (numa_add_memblk(node, start, end) < 0) {
+		/* CXL driver must handle the NUMA_NO_NODE case */
+		pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
+			node, start, end);
+	}
+
+	/* Set the next available fake_pxm value */
+	(*fake_pxm)++;
+	return 0;
+}
+#else
+static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
+				   void *arg, const unsigned long table_end)
+{
+	return 0;
+}
 #endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */
 
 static int __init acpi_parse_slit(struct acpi_table_header *table)
@@ -442,7 +483,7 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 
 int __init acpi_numa_init(void)
 {
-	int cnt = 0;
+	int i, fake_pxm, cnt = 0;
 
 	if (acpi_disabled)
 		return -EINVAL;
@@ -478,6 +519,22 @@ int __init acpi_numa_init(void)
 	/* SLIT: System Locality Information Table */
 	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
 
+	/*
+	 * CXL Fixed Memory Window Structures (CFMWS) must be parsed
+	 * after the SRAT. Create NUMA Nodes for CXL memory ranges that
+	 * are defined in the CFMWS and not already defined in the SRAT.
+	 * Initialize a fake_pxm as the first available PXM to emulate.
+	 */
+
+	/* fake_pxm is the next unused PXM value after SRAT parsing */
+	for (i = 0, fake_pxm = -1; i < MAX_NUMNODES - 1; i++) {
+		if (node_to_pxm_map[i] > fake_pxm)
+			fake_pxm = node_to_pxm_map[i];
+	}
+	fake_pxm++;
+	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
+			      &fake_pxm);
+
 	if (cnt < 0)
 		return cnt;
 	else if (!parsed_numa_memblks)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 91e4072e7649..3163167ecc3a 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -125,7 +125,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 			cfmws->base_hpa + cfmws->window_size - 1);
 		return 0;
 	}
-	dev_dbg(dev, "add: %s range %#llx-%#llx\n", dev_name(&cxld->dev),
+	dev_dbg(dev, "add: %s node: %d range %#llx-%#llx\n",
+		dev_name(&cxld->dev), phys_to_target_node(cxld->range.start),
 		cfmws->base_hpa, cfmws->base_hpa + cfmws->window_size - 1);
 
 	return 0;


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem
  2021-10-29 19:51 [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Dan Williams
                   ` (5 preceding siblings ...)
  2021-10-29 19:51 ` [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT Dan Williams
@ 2021-11-01 12:00 ` Jonathan Cameron
  2021-11-02  3:41   ` Dan Williams
  6 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2021-11-01 12:00 UTC (permalink / raw)
  To: Dan Williams
  Cc: rafael.j.wysocki, Len Brown, Alison Schofield, Rafael J. Wysocki,
	linux-cxl, linux-acpi

On Fri, 29 Oct 2021 12:51:27 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Hi Rafael,
> 
> While reviewing "[PATCH v3] ACPI: NUMA: Add a node and memblk for each
> CFMWS not in SRAT" [1]. I noticed that it was open coding CEDT sub-table
> parsing in a similar fashion as drivers/cxl/acpi.c. The driver open
> coded the parsing because the ACPI sub-table helpers are marked __init.
> In order to avoid the ongoing maintenance burden of a split between
> "early" and "late" ACPI sub-table parsing this series proposes to make
> those helpers available to drivers.
> 
> The savings in drivers/cxl/ are:
> 
>  drivers/cxl/Kconfig |    1 
>  drivers/cxl/acpi.c  |  234 +++++++++++++++++++--------------------------------
>  2 files changed, 88 insertions(+), 147 deletions(-)
> 
> ...and 15 lines new code not added are saved in this new version of
> "ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT".
> 
> Let me know if this looks ok to you and I can carry it in the CXL tree
> (i.e. after the merge window, for v5.17 consideration).
> 
> [1]: https://lore.kernel.org/r/20211019050908.449231-1-alison.schofield@intel.com

Is it worth the complexity of the __init_or_acpilib and export part?
Seems like a fiddly dance for what looks to be minor savings...

Jonathan

> 
> ---
> 
> Alison Schofield (1):
>       ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
> 
> Dan Williams (5):
>       ACPI: Keep sub-table parsing infrastructure available for modules
>       ACPI: Teach ACPI table parsing about the CEDT header format
>       ACPI: Add a context argument for table parsing handlers
>       cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers
>       cxl/test: Mock acpi_table_parse_cedt()
> 
> 
>  drivers/acpi/Kconfig          |    3 +
>  drivers/acpi/numa/srat.c      |   59 ++++++++++
>  drivers/acpi/tables.c         |   87 +++++++++++----
>  drivers/cxl/Kconfig           |    1 
>  drivers/cxl/acpi.c            |  237 ++++++++++++++++-------------------------
>  include/linux/acpi.h          |   34 +++++-
>  tools/testing/cxl/Kbuild      |    3 -
>  tools/testing/cxl/test/cxl.c  |   68 ++++++++----
>  tools/testing/cxl/test/mock.c |   30 ++---
>  tools/testing/cxl/test/mock.h |    6 +
>  10 files changed, 304 insertions(+), 224 deletions(-)
> 
> base-commit: c6d7e1341cc99ba49df1384c8c5b3f534a5463b1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem
  2021-11-01 12:00 ` [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Jonathan Cameron
@ 2021-11-02  3:41   ` Dan Williams
  2021-11-02 17:44     ` Jonathan Cameron
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2021-11-02  3:41 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Rafael J Wysocki, Len Brown, Alison Schofield, Rafael J. Wysocki,
	linux-cxl, Linux ACPI

On Mon, Nov 1, 2021 at 5:01 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Fri, 29 Oct 2021 12:51:27 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > Hi Rafael,
> >
> > While reviewing "[PATCH v3] ACPI: NUMA: Add a node and memblk for each
> > CFMWS not in SRAT" [1]. I noticed that it was open coding CEDT sub-table
> > parsing in a similar fashion as drivers/cxl/acpi.c. The driver open
> > coded the parsing because the ACPI sub-table helpers are marked __init.
> > In order to avoid the ongoing maintenance burden of a split between
> > "early" and "late" ACPI sub-table parsing this series proposes to make
> > those helpers available to drivers.
> >
> > The savings in drivers/cxl/ are:
> >
> >  drivers/cxl/Kconfig |    1
> >  drivers/cxl/acpi.c  |  234 +++++++++++++++++++--------------------------------
> >  2 files changed, 88 insertions(+), 147 deletions(-)
> >
> > ...and 15 lines new code not added are saved in this new version of
> > "ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT".
> >
> > Let me know if this looks ok to you and I can carry it in the CXL tree
> > (i.e. after the merge window, for v5.17 consideration).
> >
> > [1]: https://lore.kernel.org/r/20211019050908.449231-1-alison.schofield@intel.com
>
> Is it worth the complexity of the __init_or_acpilib and export part?
> Seems like a fiddly dance for what looks to be minor savings...

It follows the __initdata_or_meminfo precedent that identifies data
that is normally __init unless a specific driver needs it. The lesson
from the tinyconfig effort was that image size dies a death of many
cuts unless care is taken to preserve minor savings. Yes, it's likely
trivial in this case, but it's at least a gesture to avoid bloating
the kernel image size unnecessarily when the kernel has gotten by so
long with this infrastructure being purely __init.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem
  2021-11-02  3:41   ` Dan Williams
@ 2021-11-02 17:44     ` Jonathan Cameron
  2021-11-05 15:07       ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2021-11-02 17:44 UTC (permalink / raw)
  To: Dan Williams
  Cc: Rafael J Wysocki, Len Brown, Alison Schofield, Rafael J. Wysocki,
	linux-cxl, Linux ACPI

On Mon, 1 Nov 2021 20:41:34 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Mon, Nov 1, 2021 at 5:01 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Fri, 29 Oct 2021 12:51:27 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >  
> > > Hi Rafael,
> > >
> > > While reviewing "[PATCH v3] ACPI: NUMA: Add a node and memblk for each
> > > CFMWS not in SRAT" [1]. I noticed that it was open coding CEDT sub-table
> > > parsing in a similar fashion as drivers/cxl/acpi.c. The driver open
> > > coded the parsing because the ACPI sub-table helpers are marked __init.
> > > In order to avoid the ongoing maintenance burden of a split between
> > > "early" and "late" ACPI sub-table parsing this series proposes to make
> > > those helpers available to drivers.
> > >
> > > The savings in drivers/cxl/ are:
> > >
> > >  drivers/cxl/Kconfig |    1
> > >  drivers/cxl/acpi.c  |  234 +++++++++++++++++++--------------------------------
> > >  2 files changed, 88 insertions(+), 147 deletions(-)
> > >
> > > ...and 15 lines new code not added are saved in this new version of
> > > "ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT".
> > >
> > > Let me know if this looks ok to you and I can carry it in the CXL tree
> > > (i.e. after the merge window, for v5.17 consideration).
> > >
> > > [1]: https://lore.kernel.org/r/20211019050908.449231-1-alison.schofield@intel.com  
> >
> > Is it worth the complexity of the __init_or_acpilib and export part?
> > Seems like a fiddly dance for what looks to be minor savings...  
> 
> It follows the __initdata_or_meminfo precedent that identifies data
> that is normally __init unless a specific driver needs it. The lesson
> from the tinyconfig effort was that image size dies a death of many
> cuts unless care is taken to preserve minor savings. Yes, it's likely
> trivial in this case, but it's at least a gesture to avoid bloating
> the kernel image size unnecessarily when the kernel has gotten by so
> long with this infrastructure being purely __init.

I'm in favor avoiding bloat, but this is ACPI code so rarely very small machines
and very like that all distros will turn it on anyway on basis they will want
to support CXL (hopefully!)

I guess let's see what Rafael's opinion is.  I don't feel that strongly about
it if general view is that it is worth the small amount of complexity.

Jonathan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem
  2021-11-02 17:44     ` Jonathan Cameron
@ 2021-11-05 15:07       ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2021-11-05 15:07 UTC (permalink / raw)
  To: Jonathan Cameron, Dan Williams
  Cc: Rafael J Wysocki, Len Brown, Alison Schofield, Rafael J. Wysocki,
	linux-cxl, Linux ACPI

On Tue, Nov 2, 2021 at 6:44 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Mon, 1 Nov 2021 20:41:34 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > On Mon, Nov 1, 2021 at 5:01 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Fri, 29 Oct 2021 12:51:27 -0700
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > > Hi Rafael,
> > > >
> > > > While reviewing "[PATCH v3] ACPI: NUMA: Add a node and memblk for each
> > > > CFMWS not in SRAT" [1]. I noticed that it was open coding CEDT sub-table
> > > > parsing in a similar fashion as drivers/cxl/acpi.c. The driver open
> > > > coded the parsing because the ACPI sub-table helpers are marked __init.
> > > > In order to avoid the ongoing maintenance burden of a split between
> > > > "early" and "late" ACPI sub-table parsing this series proposes to make
> > > > those helpers available to drivers.
> > > >
> > > > The savings in drivers/cxl/ are:
> > > >
> > > >  drivers/cxl/Kconfig |    1
> > > >  drivers/cxl/acpi.c  |  234 +++++++++++++++++++--------------------------------
> > > >  2 files changed, 88 insertions(+), 147 deletions(-)
> > > >
> > > > ...and 15 lines new code not added are saved in this new version of
> > > > "ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT".
> > > >
> > > > Let me know if this looks ok to you and I can carry it in the CXL tree
> > > > (i.e. after the merge window, for v5.17 consideration).
> > > >
> > > > [1]: https://lore.kernel.org/r/20211019050908.449231-1-alison.schofield@intel.com
> > >
> > > Is it worth the complexity of the __init_or_acpilib and export part?
> > > Seems like a fiddly dance for what looks to be minor savings...
> >
> > It follows the __initdata_or_meminfo precedent that identifies data
> > that is normally __init unless a specific driver needs it. The lesson
> > from the tinyconfig effort was that image size dies a death of many
> > cuts unless care is taken to preserve minor savings. Yes, it's likely
> > trivial in this case, but it's at least a gesture to avoid bloating
> > the kernel image size unnecessarily when the kernel has gotten by so
> > long with this infrastructure being purely __init.
>
> I'm in favor avoiding bloat, but this is ACPI code so rarely very small machines
> and very like that all distros will turn it on anyway on basis they will want
> to support CXL (hopefully!)
>
> I guess let's see what Rafael's opinion is.  I don't feel that strongly about
> it if general view is that it is worth the small amount of complexity.

The general ACPI changes in this series are fine with me, so Dan
please feel free to add

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

to it.

Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
  2021-10-29 19:51 ` [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT Dan Williams
@ 2021-11-18 13:12   ` Jonathan Cameron
  2021-11-18 17:14     ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2021-11-18 13:12 UTC (permalink / raw)
  To: Dan Williams, rafael.j.wysocki, linux-cxl; +Cc: Alison Schofield, linux-acpi

On Fri, 29 Oct 2021 12:51:59 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> During NUMA init, CXL memory defined in the SRAT Memory Affinity
> subtable may be assigned to a NUMA node. Since there is no
> requirement that the SRAT be comprehensive for CXL memory another
> mechanism is needed to assign NUMA nodes to CXL memory not identified
> in the SRAT.
> 
> Use the CXL Fixed Memory Window Structure (CFMWS) of the ACPI CXL
> Early Discovery Table (CEDT) to find all CXL memory ranges.
> Create a NUMA node for each CFMWS that is not already assigned to
> a NUMA node. Add a memblk attaching its host physical address
> range to the node.
> 
> Note that these ranges may not actually map any memory at boot time.
> They may describe persistent capacity or may be present to enable
> hot-plug.
> 
> Consumers can use phys_to_target_node() to discover the NUMA node.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Hi,

I was just discussing this with one of our firmware / ACPI experts and he asked
an interesting question.   If you want to have CFMWS regions correspond to
new NUMA nodes, why not put them in SRAT as hotpluggable memory, but have none
present in the memory map (whichever route you use to get that)?
We do this for normal memory hotplug as (via the other discussion on qemu virtio-mem
nodes) apparently does qemu. 

https://lore.kernel.org/all/655c65af-fd7a-8007-37b3-a56c60a0ec5b@redhat.com/

This doesn't solve the question of whether we have enough nodes, but it's
not worse than if we use CFMWS regions and fits within existing ACPI spec.

The only reason I can immediately think of to not do this, is that it might be
a pain to later change over to dynamic numa node allocation in a fashion that
then ignores SRAT entries.  Probably a solvable problem.

Jonathan


> ---
>  drivers/acpi/numa/srat.c |   59 +++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/cxl/acpi.c       |    3 ++
>  2 files changed, 60 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index b8795fc49097..66a0142dc78c 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -298,6 +298,47 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>  out_err:
>  	return -EINVAL;
>  }
> +
> +static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> +				   void *arg, const unsigned long table_end)
> +{
> +	struct acpi_cedt_cfmws *cfmws;
> +	int *fake_pxm = arg;
> +	u64 start, end;
> +	int node;
> +
> +	cfmws = (struct acpi_cedt_cfmws *)header;
> +	start = cfmws->base_hpa;
> +	end = cfmws->base_hpa + cfmws->window_size;
> +
> +	/* Skip if the SRAT already described the NUMA details for this HPA */
> +	node = phys_to_target_node(start);
> +	if (node != NUMA_NO_NODE)
> +		return 0;
> +
> +	node = acpi_map_pxm_to_node(*fake_pxm);
> +
> +	if (node == NUMA_NO_NODE) {
> +		pr_err("ACPI NUMA: Too many proximity domains while processing CFMWS.\n");
> +		return -EINVAL;
> +	}
> +
> +	if (numa_add_memblk(node, start, end) < 0) {
> +		/* CXL driver must handle the NUMA_NO_NODE case */
> +		pr_warn("ACPI NUMA: Failed to add memblk for CFMWS node %d [mem %#llx-%#llx]\n",
> +			node, start, end);
> +	}
> +
> +	/* Set the next available fake_pxm value */
> +	(*fake_pxm)++;
> +	return 0;
> +}
> +#else
> +static int __init acpi_parse_cfmws(union acpi_subtable_headers *header,
> +				   void *arg, const unsigned long table_end)
> +{
> +	return 0;
> +}
>  #endif /* defined(CONFIG_X86) || defined (CONFIG_ARM64) */
>  
>  static int __init acpi_parse_slit(struct acpi_table_header *table)
> @@ -442,7 +483,7 @@ acpi_table_parse_srat(enum acpi_srat_type id,
>  
>  int __init acpi_numa_init(void)
>  {
> -	int cnt = 0;
> +	int i, fake_pxm, cnt = 0;
>  
>  	if (acpi_disabled)
>  		return -EINVAL;
> @@ -478,6 +519,22 @@ int __init acpi_numa_init(void)
>  	/* SLIT: System Locality Information Table */
>  	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
>  
> +	/*
> +	 * CXL Fixed Memory Window Structures (CFMWS) must be parsed
> +	 * after the SRAT. Create NUMA Nodes for CXL memory ranges that
> +	 * are defined in the CFMWS and not already defined in the SRAT.
> +	 * Initialize a fake_pxm as the first available PXM to emulate.
> +	 */
> +
> +	/* fake_pxm is the next unused PXM value after SRAT parsing */
> +	for (i = 0, fake_pxm = -1; i < MAX_NUMNODES - 1; i++) {
> +		if (node_to_pxm_map[i] > fake_pxm)
> +			fake_pxm = node_to_pxm_map[i];
> +	}
> +	fake_pxm++;
> +	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CFMWS, acpi_parse_cfmws,
> +			      &fake_pxm);
> +
>  	if (cnt < 0)
>  		return cnt;
>  	else if (!parsed_numa_memblks)
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 91e4072e7649..3163167ecc3a 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -125,7 +125,8 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  			cfmws->base_hpa + cfmws->window_size - 1);
>  		return 0;
>  	}
> -	dev_dbg(dev, "add: %s range %#llx-%#llx\n", dev_name(&cxld->dev),
> +	dev_dbg(dev, "add: %s node: %d range %#llx-%#llx\n",
> +		dev_name(&cxld->dev), phys_to_target_node(cxld->range.start),
>  		cfmws->base_hpa, cfmws->base_hpa + cfmws->window_size - 1);
>  
>  	return 0;
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
  2021-11-18 13:12   ` Jonathan Cameron
@ 2021-11-18 17:14     ` Dan Williams
  2021-11-18 17:53       ` Jonathan Cameron
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2021-11-18 17:14 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Rafael J Wysocki, linux-cxl, Alison Schofield, Linux ACPI

On Thu, Nov 18, 2021 at 5:12 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Fri, 29 Oct 2021 12:51:59 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > From: Alison Schofield <alison.schofield@intel.com>
> >
> > During NUMA init, CXL memory defined in the SRAT Memory Affinity
> > subtable may be assigned to a NUMA node. Since there is no
> > requirement that the SRAT be comprehensive for CXL memory another
> > mechanism is needed to assign NUMA nodes to CXL memory not identified
> > in the SRAT.
> >
> > Use the CXL Fixed Memory Window Structure (CFMWS) of the ACPI CXL
> > Early Discovery Table (CEDT) to find all CXL memory ranges.
> > Create a NUMA node for each CFMWS that is not already assigned to
> > a NUMA node. Add a memblk attaching its host physical address
> > range to the node.
> >
> > Note that these ranges may not actually map any memory at boot time.
> > They may describe persistent capacity or may be present to enable
> > hot-plug.
> >
> > Consumers can use phys_to_target_node() to discover the NUMA node.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Hi,
>
> I was just discussing this with one of our firmware / ACPI experts and he asked
> an interesting question.   If you want to have CFMWS regions correspond to
> new NUMA nodes, why not put them in SRAT as hotpluggable memory, but have none
> present in the memory map (whichever route you use to get that)?
> We do this for normal memory hotplug as (via the other discussion on qemu virtio-mem
> nodes) apparently does qemu.
>
> https://lore.kernel.org/all/655c65af-fd7a-8007-37b3-a56c60a0ec5b@redhat.com/
>
> This doesn't solve the question of whether we have enough nodes, but it's
> not worse than if we use CFMWS regions and fits within existing ACPI spec.
>
> The only reason I can immediately think of to not do this, is that it might be
> a pain to later change over to dynamic numa node allocation in a fashion that
> then ignores SRAT entries.  Probably a solvable problem.

Interesting, yes, that works for expanding the NUMA node number space.
However, if you populate SRAT what do you put in the corresponding
HMAT entries? In the case of dynamic CXL regions the driver is going
to generate the equivalent of the corresponding HMAT data based on
what devices it decides to place in that range. I actually do not know
what happens with HMAT today for memory hotplug, but I suspect there
are less degrees of freedom of what might populate those ranges than
what CXL allows, and there is a chance to pre-populate the HMAT for
future hotplug.

All that said, if an ACPI platform did do that population it would not
collide with the scheme proposed in this patch because this is
checking SRAT for the range before padding the proximity domain number
space for CFMWS entries.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
  2021-11-18 17:14     ` Dan Williams
@ 2021-11-18 17:53       ` Jonathan Cameron
  2021-11-18 18:10         ` Dan Williams
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron @ 2021-11-18 17:53 UTC (permalink / raw)
  To: Dan Williams; +Cc: Rafael J Wysocki, linux-cxl, Alison Schofield, Linux ACPI

On Thu, 18 Nov 2021 09:14:07 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> On Thu, Nov 18, 2021 at 5:12 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Fri, 29 Oct 2021 12:51:59 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >  
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > During NUMA init, CXL memory defined in the SRAT Memory Affinity
> > > subtable may be assigned to a NUMA node. Since there is no
> > > requirement that the SRAT be comprehensive for CXL memory another
> > > mechanism is needed to assign NUMA nodes to CXL memory not identified
> > > in the SRAT.
> > >
> > > Use the CXL Fixed Memory Window Structure (CFMWS) of the ACPI CXL
> > > Early Discovery Table (CEDT) to find all CXL memory ranges.
> > > Create a NUMA node for each CFMWS that is not already assigned to
> > > a NUMA node. Add a memblk attaching its host physical address
> > > range to the node.
> > >
> > > Note that these ranges may not actually map any memory at boot time.
> > > They may describe persistent capacity or may be present to enable
> > > hot-plug.
> > >
> > > Consumers can use phys_to_target_node() to discover the NUMA node.
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>  
> > Hi,
> >
> > I was just discussing this with one of our firmware / ACPI experts and he asked
> > an interesting question.   If you want to have CFMWS regions correspond to
> > new NUMA nodes, why not put them in SRAT as hotpluggable memory, but have none
> > present in the memory map (whichever route you use to get that)?
> > We do this for normal memory hotplug as (via the other discussion on qemu virtio-mem
> > nodes) apparently does qemu.
> >
> > https://lore.kernel.org/all/655c65af-fd7a-8007-37b3-a56c60a0ec5b@redhat.com/
> >
> > This doesn't solve the question of whether we have enough nodes, but it's
> > not worse than if we use CFMWS regions and fits within existing ACPI spec.
> >
> > The only reason I can immediately think of to not do this, is that it might be
> > a pain to later change over to dynamic numa node allocation in a fashion that
> > then ignores SRAT entries.  Probably a solvable problem.  
> 
> Interesting, yes, that works for expanding the NUMA node number space.
> However, if you populate SRAT what do you put in the corresponding
> HMAT entries? In the case of dynamic CXL regions the driver is going
> to generate the equivalent of the corresponding HMAT data based on
> what devices it decides to place in that range. I actually do not know
> what happens with HMAT today for memory hotplug, but I suspect there
> are less degrees of freedom of what might populate those ranges than
> what CXL allows, and there is a chance to pre-populate the HMAT for
> future hotplug.

So... There are two answers to that question as I understand it.

1) What Linux does is nothing.  You get whatever was in HMAT to start with.
   Worth noting that HMAT doesn't need to be in any sense 'complete' so it
   is possible there was nothing there with a target in this NUMA node.

2) What ACPI intends to happen if anyone implements it.  There is an event
notification for this..

"Heterogeneous Memory Attributes Update. Dynamic reconfiguration of
the system may cause existing latency, bandwidth or memory side caching
attribute to change. The platform software issues the Heterogeneous
Memory Attributes Update notification to a point on a device tree to
indicate to OSPM that it needs to invoke the _HMA objects associated
with the Heterogeneous Memory Attributes on the device tree starting
from the point notified."

So call an AML method in DSDT for which ever device has a notification event.
A similar dance is actually implemented for NFIT updates and Linux
actually implements that handling for that one.

> 
> All that said, if an ACPI platform did do that population it would not
> collide with the scheme proposed in this patch because this is
> checking SRAT for the range before padding the proximity domain number
> space for CFMWS entries.

Good point and that probably answers our concern.

Jonathan



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
  2021-11-18 17:53       ` Jonathan Cameron
@ 2021-11-18 18:10         ` Dan Williams
  2021-11-18 18:18           ` Jonathan Cameron
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Williams @ 2021-11-18 18:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Rafael J Wysocki, linux-cxl, Alison Schofield, Linux ACPI

On Thu, Nov 18, 2021 at 9:54 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 18 Nov 2021 09:14:07 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > On Thu, Nov 18, 2021 at 5:12 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Fri, 29 Oct 2021 12:51:59 -0700
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > >
> > > > During NUMA init, CXL memory defined in the SRAT Memory Affinity
> > > > subtable may be assigned to a NUMA node. Since there is no
> > > > requirement that the SRAT be comprehensive for CXL memory another
> > > > mechanism is needed to assign NUMA nodes to CXL memory not identified
> > > > in the SRAT.
> > > >
> > > > Use the CXL Fixed Memory Window Structure (CFMWS) of the ACPI CXL
> > > > Early Discovery Table (CEDT) to find all CXL memory ranges.
> > > > Create a NUMA node for each CFMWS that is not already assigned to
> > > > a NUMA node. Add a memblk attaching its host physical address
> > > > range to the node.
> > > >
> > > > Note that these ranges may not actually map any memory at boot time.
> > > > They may describe persistent capacity or may be present to enable
> > > > hot-plug.
> > > >
> > > > Consumers can use phys_to_target_node() to discover the NUMA node.
> > > >
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > Hi,
> > >
> > > I was just discussing this with one of our firmware / ACPI experts and he asked
> > > an interesting question.   If you want to have CFMWS regions correspond to
> > > new NUMA nodes, why not put them in SRAT as hotpluggable memory, but have none
> > > present in the memory map (whichever route you use to get that)?
> > > We do this for normal memory hotplug as (via the other discussion on qemu virtio-mem
> > > nodes) apparently does qemu.
> > >
> > > https://lore.kernel.org/all/655c65af-fd7a-8007-37b3-a56c60a0ec5b@redhat.com/
> > >
> > > This doesn't solve the question of whether we have enough nodes, but it's
> > > not worse than if we use CFMWS regions and fits within existing ACPI spec.
> > >
> > > The only reason I can immediately think of to not do this, is that it might be
> > > a pain to later change over to dynamic numa node allocation in a fashion that
> > > then ignores SRAT entries.  Probably a solvable problem.
> >
> > Interesting, yes, that works for expanding the NUMA node number space.
> > However, if you populate SRAT what do you put in the corresponding
> > HMAT entries? In the case of dynamic CXL regions the driver is going
> > to generate the equivalent of the corresponding HMAT data based on
> > what devices it decides to place in that range. I actually do not know
> > what happens with HMAT today for memory hotplug, but I suspect there
> > are less degrees of freedom of what might populate those ranges than
> > what CXL allows, and there is a chance to pre-populate the HMAT for
> > future hotplug.
>
> So... There are two answers to that question as I understand it.
>
> 1) What Linux does is nothing.  You get whatever was in HMAT to start with.
>    Worth noting that HMAT doesn't need to be in any sense 'complete' so it
>    is possible there was nothing there with a target in this NUMA node.
>
> 2) What ACPI intends to happen if anyone implements it.  There is an event
> notification for this..
>
> "Heterogeneous Memory Attributes Update. Dynamic reconfiguration of
> the system may cause existing latency, bandwidth or memory side caching
> attribute to change. The platform software issues the Heterogeneous
> Memory Attributes Update notification to a point on a device tree to
> indicate to OSPM that it needs to invoke the _HMA objects associated
> with the Heterogeneous Memory Attributes on the device tree starting
> from the point notified."
>
> So call an AML method in DSDT for which ever device has a notification event.
> A similar dance is actually implemented for NFIT updates and Linux
> actually implements that handling for that one.

Oh, yes, I know I helped implement it, and I think it is objectively a
terrible interface because you have no idea what changed or is allowed
to change. _HMA is slightly less problematic because it can't invent
new ranges, but I have low expectations that the BIOS would be able to
coordinate properly with the OS that has done the dynamic
re-configuration of the CXL topology. The OS could tell the BIOS what
to put in the HMAT, but might as well skip that step and just have the
OS populate its parsed copy of the HMAT data. I like your "just don't
publish HMAT for entries for these ranges" observation better.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
  2021-11-18 18:10         ` Dan Williams
@ 2021-11-18 18:18           ` Jonathan Cameron
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron @ 2021-11-18 18:18 UTC (permalink / raw)
  To: Dan Williams; +Cc: Rafael J Wysocki, linux-cxl, Alison Schofield, Linux ACPI

On Thu, 18 Nov 2021 10:10:18 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> On Thu, Nov 18, 2021 at 9:54 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Thu, 18 Nov 2021 09:14:07 -0800
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >  
> > > On Thu, Nov 18, 2021 at 5:12 AM Jonathan Cameron
> > > <Jonathan.Cameron@huawei.com> wrote:  
> > > >
> > > > On Fri, 29 Oct 2021 12:51:59 -0700
> > > > Dan Williams <dan.j.williams@intel.com> wrote:
> > > >  
> > > > > From: Alison Schofield <alison.schofield@intel.com>
> > > > >
> > > > > During NUMA init, CXL memory defined in the SRAT Memory Affinity
> > > > > subtable may be assigned to a NUMA node. Since there is no
> > > > > requirement that the SRAT be comprehensive for CXL memory another
> > > > > mechanism is needed to assign NUMA nodes to CXL memory not identified
> > > > > in the SRAT.
> > > > >
> > > > > Use the CXL Fixed Memory Window Structure (CFMWS) of the ACPI CXL
> > > > > Early Discovery Table (CEDT) to find all CXL memory ranges.
> > > > > Create a NUMA node for each CFMWS that is not already assigned to
> > > > > a NUMA node. Add a memblk attaching its host physical address
> > > > > range to the node.
> > > > >
> > > > > Note that these ranges may not actually map any memory at boot time.
> > > > > They may describe persistent capacity or may be present to enable
> > > > > hot-plug.
> > > > >
> > > > > Consumers can use phys_to_target_node() to discover the NUMA node.
> > > > >
> > > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>  
> > > > Hi,
> > > >
> > > > I was just discussing this with one of our firmware / ACPI experts and he asked
> > > > an interesting question.   If you want to have CFMWS regions correspond to
> > > > new NUMA nodes, why not put them in SRAT as hotpluggable memory, but have none
> > > > present in the memory map (whichever route you use to get that)?
> > > > We do this for normal memory hotplug as (via the other discussion on qemu virtio-mem
> > > > nodes) apparently does qemu.
> > > >
> > > > https://lore.kernel.org/all/655c65af-fd7a-8007-37b3-a56c60a0ec5b@redhat.com/
> > > >
> > > > This doesn't solve the question of whether we have enough nodes, but it's
> > > > not worse than if we use CFMWS regions and fits within existing ACPI spec.
> > > >
> > > > The only reason I can immediately think of to not do this, is that it might be
> > > > a pain to later change over to dynamic numa node allocation in a fashion that
> > > > then ignores SRAT entries.  Probably a solvable problem.  
> > >
> > > Interesting, yes, that works for expanding the NUMA node number space.
> > > However, if you populate SRAT what do you put in the corresponding
> > > HMAT entries? In the case of dynamic CXL regions the driver is going
> > > to generate the equivalent of the corresponding HMAT data based on
> > > what devices it decides to place in that range. I actually do not know
> > > what happens with HMAT today for memory hotplug, but I suspect there
> > > are less degrees of freedom of what might populate those ranges than
> > > what CXL allows, and there is a chance to pre-populate the HMAT for
> > > future hotplug.  
> >
> > So... There are two answers to that question as I understand it.
> >
> > 1) What Linux does is nothing.  You get whatever was in HMAT to start with.
> >    Worth noting that HMAT doesn't need to be in any sense 'complete' so it
> >    is possible there was nothing there with a target in this NUMA node.
> >
> > 2) What ACPI intends to happen if anyone implements it.  There is an event
> > notification for this..
> >
> > "Heterogeneous Memory Attributes Update. Dynamic reconfiguration of
> > the system may cause existing latency, bandwidth or memory side caching
> > attribute to change. The platform software issues the Heterogeneous
> > Memory Attributes Update notification to a point on a device tree to
> > indicate to OSPM that it needs to invoke the _HMA objects associated
> > with the Heterogeneous Memory Attributes on the device tree starting
> > from the point notified."
> >
> > So call an AML method in DSDT for which ever device has a notification event.
> > A similar dance is actually implemented for NFIT updates and Linux
> > actually implements that handling for that one.  
> 
> Oh, yes, I know I helped implement it, and I think it is objectively a
> terrible interface because you have no idea what changed or is allowed
> to change. _HMA is slightly less problematic because it can't invent
> new ranges, but I have low expectations that the BIOS would be able to
> coordinate properly with the OS that has done the dynamic
> re-configuration of the CXL topology. The OS could tell the BIOS what
> to put in the HMAT, but might as well skip that step and just have the
> OS populate its parsed copy of the HMAT data. I like your "just don't
> publish HMAT for entries for these ranges" observation better.

I absolutely aggree and wouldn't suggest using _HMA for CXL devices.
The OS can do a better job (I'm sure the firmware folks will disagree)
Nothing says we have to take any notice even if there is such an update :)

It's more reasonable for DIMM hotplug where some level of firmware is heavily
involved already or possibly for virtualization if it didn't know the right
answer at boot for some reason.  Then a hotplug controller / management
controller can fill in the details.

Jonathan

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-11-18 18:18 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-29 19:51 [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Dan Williams
2021-10-29 19:51 ` [PATCH 1/6] ACPI: Keep sub-table parsing infrastructure available for modules Dan Williams
2021-10-29 19:51 ` [PATCH 2/6] ACPI: Teach ACPI table parsing about the CEDT header format Dan Williams
2021-10-29 19:51 ` [PATCH 3/6] ACPI: Add a context argument for table parsing handlers Dan Williams
2021-10-29 19:51 ` [PATCH 4/6] cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers Dan Williams
2021-10-29 19:51 ` [PATCH 5/6] cxl/test: Mock acpi_table_parse_cedt() Dan Williams
2021-10-29 19:51 ` [PATCH 6/6] ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT Dan Williams
2021-11-18 13:12   ` Jonathan Cameron
2021-11-18 17:14     ` Dan Williams
2021-11-18 17:53       ` Jonathan Cameron
2021-11-18 18:10         ` Dan Williams
2021-11-18 18:18           ` Jonathan Cameron
2021-11-01 12:00 ` [PATCH 0/6] Introduce acpi_table_parse_cedt and extra nodes for CXL.mem Jonathan Cameron
2021-11-02  3:41   ` Dan Williams
2021-11-02 17:44     ` Jonathan Cameron
2021-11-05 15:07       ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).