Linux-ACPI Archive on lore.kernel.org
* [PATCH 0/5] Manual definition of Soft Reserved memory devices
@ 2020-03-02 22:19 Dan Williams
  2020-03-02 22:20 ` [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option Dan Williams
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:19 UTC (permalink / raw)
  To: linux-acpi
  Cc: Jason Gunthorpe, Peter Zijlstra, Ard Biesheuvel,
	Jonathan Cameron, Borislav Petkov, Wei Yang, x86, H. Peter Anvin,
	Brice Goglin, Thomas Gleixner, Jeff Moyer, Ingo Molnar,
	Dave Hansen, Rafael J. Wysocki, Ard Biesheuvel, Andy Lutomirski,
	Tom Lendacky, linux-nvdimm, linux-kernel

Given the current dearth of systems that supply an ACPI HMAT table, and
the utility of being able to manually define device-dax "hmem" instances
via the efi_fake_mem= option, relax the requirements for creating these
devices. Specifically, add an option (numa=nohmat) to optionally disable
consideration of the HMAT and update efi_fake_mem= to behave like
memmap=nn!ss in terms of delimiting device boundaries.

All review welcome of course, but the E820 changes want an x86
maintainer ack, the efi_fake_mem update needs Ard, and Rafael has
previously shepherded the HMAT changes. For the changes to
kernel/resource.c, where there is no clear maintainer, I just copied the
last few people to make thoughtful changes in that area. I am happy to
take these through the nvdimm tree along with these prerequisites
already in -next:

b2ca916ce392 ACPI: NUMA: Up-level "map to online node" functionality
4fcbe96e4d0b mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node()
575e23b6e13c powerpc/papr_scm: Switch to numa_map_to_online_node()
1e5d8e1e47af x86/mm: Introduce CONFIG_NUMA_KEEP_MEMINFO
5d30f92e7631 x86/NUMA: Provide a range-to-target_node lookup facility
7b27a8622f80 libnvdimm/e820: Retrieve and populate correct 'target_node' info

Tested with:

        numa=nohmat efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000

...to create two device-dax instances:

	# daxctl list -RDu
	[
	  {
	    "path":"\/platform\/hmem.1",
	    "id":1,
	    "size":"4.00 GiB (4.29 GB)",
	    "align":2097152,
	    "devices":[
	      {
	        "chardev":"dax1.0",
	        "size":"4.00 GiB (4.29 GB)",
	        "target_node":3,
	        "mode":"devdax"
	      }
	    ]
	  },
	  {
	    "path":"\/platform\/hmem.0",
	    "id":0,
	    "size":"4.00 GiB (4.29 GB)",
	    "align":2097152,
	    "devices":[
	      {
	        "chardev":"dax0.0",
	        "size":"4.00 GiB (4.29 GB)",
	        "target_node":2,
	        "mode":"devdax"
	      }
	    ]
	  }
	]
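The efi_fake_mem= value above is a comma-separated list of size@start:attribute entries, where 0x40000 is the EFI "specific purpose" memory attribute (EFI_MEMORY_SP). The sketch below is a minimal userspace parser for that syntax. It assumes K/M/G suffixes in the style of memparse(). parse_size() and parse_efi_fake_mem() are hypothetical helpers, not the kernel's parser.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One size@start:attribute entry from efi_fake_mem= (hypothetical type). */
struct fake_mem_entry {
	uint64_t size;
	uint64_t start;
	uint64_t attribute;
};

/* Parse a number with an optional K/M/G suffix, in the style of memparse(). */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;	/* consume the suffix */
		break;
	default:
		break;
	}
	return v;
}

/*
 * Split a comma-separated efi_fake_mem= value into entries.
 * Returns the number of entries parsed, or -1 on a malformed entry.
 */
static int parse_efi_fake_mem(const char *arg, struct fake_mem_entry *out,
			      int max)
{
	char *p = (char *)arg;
	int n = 0;

	assert(arg != NULL && out != NULL && max > 0);
	while (*p != '\0' && n < max) {
		struct fake_mem_entry e;

		e.size = parse_size(p, &p);
		if (*p++ != '@')
			return -1;
		e.start = parse_size(p, &p);
		if (*p++ != ':')
			return -1;
		e.attribute = strtoull(p, &p, 0);
		out[n++] = e;

		if (*p == ',')
			p++;
		else if (*p != '\0')
			return -1;	/* trailing junk after an entry */
	}
	return n;
}
```

For the test string, this yields two 4 GiB entries starting at 9 GiB and 13 GiB. Because 9G + 4G = 13G, the two ranges abut. Without the no-merge rule in patch 2, they could be coalesced into a single Soft Reserved entry.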

---

Dan Williams (5):
      ACPI: NUMA: Add 'nohmat' option
      efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
      ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
      resource: Report parent to walk_iomem_res_desc() callback
      ACPI: HMAT: Attach a device for each soft-reserved range


 arch/x86/kernel/e820.c              |   16 +++++-
 arch/x86/mm/numa.c                  |    4 +
 drivers/acpi/numa/hmat.c            |   71 +++-----------------------
 drivers/dax/Kconfig                 |    5 ++
 drivers/dax/Makefile                |    3 -
 drivers/dax/hmem/Makefile           |    6 ++
 drivers/dax/hmem/device.c           |   97 +++++++++++++++++++++++++++++++++++
 drivers/dax/hmem/hmem.c             |    2 -
 drivers/firmware/efi/x86_fake_mem.c |   12 +++-
 include/acpi/acpi_numa.h            |    1 
 include/linux/dax.h                 |    8 +++
 kernel/resource.c                   |    1 
 12 files changed, 156 insertions(+), 70 deletions(-)
 create mode 100644 drivers/dax/hmem/Makefile
 create mode 100644 drivers/dax/hmem/device.c
 rename drivers/dax/{hmem.c => hmem/hmem.c} (98%)

base-commit: 7b27a8622f802761d5c6abd6c37b22312a35343c

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
  2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
  2020-03-18  0:08   ` Dan Williams
  2020-03-02 22:20 ` [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance Dan Williams
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
  To: linux-acpi
  Cc: x86, Rafael J. Wysocki, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, ard.biesheuvel, linux-nvdimm, linux-kernel

Disable parsing of the HMAT for debug, to work around broken platform
instances, or for cases where it is otherwise not wanted.
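The following is a rough userspace model of what this patch wires up. It shows numa=nohmat leaving SRAT handling alone and only short-circuiting HMAT parsing. It assumes, as in srat.c, that srat_disabled() is simply acpi_numa < 0. numa_setup_model(), srat_disabled_model() and hmat_init_model() are hypothetical stand-ins for numa_setup(), srat_disabled() and hmat_init().

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the kernel's acpi_numa and hmat_disable globals. */
static int acpi_numa;
static int hmat_disable;

/* Hypothetical model of the "numa=" early parameter handler. */
static void numa_setup_model(const char *opt)
{
	assert(opt != NULL);
	if (!strncmp(opt, "noacpi", 6))
		acpi_numa = -1;
	if (!strncmp(opt, "nohmat", 6))
		hmat_disable = 1;
}

/* Assumed to match srat.c: the SRAT is disabled when acpi_numa < 0. */
static int srat_disabled_model(void)
{
	return acpi_numa < 0;
}

/* Returns 1 if HMAT parsing would proceed, 0 if hmat_init() bails out. */
static int hmat_init_model(void)
{
	if (srat_disabled_model() || hmat_disable)
		return 0;
	return 1;
}
```

numa=noacpi also skips the HMAT in this model, because hmat_init() already returns early when the SRAT is disabled.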

Cc: x86@kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/mm/numa.c       |    4 ++++
 drivers/acpi/numa/hmat.c |    3 ++-
 include/acpi/acpi_numa.h |    1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 59ba008504dc..22de2e2610c1 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
 #ifdef CONFIG_ACPI_NUMA
 	if (!strncmp(opt, "noacpi", 6))
 		acpi_numa = -1;
+#ifdef CONFIG_ACPI_HMAT
+	if (!strncmp(opt, "nohmat", 6))
+		hmat_disable = 1;
+#endif
 #endif
 	return 0;
 }
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 2c32cfb72370..d3db121e393a 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -26,6 +26,7 @@
 #include <linux/sysfs.h>
 
 static u8 hmat_revision;
+int hmat_disable __initdata;
 
 static LIST_HEAD(targets);
 static LIST_HEAD(initiators);
@@ -814,7 +815,7 @@ static __init int hmat_init(void)
 	enum acpi_hmat_type i;
 	acpi_status status;
 
-	if (srat_disabled())
+	if (srat_disabled() || hmat_disable)
 		return 0;
 
 	status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl);
diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
index fdebcfc6c8df..48ca468e9b61 100644
--- a/include/acpi/acpi_numa.h
+++ b/include/acpi/acpi_numa.h
@@ -18,6 +18,7 @@ extern int node_to_pxm(int);
 extern int acpi_map_pxm_to_node(int);
 extern unsigned char acpi_srat_revision;
 extern int acpi_numa __initdata;
+extern int hmat_disable __initdata;
 
 extern void bad_srat(void);
 extern int srat_disabled(void);



* [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
  2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
  2020-03-02 22:20 ` [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
  2020-03-03  8:01   ` Ard Biesheuvel
  2020-03-02 22:20 ` [PATCH 3/5] ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device Dan Williams
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
  To: linux-acpi
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Ard Biesheuvel, peterz, dave.hansen, ard.biesheuvel,
	linux-nvdimm, linux-kernel

In preparation for attaching a platform device per iomem resource, teach
the efi_fake_mem code to create an e820 entry per instance. Similar to
E820_TYPE_PRAM, bypass merging resources when the e820 map is sanitized.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/kernel/e820.c              |   16 +++++++++++++++-
 drivers/firmware/efi/x86_fake_mem.c |   12 +++++++++---
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c5399e80c59c..96babb3a6629 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -305,6 +305,20 @@ static int __init cpcompare(const void *a, const void *b)
 	return (ap->addr != ap->entry->addr) - (bp->addr != bp->entry->addr);
 }
 
+static bool e820_nomerge(enum e820_type type)
+{
+	/*
+	 * These types may indicate distinct platform ranges aligned to
+	 * numa node, protection domain, performance domain, or other
+	 * boundaries. Do not merge them.
+	 */
+	if (type == E820_TYPE_PRAM)
+		return true;
+	if (type == E820_TYPE_SOFT_RESERVED)
+		return true;
+	return false;
+}
+
 int __init e820__update_table(struct e820_table *table)
 {
 	struct e820_entry *entries = table->entries;
@@ -380,7 +394,7 @@ int __init e820__update_table(struct e820_table *table)
 		}
 
 		/* Continue building up new map based on this information: */
-		if (current_type != last_type || current_type == E820_TYPE_PRAM) {
+		if (current_type != last_type || e820_nomerge(current_type)) {
 			if (last_type != 0)	 {
 				new_entries[new_nr_entries].size = change_point[chg_idx]->addr - last_addr;
 				/* Move forward only if the new size was non-zero: */
diff --git a/drivers/firmware/efi/x86_fake_mem.c b/drivers/firmware/efi/x86_fake_mem.c
index e5d6d5a1b240..0bafcc1bb0f6 100644
--- a/drivers/firmware/efi/x86_fake_mem.c
+++ b/drivers/firmware/efi/x86_fake_mem.c
@@ -38,7 +38,7 @@ void __init efi_fake_memmap_early(void)
 		m_start = mem->range.start;
 		m_end = mem->range.end;
 		for_each_efi_memory_desc(md) {
-			u64 start, end;
+			u64 start, end, size;
 
 			if (md->type != EFI_CONVENTIONAL_MEMORY)
 				continue;
@@ -58,11 +58,17 @@ void __init efi_fake_memmap_early(void)
 			 */
 			start = max(start, m_start);
 			end = min(end, m_end);
+			size = end - start + 1;
 
 			if (end <= start)
 				continue;
-			e820__range_update(start, end - start + 1, E820_TYPE_RAM,
-					E820_TYPE_SOFT_RESERVED);
+
+			/*
+			 * Ensure each efi_fake_mem instance results in
+			 * a unique e820 resource
+			 */
+			e820__range_remove(start, size, E820_TYPE_RAM, 1);
+			e820__range_add(start, size, E820_TYPE_SOFT_RESERVED);
 			e820__update_table(e820_table);
 		}
 	}



* [PATCH 3/5] ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
  2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
  2020-03-02 22:20 ` [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option Dan Williams
  2020-03-02 22:20 ` [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
  2020-03-02 22:20 ` [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback Dan Williams
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
  To: linux-acpi
  Cc: Rafael J. Wysocki, peterz, dave.hansen, ard.biesheuvel,
	linux-nvdimm, linux-kernel

In preparation for exposing "Soft Reserved" memory ranges without an
HMAT, move the hmem device registration to its own compilation unit and
make the implementation generic.

The generic implementation drops the use of acpi_map_pxm_to_online_node(),
which translated ACPI proximity domain values, and instead relies on
numa_map_to_online_node() to determine the numa node for the device.
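A model of numa_map_to_online_node() illustrates the split between target_node and the device's numa_node. A node that is online, or NUMA_NO_NODE, maps to itself. An offline node maps to the closest online node by distance. The four-node topology and distance table are invented for illustration. map_to_online_node_model() is a sketch of the upstream logic as we understand it, not a copy of it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
#define MODEL_NR_NODES 4

/*
 * Hypothetical topology: nodes 0 and 1 are online CPU/DRAM nodes, and
 * nodes 2 and 3 are offline memory-only target nodes. Distances follow
 * the SLIT convention where 10 means local.
 */
static const bool node_online_model[MODEL_NR_NODES] = { true, true, false, false };
static const int node_distance_model[MODEL_NR_NODES][MODEL_NR_NODES] = {
	{ 10, 21, 17, 28 },
	{ 21, 10, 28, 17 },
	{ 17, 28, 10, 28 },
	{ 28, 17, 28, 10 },
};

/* Return @node if usable as-is, else the nearest online node. */
static int map_to_online_node_model(int node)
{
	int min_dist = INT_MAX, min_node = node;

	if (node == NUMA_NO_NODE || node_online_model[node])
		return node;

	for (int n = 0; n < MODEL_NR_NODES; n++) {
		if (!node_online_model[n])
			continue;
		if (node_distance_model[node][n] < min_dist) {
			min_dist = node_distance_model[node][n];
			min_node = n;
		}
	}
	return min_node;
}
```

In this model, an hmem device whose memory is on offline node 2 keeps target_node = 2 in its memregion_info. Its pdev->dev.numa_node would be online node 0.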

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/numa/hmat.c  |   68 ++++-----------------------------------------
 drivers/dax/Kconfig       |    4 +++
 drivers/dax/Makefile      |    3 +-
 drivers/dax/hmem/Makefile |    5 +++
 drivers/dax/hmem/device.c |   64 ++++++++++++++++++++++++++++++++++++++++++
 drivers/dax/hmem/hmem.c   |    2 +
 include/linux/dax.h       |    8 +++++
 7 files changed, 89 insertions(+), 65 deletions(-)
 create mode 100644 drivers/dax/hmem/Makefile
 create mode 100644 drivers/dax/hmem/device.c
 rename drivers/dax/{hmem.c => hmem/hmem.c} (98%)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index d3db121e393a..2379efcea570 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -24,6 +24,7 @@
 #include <linux/mutex.h>
 #include <linux/node.h>
 #include <linux/sysfs.h>
+#include <linux/dax.h>
 
 static u8 hmat_revision;
 int hmat_disable __initdata;
@@ -635,66 +636,6 @@ static void hmat_register_target_perf(struct memory_target *target)
 	node_set_perf_attrs(mem_nid, &target->hmem_attrs, 0);
 }
 
-static void hmat_register_target_device(struct memory_target *target,
-		struct resource *r)
-{
-	/* define a clean / non-busy resource for the platform device */
-	struct resource res = {
-		.start = r->start,
-		.end = r->end,
-		.flags = IORESOURCE_MEM,
-	};
-	struct platform_device *pdev;
-	struct memregion_info info;
-	int rc, id;
-
-	rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
-			IORES_DESC_SOFT_RESERVED);
-	if (rc != REGION_INTERSECTS)
-		return;
-
-	id = memregion_alloc(GFP_KERNEL);
-	if (id < 0) {
-		pr_err("memregion allocation failure for %pr\n", &res);
-		return;
-	}
-
-	pdev = platform_device_alloc("hmem", id);
-	if (!pdev) {
-		pr_err("hmem device allocation failure for %pr\n", &res);
-		goto out_pdev;
-	}
-
-	pdev->dev.numa_node = acpi_map_pxm_to_online_node(target->memory_pxm);
-	info = (struct memregion_info) {
-		.target_node = acpi_map_pxm_to_node(target->memory_pxm),
-	};
-	rc = platform_device_add_data(pdev, &info, sizeof(info));
-	if (rc < 0) {
-		pr_err("hmem memregion_info allocation failure for %pr\n", &res);
-		goto out_pdev;
-	}
-
-	rc = platform_device_add_resources(pdev, &res, 1);
-	if (rc < 0) {
-		pr_err("hmem resource allocation failure for %pr\n", &res);
-		goto out_resource;
-	}
-
-	rc = platform_device_add(pdev);
-	if (rc < 0) {
-		dev_err(&pdev->dev, "device add failed for %pr\n", &res);
-		goto out_resource;
-	}
-
-	return;
-
-out_resource:
-	put_device(&pdev->dev);
-out_pdev:
-	memregion_free(id);
-}
-
 static void hmat_register_target_devices(struct memory_target *target)
 {
 	struct resource *res;
@@ -706,8 +647,11 @@ static void hmat_register_target_devices(struct memory_target *target)
 	if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM))
 		return;
 
-	for (res = target->memregions.child; res; res = res->sibling)
-		hmat_register_target_device(target, res);
+	for (res = target->memregions.child; res; res = res->sibling) {
+		int target_nid = acpi_map_pxm_to_node(target->memory_pxm);
+
+		hmem_register_device(target_nid, res);
+	}
 }
 
 static void hmat_register_target(struct memory_target *target)
diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 3b6c06f07326..a229f45d34aa 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -48,6 +48,10 @@ config DEV_DAX_HMEM
 
 	  Say M if unsure.
 
+config DEV_DAX_HMEM_DEVICES
+	depends on DEV_DAX_HMEM
+	def_bool y
+
 config DEV_DAX_KMEM
 	tristate "KMEM DAX: volatile-use of persistent memory"
 	default DEV_DAX
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 80065b38b3c4..9d4ba672d305 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -2,11 +2,10 @@
 obj-$(CONFIG_DAX) += dax.o
 obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
-obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
 
 dax-y := super.o
 dax-y += bus.o
 device_dax-y := device.o
-dax_hmem-y := hmem.o
 
 obj-y += pmem/
+obj-y += hmem/
diff --git a/drivers/dax/hmem/Makefile b/drivers/dax/hmem/Makefile
new file mode 100644
index 000000000000..a9d353d0c9ed
--- /dev/null
+++ b/drivers/dax/hmem/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
+obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device.o
+
+dax_hmem-y := hmem.o
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
new file mode 100644
index 000000000000..99bc15a8b031
--- /dev/null
+++ b/drivers/dax/hmem/device.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/platform_device.h>
+#include <linux/memregion.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+
+void hmem_register_device(int target_nid, struct resource *r)
+{
+	/* define a clean / non-busy resource for the platform device */
+	struct resource res = {
+		.start = r->start,
+		.end = r->end,
+		.flags = IORESOURCE_MEM,
+	};
+	struct platform_device *pdev;
+	struct memregion_info info;
+	int rc, id;
+
+	rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
+			IORES_DESC_SOFT_RESERVED);
+	if (rc != REGION_INTERSECTS)
+		return;
+
+	id = memregion_alloc(GFP_KERNEL);
+	if (id < 0) {
+		pr_err("memregion allocation failure for %pr\n", &res);
+		return;
+	}
+
+	pdev = platform_device_alloc("hmem", id);
+	if (!pdev) {
+		pr_err("hmem device allocation failure for %pr\n", &res);
+		goto out_pdev;
+	}
+
+	pdev->dev.numa_node = numa_map_to_online_node(target_nid);
+	info = (struct memregion_info) {
+		.target_node = target_nid,
+	};
+	rc = platform_device_add_data(pdev, &info, sizeof(info));
+	if (rc < 0) {
+		pr_err("hmem memregion_info allocation failure for %pr\n", &res);
+		goto out_pdev;
+	}
+
+	rc = platform_device_add_resources(pdev, &res, 1);
+	if (rc < 0) {
+		pr_err("hmem resource allocation failure for %pr\n", &res);
+		goto out_resource;
+	}
+
+	rc = platform_device_add(pdev);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "device add failed for %pr\n", &res);
+		goto out_resource;
+	}
+
+	return;
+
+out_resource:
+	put_device(&pdev->dev);
+out_pdev:
+	memregion_free(id);
+}
diff --git a/drivers/dax/hmem.c b/drivers/dax/hmem/hmem.c
similarity index 98%
rename from drivers/dax/hmem.c
rename to drivers/dax/hmem/hmem.c
index fe7214daf62e..29ceb5795297 100644
--- a/drivers/dax/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -3,7 +3,7 @@
 #include <linux/memregion.h>
 #include <linux/module.h>
 #include <linux/pfn_t.h>
-#include "bus.h"
+#include "../bus.h"
 
 static int dax_hmem_probe(struct platform_device *pdev)
 {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9bd8528bd305..9f6c282e9140 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -239,4 +239,12 @@ static inline bool dax_mapping(struct address_space *mapping)
 	return mapping->host && IS_DAX(mapping->host);
 }
 
+#ifdef CONFIG_DEV_DAX_HMEM_DEVICES
+void hmem_register_device(int target_nid, struct resource *r);
+#else
+static inline void hmem_register_device(int target_nid, struct resource *r)
+{
+}
+#endif
+
 #endif



* [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback
  2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
                   ` (2 preceding siblings ...)
  2020-03-02 22:20 ` [PATCH 3/5] ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
  2020-03-05 14:42   ` Tom Lendacky
  2020-03-02 22:20 ` [PATCH 5/5] ACPI: HMAT: Attach a device for each soft-reserved range Dan Williams
  2020-03-06 20:07 ` [PATCH 0/5] Manual definition of Soft Reserved memory devices Jeff Moyer
  5 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
  To: linux-acpi
  Cc: Jason Gunthorpe, Dave Hansen, Wei Yang, Tom Lendacky, peterz,
	ard.biesheuvel, linux-nvdimm, linux-kernel

In support of detecting whether a resource might have been claimed,
report the parent to the walk_iomem_res_desc() callback. For example,
the ACPI HMAT parser publishes "hmem" platform devices per target range.
However, if the HMAT is disabled or missing, a fallback driver can attach
devices to the raw memory ranges if it sees unclaimed / orphan "Soft
Reserved" resources in the resource tree.

Without this change, find_next_iomem_res() returns a resource whose
res->parent field contains garbage from the stack allocation in
__walk_iomem_res_desc().
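A toy model shows why ->parent must survive the copy into the caller's stack buffer. The structures, the DESC_SOFT_RESERVED value, find_next_toy() and is_orphan() are all hypothetical. In the model, root plays the role of iomem_resource. A Soft Reserved range already wrapped by an HMAT-registered device sits one level down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down, hypothetical stand-in for struct resource. */
struct toy_res {
	uint64_t start, end;
	unsigned long desc;
	struct toy_res *parent;
};

#define DESC_SOFT_RESERVED 7UL	/* illustrative value */

/* Plays the role of iomem_resource. */
static struct toy_res root = { 0, UINT64_MAX, 0, NULL };

/*
 * Copy the first resource in @list that ends at or after @start and
 * matches @desc into @out, including ->parent as the patch does.
 * Returns 0 on success, -1 if nothing matches.
 */
static int find_next_toy(struct toy_res *const *list, size_t n, uint64_t start,
			 unsigned long desc, struct toy_res *out)
{
	for (size_t i = 0; i < n; i++) {
		const struct toy_res *p = list[i];

		if (p->end < start || p->desc != desc)
			continue;
		out->start = p->start > start ? p->start : start;
		out->end = p->end;
		out->desc = p->desc;
		out->parent = p->parent;	/* the one-line fix */
		return 0;
	}
	return -1;
}

/* Fallback policy: only top-level (unclaimed) ranges get a device. */
static int is_orphan(const struct toy_res *res)
{
	return res->parent == &root;
}
```

If the parent assignment were missing, out->parent would hold leftover stack contents. An is_orphan()-style check like the one in patch 5 would then be unreliable.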

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 kernel/resource.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/resource.c b/kernel/resource.c
index 76036a41143b..6e22e312fd55 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -386,6 +386,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
 		res->end = min(end, p->end);
 		res->flags = p->flags;
 		res->desc = p->desc;
+		res->parent = p->parent;
 	}
 
 	read_unlock(&resource_lock);



* [PATCH 5/5] ACPI: HMAT: Attach a device for each soft-reserved range
  2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
                   ` (3 preceding siblings ...)
  2020-03-02 22:20 ` [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback Dan Williams
@ 2020-03-02 22:20 ` Dan Williams
  2020-03-06 20:07 ` [PATCH 0/5] Manual definition of Soft Reserved memory devices Jeff Moyer
  5 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2020-03-02 22:20 UTC (permalink / raw)
  To: linux-acpi
  Cc: Jonathan Cameron, Brice Goglin, Ard Biesheuvel,
	Rafael J. Wysocki, Jeff Moyer, peterz, dave.hansen, linux-nvdimm,
	linux-kernel

The hmem enabling in commit 'cf8741ac57ed ("ACPI: NUMA: HMAT: Register
"soft reserved" memory as an "hmem" device")' only registered ranges to
the hmem driver for each soft-reservation that also appeared in the
HMAT. While this is meant to encourage platform firmware to "do the
right thing" and publish an HMAT, the corollary is that platforms that
fail to publish an accurate HMAT will strand memory from Linux usage.
Additionally, memory designated with the "efi_fake_mem" kernel command
line option will be stranded by default when there is no HMAT.

Arrange for "soft reserved" memory that goes unclaimed by HMAT entries
to be published as raw resource ranges for the hmem driver to consume.

Include a module parameter to disable either this fallback behavior or
the HMAT enabling from creating hmem devices. The module parameter
requires the hmem device enabling to have a unique name in the module
namespace: "device_hmem".
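Without an HMAT there is no proximity domain to translate. The fallback path therefore asks phys_to_target_node() for the node of res->start. The NUMA_KEEP_MEMINFO select keeps the range-to-node information for that lookup available after boot. Below is a minimal sketch of such a range lookup. The table is invented, and its node numbers are illustrative only, chosen to resemble the daxctl output in the cover letter. phys_to_target_node_model() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)
#define GiB (1ULL << 30)

/* Hypothetical retained range-to-node table (inclusive bounds). */
struct meminfo_range {
	uint64_t start, end;
	int nid;
};

static const struct meminfo_range ranges[] = {
	{ 0,        9 * GiB - 1,  0 },
	{ 9 * GiB,  13 * GiB - 1, 2 },
	{ 13 * GiB, 17 * GiB - 1, 3 },
};

/* Return the node owning @addr, or NUMA_NO_NODE if no range covers it. */
static int phys_to_target_node_model(uint64_t addr)
{
	for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
		if (addr >= ranges[i].start && addr <= ranges[i].end)
			return ranges[i].nid;
	return NUMA_NO_NODE;
}
```

Addresses outside every known range return NUMA_NO_NODE. hmem_register_device() would then pass NUMA_NO_NODE through unchanged, both as the target node and, via numa_map_to_online_node(), as the device's numa_node.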

Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/Kconfig       |    1 +
 drivers/dax/hmem/Makefile |    3 ++-
 drivers/dax/hmem/device.c |   33 +++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index a229f45d34aa..163edde6ba41 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -50,6 +50,7 @@ config DEV_DAX_HMEM
 
 config DEV_DAX_HMEM_DEVICES
 	depends on DEV_DAX_HMEM
+	select NUMA_KEEP_MEMINFO if NUMA
 	def_bool y
 
 config DEV_DAX_KMEM
diff --git a/drivers/dax/hmem/Makefile b/drivers/dax/hmem/Makefile
index a9d353d0c9ed..57377b4c3d47 100644
--- a/drivers/dax/hmem/Makefile
+++ b/drivers/dax/hmem/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
-obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device.o
+obj-$(CONFIG_DEV_DAX_HMEM_DEVICES) += device_hmem.o
 
+device_hmem-y := device.o
 dax_hmem-y := hmem.o
diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
index 99bc15a8b031..f9c5fa8b1880 100644
--- a/drivers/dax/hmem/device.c
+++ b/drivers/dax/hmem/device.c
@@ -4,6 +4,9 @@
 #include <linux/module.h>
 #include <linux/mm.h>
 
+static bool nohmem;
+module_param_named(disable, nohmem, bool, 0444);
+
 void hmem_register_device(int target_nid, struct resource *r)
 {
 	/* define a clean / non-busy resource for the platform device */
@@ -16,6 +19,9 @@ void hmem_register_device(int target_nid, struct resource *r)
 	struct memregion_info info;
 	int rc, id;
 
+	if (nohmem)
+		return;
+
 	rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
 			IORES_DESC_SOFT_RESERVED);
 	if (rc != REGION_INTERSECTS)
@@ -62,3 +68,30 @@ void hmem_register_device(int target_nid, struct resource *r)
 out_pdev:
 	memregion_free(id);
 }
+
+static __init int hmem_register_one(struct resource *res, void *data)
+{
+	/*
+	 * If the resource is not a top-level resource it was already
+	 * assigned to a device by the HMAT parsing.
+	 */
+	if (res->parent != &iomem_resource)
+		return 0;
+
+	hmem_register_device(phys_to_target_node(res->start), res);
+
+	return 0;
+}
+
+static __init int hmem_init(void)
+{
+	walk_iomem_res_desc(IORES_DESC_SOFT_RESERVED,
+			IORESOURCE_MEM, 0, -1, NULL, hmem_register_one);
+	return 0;
+}
+
+/*
+ * As this is a fallback for address ranges unclaimed by the ACPI HMAT
+ * parsing it must be at an initcall level greater than hmat_init().
+ */
+late_initcall(hmem_init);



* Re: [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
  2020-03-02 22:20 ` [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance Dan Williams
@ 2020-03-03  8:01   ` Ard Biesheuvel
  0 siblings, 0 replies; 15+ messages in thread
From: Ard Biesheuvel @ 2020-03-03  8:01 UTC (permalink / raw)
  To: Dan Williams
  Cc: ACPI Devel Maling List, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	Peter Zijlstra, Dave Hansen, linux-nvdimm,
	Linux Kernel Mailing List

On Mon, 2 Mar 2020 at 23:36, Dan Williams <dan.j.williams@intel.com> wrote:
>
> In preparation for attaching a platform device per iomem resource, teach
> the efi_fake_mem code to create an e820 entry per instance. Similar to
> E820_TYPE_PRAM, bypass merging resources when the e820 map is sanitized.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Acked-by: Ard Biesheuvel <ardb@kernel.org>

> ---
>  arch/x86/kernel/e820.c              |   16 +++++++++++++++-
>  drivers/firmware/efi/x86_fake_mem.c |   12 +++++++++---
>  2 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index c5399e80c59c..96babb3a6629 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -305,6 +305,20 @@ static int __init cpcompare(const void *a, const void *b)
>         return (ap->addr != ap->entry->addr) - (bp->addr != bp->entry->addr);
>  }
>
> +static bool e820_nomerge(enum e820_type type)
> +{
> +       /*
> +        * These types may indicate distinct platform ranges aligned to
> +        * numa node, protection domain, performance domain, or other
> +        * boundaries. Do not merge them.
> +        */
> +       if (type == E820_TYPE_PRAM)
> +               return true;
> +       if (type == E820_TYPE_SOFT_RESERVED)
> +               return true;
> +       return false;
> +}
> +
>  int __init e820__update_table(struct e820_table *table)
>  {
>         struct e820_entry *entries = table->entries;
> @@ -380,7 +394,7 @@ int __init e820__update_table(struct e820_table *table)
>                 }
>
>                 /* Continue building up new map based on this information: */
> -               if (current_type != last_type || current_type == E820_TYPE_PRAM) {
> +               if (current_type != last_type || e820_nomerge(current_type)) {
>                         if (last_type != 0)      {
>                                 new_entries[new_nr_entries].size = change_point[chg_idx]->addr - last_addr;
>                                 /* Move forward only if the new size was non-zero: */
> diff --git a/drivers/firmware/efi/x86_fake_mem.c b/drivers/firmware/efi/x86_fake_mem.c
> index e5d6d5a1b240..0bafcc1bb0f6 100644
> --- a/drivers/firmware/efi/x86_fake_mem.c
> +++ b/drivers/firmware/efi/x86_fake_mem.c
> @@ -38,7 +38,7 @@ void __init efi_fake_memmap_early(void)
>                 m_start = mem->range.start;
>                 m_end = mem->range.end;
>                 for_each_efi_memory_desc(md) {
> -                       u64 start, end;
> +                       u64 start, end, size;
>
>                         if (md->type != EFI_CONVENTIONAL_MEMORY)
>                                 continue;
> @@ -58,11 +58,17 @@ void __init efi_fake_memmap_early(void)
>                          */
>                         start = max(start, m_start);
>                         end = min(end, m_end);
> +                       size = end - start + 1;
>
>                         if (end <= start)
>                                 continue;
> -                       e820__range_update(start, end - start + 1, E820_TYPE_RAM,
> -                                       E820_TYPE_SOFT_RESERVED);
> +
> +                       /*
> +                        * Ensure each efi_fake_mem instance results in
> +                        * a unique e820 resource
> +                        */
> +                       e820__range_remove(start, size, E820_TYPE_RAM, 1);
> +                       e820__range_add(start, size, E820_TYPE_SOFT_RESERVED);
>                         e820__update_table(e820_table);
>                 }
>         }
>


* Re: [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback
  2020-03-02 22:20 ` [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback Dan Williams
@ 2020-03-05 14:42   ` Tom Lendacky
  2020-03-17 22:04     ` Dan Williams
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Lendacky @ 2020-03-05 14:42 UTC (permalink / raw)
  To: Dan Williams, linux-acpi
  Cc: Jason Gunthorpe, Dave Hansen, Wei Yang, peterz, ard.biesheuvel,
	linux-nvdimm, linux-kernel

On 3/2/20 4:20 PM, Dan Williams wrote:
> In support of detecting whether a resource might have been claimed,
> report the parent to the walk_iomem_res_desc() callback. For example,
> the ACPI HMAT parser publishes "hmem" platform devices per target range.
> However, if the HMAT is disabled or missing, a fallback driver can attach
> devices to the raw memory ranges if it sees unclaimed / orphan "Soft
> Reserved" resources in the resource tree.
> 
> Otherwise, find_next_iomem_res() returns a resource with garbage data
> from the stack allocation in __walk_iomem_res_desc() for the res->parent
> field.

Just wondering if we shouldn't just copy the complete resource struct and
just override the start and end values? That way, if some code in the
future wants to look at sibling or child values, another change isn't needed.

Just a thought.

Thanks,
Tom

> 
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  kernel/resource.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 76036a41143b..6e22e312fd55 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -386,6 +386,7 @@ static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>  		res->end = min(end, p->end);
>  		res->flags = p->flags;
>  		res->desc = p->desc;
> +		res->parent = p->parent;
>  	}
>  
>  	read_unlock(&resource_lock);
> 
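Tom's alternative copies the whole resource and then clamps the bounds; a rough sketch follows. The struct is a cut-down, hypothetical mirror of struct resource, and copy_clamped() is an illustrative name, not a kernel helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down, hypothetical mirror of struct resource. */
struct toy_resource {
	uint64_t start, end;
	const char *name;
	unsigned long flags, desc;
	struct toy_resource *parent, *sibling, *child;
};

/*
 * Copy every field of @p, then clamp the bounds to [start, end], in the
 * same way find_next_iomem_res() clamps res->start and res->end.
 */
static void copy_clamped(struct toy_resource *out, const struct toy_resource *p,
			 uint64_t start, uint64_t end)
{
	*out = *p;	/* parent, sibling, child, name, ... all carried over */
	out->start = p->start > start ? p->start : start;
	out->end = p->end < end ? p->end : end;
}
```

One consideration: find_next_iomem_res() makes its copy under resource_lock and drops the lock before returning. Any copied sibling and child pointers would therefore be snapshots of the tree. The same caveat already applies to the parent pointer this patch adds.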


* Re: [PATCH 0/5] Manual definition of Soft Reserved memory devices
  2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
                   ` (4 preceding siblings ...)
  2020-03-02 22:20 ` [PATCH 5/5] ACPI: HMAT: Attach a device for each soft-reserved range Dan Williams
@ 2020-03-06 20:07 ` Jeff Moyer
  2020-03-06 21:05   ` Dan Williams
  5 siblings, 1 reply; 15+ messages in thread
From: Jeff Moyer @ 2020-03-06 20:07 UTC
  To: Dan Williams
  Cc: linux-acpi, Jason Gunthorpe, Peter Zijlstra, Ard Biesheuvel,
	Jonathan Cameron, Borislav Petkov, Wei Yang, x86, H. Peter Anvin,
	Brice Goglin, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	Rafael J. Wysocki, Ard Biesheuvel, Andy Lutomirski, Tom Lendacky,
	linux-nvdimm, linux-kernel

Dan Williams <dan.j.williams@intel.com> writes:

> Given the current dearth of systems that supply an ACPI HMAT table, and
> the utility of being able to manually define device-dax "hmem" instances
> via the efi_fake_mem= option, relax the requirements for creating these
> devices. Specifically, add an option (numa=nohmat) to optionally disable
> consideration of the HMAT and update efi_fake_mem= to behave like
> memmap=nn!ss in terms of delimiting device boundaries.

So, am I correct in deducing that your primary motivation is testing
without hardware/firmware support?  This looks like a bit of a hack to
me, and I think maybe it would be better to just emulate the HMAT using
qemu.  I don't have a strong objection, though.

-Jeff

>
> All review welcome of course, but the E820 changes want an x86
> maintainer ack, the efi_fake_mem update needs Ard, and Rafael has
> previously shepherded the HMAT changes. For the changes to
> kernel/resource.c, where there is no clear maintainer, I just copied the
> last few people to make thoughtful changes in that area. I am happy to
> take these through the nvdimm tree along with these prerequisites
> already in -next:
>
> b2ca916ce392 ACPI: NUMA: Up-level "map to online node" functionality
> 4fcbe96e4d0b mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node()
> 575e23b6e13c powerpc/papr_scm: Switch to numa_map_to_online_node()
> 1e5d8e1e47af x86/mm: Introduce CONFIG_NUMA_KEEP_MEMINFO
> 5d30f92e7631 x86/NUMA: Provide a range-to-target_node lookup facility
> 7b27a8622f80 libnvdimm/e820: Retrieve and populate correct 'target_node' info
>
> Tested with:
>
>         numa=nohmat efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000
>
> ...to create two device-dax instances:
>
> 	# daxctl list -RDu
> 	[
> 	  {
> 	    "path":"\/platform\/hmem.1",
> 	    "id":1,
> 	    "size":"4.00 GiB (4.29 GB)",
> 	    "align":2097152,
> 	    "devices":[
> 	      {
> 	        "chardev":"dax1.0",
> 	        "size":"4.00 GiB (4.29 GB)",
> 	        "target_node":3,
> 	        "mode":"devdax"
> 	      }
> 	    ]
> 	  },
> 	  {
> 	    "path":"\/platform\/hmem.0",
> 	    "id":0,
> 	    "size":"4.00 GiB (4.29 GB)",
> 	    "align":2097152,
> 	    "devices":[
> 	      {
> 	        "chardev":"dax0.0",
> 	        "size":"4.00 GiB (4.29 GB)",
> 	        "target_node":2,
> 	        "mode":"devdax"
> 	      }
> 	    ]
> 	  }
> 	]
>
> ---
>
> Dan Williams (5):
>       ACPI: NUMA: Add 'nohmat' option
>       efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance
>       ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device
>       resource: Report parent to walk_iomem_res_desc() callback
>       ACPI: HMAT: Attach a device for each soft-reserved range
>
>
>  arch/x86/kernel/e820.c              |   16 +++++-
>  arch/x86/mm/numa.c                  |    4 +
>  drivers/acpi/numa/hmat.c            |   71 +++-----------------------
>  drivers/dax/Kconfig                 |    5 ++
>  drivers/dax/Makefile                |    3 -
>  drivers/dax/hmem/Makefile           |    6 ++
>  drivers/dax/hmem/device.c           |   97 +++++++++++++++++++++++++++++++++++
>  drivers/dax/hmem/hmem.c             |    2 -
>  drivers/firmware/efi/x86_fake_mem.c |   12 +++-
>  include/acpi/acpi_numa.h            |    1 
>  include/linux/dax.h                 |    8 +++
>  kernel/resource.c                   |    1 
>  12 files changed, 156 insertions(+), 70 deletions(-)
>  create mode 100644 drivers/dax/hmem/Makefile
>  create mode 100644 drivers/dax/hmem/device.c
>  rename drivers/dax/{hmem.c => hmem/hmem.c} (98%)
>
> base-commit: 7b27a8622f802761d5c6abd6c37b22312a35343c
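The test command line packs two ranges into `efi_fake_mem=`, each written as size@start:attribute, where 0x40000 is the EFI_MEMORY_SP (specific-purpose, "soft reserved") attribute bit. The userspace sketch below decodes that string. `parse_size()` and `parse_fake_mem()` are hypothetical helpers in the spirit of the kernel's `memparse()` suffix handling, not the kernel's actual parser, and they do only minimal validation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EFI_MEMORY_SP 0x40000ULL /* specific-purpose memory attribute bit */

struct fake_range {
	uint64_t start, size, attr;
};

/* Parse a number with an optional K/M/G/T (binary) suffix, e.g. "4G". */
static uint64_t parse_size(const char *s, const char **end)
{
	char *e;
	uint64_t v = strtoull(s, &e, 0);

	switch (*e) {
	case 'T': case 't': v <<= 10; /* fall through */
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; e++; break;
	default: break;
	}
	*end = e;
	return v;
}

/*
 * Parse "size@start:attr[,size@start:attr...]" into @out, storing at most
 * @max entries. Returns the number of ranges parsed, or -1 if a separator
 * is missing or unexpected.
 */
static int parse_fake_mem(const char *arg, struct fake_range *out, int max)
{
	const char *p = arg;
	char *e;
	int n = 0;

	while (*p && n < max) {
		out[n].size = parse_size(p, &p);
		if (*p++ != '@')
			return -1;
		out[n].start = parse_size(p, &p);
		if (*p++ != ':')
			return -1;
		out[n].attr = strtoull(p, &e, 0);
		p = e;
		n++;
		if (*p == ',')
			p++;
		else if (*p != '\0')
			return -1;
	}
	return n;
}
```

Parsing "4G@9G:0x40000,4G@13G:0x40000" yields [9G, 13G) and [13G, 17G), two back-to-back ranges. Going by the cover letter's "behave like memmap=nn!ss" wording, keeping each instance as its own resource entry is presumably what lets them surface as hmem.0 and hmem.1 rather than as one 8G device.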



* Re: [PATCH 0/5] Manual definition of Soft Reserved memory devices
  2020-03-06 20:07 ` [PATCH 0/5] Manual definition of Soft Reserved memory devices Jeff Moyer
@ 2020-03-06 21:05   ` Dan Williams
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2020-03-06 21:05 UTC
  To: Jeff Moyer
  Cc: Linux ACPI, Jason Gunthorpe, Peter Zijlstra, Ard Biesheuvel,
	Jonathan Cameron, Borislav Petkov, Wei Yang, X86 ML,
	H. Peter Anvin, Brice Goglin, Thomas Gleixner, Ingo Molnar,
	Dave Hansen, Rafael J. Wysocki, Ard Biesheuvel, Andy Lutomirski,
	Tom Lendacky, linux-nvdimm, Linux Kernel Mailing List,
	Joao Martins

On Fri, Mar 6, 2020 at 12:07 PM Jeff Moyer <jmoyer@redhat.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> writes:
>
> > Given the current dearth of systems that supply an ACPI HMAT table, and
> > the utility of being able to manually define device-dax "hmem" instances
> > via the efi_fake_mem= option, relax the requirements for creating these
> > devices. Specifically, add an option (numa=nohmat) to optionally disable
> > consideration of the HMAT and update efi_fake_mem= to behave like
> > memmap=nn!ss in terms of delimiting device boundaries.
>
> So, am I correct in deducing that your primary motivation is testing
> without hardware/firmware support?

My primary motivation is making the dax_kmem facility useful to
shipping platforms that have performance differentiated memory, but
may not have EFI-defined soft-reservations / HMAT (or
non-EFI-ACPI-platform equivalent). I'm anticipating HMAT enabled
platforms where the platform firmware policy for what is
soft-reserved, or not, is not the policy the system owner would pick.
I'd also highlight Joao's work [1] (see the TODO section) as an
indication of the demand for custom carving memory resources and
applying the device-dax memory management interface.

> This looks like a bit of a hack to
> me, and I think maybe it would be better to just emulate the HMAT using
> qemu.  I don't have a strong objection, though.

Yeah, qemu emulation does not help when you, the system owner, have a
different use case than what the bare-metal platform-firmware
envisioned for "specific-purpose memory".

[1]: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/


* Re: [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback
  2020-03-05 14:42   ` Tom Lendacky
@ 2020-03-17 22:04     ` Dan Williams
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2020-03-17 22:04 UTC
  To: Tom Lendacky
  Cc: Linux ACPI, Jason Gunthorpe, Dave Hansen, Wei Yang,
	Peter Zijlstra, Ard Biesheuvel, linux-nvdimm,
	Linux Kernel Mailing List

On Thu, Mar 5, 2020 at 6:42 AM Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 3/2/20 4:20 PM, Dan Williams wrote:
> > In support of detecting whether a resource might have been claimed,
> > report the parent to the walk_iomem_res_desc() callback. For example,
> > the ACPI HMAT parser publishes "hmem" platform devices per target range.
> > However, if the HMAT is disabled / missing, a fallback driver can attach
> > devices to the raw memory ranges as a fallback if it sees unclaimed /
> > orphan "Soft Reserved" resources in the resource tree.
> >
> > Otherwise, find_next_iomem_res() returns a resource with garbage data
> > from the stack allocation in __walk_iomem_res_desc() for the res->parent
> > field.
>
> Just wondering if we shouldn't just copy the complete resource struct and
> just override the start and end values? That way, if some code in the
> future wants to look at sibling or child values, another change isn't needed.
>
> Just a thought.

Thanks for taking a look. I think it's ok to come update this again if
that need arises.
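For comparison, Tom's suggestion could look roughly like the sketch below: copy the whole matched entry, then clip `start` and `end`. Both helpers and `struct res` are hypothetical and simplified, not the kernel's `struct resource` or `find_next_iomem_res()`. The point it shows is that a struct assignment carries every field along (including `name`, `sibling` and `child`), whereas the patch as posted copies only the fields it names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource, including the tree links. */
struct res {
	uint64_t start, end;
	const char *name;
	unsigned long flags;
	int desc;
	struct res *parent, *sibling, *child;
};

/* Field-by-field copy, as in the patch under review. */
static void copy_fields(const struct res *p, uint64_t start, uint64_t end,
			struct res *out)
{
	out->start = start > p->start ? start : p->start;
	out->end = end < p->end ? end : p->end;
	out->flags = p->flags;
	out->desc = p->desc;
	out->parent = p->parent;
}

/* Tom's alternative: copy everything, then override the clipped bounds. */
static void copy_whole(const struct res *p, uint64_t start, uint64_t end,
		       struct res *out)
{
	*out = *p;
	if (start > out->start)
		out->start = start;
	if (end < out->end)
		out->end = end;
}
```

Both produce the same clipped bounds; only the whole-struct copy also exposes the remaining fields to callbacks, which is the future-proofing Tom had in mind and which Dan chose to defer until a caller needs it.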


* Re: [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
  2020-03-02 22:20 ` [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option Dan Williams
@ 2020-03-18  0:08   ` Dan Williams
  2020-03-18  8:24     ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-18  0:08 UTC
  To: Linux ACPI
  Cc: X86 ML, Rafael J. Wysocki, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Ard Biesheuvel, linux-nvdimm,
	Linux Kernel Mailing List

On Mon, Mar 2, 2020 at 2:36 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Disable parsing of the HMAT for debug, to work around broken platform
> instances, or cases where it is otherwise not wanted.

Rafael, any heartburn with this change to the numa= option?

...as I look at this I realize I failed to also update
Documentation/x86/x86_64/boot-options.rst, will fix.

>
> Cc: x86@kernel.org
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/mm/numa.c       |    4 ++++
>  drivers/acpi/numa/hmat.c |    3 ++-
>  include/acpi/acpi_numa.h |    1 +
>  3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 59ba008504dc..22de2e2610c1 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
>  #ifdef CONFIG_ACPI_NUMA
>         if (!strncmp(opt, "noacpi", 6))
>                 acpi_numa = -1;
> +#ifdef CONFIG_ACPI_HMAT
> +       if (!strncmp(opt, "nohmat", 6))
> +               hmat_disable = 1;
> +#endif
>  #endif
>         return 0;
>  }
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 2c32cfb72370..d3db121e393a 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -26,6 +26,7 @@
>  #include <linux/sysfs.h>
>
>  static u8 hmat_revision;
> +int hmat_disable __initdata;
>
>  static LIST_HEAD(targets);
>  static LIST_HEAD(initiators);
> @@ -814,7 +815,7 @@ static __init int hmat_init(void)
>         enum acpi_hmat_type i;
>         acpi_status status;
>
> -       if (srat_disabled())
> +       if (srat_disabled() || hmat_disable)
>                 return 0;
>
>         status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl);
> diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> index fdebcfc6c8df..48ca468e9b61 100644
> --- a/include/acpi/acpi_numa.h
> +++ b/include/acpi/acpi_numa.h
> @@ -18,6 +18,7 @@ extern int node_to_pxm(int);
>  extern int acpi_map_pxm_to_node(int);
>  extern unsigned char acpi_srat_revision;
>  extern int acpi_numa __initdata;
> +extern int hmat_disable __initdata;
>
>  extern void bad_srat(void);
>  extern int srat_disabled(void);
>
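A userspace sketch of the control flow this patch adds: `numa=nohmat` sets a flag during early option parsing, and `hmat_init()` bails out before looking up any ACPI tables. The names mirror the diff, but this is an illustration under assumptions: `srat_disabled()` is a stub assumed to test `acpi_numa < 0`, the `CONFIG_*` guards and `__init`/`__initdata` annotations are dropped, and the stubbed `hmat_init()` returns 1 only to make "parsing would proceed" observable.

```c
#include <assert.h>
#include <string.h>

static int acpi_numa;    /* set to -1 by numa=noacpi */
static int hmat_disable; /* set to 1 by numa=nohmat */

/* Stub: assumed to mirror the kernel helper's "acpi_numa < 0" check. */
static int srat_disabled(void)
{
	return acpi_numa < 0;
}

/* Mirrors the option matching in the patched numa_setup(). */
static int numa_setup(const char *opt)
{
	if (!opt)
		return -1;
	if (!strncmp(opt, "noacpi", 6))
		acpi_numa = -1;
	if (!strncmp(opt, "nohmat", 6))
		hmat_disable = 1;
	return 0;
}

/* Returns 1 if HMAT parsing would proceed, 0 if it is skipped. */
static int hmat_init(void)
{
	if (srat_disabled() || hmat_disable)
		return 0;
	return 1; /* real code continues with acpi_get_table(ACPI_SIG_SRAT, ...) */
}
```

Note that `strncmp(opt, "nohmat", 6)` is a prefix match, so any option string beginning with "nohmat" is accepted, consistent with how the neighbouring "noacpi" check behaves.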


* Re: [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
  2020-03-18  0:08   ` Dan Williams
@ 2020-03-18  8:24     ` Rafael J. Wysocki
  2020-03-18 17:39       ` Dan Williams
  0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2020-03-18  8:24 UTC
  To: Dan Williams
  Cc: Linux ACPI, X86 ML, Rafael J. Wysocki, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Ard Biesheuvel, linux-nvdimm,
	Linux Kernel Mailing List

On Wed, Mar 18, 2020 at 1:09 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Mon, Mar 2, 2020 at 2:36 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Disable parsing of the HMAT for debug, to work around broken platform
> > instances, or cases where it is otherwise not wanted.
>
> Rafael, any heartburn with this change to the numa= option?
>
> ...as I look at this I realize I failed to also update
> Documentation/x86/x86_64/boot-options.rst, will fix.

Thanks!

Apart from this just a minor nit below.

> >
> > Cc: x86@kernel.org
> > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> >  arch/x86/mm/numa.c       |    4 ++++
> >  drivers/acpi/numa/hmat.c |    3 ++-
> >  include/acpi/acpi_numa.h |    1 +
> >  3 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > index 59ba008504dc..22de2e2610c1 100644
> > --- a/arch/x86/mm/numa.c
> > +++ b/arch/x86/mm/numa.c
> > @@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
> >  #ifdef CONFIG_ACPI_NUMA
> >         if (!strncmp(opt, "noacpi", 6))
> >                 acpi_numa = -1;
> > +#ifdef CONFIG_ACPI_HMAT
> > +       if (!strncmp(opt, "nohmat", 6))
> > +               hmat_disable = 1;
> > +#endif

I wonder if IS_ENABLED() would work here?

> >  #endif
> >         return 0;
> >  }
> > diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> > index 2c32cfb72370..d3db121e393a 100644
> > --- a/drivers/acpi/numa/hmat.c
> > +++ b/drivers/acpi/numa/hmat.c
> > @@ -26,6 +26,7 @@
> >  #include <linux/sysfs.h>
> >
> >  static u8 hmat_revision;
> > +int hmat_disable __initdata;
> >
> >  static LIST_HEAD(targets);
> >  static LIST_HEAD(initiators);
> > @@ -814,7 +815,7 @@ static __init int hmat_init(void)
> >         enum acpi_hmat_type i;
> >         acpi_status status;
> >
> > -       if (srat_disabled())
> > +       if (srat_disabled() || hmat_disable)
> >                 return 0;
> >
> >         status = acpi_get_table(ACPI_SIG_SRAT, 0, &tbl);
> > diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> > index fdebcfc6c8df..48ca468e9b61 100644
> > --- a/include/acpi/acpi_numa.h
> > +++ b/include/acpi/acpi_numa.h
> > @@ -18,6 +18,7 @@ extern int node_to_pxm(int);
> >  extern int acpi_map_pxm_to_node(int);
> >  extern unsigned char acpi_srat_revision;
> >  extern int acpi_numa __initdata;
> > +extern int hmat_disable __initdata;
> >
> >  extern void bad_srat(void);
> >  extern int srat_disabled(void);
> >


* Re: [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
  2020-03-18  8:24     ` Rafael J. Wysocki
@ 2020-03-18 17:39       ` Dan Williams
  2020-03-19  9:30         ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2020-03-18 17:39 UTC
  To: Rafael J. Wysocki
  Cc: Linux ACPI, X86 ML, Rafael J. Wysocki, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Ard Biesheuvel, linux-nvdimm,
	Linux Kernel Mailing List

On Wed, Mar 18, 2020 at 1:24 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Mar 18, 2020 at 1:09 AM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Mon, Mar 2, 2020 at 2:36 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > Disable parsing of the HMAT for debug, to work around broken platform
> > > instances, or cases where it is otherwise not wanted.
> >
> > Rafael, any heartburn with this change to the numa= option?
> >
> > ...as I look at this I realize I failed to also update
> > Documentation/x86/x86_64/boot-options.rst, will fix.
>
> Thanks!
>
> Apart from this just a minor nit below.
>
> > >
> > > Cc: x86@kernel.org
> > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > > Cc: Andy Lutomirski <luto@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Borislav Petkov <bp@alien8.de>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > >  arch/x86/mm/numa.c       |    4 ++++
> > >  drivers/acpi/numa/hmat.c |    3 ++-
> > >  include/acpi/acpi_numa.h |    1 +
> > >  3 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > > index 59ba008504dc..22de2e2610c1 100644
> > > --- a/arch/x86/mm/numa.c
> > > +++ b/arch/x86/mm/numa.c
> > > @@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
> > >  #ifdef CONFIG_ACPI_NUMA
> > >         if (!strncmp(opt, "noacpi", 6))
> > >                 acpi_numa = -1;
> > > +#ifdef CONFIG_ACPI_HMAT
> > > +       if (!strncmp(opt, "nohmat", 6))
> > > +               hmat_disable = 1;
> > > +#endif
>
> I wonder if IS_ENABLED() would work here?

I took a look. hmat_disable, acpi_numa, and numa_emu_cmdline() are in
other compilation units. I could wrap writing those variables with
helper functions, and change numa_emu_cmdline(), to compile away when
their respective configuration options are not present.

Should we do that in general to have a touch point to report "you
specified an option that is invalid for your current kernel
configuration"? I'm happy to do that as a follow-on if you think it's
worthwhile.
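The follow-on proposed here (and which Rafael agreed to) could take roughly the following shape: each option gets a setter that is real when its Kconfig option is enabled and an inline stub otherwise, so `numa_setup()` needs no `#ifdef`s and has a single place to report that an option has no effect in this build. This is a hypothetical sketch: the helper name `numa_disable_hmat()`, the return-value convention and the warning text are invented for illustration, and the Kconfig symbol is simulated by leaving a `#define` commented out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simulate a kernel built without HMAT support; uncomment to enable it. */
/* #define CONFIG_ACPI_HMAT */

#ifdef CONFIG_ACPI_HMAT
static int hmat_disable;
/* Real setter; in the kernel it would live next to hmat_disable in hmat.c. */
static bool numa_disable_hmat(void)
{
	hmat_disable = 1;
	return true;
}
#else
/* Stub: compiles away and tells the caller the option is unsupported. */
static inline bool numa_disable_hmat(void)
{
	return false;
}
#endif

/* Only the "nohmat" branch is shown; other options fall through. */
static int numa_setup(const char *opt)
{
	if (!opt)
		return -1;
	if (!strncmp(opt, "nohmat", 6) && !numa_disable_hmat()) {
		fprintf(stderr, "numa=%s: not supported by this kernel configuration\n",
			opt);
		return -1;
	}
	return 0;
}
```

With the setter hidden behind a header-level stub, the caller no longer references variables from other compilation units directly, which is the obstacle Dan describes for a plain `IS_ENABLED()` conversion.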


* Re: [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option
  2020-03-18 17:39       ` Dan Williams
@ 2020-03-19  9:30         ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2020-03-19  9:30 UTC
  To: Dan Williams
  Cc: Rafael J. Wysocki, Linux ACPI, X86 ML, Rafael J. Wysocki,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Ard Biesheuvel,
	linux-nvdimm, Linux Kernel Mailing List

On Wed, Mar 18, 2020 at 6:39 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Wed, Mar 18, 2020 at 1:24 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Mar 18, 2020 at 1:09 AM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > On Mon, Mar 2, 2020 at 2:36 PM Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > Disable parsing of the HMAT for debug, to work around broken platform
> > > > instances, or cases where it is otherwise not wanted.
> > >
> > > Rafael, any heartburn with this change to the numa= option?
> > >
> > > ...as I look at this I realize I failed to also update
> > > Documentation/x86/x86_64/boot-options.rst, will fix.
> >
> > Thanks!
> >
> > Apart from this just a minor nit below.
> >
> > > >
> > > > Cc: x86@kernel.org
> > > > Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> > > > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > > > Cc: Andy Lutomirski <luto@kernel.org>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > Cc: Borislav Petkov <bp@alien8.de>
> > > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > > ---
> > > >  arch/x86/mm/numa.c       |    4 ++++
> > > >  drivers/acpi/numa/hmat.c |    3 ++-
> > > >  include/acpi/acpi_numa.h |    1 +
> > > >  3 files changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > > > index 59ba008504dc..22de2e2610c1 100644
> > > > --- a/arch/x86/mm/numa.c
> > > > +++ b/arch/x86/mm/numa.c
> > > > @@ -44,6 +44,10 @@ static __init int numa_setup(char *opt)
> > > >  #ifdef CONFIG_ACPI_NUMA
> > > >         if (!strncmp(opt, "noacpi", 6))
> > > >                 acpi_numa = -1;
> > > > +#ifdef CONFIG_ACPI_HMAT
> > > > +       if (!strncmp(opt, "nohmat", 6))
> > > > +               hmat_disable = 1;
> > > > +#endif
> >
> > I wonder if IS_ENABLED() would work here?
>
> I took a look. hmat_disable, acpi_numa, and numa_emu_cmdline() are in
> other compilation units. I could wrap writing those variables with
> helper functions, and change numa_emu_cmdline(), to compile away when
> their respective configuration options are not present.
>
> Should we do that in general to have a touch point to report "you
> specified an option that is invalid for your current kernel
> configuration"? I'm happy to do that as a follow-on if you think it's
> worthwhile.

Yes, please.


Thread overview: 15+ messages
2020-03-02 22:19 [PATCH 0/5] Manual definition of Soft Reserved memory devices Dan Williams
2020-03-02 22:20 ` [PATCH 1/5] ACPI: NUMA: Add 'nohmat' option Dan Williams
2020-03-18  0:08   ` Dan Williams
2020-03-18  8:24     ` Rafael J. Wysocki
2020-03-18 17:39       ` Dan Williams
2020-03-19  9:30         ` Rafael J. Wysocki
2020-03-02 22:20 ` [PATCH 2/5] efi/fake_mem: Arrange for a resource entry per efi_fake_mem instance Dan Williams
2020-03-03  8:01   ` Ard Biesheuvel
2020-03-02 22:20 ` [PATCH 3/5] ACPI: HMAT: Refactor hmat_register_target_device to hmem_register_device Dan Williams
2020-03-02 22:20 ` [PATCH 4/5] resource: Report parent to walk_iomem_res_desc() callback Dan Williams
2020-03-05 14:42   ` Tom Lendacky
2020-03-17 22:04     ` Dan Williams
2020-03-02 22:20 ` [PATCH 5/5] ACPI: HMAT: Attach a device for each soft-reserved range Dan Williams
2020-03-06 20:07 ` [PATCH 0/5] Manual definition of Soft Reserved memory devices Jeff Moyer
2020-03-06 21:05   ` Dan Williams
