From: Dan Williams <dan.j.williams@intel.com> To: linux-nvdimm@lists.01.org Cc: Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@suse.com>, Christoph Hellwig <hch@lst.de>, Ingo Molnar <mingo@kernel.org>, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v5 6/6] libnvdimm/e820: Retrieve and populate correct 'target_node' info Date: Sun, 16 Feb 2020 12:01:16 -0800 [thread overview] Message-ID: <158188327614.894464.13122730362187722603.stgit@dwillia2-desk3.amr.corp.intel.com> (raw) In-Reply-To: <158188324272.894464.5941332130956525504.stgit@dwillia2-desk3.amr.corp.intel.com> Use the new phys_to_target_node() and numa_map_to_online_node() helpers to retrieve the correct id for the 'numa_node' ("local" / online initiator node) and 'target_node' (offline target memory node) sysfs attributes. Below is an example from a 4 NUMA node system where all the memory on node2 is pmem / reserved. It should be noted that with the arrival of the ACPI HMAT table and EFI Specific Purpose Memory the kernel will start to see more platforms with reserved / performance differentiated memory in its own NUMA node. Hence all the stakeholders on the Cc for what is ostensibly a libnvdimm local patch. === Before === /* Notice no online memory on node2 at start */ # numactl --hardware available: 3 nodes (0-1,3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 3958 MB node 0 free: 3708 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3871 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3971 MB node distances: node 0 1 3 0: 10 21 21 1: 21 10 21 3: 21 21 10 /* * Put the pmem namespace into devdax mode so it can be assigned to the * kmem driver */ # ndctl create-namespace -e namespace0.0 -m devdax -f { "dev":"namespace0.0", "mode":"devdax", "map":"dev", "size":"3.94 GiB (4.23 GB)", "uuid":"1650af9b-9ba3-4704-acd6-10178399d9a3", [..] } /* Online Persistent Memory as System RAM */ # daxctl reconfigure-device --mode=system-ram dax0.0 libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success [ { "chardev":"dax0.0", "size":4225761280, "target_node":0, "mode":"system-ram" } ] reconfigured 1 device /* Note that the memory is onlined by default to the wrong node, node0 */ # numactl --hardware available: 3 nodes (0-1,3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 7926 MB node 0 free: 7655 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3871 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3971 MB node distances: node 0 1 3 0: 10 21 21 1: 21 10 21 3: 21 21 10 === After === /* Notice that the "phys_index" error messages are gone */ # daxctl reconfigure-device --mode=system-ram dax0.0 [ { "chardev":"dax0.0", "size":4225761280, "target_node":2, "mode":"system-ram" } ] reconfigured 1 device /* Notice that node2 is now correctly populated */ # numactl --hardware available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 3958 MB node 0 free: 3793 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3851 MB node 2 cpus: node 2 size: 3968 MB node 2 free: 3968 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3908 MB node distances: node 0 1 2 3 0: 10 21 21 21 1: 21 10 21 21 2: 21 21 10 21 3: 21 21 21 10 Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- arch/x86/Kconfig | 1 + drivers/nvdimm/e820.c | 18 ++++-------------- 2 files changed, 5 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index beea77046f9b..d4e446daf457 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1664,6 +1664,7 @@ config X86_PMEM_LEGACY depends on PHYS_ADDR_T_64BIT depends on BLK_DEV select X86_PMEM_LEGACY_DEVICE + select NUMA_KEEP_MEMINFO if NUMA select LIBNVDIMM help Treat memory marked using the non-standard e820 type of 12 as used diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c index e02f60ad6c99..4cd18be9d0e9 100644 --- a/drivers/nvdimm/e820.c +++ b/drivers/nvdimm/e820.c @@ -7,6 +7,7 @@ #include <linux/memory_hotplug.h> #include <linux/libnvdimm.h> #include <linux/module.h> +#include <linux/numa.h> static int e820_pmem_remove(struct platform_device *pdev) { @@ -16,27 +17,16 @@ static int e820_pmem_remove(struct platform_device *pdev) return 0; } -#ifdef CONFIG_MEMORY_HOTPLUG -static int e820_range_to_nid(resource_size_t addr) -{ - return memory_add_physaddr_to_nid(addr); -} -#else -static int e820_range_to_nid(resource_size_t addr) -{ - return NUMA_NO_NODE; -} -#endif - static int e820_register_one(struct resource *res, void *data) { struct nd_region_desc ndr_desc; struct nvdimm_bus *nvdimm_bus = data; + int nid = phys_to_target_node(res->start); memset(&ndr_desc, 0, sizeof(ndr_desc)); ndr_desc.res = res; - ndr_desc.numa_node = e820_range_to_nid(res->start); - ndr_desc.target_node = ndr_desc.numa_node; + ndr_desc.numa_node = numa_map_to_online_node(nid); + ndr_desc.target_node = nid; set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags); if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc)) return -ENXIO; _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org
WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com> To: linux-nvdimm@lists.01.org Cc: Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@suse.com>, Ira Weiny <ira.weiny@intel.com>, Vishal Verma <vishal.l.verma@intel.com>, Christoph Hellwig <hch@lst.de>, Ingo Molnar <mingo@kernel.org>, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v5 6/6] libnvdimm/e820: Retrieve and populate correct 'target_node' info Date: Sun, 16 Feb 2020 12:01:16 -0800 [thread overview] Message-ID: <158188327614.894464.13122730362187722603.stgit@dwillia2-desk3.amr.corp.intel.com> (raw) In-Reply-To: <158188324272.894464.5941332130956525504.stgit@dwillia2-desk3.amr.corp.intel.com> Use the new phys_to_target_node() and numa_map_to_online_node() helpers to retrieve the correct id for the 'numa_node' ("local" / online initiator node) and 'target_node' (offline target memory node) sysfs attributes. Below is an example from a 4 NUMA node system where all the memory on node2 is pmem / reserved. It should be noted that with the arrival of the ACPI HMAT table and EFI Specific Purpose Memory the kernel will start to see more platforms with reserved / performance differentiated memory in its own NUMA node. Hence all the stakeholders on the Cc for what is ostensibly a libnvdimm local patch. === Before === /* Notice no online memory on node2 at start */ # numactl --hardware available: 3 nodes (0-1,3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 3958 MB node 0 free: 3708 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3871 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3971 MB node distances: node 0 1 3 0: 10 21 21 1: 21 10 21 3: 21 21 10 /* * Put the pmem namespace into devdax mode so it can be assigned to the * kmem driver */ # ndctl create-namespace -e namespace0.0 -m devdax -f { "dev":"namespace0.0", "mode":"devdax", "map":"dev", "size":"3.94 GiB (4.23 GB)", "uuid":"1650af9b-9ba3-4704-acd6-10178399d9a3", [..] } /* Online Persistent Memory as System RAM */ # daxctl reconfigure-device --mode=system-ram dax0.0 libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success [ { "chardev":"dax0.0", "size":4225761280, "target_node":0, "mode":"system-ram" } ] reconfigured 1 device /* Note that the memory is onlined by default to the wrong node, node0 */ # numactl --hardware available: 3 nodes (0-1,3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 7926 MB node 0 free: 7655 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3871 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3971 MB node distances: node 0 1 3 0: 10 21 21 1: 21 10 21 3: 21 21 10 === After === /* Notice that the "phys_index" error messages are gone */ # daxctl reconfigure-device --mode=system-ram dax0.0 [ { "chardev":"dax0.0", "size":4225761280, "target_node":2, "mode":"system-ram" } ] reconfigured 1 device /* Notice that node2 is now correctly populated */ # numactl --hardware available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 3958 MB node 0 free: 3793 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3851 MB node 2 cpus: node 2 size: 3968 MB node 2 free: 3968 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3908 MB node distances: node 0 1 2 3 0: 10 21 21 21 1: 21 10 21 21 2: 21 21 10 21 3: 21 21 21 10 Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- arch/x86/Kconfig | 1 + drivers/nvdimm/e820.c | 18 ++++-------------- 2 files changed, 5 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index beea77046f9b..d4e446daf457 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1664,6 +1664,7 @@ config X86_PMEM_LEGACY depends on PHYS_ADDR_T_64BIT depends on BLK_DEV select X86_PMEM_LEGACY_DEVICE + select NUMA_KEEP_MEMINFO if NUMA select LIBNVDIMM help Treat memory marked using the non-standard e820 type of 12 as used diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c index e02f60ad6c99..4cd18be9d0e9 100644 --- a/drivers/nvdimm/e820.c +++ b/drivers/nvdimm/e820.c @@ -7,6 +7,7 @@ #include <linux/memory_hotplug.h> #include <linux/libnvdimm.h> #include <linux/module.h> +#include <linux/numa.h> static int e820_pmem_remove(struct platform_device *pdev) { @@ -16,27 +17,16 @@ static int e820_pmem_remove(struct platform_device *pdev) return 0; } -#ifdef CONFIG_MEMORY_HOTPLUG -static int e820_range_to_nid(resource_size_t addr) -{ - return memory_add_physaddr_to_nid(addr); -} -#else -static int e820_range_to_nid(resource_size_t addr) -{ - return NUMA_NO_NODE; -} -#endif - static int e820_register_one(struct resource *res, void *data) { struct nd_region_desc ndr_desc; struct nvdimm_bus *nvdimm_bus = data; + int nid = phys_to_target_node(res->start); memset(&ndr_desc, 0, sizeof(ndr_desc)); ndr_desc.res = res; - ndr_desc.numa_node = e820_range_to_nid(res->start); - ndr_desc.target_node = ndr_desc.numa_node; + ndr_desc.numa_node = numa_map_to_online_node(nid); + ndr_desc.target_node = nid; set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags); if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc)) return -ENXIO;
next prev parent reply other threads:[~2020-02-16 20:17 UTC|newest] Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-01-22 3:04 [PATCH v4 0/6] Memory Hierarchy: Enable target node lookups for reserved memory Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-01-22 3:04 ` [PATCH v4 1/6] ACPI: NUMA: Up-level "map to online node" functionality Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-01-22 3:04 ` [PATCH v4 2/6] mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node() Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-01-22 3:04 ` [PATCH v4 3/6] powerpc/papr_scm: Switch to numa_map_to_online_node() Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-01-22 3:04 ` [PATCH v4 4/6] x86/mm: Introduce CONFIG_KEEP_NUMA Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-02-13 9:32 ` Ingo Molnar 2020-02-13 9:32 ` Ingo Molnar 2020-02-13 21:19 ` Dan Williams 2020-02-13 21:19 ` Dan Williams 2020-02-13 11:22 ` Thomas Gleixner 2020-02-13 11:22 ` Thomas Gleixner 2020-02-13 21:21 ` Dan Williams 2020-02-13 21:21 ` Dan Williams 2020-01-22 3:05 ` [PATCH v4 5/6] x86/numa: Provide a range-to-target_node lookup facility Dan Williams 2020-01-22 3:05 ` Dan Williams 2020-02-13 9:36 ` Ingo Molnar 2020-02-13 9:36 ` Ingo Molnar 2020-02-13 11:37 ` Thomas Gleixner 2020-02-13 11:37 ` Thomas Gleixner 2020-02-13 21:40 ` Dan Williams 2020-02-13 21:40 ` Dan Williams 2020-01-22 3:05 ` [PATCH v4 6/6] libnvdimm/e820: Retrieve and populate correct 'target_node' info Dan Williams 2020-01-22 3:05 ` Dan Williams 2020-02-13 2:28 ` [PATCH v4 0/6] Memory Hierarchy: Enable target node lookups for reserved memory Dan Williams 2020-02-13 2:28 ` Dan Williams 2020-02-13 9:37 ` Ingo Molnar 2020-02-13 9:37 ` Ingo Molnar 2020-02-16 20:00 ` [PATCH v5 " Dan Williams 2020-02-16 20:00 ` Dan Williams 2020-02-16 20:00 ` [PATCH v5 1/6] ACPI: NUMA: Up-level "map to online node" functionality Dan Williams 2020-02-16 20:00 ` Dan Williams 2020-02-16 20:00 ` [PATCH v5 2/6] mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node() Dan Williams 2020-02-16 20:00 ` Dan Williams 2020-02-16 20:00 ` [PATCH v5 3/6] powerpc/papr_scm: Switch to numa_map_to_online_node() Dan Williams 2020-02-16 20:00 ` Dan Williams 2020-02-16 20:01 ` [PATCH v5 4/6] x86/mm: Introduce CONFIG_NUMA_KEEP_MEMINFO Dan Williams 2020-02-16 20:01 ` Dan Williams 2020-02-17 8:21 ` Thomas Gleixner 2020-02-17 8:21 ` Thomas Gleixner 2020-02-16 20:01 ` [PATCH v5 5/6] x86/NUMA: Provide a range-to-target_node lookup facility Dan Williams 2020-02-16 20:01 ` Dan Williams 2020-02-17 8:22 ` Thomas Gleixner 2020-02-17 8:22 ` Thomas Gleixner 2020-02-16 20:01 ` Dan Williams [this message] 2020-02-16 20:01 ` [PATCH v5 6/6] libnvdimm/e820: Retrieve and populate correct 'target_node' info Dan Williams
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=158188327614.894464.13122730362187722603.stgit@dwillia2-desk3.amr.corp.intel.com \ --to=dan.j.williams@intel.com \ --cc=akpm@linux-foundation.org \ --cc=dave.hansen@linux.intel.com \ --cc=david@redhat.com \ --cc=hch@lst.de \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-nvdimm@lists.01.org \ --cc=luto@kernel.org \ --cc=mhocko@suse.com \ --cc=mingo@kernel.org \ --cc=peterz@infradead.org \ --cc=tglx@linutronix.de \ --cc=x86@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.