From: Dan Williams <dan.j.williams@intel.com> To: linux-nvdimm@lists.01.org Cc: Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@suse.com>, Christoph Hellwig <hch@lst.de>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-acpi@vger.kernel.org Subject: [PATCH v2 18/18] libnvdimm/e820: Retrieve and populate correct 'target_node' info Date: Sun, 17 Nov 2019 09:46:12 -0800 [thread overview] Message-ID: <157401277293.43284.3805106435228534675.stgit@dwillia2-desk3.amr.corp.intel.com> (raw) In-Reply-To: <157401267421.43284.2135775608523385279.stgit@dwillia2-desk3.amr.corp.intel.com> Use the new phys_to_target_node() and numa_map_to_online_node() helpers to retrieve the correct id for the 'numa_node' ("local" / online initiator node) and 'target_node' (offline target memory node) sysfs attributes. Below is an example from a 4 numa node system where all the memory on node2 is pmem / reserved. It should be noted that with the arrival of the ACPI HMAT table and EFI Specific Purpose Memory the kernel will start to see more platforms with reserved / performance differentiated memory in its own numa node. Hence all the stakeholders on the Cc for what is ostensibly a libnvdimm local patch. === Before === /* Notice no online memory on node2 at start */ # numactl --hardware available: 3 nodes (0-1,3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 3958 MB node 0 free: 3708 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3871 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3971 MB node distances: node 0 1 3 0: 10 21 21 1: 21 10 21 3: 21 21 10 /* * Put the pmem namespace into devdax mode so it can be assigned to the * kmem driver */ # ndctl create-namespace -e namespace0.0 -m devdax -f { "dev":"namespace0.0", "mode":"devdax", "map":"dev", "size":"3.94 GiB (4.23 GB)", "uuid":"1650af9b-9ba3-4704-acd6-10178399d9a3", [..] } /* Online Persistent Memory as System RAM */ # daxctl reconfigure-device --mode=system-ram dax0.0 libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success [ { "chardev":"dax0.0", "size":4225761280, "target_node":0, "mode":"system-ram" } ] reconfigured 1 device /* Note that the memory is onlined by default to the wrong node, node0 */ # numactl --hardware available: 3 nodes (0-1,3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 7926 MB node 0 free: 7655 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3871 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3971 MB node distances: node 0 1 3 0: 10 21 21 1: 21 10 21 3: 21 21 10 === After === /* Notice that the "phys_index" error messages are gone */ # daxctl reconfigure-device --mode=system-ram dax0.0 [ { "chardev":"dax0.0", "size":4225761280, "target_node":2, "mode":"system-ram" } ] reconfigured 1 device /* Notice that node2 is now correctly populated */ # numactl --hardware available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 3958 MB node 0 free: 3793 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3851 MB node 2 cpus: node 2 size: 3968 MB node 2 free: 3968 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3908 MB node distances: node 0 1 2 3 0: 10 21 21 21 1: 21 10 21 21 2: 21 21 10 21 3: 21 21 21 10 Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/nvdimm/e820.c | 18 ++++-------------- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c index e02f60ad6c99..4cd18be9d0e9 100644 --- a/drivers/nvdimm/e820.c +++ b/drivers/nvdimm/e820.c @@ -7,6 +7,7 @@ #include <linux/memory_hotplug.h> #include <linux/libnvdimm.h> #include <linux/module.h> +#include <linux/numa.h> static int e820_pmem_remove(struct platform_device *pdev) { @@ -16,27 +17,16 @@ static int e820_pmem_remove(struct platform_device *pdev) return 0; } -#ifdef CONFIG_MEMORY_HOTPLUG -static int e820_range_to_nid(resource_size_t addr) -{ - return memory_add_physaddr_to_nid(addr); -} -#else -static int e820_range_to_nid(resource_size_t addr) -{ - return NUMA_NO_NODE; -} -#endif - static int e820_register_one(struct resource *res, void *data) { struct nd_region_desc ndr_desc; struct nvdimm_bus *nvdimm_bus = data; + int nid = phys_to_target_node(res->start); memset(&ndr_desc, 0, sizeof(ndr_desc)); ndr_desc.res = res; - ndr_desc.numa_node = e820_range_to_nid(res->start); - ndr_desc.target_node = ndr_desc.numa_node; + ndr_desc.numa_node = numa_map_to_online_node(nid); + ndr_desc.target_node = nid; set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags); if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc)) return -ENXIO; _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org
WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com> To: linux-nvdimm@lists.01.org Cc: Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@suse.com>, Ira Weiny <ira.weiny@intel.com>, Vishal Verma <vishal.l.verma@intel.com>, Christoph Hellwig <hch@lst.de>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-acpi@vger.kernel.org Subject: [PATCH v2 18/18] libnvdimm/e820: Retrieve and populate correct 'target_node' info Date: Sun, 17 Nov 2019 09:46:12 -0800 [thread overview] Message-ID: <157401277293.43284.3805106435228534675.stgit@dwillia2-desk3.amr.corp.intel.com> (raw) In-Reply-To: <157401267421.43284.2135775608523385279.stgit@dwillia2-desk3.amr.corp.intel.com> Use the new phys_to_target_node() and numa_map_to_online_node() helpers to retrieve the correct id for the 'numa_node' ("local" / online initiator node) and 'target_node' (offline target memory node) sysfs attributes. Below is an example from a 4 numa node system where all the memory on node2 is pmem / reserved. It should be noted that with the arrival of the ACPI HMAT table and EFI Specific Purpose Memory the kernel will start to see more platforms with reserved / performance differentiated memory in its own numa node. Hence all the stakeholders on the Cc for what is ostensibly a libnvdimm local patch. === Before === /* Notice no online memory on node2 at start */ # numactl --hardware available: 3 nodes (0-1,3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 3958 MB node 0 free: 3708 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3871 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3971 MB node distances: node 0 1 3 0: 10 21 21 1: 21 10 21 3: 21 21 10 /* * Put the pmem namespace into devdax mode so it can be assigned to the * kmem driver */ # ndctl create-namespace -e namespace0.0 -m devdax -f { "dev":"namespace0.0", "mode":"devdax", "map":"dev", "size":"3.94 GiB (4.23 GB)", "uuid":"1650af9b-9ba3-4704-acd6-10178399d9a3", [..] } /* Online Persistent Memory as System RAM */ # daxctl reconfigure-device --mode=system-ram dax0.0 libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success libdaxctl: memblock_in_dev: dax0.0: memory0: Unable to determine phys_index: Success [ { "chardev":"dax0.0", "size":4225761280, "target_node":0, "mode":"system-ram" } ] reconfigured 1 device /* Note that the memory is onlined by default to the wrong node, node0 */ # numactl --hardware available: 3 nodes (0-1,3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 7926 MB node 0 free: 7655 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3871 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3971 MB node distances: node 0 1 3 0: 10 21 21 1: 21 10 21 3: 21 21 10 === After === /* Notice that the "phys_index" error messages are gone */ # daxctl reconfigure-device --mode=system-ram dax0.0 [ { "chardev":"dax0.0", "size":4225761280, "target_node":2, "mode":"system-ram" } ] reconfigured 1 device /* Notice that node2 is now correctly populated */ # numactl --hardware available: 4 nodes (0-3) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 node 0 size: 3958 MB node 0 free: 3793 MB node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 node 1 size: 4027 MB node 1 free: 3851 MB node 2 cpus: node 2 size: 3968 MB node 2 free: 3968 MB node 3 cpus: node 3 size: 3994 MB node 3 free: 3908 MB node distances: node 0 1 2 3 0: 10 21 21 21 1: 21 10 21 21 2: 21 21 10 21 3: 21 21 21 10 Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/nvdimm/e820.c | 18 ++++-------------- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/drivers/nvdimm/e820.c b/drivers/nvdimm/e820.c index e02f60ad6c99..4cd18be9d0e9 100644 --- a/drivers/nvdimm/e820.c +++ b/drivers/nvdimm/e820.c @@ -7,6 +7,7 @@ #include <linux/memory_hotplug.h> #include <linux/libnvdimm.h> #include <linux/module.h> +#include <linux/numa.h> static int e820_pmem_remove(struct platform_device *pdev) { @@ -16,27 +17,16 @@ static int e820_pmem_remove(struct platform_device *pdev) return 0; } -#ifdef CONFIG_MEMORY_HOTPLUG -static int e820_range_to_nid(resource_size_t addr) -{ - return memory_add_physaddr_to_nid(addr); -} -#else -static int e820_range_to_nid(resource_size_t addr) -{ - return NUMA_NO_NODE; -} -#endif - static int e820_register_one(struct resource *res, void *data) { struct nd_region_desc ndr_desc; struct nvdimm_bus *nvdimm_bus = data; + int nid = phys_to_target_node(res->start); memset(&ndr_desc, 0, sizeof(ndr_desc)); ndr_desc.res = res; - ndr_desc.numa_node = e820_range_to_nid(res->start); - ndr_desc.target_node = ndr_desc.numa_node; + ndr_desc.numa_node = numa_map_to_online_node(nid); + ndr_desc.target_node = nid; set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags); if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc)) return -ENXIO;
next prev parent reply other threads:[~2019-11-17 18:00 UTC|newest] Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-11-17 17:44 [PATCH v2 00/18] Memory Hierarchy: Enable target node lookups for reserved memory Dan Williams 2019-11-17 17:44 ` Dan Williams 2019-11-17 17:44 ` [PATCH v2 01/18] libnvdimm: Move attribute groups to device type Dan Williams 2019-11-17 17:44 ` Dan Williams 2019-11-17 17:44 ` [PATCH v2 02/18] libnvdimm: Move region attribute group definition Dan Williams 2019-11-17 17:44 ` Dan Williams 2019-11-17 17:44 ` [PATCH v2 03/18] libnvdimm: Move nd_device_attribute_group to device_type Dan Williams 2019-11-17 17:44 ` Dan Williams 2019-11-17 17:44 ` [PATCH v2 04/18] libnvdimm: Move nd_numa_attribute_group " Dan Williams 2019-11-17 17:44 ` Dan Williams 2019-11-18 9:46 ` Aneesh Kumar K.V 2019-11-18 9:46 ` Aneesh Kumar K.V 2019-11-17 17:45 ` [PATCH v2 05/18] libnvdimm: Move nd_region_attribute_group " Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-17 17:45 ` [PATCH v2 06/18] libnvdimm: Move nd_mapping_attribute_group " Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-17 17:45 ` [PATCH v2 07/18] libnvdimm: Move nvdimm_attribute_group " Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-17 17:45 ` [PATCH v2 08/18] libnvdimm: Move nvdimm_bus_attribute_group " Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-17 17:45 ` [PATCH v2 09/18] dax: Create a dax device_type Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-17 17:45 ` [PATCH v2 10/18] dax: Simplify root read-only definition for the 'resource' attribute Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-17 17:45 ` [PATCH v2 11/18] libnvdimm: " Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-17 17:45 ` [PATCH v2 12/18] dax: Add numa_node to the default device-dax attributes Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-17 17:45 ` [PATCH v2 13/18] libnvdimm: Export the target_node attribute for regions and namespaces Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-18 9:45 ` Aneesh Kumar K.V 2019-11-18 9:45 ` Aneesh Kumar K.V 2019-11-17 17:45 ` [PATCH v2 14/18] acpi/numa: Up-level "map to online node" functionality Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-29 11:56 ` Rafael J. Wysocki 2019-11-29 11:56 ` Rafael J. Wysocki 2019-11-17 17:45 ` [PATCH v2 15/18] mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node() Dan Williams 2019-11-17 17:45 ` Dan Williams 2019-11-18 9:45 ` Aneesh Kumar K.V 2019-11-18 9:45 ` Aneesh Kumar K.V 2019-11-17 17:46 ` [PATCH v2 16/18] powerpc/papr_scm: Switch to numa_map_to_online_node() Dan Williams 2019-11-17 17:46 ` Dan Williams 2019-11-18 9:46 ` Aneesh Kumar K.V 2019-11-18 9:46 ` Aneesh Kumar K.V 2019-11-20 10:30 ` Michael Ellerman 2019-11-20 10:30 ` Michael Ellerman 2019-11-17 17:46 ` [PATCH v2 17/18] x86/numa: Provide a range-to-target_node lookup facility Dan Williams 2019-11-17 17:46 ` Dan Williams 2019-11-18 18:45 ` Dan Williams 2019-11-18 18:45 ` Dan Williams 2019-11-18 18:45 ` Dan Williams 2019-11-17 17:46 ` Dan Williams [this message] 2019-11-17 17:46 ` [PATCH v2 18/18] libnvdimm/e820: Retrieve and populate correct 'target_node' info Dan Williams
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=157401277293.43284.3805106435228534675.stgit@dwillia2-desk3.amr.corp.intel.com \ --to=dan.j.williams@intel.com \ --cc=akpm@linux-foundation.org \ --cc=dave.hansen@linux.intel.com \ --cc=david@redhat.com \ --cc=hch@lst.de \ --cc=linux-acpi@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=linux-nvdimm@lists.01.org \ --cc=luto@kernel.org \ --cc=mhocko@suse.com \ --cc=mingo@redhat.com \ --cc=peterz@infradead.org \ --cc=tglx@linutronix.de \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.