From: Dan Williams <dan.j.williams@intel.com>
To: linux-nvdimm@lists.01.org
Cc: David Hildenbrand <david@redhat.com>, Borislav Petkov <bp@alien8.de>,
    "H. Peter Anvin" <hpa@zytor.com>, Thomas Gleixner <tglx@linutronix.de>,
    "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
    kbuild test robot <lkp@intel.com>,
    Andrew Morton <akpm@linux-foundation.org>,
    Peter Zijlstra <peterz@infradead.org>,
    Benjamin Herrenschmidt <benh@kernel.crashing.org>,
    Michal Hocko <mhocko@suse.com>, Paul Mackerras <paulus@samba.org>,
    Christoph Hellwig <hch@lst.de>, Ingo Molnar <mingo@redhat.com>,
    Dave Hansen <dave.hansen@linux.intel.com>,
    Michael Ellerman <mpe@ellerman.id.au>, x86@kernel.org,
    "Rafael J. Wysocki" <rjw@rjwysocki.net>,
    Andy Lutomirski <luto@kernel.org>, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linux-acpi@vger.kernel.org
Subject: [PATCH v2 00/18] Memory Hierarchy: Enable target node lookups for reserved memory
Date: Sun, 17 Nov 2019 09:44:34 -0800
Message-ID: <157401267421.43284.2135775608523385279.stgit@dwillia2-desk3.amr.corp.intel.com>

Changes since v1 [1]:
- Rework numa_map_to_online_node() to be compatible with papr_scm_node()
  (Aneesh)

- Export the 'target_node' attribute for nvdimm regions and namespaces
  (Aneesh)

- Rename memory_add_physaddr_to_target_nid() to phys_to_target_node() and
  make it independent of CONFIG_MEMORY_HOTPLUG=y. Put a weak definition in
  mm/mempolicy.c that can be overridden by an arch implementation.

- Fix various build reports (kbuild-robot)

- Collect some reviewed-by's from Aneesh.

[1]: https://lore.kernel.org/r/157309899529.1582359.15358067933360719580.stgit@dwillia2-desk3.amr.corp.intel.com/

---

As mentioned in the v1 cover letter [1] the libnvdimm device-type cleanup
is intertwined with the new target_node infrastructure. The more
interesting patches for arch and mm folks start at patch 14.
This new infrastructure will prove more valuable over time for Memory
Tiers / Hierarchy management as more platforms (via the ACPI HMAT and
EFI Specific Purpose Memory) publish reserved or "soft-reserved" ranges
to Linux. Linux system administrators will expect to be able to interact
with those ranges with a unique numa node number when/if that memory is
onlined via the dax_kmem driver [2].

One configuration that currently fails to properly convey the target
node for the resulting memory hotplug operation is persistent memory
defined by the memmap=nn!ss parameter. For example, today if node1 is a
memory-only node, and all the memory from node1 is specified to
memmap=nn!ss and subsequently onlined, it will end up being onlined as
node0 memory. As it stands, memory_add_physaddr_to_nid() can only
identify online nodes, and since node1 in this example has no online
cpus / memory, the target node is initialized to node0.

The fix is to preserve rather than discard the numa_meminfo entries that
are relevant for reserved memory ranges, and to uplevel the node
distance helper for determining the "local" (closest) node relative to
an initiator node.

The first 13 patches are cleanups to make sure that all nvdimm devices
and their children properly export a numa_node attribute, and add a
'target_node' attribute by default to regions and namespaces. The switch
to a device-type is less code and less error prone as a result.

Patches 14 - 17 are the core changes to allow numa node information for
offline memory to be tracked, and to provide a unified node mapping
distance helper across architectures, numa_map_to_online_node().

Patch 18 uses this new capability to fix the conveyance of target_node
information for memmap=nn!ss assignments. See patch 18 for more details
and the test case.

Given the timeframe to the v5.5 merge window I expect patches 14 - 18
will likely miss due to not enough time to review, but I am posting them
for feedback nonetheless.
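To make the node1 example above concrete, here is a minimal userspace
sketch of the two lookups the series relies on: resolving a physical
address to its target node even when that node is offline, and then
mapping an offline node to the closest online one by distance. It is a
model only, not the kernel code. The range table, the online map, the
distance values, and the helper names range_to_target_node() and
map_to_online_node() are all assumptions made for illustration; only the
general shape (skip NUMA_NO_NODE and online nodes, otherwise pick the
minimum-distance online node) follows the description in this letter.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)
#define MAX_NODES 2

/* Hypothetical stand-in for a numa_meminfo entry: [start, end) -> nid. */
struct mem_range {
	uint64_t start;
	uint64_t end;
	int nid;
};

/* Illustrative topology: node1 is memory-only and entirely reserved. */
static const struct mem_range ranges[] = {
	{ 0x000000000ULL, 0x100000000ULL, 0 },	/* node0: online RAM */
	{ 0x100000000ULL, 0x200000000ULL, 1 },	/* node1: reserved range */
};

/* node1 has no online cpus or memory, so it is offline. */
static const bool node_is_online[MAX_NODES] = { true, false };

/* Made-up SLIT-style distances: 10 is local, larger is farther. */
static const int distance[MAX_NODES][MAX_NODES] = {
	{ 10, 20 },
	{ 20, 10 },
};

/* Return the node that owns addr, even if that node is offline. */
int range_to_target_node(uint64_t addr)
{
	for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
		if (addr >= ranges[i].start && addr < ranges[i].end)
			return ranges[i].nid;
	return NUMA_NO_NODE;
}

/* Pass NUMA_NO_NODE and online nodes through; otherwise pick the
 * online node with the smallest distance from the offline node. */
int map_to_online_node(int node)
{
	int min_dist = INT_MAX, min_node = node;

	if (node == NUMA_NO_NODE)
		return node;
	assert(node >= 0 && node < MAX_NODES);
	if (node_is_online[node])
		return node;

	for (int n = 0; n < MAX_NODES; n++) {
		if (!node_is_online[n])
			continue;
		if (distance[node][n] < min_dist) {
			min_dist = distance[node][n];
			min_node = n;
		}
	}
	return min_node;
}
```

In this model an address inside the reserved range resolves to target
node 1, while map_to_online_node(1) yields node 0. That mirrors the
distinction between the 'target_node' attribute (the node the memory
would belong to once onlined) and 'numa_node' (the closest node that is
online today).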
[2]: https://pmem.io/ndctl/daxctl-reconfigure-device.html

---

Dan Williams (18):
      libnvdimm: Move attribute groups to device type
      libnvdimm: Move region attribute group definition
      libnvdimm: Move nd_device_attribute_group to device_type
      libnvdimm: Move nd_numa_attribute_group to device_type
      libnvdimm: Move nd_region_attribute_group to device_type
      libnvdimm: Move nd_mapping_attribute_group to device_type
      libnvdimm: Move nvdimm_attribute_group to device_type
      libnvdimm: Move nvdimm_bus_attribute_group to device_type
      dax: Create a dax device_type
      dax: Simplify root read-only definition for the 'resource' attribute
      libnvdimm: Simplify root read-only definition for the 'resource' attribute
      dax: Add numa_node to the default device-dax attributes
      libnvdimm: Export the target_node attribute for regions and namespaces
      acpi/numa: Up-level "map to online node" functionality
      mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node()
      powerpc/papr_scm: Switch to numa_map_to_online_node()
      x86/numa: Provide a range-to-target_node lookup facility
      libnvdimm/e820: Retrieve and populate correct 'target_node' info

 arch/powerpc/platforms/pseries/papr_scm.c |  46 ------
 arch/x86/mm/numa.c                        |  76 +++++++++
 drivers/acpi/nfit/core.c                  |   7 -
 drivers/acpi/numa.c                       |  41 -----
 drivers/dax/bus.c                         |  22 ++-
 drivers/nvdimm/btt_devs.c                 |  24 +--
 drivers/nvdimm/bus.c                      |  44 +++++
 drivers/nvdimm/core.c                     |   8 +
 drivers/nvdimm/dax_devs.c                 |  27 +--
 drivers/nvdimm/dimm_devs.c                |  30 ++--
 drivers/nvdimm/e820.c                     |  31 ----
 drivers/nvdimm/namespace_devs.c           |  77 +++++-----
 drivers/nvdimm/nd.h                       |   5 -
 drivers/nvdimm/of_pmem.c                  |  13 --
 drivers/nvdimm/pfn_devs.c                 |  38 ++---
 drivers/nvdimm/region_devs.c              | 235 +++++++++++++++--------------
 include/linux/acpi.h                      |  23 +++
 include/linux/libnvdimm.h                 |   7 -
 include/linux/numa.h                      |  17 ++
 mm/mempolicy.c                            |  35 ++++
 20 files changed, 430 insertions(+), 376 deletions(-)
Thread overview: 53+ messages

2019-11-17 17:44 Dan Williams [this message]
2019-11-17 17:44 ` [PATCH v2 00/18] Memory Hierarchy: Enable target node lookups for reserved memory Dan Williams
2019-11-17 17:44 ` [PATCH v2 01/18] libnvdimm: Move attribute groups to device type Dan Williams
2019-11-17 17:44 ` Dan Williams
2019-11-17 17:44 ` [PATCH v2 02/18] libnvdimm: Move region attribute group definition Dan Williams
2019-11-17 17:44 ` Dan Williams
2019-11-17 17:44 ` [PATCH v2 03/18] libnvdimm: Move nd_device_attribute_group to device_type Dan Williams
2019-11-17 17:44 ` Dan Williams
2019-11-17 17:44 ` [PATCH v2 04/18] libnvdimm: Move nd_numa_attribute_group to device_type Dan Williams
2019-11-17 17:44 ` Dan Williams
2019-11-18  9:46 ` Aneesh Kumar K.V
2019-11-18  9:46 ` Aneesh Kumar K.V
2019-11-17 17:45 ` [PATCH v2 05/18] libnvdimm: Move nd_region_attribute_group to device_type Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-17 17:45 ` [PATCH v2 06/18] libnvdimm: Move nd_mapping_attribute_group to device_type Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-17 17:45 ` [PATCH v2 07/18] libnvdimm: Move nvdimm_attribute_group to device_type Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-17 17:45 ` [PATCH v2 08/18] libnvdimm: Move nvdimm_bus_attribute_group to device_type Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-17 17:45 ` [PATCH v2 09/18] dax: Create a dax device_type Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-17 17:45 ` [PATCH v2 10/18] dax: Simplify root read-only definition for the 'resource' attribute Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-17 17:45 ` [PATCH v2 11/18] libnvdimm: Simplify root read-only definition for the 'resource' attribute Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-17 17:45 ` [PATCH v2 12/18] dax: Add numa_node to the default device-dax attributes Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-17 17:45 ` [PATCH v2 13/18] libnvdimm: Export the target_node attribute for regions and namespaces Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-18  9:45 ` Aneesh Kumar K.V
2019-11-18  9:45 ` Aneesh Kumar K.V
2019-11-17 17:45 ` [PATCH v2 14/18] acpi/numa: Up-level "map to online node" functionality Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-29 11:56 ` Rafael J. Wysocki
2019-11-29 11:56 ` Rafael J. Wysocki
2019-11-17 17:45 ` [PATCH v2 15/18] mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node() Dan Williams
2019-11-17 17:45 ` Dan Williams
2019-11-18  9:45 ` Aneesh Kumar K.V
2019-11-18  9:45 ` Aneesh Kumar K.V
2019-11-17 17:46 ` [PATCH v2 16/18] powerpc/papr_scm: Switch to numa_map_to_online_node() Dan Williams
2019-11-17 17:46 ` Dan Williams
2019-11-18  9:46 ` Aneesh Kumar K.V
2019-11-18  9:46 ` Aneesh Kumar K.V
2019-11-20 10:30 ` Michael Ellerman
2019-11-20 10:30 ` Michael Ellerman
2019-11-17 17:46 ` [PATCH v2 17/18] x86/numa: Provide a range-to-target_node lookup facility Dan Williams
2019-11-17 17:46 ` Dan Williams
2019-11-18 18:45 ` Dan Williams
2019-11-18 18:45 ` Dan Williams
2019-11-18 18:45 ` Dan Williams
2019-11-17 17:46 ` [PATCH v2 18/18] libnvdimm/e820: Retrieve and populate correct 'target_node' info Dan Williams
2019-11-17 17:46 ` Dan Williams