From: Dan Williams <dan.j.williams@intel.com> To: linux-nvdimm@lists.01.org Cc: Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org, Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@suse.com>, kbuild test robot <lkp@intel.com>, Ingo Molnar <mingo@kernel.org>, hch@lst.de, linux-kernel@vger.kernel.org Subject: [PATCH v5 5/6] x86/NUMA: Provide a range-to-target_node lookup facility Date: Sun, 16 Feb 2020 12:01:09 -0800 [thread overview] Message-ID: <158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com> (raw) In-Reply-To: <158188324272.894464.5941332130956525504.stgit@dwillia2-desk3.amr.corp.intel.com> The DEV_DAX_KMEM facility is a generic mechanism to allow device-dax instances, fronting performance-differentiated-memory like pmem, to be added to the System RAM pool. The NUMA node for that hot-added memory is derived from the device-dax instance's 'target_node' attribute. Recall that the 'target_node' is the ACPI-PXM-to-node translation for memory when it comes online whereas the 'numa_node' attribute of the device represents the closest online cpu node. Presently useful target_node information from the ACPI SRAT is discarded with the expectation that "Reserved" memory will never be onlined. Now, DEV_DAX_KMEM violates that assumption, there is a need to retain the translation. Move, rather than discard, numa_memblk data to a secondary array that memory_add_physaddr_to_target_node() may consider at a later point in time. Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- arch/x86/mm/numa.c | 61 ++++++++++++++++++++++++++++++++++++++++++-------- include/linux/numa.h | 14 +++++++++++ 2 files changed, 64 insertions(+), 11 deletions(-) diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 2450b21cc28a..59ba008504dc 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -26,6 +26,7 @@ struct pglist_data *node_data[MAX_NUMNODES] __read_mostly; EXPORT_SYMBOL(node_data); static struct numa_meminfo numa_meminfo __initdata_or_meminfo; +static struct numa_meminfo numa_reserved_meminfo __initdata_or_meminfo; static int numa_distance_cnt; static u8 *numa_distance; @@ -164,6 +165,19 @@ void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi) (mi->nr_blks - idx) * sizeof(mi->blk[0])); } +/** + * numa_move_tail_memblk - Move a numa_memblk from one numa_meminfo to another + * @dst: numa_meminfo to append block to + * @idx: Index of memblk to remove + * @src: numa_meminfo to remove memblk from + */ +static void __init numa_move_tail_memblk(struct numa_meminfo *dst, int idx, + struct numa_meminfo *src) +{ + dst->blk[dst->nr_blks++] = src->blk[idx]; + numa_remove_memblk_from(idx, src); +} + /** * numa_add_memblk - Add one numa_memblk to numa_meminfo * @nid: NUMA node ID of the new memblk @@ -233,14 +247,19 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi) for (i = 0; i < mi->nr_blks; i++) { struct numa_memblk *bi = &mi->blk[i]; - /* make sure all blocks are inside the limits */ + /* move / save reserved memory ranges */ + if (!memblock_overlaps_region(&memblock.memory, + bi->start, bi->end - bi->start)) { + numa_move_tail_memblk(&numa_reserved_meminfo, i--, mi); + continue; + } + + /* make sure all non-reserved blocks are inside the limits */ bi->start = max(bi->start, low); bi->end = min(bi->end, high); - /* and there's no empty or non-exist block */ - if (bi->start >= bi->end || - !memblock_overlaps_region(&memblock.memory, - bi->start, bi->end - bi->start)) + /* and there's no empty block */ + if (bi->start >= bi->end) numa_remove_memblk_from(i--, mi); } @@ -877,16 +896,38 @@ EXPORT_SYMBOL(cpumask_of_node); #endif /* !CONFIG_DEBUG_PER_CPU_MAPS */ -#ifdef CONFIG_MEMORY_HOTPLUG -int memory_add_physaddr_to_nid(u64 start) +#ifdef CONFIG_NUMA_KEEP_MEMINFO +static int meminfo_to_nid(struct numa_meminfo *mi, u64 start) { - struct numa_meminfo *mi = &numa_meminfo; - int nid = mi->blk[0].nid; int i; for (i = 0; i < mi->nr_blks; i++) if (mi->blk[i].start <= start && mi->blk[i].end > start) - nid = mi->blk[i].nid; + return mi->blk[i].nid; + return NUMA_NO_NODE; +} + +int phys_to_target_node(phys_addr_t start) +{ + int nid = meminfo_to_nid(&numa_meminfo, start); + + /* + * Prefer online nodes, but if reserved memory might be + * hot-added continue the search with reserved ranges. + */ + if (nid != NUMA_NO_NODE) + return nid; + + return meminfo_to_nid(&numa_reserved_meminfo, start); +} +EXPORT_SYMBOL_GPL(phys_to_target_node); + +int memory_add_physaddr_to_nid(u64 start) +{ + int nid = meminfo_to_nid(&numa_meminfo, start); + + if (nid == NUMA_NO_NODE) + nid = numa_meminfo.blk[0].nid; return nid; } EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); diff --git a/include/linux/numa.h b/include/linux/numa.h index 5773cd2613fc..a42df804679e 100644 --- a/include/linux/numa.h +++ b/include/linux/numa.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_NUMA_H #define _LINUX_NUMA_H - +#include <linux/types.h> #ifdef CONFIG_NODES_SHIFT #define NODES_SHIFT CONFIG_NODES_SHIFT @@ -21,12 +21,24 @@ #endif #ifdef CONFIG_NUMA +/* Generic implementation available */ int numa_map_to_online_node(int node); + +/* + * Optional architecture specific implementation, users need a "depends + * on $ARCH" + */ +int phys_to_target_node(phys_addr_t addr); #else static inline int numa_map_to_online_node(int node) { return NUMA_NO_NODE; } + +static inline int phys_to_target_node(phys_addr_t addr) +{ + return NUMA_NO_NODE; +} #endif #endif /* _LINUX_NUMA_H */ _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org
WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com> To: linux-nvdimm@lists.01.org Cc: Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org, Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@suse.com>, kbuild test robot <lkp@intel.com>, Ingo Molnar <mingo@kernel.org>, vishal.l.verma@intel.com, hch@lst.de, linux-kernel@vger.kernel.org, x86@kernel.org Subject: [PATCH v5 5/6] x86/NUMA: Provide a range-to-target_node lookup facility Date: Sun, 16 Feb 2020 12:01:09 -0800 [thread overview] Message-ID: <158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com> (raw) In-Reply-To: <158188324272.894464.5941332130956525504.stgit@dwillia2-desk3.amr.corp.intel.com> The DEV_DAX_KMEM facility is a generic mechanism to allow device-dax instances, fronting performance-differentiated-memory like pmem, to be added to the System RAM pool. The NUMA node for that hot-added memory is derived from the device-dax instance's 'target_node' attribute. Recall that the 'target_node' is the ACPI-PXM-to-node translation for memory when it comes online whereas the 'numa_node' attribute of the device represents the closest online cpu node. Presently useful target_node information from the ACPI SRAT is discarded with the expectation that "Reserved" memory will never be onlined. Now, DEV_DAX_KMEM violates that assumption, there is a need to retain the translation. Move, rather than discard, numa_memblk data to a secondary array that memory_add_physaddr_to_target_node() may consider at a later point in time. Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- arch/x86/mm/numa.c | 61 ++++++++++++++++++++++++++++++++++++++++++-------- include/linux/numa.h | 14 +++++++++++ 2 files changed, 64 insertions(+), 11 deletions(-) diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 2450b21cc28a..59ba008504dc 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -26,6 +26,7 @@ struct pglist_data *node_data[MAX_NUMNODES] __read_mostly; EXPORT_SYMBOL(node_data); static struct numa_meminfo numa_meminfo __initdata_or_meminfo; +static struct numa_meminfo numa_reserved_meminfo __initdata_or_meminfo; static int numa_distance_cnt; static u8 *numa_distance; @@ -164,6 +165,19 @@ void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi) (mi->nr_blks - idx) * sizeof(mi->blk[0])); } +/** + * numa_move_tail_memblk - Move a numa_memblk from one numa_meminfo to another + * @dst: numa_meminfo to append block to + * @idx: Index of memblk to remove + * @src: numa_meminfo to remove memblk from + */ +static void __init numa_move_tail_memblk(struct numa_meminfo *dst, int idx, + struct numa_meminfo *src) +{ + dst->blk[dst->nr_blks++] = src->blk[idx]; + numa_remove_memblk_from(idx, src); +} + /** * numa_add_memblk - Add one numa_memblk to numa_meminfo * @nid: NUMA node ID of the new memblk @@ -233,14 +247,19 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi) for (i = 0; i < mi->nr_blks; i++) { struct numa_memblk *bi = &mi->blk[i]; - /* make sure all blocks are inside the limits */ + /* move / save reserved memory ranges */ + if (!memblock_overlaps_region(&memblock.memory, + bi->start, bi->end - bi->start)) { + numa_move_tail_memblk(&numa_reserved_meminfo, i--, mi); + continue; + } + + /* make sure all non-reserved blocks are inside the limits */ bi->start = max(bi->start, low); bi->end = min(bi->end, high); - /* and there's no empty or non-exist block */ - if (bi->start >= bi->end || - !memblock_overlaps_region(&memblock.memory, - bi->start, bi->end - bi->start)) + /* and there's no empty block */ + if (bi->start >= bi->end) numa_remove_memblk_from(i--, mi); } @@ -877,16 +896,38 @@ EXPORT_SYMBOL(cpumask_of_node); #endif /* !CONFIG_DEBUG_PER_CPU_MAPS */ -#ifdef CONFIG_MEMORY_HOTPLUG -int memory_add_physaddr_to_nid(u64 start) +#ifdef CONFIG_NUMA_KEEP_MEMINFO +static int meminfo_to_nid(struct numa_meminfo *mi, u64 start) { - struct numa_meminfo *mi = &numa_meminfo; - int nid = mi->blk[0].nid; int i; for (i = 0; i < mi->nr_blks; i++) if (mi->blk[i].start <= start && mi->blk[i].end > start) - nid = mi->blk[i].nid; + return mi->blk[i].nid; + return NUMA_NO_NODE; +} + +int phys_to_target_node(phys_addr_t start) +{ + int nid = meminfo_to_nid(&numa_meminfo, start); + + /* + * Prefer online nodes, but if reserved memory might be + * hot-added continue the search with reserved ranges. + */ + if (nid != NUMA_NO_NODE) + return nid; + + return meminfo_to_nid(&numa_reserved_meminfo, start); +} +EXPORT_SYMBOL_GPL(phys_to_target_node); + +int memory_add_physaddr_to_nid(u64 start) +{ + int nid = meminfo_to_nid(&numa_meminfo, start); + + if (nid == NUMA_NO_NODE) + nid = numa_meminfo.blk[0].nid; return nid; } EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); diff --git a/include/linux/numa.h b/include/linux/numa.h index 5773cd2613fc..a42df804679e 100644 --- a/include/linux/numa.h +++ b/include/linux/numa.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_NUMA_H #define _LINUX_NUMA_H - +#include <linux/types.h> #ifdef CONFIG_NODES_SHIFT #define NODES_SHIFT CONFIG_NODES_SHIFT @@ -21,12 +21,24 @@ #endif #ifdef CONFIG_NUMA +/* Generic implementation available */ int numa_map_to_online_node(int node); + +/* + * Optional architecture specific implementation, users need a "depends + * on $ARCH" + */ +int phys_to_target_node(phys_addr_t addr); #else static inline int numa_map_to_online_node(int node) { return NUMA_NO_NODE; } + +static inline int phys_to_target_node(phys_addr_t addr) +{ + return NUMA_NO_NODE; +} #endif #endif /* _LINUX_NUMA_H */
next prev parent reply other threads:[~2020-02-16 20:17 UTC|newest] Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-01-22 3:04 [PATCH v4 0/6] Memory Hierarchy: Enable target node lookups for reserved memory Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-01-22 3:04 ` [PATCH v4 1/6] ACPI: NUMA: Up-level "map to online node" functionality Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-01-22 3:04 ` [PATCH v4 2/6] mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node() Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-01-22 3:04 ` [PATCH v4 3/6] powerpc/papr_scm: Switch to numa_map_to_online_node() Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-01-22 3:04 ` [PATCH v4 4/6] x86/mm: Introduce CONFIG_KEEP_NUMA Dan Williams 2020-01-22 3:04 ` Dan Williams 2020-02-13 9:32 ` Ingo Molnar 2020-02-13 9:32 ` Ingo Molnar 2020-02-13 21:19 ` Dan Williams 2020-02-13 21:19 ` Dan Williams 2020-02-13 11:22 ` Thomas Gleixner 2020-02-13 11:22 ` Thomas Gleixner 2020-02-13 21:21 ` Dan Williams 2020-02-13 21:21 ` Dan Williams 2020-01-22 3:05 ` [PATCH v4 5/6] x86/numa: Provide a range-to-target_node lookup facility Dan Williams 2020-01-22 3:05 ` Dan Williams 2020-02-13 9:36 ` Ingo Molnar 2020-02-13 9:36 ` Ingo Molnar 2020-02-13 11:37 ` Thomas Gleixner 2020-02-13 11:37 ` Thomas Gleixner 2020-02-13 21:40 ` Dan Williams 2020-02-13 21:40 ` Dan Williams 2020-01-22 3:05 ` [PATCH v4 6/6] libnvdimm/e820: Retrieve and populate correct 'target_node' info Dan Williams 2020-01-22 3:05 ` Dan Williams 2020-02-13 2:28 ` [PATCH v4 0/6] Memory Hierarchy: Enable target node lookups for reserved memory Dan Williams 2020-02-13 2:28 ` Dan Williams 2020-02-13 9:37 ` Ingo Molnar 2020-02-13 9:37 ` Ingo Molnar 2020-02-16 20:00 ` [PATCH v5 " Dan Williams 2020-02-16 20:00 ` Dan Williams 2020-02-16 20:00 ` [PATCH v5 1/6] ACPI: NUMA: Up-level "map to online node" functionality Dan Williams 2020-02-16 20:00 ` Dan Williams 2020-02-16 20:00 ` [PATCH v5 2/6] mm/numa: Skip NUMA_NO_NODE and online nodes in numa_map_to_online_node() Dan Williams 2020-02-16 20:00 ` Dan Williams 2020-02-16 20:00 ` [PATCH v5 3/6] powerpc/papr_scm: Switch to numa_map_to_online_node() Dan Williams 2020-02-16 20:00 ` Dan Williams 2020-02-16 20:01 ` [PATCH v5 4/6] x86/mm: Introduce CONFIG_NUMA_KEEP_MEMINFO Dan Williams 2020-02-16 20:01 ` Dan Williams 2020-02-17 8:21 ` Thomas Gleixner 2020-02-17 8:21 ` Thomas Gleixner 2020-02-16 20:01 ` Dan Williams [this message] 2020-02-16 20:01 ` [PATCH v5 5/6] x86/NUMA: Provide a range-to-target_node lookup facility Dan Williams 2020-02-17 8:22 ` Thomas Gleixner 2020-02-17 8:22 ` Thomas Gleixner 2020-02-16 20:01 ` [PATCH v5 6/6] libnvdimm/e820: Retrieve and populate correct 'target_node' info Dan Williams 2020-02-16 20:01 ` Dan Williams
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com \ --to=dan.j.williams@intel.com \ --cc=akpm@linux-foundation.org \ --cc=bp@alien8.de \ --cc=dave.hansen@linux.intel.com \ --cc=david@redhat.com \ --cc=hch@lst.de \ --cc=hpa@zytor.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-nvdimm@lists.01.org \ --cc=lkp@intel.com \ --cc=luto@kernel.org \ --cc=mhocko@suse.com \ --cc=mingo@kernel.org \ --cc=peterz@infradead.org \ --cc=tglx@linutronix.de \ --cc=x86@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.