* [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64 @ 2020-07-07 5:59 Jia He 2020-07-07 5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He ` (2 more replies) 0 siblings, 3 replies; 46+ messages in thread From: Jia He @ 2020-07-07 5:59 UTC (permalink / raw) To: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin, Jia He This fix a few issues when I tried to enable pmem as RAM device on arm64. Tested on ThunderX2 host/qemu "-M virt" guest with a nvdimm device. The memblocks from the dax pmem device can be either hot-added or hot-removed on arm64 guest. Changes: v2: - Drop unneccessary patch to harden try_offline_node - Use new solution(by David) to fix dev->target_node=-1 during probing - Refine the mem_hotplug_begin/done patch v1: https://lkml.org/lkml/2020/7/5/381 Jia He (3): arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL device-dax: use fallback nid when numa_node is invalid mm/memory_hotplug: fix unpaired mem_hotplug_begin/done arch/arm64/mm/numa.c | 5 +++-- drivers/dax/kmem.c | 22 ++++++++++++++-------- mm/memory_hotplug.c | 5 ++--- 3 files changed, 19 insertions(+), 13 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-07 5:59 [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64 Jia He @ 2020-07-07 5:59 ` Jia He 2020-07-07 11:35 ` David Hildenbrand 2020-07-07 11:54 ` Michal Hocko 2020-07-07 5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He 2020-07-07 5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He 2 siblings, 2 replies; 46+ messages in thread From: Jia He @ 2020-07-07 5:59 UTC (permalink / raw) To: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin, Jia He This exports memory_add_physaddr_to_nid() for module driver to use. memory_add_physaddr_to_nid() is a fallback option to get the nid in case NUMA_NO_NID is detected. Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jia He <justin.he@arm.com> --- arch/arm64/mm/numa.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c index aafcee3e3f7e..7eeb31740248 100644 --- a/arch/arm64/mm/numa.c +++ b/arch/arm64/mm/numa.c @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) /* * We hope that we will be hotplugging memory on nodes we already know about, - * such that acpi_get_node() succeeds and we never fall back to this... + * such that acpi_get_node() succeeds. But when SRAT is not present, the node + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. */ int memory_add_physaddr_to_nid(u64 addr) { - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); return 0; } +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); -- 2.17.1 ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-07 5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He @ 2020-07-07 11:35 ` David Hildenbrand 2020-07-07 11:54 ` Michal Hocko 1 sibling, 0 replies; 46+ messages in thread From: David Hildenbrand @ 2020-07-07 11:35 UTC (permalink / raw) To: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On 07.07.20 07:59, Jia He wrote: > This exports memory_add_physaddr_to_nid() for module driver to use. > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > NUMA_NO_NID is detected. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Jia He <justin.he@arm.com> > --- > arch/arm64/mm/numa.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > index aafcee3e3f7e..7eeb31740248 100644 > --- a/arch/arm64/mm/numa.c > +++ b/arch/arm64/mm/numa.c > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > /* > * We hope that we will be hotplugging memory on nodes we already know about, > - * such that acpi_get_node() succeeds and we never fall back to this... > + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > */ > int memory_add_physaddr_to_nid(u64 addr) > { > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > return 0; > } > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > We could turn that into a pr_info() instead, but the effect is visible to user space (e.g., which memory blocks belong to which node in sysfs), so this can be debugged easily on demand. Reviewed-by: David Hildenbrand <david@redhat.com> -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-07 5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He 2020-07-07 11:35 ` David Hildenbrand @ 2020-07-07 11:54 ` Michal Hocko 2020-07-07 12:13 ` Mike Rapoport 2020-07-08 2:20 ` Justin He 1 sibling, 2 replies; 46+ messages in thread From: Michal Hocko @ 2020-07-07 11:54 UTC (permalink / raw) To: Jia He Cc: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue 07-07-20 13:59:15, Jia He wrote: > This exports memory_add_physaddr_to_nid() for module driver to use. > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > NUMA_NO_NID is detected. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Jia He <justin.he@arm.com> > --- > arch/arm64/mm/numa.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > index aafcee3e3f7e..7eeb31740248 100644 > --- a/arch/arm64/mm/numa.c > +++ b/arch/arm64/mm/numa.c > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > /* > * We hope that we will be hotplugging memory on nodes we already know about, > - * such that acpi_get_node() succeeds and we never fall back to this... > + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > */ > int memory_add_physaddr_to_nid(u64 addr) > { > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > return 0; > } > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); Does it make sense to export a noop function? Wouldn't make more sense to simply make it static inline somewhere in a header? I haven't checked whether there is an easy way to do that sanely bu this just hit my eyes. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-07 11:54 ` Michal Hocko @ 2020-07-07 12:13 ` Mike Rapoport 2020-07-07 12:26 ` David Hildenbrand 2020-07-08 2:20 ` Justin He 1 sibling, 1 reply; 46+ messages in thread From: Mike Rapoport @ 2020-07-07 12:13 UTC (permalink / raw) To: Michal Hocko Cc: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > On Tue 07-07-20 13:59:15, Jia He wrote: > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > NUMA_NO_NID is detected. > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > Signed-off-by: Jia He <justin.he@arm.com> > > --- > > arch/arm64/mm/numa.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > index aafcee3e3f7e..7eeb31740248 100644 > > --- a/arch/arm64/mm/numa.c > > +++ b/arch/arm64/mm/numa.c > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > /* > > * We hope that we will be hotplugging memory on nodes we already know about, > > - * such that acpi_get_node() succeeds and we never fall back to this... > > + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > > */ > > int memory_add_physaddr_to_nid(u64 addr) > > { > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > > return 0; > > } > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > Does it make sense to export a noop function? Wouldn't make more sense > to simply make it static inline somewhere in a header? I haven't checked > whether there is an easy way to do that sanely bu this just hit my eyes. We'll need to either add a CONFIG_ option or arch specific callback to make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) implementations coexist ... > -- > Michal Hocko > SUSE Labs -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-07 12:13 ` Mike Rapoport @ 2020-07-07 12:26 ` David Hildenbrand 2020-07-07 18:00 ` Mike Rapoport 0 siblings, 1 reply; 46+ messages in thread From: David Hildenbrand @ 2020-07-07 12:26 UTC (permalink / raw) To: Mike Rapoport, Michal Hocko Cc: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On 07.07.20 14:13, Mike Rapoport wrote: > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: >> On Tue 07-07-20 13:59:15, Jia He wrote: >>> This exports memory_add_physaddr_to_nid() for module driver to use. >>> >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>> NUMA_NO_NID is detected. >>> >>> Suggested-by: David Hildenbrand <david@redhat.com> >>> Signed-off-by: Jia He <justin.he@arm.com> >>> --- >>> arch/arm64/mm/numa.c | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>> index aafcee3e3f7e..7eeb31740248 100644 >>> --- a/arch/arm64/mm/numa.c >>> +++ b/arch/arm64/mm/numa.c >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>> >>> /* >>> * We hope that we will be hotplugging memory on nodes we already know about, >>> - * such that acpi_get_node() succeeds and we never fall back to this... >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>> */ >>> int memory_add_physaddr_to_nid(u64 addr) >>> { >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>> return 0; >>> } >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >> >> Does it make sense to export a noop function? Wouldn't make more sense >> to simply make it static inline somewhere in a header? I haven't checked >> whether there is an easy way to do that sanely bu this just hit my eyes. > > We'll need to either add a CONFIG_ option or arch specific callback to > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > implementations coexist ... Note: I have a similar dummy (return 0) patch for s390x lying around here. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-07 12:26 ` David Hildenbrand @ 2020-07-07 18:00 ` Mike Rapoport 2020-07-07 22:05 ` Dan Williams 0 siblings, 1 reply; 46+ messages in thread From: Mike Rapoport @ 2020-07-07 18:00 UTC (permalink / raw) To: David Hildenbrand Cc: Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > On 07.07.20 14:13, Mike Rapoport wrote: > > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > >> On Tue 07-07-20 13:59:15, Jia He wrote: > >>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>> > >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>> NUMA_NO_NID is detected. > >>> > >>> Suggested-by: David Hildenbrand <david@redhat.com> > >>> Signed-off-by: Jia He <justin.he@arm.com> > >>> --- > >>> arch/arm64/mm/numa.c | 5 +++-- > >>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>> index aafcee3e3f7e..7eeb31740248 100644 > >>> --- a/arch/arm64/mm/numa.c > >>> +++ b/arch/arm64/mm/numa.c > >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>> > >>> /* > >>> * We hope that we will be hotplugging memory on nodes we already know about, > >>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>> */ > >>> int memory_add_physaddr_to_nid(u64 addr) > >>> { > >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>> return 0; > >>> } > >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >> > >> Does it make sense to export a noop function? Wouldn't make more sense > >> to simply make it static inline somewhere in a header? I haven't checked > >> whether there is an easy way to do that sanely bu this just hit my eyes. > > > > We'll need to either add a CONFIG_ option or arch specific callback to > > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > > implementations coexist ... > > Note: I have a similar dummy (return 0) patch for s390x lying around here. Then we'll call it a tie - 3:3 ;-) > -- > Thanks, > > David / dhildenb > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-07 18:00 ` Mike Rapoport @ 2020-07-07 22:05 ` Dan Williams 2020-07-08 5:27 ` Mike Rapoport 0 siblings, 1 reply; 46+ messages in thread From: Dan Williams @ 2020-07-07 22:05 UTC (permalink / raw) To: Mike Rapoport Cc: David Hildenbrand, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > > On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > > On 07.07.20 14:13, Mike Rapoport wrote: > > > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > > >> On Tue 07-07-20 13:59:15, Jia He wrote: > > >>> This exports memory_add_physaddr_to_nid() for module driver to use. > > >>> > > >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > >>> NUMA_NO_NID is detected. > > >>> > > >>> Suggested-by: David Hildenbrand <david@redhat.com> > > >>> Signed-off-by: Jia He <justin.he@arm.com> > > >>> --- > > >>> arch/arm64/mm/numa.c | 5 +++-- > > >>> 1 file changed, 3 insertions(+), 2 deletions(-) > > >>> > > >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > >>> index aafcee3e3f7e..7eeb31740248 100644 > > >>> --- a/arch/arm64/mm/numa.c > > >>> +++ b/arch/arm64/mm/numa.c > > >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > >>> > > >>> /* > > >>> * We hope that we will be hotplugging memory on nodes we already know about, > > >>> - * such that acpi_get_node() succeeds and we never fall back to this... > > >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > > >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > > >>> */ > > >>> int memory_add_physaddr_to_nid(u64 addr) > > >>> { > > >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > > >>> return 0; > > >>> } > > >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > >> > > >> Does it make sense to export a noop function? Wouldn't make more sense > > >> to simply make it static inline somewhere in a header? I haven't checked > > >> whether there is an easy way to do that sanely bu this just hit my eyes. > > > > > > We'll need to either add a CONFIG_ option or arch specific callback to > > > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > > > implementations coexist ... > > > > Note: I have a similar dummy (return 0) patch for s390x lying around here. > > Then we'll call it a tie - 3:3 ;-) So I'd be happy to jump on the train of people wanting to export the ARM stub for this (and add a new ARM stub for phys_to_target_node()), but Will did have a plausibly better idea that I have been meaning to circle back to: http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck ...i.e. iterate over node data to do the lookup. This would seem to work generically for multiple archs unless I am missing something? ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-07 22:05 ` Dan Williams @ 2020-07-08 5:27 ` Mike Rapoport 2020-07-08 7:21 ` David Hildenbrand 0 siblings, 1 reply; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 5:27 UTC (permalink / raw) To: Dan Williams Cc: David Hildenbrand, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: > On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > > On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > > > On 07.07.20 14:13, Mike Rapoport wrote: > > > > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > > > >> On Tue 07-07-20 13:59:15, Jia He wrote: > > > >>> This exports memory_add_physaddr_to_nid() for module driver to use. > > > >>> > > > >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > >>> NUMA_NO_NID is detected. > > > >>> > > > >>> Suggested-by: David Hildenbrand <david@redhat.com> > > > >>> Signed-off-by: Jia He <justin.he@arm.com> > > > >>> --- > > > >>> arch/arm64/mm/numa.c | 5 +++-- > > > >>> 1 file changed, 3 insertions(+), 2 deletions(-) > > > >>> > > > >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > >>> index aafcee3e3f7e..7eeb31740248 100644 > > > >>> --- a/arch/arm64/mm/numa.c > > > >>> +++ b/arch/arm64/mm/numa.c > > > >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > >>> > > > >>> /* > > > >>> * We hope that we will be hotplugging memory on nodes we already know about, > > > >>> - * such that acpi_get_node() succeeds and we never fall back to this... > > > >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > > > >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > > > >>> */ > > > >>> int memory_add_physaddr_to_nid(u64 addr) > > > >>> { > > > >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > > > >>> return 0; > > > >>> } > > > >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > >> > > > >> Does it make sense to export a noop function? Wouldn't make more sense > > > >> to simply make it static inline somewhere in a header? I haven't checked > > > >> whether there is an easy way to do that sanely bu this just hit my eyes. > > > > > > > > We'll need to either add a CONFIG_ option or arch specific callback to > > > > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > > > > implementations coexist ... > > > > > > Note: I have a similar dummy (return 0) patch for s390x lying around here. > > > > Then we'll call it a tie - 3:3 ;-) > > So I'd be happy to jump on the train of people wanting to export the > ARM stub for this (and add a new ARM stub for phys_to_target_node()), > but Will did have a plausibly better idea that I have been meaning to > circle back to: > > http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck > > ...i.e. iterate over node data to do the lookup. This would seem to > work generically for multiple archs unless I am missing something? I think it would work on arm64, power and, most propbably on s390 (David?), but not on x86. x86 does not have reserved memory in pgdat, it's never memblock_add()'ed (see e820__memblock_setup()). I've suggested to add E820_*_RESERVED to memblock.memory a while ago [1], but apparently there are systems that cannot tolerate OS mappings of the BIOS reserved areas. [1] https://lore.kernel.org/lkml/20200522142053.GW1059226@linux.ibm.com/ -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 5:27 ` Mike Rapoport @ 2020-07-08 7:21 ` David Hildenbrand 2020-07-08 7:38 ` Mike Rapoport 2020-07-08 7:50 ` Dan Williams 0 siblings, 2 replies; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 7:21 UTC (permalink / raw) To: Mike Rapoport, Dan Williams Cc: Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On 08.07.20 07:27, Mike Rapoport wrote: > On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: >> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: >>> >>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: >>>> On 07.07.20 14:13, Mike Rapoport wrote: >>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>> >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>> NUMA_NO_NID is detected. >>>>>>> >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>> --- >>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>> >>>>>>> /* >>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>> */ >>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>> { >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>> return 0; >>>>>>> } >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>> >>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. >>>>> >>>>> We'll need to either add a CONFIG_ option or arch specific callback to >>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) >>>>> implementations coexist ... >>>> >>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. >>> >>> Then we'll call it a tie - 3:3 ;-) >> >> So I'd be happy to jump on the train of people wanting to export the >> ARM stub for this (and add a new ARM stub for phys_to_target_node()), >> but Will did have a plausibly better idea that I have been meaning to >> circle back to: >> >> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck >> >> ...i.e. iterate over node data to do the lookup. This would seem to >> work generically for multiple archs unless I am missing something? IIRC, only memory assigned to/onlined to a ZONE is represented in the pgdat node span. E.g., not offline memory blocks. Esp., when hotplugging + onlining consecutive memory, there won't really be any intersections in most cases if I am not wrong. It would not be "intersection" but rather "closest fit". With overlapping nodes it's even more unclear. Which one to pick? > > I think it would work on arm64, power and, most propbably on s390 With only a single dummy node I guess it should work (searching when there is only a single node does not make too much sense). > (David?), but not on x86. x86 does not have reserved memory in pgdat, > it's never memblock_add()'ed (see e820__memblock_setup()). Can you enlighten me why that is relevant for the memory hotplug path? (or is it just a general comment to make the function as accurate as possible for all addresses?) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 7:21 ` David Hildenbrand @ 2020-07-08 7:38 ` Mike Rapoport 2020-07-08 7:40 ` David Hildenbrand 2020-07-08 7:50 ` Dan Williams 1 sibling, 1 reply; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 7:38 UTC (permalink / raw) To: David Hildenbrand Cc: Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On Wed, Jul 08, 2020 at 09:21:25AM +0200, David Hildenbrand wrote: > On 08.07.20 07:27, Mike Rapoport wrote: > > On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: > >> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > >>> > >>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > >>>> On 07.07.20 14:13, Mike Rapoport wrote: > >>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: > >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>>>>>> > >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>>>>>> NUMA_NO_NID is detected. > >>>>>>> > >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> > >>>>>>> Signed-off-by: Jia He <justin.he@arm.com> > >>>>>>> --- > >>>>>>> arch/arm64/mm/numa.c | 5 +++-- > >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>>>>>> > >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>>>>>> index aafcee3e3f7e..7eeb31740248 100644 > >>>>>>> --- a/arch/arm64/mm/numa.c > >>>>>>> +++ b/arch/arm64/mm/numa.c > >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>>>>>> > >>>>>>> /* > >>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, > >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>>>>>> */ > >>>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>>> { > >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>>>>>> return 0; > >>>>>>> } > >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >>>>>> > >>>>>> Does it make sense to export a noop function? Wouldn't make more sense > >>>>>> to simply make it static inline somewhere in a header? I haven't checked > >>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > >>>>> > >>>>> We'll need to either add a CONFIG_ option or arch specific callback to > >>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > >>>>> implementations coexist ... > >>>> > >>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. > >>> > >>> Then we'll call it a tie - 3:3 ;-) > >> > >> So I'd be happy to jump on the train of people wanting to export the > >> ARM stub for this (and add a new ARM stub for phys_to_target_node()), > >> but Will did have a plausibly better idea that I have been meaning to > >> circle back to: > >> > >> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck > >> > >> ...i.e. iterate over node data to do the lookup. This would seem to > >> work generically for multiple archs unless I am missing something? > > IIRC, only memory assigned to/onlined to a ZONE is represented in the > pgdat node span. E.g., not offline memory blocks. > > Esp., when hotplugging + onlining consecutive memory, there won't really > be any intersections in most cases if I am not wrong. It would not be > "intersection" but rather "closest fit". > > With overlapping nodes it's even more unclear. Which one to pick? > > > > > I think it would work on arm64, power and, most propbably on s390 > > With only a single dummy node I guess it should work (searching when > there is only a single node does not make too much sense). > > > (David?), but not on x86. x86 does not have reserved memory in pgdat, > > it's never memblock_add()'ed (see e820__memblock_setup()). > > Can you enlighten me why that is relevant for the memory hotplug path? > (or is it just a general comment to make the function as accurate as > possible for all addresses?) phys_to_target_node() on x86 falls back to numa_reserved_meminfo which holds memory that is never listed in a node. > -- > Thanks, > > David / dhildenb > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 7:38 ` Mike Rapoport @ 2020-07-08 7:40 ` David Hildenbrand 0 siblings, 0 replies; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 7:40 UTC (permalink / raw) To: Mike Rapoport Cc: Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On 08.07.20 09:38, Mike Rapoport wrote: > On Wed, Jul 08, 2020 at 09:21:25AM +0200, David Hildenbrand wrote: >> On 08.07.20 07:27, Mike Rapoport wrote: >>> On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: >>>> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: >>>>> >>>>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: >>>>>> On 07.07.20 14:13, Mike Rapoport wrote: >>>>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: >>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>>>> >>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>>>> NUMA_NO_NID is detected. >>>>>>>>> >>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>>>> --- >>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>>>> >>>>>>>>> /* >>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>>>> */ >>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>>>> { >>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>>>> return 0; >>>>>>>>> } >>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>>>> >>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. >>>>>>> >>>>>>> We'll need to either add a CONFIG_ option or arch specific callback to >>>>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) >>>>>>> implementations coexist ... >>>>>> >>>>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. >>>>> >>>>> Then we'll call it a tie - 3:3 ;-) >>>> >>>> So I'd be happy to jump on the train of people wanting to export the >>>> ARM stub for this (and add a new ARM stub for phys_to_target_node()), >>>> but Will did have a plausibly better idea that I have been meaning to >>>> circle back to: >>>> >>>> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck >>>> >>>> ...i.e. iterate over node data to do the lookup. This would seem to >>>> work generically for multiple archs unless I am missing something? >> >> IIRC, only memory assigned to/onlined to a ZONE is represented in the >> pgdat node span. E.g., not offline memory blocks. >> >> Esp., when hotplugging + onlining consecutive memory, there won't really >> be any intersections in most cases if I am not wrong. It would not be >> "intersection" but rather "closest fit". >> >> With overlapping nodes it's even more unclear. Which one to pick? >> >>> >>> I think it would work on arm64, power and, most propbably on s390 >> >> With only a single dummy node I guess it should work (searching when >> there is only a single node does not make too much sense). >> >>> (David?), but not on x86. x86 does not have reserved memory in pgdat, >>> it's never memblock_add()'ed (see e820__memblock_setup()). >> >> Can you enlighten me why that is relevant for the memory hotplug path? >> (or is it just a general comment to make the function as accurate as >> possible for all addresses?) > > phys_to_target_node() on x86 falls back to numa_reserved_meminfo which > holds memory that is never listed in a node. > Ah, I see - thanks. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 7:21 ` David Hildenbrand 2020-07-08 7:38 ` Mike Rapoport @ 2020-07-08 7:50 ` Dan Williams 2020-07-08 8:26 ` David Hildenbrand 1 sibling, 1 reply; 46+ messages in thread From: Dan Williams @ 2020-07-08 7:50 UTC (permalink / raw) To: David Hildenbrand Cc: Mike Rapoport, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: > > On 08.07.20 07:27, Mike Rapoport wrote: > > On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: > >> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: > >>> > >>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: > >>>> On 07.07.20 14:13, Mike Rapoport wrote: > >>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: > >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: > >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>>>>>> > >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>>>>>> NUMA_NO_NID is detected. > >>>>>>> > >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> > >>>>>>> Signed-off-by: Jia He <justin.he@arm.com> > >>>>>>> --- > >>>>>>> arch/arm64/mm/numa.c | 5 +++-- > >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>>>>>> > >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>>>>>> index aafcee3e3f7e..7eeb31740248 100644 > >>>>>>> --- a/arch/arm64/mm/numa.c > >>>>>>> +++ b/arch/arm64/mm/numa.c > >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>>>>>> > >>>>>>> /* > >>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, > >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>>>>>> */ > >>>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>>> { > >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>>>>>> return 0; > >>>>>>> } > >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >>>>>> > >>>>>> Does it make sense to export a noop function? Wouldn't make more sense > >>>>>> to simply make it static inline somewhere in a header? I haven't checked > >>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > >>>>> > >>>>> We'll need to either add a CONFIG_ option or arch specific callback to > >>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) > >>>>> implementations coexist ... > >>>> > >>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. > >>> > >>> Then we'll call it a tie - 3:3 ;-) > >> > >> So I'd be happy to jump on the train of people wanting to export the > >> ARM stub for this (and add a new ARM stub for phys_to_target_node()), > >> but Will did have a plausibly better idea that I have been meaning to > >> circle back to: > >> > >> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck > >> > >> ...i.e. iterate over node data to do the lookup. This would seem to > >> work generically for multiple archs unless I am missing something? > > IIRC, only memory assigned to/onlined to a ZONE is represented in the > pgdat node span. E.g., not offline memory blocks. So this dovetails somewhat with Will's idea. What if we populated node_data for "offline" ranges? I started there, but then saw ARCH_KEEP_MEMBLOCK and thought it would be safer to just teach phys_to_target_node() to use that rather than update other code paths to expect node_data might not always reflect online data. > Esp., when hotplugging + onlining consecutive memory, there won't really > be any intersections in most cases if I am not wrong. It would not be > "intersection" but rather "closest fit". > > With overlapping nodes it's even more unclear. Which one to pick? In the overlap case you get what you get. Some signal is better than the noise of a dummy function. The consequences of picking the wrong node might be that the kernel can't properly associate a memory range to its performance data tables in firmware, but then again firmware messed up with an overlapping node definition in the first instance. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 7:50 ` Dan Williams @ 2020-07-08 8:26 ` David Hildenbrand 2020-07-08 8:39 ` Mike Rapoport 0 siblings, 1 reply; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 8:26 UTC (permalink / raw) To: Dan Williams Cc: Mike Rapoport, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On 08.07.20 09:50, Dan Williams wrote: > On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: >> >> On 08.07.20 07:27, Mike Rapoport wrote: >>> On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote: >>>> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote: >>>>> >>>>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote: >>>>>> On 07.07.20 14:13, Mike Rapoport wrote: >>>>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote: >>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>>>> >>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>>>> NUMA_NO_NID is detected. >>>>>>>>> >>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>>>> --- >>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>>>> >>>>>>>>> /* >>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>>>> */ >>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>>>> { >>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>>>> return 0; >>>>>>>>> } >>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>>>> >>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. >>>>>>> >>>>>>> We'll need to either add a CONFIG_ option or arch specific callback to >>>>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh) >>>>>>> implementations coexist ... >>>>>> >>>>>> Note: I have a similar dummy (return 0) patch for s390x lying around here. >>>>> >>>>> Then we'll call it a tie - 3:3 ;-) >>>> >>>> So I'd be happy to jump on the train of people wanting to export the >>>> ARM stub for this (and add a new ARM stub for phys_to_target_node()), >>>> but Will did have a plausibly better idea that I have been meaning to >>>> circle back to: >>>> >>>> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck >>>> >>>> ...i.e. iterate over node data to do the lookup. This would seem to >>>> work generically for multiple archs unless I am missing something? >> >> IIRC, only memory assigned to/onlined to a ZONE is represented in the >> pgdat node span. E.g., not offline memory blocks. > > So this dovetails somewhat with Will's idea. What if we populated > node_data for "offline" ranges? I started there, but then saw > ARCH_KEEP_MEMBLOCK and thought it would be safer to just teach > phys_to_target_node() to use that rather than update other code paths > to expect node_data might not always reflect online data. We currently need a somewhat-accurate pgdat node span to detect when to offline a node. See try_offline_node(). This works fairly reliable. Shrinking the node span is currently fairly easy for !ZONE_DEVICE memory, because we can rely on pfn_to_online_page() + pfn_to_nid(pfn). See e.g., find_biggest_section_pfn(). If we glue growing/shrinking the node span to adding/removing of memory (instead of e.g., onlining/offlining), we can no longer base shrinking on memmap data. We would have to get the information ("how far can I shrink the node span, is it empty?") from somewhere else. E.g., for_each_memory_block() - but that one does not cover ZONE_DEVICE. And there are memory blocks which cover multiple nodes, in which case we only store one of them ... unreliable. This certainly needs more thought :/ > >> Esp., when hotplugging + onlining consecutive memory, there won't really >> be any intersections in most cases if I am not wrong. It would not be >> "intersection" but rather "closest fit". >> >> With overlapping nodes it's even more unclear. Which one to pick? > > In the overlap case you get what you get. Some signal is better than > the noise of a dummy function. The consequences of picking the wrong > node might be that the kernel can't properly associate a memory range > to its performance data tables in firmware, but then again firmware > messed up with an overlapping node definition in the first instance. I'd be curious if what we are trying to optimize here is actually worth optimizing. IOW, is there a well-known scenario where the dummy value on arm64 would be problematic and is worth the effort? I mean, in all performance relevant setups (ignoring hv_balloon/xen-balloon/prove_store(), which also use memory_add_physaddr_to_nid()), we should have a proper PXM/node specified by the hardware on memory hotadd. The fallback of memory_add_physaddr_to_nid() is not relevant in these scenarios. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 8:26 ` David Hildenbrand @ 2020-07-08 8:39 ` Mike Rapoport 2020-07-08 8:45 ` David Hildenbrand 0 siblings, 1 reply; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 8:39 UTC (permalink / raw) To: David Hildenbrand Cc: Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote: > On 08.07.20 09:50, Dan Williams wrote: > > On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: > >> > >>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: > >>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>>>>>>>> > >>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>>>>>>>> NUMA_NO_NID is detected. > >>>>>>>>> > >>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> > >>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> > >>>>>>>>> --- > >>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- > >>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>>>>>>>> > >>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 > >>>>>>>>> --- a/arch/arm64/mm/numa.c > >>>>>>>>> +++ b/arch/arm64/mm/numa.c > >>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>>>>>>>> > >>>>>>>>> /* > >>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, > >>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>>>>>>>> */ > >>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>>>>> { > >>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>>>>>>>> return 0; > >>>>>>>>> } > >>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >>>>>>>> > >>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense > >>>>>>>> to simply make it static inline somewhere in a header? I haven't checked > >>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > I'd be curious if what we are trying to optimize here is actually worth > optimizing. IOW, is there a well-known scenario where the dummy value on > arm64 would be problematic and is worth the effort? Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL() for a stub might be an overkill. I think Jia's suggestion [1] with addition of a comment that explains why and when the stub will be used, can work for both memory_add_physaddr_to_nid() and phys_to_target_node(). But on more theoretical/fundmanetal level, I think we lack a generic abstraction similar to e.g. x86 'struct numa_meminfo' that serves as translaton of firmware supplied information into data that can be used by the generic mm without need to reimplement it for each and every arch. [1] https://lore.kernel.org/lkml/AM6PR08MB406907F9F2B13DA6DC893AD9F7670@AM6PR08MB4069.eurprd08.prod.outlook.com > I mean, in all performance relevant setups (ignoring > hv_balloon/xen-balloon/prove_store(), which also use > memory_add_physaddr_to_nid()), we should have a proper PXM/node > specified by the hardware on memory hotadd. The fallback of > memory_add_physaddr_to_nid() is not relevant in these scenarios. > > -- > Thanks, > > David / dhildenb > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 8:39 ` Mike Rapoport @ 2020-07-08 8:45 ` David Hildenbrand 2020-07-08 9:15 ` Mike Rapoport 0 siblings, 1 reply; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 8:45 UTC (permalink / raw) To: Mike Rapoport Cc: Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On 08.07.20 10:39, Mike Rapoport wrote: > On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote: >> On 08.07.20 09:50, Dan Williams wrote: >>> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: >>>> >>>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>>>>>> >>>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>>>>>> NUMA_NO_NID is detected. >>>>>>>>>>> >>>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>>>>>> --- >>>>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>>>>>> >>>>>>>>>>> /* >>>>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>>>>>> */ >>>>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>>>>>> { >>>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>>>>>> return 0; >>>>>>>>>>> } >>>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>>>>>> >>>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > >> I'd be curious if what we are trying to optimize here is actually worth >> optimizing. IOW, is there a well-known scenario where the dummy value on >> arm64 would be problematic and is worth the effort? > > Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL() > for a stub might be an overkill. > > I think Jia's suggestion [1] with addition of a comment that explains > why and when the stub will be used, can work for both > memory_add_physaddr_to_nid() and phys_to_target_node(). Agreed. > > But on more theoretical/fundmanetal level, I think we lack a generic > abstraction similar to e.g. x86 'struct numa_meminfo' that serves as > translaton of firmware supplied information into data that can be used > by the generic mm without need to reimplement it for each and every > arch. Right. As I expressed, I am not a friend of using memblock for that, and the pgdat node span is tricky. Maybe abstracting that x86 concept is possible in some way (and we could restrict the information to boot-time properties, so we don't have to mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 8:45 ` David Hildenbrand @ 2020-07-08 9:15 ` Mike Rapoport 2020-07-08 9:25 ` David Hildenbrand 0 siblings, 1 reply; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 9:15 UTC (permalink / raw) To: David Hildenbrand Cc: Mike Rapoport, Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On Wed, Jul 08, 2020 at 10:45:17AM +0200, David Hildenbrand wrote: > On 08.07.20 10:39, Mike Rapoport wrote: > > On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote: > >> On 08.07.20 09:50, Dan Williams wrote: > >>> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: > >>>> > >>>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: > >>>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. > >>>>>>>>>>> > >>>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case > >>>>>>>>>>> NUMA_NO_NID is detected. > >>>>>>>>>>> > >>>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> > >>>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> > >>>>>>>>>>> --- > >>>>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- > >>>>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) > >>>>>>>>>>> > >>>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > >>>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 > >>>>>>>>>>> --- a/arch/arm64/mm/numa.c > >>>>>>>>>>> +++ b/arch/arm64/mm/numa.c > >>>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > >>>>>>>>>>> > >>>>>>>>>>> /* > >>>>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, > >>>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... > >>>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node > >>>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. > >>>>>>>>>>> */ > >>>>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>>>>>>> { > >>>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); > >>>>>>>>>>> return 0; > >>>>>>>>>>> } > >>>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > >>>>>>>>>> > >>>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense > >>>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked > >>>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. > > > >> I'd be curious if what we are trying to optimize here is actually worth > >> optimizing. IOW, is there a well-known scenario where the dummy value on > >> arm64 would be problematic and is worth the effort? > > > > Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL() > > for a stub might be an overkill. > > > > I think Jia's suggestion [1] with addition of a comment that explains > > why and when the stub will be used, can work for both > > memory_add_physaddr_to_nid() and phys_to_target_node(). > > Agreed. > > > > > But on more theoretical/fundmanetal level, I think we lack a generic > > abstraction similar to e.g. x86 'struct numa_meminfo' that serves as > > translaton of firmware supplied information into data that can be used > > by the generic mm without need to reimplement it for each and every > > arch. > > Right. As I expressed, I am not a friend of using memblock for that, and > the pgdat node span is tricky. > > Maybe abstracting that x86 concept is possible in some way (and we could > restrict the information to boot-time properties, so we don't have to > mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). I agree with pgdat part and disagree about memblock. It already has non-init physmap, why won't we add memblock.memory to the mix? ;-) Now, seriously, memblock already has all the necessary information about the coldplug memory for several architectures. x86 being an exception because for some reason the reserved memory is not considered memory there. The infrastructure for quiering and iterating memory regions is already there. We just need to leave out the irrelevant parts, like memblock.reserved and allocation funcions. Otherwise we'll add yet another 'struct { start, end }', a horde of covnersion and re-initialization functions that will do more or less the same things as current memblock APIs. > -- > Thanks, > > David / dhildenb > > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 9:15 ` Mike Rapoport @ 2020-07-08 9:25 ` David Hildenbrand 2020-07-08 9:45 ` Mike Rapoport 0 siblings, 1 reply; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 9:25 UTC (permalink / raw) To: Mike Rapoport Cc: Mike Rapoport, Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On 08.07.20 11:15, Mike Rapoport wrote: > On Wed, Jul 08, 2020 at 10:45:17AM +0200, David Hildenbrand wrote: >> On 08.07.20 10:39, Mike Rapoport wrote: >>> On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote: >>>> On 08.07.20 09:50, Dan Williams wrote: >>>>> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote: >>>>>> >>>>>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use. >>>>>>>>>>>>> >>>>>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case >>>>>>>>>>>>> NUMA_NO_NID is detected. >>>>>>>>>>>>> >>>>>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>>>>>>>> --- >>>>>>>>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>>>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>>>>>>>> >>>>>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>>>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>>>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>>>>>>>> >>>>>>>>>>>>> /* >>>>>>>>>>>>> * We hope that we will be hotplugging memory on nodes we already know about, >>>>>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this... >>>>>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node >>>>>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option. >>>>>>>>>>>>> */ >>>>>>>>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>>>>>>>> { >>>>>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr); >>>>>>>>>>>>> return 0; >>>>>>>>>>>>> } >>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>>>>>>>> >>>>>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense >>>>>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked >>>>>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes. >>> >>>> I'd be curious if what we are trying to optimize here is actually worth >>>> optimizing. IOW, is there a well-known scenario where the dummy value on >>>> arm64 would be problematic and is worth the effort? >>> >>> Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL() >>> for a stub might be an overkill. >>> >>> I think Jia's suggestion [1] with addition of a comment that explains >>> why and when the stub will be used, can work for both >>> memory_add_physaddr_to_nid() and phys_to_target_node(). >> >> Agreed. >> >>> >>> But on more theoretical/fundmanetal level, I think we lack a generic >>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as >>> translaton of firmware supplied information into data that can be used >>> by the generic mm without need to reimplement it for each and every >>> arch. >> >> Right. As I expressed, I am not a friend of using memblock for that, and >> the pgdat node span is tricky. >> >> Maybe abstracting that x86 concept is possible in some way (and we could >> restrict the information to boot-time properties, so we don't have to >> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). > > I agree with pgdat part and disagree about memblock. It already has > non-init physmap, why won't we add memblock.memory to the mix? ;-) Can we generalize and tweak physmap to contain node info? That's all we need, no? (the special mem= parameter handling should not matter for our use case, where "physmap" and "memory" would differ) > > Now, seriously, memblock already has all the necessary information about > the coldplug memory for several architectures. x86 being an exception > because for some reason the reserved memory is not considered memory > there. The infrastructure for quiering and iterating memory regions is > already there. We just need to leave out the irrelevant parts, like > memblock.reserved and allocation funcions. I *really* don't want to mess with memblocks on memory hot(un)plug on x86 and s390x (+other architectures in the future). I also thought about stopping to create memblocks for hotplugged memory on arm64, by tweaking pfn_valid() to query memblocks only for early sections. If "physmem" is not an option, can we at least introduce something like ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for now (and later maybe for others)? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 9:25 ` David Hildenbrand @ 2020-07-08 9:45 ` Mike Rapoport 2020-07-08 10:04 ` David Hildenbrand 0 siblings, 1 reply; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 9:45 UTC (permalink / raw) To: David Hildenbrand Cc: Mike Rapoport, Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote: > On 08.07.20 11:15, Mike Rapoport wrote: > >>>>>> > >>> But on more theoretical/fundmanetal level, I think we lack a generic > >>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as > >>> translaton of firmware supplied information into data that can be used > >>> by the generic mm without need to reimplement it for each and every > >>> arch. > >> > >> Right. As I expressed, I am not a friend of using memblock for that, and > >> the pgdat node span is tricky. > >> > >> Maybe abstracting that x86 concept is possible in some way (and we could > >> restrict the information to boot-time properties, so we don't have to > >> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). > > > > I agree with pgdat part and disagree about memblock. It already has > > non-init physmap, why won't we add memblock.memory to the mix? ;-) > > Can we generalize and tweak physmap to contain node info? That's all we > need, no? (the special mem= parameter handling should not matter for our > use case, where "physmap" and "memory" would differ) TBH, I have only random vague thoughts at the moment. This might be an option. But then we need to enable physmap on !s390, right? > > Now, seriously, memblock already has all the necessary information about > > the coldplug memory for several architectures. x86 being an exception > > because for some reason the reserved memory is not considered memory > > there. The infrastructure for quiering and iterating memory regions is > > already there. We just need to leave out the irrelevant parts, like > > memblock.reserved and allocation funcions. > > I *really* don't want to mess with memblocks on memory hot(un)plug on > x86 and s390x (+other architectures in the future). I also thought about > stopping to create memblocks for hotplugged memory on arm64, by tweaking > pfn_valid() to query memblocks only for early sections. > > If "physmem" is not an option, can we at least introduce something like > ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for > now (and later maybe for others)? I have to do more memory hotplug howework to answer that ;-) My general point is that we don't have to reinvent the wheel to have coldplug memory representation, it's already there. We just need a way to use it properly. > -- > Thanks, > > David / dhildenb > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 9:45 ` Mike Rapoport @ 2020-07-08 10:04 ` David Hildenbrand 2020-07-08 15:50 ` Dan Williams 0 siblings, 1 reply; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 10:04 UTC (permalink / raw) To: Mike Rapoport Cc: Mike Rapoport, Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On 08.07.20 11:45, Mike Rapoport wrote: > On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote: >> On 08.07.20 11:15, Mike Rapoport wrote: >>>>>>>> >>>>> But on more theoretical/fundmanetal level, I think we lack a generic >>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as >>>>> translaton of firmware supplied information into data that can be used >>>>> by the generic mm without need to reimplement it for each and every >>>>> arch. >>>> >>>> Right. As I expressed, I am not a friend of using memblock for that, and >>>> the pgdat node span is tricky. >>>> >>>> Maybe abstracting that x86 concept is possible in some way (and we could >>>> restrict the information to boot-time properties, so we don't have to >>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). >>> >>> I agree with pgdat part and disagree about memblock. It already has >>> non-init physmap, why won't we add memblock.memory to the mix? ;-) >> >> Can we generalize and tweak physmap to contain node info? That's all we >> need, no? (the special mem= parameter handling should not matter for our >> use case, where "physmap" and "memory" would differ) > > TBH, I have only random vague thoughts at the moment. This might be an > option. But then we need to enable physmap on !s390, right? Yes, looks like it. > >>> Now, seriously, memblock already has all the necessary information about >>> the coldplug memory for several architectures. x86 being an exception >>> because for some reason the reserved memory is not considered memory >>> there. The infrastructure for quiering and iterating memory regions is >>> already there. We just need to leave out the irrelevant parts, like >>> memblock.reserved and allocation funcions. >> >> I *really* don't want to mess with memblocks on memory hot(un)plug on >> x86 and s390x (+other architectures in the future). I also thought about >> stopping to create memblocks for hotplugged memory on arm64, by tweaking >> pfn_valid() to query memblocks only for early sections. >> >> If "physmem" is not an option, can we at least introduce something like >> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for >> now (and later maybe for others)? > > I have to do more memory hotplug howework to answer that ;-) > > My general point is that we don't have to reinvent the wheel to have > coldplug memory representation, it's already there. We just need a way > to use it properly. Yes, I tend to agree. Details to be clarified :) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 10:04 ` David Hildenbrand @ 2020-07-08 15:50 ` Dan Williams 2020-07-08 16:10 ` David Hildenbrand 0 siblings, 1 reply; 46+ messages in thread From: Dan Williams @ 2020-07-08 15:50 UTC (permalink / raw) To: David Hildenbrand Cc: Mike Rapoport, Mike Rapoport, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@redhat.com> wrote: > > On 08.07.20 11:45, Mike Rapoport wrote: > > On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote: > >> On 08.07.20 11:15, Mike Rapoport wrote: > >>>>>>>> > >>>>> But on more theoretical/fundmanetal level, I think we lack a generic > >>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as > >>>>> translaton of firmware supplied information into data that can be used > >>>>> by the generic mm without need to reimplement it for each and every > >>>>> arch. > >>>> > >>>> Right. As I expressed, I am not a friend of using memblock for that, and > >>>> the pgdat node span is tricky. > >>>> > >>>> Maybe abstracting that x86 concept is possible in some way (and we could > >>>> restrict the information to boot-time properties, so we don't have to > >>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). > >>> > >>> I agree with pgdat part and disagree about memblock. It already has > >>> non-init physmap, why won't we add memblock.memory to the mix? ;-) > >> > >> Can we generalize and tweak physmap to contain node info? That's all we > >> need, no? (the special mem= parameter handling should not matter for our > >> use case, where "physmap" and "memory" would differ) > > > > TBH, I have only random vague thoughts at the moment. This might be an > > option. But then we need to enable physmap on !s390, right? > > Yes, looks like it. > > > > >>> Now, seriously, memblock already has all the necessary information about > >>> the coldplug memory for several architectures. x86 being an exception > >>> because for some reason the reserved memory is not considered memory > >>> there. The infrastructure for quiering and iterating memory regions is > >>> already there. We just need to leave out the irrelevant parts, like > >>> memblock.reserved and allocation funcions. > >> > >> I *really* don't want to mess with memblocks on memory hot(un)plug on > >> x86 and s390x (+other architectures in the future). I also thought about > >> stopping to create memblocks for hotplugged memory on arm64, by tweaking > >> pfn_valid() to query memblocks only for early sections. > >> > >> If "physmem" is not an option, can we at least introduce something like > >> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for > >> now (and later maybe for others)? > > > > I have to do more memory hotplug howework to answer that ;-) > > > > My general point is that we don't have to reinvent the wheel to have > > coldplug memory representation, it's already there. We just need a way > > to use it properly. > > Yes, I tend to agree. Details to be clarified :) I'm not quite understanding the concern, or requirement about "updating memblock" in the hotplug path. The routines memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to interrogate platform-firmware numa info through a common abstraction. They place no burden on the memory hotplug code they're just used to see if a hot-added range lies within an existing node span when platform-firmware otherwise fails to communicate a node. x86 can continue to back those helpers with numa_meminfo, arm64 can use a generic memblock implementation and other archs can follow the arm64 example if they want better numa answers for drivers. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 15:50 ` Dan Williams @ 2020-07-08 16:10 ` David Hildenbrand 2020-07-08 16:47 ` Mike Rapoport 0 siblings, 1 reply; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 16:10 UTC (permalink / raw) To: Dan Williams Cc: Mike Rapoport, Mike Rapoport, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On 08.07.20 17:50, Dan Williams wrote: > On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@redhat.com> wrote: >> >> On 08.07.20 11:45, Mike Rapoport wrote: >>> On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote: >>>> On 08.07.20 11:15, Mike Rapoport wrote: >>>>>>>>>> >>>>>>> But on more theoretical/fundmanetal level, I think we lack a generic >>>>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as >>>>>>> translaton of firmware supplied information into data that can be used >>>>>>> by the generic mm without need to reimplement it for each and every >>>>>>> arch. >>>>>> >>>>>> Right. As I expressed, I am not a friend of using memblock for that, and >>>>>> the pgdat node span is tricky. >>>>>> >>>>>> Maybe abstracting that x86 concept is possible in some way (and we could >>>>>> restrict the information to boot-time properties, so we don't have to >>>>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS). >>>>> >>>>> I agree with pgdat part and disagree about memblock. It already has >>>>> non-init physmap, why won't we add memblock.memory to the mix? ;-) >>>> >>>> Can we generalize and tweak physmap to contain node info? That's all we >>>> need, no? (the special mem= parameter handling should not matter for our >>>> use case, where "physmap" and "memory" would differ) >>> >>> TBH, I have only random vague thoughts at the moment. This might be an >>> option. But then we need to enable physmap on !s390, right? >> >> Yes, looks like it. >> >>> >>>>> Now, seriously, memblock already has all the necessary information about >>>>> the coldplug memory for several architectures. x86 being an exception >>>>> because for some reason the reserved memory is not considered memory >>>>> there. The infrastructure for quiering and iterating memory regions is >>>>> already there. We just need to leave out the irrelevant parts, like >>>>> memblock.reserved and allocation funcions. >>>> >>>> I *really* don't want to mess with memblocks on memory hot(un)plug on >>>> x86 and s390x (+other architectures in the future). I also thought about >>>> stopping to create memblocks for hotplugged memory on arm64, by tweaking >>>> pfn_valid() to query memblocks only for early sections. >>>> >>>> If "physmem" is not an option, can we at least introduce something like >>>> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for >>>> now (and later maybe for others)? >>> >>> I have to do more memory hotplug howework to answer that ;-) >>> >>> My general point is that we don't have to reinvent the wheel to have >>> coldplug memory representation, it's already there. We just need a way >>> to use it properly. >> >> Yes, I tend to agree. Details to be clarified :) > > I'm not quite understanding the concern, or requirement about > "updating memblock" in the hotplug path. The routines > memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to > interrogate platform-firmware numa info through a common abstraction. > They place no burden on the memory hotplug code they're just used to > see if a hot-added range lies within an existing node span when > platform-firmware otherwise fails to communicate a node. x86 can > continue to back those helpers with numa_meminfo, arm64 can use a > generic memblock implementation and other archs can follow the arm64 > example if they want better numa answers for drivers. > See memblock_add_node()/memblock_remove() in mm/memory_hotplug.c. I don't want that code be reactivated for x86/s390x. That's all I am saying. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 16:10 ` David Hildenbrand @ 2020-07-08 16:47 ` Mike Rapoport 0 siblings, 0 replies; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 16:47 UTC (permalink / raw) To: David Hildenbrand Cc: Dan Williams, Mike Rapoport, Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin On Wed, Jul 08, 2020 at 06:10:19PM +0200, David Hildenbrand wrote: > On 08.07.20 17:50, Dan Williams wrote: > > On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@redhat.com> wrote: > >> > > > > I'm not quite understanding the concern, or requirement about > > "updating memblock" in the hotplug path. The routines > > memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to > > interrogate platform-firmware numa info through a common abstraction. > > They place no burden on the memory hotplug code they're just used to > > see if a hot-added range lies within an existing node span when > > platform-firmware otherwise fails to communicate a node. x86 can > > continue to back those helpers with numa_meminfo, arm64 can use a > > generic memblock implementation and other archs can follow the arm64 > > example if they want better numa answers for drivers. > > > > See memblock_add_node()/memblock_remove() in mm/memory_hotplug.c. I > don't want that code be reactivated for x86/s390x. That's all I am saying. And these have actual meaning only on arm64 because powerpc does not rely on memblock for memory hot(un)plug, AFAIU. Anyway, at the moment we can use memblock on hotplug path only on arm64 and I don't think its the path worth exploring. > -- > Thanks, > > David / dhildenb > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* RE: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-07 11:54 ` Michal Hocko 2020-07-07 12:13 ` Mike Rapoport @ 2020-07-08 2:20 ` Justin He 2020-07-08 3:56 ` Dan Williams 1 sibling, 1 reply; 46+ messages in thread From: Justin He @ 2020-07-08 2:20 UTC (permalink / raw) To: Michal Hocko, David Hildenbrand Cc: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin Hi Michal and David > -----Original Message----- > From: Michal Hocko <mhocko@kernel.org> > Sent: Tuesday, July 7, 2020 7:55 PM > To: Justin He <Justin.He@arm.com> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > as EXPORT_SYMBOL_GPL > > On Tue 07-07-20 13:59:15, Jia He wrote: > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > NUMA_NO_NID is detected. > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > Signed-off-by: Jia He <justin.he@arm.com> > > --- > > arch/arm64/mm/numa.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > index aafcee3e3f7e..7eeb31740248 100644 > > --- a/arch/arm64/mm/numa.c > > +++ b/arch/arm64/mm/numa.c > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > /* > > * We hope that we will be hotplugging memory on nodes we already know > about, > > - * such that acpi_get_node() succeeds and we never fall back to this... > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > the node > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > option. > > */ > > int memory_add_physaddr_to_nid(u64 addr) > > { > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > addr); > > return 0; > > } > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > Does it make sense to export a noop function? Wouldn't make more sense > to simply make it static inline somewhere in a header? I haven't checked > whether there is an easy way to do that sanely bu this just hit my eyes. Okay, I can make a change in memory_hotplug.h, sth like: --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, struct mhp_params *params); #endif /* ARCH_HAS_ADD_PAGES */ -#ifdef CONFIG_NUMA -extern int memory_add_physaddr_to_nid(u64 start); -#else +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) static inline int memory_add_physaddr_to_nid(u64 start) { return 0; } +#else +extern int memory_add_physaddr_to_nid(u64 start); #endif And then check the memory_add_physaddr_to_nid() helper on all arches, if it is noop(return 0), I can simply remove it. if it is not noop, after the helper, #define memory_add_physaddr_to_nid What do you think of this proposal? -- Cheers, Justin (Jia He) ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 2:20 ` Justin He @ 2020-07-08 3:56 ` Dan Williams 2020-07-08 4:08 ` Justin He 2020-07-08 5:32 ` Mike Rapoport 0 siblings, 2 replies; 46+ messages in thread From: Dan Williams @ 2020-07-08 3:56 UTC (permalink / raw) To: Justin He Cc: Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > Hi Michal and David > > > -----Original Message----- > > From: Michal Hocko <mhocko@kernel.org> > > Sent: Tuesday, July 7, 2020 7:55 PM > > To: Justin He <Justin.He@arm.com> > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > > as EXPORT_SYMBOL_GPL > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > NUMA_NO_NID is detected. > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > Signed-off-by: Jia He <justin.he@arm.com> > > > --- > > > arch/arm64/mm/numa.c | 5 +++-- > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > index aafcee3e3f7e..7eeb31740248 100644 > > > --- a/arch/arm64/mm/numa.c > > > +++ b/arch/arm64/mm/numa.c > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > /* > > > * We hope that we will be hotplugging memory on nodes we already know > > about, > > > - * such that acpi_get_node() succeeds and we never fall back to this... > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > the node > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > > option. > > > */ > > > int memory_add_physaddr_to_nid(u64 addr) > > > { > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > addr); > > > return 0; > > > } > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > Does it make sense to export a noop function? Wouldn't make more sense > > to simply make it static inline somewhere in a header? I haven't checked > > whether there is an easy way to do that sanely bu this just hit my eyes. > > Okay, I can make a change in memory_hotplug.h, sth like: > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, > struct mhp_params *params); > #endif /* ARCH_HAS_ADD_PAGES */ > > -#ifdef CONFIG_NUMA > -extern int memory_add_physaddr_to_nid(u64 start); > -#else > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > static inline int memory_add_physaddr_to_nid(u64 start) > { > return 0; > } > +#else > +extern int memory_add_physaddr_to_nid(u64 start); > #endif > > And then check the memory_add_physaddr_to_nid() helper on all arches, > if it is noop(return 0), I can simply remove it. > if it is not noop, after the helper, > #define memory_add_physaddr_to_nid > > What do you think of this proposal? Especially for architectures that use memblock info for numa info (which seems to be everyone except x86) why not implement a generic memory_add_physaddr_to_nid() that does: int memory_add_physaddr_to_nid(u64 addr) { unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); int nid; for_each_online_node(nid) { get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); if (pfn >= start_pfn && pfn <= end_pfn) return nid; } return NUMA_NO_NODE; } ^ permalink raw reply [flat|nested] 46+ messages in thread
* RE: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 3:56 ` Dan Williams @ 2020-07-08 4:08 ` Justin He 2020-07-08 4:27 ` Dan Williams 2020-07-08 5:32 ` Mike Rapoport 1 sibling, 1 reply; 46+ messages in thread From: Justin He @ 2020-07-08 4:08 UTC (permalink / raw) To: Dan Williams Cc: Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin Hi Dan > -----Original Message----- > From: Dan Williams <dan.j.williams@intel.com> > Sent: Wednesday, July 8, 2020 11:57 AM > To: Justin He <Justin.He@arm.com> > Cc: Michal Hocko <mhocko@kernel.org>; David Hildenbrand <david@redhat.com>; > Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>; > Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; > Andrew Morton <akpm@linux-foundation.org>; Mike Rapoport > <rppt@linux.ibm.com>; Baoquan He <bhe@redhat.com>; Chuhong Yuan > <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux- > kernel@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org; > Kaly Xin <Kaly.Xin@arm.com> > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > as EXPORT_SYMBOL_GPL > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > Hi Michal and David > > > > > -----Original Message----- > > > From: Michal Hocko <mhocko@kernel.org> > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > To: Justin He <Justin.He@arm.com> > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal > Verma > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; > linux- > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export > memory_add_physaddr_to_nid > > > as EXPORT_SYMBOL_GPL > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in > case > > > > NUMA_NO_NID is detected. > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > --- > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > --- a/arch/arm64/mm/numa.c > > > > +++ b/arch/arm64/mm/numa.c > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > /* > > > > * We hope that we will be hotplugging memory on nodes we already > know > > > about, > > > > - * such that acpi_get_node() succeeds and we never fall back to > this... > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > > the node > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a > fallback > > > option. > > > > */ > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > { > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > > addr); > > > > return 0; > > > > } > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > Does it make sense to export a noop function? Wouldn't make more sense > > > to simply make it static inline somewhere in a header? I haven't > checked > > > whether there is an easy way to do that sanely bu this just hit my > eyes. > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > --- a/include/linux/memory_hotplug.h > > +++ b/include/linux/memory_hotplug.h > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, > unsigned long nr_pages, > > struct mhp_params *params); > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > -#ifdef CONFIG_NUMA > > -extern int memory_add_physaddr_to_nid(u64 start); > > -#else > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > static inline int memory_add_physaddr_to_nid(u64 start) > > { > > return 0; > > } > > +#else > > +extern int memory_add_physaddr_to_nid(u64 start); > > #endif > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > if it is noop(return 0), I can simply remove it. > > if it is not noop, after the helper, > > #define memory_add_physaddr_to_nid > > > > What do you think of this proposal? > > Especially for architectures that use memblock info for numa info > (which seems to be everyone except x86) why not implement a generic > memory_add_physaddr_to_nid() that does: > > int memory_add_physaddr_to_nid(u64 addr) > { > unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > int nid; > > for_each_online_node(nid) { > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > if (pfn >= start_pfn && pfn <= end_pfn) > return nid; > } > return NUMA_NO_NODE; > } Thanks for your suggestion, Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke phys_to_target_node()? -- Cheers, Justin (Jia He) ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 4:08 ` Justin He @ 2020-07-08 4:27 ` Dan Williams 2020-07-08 6:22 ` Mike Rapoport 0 siblings, 1 reply; 46+ messages in thread From: Dan Williams @ 2020-07-08 4:27 UTC (permalink / raw) To: Justin He Cc: Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: [..] > > Especially for architectures that use memblock info for numa info > > (which seems to be everyone except x86) why not implement a generic > > memory_add_physaddr_to_nid() that does: > > > > int memory_add_physaddr_to_nid(u64 addr) > > { > > unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > > int nid; > > > > for_each_online_node(nid) { > > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > > if (pfn >= start_pfn && pfn <= end_pfn) > > return nid; > > } > > return NUMA_NO_NODE; > > } > > Thanks for your suggestion, > Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > phys_to_target_node()? I think it needs to be the reverse. phys_to_target_node() should call memory_add_physaddr_to_nid() by default, but fall back to searching reserved memory address ranges in memblock. See phys_to_target_node() in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, but the principle is the same i.e. that a target node may not be represented in memblock.memory, but memblock.reserved. I'm working on a patch to provide a function similar to get_pfn_range_for_nid() that operates on reserved memory. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 4:27 ` Dan Williams @ 2020-07-08 6:22 ` Mike Rapoport 2020-07-08 6:53 ` Dan Williams 2020-07-08 6:59 ` David Hildenbrand 0 siblings, 2 replies; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 6:22 UTC (permalink / raw) To: Dan Williams Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: > On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: > [..] > > > Especially for architectures that use memblock info for numa info > > > (which seems to be everyone except x86) why not implement a generic > > > memory_add_physaddr_to_nid() that does: > > > > > > int memory_add_physaddr_to_nid(u64 addr) > > > { > > > unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > > > int nid; > > > > > > for_each_online_node(nid) { > > > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > > > if (pfn >= start_pfn && pfn <= end_pfn) > > > return nid; > > > } > > > return NUMA_NO_NODE; > > > } > > > > Thanks for your suggestion, > > Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > > phys_to_target_node()? > > I think it needs to be the reverse. phys_to_target_node() should call > memory_add_physaddr_to_nid() by default, but fall back to searching > reserved memory address ranges in memblock. See phys_to_target_node() > in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, > but the principle is the same i.e. that a target node may not be > represented in memblock.memory, but memblock.reserved. I'm working on > a patch to provide a function similar to get_pfn_range_for_nid() that > operates on reserved memory. Do we really need yet another memblock iterator? I think only x86 has memory that is not in memblock.memory but only in memblock.reserved. -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 6:22 ` Mike Rapoport @ 2020-07-08 6:53 ` Dan Williams 2020-07-08 6:59 ` David Hildenbrand 1 sibling, 0 replies; 46+ messages in thread From: Dan Williams @ 2020-07-08 6:53 UTC (permalink / raw) To: Mike Rapoport Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 7, 2020 at 11:22 PM Mike Rapoport <rppt@linux.ibm.com> wrote: [..] > > > Thanks for your suggestion, > > > Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > > > phys_to_target_node()? > > > > I think it needs to be the reverse. phys_to_target_node() should call > > memory_add_physaddr_to_nid() by default, but fall back to searching > > reserved memory address ranges in memblock. See phys_to_target_node() > > in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, > > but the principle is the same i.e. that a target node may not be > > represented in memblock.memory, but memblock.reserved. I'm working on > > a patch to provide a function similar to get_pfn_range_for_nid() that > > operates on reserved memory. > > Do we really need yet another memblock iterator? > I think only x86 has memory that is not in memblock.memory but only in > memblock.reserved. Well, that's what led me here. EFI has introduced a memory attribute called "EFI Special Purpose Memory". I mapped it to a new Linux concept called Soft Reserved memory (commit b617c5266eed "efi: Common enable/disable infrastructure for EFI soft reservation"). The driver I want to claim that memory, device-dax, wants to be able to look up numa information for an address range that is marked reserved in memblock. The device-dax facility has the ability to either let userspace map a device, or assign the memory backing that device to the page allocator. In both scenarios the driver needs numa info to either populate the 'numa_node' property of the device in sysfs, or to pass an node-id to add_memory_resource() when it is hot-plugged. I was thwarted by the lack of phys_to_target_node() on arm64, and rather than add another stub like memory_add_physaddr_to_nid() I wanted to see if it could be solved properly / generically with memblock data. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 6:22 ` Mike Rapoport 2020-07-08 6:53 ` Dan Williams @ 2020-07-08 6:59 ` David Hildenbrand 2020-07-08 7:04 ` Dan Williams 1 sibling, 1 reply; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 6:59 UTC (permalink / raw) To: Mike Rapoport, Dan Williams Cc: Justin He, Michal Hocko, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On 08.07.20 08:22, Mike Rapoport wrote: > On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: >> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: >> [..] >>>> Especially for architectures that use memblock info for numa info >>>> (which seems to be everyone except x86) why not implement a generic >>>> memory_add_physaddr_to_nid() that does: >>>> >>>> int memory_add_physaddr_to_nid(u64 addr) >>>> { >>>> unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); >>>> int nid; >>>> >>>> for_each_online_node(nid) { >>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); >>>> if (pfn >= start_pfn && pfn <= end_pfn) >>>> return nid; >>>> } >>>> return NUMA_NO_NODE; >>>> } >>> >>> Thanks for your suggestion, >>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke >>> phys_to_target_node()? >> >> I think it needs to be the reverse. phys_to_target_node() should call >> memory_add_physaddr_to_nid() by default, but fall back to searching >> reserved memory address ranges in memblock. See phys_to_target_node() >> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, >> but the principle is the same i.e. that a target node may not be >> represented in memblock.memory, but memblock.reserved. I'm working on >> a patch to provide a function similar to get_pfn_range_for_nid() that >> operates on reserved memory. > > Do we really need yet another memblock iterator? > I think only x86 has memory that is not in memblock.memory but only in > memblock.reserved. Reading about abusing the memblock allcoator once again in memory hotplug paths makes me shiver. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 6:59 ` David Hildenbrand @ 2020-07-08 7:04 ` Dan Williams 2020-07-08 7:16 ` David Hildenbrand 0 siblings, 1 reply; 46+ messages in thread From: Dan Williams @ 2020-07-08 7:04 UTC (permalink / raw) To: David Hildenbrand Cc: Mike Rapoport, Justin He, Michal Hocko, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <david@redhat.com> wrote: > > On 08.07.20 08:22, Mike Rapoport wrote: > > On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: > >> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: > >> [..] > >>>> Especially for architectures that use memblock info for numa info > >>>> (which seems to be everyone except x86) why not implement a generic > >>>> memory_add_physaddr_to_nid() that does: > >>>> > >>>> int memory_add_physaddr_to_nid(u64 addr) > >>>> { > >>>> unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > >>>> int nid; > >>>> > >>>> for_each_online_node(nid) { > >>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > >>>> if (pfn >= start_pfn && pfn <= end_pfn) > >>>> return nid; > >>>> } > >>>> return NUMA_NO_NODE; > >>>> } > >>> > >>> Thanks for your suggestion, > >>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > >>> phys_to_target_node()? > >> > >> I think it needs to be the reverse. phys_to_target_node() should call > >> memory_add_physaddr_to_nid() by default, but fall back to searching > >> reserved memory address ranges in memblock. See phys_to_target_node() > >> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, > >> but the principle is the same i.e. that a target node may not be > >> represented in memblock.memory, but memblock.reserved. I'm working on > >> a patch to provide a function similar to get_pfn_range_for_nid() that > >> operates on reserved memory. > > > > Do we really need yet another memblock iterator? > > I think only x86 has memory that is not in memblock.memory but only in > > memblock.reserved. > > Reading about abusing the memblock allcoator once again in memory > hotplug paths makes me shiver. Technical reasoning please? arm64 numa information is established from memblock data. It seems counterproductive to ignore that fact if we're already touching memory_add_physaddr_to_nid() and have a use case for a driver to call it. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 7:04 ` Dan Williams @ 2020-07-08 7:16 ` David Hildenbrand 2020-07-08 7:43 ` Mike Rapoport 0 siblings, 1 reply; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 7:16 UTC (permalink / raw) To: Dan Williams Cc: Mike Rapoport, Justin He, Michal Hocko, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On 08.07.20 09:04, Dan Williams wrote: > On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <david@redhat.com> wrote: >> >> On 08.07.20 08:22, Mike Rapoport wrote: >>> On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: >>>> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: >>>> [..] >>>>>> Especially for architectures that use memblock info for numa info >>>>>> (which seems to be everyone except x86) why not implement a generic >>>>>> memory_add_physaddr_to_nid() that does: >>>>>> >>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>> { >>>>>> unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); >>>>>> int nid; >>>>>> >>>>>> for_each_online_node(nid) { >>>>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); >>>>>> if (pfn >= start_pfn && pfn <= end_pfn) >>>>>> return nid; >>>>>> } >>>>>> return NUMA_NO_NODE; >>>>>> } >>>>> >>>>> Thanks for your suggestion, >>>>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke >>>>> phys_to_target_node()? >>>> >>>> I think it needs to be the reverse. phys_to_target_node() should call >>>> memory_add_physaddr_to_nid() by default, but fall back to searching >>>> reserved memory address ranges in memblock. See phys_to_target_node() >>>> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, >>>> but the principle is the same i.e. that a target node may not be >>>> represented in memblock.memory, but memblock.reserved. I'm working on >>>> a patch to provide a function similar to get_pfn_range_for_nid() that >>>> operates on reserved memory. >>> >>> Do we really need yet another memblock iterator? >>> I think only x86 has memory that is not in memblock.memory but only in >>> memblock.reserved. >> >> Reading about abusing the memblock allcoator once again in memory >> hotplug paths makes me shiver. > > Technical reasoning please? ARCH_KEEP_MEMBLOCK is (AFAIK) only a hack for arm64 to implement pfn_valid(), because they zap out individual pages corresponding to memory holes of full sections. I am not a friend of adding more post-init code to rely on memblock data. It just makes it harder to eventually get rid of ARCH_KEEP_MEMBLOCK. > > arm64 numa information is established from memblock data. It seems > counterproductive to ignore that fact if we're already touching > memory_add_physaddr_to_nid() and have a use case for a driver to call > it. ... and we are trying to handle the "only a single dummy node" case (patch #2), or what am I missing? What is there to optimize currently? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 7:16 ` David Hildenbrand @ 2020-07-08 7:43 ` Mike Rapoport 0 siblings, 0 replies; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 7:43 UTC (permalink / raw) To: David Hildenbrand Cc: Dan Williams, Justin He, Michal Hocko, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Wed, Jul 08, 2020 at 09:16:01AM +0200, David Hildenbrand wrote: > On 08.07.20 09:04, Dan Williams wrote: > > On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <david@redhat.com> wrote: > >> > >> On 08.07.20 08:22, Mike Rapoport wrote: > >>> On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote: > >>>> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote: > >>>> [..] > >>>>>> Especially for architectures that use memblock info for numa info > >>>>>> (which seems to be everyone except x86) why not implement a generic > >>>>>> memory_add_physaddr_to_nid() that does: > >>>>>> > >>>>>> int memory_add_physaddr_to_nid(u64 addr) > >>>>>> { > >>>>>> unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > >>>>>> int nid; > >>>>>> > >>>>>> for_each_online_node(nid) { > >>>>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > >>>>>> if (pfn >= start_pfn && pfn <= end_pfn) > >>>>>> return nid; > >>>>>> } > >>>>>> return NUMA_NO_NODE; > >>>>>> } > >>>>> > >>>>> Thanks for your suggestion, > >>>>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke > >>>>> phys_to_target_node()? > >>>> > >>>> I think it needs to be the reverse. phys_to_target_node() should call > >>>> memory_add_physaddr_to_nid() by default, but fall back to searching > >>>> reserved memory address ranges in memblock. See phys_to_target_node() > >>>> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock, > >>>> but the principle is the same i.e. that a target node may not be > >>>> represented in memblock.memory, but memblock.reserved. I'm working on > >>>> a patch to provide a function similar to get_pfn_range_for_nid() that > >>>> operates on reserved memory. > >>> > >>> Do we really need yet another memblock iterator? > >>> I think only x86 has memory that is not in memblock.memory but only in > >>> memblock.reserved. > >> > >> Reading about abusing the memblock allcoator once again in memory > >> hotplug paths makes me shiver. > > > > Technical reasoning please? > > ARCH_KEEP_MEMBLOCK is (AFAIK) only a hack for arm64 to implement > pfn_valid(), because they zap out individual pages corresponding to > memory holes of full sections. > > I am not a friend of adding more post-init code to rely on memblock > data. It just makes it harder to eventually get rid of ARCH_KEEP_MEMBLOCK. The most heavy user of memblock in post-init code is powerpc. It won't be easy to get rid of it there. > > arm64 numa information is established from memblock data. It seems > > counterproductive to ignore that fact if we're already touching > > memory_add_physaddr_to_nid() and have a use case for a driver to call > > it. > > ... and we are trying to handle the "only a single dummy node" case > (patch #2), or what am I missing? What is there to optimize currently? > > -- > Thanks, > > David / dhildenb > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 3:56 ` Dan Williams 2020-07-08 4:08 ` Justin He @ 2020-07-08 5:32 ` Mike Rapoport 2020-07-08 5:48 ` Dan Williams 1 sibling, 1 reply; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 5:32 UTC (permalink / raw) To: Dan Williams Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > Hi Michal and David > > > > > -----Original Message----- > > > From: Michal Hocko <mhocko@kernel.org> > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > To: Justin He <Justin.He@arm.com> > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > > > as EXPORT_SYMBOL_GPL > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > > NUMA_NO_NID is detected. > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > --- > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > --- a/arch/arm64/mm/numa.c > > > > +++ b/arch/arm64/mm/numa.c > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > /* > > > > * We hope that we will be hotplugging memory on nodes we already know > > > about, > > > > - * such that acpi_get_node() succeeds and we never fall back to this... > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > > the node > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > > > option. > > > > */ > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > { > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > > addr); > > > > return 0; > > > > } > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > Does it make sense to export a noop function? Wouldn't make more sense > > > to simply make it static inline somewhere in a header? I haven't checked > > > whether there is an easy way to do that sanely bu this just hit my eyes. > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > --- a/include/linux/memory_hotplug.h > > +++ b/include/linux/memory_hotplug.h > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, > > struct mhp_params *params); > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > -#ifdef CONFIG_NUMA > > -extern int memory_add_physaddr_to_nid(u64 start); > > -#else > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > static inline int memory_add_physaddr_to_nid(u64 start) > > { > > return 0; > > } > > +#else > > +extern int memory_add_physaddr_to_nid(u64 start); > > #endif > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > if it is noop(return 0), I can simply remove it. > > if it is not noop, after the helper, > > #define memory_add_physaddr_to_nid > > > > What do you think of this proposal? > > Especially for architectures that use memblock info for numa info > (which seems to be everyone except x86) why not implement a generic > memory_add_physaddr_to_nid() that does: That would be only arm64. > int memory_add_physaddr_to_nid(u64 addr) > { > unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr); > int nid; > > for_each_online_node(nid) { > get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); > if (pfn >= start_pfn && pfn <= end_pfn) > return nid; > } > return NUMA_NO_NODE; > } -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 5:32 ` Mike Rapoport @ 2020-07-08 5:48 ` Dan Williams 2020-07-08 6:19 ` Mike Rapoport 2020-07-08 6:56 ` Justin He 0 siblings, 2 replies; 46+ messages in thread From: Dan Williams @ 2020-07-08 5:48 UTC (permalink / raw) To: Mike Rapoport Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > > > Hi Michal and David > > > > > > > -----Original Message----- > > > > From: Michal Hocko <mhocko@kernel.org> > > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > > To: Justin He <Justin.He@arm.com> > > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > > > > as EXPORT_SYMBOL_GPL > > > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > > > NUMA_NO_NID is detected. > > > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > > --- > > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > > --- a/arch/arm64/mm/numa.c > > > > > +++ b/arch/arm64/mm/numa.c > > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > > > /* > > > > > * We hope that we will be hotplugging memory on nodes we already know > > > > about, > > > > > - * such that acpi_get_node() succeeds and we never fall back to this... > > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > > > the node > > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > > > > option. > > > > > */ > > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > > { > > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > > > addr); > > > > > return 0; > > > > > } > > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > > > Does it make sense to export a noop function? Wouldn't make more sense > > > > to simply make it static inline somewhere in a header? I haven't checked > > > > whether there is an easy way to do that sanely bu this just hit my eyes. > > > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > > --- a/include/linux/memory_hotplug.h > > > +++ b/include/linux/memory_hotplug.h > > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, > > > struct mhp_params *params); > > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > > > -#ifdef CONFIG_NUMA > > > -extern int memory_add_physaddr_to_nid(u64 start); > > > -#else > > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > > static inline int memory_add_physaddr_to_nid(u64 start) > > > { > > > return 0; > > > } > > > +#else > > > +extern int memory_add_physaddr_to_nid(u64 start); > > > #endif > > > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > > if it is noop(return 0), I can simply remove it. > > > if it is not noop, after the helper, > > > #define memory_add_physaddr_to_nid > > > > > > What do you think of this proposal? > > > > Especially for architectures that use memblock info for numa info > > (which seems to be everyone except x86) why not implement a generic > > memory_add_physaddr_to_nid() that does: > > That would be only arm64. > Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it could solve my numa api woes. At least for x86 the problem is already solved with reserved numa_meminfo, but now I'm trying to write generic drivers that use those apis and finding these gaps on other archs. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 5:48 ` Dan Williams @ 2020-07-08 6:19 ` Mike Rapoport 2020-07-08 6:44 ` Dan Williams 2020-07-08 6:56 ` Justin He 1 sibling, 1 reply; 46+ messages in thread From: Mike Rapoport @ 2020-07-08 6:19 UTC (permalink / raw) To: Dan Williams Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 07, 2020 at 10:48:19PM -0700, Dan Williams wrote: > On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > > On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: > > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > > > > > Hi Michal and David > > > > > > > > > -----Original Message----- > > > > > From: Michal Hocko <mhocko@kernel.org> > > > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > > > To: Justin He <Justin.He@arm.com> > > > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > > > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew > > > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; > > > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux- > > > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > > > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > > > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > > > > > as EXPORT_SYMBOL_GPL > > > > > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > > > This exports memory_add_physaddr_to_nid() for module driver to use. > > > > > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case > > > > > > NUMA_NO_NID is detected. > > > > > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > > > --- > > > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > > > --- a/arch/arm64/mm/numa.c > > > > > > +++ b/arch/arm64/mm/numa.c > > > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > > > > > /* > > > > > > * We hope that we will be hotplugging memory on nodes we already know > > > > > about, > > > > > > - * such that acpi_get_node() succeeds and we never fall back to this... > > > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present, > > > > > the node > > > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback > > > > > option. > > > > > > */ > > > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > > > { > > > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", > > > > > addr); > > > > > > return 0; > > > > > > } > > > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > > > > > Does it make sense to export a noop function? Wouldn't make more sense > > > > > to simply make it static inline somewhere in a header? I haven't checked > > > > > whether there is an easy way to do that sanely bu this just hit my eyes. > > > > > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > > > --- a/include/linux/memory_hotplug.h > > > > +++ b/include/linux/memory_hotplug.h > > > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, > > > > struct mhp_params *params); > > > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > > > > > -#ifdef CONFIG_NUMA > > > > -extern int memory_add_physaddr_to_nid(u64 start); > > > > -#else > > > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > > > static inline int memory_add_physaddr_to_nid(u64 start) > > > > { > > > > return 0; > > > > } > > > > +#else > > > > +extern int memory_add_physaddr_to_nid(u64 start); > > > > #endif > > > > > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > > > if it is noop(return 0), I can simply remove it. > > > > if it is not noop, after the helper, > > > > #define memory_add_physaddr_to_nid > > > > > > > > What do you think of this proposal? > > > > > > Especially for architectures that use memblock info for numa info > > > (which seems to be everyone except x86) why not implement a generic > > > memory_add_physaddr_to_nid() that does: > > > > That would be only arm64. > > > > Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it > could solve my numa api woes. At least for x86 the problem is already > solved with reserved numa_meminfo, but now I'm trying to write generic > drivers that use those apis and finding these gaps on other archs. I'm not sure if x86's numa_meminfo is a part of the solution or a part of the problem ;-) Anyway, this all indeed messy and there is a lot to improve there. -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 6:19 ` Mike Rapoport @ 2020-07-08 6:44 ` Dan Williams 0 siblings, 0 replies; 46+ messages in thread From: Dan Williams @ 2020-07-08 6:44 UTC (permalink / raw) To: Mike Rapoport Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On Tue, Jul 7, 2020 at 11:20 PM Mike Rapoport <rppt@linux.ibm.com> wrote: [..] > > Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it > > could solve my numa api woes. At least for x86 the problem is already > > solved with reserved numa_meminfo, but now I'm trying to write generic > > drivers that use those apis and finding these gaps on other archs. > > I'm not sure if x86's numa_meminfo is a part of the solution or a part > of the problem ;-) More the latter, but hopefully it can remain an exception and not the rule. ^ permalink raw reply [flat|nested] 46+ messages in thread
* RE: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 5:48 ` Dan Williams 2020-07-08 6:19 ` Mike Rapoport @ 2020-07-08 6:56 ` Justin He 2020-07-08 7:00 ` David Hildenbrand 1 sibling, 1 reply; 46+ messages in thread From: Justin He @ 2020-07-08 6:56 UTC (permalink / raw) To: Dan Williams Cc: Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin, Mike Rapoport Hi Dan > -----Original Message----- > From: Dan Williams <dan.j.williams@intel.com> > Sent: Wednesday, July 8, 2020 1:48 PM > To: Mike Rapoport <rppt@linux.ibm.com> > Cc: Justin He <Justin.He@arm.com>; Michal Hocko <mhocko@kernel.org>; David > Hildenbrand <david@redhat.com>; Catalin Marinas <Catalin.Marinas@arm.com>; > Will Deacon <will@kernel.org>; Vishal Verma <vishal.l.verma@intel.com>; > Dave Jiang <dave.jiang@intel.com>; Andrew Morton <akpm@linux- > foundation.org>; Baoquan He <bhe@redhat.com>; Chuhong Yuan > <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux- > kernel@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org; > Kaly Xin <Kaly.Xin@arm.com> > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid > as EXPORT_SYMBOL_GPL > > On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote: > > > > On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: > > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: > > > > > > > > Hi Michal and David > > > > > > > > > -----Original Message----- > > > > > From: Michal Hocko <mhocko@kernel.org> > > > > > Sent: Tuesday, July 7, 2020 7:55 PM > > > > > To: Justin He <Justin.He@arm.com> > > > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > > > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal > Verma > > > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; > Andrew > > > > > Morton <akpm@linux-foundation.org>; Mike Rapoport > <rppt@linux.ibm.com>; > > > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; > linux- > > > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; > linux- > > > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin > <Kaly.Xin@arm.com> > > > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export > memory_add_physaddr_to_nid > > > > > as EXPORT_SYMBOL_GPL > > > > > > > > > > On Tue 07-07-20 13:59:15, Jia He wrote: > > > > > > This exports memory_add_physaddr_to_nid() for module driver to > use. > > > > > > > > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid > in case > > > > > > NUMA_NO_NID is detected. > > > > > > > > > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > > > > > Signed-off-by: Jia He <justin.he@arm.com> > > > > > > --- > > > > > > arch/arm64/mm/numa.c | 5 +++-- > > > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c > > > > > > index aafcee3e3f7e..7eeb31740248 100644 > > > > > > --- a/arch/arm64/mm/numa.c > > > > > > +++ b/arch/arm64/mm/numa.c > > > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) > > > > > > > > > > > > /* > > > > > > * We hope that we will be hotplugging memory on nodes we > already know > > > > > about, > > > > > > - * such that acpi_get_node() succeeds and we never fall back to > this... > > > > > > + * such that acpi_get_node() succeeds. But when SRAT is not > present, > > > > > the node > > > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a > fallback > > > > > option. > > > > > > */ > > > > > > int memory_add_physaddr_to_nid(u64 addr) > > > > > > { > > > > > > - pr_warn("Unknown node for memory at 0x%llx, assuming node > 0\n", > > > > > addr); > > > > > > return 0; > > > > > > } > > > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); > > > > > > > > > > Does it make sense to export a noop function? Wouldn't make more > sense > > > > > to simply make it static inline somewhere in a header? I haven't > checked > > > > > whether there is an easy way to do that sanely bu this just hit my > eyes. > > > > > > > > Okay, I can make a change in memory_hotplug.h, sth like: > > > > --- a/include/linux/memory_hotplug.h > > > > +++ b/include/linux/memory_hotplug.h > > > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, > unsigned long nr_pages, > > > > struct mhp_params *params); > > > > #endif /* ARCH_HAS_ADD_PAGES */ > > > > > > > > -#ifdef CONFIG_NUMA > > > > -extern int memory_add_physaddr_to_nid(u64 start); > > > > -#else > > > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) > > > > static inline int memory_add_physaddr_to_nid(u64 start) > > > > { > > > > return 0; > > > > } > > > > +#else > > > > +extern int memory_add_physaddr_to_nid(u64 start); > > > > #endif > > > > > > > > And then check the memory_add_physaddr_to_nid() helper on all arches, > > > > if it is noop(return 0), I can simply remove it. > > > > if it is not noop, after the helper, > > > > #define memory_add_physaddr_to_nid > > > > > > > > What do you think of this proposal? > > > > > > Especially for architectures that use memblock info for numa info > > > (which seems to be everyone except x86) why not implement a generic > > > memory_add_physaddr_to_nid() that does: > > > > That would be only arm64. > > > > Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it > could solve my numa api woes. At least for x86 the problem is already > solved with reserved numa_meminfo, but now I'm trying to write generic > drivers that use those apis and finding these gaps on other archs. Even on arm64, there is a dependency issue in dax_pmem kmem case. If dax pmem uses memory_add_physaddr_to_nid() to decide which node that memblock should add into, get_pfn_range_for_nid() might not have the correct memblock info at that time. That is, get_pfn_range_for_nid() can't get the correct memblock info before add_memory() So IMO, memory_add_physaddr_to_nid() still have to implement as noop on arm64 (return 0) together with sh,s390x? Powerpc, x86,ia64 can use their own implementation. And phys_to_target_node() can use your suggested( for_each_online_node() ...) What do you think of it? Thanks -- Cheers, Justin (Jia He) ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL 2020-07-08 6:56 ` Justin He @ 2020-07-08 7:00 ` David Hildenbrand 0 siblings, 0 replies; 46+ messages in thread From: David Hildenbrand @ 2020-07-08 7:00 UTC (permalink / raw) To: Justin He, Dan Williams Cc: Michal Hocko, Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin, Mike Rapoport On 08.07.20 08:56, Justin He wrote: > Hi Dan > >> -----Original Message----- >> From: Dan Williams <dan.j.williams@intel.com> >> Sent: Wednesday, July 8, 2020 1:48 PM >> To: Mike Rapoport <rppt@linux.ibm.com> >> Cc: Justin He <Justin.He@arm.com>; Michal Hocko <mhocko@kernel.org>; David >> Hildenbrand <david@redhat.com>; Catalin Marinas <Catalin.Marinas@arm.com>; >> Will Deacon <will@kernel.org>; Vishal Verma <vishal.l.verma@intel.com>; >> Dave Jiang <dave.jiang@intel.com>; Andrew Morton <akpm@linux- >> foundation.org>; Baoquan He <bhe@redhat.com>; Chuhong Yuan >> <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux- >> kernel@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org; >> Kaly Xin <Kaly.Xin@arm.com> >> Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid >> as EXPORT_SYMBOL_GPL >> >> On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote: >>> >>> On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote: >>>> On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote: >>>>> >>>>> Hi Michal and David >>>>> >>>>>> -----Original Message----- >>>>>> From: Michal Hocko <mhocko@kernel.org> >>>>>> Sent: Tuesday, July 7, 2020 7:55 PM >>>>>> To: Justin He <Justin.He@arm.com> >>>>>> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon >>>>>> <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal >> Verma >>>>>> <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; >> Andrew >>>>>> Morton <akpm@linux-foundation.org>; Mike Rapoport >> <rppt@linux.ibm.com>; >>>>>> Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; >> linux- >>>>>> arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; >> linux- >>>>>> mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin >> <Kaly.Xin@arm.com> >>>>>> Subject: Re: [PATCH v2 1/3] arm64/numa: export >> memory_add_physaddr_to_nid >>>>>> as EXPORT_SYMBOL_GPL >>>>>> >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote: >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to >> use. >>>>>>> >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid >> in case >>>>>>> NUMA_NO_NID is detected. >>>>>>> >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com> >>>>>>> Signed-off-by: Jia He <justin.he@arm.com> >>>>>>> --- >>>>>>> arch/arm64/mm/numa.c | 5 +++-- >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>>>>>> index aafcee3e3f7e..7eeb31740248 100644 >>>>>>> --- a/arch/arm64/mm/numa.c >>>>>>> +++ b/arch/arm64/mm/numa.c >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void) >>>>>>> >>>>>>> /* >>>>>>> * We hope that we will be hotplugging memory on nodes we >> already know >>>>>> about, >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to >> this... >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not >> present, >>>>>> the node >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a >> fallback >>>>>> option. >>>>>>> */ >>>>>>> int memory_add_physaddr_to_nid(u64 addr) >>>>>>> { >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node >> 0\n", >>>>>> addr); >>>>>>> return 0; >>>>>>> } >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid); >>>>>> >>>>>> Does it make sense to export a noop function? Wouldn't make more >> sense >>>>>> to simply make it static inline somewhere in a header? I haven't >> checked >>>>>> whether there is an easy way to do that sanely bu this just hit my >> eyes. >>>>> >>>>> Okay, I can make a change in memory_hotplug.h, sth like: >>>>> --- a/include/linux/memory_hotplug.h >>>>> +++ b/include/linux/memory_hotplug.h >>>>> @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, >> unsigned long nr_pages, >>>>> struct mhp_params *params); >>>>> #endif /* ARCH_HAS_ADD_PAGES */ >>>>> >>>>> -#ifdef CONFIG_NUMA >>>>> -extern int memory_add_physaddr_to_nid(u64 start); >>>>> -#else >>>>> +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid) >>>>> static inline int memory_add_physaddr_to_nid(u64 start) >>>>> { >>>>> return 0; >>>>> } >>>>> +#else >>>>> +extern int memory_add_physaddr_to_nid(u64 start); >>>>> #endif >>>>> >>>>> And then check the memory_add_physaddr_to_nid() helper on all arches, >>>>> if it is noop(return 0), I can simply remove it. >>>>> if it is not noop, after the helper, >>>>> #define memory_add_physaddr_to_nid >>>>> >>>>> What do you think of this proposal? >>>> >>>> Especially for architectures that use memblock info for numa info >>>> (which seems to be everyone except x86) why not implement a generic >>>> memory_add_physaddr_to_nid() that does: >>> >>> That would be only arm64. >>> >> >> Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it >> could solve my numa api woes. At least for x86 the problem is already >> solved with reserved numa_meminfo, but now I'm trying to write generic >> drivers that use those apis and finding these gaps on other archs. > > Even on arm64, there is a dependency issue in dax_pmem kmem case. > If dax pmem uses memory_add_physaddr_to_nid() to decide which node that > memblock should add into, get_pfn_range_for_nid() might not have > the correct memblock info at that time. That is, get_pfn_range_for_nid() > can't get the correct memblock info before add_memory() > > So IMO, memory_add_physaddr_to_nid() still have to implement as noop on > arm64 (return 0) together with sh,s390x? Powerpc, x86,ia64 can use their > own implementation. And phys_to_target_node() can use your suggested( > for_each_online_node() ...) > > What do you think of it? Thanks You are trying to fix the "we only have one dummy node" AFAIU, so what you propose here is certainly good enough for now. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid 2020-07-07 5:59 [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64 Jia He 2020-07-07 5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He @ 2020-07-07 5:59 ` Jia He 2020-07-07 6:08 ` Justin He 2020-07-07 11:34 ` David Hildenbrand 2020-07-07 5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He 2 siblings, 2 replies; 46+ messages in thread From: Jia He @ 2020-07-07 5:59 UTC (permalink / raw) To: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin, Jia He Previously, numa_off is set unconditionally at the end of dummy_numa_init(), even with a fake numa node. Then ACPI detects node id as NUMA_NO_NODE(-1) in acpi_map_pxm_to_node() because it regards numa_off as turning off the numa node. Hence dev_dax->target_node is NUMA_NO_NODE on arm64 with fake numa. Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table isn't present: $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1 kmem: probe of dax0.0 failed with error -22 This fixes it by using fallback memory_add_physaddr_to_nid() as nid. Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jia He <justin.he@arm.com> --- I noticed that on powerpc memory_add_physaddr_to_nid is not exported for module driver. Set it to RFC due to this concern. drivers/dax/kmem.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 275aa5f87399..68e693ca6d59 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -28,20 +28,22 @@ int dev_dax_kmem_probe(struct device *dev) resource_size_t kmem_end; struct resource *new_res; const char *new_res_name; - int numa_node; + int numa_node, new_node; int rc; /* * Ensure good NUMA information for the persistent memory. - * Without this check, there is a risk that slow memory - * could be mixed in a node with faster memory, causing - * unavoidable performance issues. + * Without this check, there is a risk but not fatal that slow + * memory could be mixed in a node with faster memory, causing + * unavoidable performance issues. Furthermore, fallback node + * id can be used when numa_node is invalid. */ numa_node = dev_dax->target_node; if (numa_node < 0) { - dev_warn(dev, "rejecting DAX region %pR with invalid node: %d\n", - res, numa_node); - return -EINVAL; + new_node = memory_add_physaddr_to_nid(kmem_start); + dev_info(dev, "changing nid from %d to %d for DAX region %pR\n", + numa_node, new_node, res); + numa_node = new_node; } /* Hotplug starting at the beginning of the next block: */ @@ -100,6 +102,7 @@ static int dev_dax_kmem_remove(struct device *dev) resource_size_t kmem_start = res->start; resource_size_t kmem_size = resource_size(res); const char *res_name = res->name; + int numa_node = dev_dax->target_node; int rc; /* @@ -108,7 +111,10 @@ static int dev_dax_kmem_remove(struct device *dev) * there is no way to hotremove this memory until reboot because device * unbind will succeed even if we return failure. */ - rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size); + if (numa_node < 0) + numa_node = memory_add_physaddr_to_nid(kmem_start); + + rc = remove_memory(numa_node, kmem_start, kmem_size); if (rc) { any_hotremove_failed = true; dev_err(dev, -- 2.17.1 ^ permalink raw reply related [flat|nested] 46+ messages in thread
* RE: [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid 2020-07-07 5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He @ 2020-07-07 6:08 ` Justin He 2020-07-07 11:34 ` David Hildenbrand 1 sibling, 0 replies; 46+ messages in thread From: Justin He @ 2020-07-07 6:08 UTC (permalink / raw) To: Justin He Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras [+] add Powerpc maintainers to check my concern about memory_add_physaddr_to_nid Exporting -- Cheers, Justin (Jia He) > -----Original Message----- > From: Jia He <justin.he@arm.com> > Sent: Tuesday, July 7, 2020 1:59 PM > To: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com> > Cc: Michal Hocko <mhocko@suse.com>; Andrew Morton <akpm@linux- > foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; Baoquan He > <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux-arm- > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com>; > Justin He <Justin.He@arm.com> > Subject: [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is > invalid > > Previously, numa_off is set unconditionally at the end of > dummy_numa_init(), > even with a fake numa node. Then ACPI detects node id as NUMA_NO_NODE(-1) > in > acpi_map_pxm_to_node() because it regards numa_off as turning off the numa > node. Hence dev_dax->target_node is NUMA_NO_NODE on arm64 with fake numa. > > Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT > table > isn't present: > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a > 64K > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with > invalid node: -1 > kmem: probe of dax0.0 failed with error -22 > > This fixes it by using fallback memory_add_physaddr_to_nid() as nid. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Jia He <justin.he@arm.com> > --- > I noticed that on powerpc memory_add_physaddr_to_nid is not exported for > module > driver. Set it to RFC due to this concern. > > drivers/dax/kmem.c | 22 ++++++++++++++-------- > 1 file changed, 14 insertions(+), 8 deletions(-) > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > index 275aa5f87399..68e693ca6d59 100644 > --- a/drivers/dax/kmem.c > +++ b/drivers/dax/kmem.c > @@ -28,20 +28,22 @@ int dev_dax_kmem_probe(struct device *dev) > resource_size_t kmem_end; > struct resource *new_res; > const char *new_res_name; > - int numa_node; > + int numa_node, new_node; > int rc; > > /* > * Ensure good NUMA information for the persistent memory. > - * Without this check, there is a risk that slow memory > - * could be mixed in a node with faster memory, causing > - * unavoidable performance issues. > + * Without this check, there is a risk but not fatal that slow > + * memory could be mixed in a node with faster memory, causing > + * unavoidable performance issues. Furthermore, fallback node > + * id can be used when numa_node is invalid. > */ > numa_node = dev_dax->target_node; > if (numa_node < 0) { > - dev_warn(dev, "rejecting DAX region %pR with invalid > node: %d\n", > - res, numa_node); > - return -EINVAL; > + new_node = memory_add_physaddr_to_nid(kmem_start); > + dev_info(dev, "changing nid from %d to %d for DAX > region %pR\n", > + numa_node, new_node, res); > + numa_node = new_node; > } > > /* Hotplug starting at the beginning of the next block: */ > @@ -100,6 +102,7 @@ static int dev_dax_kmem_remove(struct device *dev) > resource_size_t kmem_start = res->start; > resource_size_t kmem_size = resource_size(res); > const char *res_name = res->name; > + int numa_node = dev_dax->target_node; > int rc; > > /* > @@ -108,7 +111,10 @@ static int dev_dax_kmem_remove(struct device *dev) > * there is no way to hotremove this memory until reboot because > device > * unbind will succeed even if we return failure. > */ > - rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size); > + if (numa_node < 0) > + numa_node = memory_add_physaddr_to_nid(kmem_start); > + > + rc = remove_memory(numa_node, kmem_start, kmem_size); > if (rc) { > any_hotremove_failed = true; > dev_err(dev, > -- > 2.17.1 ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid 2020-07-07 5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He 2020-07-07 6:08 ` Justin He @ 2020-07-07 11:34 ` David Hildenbrand 2020-07-08 1:41 ` Justin He 1 sibling, 1 reply; 46+ messages in thread From: David Hildenbrand @ 2020-07-07 11:34 UTC (permalink / raw) To: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin On 07.07.20 07:59, Jia He wrote: > Previously, numa_off is set unconditionally at the end of dummy_numa_init(), > even with a fake numa node. Then ACPI detects node id as NUMA_NO_NODE(-1) in > acpi_map_pxm_to_node() because it regards numa_off as turning off the numa > node. Hence dev_dax->target_node is NUMA_NO_NODE on arm64 with fake numa. > > Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table > isn't present: > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1 > kmem: probe of dax0.0 failed with error -22 > > This fixes it by using fallback memory_add_physaddr_to_nid() as nid. > > Suggested-by: David Hildenbrand <david@redhat.com> > Signed-off-by: Jia He <justin.he@arm.com> > --- > I noticed that on powerpc memory_add_physaddr_to_nid is not exported for module > driver. Set it to RFC due to this concern. > > drivers/dax/kmem.c | 22 ++++++++++++++-------- > 1 file changed, 14 insertions(+), 8 deletions(-) > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > index 275aa5f87399..68e693ca6d59 100644 > --- a/drivers/dax/kmem.c > +++ b/drivers/dax/kmem.c > @@ -28,20 +28,22 @@ int dev_dax_kmem_probe(struct device *dev) > resource_size_t kmem_end; > struct resource *new_res; > const char *new_res_name; > - int numa_node; > + int numa_node, new_node; > int rc; > > /* > * Ensure good NUMA information for the persistent memory. > - * Without this check, there is a risk that slow memory > - * could be mixed in a node with faster memory, causing > - * unavoidable performance issues. > + * Without this check, there is a risk but not fatal that slow > + * memory could be mixed in a node with faster memory, causing > + * unavoidable performance issues. Furthermore, fallback node > + * id can be used when numa_node is invalid. > */ > numa_node = dev_dax->target_node; > if (numa_node < 0) { > - dev_warn(dev, "rejecting DAX region %pR with invalid node: %d\n", > - res, numa_node); > - return -EINVAL; > + new_node = memory_add_physaddr_to_nid(kmem_start); > + dev_info(dev, "changing nid from %d to %d for DAX region %pR\n", > + numa_node, new_node, res); > + numa_node = new_node; Now, the warning does not really make sense. We have NUMA_NO_NODE (< 0), that is not a change in the nid, but a selection of a nid. Printing NUMA_NO_NODE does not make too much sense. I suggest just getting rid of new_node and turning the dev_info() into something like dev_info(dev, "using nid %d for DAX region with undefined nid %pR\n", numa_node, res); -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
* RE: [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid 2020-07-07 11:34 ` David Hildenbrand @ 2020-07-08 1:41 ` Justin He 0 siblings, 0 replies; 46+ messages in thread From: Justin He @ 2020-07-08 1:41 UTC (permalink / raw) To: David Hildenbrand, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin Hi David > -----Original Message----- > From: David Hildenbrand <david@redhat.com> > Sent: Tuesday, July 7, 2020 7:34 PM > To: Justin He <Justin.He@arm.com>; Catalin Marinas > <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>; Dan Williams > <dan.j.williams@intel.com>; Vishal Verma <vishal.l.verma@intel.com>; Dave > Jiang <dave.jiang@intel.com> > Cc: Michal Hocko <mhocko@suse.com>; Andrew Morton <akpm@linux- > foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; Baoquan He > <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux-arm- > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux- > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com> > Subject: Re: [RFC PATCH v2 2/3] device-dax: use fallback nid when > numa_node is invalid > > On 07.07.20 07:59, Jia He wrote: > > Previously, numa_off is set unconditionally at the end of > dummy_numa_init(), > > even with a fake numa node. Then ACPI detects node id as NUMA_NO_NODE(-1) > in > > acpi_map_pxm_to_node() because it regards numa_off as turning off the > numa > > node. Hence dev_dax->target_node is NUMA_NO_NODE on arm64 with fake numa. > > > > Without this patch, pmem can't be probed as a RAM device on arm64 if > SRAT table > > isn't present: > > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g - > a 64K > > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with > invalid node: -1 > > kmem: probe of dax0.0 failed with error -22 > > > > This fixes it by using fallback memory_add_physaddr_to_nid() as nid. > > > > Suggested-by: David Hildenbrand <david@redhat.com> > > Signed-off-by: Jia He <justin.he@arm.com> > > --- > > I noticed that on powerpc memory_add_physaddr_to_nid is not exported for > module > > driver. Set it to RFC due to this concern. > > > > drivers/dax/kmem.c | 22 ++++++++++++++-------- > > 1 file changed, 14 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > > index 275aa5f87399..68e693ca6d59 100644 > > --- a/drivers/dax/kmem.c > > +++ b/drivers/dax/kmem.c > > @@ -28,20 +28,22 @@ int dev_dax_kmem_probe(struct device *dev) > > resource_size_t kmem_end; > > struct resource *new_res; > > const char *new_res_name; > > - int numa_node; > > + int numa_node, new_node; > > int rc; > > > > /* > > * Ensure good NUMA information for the persistent memory. > > - * Without this check, there is a risk that slow memory > > - * could be mixed in a node with faster memory, causing > > - * unavoidable performance issues. > > + * Without this check, there is a risk but not fatal that slow > > + * memory could be mixed in a node with faster memory, causing > > + * unavoidable performance issues. Furthermore, fallback node > > + * id can be used when numa_node is invalid. > > */ > > numa_node = dev_dax->target_node; > > if (numa_node < 0) { > > - dev_warn(dev, "rejecting DAX region %pR with invalid > node: %d\n", > > - res, numa_node); > > - return -EINVAL; > > + new_node = memory_add_physaddr_to_nid(kmem_start); > > + dev_info(dev, "changing nid from %d to %d for DAX > region %pR\n", > > + numa_node, new_node, res); > > + numa_node = new_node; > > Now, the warning does not really make sense. We have NUMA_NO_NODE (< 0), > that is not a change in the nid, but a selection of a nid. Printing > NUMA_NO_NODE does not make too much sense. I suggest just getting rid of > new_node and turning the dev_info() into something like > > dev_info(dev, "using nid %d for DAX region with undefined nid %pR\n", > numa_node, res); > Okay, I will update it as your sugguestion. Thanks -- Cheers, Justin (Jia He) ^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done 2020-07-07 5:59 [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64 Jia He 2020-07-07 5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He 2020-07-07 5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He @ 2020-07-07 5:59 ` Jia He 2020-07-07 10:06 ` Michal Hocko 2020-07-07 11:31 ` David Hildenbrand 2 siblings, 2 replies; 46+ messages in thread From: Jia He @ 2020-07-07 5:59 UTC (permalink / raw) To: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin, Jia He, stable When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is online at that time), mem_hotplug_begin/done is unpaired in such case. Therefore a warning: Call Trace: percpu_up_write+0x33/0x40 try_remove_memory+0x66/0x120 ? _cond_resched+0x19/0x30 remove_memory+0x2b/0x40 dev_dax_kmem_remove+0x36/0x72 [kmem] device_release_driver_internal+0xf0/0x1c0 device_release_driver+0x12/0x20 bus_remove_device+0xe1/0x150 device_del+0x17b/0x3e0 unregister_dev_dax+0x29/0x60 devm_action_release+0x15/0x20 release_nodes+0x19a/0x1e0 devres_release_all+0x3f/0x50 device_release_driver_internal+0x100/0x1c0 driver_detach+0x4c/0x8f bus_remove_driver+0x5c/0xd0 driver_unregister+0x31/0x50 dax_pmem_exit+0x10/0xfe0 [dax_pmem] Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat") Cc: stable@vger.kernel.org # v5.6+ Signed-off-by: Jia He <justin.he@arm.com> --- mm/memory_hotplug.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index da374cd3d45b..76c75a599da3 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1742,7 +1742,7 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) */ rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb); if (rc) - goto done; + return rc; /* remove memmap entry */ firmware_map_remove(start, start + size, "System RAM"); @@ -1766,9 +1766,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) try_offline_node(nid); -done: mem_hotplug_done(); - return rc; + return 0; } /** -- 2.17.1 ^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done 2020-07-07 5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He @ 2020-07-07 10:06 ` Michal Hocko 2020-07-07 11:31 ` David Hildenbrand 1 sibling, 0 replies; 46+ messages in thread From: Michal Hocko @ 2020-07-07 10:06 UTC (permalink / raw) To: Jia He Cc: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin, stable On Tue 07-07-20 13:59:17, Jia He wrote: > When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is > online at that time), mem_hotplug_begin/done is unpaired in such case. > > Therefore a warning: > Call Trace: > percpu_up_write+0x33/0x40 > try_remove_memory+0x66/0x120 > ? _cond_resched+0x19/0x30 > remove_memory+0x2b/0x40 > dev_dax_kmem_remove+0x36/0x72 [kmem] > device_release_driver_internal+0xf0/0x1c0 > device_release_driver+0x12/0x20 > bus_remove_device+0xe1/0x150 > device_del+0x17b/0x3e0 > unregister_dev_dax+0x29/0x60 > devm_action_release+0x15/0x20 > release_nodes+0x19a/0x1e0 > devres_release_all+0x3f/0x50 > device_release_driver_internal+0x100/0x1c0 > driver_detach+0x4c/0x8f > bus_remove_driver+0x5c/0xd0 > driver_unregister+0x31/0x50 > dax_pmem_exit+0x10/0xfe0 [dax_pmem] > > Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat") > Cc: stable@vger.kernel.org # v5.6+ > Signed-off-by: Jia He <justin.he@arm.com> Ups, I have missed that when reviewing that commit. Thanks for catching this up! Acked-by: Michal Hocko <mhocko@suse.com> > --- > mm/memory_hotplug.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index da374cd3d45b..76c75a599da3 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1742,7 +1742,7 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) > */ > rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb); > if (rc) > - goto done; > + return rc; > > /* remove memmap entry */ > firmware_map_remove(start, start + size, "System RAM"); > @@ -1766,9 +1766,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) > > try_offline_node(nid); > > -done: > mem_hotplug_done(); > - return rc; > + return 0; > } > > /** > -- > 2.17.1 -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done 2020-07-07 5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He 2020-07-07 10:06 ` Michal Hocko @ 2020-07-07 11:31 ` David Hildenbrand 1 sibling, 0 replies; 46+ messages in thread From: David Hildenbrand @ 2020-07-07 11:31 UTC (permalink / raw) To: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin, stable On 07.07.20 07:59, Jia He wrote: > When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is > online at that time), mem_hotplug_begin/done is unpaired in such case. > > Therefore a warning: > Call Trace: > percpu_up_write+0x33/0x40 > try_remove_memory+0x66/0x120 > ? _cond_resched+0x19/0x30 > remove_memory+0x2b/0x40 > dev_dax_kmem_remove+0x36/0x72 [kmem] > device_release_driver_internal+0xf0/0x1c0 > device_release_driver+0x12/0x20 > bus_remove_device+0xe1/0x150 > device_del+0x17b/0x3e0 > unregister_dev_dax+0x29/0x60 > devm_action_release+0x15/0x20 > release_nodes+0x19a/0x1e0 > devres_release_all+0x3f/0x50 > device_release_driver_internal+0x100/0x1c0 > driver_detach+0x4c/0x8f > bus_remove_driver+0x5c/0xd0 > driver_unregister+0x31/0x50 > dax_pmem_exit+0x10/0xfe0 [dax_pmem] > > Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat") > Cc: stable@vger.kernel.org # v5.6+ > Signed-off-by: Jia He <justin.he@arm.com> > --- > mm/memory_hotplug.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index da374cd3d45b..76c75a599da3 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1742,7 +1742,7 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) > */ > rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb); > if (rc) > - goto done; > + return rc; > > /* remove memmap entry */ > firmware_map_remove(start, start + size, "System RAM"); > @@ -1766,9 +1766,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size) > > try_offline_node(nid); > > -done: > mem_hotplug_done(); > - return rc; > + return 0; > } > > /** > Reviewed-by: David Hildenbrand <david@redhat.com> -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 46+ messages in thread
end of thread, other threads:[~2020-07-08 16:48 UTC | newest] Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-07 5:59 [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64 Jia He 2020-07-07 5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He 2020-07-07 11:35 ` David Hildenbrand 2020-07-07 11:54 ` Michal Hocko 2020-07-07 12:13 ` Mike Rapoport 2020-07-07 12:26 ` David Hildenbrand 2020-07-07 18:00 ` Mike Rapoport 2020-07-07 22:05 ` Dan Williams 2020-07-08 5:27 ` Mike Rapoport 2020-07-08 7:21 ` David Hildenbrand 2020-07-08 7:38 ` Mike Rapoport 2020-07-08 7:40 ` David Hildenbrand 2020-07-08 7:50 ` Dan Williams 2020-07-08 8:26 ` David Hildenbrand 2020-07-08 8:39 ` Mike Rapoport 2020-07-08 8:45 ` David Hildenbrand 2020-07-08 9:15 ` Mike Rapoport 2020-07-08 9:25 ` David Hildenbrand 2020-07-08 9:45 ` Mike Rapoport 2020-07-08 10:04 ` David Hildenbrand 2020-07-08 15:50 ` Dan Williams 2020-07-08 16:10 ` David Hildenbrand 2020-07-08 16:47 ` Mike Rapoport 2020-07-08 2:20 ` Justin He 2020-07-08 3:56 ` Dan Williams 2020-07-08 4:08 ` Justin He 2020-07-08 4:27 ` Dan Williams 2020-07-08 6:22 ` Mike Rapoport 2020-07-08 6:53 ` Dan Williams 2020-07-08 6:59 ` David Hildenbrand 2020-07-08 7:04 ` Dan Williams 2020-07-08 7:16 ` David Hildenbrand 2020-07-08 7:43 ` Mike Rapoport 2020-07-08 5:32 ` Mike Rapoport 2020-07-08 5:48 ` Dan Williams 2020-07-08 6:19 ` Mike Rapoport 2020-07-08 6:44 ` Dan Williams 2020-07-08 6:56 ` Justin He 2020-07-08 7:00 ` David Hildenbrand 2020-07-07 5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He 2020-07-07 6:08 ` Justin He 2020-07-07 11:34 ` David Hildenbrand 2020-07-08 1:41 ` Justin He 2020-07-07 5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He 2020-07-07 10:06 ` Michal Hocko 2020-07-07 11:31 ` David Hildenbrand
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).