linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64
@ 2020-07-07  5:59 Jia He
  2020-07-07  5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Jia He @ 2020-07-07  5:59 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang
  Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin, Jia He

This fix a few issues when I tried to enable pmem as RAM device on arm64.

Tested on ThunderX2 host/qemu "-M virt" guest with a nvdimm device. The 
memblocks from the dax pmem device can be either hot-added or hot-removed
on arm64 guest.

Changes:
v2: - Drop unneccessary patch to harden try_offline_node
    - Use new solution(by David) to fix dev->target_node=-1 during probing
    - Refine the mem_hotplug_begin/done patch

v1: https://lkml.org/lkml/2020/7/5/381

Jia He (3):
  arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  device-dax: use fallback nid when numa_node is invalid
  mm/memory_hotplug: fix unpaired mem_hotplug_begin/done

 arch/arm64/mm/numa.c |  5 +++--
 drivers/dax/kmem.c   | 22 ++++++++++++++--------
 mm/memory_hotplug.c  |  5 ++---
 3 files changed, 19 insertions(+), 13 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-07  5:59 [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64 Jia He
@ 2020-07-07  5:59 ` Jia He
  2020-07-07 11:35   ` David Hildenbrand
  2020-07-07 11:54   ` Michal Hocko
  2020-07-07  5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He
  2020-07-07  5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He
  2 siblings, 2 replies; 46+ messages in thread
From: Jia He @ 2020-07-07  5:59 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang
  Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin, Jia He

This exports memory_add_physaddr_to_nid() for module driver to use.

memory_add_physaddr_to_nid() is a fallback option to get the nid in case
NUMA_NO_NID is detected.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
 arch/arm64/mm/numa.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index aafcee3e3f7e..7eeb31740248 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
 
 /*
  * We hope that we will be hotplugging memory on nodes we already know about,
- * such that acpi_get_node() succeeds and we never fall back to this...
+ * such that acpi_get_node() succeeds. But when SRAT is not present, the node
+ * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
  */
 int memory_add_physaddr_to_nid(u64 addr)
 {
-	pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid
  2020-07-07  5:59 [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64 Jia He
  2020-07-07  5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He
@ 2020-07-07  5:59 ` Jia He
  2020-07-07  6:08   ` Justin He
  2020-07-07 11:34   ` David Hildenbrand
  2020-07-07  5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He
  2 siblings, 2 replies; 46+ messages in thread
From: Jia He @ 2020-07-07  5:59 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang
  Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin, Jia He

Previously, numa_off is set unconditionally at the end of dummy_numa_init(),
even with a fake numa node. Then ACPI detects node id as NUMA_NO_NODE(-1) in
acpi_map_pxm_to_node() because it regards numa_off as turning off the numa
node. Hence dev_dax->target_node is NUMA_NO_NODE on arm64 with fake numa.

Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table
isn't present:
$ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
kmem: probe of dax0.0 failed with error -22

This fixes it by using fallback memory_add_physaddr_to_nid() as nid.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jia He <justin.he@arm.com>
---
I noticed that on powerpc memory_add_physaddr_to_nid is not exported for module
driver. Set it to RFC due to this concern.

 drivers/dax/kmem.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 275aa5f87399..68e693ca6d59 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -28,20 +28,22 @@ int dev_dax_kmem_probe(struct device *dev)
 	resource_size_t kmem_end;
 	struct resource *new_res;
 	const char *new_res_name;
-	int numa_node;
+	int numa_node, new_node;
 	int rc;
 
 	/*
 	 * Ensure good NUMA information for the persistent memory.
-	 * Without this check, there is a risk that slow memory
-	 * could be mixed in a node with faster memory, causing
-	 * unavoidable performance issues.
+	 * Without this check, there is a risk but not fatal that slow
+	 * memory could be mixed in a node with faster memory, causing
+	 * unavoidable performance issues. Furthermore, fallback node
+	 * id can be used when numa_node is invalid.
 	 */
 	numa_node = dev_dax->target_node;
 	if (numa_node < 0) {
-		dev_warn(dev, "rejecting DAX region %pR with invalid node: %d\n",
-			 res, numa_node);
-		return -EINVAL;
+		new_node = memory_add_physaddr_to_nid(kmem_start);
+		dev_info(dev, "changing nid from %d to %d for DAX region %pR\n",
+			numa_node, new_node, res);
+		numa_node = new_node;
 	}
 
 	/* Hotplug starting at the beginning of the next block: */
@@ -100,6 +102,7 @@ static int dev_dax_kmem_remove(struct device *dev)
 	resource_size_t kmem_start = res->start;
 	resource_size_t kmem_size = resource_size(res);
 	const char *res_name = res->name;
+	int numa_node = dev_dax->target_node;
 	int rc;
 
 	/*
@@ -108,7 +111,10 @@ static int dev_dax_kmem_remove(struct device *dev)
 	 * there is no way to hotremove this memory until reboot because device
 	 * unbind will succeed even if we return failure.
 	 */
-	rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size);
+	if (numa_node < 0)
+		numa_node = memory_add_physaddr_to_nid(kmem_start);
+
+	rc = remove_memory(numa_node, kmem_start, kmem_size);
 	if (rc) {
 		any_hotremove_failed = true;
 		dev_err(dev,
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done
  2020-07-07  5:59 [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64 Jia He
  2020-07-07  5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He
  2020-07-07  5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He
@ 2020-07-07  5:59 ` Jia He
  2020-07-07 10:06   ` Michal Hocko
  2020-07-07 11:31   ` David Hildenbrand
  2 siblings, 2 replies; 46+ messages in thread
From: Jia He @ 2020-07-07  5:59 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma, Dave Jiang
  Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin, Jia He, stable

When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is
online at that time), mem_hotplug_begin/done is unpaired in such case.

Therefore a warning:
 Call Trace:
  percpu_up_write+0x33/0x40
  try_remove_memory+0x66/0x120
  ? _cond_resched+0x19/0x30
  remove_memory+0x2b/0x40
  dev_dax_kmem_remove+0x36/0x72 [kmem]
  device_release_driver_internal+0xf0/0x1c0
  device_release_driver+0x12/0x20
  bus_remove_device+0xe1/0x150
  device_del+0x17b/0x3e0
  unregister_dev_dax+0x29/0x60
  devm_action_release+0x15/0x20
  release_nodes+0x19a/0x1e0
  devres_release_all+0x3f/0x50
  device_release_driver_internal+0x100/0x1c0
  driver_detach+0x4c/0x8f
  bus_remove_driver+0x5c/0xd0
  driver_unregister+0x31/0x50
  dax_pmem_exit+0x10/0xfe0 [dax_pmem]

Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat")
Cc: stable@vger.kernel.org # v5.6+
Signed-off-by: Jia He <justin.he@arm.com>
---
 mm/memory_hotplug.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index da374cd3d45b..76c75a599da3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1742,7 +1742,7 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
 	 */
 	rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb);
 	if (rc)
-		goto done;
+		return rc;
 
 	/* remove memmap entry */
 	firmware_map_remove(start, start + size, "System RAM");
@@ -1766,9 +1766,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
 
 	try_offline_node(nid);
 
-done:
 	mem_hotplug_done();
-	return rc;
+	return 0;
 }
 
 /**
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* RE: [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid
  2020-07-07  5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He
@ 2020-07-07  6:08   ` Justin He
  2020-07-07 11:34   ` David Hildenbrand
  1 sibling, 0 replies; 46+ messages in thread
From: Justin He @ 2020-07-07  6:08 UTC (permalink / raw)
  To: Justin He
  Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin, Catalin Marinas, Will Deacon,
	Dan Williams, Vishal Verma, Dave Jiang, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras

[+] add Powerpc maintainers to check my concern about memory_add_physaddr_to_nid
Exporting


--
Cheers,
Justin (Jia He)



> -----Original Message-----
> From: Jia He <justin.he@arm.com>
> Sent: Tuesday, July 7, 2020 1:59 PM
> To: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma
> <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>
> Cc: Michal Hocko <mhocko@suse.com>; Andrew Morton <akpm@linux-
> foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; Baoquan He
> <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com>;
> Justin He <Justin.He@arm.com>
> Subject: [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is
> invalid
> 
> Previously, numa_off is set unconditionally at the end of
> dummy_numa_init(),
> even with a fake numa node. Then ACPI detects node id as NUMA_NO_NODE(-1)
> in
> acpi_map_pxm_to_node() because it regards numa_off as turning off the numa
> node. Hence dev_dax->target_node is NUMA_NO_NODE on arm64 with fake numa.
> 
> Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT
> table
> isn't present:
> $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a
> 64K
> kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with
> invalid node: -1
> kmem: probe of dax0.0 failed with error -22
> 
> This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
> I noticed that on powerpc memory_add_physaddr_to_nid is not exported for
> module
> driver. Set it to RFC due to this concern.
> 
>  drivers/dax/kmem.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 275aa5f87399..68e693ca6d59 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -28,20 +28,22 @@ int dev_dax_kmem_probe(struct device *dev)
>  	resource_size_t kmem_end;
>  	struct resource *new_res;
>  	const char *new_res_name;
> -	int numa_node;
> +	int numa_node, new_node;
>  	int rc;
> 
>  	/*
>  	 * Ensure good NUMA information for the persistent memory.
> -	 * Without this check, there is a risk that slow memory
> -	 * could be mixed in a node with faster memory, causing
> -	 * unavoidable performance issues.
> +	 * Without this check, there is a risk but not fatal that slow
> +	 * memory could be mixed in a node with faster memory, causing
> +	 * unavoidable performance issues. Furthermore, fallback node
> +	 * id can be used when numa_node is invalid.
>  	 */
>  	numa_node = dev_dax->target_node;
>  	if (numa_node < 0) {
> -		dev_warn(dev, "rejecting DAX region %pR with invalid
> node: %d\n",
> -			 res, numa_node);
> -		return -EINVAL;
> +		new_node = memory_add_physaddr_to_nid(kmem_start);
> +		dev_info(dev, "changing nid from %d to %d for DAX
> region %pR\n",
> +			numa_node, new_node, res);
> +		numa_node = new_node;
>  	}
> 
>  	/* Hotplug starting at the beginning of the next block: */
> @@ -100,6 +102,7 @@ static int dev_dax_kmem_remove(struct device *dev)
>  	resource_size_t kmem_start = res->start;
>  	resource_size_t kmem_size = resource_size(res);
>  	const char *res_name = res->name;
> +	int numa_node = dev_dax->target_node;
>  	int rc;
> 
>  	/*
> @@ -108,7 +111,10 @@ static int dev_dax_kmem_remove(struct device *dev)
>  	 * there is no way to hotremove this memory until reboot because
> device
>  	 * unbind will succeed even if we return failure.
>  	 */
> -	rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size);
> +	if (numa_node < 0)
> +		numa_node = memory_add_physaddr_to_nid(kmem_start);
> +
> +	rc = remove_memory(numa_node, kmem_start, kmem_size);
>  	if (rc) {
>  		any_hotremove_failed = true;
>  		dev_err(dev,
> --
> 2.17.1



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done
  2020-07-07  5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He
@ 2020-07-07 10:06   ` Michal Hocko
  2020-07-07 11:31   ` David Hildenbrand
  1 sibling, 0 replies; 46+ messages in thread
From: Michal Hocko @ 2020-07-07 10:06 UTC (permalink / raw)
  To: Jia He
  Cc: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma,
	Dave Jiang, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin, stable

On Tue 07-07-20 13:59:17, Jia He wrote:
> When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is
> online at that time), mem_hotplug_begin/done is unpaired in such case.
> 
> Therefore a warning:
>  Call Trace:
>   percpu_up_write+0x33/0x40
>   try_remove_memory+0x66/0x120
>   ? _cond_resched+0x19/0x30
>   remove_memory+0x2b/0x40
>   dev_dax_kmem_remove+0x36/0x72 [kmem]
>   device_release_driver_internal+0xf0/0x1c0
>   device_release_driver+0x12/0x20
>   bus_remove_device+0xe1/0x150
>   device_del+0x17b/0x3e0
>   unregister_dev_dax+0x29/0x60
>   devm_action_release+0x15/0x20
>   release_nodes+0x19a/0x1e0
>   devres_release_all+0x3f/0x50
>   device_release_driver_internal+0x100/0x1c0
>   driver_detach+0x4c/0x8f
>   bus_remove_driver+0x5c/0xd0
>   driver_unregister+0x31/0x50
>   dax_pmem_exit+0x10/0xfe0 [dax_pmem]
> 
> Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat")
> Cc: stable@vger.kernel.org # v5.6+
> Signed-off-by: Jia He <justin.he@arm.com>

Ups, I have missed that when reviewing that commit. Thanks for catching
this up!

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memory_hotplug.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index da374cd3d45b..76c75a599da3 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1742,7 +1742,7 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  	 */
>  	rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb);
>  	if (rc)
> -		goto done;
> +		return rc;
>  
>  	/* remove memmap entry */
>  	firmware_map_remove(start, start + size, "System RAM");
> @@ -1766,9 +1766,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  
>  	try_offline_node(nid);
>  
> -done:
>  	mem_hotplug_done();
> -	return rc;
> +	return 0;
>  }
>  
>  /**
> -- 
> 2.17.1

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done
  2020-07-07  5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He
  2020-07-07 10:06   ` Michal Hocko
@ 2020-07-07 11:31   ` David Hildenbrand
  1 sibling, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-07-07 11:31 UTC (permalink / raw)
  To: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma,
	Dave Jiang
  Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin, stable

On 07.07.20 07:59, Jia He wrote:
> When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is
> online at that time), mem_hotplug_begin/done is unpaired in such case.
> 
> Therefore a warning:
>  Call Trace:
>   percpu_up_write+0x33/0x40
>   try_remove_memory+0x66/0x120
>   ? _cond_resched+0x19/0x30
>   remove_memory+0x2b/0x40
>   dev_dax_kmem_remove+0x36/0x72 [kmem]
>   device_release_driver_internal+0xf0/0x1c0
>   device_release_driver+0x12/0x20
>   bus_remove_device+0xe1/0x150
>   device_del+0x17b/0x3e0
>   unregister_dev_dax+0x29/0x60
>   devm_action_release+0x15/0x20
>   release_nodes+0x19a/0x1e0
>   devres_release_all+0x3f/0x50
>   device_release_driver_internal+0x100/0x1c0
>   driver_detach+0x4c/0x8f
>   bus_remove_driver+0x5c/0xd0
>   driver_unregister+0x31/0x50
>   dax_pmem_exit+0x10/0xfe0 [dax_pmem]
> 
> Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat")
> Cc: stable@vger.kernel.org # v5.6+
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  mm/memory_hotplug.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index da374cd3d45b..76c75a599da3 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1742,7 +1742,7 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  	 */
>  	rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb);
>  	if (rc)
> -		goto done;
> +		return rc;
>  
>  	/* remove memmap entry */
>  	firmware_map_remove(start, start + size, "System RAM");
> @@ -1766,9 +1766,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  
>  	try_offline_node(nid);
>  
> -done:
>  	mem_hotplug_done();
> -	return rc;
> +	return 0;
>  }
>  
>  /**
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid
  2020-07-07  5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He
  2020-07-07  6:08   ` Justin He
@ 2020-07-07 11:34   ` David Hildenbrand
  2020-07-08  1:41     ` Justin He
  1 sibling, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-07-07 11:34 UTC (permalink / raw)
  To: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma,
	Dave Jiang
  Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On 07.07.20 07:59, Jia He wrote:
> Previously, numa_off is set unconditionally at the end of dummy_numa_init(),
> even with a fake numa node. Then ACPI detects node id as NUMA_NO_NODE(-1) in
> acpi_map_pxm_to_node() because it regards numa_off as turning off the numa
> node. Hence dev_dax->target_node is NUMA_NO_NODE on arm64 with fake numa.
> 
> Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table
> isn't present:
> $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> kmem: probe of dax0.0 failed with error -22
> 
> This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
> I noticed that on powerpc memory_add_physaddr_to_nid is not exported for module
> driver. Set it to RFC due to this concern.
> 
>  drivers/dax/kmem.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 275aa5f87399..68e693ca6d59 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -28,20 +28,22 @@ int dev_dax_kmem_probe(struct device *dev)
>  	resource_size_t kmem_end;
>  	struct resource *new_res;
>  	const char *new_res_name;
> -	int numa_node;
> +	int numa_node, new_node;
>  	int rc;
>  
>  	/*
>  	 * Ensure good NUMA information for the persistent memory.
> -	 * Without this check, there is a risk that slow memory
> -	 * could be mixed in a node with faster memory, causing
> -	 * unavoidable performance issues.
> +	 * Without this check, there is a risk but not fatal that slow
> +	 * memory could be mixed in a node with faster memory, causing
> +	 * unavoidable performance issues. Furthermore, fallback node
> +	 * id can be used when numa_node is invalid.
>  	 */
>  	numa_node = dev_dax->target_node;
>  	if (numa_node < 0) {
> -		dev_warn(dev, "rejecting DAX region %pR with invalid node: %d\n",
> -			 res, numa_node);
> -		return -EINVAL;
> +		new_node = memory_add_physaddr_to_nid(kmem_start);
> +		dev_info(dev, "changing nid from %d to %d for DAX region %pR\n",
> +			numa_node, new_node, res);
> +		numa_node = new_node;

Now, the warning does not really make sense. We have NUMA_NO_NODE (< 0),
that is not a change in the nid, but a selection of a nid. Printing
NUMA_NO_NODE does not make too much sense. I suggest just getting rid of
new_node and turning the dev_info() into something like

dev_info(dev, "using nid %d for DAX region with undefined nid %pR\n",
         numa_node, res);


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-07  5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He
@ 2020-07-07 11:35   ` David Hildenbrand
  2020-07-07 11:54   ` Michal Hocko
  1 sibling, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-07-07 11:35 UTC (permalink / raw)
  To: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma,
	Dave Jiang
  Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On 07.07.20 07:59, Jia He wrote:
> This exports memory_add_physaddr_to_nid() for module driver to use.
> 
> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> NUMA_NO_NID is detected.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  arch/arm64/mm/numa.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index aafcee3e3f7e..7eeb31740248 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
>  
>  /*
>   * We hope that we will be hotplugging memory on nodes we already know about,
> - * such that acpi_get_node() succeeds and we never fall back to this...
> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
>   */
>  int memory_add_physaddr_to_nid(u64 addr)
>  {
> -	pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> 
We could turn that into a pr_info() instead, but the effect is visible
to user space (e.g., which memory blocks belong to which node in sysfs),
so this can be debugged easily on demand.

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-07  5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He
  2020-07-07 11:35   ` David Hildenbrand
@ 2020-07-07 11:54   ` Michal Hocko
  2020-07-07 12:13     ` Mike Rapoport
  2020-07-08  2:20     ` Justin He
  1 sibling, 2 replies; 46+ messages in thread
From: Michal Hocko @ 2020-07-07 11:54 UTC (permalink / raw)
  To: Jia He
  Cc: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma,
	Dave Jiang, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Tue 07-07-20 13:59:15, Jia He wrote:
> This exports memory_add_physaddr_to_nid() for module driver to use.
> 
> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> NUMA_NO_NID is detected.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  arch/arm64/mm/numa.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index aafcee3e3f7e..7eeb31740248 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
>  
>  /*
>   * We hope that we will be hotplugging memory on nodes we already know about,
> - * such that acpi_get_node() succeeds and we never fall back to this...
> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
>   */
>  int memory_add_physaddr_to_nid(u64 addr)
>  {
> -	pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);

Does it make sense to export a noop function? Wouldn't make more sense
to simply make it static inline somewhere in a header? I haven't checked
whether there is an easy way to do that sanely bu this just hit my eyes.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-07 11:54   ` Michal Hocko
@ 2020-07-07 12:13     ` Mike Rapoport
  2020-07-07 12:26       ` David Hildenbrand
  2020-07-08  2:20     ` Justin He
  1 sibling, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2020-07-07 12:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma,
	Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin

On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
> On Tue 07-07-20 13:59:15, Jia He wrote:
> > This exports memory_add_physaddr_to_nid() for module driver to use.
> > 
> > memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> > NUMA_NO_NID is detected.
> > 
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  arch/arm64/mm/numa.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > index aafcee3e3f7e..7eeb31740248 100644
> > --- a/arch/arm64/mm/numa.c
> > +++ b/arch/arm64/mm/numa.c
> > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> >  
> >  /*
> >   * We hope that we will be hotplugging memory on nodes we already know about,
> > - * such that acpi_get_node() succeeds and we never fall back to this...
> > + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
> >   */
> >  int memory_add_physaddr_to_nid(u64 addr)
> >  {
> > -	pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
> >  	return 0;
> >  }
> > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> 
> Does it make sense to export a noop function? Wouldn't make more sense
> to simply make it static inline somewhere in a header? I haven't checked
> whether there is an easy way to do that sanely bu this just hit my eyes.

We'll need to either add a CONFIG_ option or arch specific callback to
make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
implementations coexist ...

> -- 
> Michal Hocko
> SUSE Labs

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-07 12:13     ` Mike Rapoport
@ 2020-07-07 12:26       ` David Hildenbrand
  2020-07-07 18:00         ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-07-07 12:26 UTC (permalink / raw)
  To: Mike Rapoport, Michal Hocko
  Cc: Jia He, Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma,
	Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin

On 07.07.20 14:13, Mike Rapoport wrote:
> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
>> On Tue 07-07-20 13:59:15, Jia He wrote:
>>> This exports memory_add_physaddr_to_nid() for module driver to use.
>>>
>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
>>> NUMA_NO_NID is detected.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Jia He <justin.he@arm.com>
>>> ---
>>>  arch/arm64/mm/numa.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> index aafcee3e3f7e..7eeb31740248 100644
>>> --- a/arch/arm64/mm/numa.c
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
>>>  
>>>  /*
>>>   * We hope that we will be hotplugging memory on nodes we already know about,
>>> - * such that acpi_get_node() succeeds and we never fall back to this...
>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
>>>   */
>>>  int memory_add_physaddr_to_nid(u64 addr)
>>>  {
>>> -	pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
>>>  	return 0;
>>>  }
>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>>
>> Does it make sense to export a noop function? Wouldn't make more sense
>> to simply make it static inline somewhere in a header? I haven't checked
>> whether there is an easy way to do that sanely bu this just hit my eyes.
> 
> We'll need to either add a CONFIG_ option or arch specific callback to
> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
> implementations coexist ...

Note: I have a similar dummy (return 0) patch for s390x lying around here.


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-07 12:26       ` David Hildenbrand
@ 2020-07-07 18:00         ` Mike Rapoport
  2020-07-07 22:05           ` Dan Williams
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2020-07-07 18:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Dan Williams,
	Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote:
> On 07.07.20 14:13, Mike Rapoport wrote:
> > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
> >> On Tue 07-07-20 13:59:15, Jia He wrote:
> >>> This exports memory_add_physaddr_to_nid() for module driver to use.
> >>>
> >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> >>> NUMA_NO_NID is detected.
> >>>
> >>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>> Signed-off-by: Jia He <justin.he@arm.com>
> >>> ---
> >>>  arch/arm64/mm/numa.c | 5 +++--
> >>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>> index aafcee3e3f7e..7eeb31740248 100644
> >>> --- a/arch/arm64/mm/numa.c
> >>> +++ b/arch/arm64/mm/numa.c
> >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> >>>  
> >>>  /*
> >>>   * We hope that we will be hotplugging memory on nodes we already know about,
> >>> - * such that acpi_get_node() succeeds and we never fall back to this...
> >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
> >>>   */
> >>>  int memory_add_physaddr_to_nid(u64 addr)
> >>>  {
> >>> -	pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
> >>>  	return 0;
> >>>  }
> >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> >>
> >> Does it make sense to export a noop function? Wouldn't make more sense
> >> to simply make it static inline somewhere in a header? I haven't checked
> >> whether there is an easy way to do that sanely bu this just hit my eyes.
> > 
> > We'll need to either add a CONFIG_ option or arch specific callback to
> > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
> > implementations coexist ...
> 
> Note: I have a similar dummy (return 0) patch for s390x lying around here.

Then we'll call it a tie - 3:3 ;-)
 
> -- 
> Thanks,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-07 18:00         ` Mike Rapoport
@ 2020-07-07 22:05           ` Dan Williams
  2020-07-08  5:27             ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: Dan Williams @ 2020-07-07 22:05 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: David Hildenbrand, Michal Hocko, Jia He, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM,
	linux-nvdimm, Kaly Xin

On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote:
> > On 07.07.20 14:13, Mike Rapoport wrote:
> > > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
> > >> On Tue 07-07-20 13:59:15, Jia He wrote:
> > >>> This exports memory_add_physaddr_to_nid() for module driver to use.
> > >>>
> > >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> > >>> NUMA_NO_NID is detected.
> > >>>
> > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > >>> Signed-off-by: Jia He <justin.he@arm.com>
> > >>> ---
> > >>>  arch/arm64/mm/numa.c | 5 +++--
> > >>>  1 file changed, 3 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > >>> index aafcee3e3f7e..7eeb31740248 100644
> > >>> --- a/arch/arm64/mm/numa.c
> > >>> +++ b/arch/arm64/mm/numa.c
> > >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> > >>>
> > >>>  /*
> > >>>   * We hope that we will be hotplugging memory on nodes we already know about,
> > >>> - * such that acpi_get_node() succeeds and we never fall back to this...
> > >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> > >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
> > >>>   */
> > >>>  int memory_add_physaddr_to_nid(u64 addr)
> > >>>  {
> > >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
> > >>>   return 0;
> > >>>  }
> > >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> > >>
> > >> Does it make sense to export a noop function? Wouldn't make more sense
> > >> to simply make it static inline somewhere in a header? I haven't checked
> > >> whether there is an easy way to do that sanely bu this just hit my eyes.
> > >
> > > We'll need to either add a CONFIG_ option or arch specific callback to
> > > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
> > > implementations coexist ...
> >
> > Note: I have a similar dummy (return 0) patch for s390x lying around here.
>
> Then we'll call it a tie - 3:3 ;-)

So I'd be happy to jump on the train of people wanting to export the
ARM stub for this (and add a new ARM stub for phys_to_target_node()),
but Will did have a plausibly better idea that I have been meaning to
circle back to:

http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck

...i.e. iterate over node data to do the lookup. This would seem to
work generically for multiple archs unless I am missing something?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid
  2020-07-07 11:34   ` David Hildenbrand
@ 2020-07-08  1:41     ` Justin He
  0 siblings, 0 replies; 46+ messages in thread
From: Justin He @ 2020-07-08  1:41 UTC (permalink / raw)
  To: David Hildenbrand, Catalin Marinas, Will Deacon, Dan Williams,
	Vishal Verma, Dave Jiang
  Cc: Michal Hocko, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

Hi David

> -----Original Message-----
> From: David Hildenbrand <david@redhat.com>
> Sent: Tuesday, July 7, 2020 7:34 PM
> To: Justin He <Justin.He@arm.com>; Catalin Marinas
> <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>; Dan Williams
> <dan.j.williams@intel.com>; Vishal Verma <vishal.l.verma@intel.com>; Dave
> Jiang <dave.jiang@intel.com>
> Cc: Michal Hocko <mhocko@suse.com>; Andrew Morton <akpm@linux-
> foundation.org>; Mike Rapoport <rppt@linux.ibm.com>; Baoquan He
> <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com>
> Subject: Re: [RFC PATCH v2 2/3] device-dax: use fallback nid when
> numa_node is invalid
> 
> On 07.07.20 07:59, Jia He wrote:
> > Previously, numa_off is set unconditionally at the end of
> dummy_numa_init(),
> > even with a fake numa node. Then ACPI detects node id as NUMA_NO_NODE(-1)
> in
> > acpi_map_pxm_to_node() because it regards numa_off as turning off the
> numa
> > node. Hence dev_dax->target_node is NUMA_NO_NODE on arm64 with fake numa.
> >
> > Without this patch, pmem can't be probed as a RAM device on arm64 if
> SRAT table
> > isn't present:
> > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -
> a 64K
> > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with
> invalid node: -1
> > kmem: probe of dax0.0 failed with error -22
> >
> > This fixes it by using fallback memory_add_physaddr_to_nid() as nid.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> > I noticed that on powerpc memory_add_physaddr_to_nid is not exported for
> module
> > driver. Set it to RFC due to this concern.
> >
> >  drivers/dax/kmem.c | 22 ++++++++++++++--------
> >  1 file changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index 275aa5f87399..68e693ca6d59 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -28,20 +28,22 @@ int dev_dax_kmem_probe(struct device *dev)
> >  	resource_size_t kmem_end;
> >  	struct resource *new_res;
> >  	const char *new_res_name;
> > -	int numa_node;
> > +	int numa_node, new_node;
> >  	int rc;
> >
> >  	/*
> >  	 * Ensure good NUMA information for the persistent memory.
> > -	 * Without this check, there is a risk that slow memory
> > -	 * could be mixed in a node with faster memory, causing
> > -	 * unavoidable performance issues.
> > +	 * Without this check, there is a risk but not fatal that slow
> > +	 * memory could be mixed in a node with faster memory, causing
> > +	 * unavoidable performance issues. Furthermore, fallback node
> > +	 * id can be used when numa_node is invalid.
> >  	 */
> >  	numa_node = dev_dax->target_node;
> >  	if (numa_node < 0) {
> > -		dev_warn(dev, "rejecting DAX region %pR with invalid
> node: %d\n",
> > -			 res, numa_node);
> > -		return -EINVAL;
> > +		new_node = memory_add_physaddr_to_nid(kmem_start);
> > +		dev_info(dev, "changing nid from %d to %d for DAX
> region %pR\n",
> > +			numa_node, new_node, res);
> > +		numa_node = new_node;
> 
> Now, the warning does not really make sense. We have NUMA_NO_NODE (< 0),
> that is not a change in the nid, but a selection of a nid. Printing
> NUMA_NO_NODE does not make too much sense. I suggest just getting rid of
> new_node and turning the dev_info() into something like
> 
> dev_info(dev, "using nid %d for DAX region with undefined nid %pR\n",
>          numa_node, res);
> 
Okay, I will update it as your sugguestion. Thanks

--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-07 11:54   ` Michal Hocko
  2020-07-07 12:13     ` Mike Rapoport
@ 2020-07-08  2:20     ` Justin He
  2020-07-08  3:56       ` Dan Williams
  1 sibling, 1 reply; 46+ messages in thread
From: Justin He @ 2020-07-08  2:20 UTC (permalink / raw)
  To: Michal Hocko, David Hildenbrand
  Cc: Catalin Marinas, Will Deacon, Dan Williams, Vishal Verma,
	Dave Jiang, Andrew Morton, Mike Rapoport, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

Hi Michal and David

> -----Original Message-----
> From: Michal Hocko <mhocko@kernel.org>
> Sent: Tuesday, July 7, 2020 7:55 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma
> <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew
> Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>;
> Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux-
> arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid
> as EXPORT_SYMBOL_GPL
> 
> On Tue 07-07-20 13:59:15, Jia He wrote:
> > This exports memory_add_physaddr_to_nid() for module driver to use.
> >
> > memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> > NUMA_NO_NID is detected.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  arch/arm64/mm/numa.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > index aafcee3e3f7e..7eeb31740248 100644
> > --- a/arch/arm64/mm/numa.c
> > +++ b/arch/arm64/mm/numa.c
> > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> >
> >  /*
> >   * We hope that we will be hotplugging memory on nodes we already know
> about,
> > - * such that acpi_get_node() succeeds and we never fall back to this...
> > + * such that acpi_get_node() succeeds. But when SRAT is not present,
> the node
> > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback
> option.
> >   */
> >  int memory_add_physaddr_to_nid(u64 addr)
> >  {
> > -	pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n",
> addr);
> >  	return 0;
> >  }
> > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> 
> Does it make sense to export a noop function? Wouldn't make more sense
> to simply make it static inline somewhere in a header? I haven't checked
> whether there is an easy way to do that sanely bu this just hit my eyes.

Okay, I can make a change in memory_hotplug.h, sth like:
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
              struct mhp_params *params);
 #endif /* ARCH_HAS_ADD_PAGES */
 
-#ifdef CONFIG_NUMA
-extern int memory_add_physaddr_to_nid(u64 start);
-#else
+#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid)
 static inline int memory_add_physaddr_to_nid(u64 start)
 {
        return 0;
 }
+#else
+extern int memory_add_physaddr_to_nid(u64 start);
 #endif

And then check the memory_add_physaddr_to_nid() helper on all arches,
if it is noop(return 0), I can simply remove it.
if it is not noop, after the helper, 
#define memory_add_physaddr_to_nid

What do you think of this proposal?

--
Cheers,
Justin (Jia He)




^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  2:20     ` Justin He
@ 2020-07-08  3:56       ` Dan Williams
  2020-07-08  4:08         ` Justin He
  2020-07-08  5:32         ` Mike Rapoport
  0 siblings, 2 replies; 46+ messages in thread
From: Dan Williams @ 2020-07-08  3:56 UTC (permalink / raw)
  To: Justin He
  Cc: Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon,
	Vishal Verma, Dave Jiang, Andrew Morton, Mike Rapoport,
	Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel,
	linux-mm, linux-nvdimm, Kaly Xin

On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote:
>
> Hi Michal and David
>
> > -----Original Message-----
> > From: Michal Hocko <mhocko@kernel.org>
> > Sent: Tuesday, July 7, 2020 7:55 PM
> > To: Justin He <Justin.He@arm.com>
> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma
> > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew
> > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>;
> > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux-
> > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com>
> > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid
> > as EXPORT_SYMBOL_GPL
> >
> > On Tue 07-07-20 13:59:15, Jia He wrote:
> > > This exports memory_add_physaddr_to_nid() for module driver to use.
> > >
> > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> > > NUMA_NO_NID is detected.
> > >
> > > Suggested-by: David Hildenbrand <david@redhat.com>
> > > Signed-off-by: Jia He <justin.he@arm.com>
> > > ---
> > >  arch/arm64/mm/numa.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > > index aafcee3e3f7e..7eeb31740248 100644
> > > --- a/arch/arm64/mm/numa.c
> > > +++ b/arch/arm64/mm/numa.c
> > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> > >
> > >  /*
> > >   * We hope that we will be hotplugging memory on nodes we already know
> > about,
> > > - * such that acpi_get_node() succeeds and we never fall back to this...
> > > + * such that acpi_get_node() succeeds. But when SRAT is not present,
> > the node
> > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback
> > option.
> > >   */
> > >  int memory_add_physaddr_to_nid(u64 addr)
> > >  {
> > > -   pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n",
> > addr);
> > >     return 0;
> > >  }
> > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> >
> > Does it make sense to export a noop function? Wouldn't make more sense
> > to simply make it static inline somewhere in a header? I haven't checked
> > whether there is an easy way to do that sanely bu this just hit my eyes.
>
> Okay, I can make a change in memory_hotplug.h, sth like:
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>               struct mhp_params *params);
>  #endif /* ARCH_HAS_ADD_PAGES */
>
> -#ifdef CONFIG_NUMA
> -extern int memory_add_physaddr_to_nid(u64 start);
> -#else
> +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid)
>  static inline int memory_add_physaddr_to_nid(u64 start)
>  {
>         return 0;
>  }
> +#else
> +extern int memory_add_physaddr_to_nid(u64 start);
>  #endif
>
> And then check the memory_add_physaddr_to_nid() helper on all arches,
> if it is noop(return 0), I can simply remove it.
> if it is not noop, after the helper,
> #define memory_add_physaddr_to_nid
>
> What do you think of this proposal?

Especially for architectures that use memblock info for numa info
(which seems to be everyone except x86) why not implement a generic
memory_add_physaddr_to_nid() that does:

int memory_add_physaddr_to_nid(u64 addr)
{
        unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
        int nid;

        for_each_online_node(nid) {
                get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
                if (pfn >= start_pfn && pfn <= end_pfn)
                        return nid;
        }
        return NUMA_NO_NODE;
}


^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  3:56       ` Dan Williams
@ 2020-07-08  4:08         ` Justin He
  2020-07-08  4:27           ` Dan Williams
  2020-07-08  5:32         ` Mike Rapoport
  1 sibling, 1 reply; 46+ messages in thread
From: Justin He @ 2020-07-08  4:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon,
	Vishal Verma, Dave Jiang, Andrew Morton, Mike Rapoport,
	Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel,
	linux-mm, linux-nvdimm, Kaly Xin

Hi Dan

> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Sent: Wednesday, July 8, 2020 11:57 AM
> To: Justin He <Justin.He@arm.com>
> Cc: Michal Hocko <mhocko@kernel.org>; David Hildenbrand <david@redhat.com>;
> Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>;
> Vishal Verma <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>;
> Andrew Morton <akpm@linux-foundation.org>; Mike Rapoport
> <rppt@linux.ibm.com>; Baoquan He <bhe@redhat.com>; Chuhong Yuan
> <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org;
> Kaly Xin <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid
> as EXPORT_SYMBOL_GPL
> 
> On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote:
> >
> > Hi Michal and David
> >
> > > -----Original Message-----
> > > From: Michal Hocko <mhocko@kernel.org>
> > > Sent: Tuesday, July 7, 2020 7:55 PM
> > > To: Justin He <Justin.He@arm.com>
> > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal
> Verma
> > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew
> > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>;
> > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>;
> linux-
> > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com>
> > > Subject: Re: [PATCH v2 1/3] arm64/numa: export
> memory_add_physaddr_to_nid
> > > as EXPORT_SYMBOL_GPL
> > >
> > > On Tue 07-07-20 13:59:15, Jia He wrote:
> > > > This exports memory_add_physaddr_to_nid() for module driver to use.
> > > >
> > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in
> case
> > > > NUMA_NO_NID is detected.
> > > >
> > > > Suggested-by: David Hildenbrand <david@redhat.com>
> > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > ---
> > > >  arch/arm64/mm/numa.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > > > index aafcee3e3f7e..7eeb31740248 100644
> > > > --- a/arch/arm64/mm/numa.c
> > > > +++ b/arch/arm64/mm/numa.c
> > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> > > >
> > > >  /*
> > > >   * We hope that we will be hotplugging memory on nodes we already
> know
> > > about,
> > > > - * such that acpi_get_node() succeeds and we never fall back to
> this...
> > > > + * such that acpi_get_node() succeeds. But when SRAT is not present,
> > > the node
> > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a
> fallback
> > > option.
> > > >   */
> > > >  int memory_add_physaddr_to_nid(u64 addr)
> > > >  {
> > > > -   pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n",
> > > addr);
> > > >     return 0;
> > > >  }
> > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> > >
> > > Does it make sense to export a noop function? Wouldn't make more sense
> > > to simply make it static inline somewhere in a header? I haven't
> checked
> > > whether there is an easy way to do that sanely bu this just hit my
> eyes.
> >
> > Okay, I can make a change in memory_hotplug.h, sth like:
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn,
> unsigned long nr_pages,
> >               struct mhp_params *params);
> >  #endif /* ARCH_HAS_ADD_PAGES */
> >
> > -#ifdef CONFIG_NUMA
> > -extern int memory_add_physaddr_to_nid(u64 start);
> > -#else
> > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid)
> >  static inline int memory_add_physaddr_to_nid(u64 start)
> >  {
> >         return 0;
> >  }
> > +#else
> > +extern int memory_add_physaddr_to_nid(u64 start);
> >  #endif
> >
> > And then check the memory_add_physaddr_to_nid() helper on all arches,
> > if it is noop(return 0), I can simply remove it.
> > if it is not noop, after the helper,
> > #define memory_add_physaddr_to_nid
> >
> > What do you think of this proposal?
> 
> Especially for architectures that use memblock info for numa info
> (which seems to be everyone except x86) why not implement a generic
> memory_add_physaddr_to_nid() that does:
> 
> int memory_add_physaddr_to_nid(u64 addr)
> {
>         unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
>         int nid;
> 
>         for_each_online_node(nid) {
>                 get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>                 if (pfn >= start_pfn && pfn <= end_pfn)
>                         return nid;
>         }
>         return NUMA_NO_NODE;
> }

Thanks for your suggestion,
Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke
phys_to_target_node()? 


--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  4:08         ` Justin He
@ 2020-07-08  4:27           ` Dan Williams
  2020-07-08  6:22             ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: Dan Williams @ 2020-07-08  4:27 UTC (permalink / raw)
  To: Justin He
  Cc: Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon,
	Vishal Verma, Dave Jiang, Andrew Morton, Mike Rapoport,
	Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel,
	linux-mm, linux-nvdimm, Kaly Xin

On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote:
[..]
> > Especially for architectures that use memblock info for numa info
> > (which seems to be everyone except x86) why not implement a generic
> > memory_add_physaddr_to_nid() that does:
> >
> > int memory_add_physaddr_to_nid(u64 addr)
> > {
> >         unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
> >         int nid;
> >
> >         for_each_online_node(nid) {
> >                 get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> >                 if (pfn >= start_pfn && pfn <= end_pfn)
> >                         return nid;
> >         }
> >         return NUMA_NO_NODE;
> > }
>
> Thanks for your suggestion,
> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke
> phys_to_target_node()?

I think it needs to be the reverse. phys_to_target_node() should call
memory_add_physaddr_to_nid() by default, but fall back to searching
reserved memory address ranges in memblock. See phys_to_target_node()
in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock,
but the principle is the same i.e. that a target node may not be
represented in memblock.memory, but memblock.reserved. I'm working on
a patch to provide a function similar to get_pfn_range_for_nid() that
operates on reserved memory.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-07 22:05           ` Dan Williams
@ 2020-07-08  5:27             ` Mike Rapoport
  2020-07-08  7:21               ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08  5:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: David Hildenbrand, Michal Hocko, Jia He, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM,
	linux-nvdimm, Kaly Xin

On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote:
> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote:
> >
> > On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote:
> > > On 07.07.20 14:13, Mike Rapoport wrote:
> > > > On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
> > > >> On Tue 07-07-20 13:59:15, Jia He wrote:
> > > >>> This exports memory_add_physaddr_to_nid() for module driver to use.
> > > >>>
> > > >>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> > > >>> NUMA_NO_NID is detected.
> > > >>>
> > > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > > >>> Signed-off-by: Jia He <justin.he@arm.com>
> > > >>> ---
> > > >>>  arch/arm64/mm/numa.c | 5 +++--
> > > >>>  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >>>
> > > >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > > >>> index aafcee3e3f7e..7eeb31740248 100644
> > > >>> --- a/arch/arm64/mm/numa.c
> > > >>> +++ b/arch/arm64/mm/numa.c
> > > >>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> > > >>>
> > > >>>  /*
> > > >>>   * We hope that we will be hotplugging memory on nodes we already know about,
> > > >>> - * such that acpi_get_node() succeeds and we never fall back to this...
> > > >>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> > > >>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
> > > >>>   */
> > > >>>  int memory_add_physaddr_to_nid(u64 addr)
> > > >>>  {
> > > >>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
> > > >>>   return 0;
> > > >>>  }
> > > >>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> > > >>
> > > >> Does it make sense to export a noop function? Wouldn't make more sense
> > > >> to simply make it static inline somewhere in a header? I haven't checked
> > > >> whether there is an easy way to do that sanely bu this just hit my eyes.
> > > >
> > > > We'll need to either add a CONFIG_ option or arch specific callback to
> > > > make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
> > > > implementations coexist ...
> > >
> > > Note: I have a similar dummy (return 0) patch for s390x lying around here.
> >
> > Then we'll call it a tie - 3:3 ;-)
> 
> So I'd be happy to jump on the train of people wanting to export the
> ARM stub for this (and add a new ARM stub for phys_to_target_node()),
> but Will did have a plausibly better idea that I have been meaning to
> circle back to:
> 
> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck
> 
> ...i.e. iterate over node data to do the lookup. This would seem to
> work generically for multiple archs unless I am missing something?

I think it would work on arm64, power and, most propbably on s390
(David?), but not on x86. x86 does not have reserved memory in pgdat,
it's never memblock_add()'ed (see e820__memblock_setup()).

I've suggested to add E820_*_RESERVED to memblock.memory a while ago
[1], but apparently there are systems that cannot tolerate OS mappings
of the BIOS reserved areas.

[1] https://lore.kernel.org/lkml/20200522142053.GW1059226@linux.ibm.com/

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  3:56       ` Dan Williams
  2020-07-08  4:08         ` Justin He
@ 2020-07-08  5:32         ` Mike Rapoport
  2020-07-08  5:48           ` Dan Williams
  1 sibling, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08  5:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote:
> On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote:
> >
> > Hi Michal and David
> >
> > > -----Original Message-----
> > > From: Michal Hocko <mhocko@kernel.org>
> > > Sent: Tuesday, July 7, 2020 7:55 PM
> > > To: Justin He <Justin.He@arm.com>
> > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma
> > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew
> > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>;
> > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux-
> > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com>
> > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid
> > > as EXPORT_SYMBOL_GPL
> > >
> > > On Tue 07-07-20 13:59:15, Jia He wrote:
> > > > This exports memory_add_physaddr_to_nid() for module driver to use.
> > > >
> > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> > > > NUMA_NO_NID is detected.
> > > >
> > > > Suggested-by: David Hildenbrand <david@redhat.com>
> > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > ---
> > > >  arch/arm64/mm/numa.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > > > index aafcee3e3f7e..7eeb31740248 100644
> > > > --- a/arch/arm64/mm/numa.c
> > > > +++ b/arch/arm64/mm/numa.c
> > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> > > >
> > > >  /*
> > > >   * We hope that we will be hotplugging memory on nodes we already know
> > > about,
> > > > - * such that acpi_get_node() succeeds and we never fall back to this...
> > > > + * such that acpi_get_node() succeeds. But when SRAT is not present,
> > > the node
> > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback
> > > option.
> > > >   */
> > > >  int memory_add_physaddr_to_nid(u64 addr)
> > > >  {
> > > > -   pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n",
> > > addr);
> > > >     return 0;
> > > >  }
> > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> > >
> > > Does it make sense to export a noop function? Wouldn't make more sense
> > > to simply make it static inline somewhere in a header? I haven't checked
> > > whether there is an easy way to do that sanely bu this just hit my eyes.
> >
> > Okay, I can make a change in memory_hotplug.h, sth like:
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> >               struct mhp_params *params);
> >  #endif /* ARCH_HAS_ADD_PAGES */
> >
> > -#ifdef CONFIG_NUMA
> > -extern int memory_add_physaddr_to_nid(u64 start);
> > -#else
> > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid)
> >  static inline int memory_add_physaddr_to_nid(u64 start)
> >  {
> >         return 0;
> >  }
> > +#else
> > +extern int memory_add_physaddr_to_nid(u64 start);
> >  #endif
> >
> > And then check the memory_add_physaddr_to_nid() helper on all arches,
> > if it is noop(return 0), I can simply remove it.
> > if it is not noop, after the helper,
> > #define memory_add_physaddr_to_nid
> >
> > What do you think of this proposal?
> 
> Especially for architectures that use memblock info for numa info
> (which seems to be everyone except x86) why not implement a generic
> memory_add_physaddr_to_nid() that does:

That would be only arm64.

> int memory_add_physaddr_to_nid(u64 addr)
> {
>         unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
>         int nid;
> 
>         for_each_online_node(nid) {
>                 get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>                 if (pfn >= start_pfn && pfn <= end_pfn)
>                         return nid;
>         }
>         return NUMA_NO_NODE;
> }

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  5:32         ` Mike Rapoport
@ 2020-07-08  5:48           ` Dan Williams
  2020-07-08  6:19             ` Mike Rapoport
  2020-07-08  6:56             ` Justin He
  0 siblings, 2 replies; 46+ messages in thread
From: Dan Williams @ 2020-07-08  5:48 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote:
> > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote:
> > >
> > > Hi Michal and David
> > >
> > > > -----Original Message-----
> > > > From: Michal Hocko <mhocko@kernel.org>
> > > > Sent: Tuesday, July 7, 2020 7:55 PM
> > > > To: Justin He <Justin.He@arm.com>
> > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma
> > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew
> > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>;
> > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux-
> > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com>
> > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid
> > > > as EXPORT_SYMBOL_GPL
> > > >
> > > > On Tue 07-07-20 13:59:15, Jia He wrote:
> > > > > This exports memory_add_physaddr_to_nid() for module driver to use.
> > > > >
> > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> > > > > NUMA_NO_NID is detected.
> > > > >
> > > > > Suggested-by: David Hildenbrand <david@redhat.com>
> > > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > > ---
> > > > >  arch/arm64/mm/numa.c | 5 +++--
> > > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > > > > index aafcee3e3f7e..7eeb31740248 100644
> > > > > --- a/arch/arm64/mm/numa.c
> > > > > +++ b/arch/arm64/mm/numa.c
> > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> > > > >
> > > > >  /*
> > > > >   * We hope that we will be hotplugging memory on nodes we already know
> > > > about,
> > > > > - * such that acpi_get_node() succeeds and we never fall back to this...
> > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present,
> > > > the node
> > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback
> > > > option.
> > > > >   */
> > > > >  int memory_add_physaddr_to_nid(u64 addr)
> > > > >  {
> > > > > -   pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n",
> > > > addr);
> > > > >     return 0;
> > > > >  }
> > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> > > >
> > > > Does it make sense to export a noop function? Wouldn't make more sense
> > > > to simply make it static inline somewhere in a header? I haven't checked
> > > > whether there is an easy way to do that sanely bu this just hit my eyes.
> > >
> > > Okay, I can make a change in memory_hotplug.h, sth like:
> > > --- a/include/linux/memory_hotplug.h
> > > +++ b/include/linux/memory_hotplug.h
> > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> > >               struct mhp_params *params);
> > >  #endif /* ARCH_HAS_ADD_PAGES */
> > >
> > > -#ifdef CONFIG_NUMA
> > > -extern int memory_add_physaddr_to_nid(u64 start);
> > > -#else
> > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid)
> > >  static inline int memory_add_physaddr_to_nid(u64 start)
> > >  {
> > >         return 0;
> > >  }
> > > +#else
> > > +extern int memory_add_physaddr_to_nid(u64 start);
> > >  #endif
> > >
> > > And then check the memory_add_physaddr_to_nid() helper on all arches,
> > > if it is noop(return 0), I can simply remove it.
> > > if it is not noop, after the helper,
> > > #define memory_add_physaddr_to_nid
> > >
> > > What do you think of this proposal?
> >
> > Especially for architectures that use memblock info for numa info
> > (which seems to be everyone except x86) why not implement a generic
> > memory_add_physaddr_to_nid() that does:
>
> That would be only arm64.
>

Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it
could solve my numa api woes. At least for x86 the problem is already
solved with reserved numa_meminfo, but now I'm trying to write generic
drivers that use those apis and finding these gaps on other archs.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  5:48           ` Dan Williams
@ 2020-07-08  6:19             ` Mike Rapoport
  2020-07-08  6:44               ` Dan Williams
  2020-07-08  6:56             ` Justin He
  1 sibling, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08  6:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Tue, Jul 07, 2020 at 10:48:19PM -0700, Dan Williams wrote:
> On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
> >
> > On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote:
> > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote:
> > > >
> > > > Hi Michal and David
> > > >
> > > > > -----Original Message-----
> > > > > From: Michal Hocko <mhocko@kernel.org>
> > > > > Sent: Tuesday, July 7, 2020 7:55 PM
> > > > > To: Justin He <Justin.He@arm.com>
> > > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> > > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal Verma
> > > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>; Andrew
> > > > > Morton <akpm@linux-foundation.org>; Mike Rapoport <rppt@linux.ibm.com>;
> > > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>; linux-
> > > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> > > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin <Kaly.Xin@arm.com>
> > > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid
> > > > > as EXPORT_SYMBOL_GPL
> > > > >
> > > > > On Tue 07-07-20 13:59:15, Jia He wrote:
> > > > > > This exports memory_add_physaddr_to_nid() for module driver to use.
> > > > > >
> > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> > > > > > NUMA_NO_NID is detected.
> > > > > >
> > > > > > Suggested-by: David Hildenbrand <david@redhat.com>
> > > > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > > > ---
> > > > > >  arch/arm64/mm/numa.c | 5 +++--
> > > > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > > > > > index aafcee3e3f7e..7eeb31740248 100644
> > > > > > --- a/arch/arm64/mm/numa.c
> > > > > > +++ b/arch/arm64/mm/numa.c
> > > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> > > > > >
> > > > > >  /*
> > > > > >   * We hope that we will be hotplugging memory on nodes we already know
> > > > > about,
> > > > > > - * such that acpi_get_node() succeeds and we never fall back to this...
> > > > > > + * such that acpi_get_node() succeeds. But when SRAT is not present,
> > > > > the node
> > > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback
> > > > > option.
> > > > > >   */
> > > > > >  int memory_add_physaddr_to_nid(u64 addr)
> > > > > >  {
> > > > > > -   pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n",
> > > > > addr);
> > > > > >     return 0;
> > > > > >  }
> > > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> > > > >
> > > > > Does it make sense to export a noop function? Wouldn't make more sense
> > > > > to simply make it static inline somewhere in a header? I haven't checked
> > > > > whether there is an easy way to do that sanely bu this just hit my eyes.
> > > >
> > > > Okay, I can make a change in memory_hotplug.h, sth like:
> > > > --- a/include/linux/memory_hotplug.h
> > > > +++ b/include/linux/memory_hotplug.h
> > > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> > > >               struct mhp_params *params);
> > > >  #endif /* ARCH_HAS_ADD_PAGES */
> > > >
> > > > -#ifdef CONFIG_NUMA
> > > > -extern int memory_add_physaddr_to_nid(u64 start);
> > > > -#else
> > > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid)
> > > >  static inline int memory_add_physaddr_to_nid(u64 start)
> > > >  {
> > > >         return 0;
> > > >  }
> > > > +#else
> > > > +extern int memory_add_physaddr_to_nid(u64 start);
> > > >  #endif
> > > >
> > > > And then check the memory_add_physaddr_to_nid() helper on all arches,
> > > > if it is noop(return 0), I can simply remove it.
> > > > if it is not noop, after the helper,
> > > > #define memory_add_physaddr_to_nid
> > > >
> > > > What do you think of this proposal?
> > >
> > > Especially for architectures that use memblock info for numa info
> > > (which seems to be everyone except x86) why not implement a generic
> > > memory_add_physaddr_to_nid() that does:
> >
> > That would be only arm64.
> >
> 
> Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it
> could solve my numa api woes. At least for x86 the problem is already
> solved with reserved numa_meminfo, but now I'm trying to write generic
> drivers that use those apis and finding these gaps on other archs.

I'm not sure if x86's numa_meminfo is a part of the solution or a part
of the problem ;-)
Anyway, this all indeed messy and there is a lot to improve there.

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  4:27           ` Dan Williams
@ 2020-07-08  6:22             ` Mike Rapoport
  2020-07-08  6:53               ` Dan Williams
  2020-07-08  6:59               ` David Hildenbrand
  0 siblings, 2 replies; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08  6:22 UTC (permalink / raw)
  To: Dan Williams
  Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote:
> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote:
> [..]
> > > Especially for architectures that use memblock info for numa info
> > > (which seems to be everyone except x86) why not implement a generic
> > > memory_add_physaddr_to_nid() that does:
> > >
> > > int memory_add_physaddr_to_nid(u64 addr)
> > > {
> > >         unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
> > >         int nid;
> > >
> > >         for_each_online_node(nid) {
> > >                 get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> > >                 if (pfn >= start_pfn && pfn <= end_pfn)
> > >                         return nid;
> > >         }
> > >         return NUMA_NO_NODE;
> > > }
> >
> > Thanks for your suggestion,
> > Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke
> > phys_to_target_node()?
> 
> I think it needs to be the reverse. phys_to_target_node() should call
> memory_add_physaddr_to_nid() by default, but fall back to searching
> reserved memory address ranges in memblock. See phys_to_target_node()
> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock,
> but the principle is the same i.e. that a target node may not be
> represented in memblock.memory, but memblock.reserved. I'm working on
> a patch to provide a function similar to get_pfn_range_for_nid() that
> operates on reserved memory.

Do we really need yet another memblock iterator?
I think only x86 has memory that is not in memblock.memory but only in
memblock.reserved.

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  6:19             ` Mike Rapoport
@ 2020-07-08  6:44               ` Dan Williams
  0 siblings, 0 replies; 46+ messages in thread
From: Dan Williams @ 2020-07-08  6:44 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Tue, Jul 7, 2020 at 11:20 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
[..]
> > Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it
> > could solve my numa api woes. At least for x86 the problem is already
> > solved with reserved numa_meminfo, but now I'm trying to write generic
> > drivers that use those apis and finding these gaps on other archs.
>
> I'm not sure if x86's numa_meminfo is a part of the solution or a part
> of the problem ;-)

More the latter, but hopefully it can remain an exception and not the rule.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  6:22             ` Mike Rapoport
@ 2020-07-08  6:53               ` Dan Williams
  2020-07-08  6:59               ` David Hildenbrand
  1 sibling, 0 replies; 46+ messages in thread
From: Dan Williams @ 2020-07-08  6:53 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Justin He, Michal Hocko, David Hildenbrand, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Tue, Jul 7, 2020 at 11:22 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
[..]
> > > Thanks for your suggestion,
> > > Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke
> > > phys_to_target_node()?
> >
> > I think it needs to be the reverse. phys_to_target_node() should call
> > memory_add_physaddr_to_nid() by default, but fall back to searching
> > reserved memory address ranges in memblock. See phys_to_target_node()
> > in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock,
> > but the principle is the same i.e. that a target node may not be
> > represented in memblock.memory, but memblock.reserved. I'm working on
> > a patch to provide a function similar to get_pfn_range_for_nid() that
> > operates on reserved memory.
>
> Do we really need yet another memblock iterator?
> I think only x86 has memory that is not in memblock.memory but only in
> memblock.reserved.

Well, that's what led me here. EFI has introduced a memory attribute
called "EFI Special Purpose Memory". I mapped it to a new Linux
concept called Soft Reserved memory (commit b617c5266eed "efi: Common
enable/disable infrastructure for EFI soft reservation"). The driver I
want to claim that memory, device-dax, wants to be able to look up
numa information for an address range that is marked reserved in
memblock. The device-dax facility has the ability to either let
userspace map a device, or assign the memory backing that device to
the page allocator. In both scenarios the driver needs numa info to
either populate the 'numa_node' property of the device in sysfs, or to
pass an node-id to add_memory_resource() when it is hot-plugged.

I was thwarted by the lack of phys_to_target_node() on arm64, and
rather than add another stub like memory_add_physaddr_to_nid() I
wanted to see if it could be solved properly / generically with
memblock data.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* RE: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  5:48           ` Dan Williams
  2020-07-08  6:19             ` Mike Rapoport
@ 2020-07-08  6:56             ` Justin He
  2020-07-08  7:00               ` David Hildenbrand
  1 sibling, 1 reply; 46+ messages in thread
From: Justin He @ 2020-07-08  6:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, David Hildenbrand, Catalin Marinas, Will Deacon,
	Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin, Mike Rapoport

Hi Dan

> -----Original Message-----
> From: Dan Williams <dan.j.williams@intel.com>
> Sent: Wednesday, July 8, 2020 1:48 PM
> To: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Justin He <Justin.He@arm.com>; Michal Hocko <mhocko@kernel.org>; David
> Hildenbrand <david@redhat.com>; Catalin Marinas <Catalin.Marinas@arm.com>;
> Will Deacon <will@kernel.org>; Vishal Verma <vishal.l.verma@intel.com>;
> Dave Jiang <dave.jiang@intel.com>; Andrew Morton <akpm@linux-
> foundation.org>; Baoquan He <bhe@redhat.com>; Chuhong Yuan
> <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org;
> Kaly Xin <Kaly.Xin@arm.com>
> Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid
> as EXPORT_SYMBOL_GPL
> 
> On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
> >
> > On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote:
> > > On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote:
> > > >
> > > > Hi Michal and David
> > > >
> > > > > -----Original Message-----
> > > > > From: Michal Hocko <mhocko@kernel.org>
> > > > > Sent: Tuesday, July 7, 2020 7:55 PM
> > > > > To: Justin He <Justin.He@arm.com>
> > > > > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> > > > > <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal
> Verma
> > > > > <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>;
> Andrew
> > > > > Morton <akpm@linux-foundation.org>; Mike Rapoport
> <rppt@linux.ibm.com>;
> > > > > Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>;
> linux-
> > > > > arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-
> > > > > mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin
> <Kaly.Xin@arm.com>
> > > > > Subject: Re: [PATCH v2 1/3] arm64/numa: export
> memory_add_physaddr_to_nid
> > > > > as EXPORT_SYMBOL_GPL
> > > > >
> > > > > On Tue 07-07-20 13:59:15, Jia He wrote:
> > > > > > This exports memory_add_physaddr_to_nid() for module driver to
> use.
> > > > > >
> > > > > > memory_add_physaddr_to_nid() is a fallback option to get the nid
> in case
> > > > > > NUMA_NO_NID is detected.
> > > > > >
> > > > > > Suggested-by: David Hildenbrand <david@redhat.com>
> > > > > > Signed-off-by: Jia He <justin.he@arm.com>
> > > > > > ---
> > > > > >  arch/arm64/mm/numa.c | 5 +++--
> > > > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > > > > > index aafcee3e3f7e..7eeb31740248 100644
> > > > > > --- a/arch/arm64/mm/numa.c
> > > > > > +++ b/arch/arm64/mm/numa.c
> > > > > > @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> > > > > >
> > > > > >  /*
> > > > > >   * We hope that we will be hotplugging memory on nodes we
> already know
> > > > > about,
> > > > > > - * such that acpi_get_node() succeeds and we never fall back to
> this...
> > > > > > + * such that acpi_get_node() succeeds. But when SRAT is not
> present,
> > > > > the node
> > > > > > + * id may be probed as NUMA_NO_NODE by acpi, Here provide a
> fallback
> > > > > option.
> > > > > >   */
> > > > > >  int memory_add_physaddr_to_nid(u64 addr)
> > > > > >  {
> > > > > > -   pr_warn("Unknown node for memory at 0x%llx, assuming node
> 0\n",
> > > > > addr);
> > > > > >     return 0;
> > > > > >  }
> > > > > > +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> > > > >
> > > > > Does it make sense to export a noop function? Wouldn't make more
> sense
> > > > > to simply make it static inline somewhere in a header? I haven't
> checked
> > > > > whether there is an easy way to do that sanely bu this just hit my
> eyes.
> > > >
> > > > Okay, I can make a change in memory_hotplug.h, sth like:
> > > > --- a/include/linux/memory_hotplug.h
> > > > +++ b/include/linux/memory_hotplug.h
> > > > @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn,
> unsigned long nr_pages,
> > > >               struct mhp_params *params);
> > > >  #endif /* ARCH_HAS_ADD_PAGES */
> > > >
> > > > -#ifdef CONFIG_NUMA
> > > > -extern int memory_add_physaddr_to_nid(u64 start);
> > > > -#else
> > > > +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid)
> > > >  static inline int memory_add_physaddr_to_nid(u64 start)
> > > >  {
> > > >         return 0;
> > > >  }
> > > > +#else
> > > > +extern int memory_add_physaddr_to_nid(u64 start);
> > > >  #endif
> > > >
> > > > And then check the memory_add_physaddr_to_nid() helper on all arches,
> > > > if it is noop(return 0), I can simply remove it.
> > > > if it is not noop, after the helper,
> > > > #define memory_add_physaddr_to_nid
> > > >
> > > > What do you think of this proposal?
> > >
> > > Especially for architectures that use memblock info for numa info
> > > (which seems to be everyone except x86) why not implement a generic
> > > memory_add_physaddr_to_nid() that does:
> >
> > That would be only arm64.
> >
> 
> Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it
> could solve my numa api woes. At least for x86 the problem is already
> solved with reserved numa_meminfo, but now I'm trying to write generic
> drivers that use those apis and finding these gaps on other archs.

Even on arm64, there is a dependency issue in dax_pmem kmem case.
If dax pmem uses memory_add_physaddr_to_nid() to decide which node that
memblock should add into, get_pfn_range_for_nid() might not have
the correct memblock info at that time. That is, get_pfn_range_for_nid()
can't get the correct memblock info before add_memory()

So IMO, memory_add_physaddr_to_nid() still have to implement as noop on
arm64 (return 0) together with sh,s390x? Powerpc, x86,ia64 can use their
own implementation. And phys_to_target_node() can use your suggested(
for_each_online_node() ...)

What do you think of it? Thanks

--
Cheers,
Justin (Jia He)



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  6:22             ` Mike Rapoport
  2020-07-08  6:53               ` Dan Williams
@ 2020-07-08  6:59               ` David Hildenbrand
  2020-07-08  7:04                 ` Dan Williams
  1 sibling, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08  6:59 UTC (permalink / raw)
  To: Mike Rapoport, Dan Williams
  Cc: Justin He, Michal Hocko, Catalin Marinas, Will Deacon,
	Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On 08.07.20 08:22, Mike Rapoport wrote:
> On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote:
>> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote:
>> [..]
>>>> Especially for architectures that use memblock info for numa info
>>>> (which seems to be everyone except x86) why not implement a generic
>>>> memory_add_physaddr_to_nid() that does:
>>>>
>>>> int memory_add_physaddr_to_nid(u64 addr)
>>>> {
>>>>         unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
>>>>         int nid;
>>>>
>>>>         for_each_online_node(nid) {
>>>>                 get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>>>>                 if (pfn >= start_pfn && pfn <= end_pfn)
>>>>                         return nid;
>>>>         }
>>>>         return NUMA_NO_NODE;
>>>> }
>>>
>>> Thanks for your suggestion,
>>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke
>>> phys_to_target_node()?
>>
>> I think it needs to be the reverse. phys_to_target_node() should call
>> memory_add_physaddr_to_nid() by default, but fall back to searching
>> reserved memory address ranges in memblock. See phys_to_target_node()
>> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock,
>> but the principle is the same i.e. that a target node may not be
>> represented in memblock.memory, but memblock.reserved. I'm working on
>> a patch to provide a function similar to get_pfn_range_for_nid() that
>> operates on reserved memory.
> 
> Do we really need yet another memblock iterator?
> I think only x86 has memory that is not in memblock.memory but only in
> memblock.reserved.

Reading about abusing the memblock allcoator once again in memory
hotplug paths makes me shiver.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  6:56             ` Justin He
@ 2020-07-08  7:00               ` David Hildenbrand
  0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08  7:00 UTC (permalink / raw)
  To: Justin He, Dan Williams
  Cc: Michal Hocko, Catalin Marinas, Will Deacon, Vishal Verma,
	Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, linux-nvdimm, Kaly Xin,
	Mike Rapoport

On 08.07.20 08:56, Justin He wrote:
> Hi Dan
> 
>> -----Original Message-----
>> From: Dan Williams <dan.j.williams@intel.com>
>> Sent: Wednesday, July 8, 2020 1:48 PM
>> To: Mike Rapoport <rppt@linux.ibm.com>
>> Cc: Justin He <Justin.He@arm.com>; Michal Hocko <mhocko@kernel.org>; David
>> Hildenbrand <david@redhat.com>; Catalin Marinas <Catalin.Marinas@arm.com>;
>> Will Deacon <will@kernel.org>; Vishal Verma <vishal.l.verma@intel.com>;
>> Dave Jiang <dave.jiang@intel.com>; Andrew Morton <akpm@linux-
>> foundation.org>; Baoquan He <bhe@redhat.com>; Chuhong Yuan
>> <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux-
>> kernel@vger.kernel.org; linux-mm@kvack.org; linux-nvdimm@lists.01.org;
>> Kaly Xin <Kaly.Xin@arm.com>
>> Subject: Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid
>> as EXPORT_SYMBOL_GPL
>>
>> On Tue, Jul 7, 2020 at 10:33 PM Mike Rapoport <rppt@linux.ibm.com> wrote:
>>>
>>> On Tue, Jul 07, 2020 at 08:56:36PM -0700, Dan Williams wrote:
>>>> On Tue, Jul 7, 2020 at 7:20 PM Justin He <Justin.He@arm.com> wrote:
>>>>>
>>>>> Hi Michal and David
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Michal Hocko <mhocko@kernel.org>
>>>>>> Sent: Tuesday, July 7, 2020 7:55 PM
>>>>>> To: Justin He <Justin.He@arm.com>
>>>>>> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
>>>>>> <will@kernel.org>; Dan Williams <dan.j.williams@intel.com>; Vishal
>> Verma
>>>>>> <vishal.l.verma@intel.com>; Dave Jiang <dave.jiang@intel.com>;
>> Andrew
>>>>>> Morton <akpm@linux-foundation.org>; Mike Rapoport
>> <rppt@linux.ibm.com>;
>>>>>> Baoquan He <bhe@redhat.com>; Chuhong Yuan <hslester96@gmail.com>;
>> linux-
>>>>>> arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
>> linux-
>>>>>> mm@kvack.org; linux-nvdimm@lists.01.org; Kaly Xin
>> <Kaly.Xin@arm.com>
>>>>>> Subject: Re: [PATCH v2 1/3] arm64/numa: export
>> memory_add_physaddr_to_nid
>>>>>> as EXPORT_SYMBOL_GPL
>>>>>>
>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to
>> use.
>>>>>>>
>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid
>> in case
>>>>>>> NUMA_NO_NID is detected.
>>>>>>>
>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
>>>>>>> ---
>>>>>>>  arch/arm64/mm/numa.c | 5 +++--
>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>>>>> index aafcee3e3f7e..7eeb31740248 100644
>>>>>>> --- a/arch/arm64/mm/numa.c
>>>>>>> +++ b/arch/arm64/mm/numa.c
>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
>>>>>>>
>>>>>>>  /*
>>>>>>>   * We hope that we will be hotplugging memory on nodes we
>> already know
>>>>>> about,
>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to
>> this...
>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not
>> present,
>>>>>> the node
>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a
>> fallback
>>>>>> option.
>>>>>>>   */
>>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
>>>>>>>  {
>>>>>>> -   pr_warn("Unknown node for memory at 0x%llx, assuming node
>> 0\n",
>>>>>> addr);
>>>>>>>     return 0;
>>>>>>>  }
>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>>>>>>
>>>>>> Does it make sense to export a noop function? Wouldn't make more
>> sense
>>>>>> to simply make it static inline somewhere in a header? I haven't
>> checked
>>>>>> whether there is an easy way to do that sanely bu this just hit my
>> eyes.
>>>>>
>>>>> Okay, I can make a change in memory_hotplug.h, sth like:
>>>>> --- a/include/linux/memory_hotplug.h
>>>>> +++ b/include/linux/memory_hotplug.h
>>>>> @@ -149,13 +149,13 @@ int add_pages(int nid, unsigned long start_pfn,
>> unsigned long nr_pages,
>>>>>               struct mhp_params *params);
>>>>>  #endif /* ARCH_HAS_ADD_PAGES */
>>>>>
>>>>> -#ifdef CONFIG_NUMA
>>>>> -extern int memory_add_physaddr_to_nid(u64 start);
>>>>> -#else
>>>>> +#if !defined(CONFIG_NUMA) || !defined(memory_add_physaddr_to_nid)
>>>>>  static inline int memory_add_physaddr_to_nid(u64 start)
>>>>>  {
>>>>>         return 0;
>>>>>  }
>>>>> +#else
>>>>> +extern int memory_add_physaddr_to_nid(u64 start);
>>>>>  #endif
>>>>>
>>>>> And then check the memory_add_physaddr_to_nid() helper on all arches,
>>>>> if it is noop(return 0), I can simply remove it.
>>>>> if it is not noop, after the helper,
>>>>> #define memory_add_physaddr_to_nid
>>>>>
>>>>> What do you think of this proposal?
>>>>
>>>> Especially for architectures that use memblock info for numa info
>>>> (which seems to be everyone except x86) why not implement a generic
>>>> memory_add_physaddr_to_nid() that does:
>>>
>>> That would be only arm64.
>>>
>>
>> Darn, I saw ARCH_KEEP_MEMBLOCK and had delusions of grandeur that it
>> could solve my numa api woes. At least for x86 the problem is already
>> solved with reserved numa_meminfo, but now I'm trying to write generic
>> drivers that use those apis and finding these gaps on other archs.
> 
> Even on arm64, there is a dependency issue in dax_pmem kmem case.
> If dax pmem uses memory_add_physaddr_to_nid() to decide which node that
> memblock should add into, get_pfn_range_for_nid() might not have
> the correct memblock info at that time. That is, get_pfn_range_for_nid()
> can't get the correct memblock info before add_memory()
> 
> So IMO, memory_add_physaddr_to_nid() still have to implement as noop on
> arm64 (return 0) together with sh,s390x? Powerpc, x86,ia64 can use their
> own implementation. And phys_to_target_node() can use your suggested(
> for_each_online_node() ...)
> 
> What do you think of it? Thanks

You are trying to fix the "we only have one dummy node" AFAIU, so what
you propose here is certainly good enough for now.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  6:59               ` David Hildenbrand
@ 2020-07-08  7:04                 ` Dan Williams
  2020-07-08  7:16                   ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Dan Williams @ 2020-07-08  7:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Rapoport, Justin He, Michal Hocko, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.07.20 08:22, Mike Rapoport wrote:
> > On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote:
> >> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote:
> >> [..]
> >>>> Especially for architectures that use memblock info for numa info
> >>>> (which seems to be everyone except x86) why not implement a generic
> >>>> memory_add_physaddr_to_nid() that does:
> >>>>
> >>>> int memory_add_physaddr_to_nid(u64 addr)
> >>>> {
> >>>>         unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
> >>>>         int nid;
> >>>>
> >>>>         for_each_online_node(nid) {
> >>>>                 get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> >>>>                 if (pfn >= start_pfn && pfn <= end_pfn)
> >>>>                         return nid;
> >>>>         }
> >>>>         return NUMA_NO_NODE;
> >>>> }
> >>>
> >>> Thanks for your suggestion,
> >>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke
> >>> phys_to_target_node()?
> >>
> >> I think it needs to be the reverse. phys_to_target_node() should call
> >> memory_add_physaddr_to_nid() by default, but fall back to searching
> >> reserved memory address ranges in memblock. See phys_to_target_node()
> >> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock,
> >> but the principle is the same i.e. that a target node may not be
> >> represented in memblock.memory, but memblock.reserved. I'm working on
> >> a patch to provide a function similar to get_pfn_range_for_nid() that
> >> operates on reserved memory.
> >
> > Do we really need yet another memblock iterator?
> > I think only x86 has memory that is not in memblock.memory but only in
> > memblock.reserved.
>
> Reading about abusing the memblock allcoator once again in memory
> hotplug paths makes me shiver.

Technical reasoning please?

arm64 numa information is established from memblock data. It seems
counterproductive to ignore that fact if we're already touching
memory_add_physaddr_to_nid() and have a use case for a driver to call
it.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  7:04                 ` Dan Williams
@ 2020-07-08  7:16                   ` David Hildenbrand
  2020-07-08  7:43                     ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08  7:16 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mike Rapoport, Justin He, Michal Hocko, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On 08.07.20 09:04, Dan Williams wrote:
> On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 08.07.20 08:22, Mike Rapoport wrote:
>>> On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote:
>>>> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote:
>>>> [..]
>>>>>> Especially for architectures that use memblock info for numa info
>>>>>> (which seems to be everyone except x86) why not implement a generic
>>>>>> memory_add_physaddr_to_nid() that does:
>>>>>>
>>>>>> int memory_add_physaddr_to_nid(u64 addr)
>>>>>> {
>>>>>>         unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
>>>>>>         int nid;
>>>>>>
>>>>>>         for_each_online_node(nid) {
>>>>>>                 get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>>>>>>                 if (pfn >= start_pfn && pfn <= end_pfn)
>>>>>>                         return nid;
>>>>>>         }
>>>>>>         return NUMA_NO_NODE;
>>>>>> }
>>>>>
>>>>> Thanks for your suggestion,
>>>>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke
>>>>> phys_to_target_node()?
>>>>
>>>> I think it needs to be the reverse. phys_to_target_node() should call
>>>> memory_add_physaddr_to_nid() by default, but fall back to searching
>>>> reserved memory address ranges in memblock. See phys_to_target_node()
>>>> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock,
>>>> but the principle is the same i.e. that a target node may not be
>>>> represented in memblock.memory, but memblock.reserved. I'm working on
>>>> a patch to provide a function similar to get_pfn_range_for_nid() that
>>>> operates on reserved memory.
>>>
>>> Do we really need yet another memblock iterator?
>>> I think only x86 has memory that is not in memblock.memory but only in
>>> memblock.reserved.
>>
>> Reading about abusing the memblock allcoator once again in memory
>> hotplug paths makes me shiver.
> 
> Technical reasoning please?

ARCH_KEEP_MEMBLOCK is (AFAIK) only a hack for arm64 to implement
pfn_valid(), because they zap out individual pages corresponding to
memory holes of full sections.

I am not a friend of adding more post-init code to rely on memblock
data. It just makes it harder to eventually get rid of ARCH_KEEP_MEMBLOCK.

> 
> arm64 numa information is established from memblock data. It seems
> counterproductive to ignore that fact if we're already touching
> memory_add_physaddr_to_nid() and have a use case for a driver to call
> it.

... and we are trying to handle the "only a single dummy node" case
(patch #2), or what am I missing? What is there to optimize currently?

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  5:27             ` Mike Rapoport
@ 2020-07-08  7:21               ` David Hildenbrand
  2020-07-08  7:38                 ` Mike Rapoport
  2020-07-08  7:50                 ` Dan Williams
  0 siblings, 2 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08  7:21 UTC (permalink / raw)
  To: Mike Rapoport, Dan Williams
  Cc: Michal Hocko, Jia He, Catalin Marinas, Will Deacon, Vishal Verma,
	Dave Jiang, Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM,
	Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin

On 08.07.20 07:27, Mike Rapoport wrote:
> On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote:
>> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote:
>>>
>>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote:
>>>> On 07.07.20 14:13, Mike Rapoport wrote:
>>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use.
>>>>>>>
>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
>>>>>>> NUMA_NO_NID is detected.
>>>>>>>
>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
>>>>>>> ---
>>>>>>>  arch/arm64/mm/numa.c | 5 +++--
>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>>>>> index aafcee3e3f7e..7eeb31740248 100644
>>>>>>> --- a/arch/arm64/mm/numa.c
>>>>>>> +++ b/arch/arm64/mm/numa.c
>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
>>>>>>>
>>>>>>>  /*
>>>>>>>   * We hope that we will be hotplugging memory on nodes we already know about,
>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this...
>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
>>>>>>>   */
>>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
>>>>>>>  {
>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
>>>>>>>   return 0;
>>>>>>>  }
>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>>>>>>
>>>>>> Does it make sense to export a noop function? Wouldn't make more sense
>>>>>> to simply make it static inline somewhere in a header? I haven't checked
>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes.
>>>>>
>>>>> We'll need to either add a CONFIG_ option or arch specific callback to
>>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
>>>>> implementations coexist ...
>>>>
>>>> Note: I have a similar dummy (return 0) patch for s390x lying around here.
>>>
>>> Then we'll call it a tie - 3:3 ;-)
>>
>> So I'd be happy to jump on the train of people wanting to export the
>> ARM stub for this (and add a new ARM stub for phys_to_target_node()),
>> but Will did have a plausibly better idea that I have been meaning to
>> circle back to:
>>
>> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck
>>
>> ...i.e. iterate over node data to do the lookup. This would seem to
>> work generically for multiple archs unless I am missing something?

IIRC, only memory assigned to/onlined to a ZONE is represented in the
pgdat node span. E.g., not offline memory blocks.

Esp., when hotplugging + onlining consecutive memory, there won't really
be any intersections in most cases if I am not wrong. It would not be
"intersection" but rather "closest fit".

With overlapping nodes it's even more unclear. Which one to pick?

> 
> I think it would work on arm64, power and, most propbably on s390

With only a single dummy node I guess it should work (searching when
there is only a single node does not make too much sense).

> (David?), but not on x86. x86 does not have reserved memory in pgdat,
> it's never memblock_add()'ed (see e820__memblock_setup()).

Can you enlighten me why that is relevant for the memory hotplug path?
(or is it just a general comment to make the function as accurate as
possible for all addresses?)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  7:21               ` David Hildenbrand
@ 2020-07-08  7:38                 ` Mike Rapoport
  2020-07-08  7:40                   ` David Hildenbrand
  2020-07-08  7:50                 ` Dan Williams
  1 sibling, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08  7:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon,
	Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM,
	linux-nvdimm, Kaly Xin

On Wed, Jul 08, 2020 at 09:21:25AM +0200, David Hildenbrand wrote:
> On 08.07.20 07:27, Mike Rapoport wrote:
> > On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote:
> >> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote:
> >>>
> >>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote:
> >>>> On 07.07.20 14:13, Mike Rapoport wrote:
> >>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
> >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
> >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use.
> >>>>>>>
> >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> >>>>>>> NUMA_NO_NID is detected.
> >>>>>>>
> >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
> >>>>>>> ---
> >>>>>>>  arch/arm64/mm/numa.c | 5 +++--
> >>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>>>>>> index aafcee3e3f7e..7eeb31740248 100644
> >>>>>>> --- a/arch/arm64/mm/numa.c
> >>>>>>> +++ b/arch/arm64/mm/numa.c
> >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> >>>>>>>
> >>>>>>>  /*
> >>>>>>>   * We hope that we will be hotplugging memory on nodes we already know about,
> >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this...
> >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
> >>>>>>>   */
> >>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
> >>>>>>>  {
> >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
> >>>>>>>   return 0;
> >>>>>>>  }
> >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> >>>>>>
> >>>>>> Does it make sense to export a noop function? Wouldn't make more sense
> >>>>>> to simply make it static inline somewhere in a header? I haven't checked
> >>>>>> whether there is an easy way to do that sanely bu this just hit my eyes.
> >>>>>
> >>>>> We'll need to either add a CONFIG_ option or arch specific callback to
> >>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
> >>>>> implementations coexist ...
> >>>>
> >>>> Note: I have a similar dummy (return 0) patch for s390x lying around here.
> >>>
> >>> Then we'll call it a tie - 3:3 ;-)
> >>
> >> So I'd be happy to jump on the train of people wanting to export the
> >> ARM stub for this (and add a new ARM stub for phys_to_target_node()),
> >> but Will did have a plausibly better idea that I have been meaning to
> >> circle back to:
> >>
> >> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck
> >>
> >> ...i.e. iterate over node data to do the lookup. This would seem to
> >> work generically for multiple archs unless I am missing something?
> 
> IIRC, only memory assigned to/onlined to a ZONE is represented in the
> pgdat node span. E.g., not offline memory blocks.
> 
> Esp., when hotplugging + onlining consecutive memory, there won't really
> be any intersections in most cases if I am not wrong. It would not be
> "intersection" but rather "closest fit".
> 
> With overlapping nodes it's even more unclear. Which one to pick?
> 
> > 
> > I think it would work on arm64, power and, most propbably on s390
> 
> With only a single dummy node I guess it should work (searching when
> there is only a single node does not make too much sense).
> 
> > (David?), but not on x86. x86 does not have reserved memory in pgdat,
> > it's never memblock_add()'ed (see e820__memblock_setup()).
> 
> Can you enlighten me why that is relevant for the memory hotplug path?
> (or is it just a general comment to make the function as accurate as
> possible for all addresses?)

phys_to_target_node() on x86 falls back to numa_reserved_meminfo which
holds memory that is never listed in a node.

> -- 
> Thanks,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  7:38                 ` Mike Rapoport
@ 2020-07-08  7:40                   ` David Hildenbrand
  0 siblings, 0 replies; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08  7:40 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon,
	Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM,
	linux-nvdimm, Kaly Xin

On 08.07.20 09:38, Mike Rapoport wrote:
> On Wed, Jul 08, 2020 at 09:21:25AM +0200, David Hildenbrand wrote:
>> On 08.07.20 07:27, Mike Rapoport wrote:
>>> On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote:
>>>> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote:
>>>>>
>>>>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote:
>>>>>> On 07.07.20 14:13, Mike Rapoport wrote:
>>>>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use.
>>>>>>>>>
>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
>>>>>>>>> NUMA_NO_NID is detected.
>>>>>>>>>
>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
>>>>>>>>> ---
>>>>>>>>>  arch/arm64/mm/numa.c | 5 +++--
>>>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644
>>>>>>>>> --- a/arch/arm64/mm/numa.c
>>>>>>>>> +++ b/arch/arm64/mm/numa.c
>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
>>>>>>>>>
>>>>>>>>>  /*
>>>>>>>>>   * We hope that we will be hotplugging memory on nodes we already know about,
>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this...
>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
>>>>>>>>>   */
>>>>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
>>>>>>>>>  {
>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
>>>>>>>>>   return 0;
>>>>>>>>>  }
>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>>>>>>>>
>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense
>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked
>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes.
>>>>>>>
>>>>>>> We'll need to either add a CONFIG_ option or arch specific callback to
>>>>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
>>>>>>> implementations coexist ...
>>>>>>
>>>>>> Note: I have a similar dummy (return 0) patch for s390x lying around here.
>>>>>
>>>>> Then we'll call it a tie - 3:3 ;-)
>>>>
>>>> So I'd be happy to jump on the train of people wanting to export the
>>>> ARM stub for this (and add a new ARM stub for phys_to_target_node()),
>>>> but Will did have a plausibly better idea that I have been meaning to
>>>> circle back to:
>>>>
>>>> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck
>>>>
>>>> ...i.e. iterate over node data to do the lookup. This would seem to
>>>> work generically for multiple archs unless I am missing something?
>>
>> IIRC, only memory assigned to/onlined to a ZONE is represented in the
>> pgdat node span. E.g., not offline memory blocks.
>>
>> Esp., when hotplugging + onlining consecutive memory, there won't really
>> be any intersections in most cases if I am not wrong. It would not be
>> "intersection" but rather "closest fit".
>>
>> With overlapping nodes it's even more unclear. Which one to pick?
>>
>>>
>>> I think it would work on arm64, power and, most propbably on s390
>>
>> With only a single dummy node I guess it should work (searching when
>> there is only a single node does not make too much sense).
>>
>>> (David?), but not on x86. x86 does not have reserved memory in pgdat,
>>> it's never memblock_add()'ed (see e820__memblock_setup()).
>>
>> Can you enlighten me why that is relevant for the memory hotplug path?
>> (or is it just a general comment to make the function as accurate as
>> possible for all addresses?)
> 
> phys_to_target_node() on x86 falls back to numa_reserved_meminfo which
> holds memory that is never listed in a node.
> 

Ah, I see - thanks.


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  7:16                   ` David Hildenbrand
@ 2020-07-08  7:43                     ` Mike Rapoport
  0 siblings, 0 replies; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08  7:43 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dan Williams, Justin He, Michal Hocko, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, linux-arm-kernel, linux-kernel, linux-mm,
	linux-nvdimm, Kaly Xin

On Wed, Jul 08, 2020 at 09:16:01AM +0200, David Hildenbrand wrote:
> On 08.07.20 09:04, Dan Williams wrote:
> > On Tue, Jul 7, 2020 at 11:59 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 08.07.20 08:22, Mike Rapoport wrote:
> >>> On Tue, Jul 07, 2020 at 09:27:43PM -0700, Dan Williams wrote:
> >>>> On Tue, Jul 7, 2020 at 9:08 PM Justin He <Justin.He@arm.com> wrote:
> >>>> [..]
> >>>>>> Especially for architectures that use memblock info for numa info
> >>>>>> (which seems to be everyone except x86) why not implement a generic
> >>>>>> memory_add_physaddr_to_nid() that does:
> >>>>>>
> >>>>>> int memory_add_physaddr_to_nid(u64 addr)
> >>>>>> {
> >>>>>>         unsigned long start_pfn, end_pfn, pfn = PHYS_PFN(addr);
> >>>>>>         int nid;
> >>>>>>
> >>>>>>         for_each_online_node(nid) {
> >>>>>>                 get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> >>>>>>                 if (pfn >= start_pfn && pfn <= end_pfn)
> >>>>>>                         return nid;
> >>>>>>         }
> >>>>>>         return NUMA_NO_NODE;
> >>>>>> }
> >>>>>
> >>>>> Thanks for your suggestion,
> >>>>> Could I wrap the codes and let memory_add_physaddr_to_nid simply invoke
> >>>>> phys_to_target_node()?
> >>>>
> >>>> I think it needs to be the reverse. phys_to_target_node() should call
> >>>> memory_add_physaddr_to_nid() by default, but fall back to searching
> >>>> reserved memory address ranges in memblock. See phys_to_target_node()
> >>>> in arch/x86/mm/numa.c. That one uses numa_meminfo instead of memblock,
> >>>> but the principle is the same i.e. that a target node may not be
> >>>> represented in memblock.memory, but memblock.reserved. I'm working on
> >>>> a patch to provide a function similar to get_pfn_range_for_nid() that
> >>>> operates on reserved memory.
> >>>
> >>> Do we really need yet another memblock iterator?
> >>> I think only x86 has memory that is not in memblock.memory but only in
> >>> memblock.reserved.
> >>
> >> Reading about abusing the memblock allcoator once again in memory
> >> hotplug paths makes me shiver.
> > 
> > Technical reasoning please?
> 
> ARCH_KEEP_MEMBLOCK is (AFAIK) only a hack for arm64 to implement
> pfn_valid(), because they zap out individual pages corresponding to
> memory holes of full sections.
> 
> I am not a friend of adding more post-init code to rely on memblock
> data. It just makes it harder to eventually get rid of ARCH_KEEP_MEMBLOCK.

The most heavy user of memblock in post-init code is powerpc. It won't
be easy to get rid of it there.

> > arm64 numa information is established from memblock data. It seems
> > counterproductive to ignore that fact if we're already touching
> > memory_add_physaddr_to_nid() and have a use case for a driver to call
> > it.
> 
> ... and we are trying to handle the "only a single dummy node" case
> (patch #2), or what am I missing? What is there to optimize currently?
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  7:21               ` David Hildenbrand
  2020-07-08  7:38                 ` Mike Rapoport
@ 2020-07-08  7:50                 ` Dan Williams
  2020-07-08  8:26                   ` David Hildenbrand
  1 sibling, 1 reply; 46+ messages in thread
From: Dan Williams @ 2020-07-08  7:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Rapoport, Michal Hocko, Jia He, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM,
	linux-nvdimm, Kaly Xin

On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.07.20 07:27, Mike Rapoport wrote:
> > On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote:
> >> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote:
> >>>
> >>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote:
> >>>> On 07.07.20 14:13, Mike Rapoport wrote:
> >>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
> >>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
> >>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use.
> >>>>>>>
> >>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> >>>>>>> NUMA_NO_NID is detected.
> >>>>>>>
> >>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
> >>>>>>> ---
> >>>>>>>  arch/arm64/mm/numa.c | 5 +++--
> >>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>>>>>> index aafcee3e3f7e..7eeb31740248 100644
> >>>>>>> --- a/arch/arm64/mm/numa.c
> >>>>>>> +++ b/arch/arm64/mm/numa.c
> >>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> >>>>>>>
> >>>>>>>  /*
> >>>>>>>   * We hope that we will be hotplugging memory on nodes we already know about,
> >>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this...
> >>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> >>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
> >>>>>>>   */
> >>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
> >>>>>>>  {
> >>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
> >>>>>>>   return 0;
> >>>>>>>  }
> >>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> >>>>>>
> >>>>>> Does it make sense to export a noop function? Wouldn't make more sense
> >>>>>> to simply make it static inline somewhere in a header? I haven't checked
> >>>>>> whether there is an easy way to do that sanely bu this just hit my eyes.
> >>>>>
> >>>>> We'll need to either add a CONFIG_ option or arch specific callback to
> >>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
> >>>>> implementations coexist ...
> >>>>
> >>>> Note: I have a similar dummy (return 0) patch for s390x lying around here.
> >>>
> >>> Then we'll call it a tie - 3:3 ;-)
> >>
> >> So I'd be happy to jump on the train of people wanting to export the
> >> ARM stub for this (and add a new ARM stub for phys_to_target_node()),
> >> but Will did have a plausibly better idea that I have been meaning to
> >> circle back to:
> >>
> >> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck
> >>
> >> ...i.e. iterate over node data to do the lookup. This would seem to
> >> work generically for multiple archs unless I am missing something?
>
> IIRC, only memory assigned to/onlined to a ZONE is represented in the
> pgdat node span. E.g., not offline memory blocks.

So this dovetails somewhat with Will's idea. What if we populated
node_data for "offline" ranges? I started there, but then saw
ARCH_KEEP_MEMBLOCK and thought it would be safer to just teach
phys_to_target_node() to use that rather than update other code paths
to expect node_data might not always reflect online data.

> Esp., when hotplugging + onlining consecutive memory, there won't really
> be any intersections in most cases if I am not wrong. It would not be
> "intersection" but rather "closest fit".
>
> With overlapping nodes it's even more unclear. Which one to pick?

In the overlap case you get what you get. Some signal is better than
the noise of a dummy function. The consequences of picking the wrong
node might be that the kernel can't properly associate a memory range
to its performance data tables in firmware, but then again firmware
messed up with an overlapping node definition in the first instance.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  7:50                 ` Dan Williams
@ 2020-07-08  8:26                   ` David Hildenbrand
  2020-07-08  8:39                     ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08  8:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mike Rapoport, Michal Hocko, Jia He, Catalin Marinas,
	Will Deacon, Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM,
	linux-nvdimm, Kaly Xin

On 08.07.20 09:50, Dan Williams wrote:
> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 08.07.20 07:27, Mike Rapoport wrote:
>>> On Tue, Jul 07, 2020 at 03:05:48PM -0700, Dan Williams wrote:
>>>> On Tue, Jul 7, 2020 at 11:01 AM Mike Rapoport <rppt@linux.ibm.com> wrote:
>>>>>
>>>>> On Tue, Jul 07, 2020 at 02:26:08PM +0200, David Hildenbrand wrote:
>>>>>> On 07.07.20 14:13, Mike Rapoport wrote:
>>>>>>> On Tue, Jul 07, 2020 at 01:54:54PM +0200, Michal Hocko wrote:
>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use.
>>>>>>>>>
>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
>>>>>>>>> NUMA_NO_NID is detected.
>>>>>>>>>
>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
>>>>>>>>> ---
>>>>>>>>>  arch/arm64/mm/numa.c | 5 +++--
>>>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644
>>>>>>>>> --- a/arch/arm64/mm/numa.c
>>>>>>>>> +++ b/arch/arm64/mm/numa.c
>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
>>>>>>>>>
>>>>>>>>>  /*
>>>>>>>>>   * We hope that we will be hotplugging memory on nodes we already know about,
>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this...
>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
>>>>>>>>>   */
>>>>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
>>>>>>>>>  {
>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
>>>>>>>>>   return 0;
>>>>>>>>>  }
>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>>>>>>>>
>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense
>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked
>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes.
>>>>>>>
>>>>>>> We'll need to either add a CONFIG_ option or arch specific callback to
>>>>>>> make both non-empty (x86, powerpc, ia64) and empty (arm64, sh)
>>>>>>> implementations coexist ...
>>>>>>
>>>>>> Note: I have a similar dummy (return 0) patch for s390x lying around here.
>>>>>
>>>>> Then we'll call it a tie - 3:3 ;-)
>>>>
>>>> So I'd be happy to jump on the train of people wanting to export the
>>>> ARM stub for this (and add a new ARM stub for phys_to_target_node()),
>>>> but Will did have a plausibly better idea that I have been meaning to
>>>> circle back to:
>>>>
>>>> http://lore.kernel.org/r/20200325111039.GA32109@willie-the-truck
>>>>
>>>> ...i.e. iterate over node data to do the lookup. This would seem to
>>>> work generically for multiple archs unless I am missing something?
>>
>> IIRC, only memory assigned to/onlined to a ZONE is represented in the
>> pgdat node span. E.g., not offline memory blocks.
> 
> So this dovetails somewhat with Will's idea. What if we populated
> node_data for "offline" ranges? I started there, but then saw
> ARCH_KEEP_MEMBLOCK and thought it would be safer to just teach
> phys_to_target_node() to use that rather than update other code paths
> to expect node_data might not always reflect online data.

We currently need a somewhat-accurate pgdat node span to detect when to
offline a node. See try_offline_node(). This works fairly reliable.

Shrinking the node span is currently fairly easy for !ZONE_DEVICE
memory, because we can rely on pfn_to_online_page() + pfn_to_nid(pfn).
See e.g., find_biggest_section_pfn().

If we glue growing/shrinking the node span to adding/removing of memory
(instead of e.g., onlining/offlining), we can no longer base shrinking
on memmap data. We would have to get the information ("how far can I
shrink the node span, is it empty?") from somewhere else. E.g.,
for_each_memory_block() - but that one does not cover ZONE_DEVICE. And
there are memory blocks which cover multiple nodes, in which case we
only store one of them ... unreliable.

This certainly needs more thought :/

> 
>> Esp., when hotplugging + onlining consecutive memory, there won't really
>> be any intersections in most cases if I am not wrong. It would not be
>> "intersection" but rather "closest fit".
>>
>> With overlapping nodes it's even more unclear. Which one to pick?
> 
> In the overlap case you get what you get. Some signal is better than
> the noise of a dummy function. The consequences of picking the wrong
> node might be that the kernel can't properly associate a memory range
> to its performance data tables in firmware, but then again firmware
> messed up with an overlapping node definition in the first instance.

I'd be curious if what we are trying to optimize here is actually worth
optimizing. IOW, is there a well-known scenario where the dummy value on
arm64 would be problematic and is worth the effort?

I mean, in all performance relevant setups (ignoring
hv_balloon/xen-balloon/prove_store(), which also use
memory_add_physaddr_to_nid()), we should have a proper PXM/node
specified by the hardware on memory hotadd. The fallback of
memory_add_physaddr_to_nid() is not relevant in these scenarios.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  8:26                   ` David Hildenbrand
@ 2020-07-08  8:39                     ` Mike Rapoport
  2020-07-08  8:45                       ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08  8:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon,
	Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM,
	linux-nvdimm, Kaly Xin

On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote:
> On 08.07.20 09:50, Dan Williams wrote:
> > On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
> >>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use.
> >>>>>>>>>
> >>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> >>>>>>>>> NUMA_NO_NID is detected.
> >>>>>>>>>
> >>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
> >>>>>>>>> ---
> >>>>>>>>>  arch/arm64/mm/numa.c | 5 +++--
> >>>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644
> >>>>>>>>> --- a/arch/arm64/mm/numa.c
> >>>>>>>>> +++ b/arch/arm64/mm/numa.c
> >>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> >>>>>>>>>
> >>>>>>>>>  /*
> >>>>>>>>>   * We hope that we will be hotplugging memory on nodes we already know about,
> >>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this...
> >>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> >>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
> >>>>>>>>>   */
> >>>>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
> >>>>>>>>>  {
> >>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
> >>>>>>>>>   return 0;
> >>>>>>>>>  }
> >>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> >>>>>>>>
> >>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense
> >>>>>>>> to simply make it static inline somewhere in a header? I haven't checked
> >>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes.

> I'd be curious if what we are trying to optimize here is actually worth
> optimizing. IOW, is there a well-known scenario where the dummy value on
> arm64 would be problematic and is worth the effort?

Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL()
for a stub might be an overkill.

I think Jia's suggestion [1] with addition of a comment that explains
why and when the stub will be used, can work for both
memory_add_physaddr_to_nid() and phys_to_target_node().

But on more theoretical/fundmanetal level, I think we lack a generic
abstraction similar to e.g. x86 'struct numa_meminfo' that serves as
translaton of firmware supplied information into data that can be used
by the generic mm without need to reimplement it for each and every
arch.

[1] https://lore.kernel.org/lkml/AM6PR08MB406907F9F2B13DA6DC893AD9F7670@AM6PR08MB4069.eurprd08.prod.outlook.com

> I mean, in all performance relevant setups (ignoring
> hv_balloon/xen-balloon/prove_store(), which also use
> memory_add_physaddr_to_nid()), we should have a proper PXM/node
> specified by the hardware on memory hotadd. The fallback of
> memory_add_physaddr_to_nid() is not relevant in these scenarios.
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  8:39                     ` Mike Rapoport
@ 2020-07-08  8:45                       ` David Hildenbrand
  2020-07-08  9:15                         ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08  8:45 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Dan Williams, Michal Hocko, Jia He, Catalin Marinas, Will Deacon,
	Vishal Verma, Dave Jiang, Andrew Morton, Baoquan He,
	Chuhong Yuan, Linux ARM, Linux Kernel Mailing List, Linux MM,
	linux-nvdimm, Kaly Xin

On 08.07.20 10:39, Mike Rapoport wrote:
> On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote:
>> On 08.07.20 09:50, Dan Williams wrote:
>>> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
>>>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use.
>>>>>>>>>>>
>>>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
>>>>>>>>>>> NUMA_NO_NID is detected.
>>>>>>>>>>>
>>>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
>>>>>>>>>>> ---
>>>>>>>>>>>  arch/arm64/mm/numa.c | 5 +++--
>>>>>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644
>>>>>>>>>>> --- a/arch/arm64/mm/numa.c
>>>>>>>>>>> +++ b/arch/arm64/mm/numa.c
>>>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
>>>>>>>>>>>
>>>>>>>>>>>  /*
>>>>>>>>>>>   * We hope that we will be hotplugging memory on nodes we already know about,
>>>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this...
>>>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
>>>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
>>>>>>>>>>>   */
>>>>>>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
>>>>>>>>>>>  {
>>>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
>>>>>>>>>>>   return 0;
>>>>>>>>>>>  }
>>>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>>>>>>>>>>
>>>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense
>>>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked
>>>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes.
> 
>> I'd be curious if what we are trying to optimize here is actually worth
>> optimizing. IOW, is there a well-known scenario where the dummy value on
>> arm64 would be problematic and is worth the effort?
> 
> Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL()
> for a stub might be an overkill.
> 
> I think Jia's suggestion [1] with addition of a comment that explains
> why and when the stub will be used, can work for both
> memory_add_physaddr_to_nid() and phys_to_target_node().

Agreed.

> 
> But on more theoretical/fundmanetal level, I think we lack a generic
> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as
> translaton of firmware supplied information into data that can be used
> by the generic mm without need to reimplement it for each and every
> arch.

Right. As I expressed, I am not a friend of using memblock for that, and
the pgdat node span is tricky.

Maybe abstracting that x86 concept is possible in some way (and we could
restrict the information to boot-time properties, so we don't have to
mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  8:45                       ` David Hildenbrand
@ 2020-07-08  9:15                         ` Mike Rapoport
  2020-07-08  9:25                           ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08  9:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Rapoport, Dan Williams, Michal Hocko, Jia He,
	Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang,
	Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM,
	Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin

On Wed, Jul 08, 2020 at 10:45:17AM +0200, David Hildenbrand wrote:
> On 08.07.20 10:39, Mike Rapoport wrote:
> > On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote:
> >> On 08.07.20 09:50, Dan Williams wrote:
> >>> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
> >>>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use.
> >>>>>>>>>>>
> >>>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
> >>>>>>>>>>> NUMA_NO_NID is detected.
> >>>>>>>>>>>
> >>>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>  arch/arm64/mm/numa.c | 5 +++--
> >>>>>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644
> >>>>>>>>>>> --- a/arch/arm64/mm/numa.c
> >>>>>>>>>>> +++ b/arch/arm64/mm/numa.c
> >>>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
> >>>>>>>>>>>
> >>>>>>>>>>>  /*
> >>>>>>>>>>>   * We hope that we will be hotplugging memory on nodes we already know about,
> >>>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this...
> >>>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
> >>>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
> >>>>>>>>>>>   */
> >>>>>>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
> >>>>>>>>>>>  {
> >>>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
> >>>>>>>>>>>   return 0;
> >>>>>>>>>>>  }
> >>>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> >>>>>>>>>>
> >>>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense
> >>>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked
> >>>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes.
> > 
> >> I'd be curious if what we are trying to optimize here is actually worth
> >> optimizing. IOW, is there a well-known scenario where the dummy value on
> >> arm64 would be problematic and is worth the effort?
> > 
> > Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL()
> > for a stub might be an overkill.
> > 
> > I think Jia's suggestion [1] with addition of a comment that explains
> > why and when the stub will be used, can work for both
> > memory_add_physaddr_to_nid() and phys_to_target_node().
> 
> Agreed.
> 
> > 
> > But on more theoretical/fundmanetal level, I think we lack a generic
> > abstraction similar to e.g. x86 'struct numa_meminfo' that serves as
> > translaton of firmware supplied information into data that can be used
> > by the generic mm without need to reimplement it for each and every
> > arch.
> 
> Right. As I expressed, I am not a friend of using memblock for that, and
> the pgdat node span is tricky.
>
> Maybe abstracting that x86 concept is possible in some way (and we could
> restrict the information to boot-time properties, so we don't have to
> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).

I agree with pgdat part and disagree about memblock. It already has
non-init physmap, why won't we add memblock.memory to the mix? ;-)

Now, seriously, memblock already has all the necessary information about
the coldplug memory for several architectures. x86 being an exception
because for some reason the reserved memory is not considered memory
there. The infrastructure for quiering and iterating memory regions is
already there. We just need to leave out the irrelevant parts, like
memblock.reserved and allocation funcions.

Otherwise we'll add yet another 'struct { start, end }', a horde of
covnersion and re-initialization functions that will do more or less the
same things as current memblock APIs.

> -- 
> Thanks,
> 
> David / dhildenb
> 
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  9:15                         ` Mike Rapoport
@ 2020-07-08  9:25                           ` David Hildenbrand
  2020-07-08  9:45                             ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08  9:25 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mike Rapoport, Dan Williams, Michal Hocko, Jia He,
	Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang,
	Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM,
	Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin

On 08.07.20 11:15, Mike Rapoport wrote:
> On Wed, Jul 08, 2020 at 10:45:17AM +0200, David Hildenbrand wrote:
>> On 08.07.20 10:39, Mike Rapoport wrote:
>>> On Wed, Jul 08, 2020 at 10:26:41AM +0200, David Hildenbrand wrote:
>>>> On 08.07.20 09:50, Dan Williams wrote:
>>>>> On Wed, Jul 8, 2020 at 12:22 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>>
>>>>>>>>>>>> On Tue 07-07-20 13:59:15, Jia He wrote:
>>>>>>>>>>>>> This exports memory_add_physaddr_to_nid() for module driver to use.
>>>>>>>>>>>>>
>>>>>>>>>>>>> memory_add_physaddr_to_nid() is a fallback option to get the nid in case
>>>>>>>>>>>>> NUMA_NO_NID is detected.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>>>>>>>>>> Signed-off-by: Jia He <justin.he@arm.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  arch/arm64/mm/numa.c | 5 +++--
>>>>>>>>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>>>>>>>>>>> index aafcee3e3f7e..7eeb31740248 100644
>>>>>>>>>>>>> --- a/arch/arm64/mm/numa.c
>>>>>>>>>>>>> +++ b/arch/arm64/mm/numa.c
>>>>>>>>>>>>> @@ -464,10 +464,11 @@ void __init arm64_numa_init(void)
>>>>>>>>>>>>>
>>>>>>>>>>>>>  /*
>>>>>>>>>>>>>   * We hope that we will be hotplugging memory on nodes we already know about,
>>>>>>>>>>>>> - * such that acpi_get_node() succeeds and we never fall back to this...
>>>>>>>>>>>>> + * such that acpi_get_node() succeeds. But when SRAT is not present, the node
>>>>>>>>>>>>> + * id may be probed as NUMA_NO_NODE by acpi, Here provide a fallback option.
>>>>>>>>>>>>>   */
>>>>>>>>>>>>>  int memory_add_physaddr_to_nid(u64 addr)
>>>>>>>>>>>>>  {
>>>>>>>>>>>>> - pr_warn("Unknown node for memory at 0x%llx, assuming node 0\n", addr);
>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>  }
>>>>>>>>>>>>> +EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>>>>>>>>>>>>
>>>>>>>>>>>> Does it make sense to export a noop function? Wouldn't make more sense
>>>>>>>>>>>> to simply make it static inline somewhere in a header? I haven't checked
>>>>>>>>>>>> whether there is an easy way to do that sanely bu this just hit my eyes.
>>>
>>>> I'd be curious if what we are trying to optimize here is actually worth
>>>> optimizing. IOW, is there a well-known scenario where the dummy value on
>>>> arm64 would be problematic and is worth the effort?
>>>
>>> Well, it started with Michal's comment above that EXPORT_SYMBOL_GPL()
>>> for a stub might be an overkill.
>>>
>>> I think Jia's suggestion [1] with addition of a comment that explains
>>> why and when the stub will be used, can work for both
>>> memory_add_physaddr_to_nid() and phys_to_target_node().
>>
>> Agreed.
>>
>>>
>>> But on more theoretical/fundmanetal level, I think we lack a generic
>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as
>>> translaton of firmware supplied information into data that can be used
>>> by the generic mm without need to reimplement it for each and every
>>> arch.
>>
>> Right. As I expressed, I am not a friend of using memblock for that, and
>> the pgdat node span is tricky.
>>
>> Maybe abstracting that x86 concept is possible in some way (and we could
>> restrict the information to boot-time properties, so we don't have to
>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).
> 
> I agree with pgdat part and disagree about memblock. It already has
> non-init physmap, why won't we add memblock.memory to the mix? ;-)

Can we generalize and tweak physmap to contain node info? That's all we
need, no? (the special mem= parameter handling should not matter for our
use case, where "physmap" and "memory" would differ)

> 
> Now, seriously, memblock already has all the necessary information about
> the coldplug memory for several architectures. x86 being an exception
> because for some reason the reserved memory is not considered memory
> there. The infrastructure for quiering and iterating memory regions is
> already there. We just need to leave out the irrelevant parts, like
> memblock.reserved and allocation funcions.

I *really* don't want to mess with memblocks on memory hot(un)plug on
x86 and s390x (+other architectures in the future). I also thought about
stopping to create memblocks for hotplugged memory on arm64, by tweaking
pfn_valid() to query memblocks only for early sections.

If "physmem" is not an option, can we at least introduce something like
ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for
now (and later maybe for others)?

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  9:25                           ` David Hildenbrand
@ 2020-07-08  9:45                             ` Mike Rapoport
  2020-07-08 10:04                               ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08  9:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Rapoport, Dan Williams, Michal Hocko, Jia He,
	Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang,
	Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM,
	Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin

On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote:
> On 08.07.20 11:15, Mike Rapoport wrote:
> >>>>>>
> >>> But on more theoretical/fundmanetal level, I think we lack a generic
> >>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as
> >>> translaton of firmware supplied information into data that can be used
> >>> by the generic mm without need to reimplement it for each and every
> >>> arch.
> >>
> >> Right. As I expressed, I am not a friend of using memblock for that, and
> >> the pgdat node span is tricky.
> >>
> >> Maybe abstracting that x86 concept is possible in some way (and we could
> >> restrict the information to boot-time properties, so we don't have to
> >> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).
> > 
> > I agree with pgdat part and disagree about memblock. It already has
> > non-init physmap, why won't we add memblock.memory to the mix? ;-)
> 
> Can we generalize and tweak physmap to contain node info? That's all we
> need, no? (the special mem= parameter handling should not matter for our
> use case, where "physmap" and "memory" would differ)

TBH, I have only random vague thoughts at the moment. This might be an
option. But then we need to enable physmap on !s390, right?
 
> > Now, seriously, memblock already has all the necessary information about
> > the coldplug memory for several architectures. x86 being an exception
> > because for some reason the reserved memory is not considered memory
> > there. The infrastructure for quiering and iterating memory regions is
> > already there. We just need to leave out the irrelevant parts, like
> > memblock.reserved and allocation funcions.
> 
> I *really* don't want to mess with memblocks on memory hot(un)plug on
> x86 and s390x (+other architectures in the future). I also thought about
> stopping to create memblocks for hotplugged memory on arm64, by tweaking
> pfn_valid() to query memblocks only for early sections.
>
> If "physmem" is not an option, can we at least introduce something like
> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for
> now (and later maybe for others)?

I have to do more memory hotplug howework to answer that ;-)

My general point is that we don't have to reinvent the wheel to have
coldplug memory representation, it's already there. We just need a way
to use it properly.

> -- 
> Thanks,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08  9:45                             ` Mike Rapoport
@ 2020-07-08 10:04                               ` David Hildenbrand
  2020-07-08 15:50                                 ` Dan Williams
  0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08 10:04 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mike Rapoport, Dan Williams, Michal Hocko, Jia He,
	Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang,
	Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM,
	Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin

On 08.07.20 11:45, Mike Rapoport wrote:
> On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote:
>> On 08.07.20 11:15, Mike Rapoport wrote:
>>>>>>>>
>>>>> But on more theoretical/fundmanetal level, I think we lack a generic
>>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as
>>>>> translaton of firmware supplied information into data that can be used
>>>>> by the generic mm without need to reimplement it for each and every
>>>>> arch.
>>>>
>>>> Right. As I expressed, I am not a friend of using memblock for that, and
>>>> the pgdat node span is tricky.
>>>>
>>>> Maybe abstracting that x86 concept is possible in some way (and we could
>>>> restrict the information to boot-time properties, so we don't have to
>>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).
>>>
>>> I agree with pgdat part and disagree about memblock. It already has
>>> non-init physmap, why won't we add memblock.memory to the mix? ;-)
>>
>> Can we generalize and tweak physmap to contain node info? That's all we
>> need, no? (the special mem= parameter handling should not matter for our
>> use case, where "physmap" and "memory" would differ)
> 
> TBH, I have only random vague thoughts at the moment. This might be an
> option. But then we need to enable physmap on !s390, right?

Yes, looks like it.

>  
>>> Now, seriously, memblock already has all the necessary information about
>>> the coldplug memory for several architectures. x86 being an exception
>>> because for some reason the reserved memory is not considered memory
>>> there. The infrastructure for quiering and iterating memory regions is
>>> already there. We just need to leave out the irrelevant parts, like
>>> memblock.reserved and allocation funcions.
>>
>> I *really* don't want to mess with memblocks on memory hot(un)plug on
>> x86 and s390x (+other architectures in the future). I also thought about
>> stopping to create memblocks for hotplugged memory on arm64, by tweaking
>> pfn_valid() to query memblocks only for early sections.
>>
>> If "physmem" is not an option, can we at least introduce something like
>> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for
>> now (and later maybe for others)?
> 
> I have to do more memory hotplug howework to answer that ;-)
> 
> My general point is that we don't have to reinvent the wheel to have
> coldplug memory representation, it's already there. We just need a way
> to use it properly.

Yes, I tend to agree. Details to be clarified :)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08 10:04                               ` David Hildenbrand
@ 2020-07-08 15:50                                 ` Dan Williams
  2020-07-08 16:10                                   ` David Hildenbrand
  0 siblings, 1 reply; 46+ messages in thread
From: Dan Williams @ 2020-07-08 15:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Rapoport, Mike Rapoport, Michal Hocko, Jia He,
	Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang,
	Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM,
	Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin

On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.07.20 11:45, Mike Rapoport wrote:
> > On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote:
> >> On 08.07.20 11:15, Mike Rapoport wrote:
> >>>>>>>>
> >>>>> But on more theoretical/fundmanetal level, I think we lack a generic
> >>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as
> >>>>> translaton of firmware supplied information into data that can be used
> >>>>> by the generic mm without need to reimplement it for each and every
> >>>>> arch.
> >>>>
> >>>> Right. As I expressed, I am not a friend of using memblock for that, and
> >>>> the pgdat node span is tricky.
> >>>>
> >>>> Maybe abstracting that x86 concept is possible in some way (and we could
> >>>> restrict the information to boot-time properties, so we don't have to
> >>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).
> >>>
> >>> I agree with pgdat part and disagree about memblock. It already has
> >>> non-init physmap, why won't we add memblock.memory to the mix? ;-)
> >>
> >> Can we generalize and tweak physmap to contain node info? That's all we
> >> need, no? (the special mem= parameter handling should not matter for our
> >> use case, where "physmap" and "memory" would differ)
> >
> > TBH, I have only random vague thoughts at the moment. This might be an
> > option. But then we need to enable physmap on !s390, right?
>
> Yes, looks like it.
>
> >
> >>> Now, seriously, memblock already has all the necessary information about
> >>> the coldplug memory for several architectures. x86 being an exception
> >>> because for some reason the reserved memory is not considered memory
> >>> there. The infrastructure for quiering and iterating memory regions is
> >>> already there. We just need to leave out the irrelevant parts, like
> >>> memblock.reserved and allocation funcions.
> >>
> >> I *really* don't want to mess with memblocks on memory hot(un)plug on
> >> x86 and s390x (+other architectures in the future). I also thought about
> >> stopping to create memblocks for hotplugged memory on arm64, by tweaking
> >> pfn_valid() to query memblocks only for early sections.
> >>
> >> If "physmem" is not an option, can we at least introduce something like
> >> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for
> >> now (and later maybe for others)?
> >
> > I have to do more memory hotplug howework to answer that ;-)
> >
> > My general point is that we don't have to reinvent the wheel to have
> > coldplug memory representation, it's already there. We just need a way
> > to use it properly.
>
> Yes, I tend to agree. Details to be clarified :)

I'm not quite understanding the concern, or requirement about
"updating memblock" in the hotplug path. The routines
memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to
interrogate platform-firmware numa info through a common abstraction.
They place no burden on the memory hotplug code they're just used to
see if a hot-added range lies within an existing node span when
platform-firmware otherwise fails to communicate a node. x86 can
continue to back those helpers with numa_meminfo, arm64 can use a
generic memblock implementation and other archs can follow the arm64
example if they want better numa answers for drivers.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08 15:50                                 ` Dan Williams
@ 2020-07-08 16:10                                   ` David Hildenbrand
  2020-07-08 16:47                                     ` Mike Rapoport
  0 siblings, 1 reply; 46+ messages in thread
From: David Hildenbrand @ 2020-07-08 16:10 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mike Rapoport, Mike Rapoport, Michal Hocko, Jia He,
	Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang,
	Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM,
	Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin

On 08.07.20 17:50, Dan Williams wrote:
> On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 08.07.20 11:45, Mike Rapoport wrote:
>>> On Wed, Jul 08, 2020 at 11:25:36AM +0200, David Hildenbrand wrote:
>>>> On 08.07.20 11:15, Mike Rapoport wrote:
>>>>>>>>>>
>>>>>>> But on more theoretical/fundmanetal level, I think we lack a generic
>>>>>>> abstraction similar to e.g. x86 'struct numa_meminfo' that serves as
>>>>>>> translaton of firmware supplied information into data that can be used
>>>>>>> by the generic mm without need to reimplement it for each and every
>>>>>>> arch.
>>>>>>
>>>>>> Right. As I expressed, I am not a friend of using memblock for that, and
>>>>>> the pgdat node span is tricky.
>>>>>>
>>>>>> Maybe abstracting that x86 concept is possible in some way (and we could
>>>>>> restrict the information to boot-time properties, so we don't have to
>>>>>> mess with memory hot(un)plug - just as done for numa_meminfo AFAIKS).
>>>>>
>>>>> I agree with pgdat part and disagree about memblock. It already has
>>>>> non-init physmap, why won't we add memblock.memory to the mix? ;-)
>>>>
>>>> Can we generalize and tweak physmap to contain node info? That's all we
>>>> need, no? (the special mem= parameter handling should not matter for our
>>>> use case, where "physmap" and "memory" would differ)
>>>
>>> TBH, I have only random vague thoughts at the moment. This might be an
>>> option. But then we need to enable physmap on !s390, right?
>>
>> Yes, looks like it.
>>
>>>
>>>>> Now, seriously, memblock already has all the necessary information about
>>>>> the coldplug memory for several architectures. x86 being an exception
>>>>> because for some reason the reserved memory is not considered memory
>>>>> there. The infrastructure for quiering and iterating memory regions is
>>>>> already there. We just need to leave out the irrelevant parts, like
>>>>> memblock.reserved and allocation funcions.
>>>>
>>>> I *really* don't want to mess with memblocks on memory hot(un)plug on
>>>> x86 and s390x (+other architectures in the future). I also thought about
>>>> stopping to create memblocks for hotplugged memory on arm64, by tweaking
>>>> pfn_valid() to query memblocks only for early sections.
>>>>
>>>> If "physmem" is not an option, can we at least introduce something like
>>>> ARCH_UPDTAE_MEMBLOCK_ON_HOTPLUG to avoid doing that on x86 and s390x for
>>>> now (and later maybe for others)?
>>>
>>> I have to do more memory hotplug howework to answer that ;-)
>>>
>>> My general point is that we don't have to reinvent the wheel to have
>>> coldplug memory representation, it's already there. We just need a way
>>> to use it properly.
>>
>> Yes, I tend to agree. Details to be clarified :)
> 
> I'm not quite understanding the concern, or requirement about
> "updating memblock" in the hotplug path. The routines
> memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to
> interrogate platform-firmware numa info through a common abstraction.
> They place no burden on the memory hotplug code they're just used to
> see if a hot-added range lies within an existing node span when
> platform-firmware otherwise fails to communicate a node. x86 can
> continue to back those helpers with numa_meminfo, arm64 can use a
> generic memblock implementation and other archs can follow the arm64
> example if they want better numa answers for drivers.
> 

See memblock_add_node()/memblock_remove() in mm/memory_hotplug.c. I
don't want that code be reactivated for x86/s390x. That's all I am saying.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL
  2020-07-08 16:10                                   ` David Hildenbrand
@ 2020-07-08 16:47                                     ` Mike Rapoport
  0 siblings, 0 replies; 46+ messages in thread
From: Mike Rapoport @ 2020-07-08 16:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dan Williams, Mike Rapoport, Michal Hocko, Jia He,
	Catalin Marinas, Will Deacon, Vishal Verma, Dave Jiang,
	Andrew Morton, Baoquan He, Chuhong Yuan, Linux ARM,
	Linux Kernel Mailing List, Linux MM, linux-nvdimm, Kaly Xin

On Wed, Jul 08, 2020 at 06:10:19PM +0200, David Hildenbrand wrote:
> On 08.07.20 17:50, Dan Williams wrote:
> > On Wed, Jul 8, 2020 at 3:04 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> > 
> > I'm not quite understanding the concern, or requirement about
> > "updating memblock" in the hotplug path. The routines
> > memory_add_physaddr_to_nid() and phys_to_target_node() are helpers to
> > interrogate platform-firmware numa info through a common abstraction.
> > They place no burden on the memory hotplug code they're just used to
> > see if a hot-added range lies within an existing node span when
> > platform-firmware otherwise fails to communicate a node. x86 can
> > continue to back those helpers with numa_meminfo, arm64 can use a
> > generic memblock implementation and other archs can follow the arm64
> > example if they want better numa answers for drivers.
> > 
> 
> See memblock_add_node()/memblock_remove() in mm/memory_hotplug.c. I
> don't want that code be reactivated for x86/s390x. That's all I am saying.

And these have actual meaning only on arm64 because powerpc does not
rely on memblock for memory hot(un)plug, AFAIU.

Anyway, at the moment we can use memblock on hotplug path only on arm64
and I don't think its the path worth exploring.

> -- 
> Thanks,
> 
> David / dhildenb
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2020-07-08 16:47 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-07  5:59 [PATCH v2 0/3] Fix and enable pmem as RAM device on arm64 Jia He
2020-07-07  5:59 ` [PATCH v2 1/3] arm64/numa: export memory_add_physaddr_to_nid as EXPORT_SYMBOL_GPL Jia He
2020-07-07 11:35   ` David Hildenbrand
2020-07-07 11:54   ` Michal Hocko
2020-07-07 12:13     ` Mike Rapoport
2020-07-07 12:26       ` David Hildenbrand
2020-07-07 18:00         ` Mike Rapoport
2020-07-07 22:05           ` Dan Williams
2020-07-08  5:27             ` Mike Rapoport
2020-07-08  7:21               ` David Hildenbrand
2020-07-08  7:38                 ` Mike Rapoport
2020-07-08  7:40                   ` David Hildenbrand
2020-07-08  7:50                 ` Dan Williams
2020-07-08  8:26                   ` David Hildenbrand
2020-07-08  8:39                     ` Mike Rapoport
2020-07-08  8:45                       ` David Hildenbrand
2020-07-08  9:15                         ` Mike Rapoport
2020-07-08  9:25                           ` David Hildenbrand
2020-07-08  9:45                             ` Mike Rapoport
2020-07-08 10:04                               ` David Hildenbrand
2020-07-08 15:50                                 ` Dan Williams
2020-07-08 16:10                                   ` David Hildenbrand
2020-07-08 16:47                                     ` Mike Rapoport
2020-07-08  2:20     ` Justin He
2020-07-08  3:56       ` Dan Williams
2020-07-08  4:08         ` Justin He
2020-07-08  4:27           ` Dan Williams
2020-07-08  6:22             ` Mike Rapoport
2020-07-08  6:53               ` Dan Williams
2020-07-08  6:59               ` David Hildenbrand
2020-07-08  7:04                 ` Dan Williams
2020-07-08  7:16                   ` David Hildenbrand
2020-07-08  7:43                     ` Mike Rapoport
2020-07-08  5:32         ` Mike Rapoport
2020-07-08  5:48           ` Dan Williams
2020-07-08  6:19             ` Mike Rapoport
2020-07-08  6:44               ` Dan Williams
2020-07-08  6:56             ` Justin He
2020-07-08  7:00               ` David Hildenbrand
2020-07-07  5:59 ` [RFC PATCH v2 2/3] device-dax: use fallback nid when numa_node is invalid Jia He
2020-07-07  6:08   ` Justin He
2020-07-07 11:34   ` David Hildenbrand
2020-07-08  1:41     ` Justin He
2020-07-07  5:59 ` [PATCH v2 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He
2020-07-07 10:06   ` Michal Hocko
2020-07-07 11:31   ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).