linux-mm.kvack.org archive mirror
* [PATCH 0/3] Fix and enable pmem as RAM on arm64
@ 2020-07-06  1:19 Jia He
  2020-07-06  1:19 ` [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake Jia He
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jia He @ 2020-07-06  1:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin, Jia He

This series fixes a few issues I hit when trying to enable pmem as a RAM
device on arm64.

Tested on ThunderX2 host/qemu "-M virt" guest.

Jia He (3):
  arm64/numa: set numa_off to false when numa node is fake
  mm/memory_hotplug: harden try_offline_node against bogus nid
  mm/memory_hotplug: fix unpaired mem_hotplug_begin/done

 arch/arm64/mm/numa.c | 3 ++-
 mm/memory_hotplug.c  | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.17.1




* [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake
  2020-07-06  1:19 [PATCH 0/3] Fix and enable pmem as RAM on arm64 Jia He
@ 2020-07-06  1:19 ` Jia He
  2020-07-06  8:02   ` David Hildenbrand
  2020-07-06 10:29   ` Jonathan Cameron
  2020-07-06  1:19 ` [PATCH 2/3] mm/memory_hotplug: harden try_offline_node against bogus nid Jia He
  2020-07-06  1:19 ` [PATCH 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He
  2 siblings, 2 replies; 15+ messages in thread
From: Jia He @ 2020-07-06  1:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin, Jia He

Previously, numa_off is set to true unconditionally in dummy_numa_init(),
even if there is a fake numa node.

But acpi will translate node id to NUMA_NO_NODE(-1) in acpi_map_pxm_to_node()
because it regards numa_off as turning off the numa node.

Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table
isn't present.

$ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
kmem: probe of dax0.0 failed with error -22

This fixes it by setting numa_off to false.

Signed-off-by: Jia He <justin.he@arm.com>
---
 arch/arm64/mm/numa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index aafcee3e3f7e..7689986020d9 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -440,7 +440,8 @@ static int __init dummy_numa_init(void)
 		return ret;
 	}
 
-	numa_off = true;
+	/* force numa_off to be false since we have a fake numa node here */
+	numa_off = false;
 	return 0;
 }
 
-- 
2.17.1




* [PATCH 2/3] mm/memory_hotplug: harden try_offline_node against bogus nid
  2020-07-06  1:19 [PATCH 0/3] Fix and enable pmem as RAM on arm64 Jia He
  2020-07-06  1:19 ` [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake Jia He
@ 2020-07-06  1:19 ` Jia He
  2020-07-06  7:57   ` David Hildenbrand
  2020-07-06  1:19 ` [PATCH 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He
  2 siblings, 1 reply; 15+ messages in thread
From: Jia He @ 2020-07-06  1:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin, Jia He

When testing the remove_memory path of dax pmem, there will be a panic with
call trace:
  try_remove_memory+0x84/0x170
  remove_memory+0x38/0x58
  dev_dax_kmem_remove+0x3c/0x84 [kmem]
  device_release_driver_internal+0xfc/0x1c8
  device_release_driver+0x28/0x38
  bus_remove_device+0xd4/0x158
  device_del+0x160/0x3a0
  unregister_dev_dax+0x30/0x68
  devm_action_release+0x20/0x30
  release_nodes+0x150/0x240
  devres_release_all+0x6c/0x1d0
  device_release_driver_internal+0x10c/0x1c8
  driver_detach+0xac/0x170
  bus_remove_driver+0x64/0x130
  driver_unregister+0x34/0x60
  dax_pmem_exit+0x14/0xffc4 [dax_pmem]
  __arm64_sys_delete_module+0x18c/0x2d0
  el0_svc_common.constprop.2+0x78/0x168
  do_el0_svc+0x34/0xa0
  el0_sync_handler+0xe0/0x188
  el0_sync+0x164/0x180

It is caused by the bogus nid (-1). Although the root cause is that pmem dax
translates pxm to node_id incorrectly due to numa_off, it is worth
hardening the code in try_offline_node() by returning early if !pgdat.

Signed-off-by: Jia He <justin.he@arm.com>
---
 mm/memory_hotplug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index da374cd3d45b..e1e290577b45 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1680,6 +1680,9 @@ void try_offline_node(int nid)
 	pg_data_t *pgdat = NODE_DATA(nid);
 	int rc;
 
+	if (WARN_ON(!pgdat))
+		return;
+
 	/*
 	 * If the node still spans pages (especially ZONE_DEVICE), don't
 	 * offline it. A node spans memory after move_pfn_range_to_zone(),
-- 
2.17.1




* [PATCH 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done
  2020-07-06  1:19 [PATCH 0/3] Fix and enable pmem as RAM on arm64 Jia He
  2020-07-06  1:19 ` [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake Jia He
  2020-07-06  1:19 ` [PATCH 2/3] mm/memory_hotplug: harden try_offline_node against bogus nid Jia He
@ 2020-07-06  1:19 ` Jia He
  2020-07-06  7:49   ` David Hildenbrand
  2 siblings, 1 reply; 15+ messages in thread
From: Jia He @ 2020-07-06  1:19 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin, Jia He

When check_memblock_offlined_cb() returns a failing rc (e.g. the memblock is
online at that time), mem_hotplug_done() is called without a matching
mem_hotplug_begin().

This triggers a warning:
 Call Trace:
  percpu_up_write+0x33/0x40
  try_remove_memory+0x66/0x120
  ? _cond_resched+0x19/0x30
  remove_memory+0x2b/0x40
  dev_dax_kmem_remove+0x36/0x72 [kmem]
  device_release_driver_internal+0xf0/0x1c0
  device_release_driver+0x12/0x20
  bus_remove_device+0xe1/0x150
  device_del+0x17b/0x3e0
  unregister_dev_dax+0x29/0x60
  devm_action_release+0x15/0x20
  release_nodes+0x19a/0x1e0
  devres_release_all+0x3f/0x50
  device_release_driver_internal+0x100/0x1c0
  driver_detach+0x4c/0x8f
  bus_remove_driver+0x5c/0xd0
  driver_unregister+0x31/0x50
  dax_pmem_exit+0x10/0xfe0 [dax_pmem]

This fixes it by moving mem_hotplug_done() ahead of the "done" label.

Signed-off-by: Jia He <justin.he@arm.com>
---
 mm/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e1e290577b45..86b36714342b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1769,8 +1769,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
 
 	try_offline_node(nid);
 
-done:
 	mem_hotplug_done();
+done:
 	return rc;
 }
 
-- 
2.17.1




* Re: [PATCH 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done
  2020-07-06  1:19 ` [PATCH 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He
@ 2020-07-06  7:49   ` David Hildenbrand
  2020-07-07 22:10     ` Dan Williams
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-07-06  7:49 UTC (permalink / raw)
  To: Jia He, Catalin Marinas, Will Deacon
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin, Dan Williams,
	Michal Hocko

On 06.07.20 03:19, Jia He wrote:
> When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is
> online at that time), mem_hotplug_begin/done is unpaired in such case.
> 
> Therefore a warning:
>  Call Trace:
>   percpu_up_write+0x33/0x40
>   try_remove_memory+0x66/0x120
>   ? _cond_resched+0x19/0x30
>   remove_memory+0x2b/0x40
>   dev_dax_kmem_remove+0x36/0x72 [kmem]
>   device_release_driver_internal+0xf0/0x1c0
>   device_release_driver+0x12/0x20
>   bus_remove_device+0xe1/0x150
>   device_del+0x17b/0x3e0
>   unregister_dev_dax+0x29/0x60
>   devm_action_release+0x15/0x20
>   release_nodes+0x19a/0x1e0
>   devres_release_all+0x3f/0x50
>   device_release_driver_internal+0x100/0x1c0
>   driver_detach+0x4c/0x8f
>   bus_remove_driver+0x5c/0xd0
>   driver_unregister+0x31/0x50
>   dax_pmem_exit+0x10/0xfe0 [dax_pmem]
> 
> This fixes it by moving mem_hotplug_done ahead of "done"
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  mm/memory_hotplug.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e1e290577b45..86b36714342b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1769,8 +1769,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
>  
>  	try_offline_node(nid);
>  
> -done:
>  	mem_hotplug_done();
> +done:
>  	return rc;
>  }
>  
> 

Just drop the "done" label, use "return rc;" directly instead of the
goto, and "return 0;" at the end.

Also, please add

Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat")

and

Cc: stable@vger.kernel.org # v5.6+

Thanks!
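
The restructuring suggested above can be sketched in userspace as follows.
This is only a minimal model, not the kernel code: every mock_* name and the
lock-depth counter are hypothetical stand-ins, and the ordering (offline check
first, lock taken afterwards) is inferred from the patch context.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-ins; not the kernel implementations. */
static int hotplug_lock_depth;	/* > 0 while the mock lock is held */
static int block_still_online;	/* set to make the offline check fail */

static void mock_hotplug_begin(void) { hotplug_lock_depth++; }
static void mock_hotplug_done(void)  { hotplug_lock_depth--; }

/* Models check_memblock_offlined_cb(): non-zero if a block is still online. */
static int mock_check_offlined(void)
{
	return block_still_online ? -EBUSY : 0;
}

/* Shape suggested in the review: no "done" label, plain early return. */
static int mock_try_remove_memory(void)
{
	int rc;

	rc = mock_check_offlined();
	if (rc)
		return rc;	/* lock was never taken, nothing to release */

	mock_hotplug_begin();
	/* ... teardown work would happen here ... */
	mock_hotplug_done();

	return 0;
}
```

With the early return, the failing path never reaches the unlock, so the
begin/done calls stay paired on both paths.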

-- 
Thanks,

David / dhildenb




* Re: [PATCH 2/3] mm/memory_hotplug: harden try_offline_node against bogus nid
  2020-07-06  1:19 ` [PATCH 2/3] mm/memory_hotplug: harden try_offline_node against bogus nid Jia He
@ 2020-07-06  7:57   ` David Hildenbrand
  2020-07-06 13:45     ` Justin He
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-07-06  7:57 UTC (permalink / raw)
  To: Jia He, Catalin Marinas, Will Deacon
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin

On 06.07.20 03:19, Jia He wrote:
> When testing the remove_memory path of dax pmem, there will be a panic with
> call trace:
>   try_remove_memory+0x84/0x170
>   remove_memory+0x38/0x58
>   dev_dax_kmem_remove+0x3c/0x84 [kmem]
>   device_release_driver_internal+0xfc/0x1c8
>   device_release_driver+0x28/0x38
>   bus_remove_device+0xd4/0x158
>   device_del+0x160/0x3a0
>   unregister_dev_dax+0x30/0x68
>   devm_action_release+0x20/0x30
>   release_nodes+0x150/0x240
>   devres_release_all+0x6c/0x1d0
>   device_release_driver_internal+0x10c/0x1c8
>   driver_detach+0xac/0x170
>   bus_remove_driver+0x64/0x130
>   driver_unregister+0x34/0x60
>   dax_pmem_exit+0x14/0xffc4 [dax_pmem]
>   __arm64_sys_delete_module+0x18c/0x2d0
>   el0_svc_common.constprop.2+0x78/0x168
>   do_el0_svc+0x34/0xa0
>   el0_sync_handler+0xe0/0x188
>   el0_sync+0x164/0x180
> 
> It is caused by the bogus nid (-1). Although the root cause is pmem dax
> translates from pxm to node_id incorrectly due to numa_off, it is worth
> hardening the codes in try_offline_node(), quiting if !pgdat.
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  mm/memory_hotplug.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index da374cd3d45b..e1e290577b45 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1680,6 +1680,9 @@ void try_offline_node(int nid)
>  	pg_data_t *pgdat = NODE_DATA(nid);
>  	int rc;
>  
> +	if (WARN_ON(!pgdat))
> +		return;
> +
>  	/*
>  	 * If the node still spans pages (especially ZONE_DEVICE), don't
>  	 * offline it. A node spans memory after move_pfn_range_to_zone(),
> 

Hm. If I am not wrong, somebody used add_memory() with another nid than
try_remove_memory()?

Or did we pass the node_possible(nid) check in add_memory_resource(),
and succeeded to add to nid==-1?

That said, this feels somewhat wrong, especially checking against
pgdat down in try_offline_node(). It really has to be the same nid as
used when adding - and that nid has to be sane.
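
One way to picture that requirement is the following userspace sketch. It is
an illustration under assumptions, not existing kernel code: struct
mock_region, MOCK_MAX_NODES and the mock_* helpers are all hypothetical. The
node id is validated once when memory is added, recorded, and the recorded
value is the only one handed to the removal path.

```c
#include <assert.h>
#include <errno.h>

#define NUMA_NO_NODE	(-1)
#define MOCK_MAX_NODES	4	/* hypothetical node count for this model */

/* Hypothetical per-device record of the node chosen at add time. */
struct mock_region {
	int nid;	/* node used when the memory was added */
	int added;	/* non-zero once the add succeeded */
};

/* Stand-in for a node_possible()-style sanity check. */
static int mock_nid_is_sane(int nid)
{
	return nid >= 0 && nid < MOCK_MAX_NODES;
}

static int mock_add_region(struct mock_region *r, int nid)
{
	if (!mock_nid_is_sane(nid))
		return -EINVAL;	/* reject bogus ids such as NUMA_NO_NODE up front */
	r->nid = nid;
	r->added = 1;
	return 0;
}

/* Returns the nid that would be passed to teardown, or a negative errno. */
static int mock_remove_region(struct mock_region *r)
{
	if (!r->added)
		return -ENODEV;
	r->added = 0;
	return r->nid;	/* always the value recorded at add time */
}
```

In this model a bogus nid can never reach the removal path, because the add
is refused before anything is recorded.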

-- 
Thanks,

David / dhildenb




* Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake
  2020-07-06  1:19 ` [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake Jia He
@ 2020-07-06  8:02   ` David Hildenbrand
  2020-07-06 12:36     ` Justin He
  2020-07-06 10:29   ` Jonathan Cameron
  1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2020-07-06  8:02 UTC (permalink / raw)
  To: Jia He, Catalin Marinas, Will Deacon
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin

On 06.07.20 03:19, Jia He wrote:
> Previously, numa_off is set to true unconditionally in dummy_numa_init(),
> even if there is a fake numa node.
> 
> But acpi will translate node id to NUMA_NO_NODE(-1) in acpi_map_pxm_to_node()
> because it regards numa_off as turning off the numa node.
> 
> Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table
> isn't present.
> 
> $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> kmem: probe of dax0.0 failed with error -22
> 
> This fixes it by setting numa_off to false.
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  arch/arm64/mm/numa.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index aafcee3e3f7e..7689986020d9 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -440,7 +440,8 @@ static int __init dummy_numa_init(void)
>  		return ret;
>  	}
>  
> -	numa_off = true;
> +	/* force numa_off to be false since we have a fake numa node here */
> +	numa_off = false;
>  	return 0;
>  }
>  
> 

What would happen if we use something like this in drivers/dax/kmem.c
instead:

numa_node = dev_dax->target_node;
if (numa_node == NUMA_NO_NODE)
	numa_node = memory_add_physaddr_to_nid(kmem_start);

and eventually dropping the pr_warn in
arm64/memory_add_physaddr_to_nid() ? Would that work?
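
A self-contained model of that fallback, for experimenting outside the kernel,
might look like this. mock_physaddr_to_nid() and mock_pick_node() are
hypothetical names, and the assumption that every address maps to node 0 only
reflects a system with a single dummy node; it is not a statement about the
real memory_add_physaddr_to_nid().

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/*
 * Hypothetical stand-in for memory_add_physaddr_to_nid(). This model
 * pretends all memory belongs to node 0, as with one dummy node.
 */
static int mock_physaddr_to_nid(uint64_t start)
{
	(void)start;
	return 0;
}

/* Keep the firmware-provided node if there is one, else derive it. */
static int mock_pick_node(int target_node, uint64_t kmem_start)
{
	int numa_node = target_node;

	if (numa_node == NUMA_NO_NODE)
		numa_node = mock_physaddr_to_nid(kmem_start);
	return numa_node;
}
```

A valid target node passes through unchanged; only NUMA_NO_NODE takes the
address-based path.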

-- 
Thanks,

David / dhildenb




* Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake
  2020-07-06  1:19 ` [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake Jia He
  2020-07-06  8:02   ` David Hildenbrand
@ 2020-07-06 10:29   ` Jonathan Cameron
  2020-07-06 10:46     ` Jonathan Cameron
  1 sibling, 1 reply; 15+ messages in thread
From: Jonathan Cameron @ 2020-07-06 10:29 UTC (permalink / raw)
  To: Jia He
  Cc: Catalin Marinas, Will Deacon, Andrew Morton, Mike Rapoport,
	Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel,
	linux-mm, Kaly Xin

On Mon, 6 Jul 2020 09:19:45 +0800
Jia He <justin.he@arm.com> wrote:

Hi,

> Previously, numa_off is set to true unconditionally in dummy_numa_init(),
> even if there is a fake numa node.
> 
> But acpi will translate node id to NUMA_NO_NODE(-1) in acpi_map_pxm_to_node()
> because it regards numa_off as turning off the numa node.

That is correct.  It is operating exactly as it should, if SRAT hasn't been parsed
and you are on ACPI platform there are no nodes.  They cannot be created at
some later date.  The dummy code doesn't change this. It just does enough to carry
on operating with no specified nodes.

> 
> Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table
> isn't present.
> 
> $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> kmem: probe of dax0.0 failed with error -22
> 
> This fixes it by setting numa_off to false.

Without the SRAT protection patch [1] you may well run into problems
because someone somewhere will have _PXM in a DSDT but will
have a non existent SRAT.   We had this happen on an AMD platform when we
tried to introduce working _PXM support for PCI. [2]

So whilst this seems superficially safe, I'd definitely be crossing your fingers.
Note, at that time I proposed putting the numa_off = false into the x86 code
path precisely to cut out that possibility (was rejected at the time, at least
partly because the clarifications to the ACPI spec were not public.)

The patch in [1] should sort things out however by ensuring we only create
new domains where we should actually be doing so. However, in your case
it will return NUMA_NO_NODE anyway so this isn't the right way to fix things.

[1] https://patchwork.kernel.org/patch/11632063/
[2] https://patchwork.kernel.org/patch/10597777/

Thanks,

Jonathan

> 
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  arch/arm64/mm/numa.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index aafcee3e3f7e..7689986020d9 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -440,7 +440,8 @@ static int __init dummy_numa_init(void)
>  		return ret;
>  	}
>  
> -	numa_off = true;
> +	/* force numa_off to be false since we have a fake numa node here */
> +	numa_off = false;
>  	return 0;
>  }
>  





* Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake
  2020-07-06 10:29   ` Jonathan Cameron
@ 2020-07-06 10:46     ` Jonathan Cameron
  2020-07-06 12:47       ` Justin He
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Cameron @ 2020-07-06 10:46 UTC (permalink / raw)
  To: Jia He
  Cc: Catalin Marinas, Will Deacon, Andrew Morton, Mike Rapoport,
	Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel,
	linux-mm, Kaly Xin

On Mon, 6 Jul 2020 11:29:21 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Mon, 6 Jul 2020 09:19:45 +0800
> Jia He <justin.he@arm.com> wrote:
> 
> Hi,
> 
> > Previously, numa_off is set to true unconditionally in dummy_numa_init(),
> > even if there is a fake numa node.
> > 
> > But acpi will translate node id to NUMA_NO_NODE(-1) in acpi_map_pxm_to_node()
> > because it regards numa_off as turning off the numa node.  
> 
> That is correct.  It is operating exactly as it should, if SRAT hasn't been parsed
> and you are on ACPI platform there are no nodes.  They cannot be created at
> some later date.  The dummy code doesn't change this. It just does enough to carry
> on operating with no specified nodes.
> 
> > 
> > Without this patch, pmem can't be probed as a RAM device on arm64 if SRAT table
> > isn't present.
> > 
> > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
> > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
> > kmem: probe of dax0.0 failed with error -22
> > 
> > This fixes it by setting numa_off to false.  
> 
> Without the SRAT protection patch [1] you may well run into problems
> because someone somewhere will have _PXM in a DSDT but will
> have a non existent SRAT.   We had this happen on an AMD platform when we
> tried to introduce working _PXM support for PCI. [2]
> 
> So whilst this seems superficially safe, I'd definitely be crossing your fingers.
> Note, at that time I proposed putting the numa_off = false into the x86 code
> path precisely to cut out that possibility (was rejected at the time, at least
> partly because the clarifications to the ACPI spec were not pubilc.)
> 
> The patch in [1] should sort things out however by ensuring we only create
> new domains where we should actually be doing so. However, in your case
> it will return NUMA_NO_NODE anyway so this isn't the right way to fix things.
> 
> [1] https://patchwork.kernel.org/patch/11632063/
> [2] https://patchwork.kernel.org/patch/10597777/

Thinking a bit more on this...

I'd like to understand more on what your use case is.

Do you have an NFIT that is setting the proximity domain for the
non-volatile memory in SPA structures?  If so the ACPI spec (6.3 makes this
clear) requires those match with domains described in SRAT.
If SRAT isn't there, then we can't expect sensible results from using these
values from NFIT.
If SRAT is there and numa=off is set then we should probably also rule out
parsing NFIT, or make all nfit handling fine with NUMA_NO_NODE, preferably
with explicit checks to ensure we don't try to use the Proximity Node values
as they have no meaning with numa=off. I note that the core NFIT parsing
is fine with the value not being supplied in the first place.

https://elixir.bootlin.com/linux/latest/source/drivers/acpi/nfit/core.c#L2947

Thanks,

Jonathan

> 
> Thanks,
> 
> Jonathan
> 
> > 
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  arch/arm64/mm/numa.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > index aafcee3e3f7e..7689986020d9 100644
> > --- a/arch/arm64/mm/numa.c
> > +++ b/arch/arm64/mm/numa.c
> > @@ -440,7 +440,8 @@ static int __init dummy_numa_init(void)
> >  		return ret;
> >  	}
> >  
> > -	numa_off = true;
> > +	/* force numa_off to be false since we have a fake numa node here */
> > +	numa_off = false;
> >  	return 0;
> >  }
> >    
> 





* RE: [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake
  2020-07-06  8:02   ` David Hildenbrand
@ 2020-07-06 12:36     ` Justin He
  2020-07-06 13:56       ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Justin He @ 2020-07-06 12:36 UTC (permalink / raw)
  To: David Hildenbrand, Catalin Marinas, Will Deacon
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin

Hi David, thanks for the comments. See my answer please:

> -----Original Message-----
> From: David Hildenbrand <david@redhat.com>
> Sent: Monday, July 6, 2020 4:03 PM
> To: Justin He <Justin.He@arm.com>; Catalin Marinas
> <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>; Mike Rapoport
> <rppt@linux.ibm.com>; Baoquan He <bhe@redhat.com>; Chuhong Yuan
> <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; Kaly Xin <Kaly.Xin@arm.com>
> Subject: Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node
> is fake
> 
> On 06.07.20 03:19, Jia He wrote:
> > Previously, numa_off is set to true unconditionally in dummy_numa_init(),
> > even if there is a fake numa node.
> >
> > But acpi will translate node id to NUMA_NO_NODE(-1) in
> acpi_map_pxm_to_node()
> > because it regards numa_off as turning off the numa node.
> >
> > Without this patch, pmem can't be probed as a RAM device on arm64 if
> SRAT table
> > isn't present.
> >
> > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -
> a 64K
> > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with
> invalid node: -1
> > kmem: probe of dax0.0 failed with error -22
> >
> > This fixes it by setting numa_off to false.
> >
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  arch/arm64/mm/numa.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> > index aafcee3e3f7e..7689986020d9 100644
> > --- a/arch/arm64/mm/numa.c
> > +++ b/arch/arm64/mm/numa.c
> > @@ -440,7 +440,8 @@ static int __init dummy_numa_init(void)
> >  		return ret;
> >  	}
> >
> > -	numa_off = true;
> > +	/* force numa_off to be false since we have a fake numa node here
> */
> > +	numa_off = false;
> >  	return 0;
> >  }
> >
> >
> 
> What would happen if we use something like this in drivers/dax/kmem.c
> instead:
> 
> numa_node = dev_dax->target_node;
> if (numa_node == NUMA_NO_NODE)
> 	numa_node = memory_add_physaddr_to_nid(kmem_start);
> 
> and eventually dropping the pr_warn in
> arm64/memory_add_physaddr_to_nid() ? Would that work?

Yes, it works. I sent a similar patch [1] before, but it seems the pmem
maintainer was not satisfied with it. Do you think memory_add_physaddr_to_nid()
is better than numa_mem_id()?

[1] https://lkml.org/lkml/2019/8/16/367

--
Cheers,
Justin (Jia He)




* RE: [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake
  2020-07-06 10:46     ` Jonathan Cameron
@ 2020-07-06 12:47       ` Justin He
  2020-07-06 13:03         ` Jonathan Cameron
  0 siblings, 1 reply; 15+ messages in thread
From: Justin He @ 2020-07-06 12:47 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Catalin Marinas, Will Deacon, Andrew Morton, Mike Rapoport,
	Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel,
	linux-mm, Kaly Xin, David Hildenbrand

Hi Jonathan, thanks for the comments.

> -----Original Message-----
> From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> Sent: Monday, July 6, 2020 6:46 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> <will@kernel.org>; Andrew Morton <akpm@linux-foundation.org>; Mike
> Rapoport <rppt@linux.ibm.com>; Baoquan He <bhe@redhat.com>; Chuhong Yuan
> <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; Kaly Xin <Kaly.Xin@arm.com>
> Subject: Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node
> is fake
> 
> On Mon, 6 Jul 2020 11:29:21 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Mon, 6 Jul 2020 09:19:45 +0800
> > Jia He <justin.he@arm.com> wrote:
> >
> > Hi,
> >
> > > Previously, numa_off is set to true unconditionally in
> dummy_numa_init(),
> > > even if there is a fake numa node.
> > >
> > > But acpi will translate node id to NUMA_NO_NODE(-1) in
> acpi_map_pxm_to_node()
> > > because it regards numa_off as turning off the numa node.
> >
> > That is correct.  It is operating exactly as it should, if SRAT hasn't
> been parsed
> > and you are on ACPI platform there are no nodes.  They cannot be created
> at
> > some later date.  The dummy code doesn't change this. It just does
> enough to carry
> > on operating with no specified nodes.
> >
> > >
> > > Without this patch, pmem can't be probed as a RAM device on arm64 if
> SRAT table
> > > isn't present.
> > >
> > > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g
> -a 64K
> > > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with
> invalid node: -1
> > > kmem: probe of dax0.0 failed with error -22
> > >
> > > This fixes it by setting numa_off to false.
> >
> > Without the SRAT protection patch [1] you may well run into problems

Sorry, I don't quite understand this. Do you mean your [1] can resolve this
issue? But acpi_map_pxm_to_node() has already returned NUMA_NO_NODE after the
following check:
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS || numa_off)
		return NUMA_NO_NODE;
So even with your [1] patch, it does not seem to help? Please correct me
if my understanding is wrong.
[1] https://patchwork.kernel.org/patch/11632063/

> > because someone somewhere will have _PXM in a DSDT but will
> > have a non existent SRAT.   We had this happen on an AMD platform when
> we
> > tried to introduce working _PXM support for PCI. [2]
> >
> > So whilst this seems superficially safe, I'd definitely be crossing your
> fingers.
> > Note, at that time I proposed putting the numa_off = false into the x86
> code
> > path precisely to cut out that possibility (was rejected at the time, at
> least
> > partly because the clarifications to the ACPI spec were not pubilc.)
> >
> > The patch in [1] should sort things out however by ensuring we only
> create
> > new domains where we should actually be doing so. However, in your case
> > it will return NUMA_NO_NODE anyway so this isn't the right way to fix
> things.

Okay, let me try to summarize. There are three possible fixes:
1. this patch, which neither you nor David seems satisfied with 😉
2. my previous proposal [2], similar to what David suggested
3. remove the numa_off check in acpi_map_pxm_to_node()
e.g.
...
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS /*|| numa_off*/)
		return NUMA_NO_NODE;

[2] https://lkml.org/lkml/2019/8/16/367


--
Cheers,
Justin (Jia He)




* Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake
  2020-07-06 12:47       ` Justin He
@ 2020-07-06 13:03         ` Jonathan Cameron
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2020-07-06 13:03 UTC (permalink / raw)
  To: Justin He
  Cc: Catalin Marinas, Will Deacon, Andrew Morton, Mike Rapoport,
	Baoquan He, Chuhong Yuan, linux-arm-kernel, linux-kernel,
	linux-mm, Kaly Xin, David Hildenbrand

On Mon, 6 Jul 2020 12:47:51 +0000
Justin He <Justin.He@arm.com> wrote:

> Hi Jonathan, thanks for the comments.
> 
> > -----Original Message-----
> > From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> > Sent: Monday, July 6, 2020 6:46 PM
> > To: Justin He <Justin.He@arm.com>
> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Will Deacon
> > <will@kernel.org>; Andrew Morton <akpm@linux-foundation.org>; Mike
> > Rapoport <rppt@linux.ibm.com>; Baoquan He <bhe@redhat.com>; Chuhong Yuan
> > <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux-
> > kernel@vger.kernel.org; linux-mm@kvack.org; Kaly Xin <Kaly.Xin@arm.com>
> > Subject: Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node
> > is fake
> > 
> > On Mon, 6 Jul 2020 11:29:21 +0100
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> >   
> > > On Mon, 6 Jul 2020 09:19:45 +0800
> > > Jia He <justin.he@arm.com> wrote:
> > >
> > > Hi,
> > >  
> > > > Previously, numa_off is set to true unconditionally in  
> > dummy_numa_init(),  
> > > > even if there is a fake numa node.
> > > >
> > > > But acpi will translate node id to NUMA_NO_NODE(-1) in  
> > acpi_map_pxm_to_node()  
> > > > because it regards numa_off as turning off the numa node.  
> > >
> > > That is correct.  It is operating exactly as it should, if SRAT hasn't  
> > been parsed  
> > > and you are on ACPI platform there are no nodes.  They cannot be created  
> > at  
> > > some later date.  The dummy code doesn't change this. It just does  
> > enough to carry  
> > > on operating with no specified nodes.
> > >  
> > > >
> > > > Without this patch, pmem can't be probed as a RAM device on arm64 if  
> > SRAT table  
> > > > isn't present.
> > > >
> > > > $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g  
> > -a 64K  
> > > > kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with  
> > invalid node: -1  
> > > > kmem: probe of dax0.0 failed with error -22
> > > >
> > > > This fixes it by setting numa_off to false.  
> > >
> > > Without the SRAT protection patch [1] you may well run into problems  
> 
> Sorry, doesn't quite understand here. Do you mean your [1] can resolve this
> issue? But acpi_map_pxm_to_node() has returned with NUMA_NO_NODE after
> following check:
> 	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS || numa_off)
> 		return NUMA_NO_NODE;

The point of that patch is it will make it safe to remove the numa_off because
any later accidental reference to a non existent node (i.e. one not defined
in SRAT) will not blow up.

It doesn't fix your original problem. What it does do, is fix the new problem case
you introduce by removing numa_off below.  It ensures you still return NUMA_NO_NODE
in cases which should do so (i.e. all of them if you have no SRAT and are using ACPI).

Of course, you could just not remove the numa_off = true bit then you won't hit
that condition anyway. There are plenty of other reasons for the SRAT patch though,
it just happens to close a problem you were introducing here as well.

For reference we had an AMD platform that had no SRAT, but provided _PXM for
a few nodes in its DSDT.   That resulted in non-booting systems.  It only affected
x86 because ARM64 had that numa_off = true being set.  If we change the arm64 case
without the patch to ensure the underlying problem is fixed, you are very likely to hit
the equivalent problem. There may well be platforms out there relying on that quirk
of what the code currently does.
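
For illustration, the interaction between numa_off and the PXM lookup can be
modelled in userspace C. This is a simplified sketch, not the kernel code:
pxm_map_reset(), srat_define() and lookup_pxm_to_node() are hypothetical
names, MAX_PXM_DOMAINS is shrunk to 8, and only the range/numa_off check is
taken from the snippet quoted above. The lookup-only behaviour (never creating
a node for a PXM that SRAT did not define) is an assumption about what the
SRAT patch is meant to guarantee.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE    (-1)
#define MAX_PXM_DOMAINS 8	/* illustrative size, not the kernel's value */

/* Illustrative stand-ins for kernel state; names mirror the thread. */
static bool numa_off;
static int pxm_to_node_map[MAX_PXM_DOMAINS];

/* Reset the table so that no proximity domain maps to a node. */
static void pxm_map_reset(void)
{
	for (int i = 0; i < MAX_PXM_DOMAINS; i++)
		pxm_to_node_map[i] = NUMA_NO_NODE;
}

/* Hypothetical helper: record a mapping, as SRAT parsing would. */
static void srat_define(int pxm, int node)
{
	assert(pxm >= 0 && pxm < MAX_PXM_DOMAINS);
	pxm_to_node_map[pxm] = node;
}

/* Lookup only: a PXM that was never defined stays NUMA_NO_NODE. */
static int lookup_pxm_to_node(int pxm)
{
	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS || numa_off)
		return NUMA_NO_NODE;
	return pxm_to_node_map[pxm];
}
```

In this model, setting numa_off makes every lookup return NUMA_NO_NODE
regardless of what the table holds, and a _PXM value with no SRAT entry
returns NUMA_NO_NODE even when numa_off is false.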

> It seems that even with your patch [1], this does not help? Please correct me
> if my understanding is wrong.
> [1] https://patchwork.kernel.org/patch/11632063/
> 
> > > > because someone somewhere will have _PXM in a DSDT but will
> > > > have a non existent SRAT.   We had this happen on an AMD platform when
> > > > we tried to introduce working _PXM support for PCI. [2]
> > > >
> > > > So whilst this seems superficially safe, I'd definitely be crossing my
> > > > fingers.
> > > > Note, at that time I proposed putting the numa_off = false into the x86
> > > > code path precisely to cut out that possibility (was rejected at the time,
> > > > at least partly because the clarifications to the ACPI spec were not public.)
> > > >
> > > > The patch in [1] should sort things out however by ensuring we only
> > > > create new domains where we should actually be doing so. However, in your
> > > > case it will return NUMA_NO_NODE anyway so this isn't the right way to fix
> > > > things.
> 
> Okay, let me try to summarize. There are 3 possible ways to fix this:
> 1. this patch, which it seems neither you nor David is satisfied with 😉
> 2. my previous proposal [2], similar to what David suggested

That looks like the correct approach to me as well.

> 3. remove numa_off check in acpi_map_pxm_to_node()

No way on that one.  The only right return value from acpi_map_pxm_to_node
when no node is provided (always the case if you have no SRAT) is
NUMA_NO_NODE.  Do not paper over that - fix the caller to handle
a perfectly valid return value.

Jonathan

> e.g.
> ...
> 	if (pxm < 0 || pxm >= MAX_PXM_DOMAINS /*|| numa_off*/)
> 		return NUMA_NO_NODE;
> 
> [2] https://lkml.org/lkml/2019/8/16/367
> 
> 
> --
> Cheers,
> Justin (Jia He)
> 
> 





* RE: [PATCH 2/3] mm/memory_hotplug: harden try_offline_node against bogus nid
  2020-07-06  7:57   ` David Hildenbrand
@ 2020-07-06 13:45     ` Justin He
  0 siblings, 0 replies; 15+ messages in thread
From: Justin He @ 2020-07-06 13:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin,
	Catalin Marinas, Will Deacon

Hi David

> -----Original Message-----
> From: David Hildenbrand <david@redhat.com>
> Sent: Monday, July 6, 2020 3:58 PM
> To: Justin He <Justin.He@arm.com>; Catalin Marinas
> <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>; Mike Rapoport
> <rppt@linux.ibm.com>; Baoquan He <bhe@redhat.com>; Chuhong Yuan
> <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; Kaly Xin <Kaly.Xin@arm.com>
> Subject: Re: [PATCH 2/3] mm/memory_hotplug: harden try_offline_node
> against bogus nid
> 
> On 06.07.20 03:19, Jia He wrote:
> > When testing the remove_memory path of dax pmem, there will be a panic
> > with call trace:
> >   try_remove_memory+0x84/0x170
> >   remove_memory+0x38/0x58
> >   dev_dax_kmem_remove+0x3c/0x84 [kmem]
> >   device_release_driver_internal+0xfc/0x1c8
> >   device_release_driver+0x28/0x38
> >   bus_remove_device+0xd4/0x158
> >   device_del+0x160/0x3a0
> >   unregister_dev_dax+0x30/0x68
> >   devm_action_release+0x20/0x30
> >   release_nodes+0x150/0x240
> >   devres_release_all+0x6c/0x1d0
> >   device_release_driver_internal+0x10c/0x1c8
> >   driver_detach+0xac/0x170
> >   bus_remove_driver+0x64/0x130
> >   driver_unregister+0x34/0x60
> >   dax_pmem_exit+0x14/0xffc4 [dax_pmem]
> >   __arm64_sys_delete_module+0x18c/0x2d0
> >   el0_svc_common.constprop.2+0x78/0x168
> >   do_el0_svc+0x34/0xa0
> >   el0_sync_handler+0xe0/0x188
> >   el0_sync+0x164/0x180
> >
> > It is caused by the bogus nid (-1). Although the root cause is pmem dax
> > translates from pxm to node_id incorrectly due to numa_off, it is worth
> > hardening the code in try_offline_node(), quitting if !pgdat.
> >
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  mm/memory_hotplug.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index da374cd3d45b..e1e290577b45 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1680,6 +1680,9 @@ void try_offline_node(int nid)
> >  	pg_data_t *pgdat = NODE_DATA(nid);
> >  	int rc;
> >
> > +	if (WARN_ON(!pgdat))
> > +		return;
> > +
> >  	/*
> >  	 * If the node still spans pages (especially ZONE_DEVICE), don't
> >  	 * offline it. A node spans memory after move_pfn_range_to_zone(),
> >
> 
> Hm. If I am not wrong, somebody used add_memory() with another nid than
> try_remove_memory()?
> 

Yes. Since commit fa6d9ec790550, that possibility is prevented.
I will drop this patch. Thanks
--
Cheers,
Justin (Jia He)




* Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake
  2020-07-06 12:36     ` Justin He
@ 2020-07-06 13:56       ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2020-07-06 13:56 UTC (permalink / raw)
  To: Justin He, Catalin Marinas, Will Deacon
  Cc: Andrew Morton, Mike Rapoport, Baoquan He, Chuhong Yuan,
	linux-arm-kernel, linux-kernel, linux-mm, Kaly Xin

On 06.07.20 14:36, Justin He wrote:
> Hi David, thanks for the comments. See my answer please:
> 
>> -----Original Message-----
>> From: David Hildenbrand <david@redhat.com>
>> Sent: Monday, July 6, 2020 4:03 PM
>> To: Justin He <Justin.He@arm.com>; Catalin Marinas
>> <Catalin.Marinas@arm.com>; Will Deacon <will@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>; Mike Rapoport
>> <rppt@linux.ibm.com>; Baoquan He <bhe@redhat.com>; Chuhong Yuan
>> <hslester96@gmail.com>; linux-arm-kernel@lists.infradead.org; linux-
>> kernel@vger.kernel.org; linux-mm@kvack.org; Kaly Xin <Kaly.Xin@arm.com>
>> Subject: Re: [PATCH 1/3] arm64/numa: set numa_off to false when numa node
>> is fake
>>
>> On 06.07.20 03:19, Jia He wrote:
>>> Previously, numa_off is set to true unconditionally in dummy_numa_init(),
>>> even if there is a fake numa node.
>>>
>>> But acpi will translate node id to NUMA_NO_NODE(-1) in acpi_map_pxm_to_node()
>>> because it regards numa_off as turning off the numa node.
>>>
>>> Without this patch, pmem can't be probed as a RAM device on arm64 if
>>> SRAT table isn't present.
>>>
>>> $ndctl create-namespace -fe namespace0.0 --mode=devdax --map=dev -s 1g -a 64K
>>> kmem dax0.0: rejecting DAX region [mem 0x240400000-0x2bfffffff] with invalid node: -1
>>> kmem: probe of dax0.0 failed with error -22
>>>
>>> This fixes it by setting numa_off to false.
>>>
>>> Signed-off-by: Jia He <justin.he@arm.com>
>>> ---
>>>  arch/arm64/mm/numa.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> index aafcee3e3f7e..7689986020d9 100644
>>> --- a/arch/arm64/mm/numa.c
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -440,7 +440,8 @@ static int __init dummy_numa_init(void)
>>>  		return ret;
>>>  	}
>>>
>>> -	numa_off = true;
>>> +	/* force numa_off to be false since we have a fake numa node here */
>>> +	numa_off = false;
>>>  	return 0;
>>>  }
>>>
>>>
>>
>> What would happen if we use something like this in drivers/dax/kmem.c
>> instead:
>>
>> numa_node = dev_dax->target_node;
>> if (numa_node == NUMA_NO_NODE)
>> 	numa_node = memory_add_physaddr_to_nid(kmem_start);
>>
>> and eventually dropping the pr_warn in
>> arm64/memory_add_physaddr_to_nid() ? Would that work?
> 
> Yes, it works. I sent a similar patch [1] before, but it seems the pmem
> maintainer was not satisfied with it. Do you think memory_add_physaddr_to_nid()
> is better than numa_mem_id()?

Well, it's the somewhat-common way to get a NID for memory hotadd.

E.g.,
- drivers/acpi/acpi_memhotplug.c
- drivers/base/memory.c
- drivers/hv/hv_balloon.c
- drivers/virtio/virtio_mem.c
- drivers/xen/balloon.c

use it in combination with add_memory_*()

In particular, ACPI and virtio-mem use it when NUMA_NO_NODE is detected.
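
The fallback suggested earlier in this message can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
model_physaddr_to_nid() is a hypothetical stand-in for
memory_add_physaddr_to_nid(), whose real implementation is per-architecture,
and pick_hotadd_node() is an invented name, not a function in
drivers/dax/kmem.c.

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/*
 * Hypothetical stand-in for memory_add_physaddr_to_nid(). This model
 * always answers node 0; real architectures may consult their own
 * address-to-node tables.
 */
static int model_physaddr_to_nid(uint64_t start)
{
	(void)start;
	return 0;
}

/*
 * Pick the node for a hot-added range: keep the firmware-provided
 * target node when there is one, otherwise fall back to the
 * address-based lookup instead of rejecting the region.
 */
static int pick_hotadd_node(int target_node, uint64_t start)
{
	int numa_node = target_node;

	if (numa_node == NUMA_NO_NODE)
		numa_node = model_physaddr_to_nid(start);

	/* After the fallback the caller always holds a usable node id. */
	assert(numa_node != NUMA_NO_NODE);
	return numa_node;
}
```

The point of the design is that NUMA_NO_NODE stays a valid answer from the
ACPI lookup, and the caller decides what to do with it.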

-- 
Thanks,

David / dhildenb




* Re: [PATCH 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done
  2020-07-06  7:49   ` David Hildenbrand
@ 2020-07-07 22:10     ` Dan Williams
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2020-07-07 22:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Jia He, Catalin Marinas, Will Deacon, Andrew Morton,
	Mike Rapoport, Baoquan He, Chuhong Yuan, Linux ARM,
	Linux Kernel Mailing List, Linux MM, Kaly Xin, Michal Hocko

On Mon, Jul 6, 2020 at 12:50 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 06.07.20 03:19, Jia He wrote:
> > When check_memblock_offlined_cb() returns a failing rc (e.g. the memblock is
> > online at that time), mem_hotplug_begin/done is unpaired.
> >
> > Therefore a warning:
> >  Call Trace:
> >   percpu_up_write+0x33/0x40
> >   try_remove_memory+0x66/0x120
> >   ? _cond_resched+0x19/0x30
> >   remove_memory+0x2b/0x40
> >   dev_dax_kmem_remove+0x36/0x72 [kmem]
> >   device_release_driver_internal+0xf0/0x1c0
> >   device_release_driver+0x12/0x20
> >   bus_remove_device+0xe1/0x150
> >   device_del+0x17b/0x3e0
> >   unregister_dev_dax+0x29/0x60
> >   devm_action_release+0x15/0x20
> >   release_nodes+0x19a/0x1e0
> >   devres_release_all+0x3f/0x50
> >   device_release_driver_internal+0x100/0x1c0
> >   driver_detach+0x4c/0x8f
> >   bus_remove_driver+0x5c/0xd0
> >   driver_unregister+0x31/0x50
> >   dax_pmem_exit+0x10/0xfe0 [dax_pmem]
> >
> > This fixes it by moving mem_hotplug_done ahead of "done"
> >
> > Signed-off-by: Jia He <justin.he@arm.com>
> > ---
> >  mm/memory_hotplug.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index e1e290577b45..86b36714342b 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1769,8 +1769,8 @@ static int __ref try_remove_memory(int nid, u64 start, u64 size)
> >
> >       try_offline_node(nid);
> >
> > -done:
> >       mem_hotplug_done();
> > +done:
> >       return rc;
> >  }
> >
> >
>
> Just drop the "done" label, use "return rc;" directly instead of the
> goto, and "return 0;" at the end.
>
> Also, please add
>
> Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat")
>
> and
>
> Cc: stable@vger.kernel.org # v5.6+
>
> Thanks!

Yes, thanks, you can also add:

Acked-by: Dan Williams <dan.j.williams@intel.com>
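
The control flow David suggests (no "done" label, early "return rc;", final
"return 0;") can be modelled in userspace as below. This is a sketch under
assumptions, not the kernel function: the lock is reduced to a depth counter,
all model_* names are hypothetical, and it assumes the offlined check runs
before the hotplug lock is taken, which is what the unpaired-unlock warning
in the patch description implies.

```c
#include <assert.h>
#include <errno.h>

/* Depth counter standing in for the memory hotplug lock. */
static int hotplug_lock_depth;

static void model_hotplug_begin(void)
{
	hotplug_lock_depth++;
}

static void model_hotplug_done(void)
{
	/* Releasing a lock that was never taken is the bug being fixed. */
	assert(hotplug_lock_depth > 0);
	hotplug_lock_depth--;
}

/* Hypothetical check: fails while any memory block is still online. */
static int model_check_offlined(int blocks_online)
{
	return blocks_online ? -EBUSY : 0;
}

static int model_try_remove_memory(int blocks_online)
{
	int rc = model_check_offlined(blocks_online);

	if (rc)
		return rc;	/* lock not taken yet, nothing to release */

	model_hotplug_begin();
	/* ... the actual teardown would happen here ... */
	model_hotplug_done();

	return 0;
}
```

With the early return, the failure path never reaches the unlock, so begin
and done stay paired on every path.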



Thread overview: 15+ messages
2020-07-06  1:19 [PATCH 0/3] Fix and enable pmem as RAM on arm64 Jia He
2020-07-06  1:19 ` [PATCH 1/3] arm64/numa: set numa_off to false when numa node is fake Jia He
2020-07-06  8:02   ` David Hildenbrand
2020-07-06 12:36     ` Justin He
2020-07-06 13:56       ` David Hildenbrand
2020-07-06 10:29   ` Jonathan Cameron
2020-07-06 10:46     ` Jonathan Cameron
2020-07-06 12:47       ` Justin He
2020-07-06 13:03         ` Jonathan Cameron
2020-07-06  1:19 ` [PATCH 2/3] mm/memory_hotplug: harden try_offline_node against bogus nid Jia He
2020-07-06  7:57   ` David Hildenbrand
2020-07-06 13:45     ` Justin He
2020-07-06  1:19 ` [PATCH 3/3] mm/memory_hotplug: fix unpaired mem_hotplug_begin/done Jia He
2020-07-06  7:49   ` David Hildenbrand
2020-07-07 22:10     ` Dan Williams
