linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: Fix false softlockup during pfn range removal
@ 2020-06-19 23:12 Ben Widawsky
  2020-06-22  7:16 ` David Hildenbrand
  2020-06-23  7:18 ` David Hildenbrand
  0 siblings, 2 replies; 5+ messages in thread
From: Ben Widawsky @ 2020-06-19 23:12 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Ben Widawsky, Scargall, Steve, Dan Williams,
	David Hildenbrand, Vishal Verma

When working with very large nodes, poisoning the struct pages (for
which there will be very many) can take a very long time. If the system
is using voluntary preemptions, the software watchdog will not be able
to detect forward progress. This patch addresses this issue by offering
to give up time like __remove_pages() does.  This behavior was
introduced in v5.6 with:
commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")

Alternately, init_page_poison could do this cond_resched(), but it seems
to me that the caller of init_page_poison() is what actually knows
whether or not it should relax its own priority.

Based on Dan's notes, I think this is perfectly safe:
commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")

Aside from fixing the lockup, it is also a friendlier thing to do on
lower core systems that might wipe out large chunks of hotplug memory
(probably not a very common case).

Fixes this kind of splat:
[  352.142079] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
[  352.150067] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat kmem intel_rapl_msr intel_rapl_common rpcrdma sunrpc rdma_ucm ib_iser isst_if_common rdma_cm skx_edac iw_cm ib_cm x86_pkg_temp_thermal libiscsi intel_powerclamp scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i40iw intel_cstate iTCO_wdt ib_uverbs iTCO_vendor_support device_dax ib_core joydev intel_uncore i2c_i801 lpc_ich ipmi_ssif ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem
[  352.150123]  dax_pmem_core acpi_pad acpi_power_meter drm ip_tables xfs libcrc32c nd_pmem nd_btt i40e e1000e crc32c_intel nfit
[  352.260774] irq event stamp: 138450
[  352.264692] hardirqs last  enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c
[  352.275134] hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c
[  352.285662] softirqs last  enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456
[  352.295233] softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0
[  352.304210] CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
[  352.313283] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
[  352.324308] RIP: 0010:memset_erms+0x9/0x10
[  352.328905] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
[  352.349953] RSP: 0018:ffffc90021b5fd50 EFLAGS: 00010202 ORIG_RAX: ffffffffffffff13
[  352.358450] RAX: 00000000000000ff RBX: ffff88983ffd5000 RCX: 0000000065df0e40
[  352.366457] RDX: 00000003a0000000 RSI: 00000000ffffffff RDI: ffffea039b20f1c0
[  352.374465] RBP: ffff88983ffd6c00 R08: 0000000000000000 R09: ffffea0061000000
[  352.382473] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea006f808000
[  352.390480] R13: 0000000001840000 R14: 000000000e800000 R15: ffff8997d7b764e0
[  352.398487] FS:  00007f724ef81780(0000) GS:ffff8997df100000(0000) knlGS:0000000000000000
[  352.407562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  352.414011] CR2: 00007ffcd4070768 CR3: 000001178c722002 CR4: 00000000003606e0
[  352.422056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  352.430092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  352.438115] Call Trace:
[  352.440865]  remove_pfn_range_from_zone+0x3a/0x380
[  352.446244]  ? cpumask_next+0x17/0x20
[  352.450361]  memunmap_pages+0x17f/0x280
[  352.454670]  release_nodes+0x22a/0x260
[  352.458888]  __device_release_driver+0x172/0x220
[  352.464070]  device_driver_detach+0x3e/0xa0
[  352.468753]  unbind_store+0x113/0x130
[  352.472868]  kernfs_fop_write+0xdc/0x1c0
[  352.477273]  vfs_write+0xde/0x1d0
[  352.482218]  ksys_write+0x58/0xd0
[  352.487207]  do_syscall_64+0x5a/0x120
[  352.492529]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  352.499446] RIP: 0033:0x7f724f40b5e7
[  352.504673] Code: Bad RIP value.
[  352.509484] RSP: 002b:00007ffcd40738f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  352.519213] RAX: ffffffffffffffda RBX: 00007f724ef816a8 RCX: 00007f724f40b5e7
[  352.528410] RDX: 0000000000000007 RSI: 00005617d7cd1277 RDI: 0000000000000003
[  352.537573] RBP: 0000000000000003 R08: 00000000ffffffff R09: 00007ffcd40737d0
[  352.546764] R10: 0000000000000000 R11: 0000000000000246 R12: 00005617d7cd1277
[  352.555929] R13: 0000000000000000 R14: 0000000000000007 R15: 00005617d7cd1230
[  370.353742] Built 2 zonelists, mobility grouping on.  Total pages: 49050381
[  370.373317] Policy zone: Normal
[  374.948164] Built 3 zonelists, mobility grouping on.  Total pages: 49312525
[  375.017496] Policy zone: Normal

Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
 mm/memory_hotplug.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9b34e03e730a..da374cd3d45b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -471,11 +471,20 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
 				      unsigned long start_pfn,
 				      unsigned long nr_pages)
 {
+	const unsigned long end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
-	unsigned long flags;
+	unsigned long pfn, cur_nr_pages, flags;
 
 	/* Poison struct pages because they are now uninitialized again. */
-	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
+	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
+		cond_resched();
+
+		/* Select all remaining pages up to the next section boundary */
+		cur_nr_pages =
+			min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn);
+		page_init_poison(pfn_to_page(pfn),
+				 sizeof(struct page) * cur_nr_pages);
+	}
 
 #ifdef CONFIG_ZONE_DEVICE
 	/*
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: Fix false softlockup during pfn range removal
  2020-06-19 23:12 [PATCH] mm: Fix false softlockup during pfn range removal Ben Widawsky
@ 2020-06-22  7:16 ` David Hildenbrand
  2020-06-22 16:25   ` Ben Widawsky
  2020-06-23  7:18 ` David Hildenbrand
  1 sibling, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2020-06-22  7:16 UTC (permalink / raw)
  To: Ben Widawsky, linux-mm
  Cc: Andrew Morton, Scargall, Steve, Dan Williams, Vishal Verma

On 20.06.20 01:12, Ben Widawsky wrote:
> When working with very large nodes, poisoning the struct pages (for
> which there will be very many) can take a very long time. If the system
> is using voluntary preemptions, the software watchdog will not be able
> to detect forward progress. This patch addresses this issue by offering
> to give up time like __remove_pages() does.  This behavior was
> introduced in v5.6 with:
> commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")

We try to stay <= 72 chars in the commit message (except for kernel splats).

> 
> Alternately, init_page_poison could do this cond_resched(), but it seems
> to me that the caller of init_page_poison() is what actually knows
> whether or not it should relax its own priority.
> 
> Based on Dan's notes, I think this is perfectly safe:
> commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")

We shouldn't be holding any spin locks, so it's safe. (we could think
about doing this outside of the memory hotplug lock in the case of
devmem, however, it's only a debugging feature so we most probably don't
care).

> 
> Aside from fixing the lockup, it is also a friendlier thing to do on
> lower core systems that might wipe out large chunks of hotplug memory
> (probably not a very common case).

It really only is an issue for devmem. Ordinary hotplugged system memory
is not affected (onlined/offlined in memory block granularity).

> 
> Fixes this kind of splat:
> [  352.142079] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
> [  352.150067] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat kmem intel_rapl_msr intel_rapl_common rpcrdma sunrpc rdma_ucm ib_iser isst_if_common rdma_cm skx_edac iw_cm ib_cm x86_pkg_temp_thermal libiscsi intel_powerclamp scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i40iw intel_cstate iTCO_wdt ib_uverbs iTCO_vendor_support device_dax ib_core joydev intel_uncore i2c_i801 lpc_ich ipmi_ssif ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem
> [  352.150123]  dax_pmem_core acpi_pad acpi_power_meter drm ip_tables xfs libcrc32c nd_pmem nd_btt i40e e1000e crc32c_intel nfit
> [  352.260774] irq event stamp: 138450
> [  352.264692] hardirqs last  enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c
> [  352.275134] hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c
> [  352.285662] softirqs last  enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456
> [  352.295233] softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0
> [  352.304210] CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
> [  352.313283] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
> [  352.324308] RIP: 0010:memset_erms+0x9/0x10
> [  352.328905] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
> [  352.349953] RSP: 0018:ffffc90021b5fd50 EFLAGS: 00010202 ORIG_RAX: ffffffffffffff13
> [  352.358450] RAX: 00000000000000ff RBX: ffff88983ffd5000 RCX: 0000000065df0e40
> [  352.366457] RDX: 00000003a0000000 RSI: 00000000ffffffff RDI: ffffea039b20f1c0
> [  352.374465] RBP: ffff88983ffd6c00 R08: 0000000000000000 R09: ffffea0061000000
> [  352.382473] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea006f808000
> [  352.390480] R13: 0000000001840000 R14: 000000000e800000 R15: ffff8997d7b764e0
> [  352.398487] FS:  00007f724ef81780(0000) GS:ffff8997df100000(0000) knlGS:0000000000000000
> [  352.407562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  352.414011] CR2: 00007ffcd4070768 CR3: 000001178c722002 CR4: 00000000003606e0
> [  352.422056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  352.430092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  352.438115] Call Trace:
> [  352.440865]  remove_pfn_range_from_zone+0x3a/0x380
> [  352.446244]  ? cpumask_next+0x17/0x20
> [  352.450361]  memunmap_pages+0x17f/0x280
> [  352.454670]  release_nodes+0x22a/0x260
> [  352.458888]  __device_release_driver+0x172/0x220
> [  352.464070]  device_driver_detach+0x3e/0xa0
> [  352.468753]  unbind_store+0x113/0x130
> [  352.472868]  kernfs_fop_write+0xdc/0x1c0
> [  352.477273]  vfs_write+0xde/0x1d0
> [  352.482218]  ksys_write+0x58/0xd0
> [  352.487207]  do_syscall_64+0x5a/0x120
> [  352.492529]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> [  352.499446] RIP: 0033:0x7f724f40b5e7
> [  352.504673] Code: Bad RIP value.
> [  352.509484] RSP: 002b:00007ffcd40738f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  352.519213] RAX: ffffffffffffffda RBX: 00007f724ef816a8 RCX: 00007f724f40b5e7
> [  352.528410] RDX: 0000000000000007 RSI: 00005617d7cd1277 RDI: 0000000000000003
> [  352.537573] RBP: 0000000000000003 R08: 00000000ffffffff R09: 00007ffcd40737d0
> [  352.546764] R10: 0000000000000000 R11: 0000000000000246 R12: 00005617d7cd1277
> [  352.555929] R13: 0000000000000000 R14: 0000000000000007 R15: 00005617d7cd1230
> [  370.353742] Built 2 zonelists, mobility grouping on.  Total pages: 49050381
> [  370.373317] Policy zone: Normal
> [  374.948164] Built 3 zonelists, mobility grouping on.  Total pages: 49312525
> [  375.017496] Policy zone: Normal

I'd have stripped this to a reasonable minimum.

> 
> Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
> Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
> Reported-by: Ben Widawsky <ben.widawsky@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
>  mm/memory_hotplug.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9b34e03e730a..da374cd3d45b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -471,11 +471,20 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
>  				      unsigned long start_pfn,
>  				      unsigned long nr_pages)
>  {
> +	const unsigned long end_pfn = start_pfn + nr_pages;
>  	struct pglist_data *pgdat = zone->zone_pgdat;
> -	unsigned long flags;
> +	unsigned long pfn, cur_nr_pages, flags;
>  
>  	/* Poison struct pages because they are now uninitialized again. */
> -	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
> +		cond_resched();
> +
> +		/* Select all remaining pages up to the next section boundary */
> +		cur_nr_pages =
> +			min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn);

Nit: I'd have used the same indentation as the code you copied this from.

> +		page_init_poison(pfn_to_page(pfn),
> +				 sizeof(struct page) * cur_nr_pages);
> +	}
>  
>  #ifdef CONFIG_ZONE_DEVICE
>  	/*
> 

Thanks!

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: Fix false softlockup during pfn range removal
  2020-06-22  7:16 ` David Hildenbrand
@ 2020-06-22 16:25   ` Ben Widawsky
  2020-06-22 16:29     ` David Hildenbrand
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Widawsky @ 2020-06-22 16:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, Andrew Morton, Scargall, Steve, Dan Williams, Vishal Verma

On 20-06-22 09:16:20, David Hildenbrand wrote:
> On 20.06.20 01:12, Ben Widawsky wrote:
> > When working with very large nodes, poisoning the struct pages (for
> > which there will be very many) can take a very long time. If the system
> > is using voluntary preemptions, the software watchdog will not be able
> > to detect forward progress. This patch addresses this issue by offering
> > to give up time like __remove_pages() does.  This behavior was
> > introduced in v5.6 with:
> > commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
> 
> We try to stay <= 72 chars in the commit message (except for kernel splats).
> 

Thanks. checkpatch does complain fwiw, I just somehow missed i.

> > 
> > Alternately, init_page_poison could do this cond_resched(), but it seems
> > to me that the caller of init_page_poison() is what actually knows
> > whether or not it should relax its own priority.
> > 
> > Based on Dan's notes, I think this is perfectly safe:
> > commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")
> 
> We shouldn't be holding any spin locks, so it's safe. (we could think
> about doing this outside of the memory hotplug lock in the case of
> devmem, however, it's only a debugging feature so we most probably don't
> care).
> 
> > 
> > Aside from fixing the lockup, it is also a friendlier thing to do on
> > lower core systems that might wipe out large chunks of hotplug memory
> > (probably not a very common case).
> 
> It really only is an issue for devmem. Ordinary hotplugged system memory
> is not affected (onlined/offlined in memory block granularity).

Could you explain this a bit? I was fixing the issue found on PMEM systems, but
it seems like regularly memory hotplug was potentially a victim. I think one of
the reasons PMEM might be more likely is the time it takes to work with any data
structures store in the PMEM itself is slower (just a guess).

> 
> > 
> > Fixes this kind of splat:
> > [  352.142079] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
> > [  352.150067] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat kmem intel_rapl_msr intel_rapl_common rpcrdma sunrpc rdma_ucm ib_iser isst_if_common rdma_cm skx_edac iw_cm ib_cm x86_pkg_temp_thermal libiscsi intel_powerclamp scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i40iw intel_cstate iTCO_wdt ib_uverbs iTCO_vendor_support device_dax ib_core joydev intel_uncore i2c_i801 lpc_ich ipmi_ssif ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem
> > [  352.150123]  dax_pmem_core acpi_pad acpi_power_meter drm ip_tables xfs libcrc32c nd_pmem nd_btt i40e e1000e crc32c_intel nfit
> > [  352.260774] irq event stamp: 138450
> > [  352.264692] hardirqs last  enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c
> > [  352.275134] hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c
> > [  352.285662] softirqs last  enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456
> > [  352.295233] softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0
> > [  352.304210] CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
> > [  352.313283] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
> > [  352.324308] RIP: 0010:memset_erms+0x9/0x10
> > [  352.328905] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
> > [  352.349953] RSP: 0018:ffffc90021b5fd50 EFLAGS: 00010202 ORIG_RAX: ffffffffffffff13
> > [  352.358450] RAX: 00000000000000ff RBX: ffff88983ffd5000 RCX: 0000000065df0e40
> > [  352.366457] RDX: 00000003a0000000 RSI: 00000000ffffffff RDI: ffffea039b20f1c0
> > [  352.374465] RBP: ffff88983ffd6c00 R08: 0000000000000000 R09: ffffea0061000000
> > [  352.382473] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea006f808000
> > [  352.390480] R13: 0000000001840000 R14: 000000000e800000 R15: ffff8997d7b764e0
> > [  352.398487] FS:  00007f724ef81780(0000) GS:ffff8997df100000(0000) knlGS:0000000000000000
> > [  352.407562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  352.414011] CR2: 00007ffcd4070768 CR3: 000001178c722002 CR4: 00000000003606e0
> > [  352.422056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  352.430092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  352.438115] Call Trace:
> > [  352.440865]  remove_pfn_range_from_zone+0x3a/0x380
> > [  352.446244]  ? cpumask_next+0x17/0x20
> > [  352.450361]  memunmap_pages+0x17f/0x280
> > [  352.454670]  release_nodes+0x22a/0x260
> > [  352.458888]  __device_release_driver+0x172/0x220
> > [  352.464070]  device_driver_detach+0x3e/0xa0
> > [  352.468753]  unbind_store+0x113/0x130
> > [  352.472868]  kernfs_fop_write+0xdc/0x1c0
> > [  352.477273]  vfs_write+0xde/0x1d0
> > [  352.482218]  ksys_write+0x58/0xd0
> > [  352.487207]  do_syscall_64+0x5a/0x120
> > [  352.492529]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> > [  352.499446] RIP: 0033:0x7f724f40b5e7
> > [  352.504673] Code: Bad RIP value.
> > [  352.509484] RSP: 002b:00007ffcd40738f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [  352.519213] RAX: ffffffffffffffda RBX: 00007f724ef816a8 RCX: 00007f724f40b5e7
> > [  352.528410] RDX: 0000000000000007 RSI: 00005617d7cd1277 RDI: 0000000000000003
> > [  352.537573] RBP: 0000000000000003 R08: 00000000ffffffff R09: 00007ffcd40737d0
> > [  352.546764] R10: 0000000000000000 R11: 0000000000000246 R12: 00005617d7cd1277
> > [  352.555929] R13: 0000000000000000 R14: 0000000000000007 R15: 00005617d7cd1230
> > [  370.353742] Built 2 zonelists, mobility grouping on.  Total pages: 49050381
> > [  370.373317] Policy zone: Normal
> > [  374.948164] Built 3 zonelists, mobility grouping on.  Total pages: 49312525
> > [  375.017496] Policy zone: Normal
> 
> I'd have stripped this to a reasonable minimum.
> 
> > 
> > Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
> > Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
> > Reported-by: Ben Widawsky <ben.widawsky@intel.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Vishal Verma <vishal.l.verma@intel.com>
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > ---
> >  mm/memory_hotplug.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 9b34e03e730a..da374cd3d45b 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -471,11 +471,20 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
> >  				      unsigned long start_pfn,
> >  				      unsigned long nr_pages)
> >  {
> > +	const unsigned long end_pfn = start_pfn + nr_pages;
> >  	struct pglist_data *pgdat = zone->zone_pgdat;
> > -	unsigned long flags;
> > +	unsigned long pfn, cur_nr_pages, flags;
> >  
> >  	/* Poison struct pages because they are now uninitialized again. */
> > -	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
> > +	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
> > +		cond_resched();
> > +
> > +		/* Select all remaining pages up to the next section boundary */
> > +		cur_nr_pages =
> > +			min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn);
> 
> Nit: I'd have used the same indentation as the code you copied this from.
> 
> > +		page_init_poison(pfn_to_page(pfn),
> > +				 sizeof(struct page) * cur_nr_pages);
> > +	}
> >  
> >  #ifdef CONFIG_ZONE_DEVICE
> >  	/*
> > 
> 
> Thanks!
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> 
> -- 
> Thanks,
> 
> David / dhildenb


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: Fix false softlockup during pfn range removal
  2020-06-22 16:25   ` Ben Widawsky
@ 2020-06-22 16:29     ` David Hildenbrand
  0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2020-06-22 16:29 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, Scargall, Steve, Dan Williams, Vishal Verma

>>> Alternately, init_page_poison could do this cond_resched(), but it seems
>>> to me that the caller of init_page_poison() is what actually knows
>>> whether or not it should relax its own priority.
>>>
>>> Based on Dan's notes, I think this is perfectly safe:
>>> commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")
>>
>> We shouldn't be holding any spin locks, so it's safe. (we could think
>> about doing this outside of the memory hotplug lock in the case of
>> devmem, however, it's only a debugging feature so we most probably don't
>> care).
>>
>>>
>>> Aside from fixing the lockup, it is also a friendlier thing to do on
>>> lower core systems that might wipe out large chunks of hotplug memory
>>> (probably not a very common case).
>>
>> It really only is an issue for devmem. Ordinary hotplugged system memory
>> is not affected (onlined/offlined in memory block granularity).
> 
> Could you explain this a bit? I was fixing the issue found on PMEM systems, but
> it seems like regularly memory hotplug was potentially a victim. I think one of
> the reasons PMEM might be more likely is the time it takes to work with any data
> structures store in the PMEM itself is slower (just a guess).

For system RAM, we have have (except one ppc exception):

memory_block_action()->offline_pages()->remove_pfn_range_from_zone()

Memory blocks span 1..X sections, usually only one. On x86-64, a memory
block is therefore 128MB..2G. Not a sufficiently large memmap to
actually trigger this - in contrast to ZONE_DEVICE regions.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: Fix false softlockup during pfn range removal
  2020-06-19 23:12 [PATCH] mm: Fix false softlockup during pfn range removal Ben Widawsky
  2020-06-22  7:16 ` David Hildenbrand
@ 2020-06-23  7:18 ` David Hildenbrand
  1 sibling, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2020-06-23  7:18 UTC (permalink / raw)
  To: Ben Widawsky, linux-mm
  Cc: Andrew Morton, Scargall, Steve, Dan Williams, Vishal Verma

On 20.06.20 01:12, Ben Widawsky wrote:
> When working with very large nodes, poisoning the struct pages (for
> which there will be very many) can take a very long time. If the system
> is using voluntary preemptions, the software watchdog will not be able
> to detect forward progress. This patch addresses this issue by offering
> to give up time like __remove_pages() does.  This behavior was
> introduced in v5.6 with:
> commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
> 
> Alternately, init_page_poison could do this cond_resched(), but it seems
> to me that the caller of init_page_poison() is what actually knows
> whether or not it should relax its own priority.
> 
> Based on Dan's notes, I think this is perfectly safe:
> commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")
> 
> Aside from fixing the lockup, it is also a friendlier thing to do on
> lower core systems that might wipe out large chunks of hotplug memory
> (probably not a very common case).

BTW, I think this is even a fix for !VMEMMAP. page_init_poison() will
just do a memset. This is only guaranteed to work on section basis
correctly without SPARSE_VMEMMAP.

Thanks!

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-06-23  7:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-19 23:12 [PATCH] mm: Fix false softlockup during pfn range removal Ben Widawsky
2020-06-22  7:16 ` David Hildenbrand
2020-06-22 16:25   ` Ben Widawsky
2020-06-22 16:29     ` David Hildenbrand
2020-06-23  7:18 ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).