mm-commits.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, ben.widawsky@intel.com,
	dan.j.williams@intel.com, david@redhat.com,
	mm-commits@vger.kernel.org, stable@vger.kernel.org,
	steve.scargall@intel.com, torvalds@linux-foundation.org,
	vishal.l.verma@intel.com
Subject: [patch 31/32] mm/memory_hotplug.c: fix false softlockup during pfn range removal
Date: Thu, 25 Jun 2020 20:30:51 -0700	[thread overview]
Message-ID: <20200626033051.73wfBW4YT%akpm@linux-foundation.org> (raw)
In-Reply-To: <20200625202807.b630829d6fa55388148bee7d@linux-foundation.org>

From: Ben Widawsky <ben.widawsky@intel.com>
Subject: mm/memory_hotplug.c: fix false softlockup during pfn range removal

When working with very large nodes, poisoning the struct pages (for which
there will be very many) can take a very long time.  If the system is
using voluntary preemptions, the software watchdog will not be able to
detect forward progress.  This patch addresses this issue by offering to
give up time like __remove_pages() does.  This behavior was introduced in
v5.6 with: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in
remove_pfn_range_from_zone()")

Alternately, init_page_poison could do this cond_resched(), but it seems
to me that the caller of init_page_poison() is what actually knows whether
or not it should relax its own priority.

Based on Dan's notes, I think this is perfectly safe: commit f931ab479dd2
("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")

Aside from fixing the lockup, it is also a friendlier thing to do on lower
core systems that might wipe out large chunks of hotplug memory (probably
not a very common case).

Fixes this kind of splat:
[  352.142079] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
[  352.150067] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat kmem intel_rapl_msr intel_rapl_common rpcrdma sunrpc rdma_ucm ib_iser isst_if_common rdma_cm skx_edac iw_cm ib_cm x86_pkg_temp_thermal libiscsi intel_powerclamp scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i40iw intel_cstate iTCO_wdt ib_uverbs iTCO_vendor_support device_dax ib_core joydev intel_uncore i2c_i801 lpc_ich ipmi_ssif ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem
[  352.150123]  dax_pmem_core acpi_pad acpi_power_meter drm ip_tables xfs libcrc32c nd_pmem nd_btt i40e e1000e crc32c_intel nfit
[  352.260774] irq event stamp: 138450
[  352.264692] hardirqs last  enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c
[  352.275134] hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c
[  352.285662] softirqs last  enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456
[  352.295233] softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0
[  352.304210] CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
[  352.313283] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
[  352.324308] RIP: 0010:memset_erms+0x9/0x10
[  352.328905] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
[  352.349953] RSP: 0018:ffffc90021b5fd50 EFLAGS: 00010202 ORIG_RAX: ffffffffffffff13
[  352.358450] RAX: 00000000000000ff RBX: ffff88983ffd5000 RCX: 0000000065df0e40
[  352.366457] RDX: 00000003a0000000 RSI: 00000000ffffffff RDI: ffffea039b20f1c0
[  352.374465] RBP: ffff88983ffd6c00 R08: 0000000000000000 R09: ffffea0061000000
[  352.382473] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea006f808000
[  352.390480] R13: 0000000001840000 R14: 000000000e800000 R15: ffff8997d7b764e0
[  352.398487] FS:  00007f724ef81780(0000) GS:ffff8997df100000(0000) knlGS:0000000000000000
[  352.407562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  352.414011] CR2: 00007ffcd4070768 CR3: 000001178c722002 CR4: 00000000003606e0
[  352.422056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  352.430092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  352.438115] Call Trace:
[  352.440865]  remove_pfn_range_from_zone+0x3a/0x380
[  352.446244]  ? cpumask_next+0x17/0x20
[  352.450361]  memunmap_pages+0x17f/0x280
[  352.454670]  release_nodes+0x22a/0x260
[  352.458888]  __device_release_driver+0x172/0x220
[  352.464070]  device_driver_detach+0x3e/0xa0
[  352.468753]  unbind_store+0x113/0x130
[  352.472868]  kernfs_fop_write+0xdc/0x1c0
[  352.477273]  vfs_write+0xde/0x1d0
[  352.482218]  ksys_write+0x58/0xd0
[  352.487207]  do_syscall_64+0x5a/0x120
[  352.492529]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  352.499446] RIP: 0033:0x7f724f40b5e7
[  352.504673] Code: Bad RIP value.
[  352.509484] RSP: 002b:00007ffcd40738f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  352.519213] RAX: ffffffffffffffda RBX: 00007f724ef816a8 RCX: 00007f724f40b5e7
[  352.528410] RDX: 0000000000000007 RSI: 00005617d7cd1277 RDI: 0000000000000003
[  352.537573] RBP: 0000000000000003 R08: 00000000ffffffff R09: 00007ffcd40737d0
[  352.546764] R10: 0000000000000000 R11: 0000000000000246 R12: 00005617d7cd1277
[  352.555929] R13: 0000000000000000 R14: 0000000000000007 R15: 00005617d7cd1230
[  370.353742] Built 2 zonelists, mobility grouping on.  Total pages: 49050381
[  370.373317] Policy zone: Normal
[  374.948164] Built 3 zonelists, mobility grouping on.  Total pages: 49312525
[  375.017496] Policy zone: Normal

David said: "It really only is an issue for devmem.  Ordinary
hotplugged system memory is not affected (onlined/offlined in memory
block granularity)."

Link: http://lkml.kernel.org/r/20200619231213.1160351-1-ben.widawsky@intel.com
Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/mm/memory_hotplug.c~mm-fix-false-softlockup-during-pfn-range-removal
+++ a/mm/memory_hotplug.c
@@ -471,11 +471,20 @@ void __ref remove_pfn_range_from_zone(st
 				      unsigned long start_pfn,
 				      unsigned long nr_pages)
 {
+	const unsigned long end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
-	unsigned long flags;
+	unsigned long pfn, cur_nr_pages, flags;
 
 	/* Poison struct pages because they are now uninitialized again. */
-	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
+	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
+		cond_resched();
+
+		/* Select all remaining pages up to the next section boundary */
+		cur_nr_pages =
+			min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn);
+		page_init_poison(pfn_to_page(pfn),
+				 sizeof(struct page) * cur_nr_pages);
+	}
 
 #ifdef CONFIG_ZONE_DEVICE
 	/*
_

  parent reply	other threads:[~2020-06-26  3:30 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-26  3:28 incoming Andrew Morton
2020-06-26  3:29 ` [patch 01/32] openrisc: fix boot oops when DEBUG_VM is enabled Andrew Morton
2020-06-26  3:29 ` [patch 02/32] mm: do_swap_page(): fix up the error code Andrew Morton
2020-06-26  3:29 ` [patch 03/32] mm, compaction: make capture control handling safe wrt interrupts Andrew Morton
2020-06-26  3:29 ` [patch 04/32] kexec: do not verify the signature without the lockdown or mandatory signature Andrew Morton
2020-06-26  3:29 ` [patch 05/32] ocfs2: avoid inode removal while nfsd is accessing it Andrew Morton
2020-06-26  3:29 ` [patch 06/32] ocfs2: load global_inode_alloc Andrew Morton
2020-06-26  3:29 ` [patch 07/32] ocfs2: fix panic on nfs server over ocfs2 Andrew Morton
2020-06-26  3:29 ` [patch 08/32] ocfs2: fix value of OCFS2_INVALID_SLOT Andrew Morton
2020-06-26  3:29 ` [patch 09/32] lib: fix test_hmm.c reference after free Andrew Morton
2020-06-26  3:29 ` [patch 10/32] linux/bits.h: fix unsigned less than zero warnings Andrew Morton
2020-06-26  3:29 ` [patch 11/32] mm, slab: fix sign conversion problem in memcg_uncharge_slab() Andrew Morton
2020-06-26  3:29 ` [patch 12/32] mm/slab: use memzero_explicit() in kzfree() Andrew Morton
2020-06-26  3:29 ` [patch 13/32] slub: cure list_slab_objects() from double fix Andrew Morton
2020-06-26  3:29 ` [patch 14/32] mm: fix swap cache node allocation mask Andrew Morton
2020-06-26  3:30 ` [patch 15/32] mm/memory.c: properly pte_offset_map_lock/unlock in vm_insert_pages() Andrew Morton
2020-06-26  3:30 ` [patch 16/32] mm/debug_vm_pgtable: fix build failure with powerpc 8xx Andrew Morton
2020-06-26  3:30 ` [patch 17/32] make asm-generic/cacheflush.h more standalone Andrew Morton
2020-06-26  3:30 ` [patch 18/32] media: omap3isp: remove cacheflush.h Andrew Morton
2020-06-26  3:30 ` [patch 19/32] mm/vmalloc.c: fix a warning while make xmldocs Andrew Morton
2020-06-26  3:30 ` [patch 20/32] mm: memcontrol: handle div0 crash race condition in memory.low Andrew Morton
2020-06-26  3:30 ` [patch 21/32] mm/memcontrol.c: add missed css_put() Andrew Morton
2020-06-26  3:30 ` [patch 22/32] mm/memcontrol.c: prevent missed memory.low load tears Andrew Morton
2020-06-26  3:30 ` [patch 23/32] docs: mm/gup: minor documentation update Andrew Morton
2020-06-26  3:30 ` [patch 24/32] doc: THP CoW fault no longer allocate THP Andrew Morton
2020-06-26  3:30 ` [patch 25/32] mm: workingset: age nonresident information alongside anonymous pages Andrew Morton
2020-06-26  3:30 ` [patch 26/32] mm/swap: fix for "mm: workingset: age nonresident information alongside anonymous pages" Andrew Morton
2020-06-26  3:30 ` [patch 27/32] mm/memory: fix IO cost for anonymous page Andrew Morton
2020-06-26  3:30 ` [patch 28/32] x86/hyperv: allocate the hypercall page with only read and execute bits Andrew Morton
2020-06-26  3:30 ` [patch 29/32] arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page Andrew Morton
2020-06-26  3:30 ` [patch 30/32] mm: remove vmalloc_exec Andrew Morton
2020-06-26  3:30 ` Andrew Morton [this message]
2020-06-26  3:30 ` [patch 32/32] MAINTAINERS: update info for sparse Andrew Morton
2020-06-27  3:32 ` + linux-next-git-rejects.patch added to -mm tree Andrew Morton
2020-06-27  3:32 ` [merged] dma-remap-align-the-size-in-dma_common__remap.patch removed from " Andrew Morton
2020-06-27  3:32 ` [merged] openrisc-fix-boot-oops-when-debug_vm-is-enabled.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-do_swap_page-fix-up-the-error-code-instantiation.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-compaction-make-capture-control-handling-safe-wrt-interrupts.patch " Andrew Morton
2020-06-27  3:33 ` [merged] kexec-do-not-verify-the-signature-without-the-lockdown-or-mandatory-signature.patch " Andrew Morton
2020-06-27  3:33 ` [merged] ocfs2-avoid-inode-removed-while-nfsd-access-it.patch " Andrew Morton
2020-06-27  3:33 ` [merged] ocfs2-load-global_inode_alloc.patch " Andrew Morton
2020-06-27  3:33 ` [merged] ocfs2-fix-panic-on-nfs-server-over-ocfs2.patch " Andrew Morton
2020-06-27  3:33 ` [merged] ocfs2-fix-value-of-ocfs2_invalid_slot.patch " Andrew Morton
2020-06-27  3:33 ` [merged] lib-fix-test_hmmc-reference-after-free.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-slab-fix-sign-conversion-problem-in-memcg_uncharge_slab.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-slab-use-memzero_explicit-in-kzfree.patch " Andrew Morton
2020-06-27  3:33 ` [merged] slub-cure-list_slab_objects-from-double-fix.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-fix-swap-cache-node-allocation-mask.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-memoryc-properly-pte_offset_map_lock-unlock-in-vm_insert_pages.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-debug_vm_pgtable-fix-build-failure-with-powerpc-8xx.patch " Andrew Morton
2020-06-27  3:33 ` [merged] make-asm-generic-cacheflushh-more-standalone.patch " Andrew Morton
2020-06-27  3:33 ` [merged] media-omap3isp-remove-cacheflushh.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-fix-a-warning-while-make-xmldocs.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-memcontrol-handle-div0-crash-race-condition-in-memorylow.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-memcontrol-fix-do-not-put-the-css-reference.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-memcg-prevent-missed-memorylow-load-tears.patch " Andrew Morton
2020-06-27  3:33 ` [merged] docs-mm-gup-minor-documentation-update.patch " Andrew Morton
2020-06-27  3:33 ` [merged] doc-thp-cow-fault-no-longer-allocate-thp.patch " Andrew Morton
2020-06-27  3:33 ` [merged] mm-workingset-age-nonresident-information-alongside-anonymous-pages.patch " Andrew Morton
2020-06-27  3:34 ` [merged] mm-swap-fix-for-mm-workingset-age-nonresident-information-alongside-anonymous-pages.patch " Andrew Morton
2020-06-27  3:34 ` [merged] mm-memory-fix-io-cost-for-anonymous-page.patch " Andrew Morton
2020-06-27  3:34 ` [merged] x86-hyperv-allocate-the-hypercall-page-with-only-read-and-execute-bits.patch " Andrew Morton
2020-06-27  3:34 ` [merged] arm64-use-page_kernel_rox-directly-in-alloc_insn_page.patch " Andrew Morton
2020-06-27  3:34 ` [merged] mm-remove-vmalloc_exec.patch " Andrew Morton
2020-06-27  3:34 ` [merged] mm-fix-false-softlockup-during-pfn-range-removal.patch " Andrew Morton
2020-06-27  3:34 ` [merged] maintainers-update-info-for-sparse.patch " Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200626033051.73wfBW4YT%akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=ben.widawsky@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=steve.scargall@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).