From: Ben Widawsky <ben.widawsky@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-mm <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Scargall, Steve" <steve.scargall@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>
Subject: Re: [PATCH] mm: Fix false softlockup during pfn range removal
Date: Mon, 22 Jun 2020 09:25:27 -0700 [thread overview]
Message-ID: <20200622162527.bo765xhid563u6vp@intel.com> (raw)
In-Reply-To: <f6677b9d-98c4-b959-1c03-6bdf4d81b71c@redhat.com>
On 20-06-22 09:16:20, David Hildenbrand wrote:
> On 20.06.20 01:12, Ben Widawsky wrote:
> > When working with very large nodes, poisoning the struct pages (for
> > which there will be very many) can take a very long time. If the system
> > is using voluntary preemptions, the software watchdog will not be able
> > to detect forward progress. This patch addresses this issue by offering
> > to give up time like __remove_pages() does. This behavior was
> > introduced in v5.6 with:
> > commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
>
> We try to stay <= 72 chars in the commit message (except for kernel splats).
>
Thanks. checkpatch does complain fwiw, I just somehow missed i.
> >
> > Alternately, init_page_poison could do this cond_resched(), but it seems
> > to me that the caller of init_page_poison() is what actually knows
> > whether or not it should relax its own priority.
> >
> > Based on Dan's notes, I think this is perfectly safe:
> > commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")
>
> We shouldn't be holding any spin locks, so it's safe. (we could think
> about doing this outside of the memory hotplug lock in the case of
> devmem, however, it's only a debugging feature so we most probably don't
> care).
>
> >
> > Aside from fixing the lockup, it is also a friendlier thing to do on
> > lower core systems that might wipe out large chunks of hotplug memory
> > (probably not a very common case).
>
> It really only is an issue for devmem. Ordinary hotplugged system memory
> is not affected (onlined/offlined in memory block granularity).
Could you explain this a bit? I was fixing the issue found on PMEM systems, but
it seems like regularly memory hotplug was potentially a victim. I think one of
the reasons PMEM might be more likely is the time it takes to work with any data
structures store in the PMEM itself is slower (just a guess).
>
> >
> > Fixes this kind of splat:
> > [ 352.142079] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
> > [ 352.150067] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat kmem intel_rapl_msr intel_rapl_common rpcrdma sunrpc rdma_ucm ib_iser isst_if_common rdma_cm skx_edac iw_cm ib_cm x86_pkg_temp_thermal libiscsi intel_powerclamp scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i40iw intel_cstate iTCO_wdt ib_uverbs iTCO_vendor_support device_dax ib_core joydev intel_uncore i2c_i801 lpc_ich ipmi_ssif ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem
> > [ 352.150123] dax_pmem_core acpi_pad acpi_power_meter drm ip_tables xfs libcrc32c nd_pmem nd_btt i40e e1000e crc32c_intel nfit
> > [ 352.260774] irq event stamp: 138450
> > [ 352.264692] hardirqs last enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c
> > [ 352.275134] hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c
> > [ 352.285662] softirqs last enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456
> > [ 352.295233] softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0
> > [ 352.304210] CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
> > [ 352.313283] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
> > [ 352.324308] RIP: 0010:memset_erms+0x9/0x10
> > [ 352.328905] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
> > [ 352.349953] RSP: 0018:ffffc90021b5fd50 EFLAGS: 00010202 ORIG_RAX: ffffffffffffff13
> > [ 352.358450] RAX: 00000000000000ff RBX: ffff88983ffd5000 RCX: 0000000065df0e40
> > [ 352.366457] RDX: 00000003a0000000 RSI: 00000000ffffffff RDI: ffffea039b20f1c0
> > [ 352.374465] RBP: ffff88983ffd6c00 R08: 0000000000000000 R09: ffffea0061000000
> > [ 352.382473] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea006f808000
> > [ 352.390480] R13: 0000000001840000 R14: 000000000e800000 R15: ffff8997d7b764e0
> > [ 352.398487] FS: 00007f724ef81780(0000) GS:ffff8997df100000(0000) knlGS:0000000000000000
> > [ 352.407562] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 352.414011] CR2: 00007ffcd4070768 CR3: 000001178c722002 CR4: 00000000003606e0
> > [ 352.422056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 352.430092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 352.438115] Call Trace:
> > [ 352.440865] remove_pfn_range_from_zone+0x3a/0x380
> > [ 352.446244] ? cpumask_next+0x17/0x20
> > [ 352.450361] memunmap_pages+0x17f/0x280
> > [ 352.454670] release_nodes+0x22a/0x260
> > [ 352.458888] __device_release_driver+0x172/0x220
> > [ 352.464070] device_driver_detach+0x3e/0xa0
> > [ 352.468753] unbind_store+0x113/0x130
> > [ 352.472868] kernfs_fop_write+0xdc/0x1c0
> > [ 352.477273] vfs_write+0xde/0x1d0
> > [ 352.482218] ksys_write+0x58/0xd0
> > [ 352.487207] do_syscall_64+0x5a/0x120
> > [ 352.492529] entry_SYSCALL_64_after_hwframe+0x49/0xb3
> > [ 352.499446] RIP: 0033:0x7f724f40b5e7
> > [ 352.504673] Code: Bad RIP value.
> > [ 352.509484] RSP: 002b:00007ffcd40738f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [ 352.519213] RAX: ffffffffffffffda RBX: 00007f724ef816a8 RCX: 00007f724f40b5e7
> > [ 352.528410] RDX: 0000000000000007 RSI: 00005617d7cd1277 RDI: 0000000000000003
> > [ 352.537573] RBP: 0000000000000003 R08: 00000000ffffffff R09: 00007ffcd40737d0
> > [ 352.546764] R10: 0000000000000000 R11: 0000000000000246 R12: 00005617d7cd1277
> > [ 352.555929] R13: 0000000000000000 R14: 0000000000000007 R15: 00005617d7cd1230
> > [ 370.353742] Built 2 zonelists, mobility grouping on. Total pages: 49050381
> > [ 370.373317] Policy zone: Normal
> > [ 374.948164] Built 3 zonelists, mobility grouping on. Total pages: 49312525
> > [ 375.017496] Policy zone: Normal
>
> I'd have stripped this to a reasonable minimum.
>
> >
> > Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
> > Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
> > Reported-by: Ben Widawsky <ben.widawsky@intel.com>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Vishal Verma <vishal.l.verma@intel.com>
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > ---
> > mm/memory_hotplug.c | 13 +++++++++++--
> > 1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 9b34e03e730a..da374cd3d45b 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -471,11 +471,20 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
> > unsigned long start_pfn,
> > unsigned long nr_pages)
> > {
> > + const unsigned long end_pfn = start_pfn + nr_pages;
> > struct pglist_data *pgdat = zone->zone_pgdat;
> > - unsigned long flags;
> > + unsigned long pfn, cur_nr_pages, flags;
> >
> > /* Poison struct pages because they are now uninitialized again. */
> > - page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
> > + for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
> > + cond_resched();
> > +
> > + /* Select all remaining pages up to the next section boundary */
> > + cur_nr_pages =
> > + min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn);
>
> Nit: I'd have used the same indentation as the code you copied this from.
>
> > + page_init_poison(pfn_to_page(pfn),
> > + sizeof(struct page) * cur_nr_pages);
> > + }
> >
> > #ifdef CONFIG_ZONE_DEVICE
> > /*
> >
>
> Thanks!
>
> Acked-by: David Hildenbrand <david@redhat.com>
>
> --
> Thanks,
>
> David / dhildenb
next prev parent reply other threads:[~2020-06-22 16:25 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-19 23:12 [PATCH] mm: Fix false softlockup during pfn range removal Ben Widawsky
2020-06-22 7:16 ` David Hildenbrand
2020-06-22 16:25 ` Ben Widawsky [this message]
2020-06-22 16:29 ` David Hildenbrand
2020-06-23 7:18 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200622162527.bo765xhid563u6vp@intel.com \
--to=ben.widawsky@intel.com \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=linux-mm@kvack.org \
--cc=steve.scargall@intel.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).