From: Ben Widawsky
To: linux-mm
Cc: Andrew Morton, Ben Widawsky, "Scargall, Steve", Dan Williams,
    David Hildenbrand, Vishal Verma
Subject: [PATCH] mm: Fix false softlockup during pfn range removal
Date: Fri, 19 Jun 2020 16:12:13 -0700
Message-Id: <20200619231213.1160351-1-ben.widawsky@intel.com>
X-Mailer: git-send-email 2.27.0

When working with very large nodes, poisoning the struct pages (for which
there will be very many) can take a very long time. If the system is
using voluntary preemptions, the software watchdog will not be able to
detect forward progress.
This patch addresses this issue by offering to give up time like
__remove_pages() does. This behavior was introduced in v5.6 with:
commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")

Alternately, page_init_poison() could do this cond_resched(), but it
seems to me that the caller of page_init_poison() is what actually knows
whether or not it should relax its own priority.

Based on Dan's notes, I think this is perfectly safe:
commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")

Aside from fixing the lockup, it is also a friendlier thing to do on
lower core systems that might wipe out large chunks of hotplug memory
(probably not a very common case).

Fixes this kind of splat:
[  352.142079] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
[  352.150067] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat kmem intel_rapl_msr intel_rapl_common rpcrdma sunrpc rdma_ucm ib_iser isst_if_common rdma_cm skx_edac iw_cm ib_cm x86_pkg_temp_thermal libiscsi intel_powerclamp scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i40iw intel_cstate iTCO_wdt ib_uverbs iTCO_vendor_support device_dax ib_core joydev intel_uncore i2c_i801 lpc_ich ipmi_ssif ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem
[  352.150123]  dax_pmem_core acpi_pad acpi_power_meter drm ip_tables xfs libcrc32c nd_pmem nd_btt i40e e1000e crc32c_intel nfit
[  352.260774] irq event stamp: 138450
[  352.264692] hardirqs last  enabled at (138449): [] trace_hardirqs_on_thunk+0x1a/0x1c
[  352.275134] hardirqs last disabled at (138450): [] trace_hardirqs_off_thunk+0x1a/0x1c
[  352.285662] softirqs last  enabled at (138448): [] __do_softirq+0x347/0x456
[  352.295233] softirqs last disabled at (138443): [] irq_exit+0x7d/0xb0
[  352.304210] CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
[  352.313283] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
[  352.324308] RIP: 0010:memset_erms+0x9/0x10
[  352.328905] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
[  352.349953] RSP: 0018:ffffc90021b5fd50 EFLAGS: 00010202 ORIG_RAX: ffffffffffffff13
[  352.358450] RAX: 00000000000000ff RBX: ffff88983ffd5000 RCX: 0000000065df0e40
[  352.366457] RDX: 00000003a0000000 RSI: 00000000ffffffff RDI: ffffea039b20f1c0
[  352.374465] RBP: ffff88983ffd6c00 R08: 0000000000000000 R09: ffffea0061000000
[  352.382473] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea006f808000
[  352.390480] R13: 0000000001840000 R14: 000000000e800000 R15: ffff8997d7b764e0
[  352.398487] FS:  00007f724ef81780(0000) GS:ffff8997df100000(0000) knlGS:0000000000000000
[  352.407562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  352.414011] CR2: 00007ffcd4070768 CR3: 000001178c722002 CR4: 00000000003606e0
[  352.422056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  352.430092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  352.438115] Call Trace:
[  352.440865]  remove_pfn_range_from_zone+0x3a/0x380
[  352.446244]  ? cpumask_next+0x17/0x20
[  352.450361]  memunmap_pages+0x17f/0x280
[  352.454670]  release_nodes+0x22a/0x260
[  352.458888]  __device_release_driver+0x172/0x220
[  352.464070]  device_driver_detach+0x3e/0xa0
[  352.468753]  unbind_store+0x113/0x130
[  352.472868]  kernfs_fop_write+0xdc/0x1c0
[  352.477273]  vfs_write+0xde/0x1d0
[  352.482218]  ksys_write+0x58/0xd0
[  352.487207]  do_syscall_64+0x5a/0x120
[  352.492529]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  352.499446] RIP: 0033:0x7f724f40b5e7
[  352.504673] Code: Bad RIP value.
[  352.509484] RSP: 002b:00007ffcd40738f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  352.519213] RAX: ffffffffffffffda RBX: 00007f724ef816a8 RCX: 00007f724f40b5e7
[  352.528410] RDX: 0000000000000007 RSI: 00005617d7cd1277 RDI: 0000000000000003
[  352.537573] RBP: 0000000000000003 R08: 00000000ffffffff R09: 00007ffcd40737d0
[  352.546764] R10: 0000000000000000 R11: 0000000000000246 R12: 00005617d7cd1277
[  352.555929] R13: 0000000000000000 R14: 0000000000000007 R15: 00005617d7cd1230
[  370.353742] Built 2 zonelists, mobility grouping on.  Total pages: 49050381
[  370.373317] Policy zone: Normal
[  374.948164] Built 3 zonelists, mobility grouping on.  Total pages: 49312525
[  375.017496] Policy zone: Normal

Fixes: d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
Reported-by: "Scargall, Steve"
Reported-by: Ben Widawsky
Cc: Dan Williams
Cc: David Hildenbrand
Cc: Vishal Verma
Signed-off-by: Ben Widawsky
---
 mm/memory_hotplug.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9b34e03e730a..da374cd3d45b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -471,11 +471,20 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
 				      unsigned long start_pfn,
 				      unsigned long nr_pages)
 {
+	const unsigned long end_pfn = start_pfn + nr_pages;
 	struct pglist_data *pgdat = zone->zone_pgdat;
-	unsigned long flags;
+	unsigned long pfn, cur_nr_pages, flags;
 
 	/* Poison struct pages because they are now uninitialized again. */
-	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
+	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
+		cond_resched();
+
+		/* Select all remaining pages up to the next section boundary */
+		cur_nr_pages =
+			min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn);
+		page_init_poison(pfn_to_page(pfn),
+				 sizeof(struct page) * cur_nr_pages);
+	}
 
 #ifdef CONFIG_ZONE_DEVICE
 	/*
-- 
2.27.0
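
For readers who want to check the chunking arithmetic in the loop above
outside the kernel, here is a minimal user-space sketch. It is not part of
the patch. The section size (2^15 pages, as on x86-64 with 4 KiB pages and
128 MiB sections) is an assumption, and chunk_len() and count_chunks() are
hypothetical helpers introduced only for illustration. SECTION_ALIGN_UP()
is redefined locally in the same round-up-and-mask form the kernel uses.

```c
#include <assert.h>

/* Assumption: 2^15 pages per section (x86-64, 4 KiB pages, 128 MiB sections). */
#define PFN_SECTION_SHIFT	15
#define PAGES_PER_SECTION	(1UL << PFN_SECTION_SHIFT)
#define PAGE_SECTION_MASK	(~(PAGES_PER_SECTION - 1))
/* Local copy for user space: round pfn up to the next section boundary. */
#define SECTION_ALIGN_UP(pfn)	(((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)

/*
 * Hypothetical helper: number of pfns to handle in one step, i.e. everything
 * from pfn up to the next section boundary, capped at end_pfn.
 */
static unsigned long chunk_len(unsigned long pfn, unsigned long end_pfn)
{
	unsigned long to_boundary = SECTION_ALIGN_UP(pfn + 1) - pfn;
	unsigned long remaining = end_pfn - pfn;

	return remaining < to_boundary ? remaining : to_boundary;
}

/*
 * Hypothetical helper: walk the range the same way the patched loop does and
 * return the number of chunks, which is also how many times cond_resched()
 * would be called.
 */
static unsigned long count_chunks(unsigned long start_pfn, unsigned long nr_pages)
{
	const unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long pfn, cur, chunks = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn += cur) {
		cur = chunk_len(pfn, end_pfn);
		/* "pfn + 1" guarantees forward progress even on a boundary. */
		assert(cur > 0 && cur <= PAGES_PER_SECTION);
		chunks++;
	}
	return chunks;
}
```

Under these assumptions, a 16-page range starting 8 pfns below a section
boundary is split into two chunks of 8 pages each. A section-aligned range
of three full sections is split into three chunks, so the loop offers to
reschedule once per section.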