Date: Mon, 19 Jul 2021 12:11:21 -0700 (PDT)
From: Hugh Dickins
To: Peter Xu
cc: Igor Raits, Hugh Dickins, Andrew Morton, Hillf Danton, Axel Rasmussen, linux-mm@kvack.org
Subject: Re: kernel BUG at include/linux/swapops.h:204!
Message-ID: <796cbb7-5a1c-1ba0-dde5-479aba8224f2@google.com>
References: <4c9e24db-29d5-5bbb-17ae-8dc32ceb66ed@google.com>

Hi Peter,

I believe you have already fixed this, but the fix needs to go to stable.
Sorry, the messages below are a muddle of top and middle posting,
I'll resume at the bottom.

On Fri, 16 Jul 2021, Hugh Dickins wrote:
> On Thu, 15 Jul 2021, Igor Raits wrote:
>
> > Hi everyone again,
> >
> > I've been trying to reproduce this issue but still can't find a consistent
> > pattern.
> >
> > However, it did happen once more and this time on 5.13.1:
>
> Thanks for the updates, Igor.
>
> I have to admit that what you have reported confirms the suspicion
> that it's a bug introduced by one of my "stable" patches in 5.12.14
> (which are also in 5.13): nothing else between 5.12.12 and 5.12.14
> seems likely to be relevant.
> > But I've gone back and forth and not been able to spot the problem. > > Please would you send (either privately to me, or to the list) your > 5.13.1 kernel's .config, and disassembly of pmd_migration_entry_wait() > from its vmlinux (with line numbers if available; or just send the > whole vmlinux if that's easier, and I'll disassemble). > > I am hoping that the disassembly, together with the register contents > that you've shown, will help guide towards an answer. > > Thanks, > Hugh > > > > > [ 222.068216] ------------[ cut here ]------------ > > [ 222.072884] kernel BUG at include/linux/swapops.h:204! > > [ 222.078062] invalid opcode: 0000 [#1] SMP NOPTI > > [ 222.082618] CPU: 38 PID: 9828 Comm: rpc-worker Tainted: G E > > 5.13.1-1.gdc.el8.x86_64 #1 > > [ 222.091894] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 > > Gen10, BIOS U30 05/24/2021 > > [ 222.100468] RIP: 0010:pmd_migration_entry_wait+0x132/0x140 > > [ 222.105994] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00 > > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff > > <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48 > > [ 222.124878] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246 > > [ 222.130134] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX: > > ffffffffffffffff > > [ 222.137309] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI: > > ffffdf55c52cf368 > > [ 222.144485] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09: > > 0000000000000000 > > [ 222.151661] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000bf8 > > [ 222.158837] R13: 0400000000000000 R14: 0400000000000080 R15: > > ffff9eec2825b1f8 > > [ 222.166015] FS: 00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000) > > knlGS:0000000000000000 > > [ 222.174153] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 222.179932] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4: > > 00000000007726e0 > > [ 222.187109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > > 
0000000000000000 > > [ 222.194283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > > 0000000000000400 > > [ 222.201457] PKRU: 55555554 > > [ 222.204178] Call Trace: > > [ 222.206638] __handle_mm_fault+0x5ad/0x6e0 > > [ 222.210760] ? sysvec_call_function_single+0xb/0x90 > > [ 222.215672] handle_mm_fault+0xc5/0x290 > > [ 222.219529] do_user_addr_fault+0x1a9/0x660 > > [ 222.223740] ? sched_clock_cpu+0xc/0xa0 > > [ 222.227602] exc_page_fault+0x68/0x130 > > [ 222.231373] ? asm_exc_page_fault+0x8/0x30 > > [ 222.235495] asm_exc_page_fault+0x1e/0x30 > > [ 222.239526] RIP: 0033:0x7f67baaed734 > > [ 222.243120] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0 > > 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22 > > <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7 > > [ 222.262002] RSP: 002b:00007f6754aea298 EFLAGS: 00010287 > > [ 222.267257] RAX: 0000000000000000 RBX: 0000000000000000 RCX: > > 0000000000000000 > > [ 222.274432] RDX: 00007f676ffff700 RSI: 00007f676ffff9c0 RDI: > > 00007f676f7fec10 > > [ 222.281609] RBP: 0000000000000001 R08: 00007f676f7fed10 R09: > > 00007f67bad012f0 > > [ 222.288785] R10: 00007f6754aeb700 R11: 0000000000000202 R12: > > 0000000000000001 > > [ 222.295961] R13: 0000000000000006 R14: 0000000000000e28 R15: > > 00007f674006e1f0 > > [ 222.303137] Modules linked in: vhost_net(E) vhost(E) vhost_iotlb(E) > > tap(E) tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E) > > nf_tables(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E) > > binfmt_misc(E) iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E) > > bonding(E) tls(E) vfat(E) fat(E) dm_service_time(E) dm_multipath(E) > > rpcrdma(E) sunrpc(E) rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E) > > target_core_mod(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E) > > intel_rapl_msr(E) intel_rapl_common(E) scsi_transport_iscsi(E) > > isst_if_common(E) ipmi_ssif(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) > > 
intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) > > crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E) qedr(E) > > mei_me(E) acpi_ipmi(E) ib_uverbs(E) intel_cstate(E) ipmi_si(E) ib_core(E) > > ipmi_devintf(E) dm_mod(E) ioatdma(E) ses(E) intel_uncore(E) pcspkr(E) > > enclosure(E) mei(E) hpwdt(E) hpilo(E) lpc_ich(E) intel_pch_thermal(E) > > dca(E) ipmi_msghandler(E) > > [ 222.303181] acpi_power_meter(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) > > t10_pi(E) sg(E) qedf(E) qede(E) libfcoe(E) qed(E) libfc(E) smartpqi(E) > > scsi_transport_fc(E) tg3(E) scsi_transport_sas(E) crc8(E) wmi(E) > > nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E) > > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E) > > [ 222.420050] ---[ end trace bcf7b6d1610cc21f ]--- > > [ 222.572925] RIP: 0010:pmd_migration_entry_wait+0x132/0x140 > > [ 222.578469] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00 > > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff > > <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48 > > [ 222.597359] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246 > > [ 222.602620] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX: > > ffffffffffffffff > > [ 222.609807] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI: > > ffffdf55c52cf368 > > [ 222.616990] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09: > > 0000000000000000 > > [ 222.624177] R10: 0000000000000000 R11: 0000000000000000 R12: > > 0000000000000bf8 > > [ 222.631361] R13: 0400000000000000 R14: 0400000000000080 R15: > > ffff9eec2825b1f8 > > [ 222.638548] FS: 00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000) > > knlGS:0000000000000000 > > [ 222.646694] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 222.652481] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4: > > 00000000007726e0 > > [ 222.659665] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > > 0000000000000000 > > [ 222.666850] DR3: 0000000000000000 DR6: 00000000fffe0ff0 
DR7: > > 0000000000000400 > > [ 222.674031] PKRU: 55555554 > > [ 222.676758] Kernel panic - not syncing: Fatal exception > > [ 222.817538] Kernel Offset: 0x16000000 from 0xffffffff81000000 > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > [ 222.965540] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > > > On Sun, Jul 11, 2021 at 8:06 AM Igor Raits wrote: > > > > > Hi Hugh, > > > > > > On Sun, Jul 11, 2021 at 6:17 AM Hugh Dickins wrote: > > > > > >> On Sat, 10 Jul 2021, Igor Raits wrote: > > >> > > >> > Hello, > > >> > > > >> > I've seen one weird bug on 5.12.14 that happened a couple of times when > > >> I > > >> > started a bunch of VMs on a server. > > >> > > >> Would it be possible for you to try the same on a 5.12.13 kernel? > > >> Perhaps by reverting the diff between 5.12.13 and 5.12.14 temporarily. > > >> Enough to form an impression of whether the issue is new in 5.12.14. > > >> > > > > > > We've been using 5.12.12 for quite some time (~ a month) and I never saw > > > it there. > > > > > > But I have to admit that I don't really have a reproducer. For example, on > > > servers where it happened, > > > I just rebooted them and panic did not happen anymore (so I saw it only > > > only once, > > > only on 2 servers out of 32 that we have on 5.12.14). > > > > > > > > >> I ask because 5.12.14 did include several fixes and cleanups from me > > >> to page_vma_mapped_walk(), and that is involved in inserting and > > >> removing pmd migration entries. I am not aware of introducing any > > >> bug there, but your report has got me worried. If it's happening in > > >> 5.12.14 but not in 5.12.13, then I must look again at my changes. > > >> > > >> I don't expect Hillf's patch to help at at all: the pmd_lock() > > >> is supposed to be taken by page_vma_mapped_walk(), before > > >> set_pmd_migration_entry() and remove_migration_pmd() are called. 
> > >> > > >> Thanks, > > >> Hugh > > >> > > >> > > > >> > I've briefly googled this problem but could not find any relevant commit > > >> > that would fix this issue. > > >> > > > >> > Do you have any hint how to debug this further or know the fix by any > > >> > chance? > > >> > > > >> > Thanks in advance. Stack trace following: > > >> > > > >> > [ 376.876610] ------------[ cut here ]------------ > > >> > [ 376.881274] kernel BUG at include/linux/swapops.h:204! > > >> > [ 376.886455] invalid opcode: 0000 [#1] SMP NOPTI > > >> > [ 376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G > > >> E > > >> > 5.12.14-1.gdc.el8.x86_64 #1 > > >> > [ 376.900464] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 > > >> > Gen10, BIOS U30 05/24/2021 > > >> > [ 376.909038] RIP: 0010:pmd_migration_entry_wait+0x132/0x140 > > >> > [ 376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 > > >> 00 > > >> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff > > >> > <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48 > > >> > [ 376.933443] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246 > > >> > [ 376.938701] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX: > > >> > ffffffffffffffff > > >> > [ 376.945878] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI: > > >> > fffff497473b2ae8 > > >> > [ 376.953055] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09: > > >> > 0000000000000000 > > >> > [ 376.960230] R10: 0000000000000000 R11: 0000000000000000 R12: > > >> > 0000000000000af8 > > >> > [ 376.967407] R13: 0400000000000000 R14: 0400000000000080 R15: > > >> > ffff908bbef7b6a8 > > >> > [ 376.974582] FS: 00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000) > > >> > knlGS:0000000000000000 > > >> > [ 376.982718] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > >> > [ 376.988497] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4: > > >> > 00000000007726e0 > > >> > [ 376.995673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > > >> > 
0000000000000000 > > >> > [ 377.002849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > > >> > 0000000000000400 > > >> > [ 377.010026] PKRU: 55555554 > > >> > [ 377.012745] Call Trace: > > >> > [ 377.015207] __handle_mm_fault+0x5ad/0x6e0 > > >> > [ 377.019335] handle_mm_fault+0xc5/0x290 > > >> > [ 377.023194] do_user_addr_fault+0x1cd/0x740 > > >> > [ 377.027406] exc_page_fault+0x54/0x110 > > >> > [ 377.031182] ? asm_exc_page_fault+0x8/0x30 > > >> > [ 377.035307] asm_exc_page_fault+0x1e/0x30 > > >> > [ 377.039340] RIP: 0033:0x7f5bb91d6734 > > >> > [ 377.042937] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 > > >> c0 > > >> > 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22 > > >> > <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7 > > >> > [ 377.061820] RSP: 002b:00007f5bb1f7ff58 EFLAGS: 00010206 > > >> > [ 377.067076] RAX: 0000000000000000 RBX: 0000000000000000 RCX: > > >> > 00007f5ba0000020 > > >> > [ 377.074255] RDX: 00007f5b2bfff700 RSI: 00007f5b2bfff9c0 RDI: > > >> > 0000000000000001 > > >> > [ 377.081429] RBP: 0000000000000001 R08: 0000000000000000 R09: > > >> > 00007f5bb93ea2f0 > > >> > [ 377.088606] R10: 00007f5bb1f81700 R11: 0000000000000202 R12: > > >> > 0000000000000001 > > >> > [ 377.095782] R13: 0000000000000006 R14: 0000000000000cb4 R15: > > >> > 00007f5bb1f801f0 > > >> > [ 377.102958] Modules linked in: ebt_arp(E) nft_meta_bridge(E) > > >> > ip6_tables(E) xt_CT(E) nf_log_ipv4(E) nf_log_common(E) nft_limit(E) > > >> > nft_counter(E) xt_LOG(E) xt_limit(E) xt_mac(E) xt_set(E) xt_multiport(E) > > >> > xt_state(E) xt_conntrack(E) xt_comment(E) xt_physdev(E) nft_compat(E) > > >> > ip_set_hash_net(E) ip_set(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) > > >> > tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E) nf_tables(E) > > >> > vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E) binfmt_misc(E) > > >> > iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E) bonding(E) tls(E) > > >> > vfat(E) 
fat(E) dm_service_time(E) dm_multipath(E) rpcrdma(E) sunrpc(E) > > >> > rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E) > > >> target_core_mod(E) > > >> > ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E) > > >> scsi_transport_iscsi(E) > > >> > intel_rapl_msr(E) qedr(E) intel_rapl_common(E) ib_uverbs(E) > > >> > isst_if_common(E) ib_core(E) nfit(E) libnvdimm(E) > > >> x86_pkg_temp_thermal(E) > > >> > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) > > >> > crct10dif_pclmul(E) > > >> > [ 377.102999] crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E) > > >> > intel_cstate(E) ipmi_ssif(E) acpi_ipmi(E) ipmi_si(E) mei_me(E) > > >> ioatdma(E) > > >> > ipmi_devintf(E) dm_mod(E) ses(E) intel_uncore(E) pcspkr(E) qede(E) > > >> > enclosure(E) tg3(E) mei(E) lpc_ich(E) hpilo(E) hpwdt(E) > > >> > intel_pch_thermal(E) dca(E) ipmi_msghandler(E) acpi_power_meter(E) > > >> ext4(E) > > >> > mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) qedf(E) qed(E) crc8(E) > > >> > libfcoe(E) libfc(E) smartpqi(E) scsi_transport_fc(E) > > >> scsi_transport_sas(E) > > >> > wmi(E) nf_conntrack(E) nf_defrag_ipv6(E) libcrc32c(E) crc32c_intel(E) > > >> > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E) > > >> > [ 377.243468] ---[ end trace 04bce3bb051f7620 ]--- > > >> > [ 377.385645] RIP: 0010:pmd_migration_entry_wait+0x132/0x140 > > >> > [ 377.391194] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 > > >> 00 > > >> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff > > >> > <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48 > > >> > [ 377.410091] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246 > > >> > [ 377.415355] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX: > > >> > ffffffffffffffff > > >> > [ 377.422540] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI: > > >> > fffff497473b2ae8 > > >> > [ 377.429721] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09: > > >> > 0000000000000000 > > >> > [ 377.436902] R10: 0000000000000000 
R11: 0000000000000000 R12: > > >> > 0000000000000af8 > > >> > [ 377.444086] R13: 0400000000000000 R14: 0400000000000080 R15: > > >> > ffff908bbef7b6a8 > > >> > [ 377.451272] FS: 00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000) > > >> > knlGS:0000000000000000 > > >> > [ 377.459415] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > >> > [ 377.465196] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4: > > >> > 00000000007726e0 > > >> > [ 377.472377] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > > >> > 0000000000000000 > > >> > [ 377.479556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > > >> > 0000000000000400 > > >> > [ 377.486738] PKRU: 55555554 > > >> > [ 377.489465] Kernel panic - not syncing: Fatal exception > > >> > [ 377.573911] Kernel Offset: 0xa000000 from 0xffffffff81000000 > > >> (relocation > > >> > range: 0xffffffff80000000-0xffffffffbfffffff) > > >> > [ 377.716482] ---[ end Kernel panic - not syncing: Fatal exception ]---

Disassembly of the vmlinux Igor sent (along with other info) confirmed
something I suspected, that R08: fffff49747fa8080 in one of the dumps,
R08: ffffdf57428d8080 in the other, is the relevant struct page pointer
(and RAX the page->flags, which look like it was pointing at a good page).

A page pointer ....8080 in pmd_migration_entry_wait() is interesting:
normally I'd expect that to be ....0000 or ....8000, pointing to the
head of a huge page.  But instead it's pointing to the second tail
(though by now that compound page has been freed, and head pointers
in the tails reset to 0): as if the pfn has been incremented by 2 somehow.

And if the pfn (swp_offset) in the migration entry has got corrupted,
then it's no surprise that when removing migration entries,
page_vma_mapped_walk() would see migration_entry_to_page(entry) != page,
so be unable to replace that migration entry, leaving it behind for the
user to hit BUG_ON(!PageLocked) in pmd_migration_entry_wait() when
faulting on it later.
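The reading of the R08 values above as "second tail" can be checked with a little arithmetic.  What follows is only a sketch, assuming a 64-byte struct page and a vmemmap base aligned to at least 0x8000 (both typical for x86-64, but neither is stated in the report); STRUCT_PAGE_SIZE and subpage_index() are names made up for the illustration, not kernel symbols.

```c
#include <stdint.h>

/* Assumption: sizeof(struct page) == 64, as is typical on x86-64. */
#define STRUCT_PAGE_SIZE 64ULL
/* A 2MB huge page covers 512 base pages. */
#define HPAGE_PMD_NR 512ULL

/*
 * Hypothetical helper: index of a struct page within its 2MB-aligned
 * group of 512, derived from the struct page pointer alone.  This relies
 * on the assumed alignment of the vmemmap base, so that a huge page's
 * head has a pointer ending in ....0000 or ....8000.
 */
static unsigned int subpage_index(uint64_t page_ptr)
{
	return (unsigned int)((page_ptr / STRUCT_PAGE_SIZE) % HPAGE_PMD_NR);
}
```

Under those assumptions both reported pointers, fffff49747fa8080 and ffffdf57428d8080, give index 2: that is 0x80 bytes past the head, or two struct pages, which is what a pfn incremented by 2 would produce.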

So, what might increment the swp_offset by 2?  Hunt around the encodings.
Hmm, _PAGE_BIT_UFFD_WP is _PAGE_BIT_SOFTW2, which is bit 10, whereas
_PAGE_BIT_PROTNONE (top bit to be avoided in pte encoding of swap) is
_PAGE_BIT_GLOBAL, which is bit 8.  After overcoming off-by-one
confusions, it looks like if something somewhere were to set
_PAGE_BIT_UFFD_WP in a migration pmd (whereas it's only suitable for a
present pmd), it would indeed increment the swp_offset by 2.

Hunt for uffd_wps, and run across copy_huge_pmd() in mm/huge_memory.c:
in Igor's 5.13.1 and 5.12.14 and many others, that says

	if (!(vma->vm_flags & VM_UFFD_WP))
		pmd = pmd_clear_uffd_wp(pmd);

just *before* checking is_swap_pmd().  Fixed in 5.14-rc1 in commit
8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for fork()").

But clearing the bit would be harmless, wouldn't it?  Because it wouldn't
be set anyway.  Waste a day before remembering what I never forgot but
somehow blanked out: the L1TF "feature" forced us to invert the offset
bits in the pte encoding of a swap entry, so there really is a bit set
there in the pmd entry, and clearing it has the effect of setting it in
the corresponding swap entry, so incrementing the migration pfn by 2.

I cannot explain why Igor never saw this crash on 5.12.12: maybe
something else in the environment changed around that time.  And it will
take several days for it to be confirmed as the fix in practice.

But I'm confident that 8f34f1eac382 will prove to be the fix, so Peter
please prepare some backports of that for the various stable/longterm
kernels that need it - I've not looked into whether it applies cleanly,
or depends on other commits too.

You fixed several related but different things in that commit: but this
one is the worst, because it can cause corruption even for those who are
not using UFFD_WP at all.

Many thanks for reporting and helping, Igor.

Hugh

p.s. Peter, unrelated to this particular bug, and should not divert from
fixing it: but looking again at those swap encodings, and particularly
the soft_dirty manipulations: they look very fragile.  I think uffd_wp
was wrong to follow that bad example, and your upcoming new encoding
(that I have previously called elegant) takes it a worse step further.

I think we should change to a rule where the architecture-independent
swp_entry_t contains *all* the info, including bits for soft_dirty and
uffd_wp, so that swap entry cases can move immediately to decoding from
arch-dependent pte to arch-independent swp_entry_t, and do all the
manipulations on that.

But I don't have time to make that change, and probably neither do you,
and making the change is liable to introduce errors itself.  So, no
immediate plans, but please keep it in mind.
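
For anyone following the bit arithmetic in the main part of this message, here is a small userspace model of it.  It is only a sketch and not kernel code: it assumes 5 swap type bits at the top of the entry and an inverted offset field starting one bit above _PAGE_BIT_PROTNONE, which is how the encoding is described above; the model_* names are made up for the illustration.

```c
#include <stdint.h>

/* Bit positions as quoted in the message. */
#define PAGE_BIT_PROTNONE	8	/* alias of _PAGE_BIT_GLOBAL */
#define PAGE_BIT_UFFD_WP	10	/* alias of _PAGE_BIT_SOFTW2 */

/* Assumed layout: 5 type bits on top, offset just above PROTNONE. */
#define SWP_TYPE_BITS		5
#define SWP_OFFSET_FIRST_BIT	(PAGE_BIT_PROTNONE + 1)
#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)

/* Encode (type, offset), storing the offset inverted (the L1TF change). */
static uint64_t model_swp_encode(uint64_t type, uint64_t offset)
{
	return ((~offset << SWP_OFFSET_SHIFT) >> SWP_TYPE_BITS) |
	       (type << (64 - SWP_TYPE_BITS));
}

/* Recover the type from the top bits. */
static uint64_t model_swp_type(uint64_t val)
{
	return val >> (64 - SWP_TYPE_BITS);
}

/* Recover the offset: invert, shift the type off the top, shift down. */
static uint64_t model_swp_offset(uint64_t val)
{
	return (~val << SWP_TYPE_BITS) >> SWP_OFFSET_SHIFT;
}

/* What clearing the uffd-wp bit does to a raw entry value. */
static uint64_t model_clear_uffd_wp(uint64_t val)
{
	return val & ~(1ULL << PAGE_BIT_UFFD_WP);
}
```

In this model bit 10 of the raw entry holds the inverse of offset bit 1.  A huge page's head pfn is a multiple of 512, so offset bit 1 is 0 and raw bit 10 is 1; clearing it makes the decoded offset come back as pfn + 2, with the type untouched.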