From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=4OSM=MM=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_KAM_HTML_FONT_INVALID,
	URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 90C4BC636C8
	for <linux-mm@archiver.kernel.org>; Tue, 20 Jul 2021 07:47:17 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id D712661165
	for <linux-mm@archiver.kernel.org>; Tue, 20 Jul 2021 07:47:16 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D712661165
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gooddata.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 802206B00A5; Tue, 20 Jul 2021 03:47:17 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 7893A6B00A6; Tue, 20 Jul 2021 03:47:17 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 58C2D8D0001; Tue, 20 Jul 2021 03:47:17 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0084.hostedemail.com [216.40.44.84])
	by kanga.kvack.org (Postfix) with ESMTP id 0CB166B00A5
	for <linux-mm@kvack.org>; Tue, 20 Jul 2021 03:47:17 -0400 (EDT)
Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay03.hostedemail.com (Postfix) with ESMTP id 56FC08248047
	for <linux-mm@kvack.org>; Tue, 20 Jul 2021 07:47:15 +0000 (UTC)
X-FDA: 78382185630.24.47AB3D0
Received: from mail-ed1-f54.google.com (mail-ed1-f54.google.com [209.85.208.54])
	by imf27.hostedemail.com (Postfix) with ESMTP id C47B170009E1
	for <linux-mm@kvack.org>; Tue, 20 Jul 2021 07:47:14 +0000 (UTC)
Received: by mail-ed1-f54.google.com with SMTP id t3so27290880edc.7
        for <linux-mm@kvack.org>; Tue, 20 Jul 2021 00:47:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gooddata.com; s=google;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=yAITm9wG6OnbEvxwA4te0BUTZURBLqIslcQUxDmDJxo=;
        b=mNp3PtZQhh4fZelAHQGx8zmUcpVhLjBav18N4YKqDfvle6C2M6paUX4gppEcxj89ty
         7KhG7zgA5cQbfv9tPgOQC9xRMLSGgxvFBL+F6NblXAXkNynhb8MIKUAOalplsiValXVU
         hTgDvFqv/wdXE/qX4UtWOpWt2kxPsIN8jME4U=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=yAITm9wG6OnbEvxwA4te0BUTZURBLqIslcQUxDmDJxo=;
        b=bWHl5iJQINMRDs8ejh0UshSgaaqLfUMUCHqZsDR/jw9U/DW0f+Biou/yBRtvHId5Rq
         kW6RmHDqUdyLM0DBrcZclXp8sq0cxcIvqmAEqrRxdBHWzFSeDUTP0l0c4wr424tdiD8O
         GEg+uHw25MqoawKctT3NpjZU41caSWEhCMk9Hm0aNAOkb1xt+5lKqmh5SiJRAtwOCq5j
         VPE4J6PSasTJL8tNm9KQyPUA5TUf4ThL+IOmNBq5C1W4yNoeWsCXNkzCTDtQzT3ucyCM
         HEFPDB8RV+x8eZejcKkWjK+bafFpoyR73oHEH51e15s2t6WkVkhaHb+uXp3E4/0dSYw1
         Zi2A==
X-Gm-Message-State: AOAM531PFXobr0gY+VCvPgeOqdRFiXZwVrOGtQuvaJpc73qd3saBjQc9
	x5efpiIi8a3QgdO8Jje01JTaeaWmmJLSSunS0jJRyA==
X-Google-Smtp-Source: ABdhPJyjwZGIL7Wl3CP0dEAJZeFG9S4DkELrYkg9THNlQUKBsfpbGqm1fFPPRso/ji6S+t3YRM8lNBPBwZ4XqCUG6M8=
X-Received: by 2002:aa7:db54:: with SMTP id n20mr39214688edt.21.1626767233338;
 Tue, 20 Jul 2021 00:47:13 -0700 (PDT)
MIME-Version: 1.0
References: <CA+9S74hk0E=ju4jH95RBWMKFCGyfE6fb1Mgww3c4tmgzjfR8Og@mail.gmail.com>
 <4c9e24db-29d5-5bbb-17ae-8dc32ceb66ed@google.com> <CA+9S74i1kqAEXt6GjPpiWsCeBOxp0MFvdGsKmf=MFVogMGbzKg@mail.gmail.com>
 <CA+9S74gGoTsV=02suZ6oqUKvO-zKo3o1Ag8_gxL0QeKZ5RCeRw@mail.gmail.com>
 <e9baeaa-b25b-4d9b-de5e-bae678e5e089@google.com> <796cbb7-5a1c-1ba0-dde5-479aba8224f2@google.com>
 <YPX46x/pet5Sn5gC@t490s>
In-Reply-To: <YPX46x/pet5Sn5gC@t490s>
From: Igor Raits <igor@gooddata.com>
Date: Tue, 20 Jul 2021 09:47:02 +0200
Message-ID: <CA+9S74h71fUBwsH3PQb8_u=ZTP983FRfZeONaoXrXkZ7_Hickw@mail.gmail.com>
Subject: Re: kernel BUG at include/linux/swapops.h:204!
To: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>, Andrew Morton <akpm@linux-foundation.org>, 
	Hillf Danton <hdanton@sina.com>, Axel Rasmussen <axelrasmussen@google.com>, linux-mm@kvack.org
Content-Type: multipart/alternative; boundary="00000000000013642705c7894307"
Authentication-Results: imf27.hostedemail.com;
	dkim=pass header.d=gooddata.com header.s=google header.b=mNp3PtZQ;
	spf=pass (imf27.hostedemail.com: domain of igor.raits@gooddata.com designates 209.85.208.54 as permitted sender) smtp.mailfrom=igor.raits@gooddata.com;
	dmarc=pass (policy=none) header.from=gooddata.com
X-Rspamd-Server: rspam03
X-Rspamd-Queue-Id: C47B170009E1
X-Stat-Signature: xeweznaweq4afxz91rpbsof3fttkn4gw
X-HE-Tag: 1626767234-176871
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

--00000000000013642705c7894307
Content-Type: text/plain; charset="UTF-8"

On Tue, Jul 20, 2021 at 12:13 AM Peter Xu <peterx@redhat.com> wrote:

> On Mon, Jul 19, 2021 at 12:11:21PM -0700, Hugh Dickins wrote:
> > Hi Peter,
>
> Hi, Hugh,
>
> >
> > I believe you have already fixed this, but the fix needs to go to stable.
> > Sorry, the messages below are a muddle of top and middle posting,
> > I'll resume at the bottom.
> >
> > On Fri, 16 Jul 2021, Hugh Dickins wrote:
> > > On Thu, 15 Jul 2021, Igor Raits wrote:
> > >
> > > > Hi everyone again,
> > > >
> > > > I've been trying to reproduce this issue but still can't find a
> > > > consistent pattern.
> > > >
> > > > However, it did happen once more and this time on 5.13.1:
> > >
> > > Thanks for the updates, Igor.
> > >
> > > I have to admit that what you have reported confirms the suspicion
> > > that it's a bug introduced by one of my "stable" patches in 5.12.14
> > > (which are also in 5.13): nothing else between 5.12.12 and 5.12.14
> > > seems likely to be relevant.
> > >
> > > But I've gone back and forth and not been able to spot the problem.
> > >
> > > Please would you send (either privately to me, or to the list) your
> > > 5.13.1 kernel's .config, and disassembly of pmd_migration_entry_wait()
> > > from its vmlinux (with line numbers if available; or just send the
> > > whole vmlinux if that's easier, and I'll disassemble).
> > >
> > > I am hoping that the disassembly, together with the register contents
> > > that you've shown, will help guide towards an answer.
> > >
> > > Thanks,
> > > Hugh
> > >
> > > >
> > > > [  222.068216] ------------[ cut here ]------------
> > > > [  222.072884] kernel BUG at include/linux/swapops.h:204!
> > > > [  222.078062] invalid opcode: 0000 [#1] SMP NOPTI
> > > > [  222.082618] CPU: 38 PID: 9828 Comm: rpc-worker Tainted: G            E     5.13.1-1.gdc.el8.x86_64 #1
> > > > [  222.091894] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 05/24/2021
> > > > [  222.100468] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > > [  222.105994] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > > [  222.124878] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
> > > > [  222.130134] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX: ffffffffffffffff
> > > > [  222.137309] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI: ffffdf55c52cf368
> > > > [  222.144485] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09: 0000000000000000
> > > > [  222.151661] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000bf8
> > > > [  222.158837] R13: 0400000000000000 R14: 0400000000000080 R15: ffff9eec2825b1f8
> > > > [  222.166015] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000) knlGS:0000000000000000
> > > > [  222.174153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  222.179932] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4: 00000000007726e0
> > > > [  222.187109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [  222.194283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [  222.201457] PKRU: 55555554
> > > > [  222.204178] Call Trace:
> > > > [  222.206638]  __handle_mm_fault+0x5ad/0x6e0
> > > > [  222.210760]  ? sysvec_call_function_single+0xb/0x90
> > > > [  222.215672]  handle_mm_fault+0xc5/0x290
> > > > [  222.219529]  do_user_addr_fault+0x1a9/0x660
> > > > [  222.223740]  ? sched_clock_cpu+0xc/0xa0
> > > > [  222.227602]  exc_page_fault+0x68/0x130
> > > > [  222.231373]  ? asm_exc_page_fault+0x8/0x30
> > > > [  222.235495]  asm_exc_page_fault+0x1e/0x30
> > > > [  222.239526] RIP: 0033:0x7f67baaed734
> > > > [  222.243120] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22 <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
> > > > [  222.262002] RSP: 002b:00007f6754aea298 EFLAGS: 00010287
> > > > [  222.267257] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > [  222.274432] RDX: 00007f676ffff700 RSI: 00007f676ffff9c0 RDI: 00007f676f7fec10
> > > > [  222.281609] RBP: 0000000000000001 R08: 00007f676f7fed10 R09: 00007f67bad012f0
> > > > [  222.288785] R10: 00007f6754aeb700 R11: 0000000000000202 R12: 0000000000000001
> > > > [  222.295961] R13: 0000000000000006 R14: 0000000000000e28 R15: 00007f674006e1f0
> > > > [  222.303137] Modules linked in: vhost_net(E) vhost(E) vhost_iotlb(E)
> > > > tap(E) tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E)
> > > > nf_tables(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E)
> > > > binfmt_misc(E) iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E)
> > > > bonding(E) tls(E) vfat(E) fat(E) dm_service_time(E) dm_multipath(E)
> > > > rpcrdma(E) sunrpc(E) rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E)
> > > > target_core_mod(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E)
> > > > intel_rapl_msr(E) intel_rapl_common(E) scsi_transport_iscsi(E)
> > > > isst_if_common(E) ipmi_ssif(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E)
> > > > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> > > > crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E) qedr(E)
> > > > mei_me(E) acpi_ipmi(E) ib_uverbs(E) intel_cstate(E) ipmi_si(E) ib_core(E)
> > > > ipmi_devintf(E) dm_mod(E) ioatdma(E) ses(E) intel_uncore(E) pcspkr(E)
> > > > enclosure(E) mei(E) hpwdt(E) hpilo(E) lpc_ich(E) intel_pch_thermal(E)
> > > > dca(E) ipmi_msghandler(E)
> > > > [  222.303181]  acpi_power_meter(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E)
> > > > t10_pi(E) sg(E) qedf(E) qede(E) libfcoe(E) qed(E) libfc(E) smartpqi(E)
> > > > scsi_transport_fc(E) tg3(E) scsi_transport_sas(E) crc8(E) wmi(E)
> > > > nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E)
> > > > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> > > > [  222.420050] ---[ end trace bcf7b6d1610cc21f ]---
> > > > [  222.572925] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > > [  222.578469] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > > [  222.597359] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
> > > > [  222.602620] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX: ffffffffffffffff
> > > > [  222.609807] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI: ffffdf55c52cf368
> > > > [  222.616990] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09: 0000000000000000
> > > > [  222.624177] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000bf8
> > > > [  222.631361] R13: 0400000000000000 R14: 0400000000000080 R15: ffff9eec2825b1f8
> > > > [  222.638548] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000) knlGS:0000000000000000
> > > > [  222.646694] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  222.652481] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4: 00000000007726e0
> > > > [  222.659665] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [  222.666850] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [  222.674031] PKRU: 55555554
> > > > [  222.676758] Kernel panic - not syncing: Fatal exception
> > > > [  222.817538] Kernel Offset: 0x16000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > [  222.965540] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > >
> > > > On Sun, Jul 11, 2021 at 8:06 AM Igor Raits <igor@gooddata.com> wrote:
> > > >
> > > > > Hi Hugh,
> > > > >
> > > > > On Sun, Jul 11, 2021 at 6:17 AM Hugh Dickins <hughd@google.com> wrote:
> > > > >
> > > > >> On Sat, 10 Jul 2021, Igor Raits wrote:
> > > > >>
> > > > >> > Hello,
> > > > >> >
> > > > >> > I've seen one weird bug on 5.12.14 that happened a couple of
> > > > >> > times when I started a bunch of VMs on a server.
> > > > >>
> > > > >> Would it be possible for you to try the same on a 5.12.13 kernel?
> > > > >> Perhaps by reverting the diff between 5.12.13 and 5.12.14
> > > > >> temporarily. Enough to form an impression of whether the issue is
> > > > >> new in 5.12.14.
> > > > >>
> > > > >
> > > > > We've been using 5.12.12 for quite some time (~ a month) and I never
> > > > > saw it there.
> > > > >
> > > > > But I have to admit that I don't really have a reproducer. For
> > > > > example, on servers where it happened, I just rebooted them and the
> > > > > panic did not happen anymore (so I saw it only once, and only on 2
> > > > > servers out of the 32 that we have on 5.12.14).
> > > > >
> > > > >
> > > > >> I ask because 5.12.14 did include several fixes and cleanups from me
> > > > >> to page_vma_mapped_walk(), and that is involved in inserting and
> > > > >> removing pmd migration entries.  I am not aware of introducing any
> > > > >> bug there, but your report has got me worried.  If it's happening in
> > > > >> 5.12.14 but not in 5.12.13, then I must look again at my changes.
> > > > >>
> > > > >> I don't expect Hillf's patch to help at all: the pmd_lock()
> > > > >> is supposed to be taken by page_vma_mapped_walk(), before
> > > > >> set_pmd_migration_entry() and remove_migration_pmd() are called.
> > > > >>
> > > > >> Thanks,
> > > > >> Hugh
> > > > >>
> > > > >> >
> > > > >> > I've briefly googled this problem but could not find any relevant
> > > > >> > commit that would fix this issue.
> > > > >> >
> > > > >> > Do you have any hints on how to debug this further, or do you know
> > > > >> > the fix by any chance?
> > > > >> >
> > > > >> > Thanks in advance. Stack trace following:
> > > > >> >
> > > > >> > [  376.876610] ------------[ cut here ]------------
> > > > >> > [  376.881274] kernel BUG at include/linux/swapops.h:204!
> > > > >> > [  376.886455] invalid opcode: 0000 [#1] SMP NOPTI
> > > > >> > [  376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G            E     5.12.14-1.gdc.el8.x86_64 #1
> > > > >> > [  376.900464] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 05/24/2021
> > > > >> > [  376.909038] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > > >> > [  376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > > >> > [  376.933443] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > > > >> > [  376.938701] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX: ffffffffffffffff
> > > > >> > [  376.945878] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI: fffff497473b2ae8
> > > > >> > [  376.953055] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09: 0000000000000000
> > > > >> > [  376.960230] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000af8
> > > > >> > [  376.967407] R13: 0400000000000000 R14: 0400000000000080 R15: ffff908bbef7b6a8
> > > > >> > [  376.974582] FS:  00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000) knlGS:0000000000000000
> > > > >> > [  376.982718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > >> > [  376.988497] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4: 00000000007726e0
> > > > >> > [  376.995673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > >> > [  377.002849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > >> > [  377.010026] PKRU: 55555554
> > > > >> > [  377.012745] Call Trace:
> > > > >> > [  377.015207]  __handle_mm_fault+0x5ad/0x6e0
> > > > >> > [  377.019335]  handle_mm_fault+0xc5/0x290
> > > > >> > [  377.023194]  do_user_addr_fault+0x1cd/0x740
> > > > >> > [  377.027406]  exc_page_fault+0x54/0x110
> > > > >> > [  377.031182]  ? asm_exc_page_fault+0x8/0x30
> > > > >> > [  377.035307]  asm_exc_page_fault+0x1e/0x30
> > > > >> > [  377.039340] RIP: 0033:0x7f5bb91d6734
> > > > >> > [  377.042937] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22 <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
> > > > >> > [  377.061820] RSP: 002b:00007f5bb1f7ff58 EFLAGS: 00010206
> > > > >> > [  377.067076] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f5ba0000020
> > > > >> > [  377.074255] RDX: 00007f5b2bfff700 RSI: 00007f5b2bfff9c0 RDI: 0000000000000001
> > > > >> > [  377.081429] RBP: 0000000000000001 R08: 0000000000000000 R09: 00007f5bb93ea2f0
> > > > >> > [  377.088606] R10: 00007f5bb1f81700 R11: 0000000000000202 R12: 0000000000000001
> > > > >> > [  377.095782] R13: 0000000000000006 R14: 0000000000000cb4 R15: 00007f5bb1f801f0
> > > > >> > [  377.102958] Modules linked in: ebt_arp(E) nft_meta_bridge(E)
> > > > >> > ip6_tables(E) xt_CT(E) nf_log_ipv4(E) nf_log_common(E) nft_limit(E)
> > > > >> > nft_counter(E) xt_LOG(E) xt_limit(E) xt_mac(E) xt_set(E) xt_multiport(E)
> > > > >> > xt_state(E) xt_conntrack(E) xt_comment(E) xt_physdev(E) nft_compat(E)
> > > > >> > ip_set_hash_net(E) ip_set(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E)
> > > > >> > tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E) nf_tables(E)
> > > > >> > vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E) binfmt_misc(E)
> > > > >> > iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E) bonding(E) tls(E)
> > > > >> > vfat(E) fat(E) dm_service_time(E) dm_multipath(E) rpcrdma(E) sunrpc(E)
> > > > >> > rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E) target_core_mod(E)
> > > > >> > ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E) scsi_transport_iscsi(E)
> > > > >> > intel_rapl_msr(E) qedr(E) intel_rapl_common(E) ib_uverbs(E)
> > > > >> > isst_if_common(E) ib_core(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E)
> > > > >> > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> > > > >> > crct10dif_pclmul(E)
> > > > >> > [  377.102999]  crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E)
> > > > >> > intel_cstate(E) ipmi_ssif(E) acpi_ipmi(E) ipmi_si(E) mei_me(E) ioatdma(E)
> > > > >> > ipmi_devintf(E) dm_mod(E) ses(E) intel_uncore(E) pcspkr(E) qede(E)
> > > > >> > enclosure(E) tg3(E) mei(E) lpc_ich(E) hpilo(E) hpwdt(E)
> > > > >> > intel_pch_thermal(E) dca(E) ipmi_msghandler(E) acpi_power_meter(E) ext4(E)
> > > > >> > mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) qedf(E) qed(E) crc8(E)
> > > > >> > libfcoe(E) libfc(E) smartpqi(E) scsi_transport_fc(E) scsi_transport_sas(E)
> > > > >> > wmi(E) nf_conntrack(E) nf_defrag_ipv6(E) libcrc32c(E) crc32c_intel(E)
> > > > >> > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> > > > >> > [  377.243468] ---[ end trace 04bce3bb051f7620 ]---
> > > > >> > [  377.385645] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > > >> > [  377.391194] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > > >> > [  377.410091] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > > > >> > [  377.415355] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX: ffffffffffffffff
> > > > >> > [  377.422540] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI: fffff497473b2ae8
> > > > >> > [  377.429721] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09: 0000000000000000
> > > > >> > [  377.436902] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000af8
> > > > >> > [  377.444086] R13: 0400000000000000 R14: 0400000000000080 R15: ffff908bbef7b6a8
> > > > >> > [  377.451272] FS:  00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000) knlGS:0000000000000000
> > > > >> > [  377.459415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > >> > [  377.465196] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4: 00000000007726e0
> > > > >> > [  377.472377] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > >> > [  377.479556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > >> > [  377.486738] PKRU: 55555554
> > > > >> > [  377.489465] Kernel panic - not syncing: Fatal exception
> > > > >> > [  377.573911] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > >> > [  377.716482] ---[ end Kernel panic - not syncing: Fatal exception ]---
> >
> > Disassembly of the vmlinux Igor sent (along with other info) confirmed
> > something I suspected, that R08: fffff49747fa8080 in one of the dumps,
> > R08: ffffdf57428d8080 in the other, is the relevant struct page pointer
> > (and RAX its page->flags, which suggest it was pointing at a good page).
> >
> > A page pointer ....8080 in pmd_migration_entry_wait() is interesting:
> > normally I'd expect that to be ....0000 or ....8000, pointing to the
> > head of a huge page.  But instead it's pointing to the second tail
> > (though by now that compound page has been freed, and head pointers in
> > the tails reset to 0): as if the pfn has been incremented by 2 somehow.
> >
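The arithmetic behind "a page pointer ....8080 is the second tail" can be checked with a small C sketch. It assumes x86_64 with 4 KiB base pages, sizeof(struct page) == 64 and 512 base pages per PMD-sized THP, so that a head page's struct page pointer normally sits on a 0x8000 boundary in the vmemmap. These constants are hard-coded assumptions, not values read from the kernel, and subpage_index() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 values; hard-coded here, not taken from kernel headers. */
#define STRUCT_PAGE_SIZE 64u  /* sizeof(struct page) */
#define HPAGE_PMD_NR     512u /* 2 MiB THP made of 4 KiB base pages */

/*
 * Hypothetical helper: index of a struct page within its PMD-sized
 * compound page, assuming head struct pages start on a
 * HPAGE_PMD_NR * STRUCT_PAGE_SIZE (0x8000) boundary.
 */
static unsigned int subpage_index(uint64_t page_ptr)
{
	const uint64_t span = (uint64_t)HPAGE_PMD_NR * STRUCT_PAGE_SIZE;

	/* struct page pointers in the vmemmap are 64-byte aligned */
	assert(page_ptr % STRUCT_PAGE_SIZE == 0);
	return (unsigned int)((page_ptr % span) / STRUCT_PAGE_SIZE);
}
```

For both R08 values above, subpage_index() yields 0x80 / 64 = 2, i.e. head pfn + 2, while a ....0000 or ....8000 pointer yields 0 (a head page).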
> > And if the pfn (swp_offset) in the migration entry has got corrupted,
> > then it's no surprise that when removing migration entries,
> > page_vma_mapped_walk() would see migration_entry_to_page(entry) != page,
> > so be unable to replace that migration entry, leaving it behind for the
> > user to hit BUG_ON(!PageLocked) in pmd_migration_entry_wait() when
> > faulting on it later.
> >
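A toy model of that failure chain in C. Everything here is hypothetical and heavily simplified: struct toy_pmd and toy_remove_migration_pmd() are illustrations, not kernel structures or functions. The point is that a migration entry is only restored when the pfn it encodes matches the page being migrated, so an entry whose pfn has been bumped by 2 is skipped and stays in the page table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one pmd slot; not a kernel structure. */
struct toy_pmd {
	bool migration;  /* true while a migration entry is installed */
	uint64_t pfn;    /* pfn encoded in the entry (or mapped, once restored) */
};

/*
 * Models only the matching step: an entry pointing at old_pfn is replaced
 * by a present mapping of new_pfn; anything else is left untouched.
 */
static bool toy_remove_migration_pmd(struct toy_pmd *pmd,
				     uint64_t old_pfn, uint64_t new_pfn)
{
	if (!pmd->migration || pmd->pfn != old_pfn)
		return false;  /* no match: the migration entry stays behind */
	pmd->migration = false;
	pmd->pfn = new_pfn;
	return true;
}
```

Once such an entry is left behind, a later fault on it looks up the page at the corrupted pfn, and since nobody holds that page's lock, pmd_migration_entry_wait() hits the BUG_ON(!PageLocked) described above.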
> > So, what might increment the swp_offset by 2? Hunt around the encodings.
> > Hmm, _PAGE_BIT_UFFD_WP is _PAGE_BIT_SOFTW2 which is bit 10, whereas
> > _PAGE_BIT_PROTNONE (top bit to be avoided in pte encoding of swap)
> > is _PAGE_BIT_GLOBAL is bit 8. After overcoming off-by-one confusions,
> > it looks like if something somewhere were to set _PAGE_BIT_UFFD_WP
> > in a migration pmd (whereas it's only suitable for a present pmd),
> > it would indeed increment the swp_offset by 2.
> >
> > Hunt for uffd_wps, and run across copy_huge_pmd() in mm/huge_memory.c:
> > in Igor's 5.13.1 and 5.12.14 and many others, that says
> >       if (!(vma->vm_flags & VM_UFFD_WP))
> >               pmd = pmd_clear_uffd_wp(pmd);
> > just *before* checking is_swap_pmd(). Fixed in 5.14-rc1 in commit
> > 8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for fork()").
> >
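The ordering problem can be modeled in a few lines of C. This is a simplified sketch. It is not the kernel code and not the literal diff of 8f34f1eac382. Bit 10 stands for _PAGE_UFFD_WP (_PAGE_BIT_SOFTW2 on x86), a "swap pmd" is approximated as "non-zero with the present bit clear" (the real x86 pmd_present() also looks at other bits), and all toy_* names are hypothetical. It only shows that the buggy order applies a present-pmd flag operation to a swap/migration entry as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 bit positions; simplified, hypothetical model. */
#define TOY_PAGE_PRESENT (1ULL << 0)
#define TOY_PAGE_UFFD_WP (1ULL << 10) /* _PAGE_BIT_SOFTW2 */

/* Simplification: any non-zero, non-present value is a swap/migration entry. */
static bool toy_is_swap_pmd(uint64_t pmd)
{
	return pmd != 0 && !(pmd & TOY_PAGE_PRESENT);
}

/* Buggy order: the uffd-wp clear runs before the pmd is classified. */
static uint64_t toy_child_pmd_buggy(uint64_t src_pmd, bool vma_uffd_wp)
{
	uint64_t pmd = src_pmd;

	if (!vma_uffd_wp)
		pmd &= ~TOY_PAGE_UFFD_WP; /* also hits swap/migration entries */
	return pmd;                       /* value installed in the child */
}

/* Reordered: swap/migration entries are handled before the present-pmd clear. */
static uint64_t toy_child_pmd_fixed(uint64_t src_pmd, bool vma_uffd_wp)
{
	uint64_t pmd = src_pmd;

	if (toy_is_swap_pmd(pmd))
		return pmd;               /* bit 10 here is offset data, not uffd-wp */
	if (!vma_uffd_wp)
		pmd &= ~TOY_PAGE_UFFD_WP;
	return pmd;
}
```

With a migration-like value 0x600 and a VMA without VM_UFFD_WP, the buggy order hands the child 0x200 while the reordered version keeps 0x600. Present pmds are treated the same by both versions.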
> > But clearing the bit would be harmless, wouldn't it? Because it wouldn't
> > be set anyway. Waste a day before remembering what I never forgot but
> > somehow blanked out: the L1TF "feature" forced us to invert the offset
> > bits in the pte encoding of a swap entry, so there really is a bit set
> > there in the pmd entry, and clearing it has the effect of setting it in
> > the corresponding swap entry, so incrementing the migration pfn by 2.
> >
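The L1TF twist can be reproduced in user space. The sketch below copies the shape of the x86_64 swap-entry encoding described in arch/x86/include/asm/pgtable_64.h: a 5-bit type in bits 59-63 and the offset stored inverted from bit 9, just above _PAGE_BIT_PROTNONE. Treat the constants as assumptions to check against the tree in question. The type value 27 used below is purely illustrative, since the real migration type depends on the kernel config. Clearing bit 10 of the encoded value sets bit 1 of the decoded offset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 layout (cf. arch/x86/include/asm/pgtable_64.h, v5.12/v5.13). */
#define PAGE_BIT_PROTNONE    8  /* aliases _PAGE_BIT_GLOBAL */
#define PAGE_BIT_UFFD_WP     10 /* _PAGE_BIT_SOFTW2 */
#define SWP_TYPE_BITS        5
#define SWP_OFFSET_FIRST_BIT (PAGE_BIT_PROTNONE + 1)
#define SWP_OFFSET_SHIFT     (SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)

/* Type in the top 5 bits; offset stored inverted (L1TF) in bits 9..58. */
static uint64_t toy_swp_encode(uint64_t type, uint64_t offset)
{
	assert(type < (1u << SWP_TYPE_BITS) && (offset >> 50) == 0);
	return (~offset << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) |
	       (type << (64 - SWP_TYPE_BITS));
}

static uint64_t toy_swp_type(uint64_t val)
{
	return val >> (64 - SWP_TYPE_BITS);
}

/* Shift the type out at the top, then bring the re-inverted offset down. */
static uint64_t toy_swp_offset(uint64_t val)
{
	return ~val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT;
}

/* Roughly what pmd_clear_uffd_wp() does to the raw value: clear bit 10. */
static uint64_t toy_clear_uffd_wp(uint64_t val)
{
	return val & ~(1ULL << PAGE_BIT_UFFD_WP);
}
```

For example, with an illustrative 512-aligned head pfn 0x1234600 (bit 1 clear, so bit 10 of the encoding is set), toy_swp_offset(toy_clear_uffd_wp(toy_swp_encode(27, 0x1234600))) decodes to 0x1234602 and the type stays 27: the pfn has moved by exactly 2.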
> > I cannot explain why Igor never saw this crash on 5.12.12: maybe
> > something else in the environment changed around that time.  And it
> > will take several days for it to be confirmed as the fix in practice.
> >
> > But I'm confident that 8f34f1eac382 will prove to be the fix, so Peter
> > please prepare some backports of that for the various stable/longterm
> > kernels that need it - I've not looked into whether it applies cleanly,
> > or depends on other commits too.  You fixed several related but different
> > things in that commit: but this one is the worst, because it can corrupt
> > even those who are not using UFFD_WP at all.
>
> Looks right to me; b569a1760782 ("userfaultfd: wp: drop _PAGE_UFFD_WP
> properly when fork", 2020-04-07) seems to be the culprit.  I didn't notice
> the side effect either in the bug or in the fix, otherwise it would already
> have landed in stable.  I am very sorry for such an elementary bug that
> caused this fallout; I really can't tell why I completely overlooked
> is_swap_pte(), which is so obvious indeed.
>
> I checked: 5.6.y doesn't have the offending commit, since it was not marked
> as "Fixes".  It shows up in 5.7.y~5.13.y.  5.14-rc1 has 8f34f1eac382, which
> is the fix.  So I think we need the fix, or an equivalent one, for
> 5.7.y~5.13.y.
>
> 5.12.y & 5.13.y can pick up the fix 8f34f1eac382 cleanly.  The older ones
> (5.7.y~5.11.y) can't, so I plan to revert b569a1760782 there instead.
>

FTR, even though 8f34f1eac382 applies cleanly, it does not compile on its own.
The first patch of that series (5fc7a5f6fd04) is also required: it removes a
use of *vma, which is later removed by the patch that fixes the actual problem.
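
For reference, the per-branch plan from this thread (Peter's proposal plus the note above) can be summarized as a tiny lookup. This is only a restatement of the thread in C. The enum and function names are hypothetical and do not exist in any kernel or stable tooling, and the function is keyed by the 5.x stable minor version only.

```c
#include <assert.h>

/* Hypothetical summary of the backport plan; keyed by 5.<minor>.y. */
enum backport_action {
	BP_NOT_AFFECTED,   /* 5.6.y and earlier: b569a1760782 not present */
	BP_REVERT_CULPRIT, /* 5.7.y..5.11.y: fix does not apply; revert b569a1760782 */
	BP_PICK_BOTH,      /* 5.12.y, 5.13.y: 5fc7a5f6fd04, then 8f34f1eac382 */
	BP_ALREADY_FIXED,  /* 5.14-rc1 and later: 8f34f1eac382 is upstream */
};

static enum backport_action uffd_wp_fork_backport(unsigned int minor)
{
	if (minor <= 6)
		return BP_NOT_AFFECTED;
	if (minor <= 11)
		return BP_REVERT_CULPRIT;
	if (minor <= 13)
		return BP_PICK_BOTH;
	return BP_ALREADY_FIXED;
}
```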


>
> >
> > Many thanks for reporting and helping, Igor.
> > Hugh
> >
> > p.s. Peter, unrelated to this particular bug, and should not divert from
> > fixing it: but looking again at those swap encodings, and particularly
> > the soft_dirty manipulations: they look very fragile. I think uffd_wp
> > was wrong to follow that bad example, and your upcoming new encoding
> > (that I have previously called elegant) takes it a worse step further.
> >
> > I think we should change to a rule where the architecture-independent
> > swp_entry_t contains *all* the info, including bits for soft_dirty and
> > uffd_wp, so that swap entry cases can move immediately to decoding from
> > arch-dependent pte to arch-independent swp_entry_t, and do all the
> > manipulations on that. But I don't have time to make that change, and
> > probably neither do you, and making the change is liable to introduce
> > errors itself. So, no immediate plans, but please keep in mind.
>
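One way to picture Hugh's suggestion is the following hypothetical C sketch. toy_swp_entry_t, its bit layout and all helper names are invented for illustration; the real swp_entry_t of this era carries only type and offset. If the arch pte is decoded once into an arch-independent entry that also owns soft_dirty and uffd_wp bits, flag manipulation becomes a plain bit operation on a field that cannot overlap the offset, whatever inversion the arch encoding uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical arch-independent entry; not the kernel's swp_entry_t. */
typedef struct {
	uint64_t val;
} toy_swp_entry_t;

#define TOY_TYPE_SHIFT  59                 /* 5-bit type in bits 59..63 */
#define TOY_SOFT_DIRTY  (1ULL << 57)
#define TOY_UFFD_WP     (1ULL << 56)
#define TOY_OFFSET_MASK ((1ULL << 56) - 1) /* offset in bits 0..55 */

static toy_swp_entry_t toy_swp_entry(unsigned int type, uint64_t offset,
				     bool soft_dirty, bool uffd_wp)
{
	toy_swp_entry_t e;

	assert(type < 32 && offset <= TOY_OFFSET_MASK);
	e.val = ((uint64_t)type << TOY_TYPE_SHIFT) | offset |
		(soft_dirty ? TOY_SOFT_DIRTY : 0) |
		(uffd_wp ? TOY_UFFD_WP : 0);
	return e;
}

static unsigned int toy_swp_type(toy_swp_entry_t e)
{
	return (unsigned int)(e.val >> TOY_TYPE_SHIFT);
}

static uint64_t toy_swp_offset(toy_swp_entry_t e)
{
	return e.val & TOY_OFFSET_MASK;
}

static bool toy_swp_uffd_wp(toy_swp_entry_t e)
{
	return (e.val & TOY_UFFD_WP) != 0;
}

/* A flag change touches only its own bit, never the offset or the type. */
static toy_swp_entry_t toy_swp_clear_uffd_wp(toy_swp_entry_t e)
{
	e.val &= ~TOY_UFFD_WP;
	return e;
}
```

Under this scheme, code handling a swap pte would convert arch pte to entry once, manipulate the entry, and convert back once. Any arch-specific inversion then stays confined to those two conversions.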
> Curious: have we hit a similar issue before, where the soft-dirty bit was
> applied wrongly and caused hard-to-debug problems?
>
> If this is destined to be the best solution, I can work on both of them.  I
> am just worried that it's too big a change, as you said, so we don't know
> what's most efficient considering the total time we spend developing,
> reviewing and debugging it.
>
> The other alternative is that we just fix bugs; I know that's a cheap thing
> to say, but we still can't rule it out as an option yet.
>
> We can definitely discuss this outside this thread, and I'll prepare the
> backport first.  In any case, this bug definitely raises an alert, and
> I'll keep that in mind.
>
> Please let me know if there are any comments on the backport plan above;
> otherwise I'll prepare the patches for all the branches before tomorrow.
>
> Thanks,
>
> --
> Peter Xu
>
>

-- 

Igor Raits

Sr. SW Engineer

igor@gooddata.com

+420 775 117 817

Moravske namesti 1007/14

602 00 Brno-Veveri, Czech Republic

Twitter <https://twitter.com/gooddata> | Facebook
<https://www.facebook.com/gooddata> | LinkedIn
<http://www.linkedin.com/company/gooddata> | Blog
<http://www.gooddata.com/blog>


<https://www.gooddata.com/>

--00000000000013642705c7894307
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><br><div class=3D"gmail_quote"><div dir=3D"ltr" class=3D"g=
mail_attr">On Tue, Jul 20, 2021 at 12:13 AM Peter Xu &lt;<a href=3D"mailto:=
peterx@redhat.com">peterx@redhat.com</a>&gt; wrote:<br></div><blockquote cl=
ass=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid=
 rgb(204,204,204);padding-left:1ex">On Mon, Jul 19, 2021 at 12:11:21PM -070=
0, Hugh Dickins wrote:<br>
&gt; &gt; &gt; [=C2=A0 222.227602]=C2=A0 exc_page_fault+0x68/0x130<br>
&gt; &gt; &gt; [=C2=A0 222.231373]=C2=A0 ? asm_exc_page_fault+0x8/0x30<br>
&gt; &gt; &gt; [=C2=A0 222.235495]=C2=A0 asm_exc_page_fault+0x1e/0x30<br>
&gt; &gt; &gt; [=C2=A0 222.239526] RIP: 0033:0x7f67baaed734<br>
&gt; &gt; &gt; [=C2=A0 222.243120] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0=
d d6 3b 21 00 31 c0<br>
&gt; &gt; &gt; 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 =
39 d2 74 22<br>
&gt; &gt; &gt; &lt;48&gt; 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 0=
3 00 00 00 c7<br>
&gt; &gt; &gt; [=C2=A0 222.262002] RSP: 002b:00007f6754aea298 EFLAGS: 00010=
287<br>
&gt; &gt; &gt; [=C2=A0 222.267257] RAX: 0000000000000000 RBX: 0000000000000=
000 RCX:<br>
&gt; &gt; &gt; 0000000000000000<br>
&gt; &gt; &gt; [=C2=A0 222.274432] RDX: 00007f676ffff700 RSI: 00007f676ffff=
9c0 RDI:<br>
&gt; &gt; &gt; 00007f676f7fec10<br>
&gt; &gt; &gt; [=C2=A0 222.281609] RBP: 0000000000000001 R08: 00007f676f7fe=
d10 R09:<br>
&gt; &gt; &gt; 00007f67bad012f0<br>
&gt; &gt; &gt; [=C2=A0 222.288785] R10: 00007f6754aeb700 R11: 0000000000000=
202 R12:<br>
&gt; &gt; &gt; 0000000000000001<br>
&gt; &gt; &gt; [=C2=A0 222.295961] R13: 0000000000000006 R14: 0000000000000=
e28 R15:<br>
&gt; &gt; &gt; 00007f674006e1f0<br>
&gt; &gt; &gt; [=C2=A0 222.303137] Modules linked in: vhost_net(E) vhost(E)=
 vhost_iotlb(E)<br>
&gt; &gt; &gt; tap(E) tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsol=
e(E)<br>
&gt; &gt; &gt; nf_tables(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetl=
ink(E)<br>
&gt; &gt; &gt; binfmt_misc(E) iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E)=
 mrp(E)<br>
&gt; &gt; &gt; bonding(E) tls(E) vfat(E) fat(E) dm_service_time(E) dm_multi=
path(E)<br>
&gt; &gt; &gt; rpcrdma(E) sunrpc(E) rdma_ucm(E) ib_srpt(E) ib_isert(E) iscs=
i_target_mod(E)<br>
&gt; &gt; &gt; target_core_mod(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) l=
ibiscsi(E)<br>
&gt; &gt; &gt; intel_rapl_msr(E) intel_rapl_common(E) scsi_transport_iscsi(=
E)<br>
&gt; &gt; &gt; isst_if_common(E) ipmi_ssif(E) nfit(E) libnvdimm(E) x86_pkg_=
temp_thermal(E)<br>
&gt; &gt; &gt; intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypas=
s(E)<br>
&gt; &gt; &gt; crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) r=
apl(E) qedr(E)<br>
&gt; &gt; &gt; mei_me(E) acpi_ipmi(E) ib_uverbs(E) intel_cstate(E) ipmi_si(=
E) ib_core(E)<br>
&gt; &gt; &gt; ipmi_devintf(E) dm_mod(E) ioatdma(E) ses(E) intel_uncore(E) =
pcspkr(E)<br>
&gt; &gt; &gt; enclosure(E) mei(E) hpwdt(E) hpilo(E) lpc_ich(E) intel_pch_t=
hermal(E)<br>
&gt; &gt; &gt; dca(E) ipmi_msghandler(E)<br>
&gt; &gt; &gt; [=C2=A0 222.303181]=C2=A0 acpi_power_meter(E) ext4(E) mbcach=
e(E) jbd2(E) sd_mod(E)<br>
&gt; &gt; &gt; t10_pi(E) sg(E) qedf(E) qede(E) libfcoe(E) qed(E) libfc(E) s=
martpqi(E)<br>
&gt; &gt; &gt; scsi_transport_fc(E) tg3(E) scsi_transport_sas(E) crc8(E) wm=
i(E)<br>
&gt; &gt; &gt; nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(=
E)<br>
&gt; &gt; &gt; nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)<br=
>
&gt; &gt; &gt; [=C2=A0 222.420050] ---[ end trace bcf7b6d1610cc21f ]---<br>
&gt; &gt; &gt; [=C2=A0 222.572925] RIP: 0010:pmd_migration_entry_wait+0x132=
/0x140<br>
&gt; &gt; &gt; [=C2=A0 222.578469] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c=
5 f6 ff 48 81 e2 00<br>
&gt; &gt; &gt; f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 =
44 ff ff ff<br>
&gt; &gt; &gt; &lt;0f&gt; 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 0=
0 00 41 55 48<br>
&gt; &gt; &gt; [=C2=A0 222.597359] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010=
246<br>
&gt; &gt; &gt; [=C2=A0 222.602620] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cd=
bf8 RCX:<br>
&gt; &gt; &gt; ffffffffffffffff<br>
&gt; &gt; &gt; [=C2=A0 222.609807] RDX: 0000000000000000 RSI: ffff9eec4b3cd=
bf8 RDI:<br>
&gt; &gt; &gt; ffffdf55c52cf368<br>
&gt; &gt; &gt; [=C2=A0 222.616990] RBP: ffffdf55c52cf368 R08: ffffdf57428d8=
080 R09:<br>
&gt; &gt; &gt; 0000000000000000<br>
&gt; &gt; &gt; [=C2=A0 222.624177] R10: 0000000000000000 R11: 0000000000000=
000 R12:<br>
&gt; &gt; &gt; 0000000000000bf8<br>
&gt; &gt; &gt; [=C2=A0 222.631361] R13: 0400000000000000 R14: 0400000000000=
080 R15:<br>
&gt; &gt; &gt; ffff9eec2825b1f8<br>
&gt; &gt; &gt; [=C2=A0 222.638548] FS:=C2=A0 00007f6754aeb700(0000) GS:ffff=
9f49bfd00000(0000)<br>
&gt; &gt; &gt; knlGS:0000000000000000<br>
&gt; &gt; &gt; [=C2=A0 222.646694] CS:=C2=A0 0010 DS: 0000 ES: 0000 CR0: 00=
00000080050033<br>
&gt; &gt; &gt; [=C2=A0 222.652481] CR2: 00007f676ffffd98 CR3: 000000012bf6a=
002 CR4:<br>
&gt; &gt; &gt; 00000000007726e0<br>
&gt; &gt; &gt; [=C2=A0 222.659665] DR0: 0000000000000000 DR1: 0000000000000=
000 DR2:<br>
&gt; &gt; &gt; 0000000000000000<br>
&gt; &gt; &gt; [=C2=A0 222.666850] DR3: 0000000000000000 DR6: 00000000fffe0=
ff0 DR7:<br>
&gt; &gt; &gt; 0000000000000400<br>
&gt; &gt; &gt; [=C2=A0 222.674031] PKRU: 55555554<br>
&gt; &gt; &gt; [=C2=A0 222.676758] Kernel panic - not syncing: Fatal except=
ion<br>
&gt; &gt; &gt; [=C2=A0 222.817538] Kernel Offset: 0x16000000 from 0xfffffff=
f81000000<br>
&gt; &gt; &gt; (relocation range: 0xffffffff80000000-0xffffffffbfffffff)<br=
>
&gt; &gt; &gt; [=C2=A0 222.965540] ---[ end Kernel panic - not syncing: Fat=
al exception ]---<br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; On Sun, Jul 11, 2021 at 8:06 AM Igor Raits &lt;<a href=3D"ma=
ilto:igor@gooddata.com" target=3D"_blank">igor@gooddata.com</a>&gt; wrote:<=
br>
&gt; &gt; &gt; <br>
&gt; &gt; &gt; &gt; Hi Hugh,<br>
&gt; &gt; &gt; &gt;<br>
&gt; &gt; &gt; &gt; On Sun, Jul 11, 2021 at 6:17 AM Hugh Dickins &lt;<a hre=
f=3D"mailto:hughd@google.com" target=3D"_blank">hughd@google.com</a>&gt; wr=
ote:<br>
&gt; &gt; &gt; &gt;<br>
&gt; &gt; &gt; &gt;&gt; On Sat, 10 Jul 2021, Igor Raits wrote:<br>
&gt; &gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt; &gt;&gt; &gt; Hello,<br>
&gt; &gt; &gt; &gt;&gt; &gt;<br>
&gt; &gt; &gt; &gt;&gt; &gt; I&#39;ve seen one weird bug on 5.12.14 that happened a couple of times when<br>
&gt; &gt; &gt; &gt;&gt; &gt; I started a bunch of VMs on a server.<br>
&gt; &gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt; &gt;&gt; Would it be possible for you to try the same on a 5=
.12.13 kernel?<br>
&gt; &gt; &gt; &gt;&gt; Perhaps by reverting the diff between 5.12.13 and 5=
.12.14 temporarily.<br>
&gt; &gt; &gt; &gt;&gt; Enough to form an impression of whether the issue i=
s new in 5.12.14.<br>
&gt; &gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt; &gt;<br>
&gt; &gt; &gt; &gt; We&#39;ve been using 5.12.12 for quite some time (~ a m=
onth) and I never saw<br>
&gt; &gt; &gt; &gt; it there.<br>
&gt; &gt; &gt; &gt;<br>
&gt; &gt; &gt; &gt; But I have to admit that I don&#39;t really have a repr=
oducer. For example, on<br>
&gt; &gt; &gt; &gt; servers where it happened,<br>
&gt; &gt; &gt; &gt; I just rebooted them and the panic did not happen again (so I saw it<br>
&gt; &gt; &gt; &gt; only once,<br>
&gt; &gt; &gt; &gt; only on 2 servers out of 32 that we have on 5.12.14).<b=
r>
&gt; &gt; &gt; &gt;<br>
&gt; &gt; &gt; &gt;<br>
&gt; &gt; &gt; &gt;&gt; I ask because 5.12.14 did include several fixes and=
 cleanups from me<br>
&gt; &gt; &gt; &gt;&gt; to page_vma_mapped_walk(), and that is involved in =
inserting and<br>
&gt; &gt; &gt; &gt;&gt; removing pmd migration entries.=C2=A0 I am not awar=
e of introducing any<br>
&gt; &gt; &gt; &gt;&gt; bug there, but your report has got me worried.=C2=
=A0 If it&#39;s happening in<br>
&gt; &gt; &gt; &gt;&gt; 5.12.14 but not in 5.12.13, then I must look again =
at my changes.<br>
&gt; &gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt; &gt;&gt; I don&#39;t expect Hillf&#39;s patch to help at all: the pmd_lock()<br>
&gt; &gt; &gt; &gt;&gt; is supposed to be taken by page_vma_mapped_walk(), =
before<br>
&gt; &gt; &gt; &gt;&gt; set_pmd_migration_entry() and remove_migration_pmd(=
) are called.<br>
&gt; &gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt; &gt;&gt; Thanks,<br>
&gt; &gt; &gt; &gt;&gt; Hugh<br>
&gt; &gt; &gt; &gt;&gt;<br>
&gt; &gt; &gt; &gt;&gt; &gt;<br>
&gt; &gt; &gt; &gt;&gt; &gt; I&#39;ve briefly googled this problem but coul=
d not find any relevant commit<br>
&gt; &gt; &gt; &gt;&gt; &gt; that would fix this issue.<br>
&gt; &gt; &gt; &gt;&gt; &gt;<br>
&gt; &gt; &gt; &gt;&gt; &gt; Do you have any hints on how to debug this further, or do you know<br>
&gt; &gt; &gt; &gt;&gt; &gt; of a fix by any chance?<br>
&gt; &gt; &gt; &gt;&gt; &gt;<br>
&gt; &gt; &gt; &gt;&gt; &gt; Thanks in advance. Stack trace following:<br>
&gt; &gt; &gt; &gt;&gt; &gt;<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.876610] ------------[ cut here ]--=
----------<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.881274] kernel BUG at include/linu=
x/swapops.h:204!<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.886455] invalid opcode: 0000 [#1] =
SMP NOPTI<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G=C2=A0 =C2=A0E<br>
&gt; &gt; &gt; &gt;&gt; &gt;=C2=A0 =C2=A0 =C2=A05.12.14-1.gdc.el8.x86_64 #1<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.900464] Hardware name: HPE ProLian=
t DL380 Gen10/ProLiant DL380<br>
&gt; &gt; &gt; &gt;&gt; &gt; Gen10, BIOS U30 05/24/2021<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.909038] RIP: 0010:pmd_migration_en=
try_wait+0x132/0x140<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00<br>
&gt; &gt; &gt; &gt;&gt; &gt; f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 0=
0 00 75 80 e9 44 ff ff ff<br>
&gt; &gt; &gt; &gt;&gt; &gt; &lt;0f&gt; 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff=
 ff 0f 1f 44 00 00 41 55 48<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.933443] RSP: 0000:ffffb65a5e1cfdc8=
 EFLAGS: 00010246<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.938701] RAX: 0017ffffc0000000 RBX:=
 ffff908b8ecabaf8 RCX:<br>
&gt; &gt; &gt; &gt;&gt; &gt; ffffffffffffffff<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.945878] RDX: 0000000000000000 RSI:=
 ffff908b8ecabaf8 RDI:<br>
&gt; &gt; &gt; &gt;&gt; &gt; fffff497473b2ae8<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.953055] RBP: fffff497473b2ae8 R08:=
 fffff49747fa8080 R09:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000000<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.960230] R10: 0000000000000000 R11:=
 0000000000000000 R12:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000af8<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.967407] R13: 0400000000000000 R14:=
 0400000000000080 R15:<br>
&gt; &gt; &gt; &gt;&gt; &gt; ffff908bbef7b6a8<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.974582] FS:=C2=A0 00007f5bb1f81700=
(0000) GS:ffff90e87fd80000(0000)<br>
&gt; &gt; &gt; &gt;&gt; &gt; knlGS:0000000000000000<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.982718] CS:=C2=A0 0010 DS: 0000 ES=
: 0000 CR0: 0000000080050033<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.988497] CR2: 00007f5b2bfffd98 CR3:=
 00000001f793e006 CR4:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 00000000007726e0<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 376.995673] DR0: 0000000000000000 DR1:=
 0000000000000000 DR2:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000000<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.002849] DR3: 0000000000000000 DR6:=
 00000000fffe0ff0 DR7:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000400<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.010026] PKRU: 55555554<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.012745] Call Trace:<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.015207]=C2=A0 __handle_mm_fault+0x=
5ad/0x6e0<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.019335]=C2=A0 handle_mm_fault+0xc5=
/0x290<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.023194]=C2=A0 do_user_addr_fault+0=
x1cd/0x740<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.027406]=C2=A0 exc_page_fault+0x54/=
0x110<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.031182]=C2=A0 ? asm_exc_page_fault=
+0x8/0x30<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.035307]=C2=A0 asm_exc_page_fault+0=
x1e/0x30<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.039340] RIP: 0033:0x7f5bb91d6734<b=
r>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.042937] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0<br>
&gt; &gt; &gt; &gt;&gt; &gt; 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 4=
0 fd ff ff 49 39 d2 74 22<br>
&gt; &gt; &gt; &gt;&gt; &gt; &lt;48&gt; 8b 96 d8 03 00 00 48 01 15 4e 7c 21=
 00 80 be 50 03 00 00 00 c7<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.061820] RSP: 002b:00007f5bb1f7ff58=
 EFLAGS: 00010206<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.067076] RAX: 0000000000000000 RBX:=
 0000000000000000 RCX:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 00007f5ba0000020<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.074255] RDX: 00007f5b2bfff700 RSI:=
 00007f5b2bfff9c0 RDI:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000001<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.081429] RBP: 0000000000000001 R08:=
 0000000000000000 R09:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 00007f5bb93ea2f0<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.088606] R10: 00007f5bb1f81700 R11:=
 0000000000000202 R12:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000001<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.095782] R13: 0000000000000006 R14:=
 0000000000000cb4 R15:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 00007f5bb1f801f0<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.102958] Modules linked in: ebt_arp=
(E) nft_meta_bridge(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; ip6_tables(E) xt_CT(E) nf_log_ipv4(E) nf_log_c=
ommon(E) nft_limit(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; nft_counter(E) xt_LOG(E) xt_limit(E) xt_mac(E)=
 xt_set(E) xt_multiport(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; xt_state(E) xt_conntrack(E) xt_comment(E) xt_p=
hysdev(E) nft_compat(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; ip_set_hash_net(E) ip_set(E) vhost_net(E) vhos=
t(E) vhost_iotlb(E) tap(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) ne=
tconsole(E) nf_tables(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnet=
link(E) binfmt_misc(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) =
mrp(E) bonding(E) tls(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; vfat(E) fat(E) dm_service_time(E) dm_multipath=
(E) rpcrdma(E) sunrpc(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E) target_core_mod(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E) scsi_transport_iscsi(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; intel_rapl_msr(E) qedr(E) intel_rapl_common(E)=
 ib_uverbs(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; isst_if_common(E) ib_core(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; intel_powerclamp(E) coretemp(E) kvm_intel(E) k=
vm(E) irqbypass(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; crct10dif_pclmul(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.102999]=C2=A0 crc32_pclmul(E) ghas=
h_clmulni_intel(E) rapl(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; intel_cstate(E) ipmi_ssif(E) acpi_ipmi(E) ipmi_si(E) mei_me(E) ioatdma(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; ipmi_devintf(E) dm_mod(E) ses(E) intel_uncore(=
E) pcspkr(E) qede(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; enclosure(E) tg3(E) mei(E) lpc_ich(E) hpilo(E)=
 hpwdt(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; intel_pch_thermal(E) dca(E) ipmi_msghandler(E) acpi_power_meter(E) ext4(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) q=
edf(E) qed(E) crc8(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; libfcoe(E) libfc(E) smartpqi(E) scsi_transport_fc(E) scsi_transport_sas(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; wmi(E) nf_conntrack(E) nf_defrag_ipv6(E) libcr=
c32c(E) crc32c_intel(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; nf_defrag_ipv4(E) br_netfilter(E) bridge(E) st=
p(E) llc(E)<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.243468] ---[ end trace 04bce3bb051=
f7620 ]---<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.385645] RIP: 0010:pmd_migration_en=
try_wait+0x132/0x140<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.391194] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00<br>
&gt; &gt; &gt; &gt;&gt; &gt; f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 0=
0 00 75 80 e9 44 ff ff ff<br>
&gt; &gt; &gt; &gt;&gt; &gt; &lt;0f&gt; 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff=
 ff 0f 1f 44 00 00 41 55 48<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.410091] RSP: 0000:ffffb65a5e1cfdc8=
 EFLAGS: 00010246<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.415355] RAX: 0017ffffc0000000 RBX:=
 ffff908b8ecabaf8 RCX:<br>
&gt; &gt; &gt; &gt;&gt; &gt; ffffffffffffffff<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.422540] RDX: 0000000000000000 RSI:=
 ffff908b8ecabaf8 RDI:<br>
&gt; &gt; &gt; &gt;&gt; &gt; fffff497473b2ae8<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.429721] RBP: fffff497473b2ae8 R08:=
 fffff49747fa8080 R09:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000000<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.436902] R10: 0000000000000000 R11:=
 0000000000000000 R12:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000af8<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.444086] R13: 0400000000000000 R14:=
 0400000000000080 R15:<br>
&gt; &gt; &gt; &gt;&gt; &gt; ffff908bbef7b6a8<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.451272] FS:=C2=A0 00007f5bb1f81700=
(0000) GS:ffff90e87fd80000(0000)<br>
&gt; &gt; &gt; &gt;&gt; &gt; knlGS:0000000000000000<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.459415] CS:=C2=A0 0010 DS: 0000 ES=
: 0000 CR0: 0000000080050033<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.465196] CR2: 00007f5b2bfffd98 CR3:=
 00000001f793e006 CR4:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 00000000007726e0<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.472377] DR0: 0000000000000000 DR1:=
 0000000000000000 DR2:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000000<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.479556] DR3: 0000000000000000 DR6:=
 00000000fffe0ff0 DR7:<br>
&gt; &gt; &gt; &gt;&gt; &gt; 0000000000000400<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.486738] PKRU: 55555554<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.489465] Kernel panic - not syncing=
: Fatal exception<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.573911] Kernel Offset: 0xa000000 from 0xffffffff81000000<br>
&gt; &gt; &gt; &gt;&gt; &gt; (relocation range: 0xffffffff80000000-0xffffffffbfffffff)<br>
&gt; &gt; &gt; &gt;&gt; &gt; [=C2=A0 377.716482] ---[ end Kernel panic - no=
t syncing: Fatal exception ]---<br>
&gt; <br>
&gt; Disassembly of the vmlinux Igor sent (along with other info) confirmed=
<br>
&gt; something I suspected, that R08: fffff49747fa8080 in one of the dumps,=
<br>
&gt; R08: ffffdf57428d8080 in the other, is the relevant struct page pointe=
r<br>
&gt; (and RAX the page-&gt;flags, which look like it was pointing at a good=
 page).<br>
&gt; <br>
&gt; A page pointer ....8080 in pmd_migration_entry_wait() is interesting:<=
br>
&gt; normally I&#39;d expect that to be ....0000 or ....8000, pointing to t=
he<br>
&gt; head of a huge page.=C2=A0 But instead it&#39;s pointing to the second=
 tail<br>
&gt; (though by now that compound page has been freed, and head pointers in=
<br>
&gt; the tails reset to 0): as if the pfn has been incremented by 2 somehow=
.<br>
&gt; <br>
&gt; And if the pfn (swp_offset) in the migration entry has got corrupted,<=
br>
&gt; then it&#39;s no surprise that when removing migration entries,<br>
&gt; page_vma_mapped_walk() would see migration_entry_to_page(entry) !=3D p=
age,<br>
&gt; so be unable to replace that migration entry, leaving it behind for th=
e<br>
&gt; user to hit BUG_ON(!PageLocked) in pmd_migration_entry_wait() when<br>
&gt; faulting on it later.<br>
&gt; <br>
&gt; So, what might increment the swp_offset by 2? Hunt around the encoding=
s.<br>
&gt; Hmm, _PAGE_BIT_UFFD_WP is _PAGE_BIT_SOFTW2, which is bit 10, whereas<br>
&gt; _PAGE_BIT_PROTNONE (the top bit to be avoided in the pte encoding of swap)<br>
&gt; is _PAGE_BIT_GLOBAL, which is bit 8. After overcoming off-by-one confusions,<br>
&gt; it looks like if something somewhere were to set _PAGE_BIT_UFFD_WP<br>
&gt; in a migration pmd (whereas it&#39;s only suitable for a present pmd),=
<br>
&gt; it would indeed increment the swp_offset by 2.<br>
&gt; <br>
&gt; Hunt for uffd_wps, and run across copy_huge_pmd() in mm/huge_memory.c:=
<br>
&gt; in Igor&#39;s 5.13.1 and 5.12.14 and many others, that says<br>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0if (!(vma-&gt;vm_flags &amp; VM_UFFD_WP))<br=
>
&gt;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pmd =3D pmd_clea=
r_uffd_wp(pmd);<br>
&gt; just *before* checking is_swap_pmd(). Fixed in 5.14-rc1 in commit<br>
&gt; 8f34f1eac382 (&quot;mm/userfaultfd: fix uffd-wp special cases for fork=
()&quot;).<br>
&gt; <br>
&gt; But clearing the bit would be harmless, wouldn&#39;t it? Because it wo=
uldn&#39;t<br>
&gt; be set anyway. Waste a day before remembering what I never forgot but<=
br>
&gt; somehow blanked out: the L1TF &quot;feature&quot; forced us to invert =
the offset<br>
&gt; bits in the pte encoding of a swap entry, so there really is a bit set=
<br>
&gt; there in the pmd entry, and clearing it has the effect of setting it i=
n<br>
&gt; the corresponding swap entry, so incrementing the migration pfn by 2.<=
br>
&gt; <br>
&gt; I cannot explain why Igor never saw this crash on 5.12.12: maybe<br>
&gt; something else in the environment changed around that time.=C2=A0 And =
it<br>
&gt; will take several days for it to be confirmed as the fix in practice.<=
br>
&gt; <br>
&gt; But I&#39;m confident that 8f34f1eac382 will prove to be the fix, so, Peter,<br>
&gt; please prepare some backports of that for the various stable/longterm<=
br>
&gt; kernels that need it - I&#39;ve not looked into whether it applies cle=
anly,<br>
&gt; or depends on other commits too.=C2=A0 You fixed several related but d=
ifferent<br>
&gt; things in that commit: but this one is the worst, because it can corru=
pt<br>
&gt; even those who are not using UFFD_WP at all.<br>
<br>
Looks right to me: b569a1760782 (&quot;userfaultfd: wp: drop _PAGE_UFFD_WP properly<br>
when fork&quot;, 2020-04-07) seems to be the culprit. I didn&#39;t notice the side<br>
effect in either the bug or the fix; otherwise the fix would already have landed in stable. I am<br>
very sorry for such an elementary bug and the fallout it caused - I really can&#39;t<br>
tell why I completely overlooked is_swap_pte(), which is so obvious.<br>
<br>
I checked: 5.6.y doesn&#39;t have the offending commit, as it was not marked as<br>
&quot;fixes&quot;. The problem shows up in 5.7.y~5.13.y, and 5.14-rc1 has 8f34f1eac382, which<br>
is the fix. So I think 5.7.y~5.13.y need that fix or an equivalent one.<br>
<br>
5.12.y &amp; 5.13.y can pick up the fix 8f34f1eac382 cleanly. The older branches<br>
(5.7.y~5.11.y) can&#39;t, so for those I plan to revert b569a1760782 instead.<br></blockquote><div><br></div><div>FTR, even though 8f34f1eac382 applies cleanly, it does not compile on its own.</div><div>The first patch of that series (5fc7a5f6fd04) is also required: it removes a use of</div><div>*vma, and the patch that fixes the actual problem then removes vma itself.<br></div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
&gt; <br>
&gt; Many thanks for reporting and helping, Igor.<br>
&gt; Hugh<br>
&gt; <br>
&gt; p.s. Peter, unrelated to this particular bug, and should not divert fr=
om<br>
&gt; fixing it: but looking again at those swap encodings, and particularly=
<br>
&gt; the soft_dirty manipulations: they look very fragile. I think uffd_wp<=
br>
&gt; was wrong to follow that bad example, and your upcoming new encoding<b=
r>
&gt; (that I have previously called elegant) takes it a worse step further.=
<br>
&gt; <br>
&gt; I think we should change to a rule where the architecture-independent<=
br>
&gt; swp_entry_t contains *all* the info, including bits for soft_dirty and=
<br>
&gt; uffd_wp, so that swap entry cases can move immediately to decoding fro=
m<br>
&gt; arch-dependent pte to arch-independent swp_entry_t, and do all the<br>
&gt; manipulations on that. But I don&#39;t have time to make that change, =
and<br>
&gt; probably neither do you, and making the change is liable to introduce<=
br>
&gt; errors itself. So, no immediate plans, but please keep in mind.<br>
<br>
Out of curiosity: have we hit similar issues before, where the soft-dirty bit was<br>
applied wrongly and caused hard-to-debug problems?<br>
<br>
If this turns out to be the best solution, I can work on both of them. I am<br>
just worried that it&#39;s too big a change, as you said, so we don&#39;t know what&#39;s the<br>
most efficient approach once we count the total time spent developing, reviewing and debugging it.<br>
<br>
The other alternative is that we just fix bugs as they come; I know that&#39;s a cheap thing to say,<br>
but we still can&#39;t rule it out as an option yet.<br>
<br>
We can definitely discuss this outside this thread; I&#39;ll prepare the backports<br>
first. In any case, this bug is definitely a warning sign, and I&#39;ll keep<br>
that in mind.<br>
<br>
Please let me know if you have any comments on the backport plan above; otherwise I&#39;ll<br>
prepare the patches for all the branches before tomorrow.<br>
<br>
Thanks,<br>
<br>
-- <br>
Peter Xu<br>
<br>
</blockquote></div><br clear=3D"all"><br>-- <br><div dir=3D"ltr" class=3D"g=
mail_signature"><div dir=3D"ltr"><p dir=3D"ltr" style=3D"line-height:1.38;m=
argin-top:0pt;margin-bottom:0pt"><span style=3D"font-size:10pt;font-family:=
Avenir,sans-serif;color:rgb(0,0,0);background-color:transparent;font-weight=
:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-al=
ign:baseline;white-space:pre-wrap">Igor Raits</span></p><p dir=3D"ltr" styl=
e=3D"line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style=3D"font=
-size:10pt;font-family:Avenir,sans-serif;color:rgb(0,0,0);background-color:=
transparent;font-weight:400;font-style:normal;font-variant:normal;text-deco=
ration:none;vertical-align:baseline;white-space:pre-wrap">Sr. SW Engineer</=
span></p><p dir=3D"ltr" style=3D"line-height:1.38;margin-top:0pt;margin-bot=
tom:0pt"><a href=3D"mailto:igor@gooddata.com" style=3D"text-decoration:none=
" target=3D"_blank"><span style=3D"font-size:10pt;font-family:Avenir,sans-s=
erif;color:rgb(17,85,204);background-color:transparent;font-weight:400;font=
-style:normal;font-variant:normal;text-decoration:underline;vertical-align:=
baseline;white-space:pre-wrap">igor@gooddata.com</span></a></p><p dir=3D"lt=
r" style=3D"line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style=
=3D"font-size:10pt;font-family:Avenir,sans-serif;color:rgb(0,0,0);backgroun=
d-color:transparent;font-weight:400;font-style:normal;font-variant:normal;t=
ext-decoration:none;vertical-align:baseline;white-space:pre-wrap">+420 775 =
117 817</span></p><br><p dir=3D"ltr" style=3D"line-height:1.38;margin-top:0=
pt;margin-bottom:0pt"><span style=3D"font-size:10pt;font-family:Avenir,sans=
-serif;color:rgb(0,0,0);background-color:transparent;font-weight:400;font-s=
tyle:normal;font-variant:normal;text-decoration:none;vertical-align:baselin=
e;white-space:pre-wrap">Moravske namesti 1007/14</span></p><p dir=3D"ltr" s=
tyle=3D"line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style=3D"f=
ont-size:10pt;font-family:Avenir,sans-serif;color:rgb(0,0,0);background-col=
or:transparent;font-weight:400;font-style:normal;font-variant:normal;text-d=
ecoration:none;vertical-align:baseline;white-space:pre-wrap">602 00 Brno-Ve=
veri, Czech Republic</span></p></div></div></div>

--00000000000013642705c7894307--