Date: Sat, 10 Jul 2021 21:17:27 -0700 (PDT)
From: Hugh Dickins
To: Igor Raits
cc: linux-mm@kvack.org, Andrew Morton, Hillf Danton
Subject: Re: kernel BUG at include/linux/swapops.h:204!
Message-ID: <4c9e24db-29d5-5bbb-17ae-8dc32ceb66ed@google.com>

On Sat, 10 Jul 2021, Igor Raits wrote:
> Hello,
>
> I've seen one weird bug on 5.12.14 that happened a couple of times when I
> started a bunch of VMs on a server.

Would it be possible for you to try the same on a 5.12.13 kernel?
Perhaps by reverting the diff between 5.12.13 and 5.12.14 temporarily.
Enough to form an impression of whether the issue is new in 5.12.14.

I ask because 5.12.14 did include several fixes and cleanups from me
to page_vma_mapped_walk(), and that is involved in inserting and
removing pmd migration entries. I am not aware of introducing any
bug there, but your report has got me worried. If it's happening in
5.12.14 but not in 5.12.13, then I must look again at my changes.

I don't expect Hillf's patch to help at all: the pmd_lock() is
supposed to be taken by page_vma_mapped_walk(), before
set_pmd_migration_entry() and remove_migration_pmd() are called.

Thanks,
Hugh

>
> I've briefly googled this problem but could not find any relevant commit
> that would fix this issue.
>
> Do you have any hint how to debug this further or know the fix by any
> chance?
>
> Thanks in advance. Stack trace following:
>
> [ 376.876610] ------------[ cut here ]------------
> [ 376.881274] kernel BUG at include/linux/swapops.h:204!
> [ 376.886455] invalid opcode: 0000 [#1] SMP NOPTI
> [ 376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G E
> 5.12.14-1.gdc.el8.x86_64 #1
> [ 376.900464] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
> Gen10, BIOS U30 05/24/2021
> [ 376.909038] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> [ 376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00
> f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> [ 376.933443] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> [ 376.938701] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> ffffffffffffffff
> [ 376.945878] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> fffff497473b2ae8
> [ 376.953055] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> 0000000000000000
> [ 376.960230] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000af8
> [ 376.967407] R13: 0400000000000000 R14: 0400000000000080 R15:
> ffff908bbef7b6a8
> [ 376.974582] FS: 00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000)
> knlGS:0000000000000000
> [ 376.982718] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 376.988497] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> 00000000007726e0
> [ 376.995673] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 377.002849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 377.010026] PKRU: 55555554
> [ 377.012745] Call Trace:
> [ 377.015207] __handle_mm_fault+0x5ad/0x6e0
> [ 377.019335] handle_mm_fault+0xc5/0x290
> [ 377.023194] do_user_addr_fault+0x1cd/0x740
> [ 377.027406] exc_page_fault+0x54/0x110
> [ 377.031182] ? asm_exc_page_fault+0x8/0x30
> [ 377.035307] asm_exc_page_fault+0x1e/0x30
> [ 377.039340] RIP: 0033:0x7f5bb91d6734
> [ 377.042937] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0
> 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22
> <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
> [ 377.061820] RSP: 002b:00007f5bb1f7ff58 EFLAGS: 00010206
> [ 377.067076] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> 00007f5ba0000020
> [ 377.074255] RDX: 00007f5b2bfff700 RSI: 00007f5b2bfff9c0 RDI:
> 0000000000000001
> [ 377.081429] RBP: 0000000000000001 R08: 0000000000000000 R09:
> 00007f5bb93ea2f0
> [ 377.088606] R10: 00007f5bb1f81700 R11: 0000000000000202 R12:
> 0000000000000001
> [ 377.095782] R13: 0000000000000006 R14: 0000000000000cb4 R15:
> 00007f5bb1f801f0
> [ 377.102958] Modules linked in: ebt_arp(E) nft_meta_bridge(E)
> ip6_tables(E) xt_CT(E) nf_log_ipv4(E) nf_log_common(E) nft_limit(E)
> nft_counter(E) xt_LOG(E) xt_limit(E) xt_mac(E) xt_set(E) xt_multiport(E)
> xt_state(E) xt_conntrack(E) xt_comment(E) xt_physdev(E) nft_compat(E)
> ip_set_hash_net(E) ip_set(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E)
> tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E) nf_tables(E)
> vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E) binfmt_misc(E)
> iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E) bonding(E) tls(E)
> vfat(E) fat(E) dm_service_time(E) dm_multipath(E) rpcrdma(E) sunrpc(E)
> rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E) target_core_mod(E)
> ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E) scsi_transport_iscsi(E)
> intel_rapl_msr(E) qedr(E) intel_rapl_common(E) ib_uverbs(E)
> isst_if_common(E) ib_core(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E)
> intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> crct10dif_pclmul(E)
> [ 377.102999] crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E)
> intel_cstate(E) ipmi_ssif(E) acpi_ipmi(E) ipmi_si(E) mei_me(E) ioatdma(E)
> ipmi_devintf(E) dm_mod(E) ses(E) intel_uncore(E) pcspkr(E) qede(E)
> enclosure(E) tg3(E) mei(E) lpc_ich(E) hpilo(E) hpwdt(E)
> intel_pch_thermal(E) dca(E) ipmi_msghandler(E) acpi_power_meter(E) ext4(E)
> mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) qedf(E) qed(E) crc8(E)
> libfcoe(E) libfc(E) smartpqi(E) scsi_transport_fc(E) scsi_transport_sas(E)
> wmi(E) nf_conntrack(E) nf_defrag_ipv6(E) libcrc32c(E) crc32c_intel(E)
> nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> [ 377.243468] ---[ end trace 04bce3bb051f7620 ]---
> [ 377.385645] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> [ 377.391194] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00
> f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> [ 377.410091] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> [ 377.415355] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> ffffffffffffffff
> [ 377.422540] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> fffff497473b2ae8
> [ 377.429721] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> 0000000000000000
> [ 377.436902] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000af8
> [ 377.444086] R13: 0400000000000000 R14: 0400000000000080 R15:
> ffff908bbef7b6a8
> [ 377.451272] FS: 00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000)
> knlGS:0000000000000000
> [ 377.459415] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 377.465196] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> 00000000007726e0
> [ 377.472377] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 377.479556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 377.486738] PKRU: 55555554
> [ 377.489465] Kernel panic - not syncing: Fatal exception
> [ 377.573911] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation
> range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 377.716482] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
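
A note on reading the trace above, offered as a sketch rather than anything definitive. In an x86 oops, the byte shown in angle brackets on the "Code:" line is the one at the faulting RIP, and 0f 0b is the ud2 instruction that BUG() compiles to on x86, which is consistent with the "invalid opcode" line. The helper below is hypothetical (the name code_line_traps_on_ud2 is made up for illustration) and only compares those two bytes; it does not disassemble anything and says nothing about why the BUG fired.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Takes the hex dump from an oops "Code:" line and looks at the byte
 * marked <xx> plus the byte right after it.
 *
 * Returns  1 if the pair is 0f 0b (x86 ud2),
 *          0 if the pair is anything else,
 *         -1 if the marker or the following byte cannot be parsed.
 */
int code_line_traps_on_ud2(const char *code)
{
	const char *mark = strchr(code, '<');
	const char *next;
	char *end;
	unsigned long first, second;

	if (!mark)
		return -1;

	/* Byte inside the angle brackets: the instruction byte at RIP. */
	first = strtoul(mark + 1, &end, 16);
	if (end == mark + 1 || *end != '>')
		return -1;

	/* Byte following the closing bracket (strtoul skips the space). */
	next = end + 1;
	second = strtoul(next, &end, 16);
	if (end == next)
		return -1;

	return first == 0x0f && second == 0x0b;
}
```

Fed the kernel-side Code: bytes from the report ("... e9 44 ff ff ff <0f> 0b 48 8b 2d ..."), the helper reports a ud2 at RIP; fed the user-space Code: bytes ("... 74 22 <48> 8b 96 d8 ..."), it does not, as expected for the instruction that merely took the page fault.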