Date: Wed, 23 Aug 2023 17:30:01 -0700 (PDT)
From: Eric Wheeler
To: Sean Christopherson
cc: Amaan Cheval, brak@gameservers.com, kvm@vger.kernel.org
Subject: Re: Deadlock due to EPT_VIOLATION
Message-ID: <5da12792-1e5d-b89d-ea0-e1159c645568@ewheeler.net>
References: <418345e5-a3e5-6e8d-395a-f5551ea13e2@ewheeler.net>
 <5fc6cea-9f51-582c-8bb3-21e0b4bf397@ewheeler.net>
 <468b1298-e43e-2397-5f3-4b6af6e2f461@ewheeler.net>
 <58f24fa2-a5f4-c59a-2bcf-c49f7bddc5b@ewheeler.net>
 <347d3526-7f8d-4744-2a9c-8240cc556975@ewheeler.net>
X-Mailing-List: kvm@vger.kernel.org

On Wed, 23 Aug 2023, Sean Christopherson wrote:
> On Wed, Aug 23, 2023, Eric Wheeler wrote:
...
> > 22:23:31:481714 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee
> > 22:23:31:481714 tid[142943] pid[142917] ITER @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000 (cf7f000), hva = 7f6d24f7f000, pfn = 1ee50ee, tdp_mmu = 1, role = 3784, count = 1
> > 22:23:31:481715 tid[142943] pid[142917] SPTE @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee
> > 22:23:31:481716 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee, ret = 4
>
> This vCPU is making forward progress, it just happens to be taking lots of faults
> on a single RIP.  MAP_RET's "ret = 4" means KVM did indeed "fix" the fault, and
> the faulting GPA/HVA is changing on every fault.  Best guess is that the guest
> is zeroing a hugepage, but the guest's 2MiB page is mapped with 4KiB EPT entries,
> i.e. the vCPU is doing REP STOS on a 2MiB region.
...
> > > 21:25:50:282726 tid[3484173] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (92239 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
...
> > > 21:25:50:282354 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90073 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
> > > 21:25:50:282571 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90121 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000
...
> These vCPUs (which belong to the same VM) appear to be well and truly stuck.  The
> fact that you got prints from MAP, MAP_RET, ITER, and SPTE for an unrelated (and
> not stuck) vCPU is serendipitous, as it all but guarantees that the trace is
> "good", i.e. that MAP prints aren't missing because the bpf program is bad.
>
> Since the mmu_notifier info is stable (though the seq is still *insanely* high),
> assuming there's no kernel memory corruption, that means that KVM is bailing
> because is_page_fault_stale() returns true.  Based on v5.15 being the last known
> good kernel for you,

I should note that v5.15 was speculative information that may not be 100%
correct.  However, the occurrence of failures <= 5.15 is minuscule compared
to >5.15, as almost all occurrences are in 6.1.x (which could be biased by
the number of 6.1.x deployments).

It is entirely possible that the <= 5.15 issues are false positives, but we
don't have TBF on anything <6.1.30, so no kprobe traces.

> that places the blame squarely on commit
> a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update").
> Note, that commit had an unrelated bug that was fixed by 18c841e1f411 ("KVM: x86: Retry
> page fault if MMU reload is pending and root has no sp"), but all flavors of v6.1
> have said fix and the bug caused a crash, not stuck vCPUs.
>
> The "MMU seq = 8002dc25" value still gives me pause.  It's one hell of a coincidence
> that all stuck vCPUs have had a sequence counter of 0x800xxxxx.
>
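For anyone sorting through similar kprobe output, here is a rough sketch of how the trace lines quoted above could be grouped per tid to separate "stuck" from "progressing" vCPUs and to flag sequence counters with bit 31 set.  The field layout is assumed from the lines in this thread only; the helper names (summarize, looks_stuck, high_bit_set) are made up for illustration, and the "stuck" test is only a heuristic over whatever window of the trace is fed in, not a statement about what KVM does internally.

```python
import re
from collections import defaultdict

# Field layout assumed from the trace lines quoted in this thread.
LINE_RE = re.compile(
    r"tid\[(?P<tid>\d+)\] pid\[(?P<pid>\d+)\] (?P<event>[A-Z_]+) @ rip "
    r"(?P<rip>[0-9a-f]+).*?gpa = (?P<gpa>[0-9a-f]+)"
)
SEQ_RE = re.compile(r"MMU seq = (?P<seq>[0-9a-f]+)")

# A few lines copied from the quoted traces (leading "> " quoting is harmless).
SAMPLE = [
    "> > 22:23:31:481714 tid[142943] pid[142917] MAP @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee",
    "> > 22:23:31:481716 tid[142943] pid[142917] MAP_RET @ rip ffffffffa43ce877 (1208 hits), gpa = cf7f000, hva = 7f6d24f7f000, pfn = 1ee50ee, ret = 4",
    "> > > 21:25:50:282354 tid[3484174] pid[3484149] FAULTIN @ rip ffffffff814e6ca5 (90073 hits), gpa = 1343fa0b0, hva = 7feb409fa000, flags = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000",
    "> > > 21:25:50:282571 tid[3484174] pid[3484149] FAULTIN_RET @ rip ffffffff814e6ca5 (90121 hits), gpa = 1343fa0b0, hva = 7feb409fa000 (7feb409fa000), flags = 0, pfn = 18d1d46, ret = 0 : MMU seq = 8002dc25, in-prog = 0, start = 7feacde61000, end = 7feacde62000",
]


def summarize(lines):
    """Collect, per tid, the event names, faulting GPAs and MMU seq values seen."""
    per_tid = defaultdict(lambda: {"events": set(), "gpas": set(), "seqs": set()})
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a trace line (e.g. "..." or prose)
        info = per_tid[int(m["tid"])]
        info["events"].add(m["event"])
        info["gpas"].add(int(m["gpa"], 16))
        s = SEQ_RE.search(line)
        if s:
            info["seqs"].add(int(s["seq"], 16))
    return dict(per_tid)


def looks_stuck(info):
    """Heuristic (an assumption): the fault-in returned but MAP never fired."""
    return "FAULTIN_RET" in info["events"] and "MAP" not in info["events"]


def high_bit_set(seq):
    """True if bit 31 is set, i.e. the value has the form 0x8xxxxxxx or above."""
    return bool(seq & 0x80000000)


if __name__ == "__main__":
    for tid, info in sorted(summarize(SAMPLE).items()):
        seqs = ", ".join(f"{s:#x}" for s in sorted(info["seqs"])) or "n/a"
        flagged = any(high_bit_set(s) for s in info["seqs"])
        print(f"tid {tid}: events={sorted(info['events'])} "
              f"stuck={looks_stuck(info)} seq={seqs} bit31={flagged}")
```

Feeding it a longer capture (one line per list element) should make it easier to see whether every tid that never reaches MAP also carries a 0x800xxxxx sequence counter, which is the coincidence noted above.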