From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 536DDC433EF for ; Fri, 22 Oct 2021 00:49:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 36B6261373 for ; Fri, 22 Oct 2021 00:49:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232261AbhJVAvw (ORCPT ); Thu, 21 Oct 2021 20:51:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231939AbhJVAvt (ORCPT ); Thu, 21 Oct 2021 20:51:49 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DAF5EC061766 for ; Thu, 21 Oct 2021 17:49:32 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id c84-20020a251c57000000b005c131bb8967so1130162ybc.9 for ; Thu, 21 Oct 2021 17:49:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=reply-to:date:in-reply-to:message-id:mime-version:references :subject:from:to:cc; bh=zAlLHoW+/UWSIgBF8f8U1vIlbDY/y6L+gJgFXs+sPxE=; b=S0RhVxjB1Wo/yOzZTb5f1CnLnkAkLr79rT8KMKgkbmdFqPccf7mZHnYMsbmqvFjobo S7yA3+EaYGrJDQ/4Ffalnrtg7UpVhEQNjWrqgyOqXTo2Gieryyap7jLeqFqvPdrTLz8S vCq60ymXOF+5l40jvwyNWAnrn37kUdN2nrz2vSzwp2jX9LsEsi4eyos6ccB2+rtdd2Kg ZDvMnA0ge9QhzlaWVIRS7mH5i7tAcybZRqpQ05wMFc6DG6XTYUePut/lIXkvs2hb1grf L9z/t0RuawOJBRsAnOY5VLeMmJLUvrVl/HhoXUM+SayQx5T3tgi/3pOSXbEctiM2ZkSk IsxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:reply-to:date:in-reply-to:message-id :mime-version:references:subject:from:to:cc; bh=zAlLHoW+/UWSIgBF8f8U1vIlbDY/y6L+gJgFXs+sPxE=; b=4pCAQBnIJ4lSqrxrJQscvzXC+H+2ro6Pq1uR7SsvoF8f3PDyFeGeEIU5pBjMmYhuc8 MIh6GkSTShue1qnXA0iWI8DznrUxYUMymOeU/XtKsc/xI+VJsIHLcWBHCOgCJiJwJmPR xsi8bk3+s0B0hJGpPBnGp8QZp1C/Ls0WyAH56i/izj35qnbqPBeOUbemfL3OBh8YzGda b+sioXFQpeKQKUguPUTcd6o1/lYVRCBbZgvHFfrUwCuZsPrHrdi4wytK0Jk/fyFTNf3e mueLEfS66ALWlFvVloffMLCXG3CPY3pWl74FiWDvKz1yVu5RxXVLYos1cCM9bQm6j71A bs+g== X-Gm-Message-State: AOAM533Uiam1rMYpGEtzsxaSYfJBgbUY95t6KwcFn5n7op0WXyrgeNIB WMIirbl9/C3kUVK/m0u+Vn24nOwdNIw= X-Google-Smtp-Source: ABdhPJzE8/0GOKsH+J1FGlagWki612oU7RRmK0uz/ACvkjgw7Ou4i08OidFCoVSNcOcJ4KDQWZ/pr94HamE= X-Received: from seanjc798194.pdx.corp.google.com ([2620:15c:90:200:db63:c8c0:4e69:449d]) (user=seanjc job=sendgmr) by 2002:a5b:cce:: with SMTP id e14mr6917124ybr.486.1634863772008; Thu, 21 Oct 2021 17:49:32 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 21 Oct 2021 17:49:24 -0700 In-Reply-To: <20211022004927.1448382-1-seanjc@google.com> Message-Id: <20211022004927.1448382-2-seanjc@google.com> Mime-Version: 1.0 References: <20211022004927.1448382-1-seanjc@google.com> X-Mailer: git-send-email 2.33.0.1079.g6e70778dc9-goog Subject: [PATCH v2 1/4] KVM: x86/mmu: Use vCPU's APICv status when handling APIC_ACCESS memslot From: Sean Christopherson To: Paolo Bonzini Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Maxim Levitsky Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Query the vCPU's APICv state, not the overall VM's state, when handling a page fault that hit the APIC Access Page memslot. While the vCPU state can be stale with respect to the overall VM APICv state, it's at least coherent with respect to the page fault being handled, i.e. is the value of the vCPU's APICv enabling at the time of the fault. Using the VM's state is not functionally broken, but neither does it provide any meaningful advantages, and arguably retrying the faulting instruction, as will happen if the vCPU state is stale (see below), is semantically the desired behavior as it aligns with existing KVM MMU behavior where a change in VM state invalidates the context for handling a page fault. The vCPU state can be stale if a different vCPU (or other actor) toggles APICv state after the page fault exit from the guest, in which case KVM will attempt to handle the page fault prior to servicing the pending KVM_REQ_APICV_UPDATE. However, by design such a race is benign in all scenarios. If APICv is globally enabled but locally disabled, the page fault handler will go straight to emulation (emulation of the local APIC is ok even if APIC virtualization is enabled while the guest is running). If APICv is globally disabled but locally enabled, there are two possible scenarios, and in each scenario the final outcome is correct: KVM blocks all vCPUs until SPTEs for the APIC Access Page have been zapped. (1) the page fault acquires mmu_lock before the APICv update calls kvm_zap_gfn_range(). The page fault will install a "bad" SPTE, but that SPTE will never be consumed as (a) no vCPUs can be running in the guest due to the kick from KVM_REQ_APICV_UPDATE, and (b) the "bad" SPTE is guaranteed to be zapped by kvm_zap_gfn_range() before any vCPU re-enters the guest. Because KVM_REQ_APICV_UPDATE is raised before the VM state is update, all vCPUs will block on apicv_update_lock when attempting to service the request until the lock is dropped by the update flow, _after_ the SPTE is zapped. (2) the APICv update kvm_zap_gfn_range() acquires mmu_lock before the page fault. The page fault handler will bail due to the change in mmu_notifier_seq made by kvm_zap_gfn_range(). Arguably, KVM should not (attempt to) install a SPTE based on state that is knowingly outdated, but using the per-VM state suffers a similar flaw as not checking for a pending KVM_REQ_APICV_UPDATE means that KVM will either knowingly install a SPTE that will get zapped (page fault wins the race) or will get rejected (kvm_zap_gfn_range() wins the race). However, adding a check for KVM_REQ_APICV_UPDATE is extremely undesirable as it implies the check is meaningful/sufficient, which is not the case since the page fault handler doesn't hold mmu_lock when it checks APICv status, i.e. the mmu_notifier_seq mechanism is required for correctness. Cc: Maxim Levitsky Signed-off-by: Sean Christopherson --- arch/x86/kvm/mmu/mmu.c | 29 +++++++++++++++++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f9f228963088..6530d874f5b7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3911,10 +3911,35 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, * If the APIC access page exists but is disabled, go directly * to emulation without caching the MMIO access or creating a * MMIO SPTE. That way the cache doesn't need to be purged - * when the AVIC is re-enabled. + * when the AVIC is re-enabled. Note, the vCPU's APICv state + * may be stale if an APICv update is in-progress, but KVM is + * guaranteed to operate correctly in all scenarios: + * + * 1. APICv is globally enabled but locally disabled. This + * vCPU will go straight to emulation without touching the + * MMIO cache or installing a SPTE. + * + * 2a. APICv is globally disabled but locally enabled, and this + * vCPU acquires mmu_lock before the APICv update calls + * kvm_zap_gfn_range(). This vCPU will install a bad SPTE, + * but the SPTE will never be consumed as (a) no vCPUs can + * be running due to the kick from KVM_REQ_APICV_UPDATE, + * and (b) the "bad" SPTE is guaranteed to be zapped by + * kvm_zap_gfn_range() before any vCPU re-enters the guest. + * Because KVM_REQ_APICV_UPDATE is raised before the VM + * state is update, vCPUs will block on apicv_update_lock + * when attempting to service the request until the lock is + * dropped by the update flow, _after_ the SPTE is zapped. + * + * 2b. APICv is globally disabled but locally enabled, and the + * APICv update kvm_zap_gfn_range() acquires mmu_lock before + * this vCPU. This vCPU will bail from the page fault + * handler due kvm_zap_gfn_range() bumping mmu_notifier_seq, + * and will retry the faulting instruction after servicing + * the KVM_REQ_APICV_UPDATE request. */ if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT && - !kvm_apicv_activated(vcpu->kvm)) { + !kvm_vcpu_apicv_active(vcpu)) { *r = RET_PF_EMULATE; return true; } -- 2.33.0.1079.g6e70778dc9-goog