From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 13 Jul 2021 20:17:10 +0000
From: Sean Christopherson
To: Paolo Bonzini
Cc: isaku.yamahata@intel.com, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
        "H. Peter Anvin", Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
        Joerg Roedel, erdemaktas@google.com, Connor Kuehl, x86@kernel.org,
        linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
        isaku.yamahata@gmail.com, Sean Christopherson
Subject: Re: [RFC PATCH v2 16/69] KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot by default
References: <78d02fee3a21741cc26f6b6b2fba258cd52f2c3c.1625186503.git.isaku.yamahata@intel.com>
 <3ef7f4e7-cfda-98fe-dd3e-1b084ef86bd4@redhat.com>
In-Reply-To: <3ef7f4e7-cfda-98fe-dd3e-1b084ef86bd4@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 06, 2021, Paolo Bonzini wrote:
> On 03/07/21 00:04, isaku.yamahata@intel.com wrote:
> > From: Sean Christopherson
> > 
> > Zap only leaf SPTEs when deleting/moving a memslot by default, and add a
> > module param to allow reverting to the old behavior of zapping all SPTEs
> > at all levels and memslots when any memslot is updated.
> > 
> > Signed-off-by: Sean Christopherson
> > Signed-off-by: Isaku Yamahata
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 8d5876dfc6b7..5b8a640f8042 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -85,6 +85,9 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint");
> >  static bool __read_mostly force_flush_and_sync_on_reuse;
> >  module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
> >  
> > +static bool __read_mostly memslot_update_zap_all;
> > +module_param(memslot_update_zap_all, bool, 0444);
> > +
> >  /*
> >   * When setting this variable to true it enables Two-Dimensional-Paging
> >   * where the hardware walks 2 page tables:
> > @@ -5480,11 +5483,27 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> >  	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> >  }
> >  
> > +static void kvm_mmu_zap_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > +{
> > +	/*
> > +	 * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst
> > +	 * case scenario we'll have unused shadow pages lying around until they
> > +	 * are recycled due to age or when the VM is destroyed.
> > +	 */
> > +	write_lock(&kvm->mmu_lock);
> > +	slot_handle_level(kvm, slot, kvm_zap_rmapp, PG_LEVEL_4K,
> > +			  KVM_MAX_HUGEPAGE_LEVEL, true);
> > +	write_unlock(&kvm->mmu_lock);
> > +}
> > +
> >  static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> >  			struct kvm_memory_slot *slot,
> >  			struct kvm_page_track_notifier_node *node)
> >  {
> > -	kvm_mmu_zap_all_fast(kvm);
> > +	if (memslot_update_zap_all)
> > +		kvm_mmu_zap_all_fast(kvm);
> > +	else
> > +		kvm_mmu_zap_memslot(kvm, slot);
> >  }
> >  
> >  void kvm_mmu_init_vm(struct kvm *kvm)
> > 
> 
> This is the old patch that broke VFIO for some unknown reason.
Yes, my white whale :-/

> The commit message should at least say why memslot_update_zap_all is not true
> by default. Also, IIUC the bug still there with NX hugepage splits disabled,

I strongly suspect the bug is also there with hugepage splits enabled, it's
just masked and/or harder to hit.

> but what if the TDP MMU is enabled? This should not be a module param.

IIRC, the original code I wrote had it as a per-VM flag that wasn't even
exposed to the user, i.e. TDX guests always do the partial flush and non-TDX
guests always do the full flush.  I think that's the least awful approach if
we can't figure out the underlying bug before TDX is ready for inclusion.