From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A1D3C4332F for ; Tue, 8 Nov 2022 18:35:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231391AbiKHSfa (ORCPT ); Tue, 8 Nov 2022 13:35:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230385AbiKHSf1 (ORCPT ); Tue, 8 Nov 2022 13:35:27 -0500 Received: from mail-pg1-x533.google.com (mail-pg1-x533.google.com [IPv6:2607:f8b0:4864:20::533]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 55FC12B247 for ; Tue, 8 Nov 2022 10:35:26 -0800 (PST) Received: by mail-pg1-x533.google.com with SMTP id f63so14133208pgc.2 for ; Tue, 08 Nov 2022 10:35:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=RnyS3dwzu63wm+TBBKa4ksB7XTmDC722z2f6RyUHWPs=; b=UNWTdZX009UhqhJ6UQ4R4+hp8SiQhxnMLXohooc/noo36cuTVOTPYeiHbI6l/If5If J99V6SvhqOUJdXGP1USoU7SQ4UsQc0kDpJnivKhYJ1Gb1/UnIQFqmdMGK7+nTTT+MRSJ vQAXN6CX7nJMHdk7l5MQFGieiB1Zx2apAH0MHUHKHnrjfzbpWmhVg9F9nCjWzW+8XYOU SBv/FsmQOASl+XIqep9c2B037c8W8HTTwUNLVCWmGH8stGR96ZExj8ipESF8GjxhIevV tdt7PlrDh92QBIH3Y28a8olvDACYlXbPkSwNq87XUXh94gRzQ+koxl8E3v5eQ1QW+pGb XQZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=RnyS3dwzu63wm+TBBKa4ksB7XTmDC722z2f6RyUHWPs=; b=o4jvESQ/IzUHyAExKSYZxdaT3WzVosQGSRzjmIRaKsbm5ARe1snUjONg1WnW2V/kjr yC9Y9+HOACpEvYegXpF7DAB0oimALEJz3QTeQKegpWM8NTavXUW6Hvba8HCKeIA780mC TJyxDLjs9tJ6cLdzmuC5520h4l1pmfOS1i2/wUYERSaOKsN3EHSC+tQ5oOSujofQ8k8Z jU3ZBwQ7g6kzWJ+zwfqyBZ7h4DLvwLbR3hdREV0+1XPChLl7VJGvFfZ53V7Q7QA36qz7 ZjgOXDWVQYPPa38c71juoPKBU8OONTBz5nRthjraCeRRG5j38gRqkHwhcUitA6vtcOIB divA== X-Gm-Message-State: ACrzQf3zmTexzWVUJCbxLkKVwpiXGZC2ZEXdCt7DmkbmmUsSpmcMroSV 7Humo13ebdvCh3ammoEc69wJyA== X-Google-Smtp-Source: AMsMyM5Qg8YfT/bsoUeJb0OK8RxgGNg/b35ucvUXf02E9o8w0JO152nFj4Gk4WJt4CUR2YrbDzSCfA== X-Received: by 2002:a63:84c6:0:b0:46f:f8b0:ba09 with SMTP id k189-20020a6384c6000000b0046ff8b0ba09mr33082105pgd.192.1667932525661; Tue, 08 Nov 2022 10:35:25 -0800 (PST) Received: from google.com (7.104.168.34.bc.googleusercontent.com. [34.168.104.7]) by smtp.gmail.com with ESMTPSA id n15-20020a170902d2cf00b0018544ad1e8esm7324982plc.238.2022.11.08.10.35.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Nov 2022 10:35:24 -0800 (PST) Date: Tue, 8 Nov 2022 18:35:21 +0000 From: Sean Christopherson To: Yu Zhang Cc: Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Eric Li , David Matlack , Oliver Upton , Liu Jingqi Subject: Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value Message-ID: References: <20220607213604.3346000-6-seanjc@google.com> <20221031163907.w64vyg5twzvv2nho@linux.intel.com> <20221101101801.zxcjswoesg2gltri@linux.intel.com> <20221102085414.fk2xss74jvtzs6mr@linux.intel.com> <20221107082714.fq3sw7qii4unlcn2@linux.intel.com> <20221108102120.qdlgqlgvdi6wi22u@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20221108102120.qdlgqlgvdi6wi22u@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 08, 2022, Yu Zhang wrote: > On Mon, Nov 07, 2022 at 03:06:51PM +0000, Sean Christopherson wrote: > > On Mon, Nov 07, 2022, Yu Zhang wrote: > > > On Thu, Nov 03, 2022 at 04:53:11PM +0000, Sean Christopherson wrote: > > > > Ideally, KVM should NEVER manipulate VMX MSRs in response to guest CPUID changes. > > > > That's what I was referring to earlier by commits: > > > > ... > > > > > Thanks Sean. Let me try to rephrase my understandings of your statement( > > > and pls feel free to correct me): > > > > > > 1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/ > > > disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes. > > > 2> What makes sense is, if a feature is > > > a. disabled by guest CPUID, it shall not be exposed in guest VMX MSR; > > > b. enabled by guest CPUID, it could be either exposed or hidden in > > > guest VMX MSR. > > > 3> So your previous change is to guarantee 2.a, and userspace VMM can choose > > > to follow follow either choices in 2.b(depending on whether it believes this > > > feature is correctly supported by KVM in nested). > > > > > > Is above understanding correct? > > > > Not quite. Again, in an ideal world, KVM would not modify the VMX MSRs based on > > guest CPUID. But it's possible userspace is relying on KVM to hide a feature from > > L2 if it's hidden from L1, so to avoid breaking an otherwise valide userspace config, > > it's worth enforcing that in KVM. > > > > Sorry, maybe I should understand this way: > > In theroy, KVM shall not modify guest VMX MSRs in response to the guest CPUID > updates. Therefore we shall not enforce the exposure of a feature in guest VMX > MSR, just because it is enabled in guest CPUID (e.g., userspace VMM can choose > to hide such feature so long as it believes KVM can not provide correct nested > support for this feature). > > But in reverse, it is not reasonable for userspace VMM to expose a feature in > guest VMX MSR settings, if such feature is disabled in this guest's CPUID. So > KVM shall help to make sure such feature is hidden when guest CPUID changes. No. Again, KVM _should never_ manipulate VMX MSRs in response to CPUID changes. Keeping the existing behavior would be done purely to maintain backwards compability with existing userspace, not because it's strictly the right thing to do. E.g. as a strawman, a weird userspace could do KVM_SET_MSRS => KVM_SET_CPUID => KVM_SET_CPUID, where the first KVM_SET_CPUID reset to a base config and the second KVM_SET_CPUID incorporates "optional" features. In that case, clearing bits in the VMX MSRs on the first KVM_SET_CPUID would do the wrong thing if the second KVM_SET_CPUID enabled the relevant features. AFAIK, no userspace actually does something odd like that, whereas there are VMMs that do KVM_SET_MSRS before KVM_SET_CPUID, e.g. disable a feature in VMX MSRs but later enable the feature in CPUID for L1. And so disabling features is likely safe-ish, but enabling feature most definitely can cause problems for userspace. Hrm, actually, there are likely older VMMs that never set VMX MSRs, and so dropping the "enable features" code might not be safe either. Grr. The obvious solution would be to add a quirk, but maybe we can avoid a quirk by skipping KVM's misguided updates if userspace has set the MSR. That should work for a userspace that deliberately sets the MSR during setup, and for a userspace that blindly migrates the MSR since the migrated value should already be correct/sane. E.g. diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index 45162c1bcd8f..671479cd7721 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -51,6 +51,7 @@ struct nested_vmx_msrs { u64 cr4_fixed1; u64 vmcs_enum; u64 vmfunc_controls; + bool secondary_set_by_userspace; }; struct vmcs_config { diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 62e3967cf131..3f691ed169d8 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1257,6 +1257,9 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data) if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32))) return -EINVAL; + if (msr_index == MSR_IA32_VMX_PROCBASED_CTLS2) + vmx->nested.msrs.secondary_set_by_userspace = true; + vmx_get_control_msr(&vmx->nested.msrs, msr_index, &lowp, &highp); *lowp = data; *highp = data >> 32; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index ab89755dce66..8aadaae5b81e 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4523,7 +4523,7 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control, * Update the nested MSR settings so that a nested VMM can/can't set * controls for features that are/aren't exposed to the guest. */ - if (nested) { + if (nested && !vmx->nested.msrs.secondary_set_by_userspace) { if (enabled) vmx->nested.msrs.secondary_ctls_high |= control; else > BTW, I found my previous understanding of what vmx_adjust_secondary_exec_control() > currently does was also wrong. It could also be used for EXITING controls. And > for such flags(e.g., SECONDARY_EXEC_RDRAND_EXITING), values for the nested settings > (vmx->nested.msrs.secondary_ctls_high) and for the L1 execution controls(*exec_control) > could be opposite. So the statement: > "1> For now, what vmx_adjust_secondary_exec_control() does, is to enable/ > disable a feature in VMX MSR(and nVMX MSR) based on cpuid changes." > is wrong. No, it's correct. The EXITING controls are just inverted feature flags. E.g. if RDRAND is disabled in CPUID, KVM sets the EXITING control so that KVM intercepts RDRAND in order to inject #UD. [EXIT_REASON_RDRAND] = kvm_handle_invalid_op,