linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Eric Li <ercli@ucdavis.edu>, David Matlack <dmatlack@google.com>,
	Oliver Upton <oupton@google.com>
Subject: Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
Date: Thu, 3 Nov 2022 16:53:11 +0000	[thread overview]
Message-ID: <Y2Px90RQydMUoiRH@google.com> (raw)
In-Reply-To: <20221102085414.fk2xss74jvtzs6mr@linux.intel.com>

On Wed, Nov 02, 2022, Yu Zhang wrote:
> On Tue, Nov 01, 2022 at 05:58:21PM +0000, Sean Christopherson wrote:
> > On Tue, Nov 01, 2022, Yu Zhang wrote:
> > > What I observed is that vmx->nested.msrs.secondary_ctls_high will be changed
> > > in vmx_adjust_secondary_exec_control(), which can be triggered after cpuid is
> > > set. 
> > 
> > Ugh, that path got overlooked when we yanked out KVM's manipulaton of VMX MSRs
> > in response to guest CPUID changes.  I wonder if we can get away with changing
> > KVM's behavior to only ensure a feature isn't exposed to L2 when it's not exposed
> > to L1.
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 6b4266e949a3..cfc35d559d91 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -4523,8 +4523,8 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
> >          * Update the nested MSR settings so that a nested VMM can/can't set
> >          * controls for features that are/aren't exposed to the guest.
> >          */
> > -       if (nested) {
> > -               if (enabled)
> > +       if (nested && !enabled)
> > +               if (exiting)
> >                         vmx->nested.msrs.secondary_ctls_high |= control;
> >                 else
> >                         vmx->nested.msrs.secondary_ctls_high &= ~control;
> > 
> 
> Indeed, this change can make sure a feature won't be exposed to L2, if L1
> does not have it.

No, that's not the goal of the change.  KVM already hides features in the VMX MSRs
if the base feature is not supported in L1 according to guest CPUID.  The problem
is that, currently, KVM also _forces_ features to be enabled in the VMX MSRs when
the base feature IS supported in L1 (CPUID).

Ideally, KVM should NEVER manipulate VMX MSRs in response to guest CPUID changes.
That's what I was referring to earlier by commits:

  8805875aa473 ("Revert "KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"")
  9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL"")

E.g. if userspace sets VMX MSRs and then sets guest CPUID, KVM will override the
nVMX CPU model defined by userspace.  The scenario where userspace hides a "base"
feature but exposes the feature in the VMX MSRs is nonsensical, which is why I
think KVM can likely get away with force-disabling features.

The reverse is completely legitimate though: hiding a feature in VMX MSRs even if
the base feature is supported for L1, i.e. disallowing L1 from enabling the feature
in L2, is something that real VMMs will actually do, e.g. if the user doesn't trust
that KVM correctly handles all aspects of nested virtualization for the feature.

In other words, the behavior you're observing, where vmx->nested.msrs.secondary_ctls_high
is changed by vmx_adjust_secondary_exec_control(), is a completely separate bug
than the one below.

> But for the feature bits that L1 has, yet cleared from the
> vmcs_conf->nested.msrs.secondary_ctls_high in nested_vmx_setup_ctls_msrs(),
> there's no chance for userspace vmm to reset it again.
> 
> Well, I am not suggesting to give userspace vmm such permission(which I believe
> is incorrect). And IIUC, vmcs_conf->nested.msrs.secondary_ctls_high shall serve
> as a template to initialize vmx->nested.msrs.secondary_ctls_high. So maybe we
> shall not mask off some features in nested_vmx_setup_ctls_msrs() at the beginning.
>  
> > > Since KVM's config(vmcs_config->nested.secondary_ctls_high) is done during init
> > > by nested_vmx_setup_ctls_msrs(), which only kept a subset of the flags from the
> > > vmcs_confg->cpu_based_2nd_exec_ctrl, the vmx_restore_control_msr() could fail
> > > later, when userspace VMM tries to enable a feature(the only one I witnessed is
> > > SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE) by setting MSR_IA32_VMX_PROCBASED_CTLS2.
> > > Because the vmx->nested.msrs.secondary_ctls_high is updated by cpuid, but this
> > > bit is not taken from vmcs_conf->cpu_based_2nd_exec_ctrl by nested_vmx_setup_ctls_msrs()
> > > for vmcs_config->nested.secondary_ctls_high.
> > > 
> > > The failure can be fixed, simply by adding SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE in
> > > nested_vmx_setup_ctls_msrs(), e.g.
> > 
> > Assuming KVM actually supports user wait/pause in L2, this is an orthogonal bug
> > to the CPUID-based manipulation above.  KVM simply neglects to advertise to userspace
> > that ENABLE_USR_WAIT_PAUSE is supported for nested virtualization.
> > 
> > If KVM doesn't correctly support virtualizing user wait/pause for L2, then the
> > correct location to fix this is in vmx_secondary_exec_control().
> > 
> 
> Sorry, why vmx_secondary_exec_control()?

You missed the qualifier:

  If KVM doesn't correctly support virtualizing user wait/pause for L2

If KVM should NOT be exposing ENABLE_USR_WAIT_PAUSE to the L1 VMM, then NOT
propagating the feature to msrs->secondary_ctls_low is correct.  And if that's
the case, then vmx_secondary_exec_control() needs to be modified so that it does
NOT set ENABLE_USR_WAIT_PAUSE in vmx->nested.msrs.secondary_ctls_high.

> Could we just change nested_vmx_setup_ctls_msrs() like below:

If KVM correctly virtualizes the feature in a nested scenario, yes.  I haven't
looked into ENABLE_USR_WAIT_PAUSE enough to know whether or not KVM gets the
nested virtualization pieces correct, hence the above qualifier.

> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 0c62352dda6a..fa255391718c 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6791,13 +6791,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
>         msrs->procbased_ctls_low &=
>                 ~(CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING);
> 
> -       /*
> -        * secondary cpu-based controls.  Do not include those that
> -        * depend on CPUID bits, they are added later by
> -        * vmx_vcpu_after_set_cpuid.
> -        */
>         msrs->secondary_ctls_low = 0;
> -
>         msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
>         msrs->secondary_ctls_high &=
>                 SECONDARY_EXEC_DESC |
> @@ -6810,7 +6804,8 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
>                 SECONDARY_EXEC_ENABLE_INVPCID |
>                 SECONDARY_EXEC_RDSEED_EXITING |
>                 SECONDARY_EXEC_XSAVES |
> -               SECONDARY_EXEC_TSC_SCALING;
> +               SECONDARY_EXEC_TSC_SCALING |
> +               SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
> 
>         /*
>          * We can emulate "VMCS shadowing," even if the hardware

  reply	other threads:[~2022-11-03 16:53 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-07 21:35 [PATCH v5 00/15] KVM: nVMX: VMX MSR quirk+fixes, CR4 fixes Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 01/15] KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 02/15] KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 03/15] KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4 Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 04/15] KVM: nVMX: Rename handle_vm{on,off}() to handle_vmx{on,off}() Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value Sean Christopherson
2022-10-31 16:39   ` Yu Zhang
2022-10-31 17:11     ` Sean Christopherson
2022-11-01 10:18       ` Yu Zhang
2022-11-01 17:58         ` Sean Christopherson
2022-11-02  8:54           ` Yu Zhang
2022-11-03 16:53             ` Sean Christopherson [this message]
2022-11-07  8:28               ` Yu Zhang
2022-11-07 15:06                 ` Sean Christopherson
2022-11-08 10:21                   ` Yu Zhang
2022-11-08 18:35                     ` Sean Christopherson
2022-11-10  8:44                       ` Yu Zhang
2022-11-10 16:08                         ` Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 06/15] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write Sean Christopherson
2022-07-22  9:06   ` Paolo Bonzini
2022-06-07 21:35 ` [PATCH v5 07/15] KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 08/15] KVM: nVMX: Keep KVM updates to PERF_GLOBAL_CTRL ctrl bits across MSR write Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 09/15] KVM: nVMX: Drop nested_vmx_pmu_refresh() Sean Christopherson
2022-06-07 21:35 ` [PATCH v5 10/15] KVM: nVMX: Add a quirk for KVM tweaks to VMX MSRs Sean Christopherson
2022-06-07 21:36 ` [PATCH v5 11/15] KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP Sean Christopherson
2022-07-22  9:49   ` Paolo Bonzini
2022-06-07 21:36 ` [PATCH v5 12/15] KVM: nVMX: Extend VMX MSRs quirk to CR0/4 fixed1 bits Sean Christopherson
2022-07-22  9:50   ` Paolo Bonzini
2022-06-07 21:36 ` [PATCH v5 13/15] KVM: selftests: Add test to verify KVM's VMX MSRs quirk for controls Sean Christopherson
2022-06-07 21:36 ` [PATCH v5 14/15] KVM: selftests: Extend VMX MSRs test to cover CR4_FIXED1 (and its quirks) Sean Christopherson
2022-06-07 21:36 ` [PATCH v5 15/15] KVM: selftests: Verify VMX MSRs can be restored to KVM-supported values Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2Px90RQydMUoiRH@google.com \
    --to=seanjc@google.com \
    --cc=dmatlack@google.com \
    --cc=ercli@ucdavis.edu \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oupton@google.com \
    --cc=pbonzini@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=yu.c.zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).