From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6824AC43217 for ; Tue, 1 Nov 2022 17:58:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230017AbiKAR6a (ORCPT ); Tue, 1 Nov 2022 13:58:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229819AbiKAR60 (ORCPT ); Tue, 1 Nov 2022 13:58:26 -0400 Received: from mail-pl1-x631.google.com (mail-pl1-x631.google.com [IPv6:2607:f8b0:4864:20::631]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CFEC2644A for ; Tue, 1 Nov 2022 10:58:25 -0700 (PDT) Received: by mail-pl1-x631.google.com with SMTP id p3so14270993pld.10 for ; Tue, 01 Nov 2022 10:58:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=PSlM7tm98GJ6KQQBPng7urzfjnyPwOttoZ6TFnd/jVE=; b=tbiyQ/qH+nehgRPrhsXEOyMFfl0Nh6L1qRXhgwe2uAhYwwLIh+4vrx7SjEGwzAlgv+ 6bt1NpnPFq4ar1dxkygRW5xUck+xMUYcMqnOgiKbYR4L2jhUV1PqFMYC3Z0vzUXAjrTb DjrBH0VbMai3H9rAG0nv75s9+1te1RnvuTn3Ez37hKJwsW6htEBUfu7pMuHlCiaof7+X 0Lq3kzKd+Q3fpRxvVBn6JmQSHF8BhDOWcg1AuDQlDJc+cCjnGVSe8z0FrRhcwLK97D/v MhoQekh5MJAMQk7n65oo7JlM2mShK+IsRa1Ij562KAUi5WPU3pfVdWYFshMx+1gGT7Q2 7uew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=PSlM7tm98GJ6KQQBPng7urzfjnyPwOttoZ6TFnd/jVE=; b=CmHcsduhfWloyI4sjPV1yNHU5BcIjduSJGjH4UdL7gGOW7IrhDH7+qVns5zXtBC7PL WflP7pKJU0WlxOceljFIEQJl6vHkfqHPtrJfy/yd5uKZrpAXgMlRZsYDMD9eoKwMMKbP 5fx9hCtk3eDhszDCvlHmaYJvc/6z6IFWTsy92reSW919fbkkmi3/LZKj/zxDZiCufPh/ GDEU98McWM0LrT2ym5eElG6tejtLUQYBp3O4ROonFpIQRqfvkOZneedjYIC5YjnD37Z5 2PLgj6+cpCGNJlGEIWgdfV55jdOVSci2Sw1A7mnYKzJtR9cGTyPWAZdjDFjZXSZ+1xOm xXmA== X-Gm-Message-State: ACrzQf1fsgYeDnrOucJkM1yoLTNXQu3m1dlmunjs58N1KbQmTwJdqj00 s0vfa1W8kxcOK3/MCAQsyUv8Fg== X-Google-Smtp-Source: AMsMyM6ugcdQo4QoQpkqVPfIJKUqjKh5ecv1hn0GBLwlY/XMhuOE+o5JXOZAXwbLsTOMcDqatPuP4Q== X-Received: by 2002:a17:902:dac5:b0:186:a687:e082 with SMTP id q5-20020a170902dac500b00186a687e082mr20966918plx.84.1667325505243; Tue, 01 Nov 2022 10:58:25 -0700 (PDT) Received: from google.com (7.104.168.34.bc.googleusercontent.com. [34.168.104.7]) by smtp.gmail.com with ESMTPSA id bb8-20020a170902bc8800b00179f370dbe7sm6614237plb.287.2022.11.01.10.58.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 01 Nov 2022 10:58:24 -0700 (PDT) Date: Tue, 1 Nov 2022 17:58:21 +0000 From: Sean Christopherson To: Yu Zhang Cc: Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Eric Li , David Matlack , Oliver Upton Subject: Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value Message-ID: References: <20220607213604.3346000-1-seanjc@google.com> <20220607213604.3346000-6-seanjc@google.com> <20221031163907.w64vyg5twzvv2nho@linux.intel.com> <20221101101801.zxcjswoesg2gltri@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20221101101801.zxcjswoesg2gltri@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Nov 01, 2022, Yu Zhang wrote: > On Mon, Oct 31, 2022 at 05:11:10PM +0000, Sean Christopherson wrote: > > vmcs_config.nested never becomes out-of-date, it's read-only after __init (not > > currently marked as such, that will be remedied soon). > > > > The auditing performed by KVM is purely to guard against userspace enabling > > features that KVM doesn't support. KVM is not responsible for ensuring that the > > vCPU's CPUID model match the VMX MSR model. > > Do you mean the VMX MSR model shall not be changed after the cpuid updates? No, I mean that the virtual CPU model (CPUID + VMX MSRs) that is presented to the guest is the responsibility of host userspace. KVM only cares about not enabling bits/features that KVM doesn't supported. > And for VMX MSR model, do you mean the vmx->nested.msrs or the ones in > vmcs_config->nested? vmx->nested.msrs. vmcs_config->nested is effectively the VMX equivalent of KVM_GET_SUPPORTED_CPUID. > What I observed is that vmx->nested.msrs.secondary_ctls_high will be changed > in vmx_adjust_secondary_exec_control(), which can be triggered after cpuid is > set. Ugh, that path got overlooked when we yanked out KVM's manipulaton of VMX MSRs in response to guest CPUID changes. I wonder if we can get away with changing KVM's behavior to only ensure a feature isn't exposed to L2 when it's not exposed to L1. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6b4266e949a3..cfc35d559d91 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4523,8 +4523,8 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control, * Update the nested MSR settings so that a nested VMM can/can't set * controls for features that are/aren't exposed to the guest. */ - if (nested) { - if (enabled) + if (nested && !enabled) + if (exiting) vmx->nested.msrs.secondary_ctls_high |= control; else vmx->nested.msrs.secondary_ctls_high &= ~control; > Since KVM's config(vmcs_config->nested.secondary_ctls_high) is done during init > by nested_vmx_setup_ctls_msrs(), which only kept a subset of the flags from the > vmcs_confg->cpu_based_2nd_exec_ctrl, the vmx_restore_control_msr() could fail > later, when userspace VMM tries to enable a feature(the only one I witnessed is > SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE) by setting MSR_IA32_VMX_PROCBASED_CTLS2. > Because the vmx->nested.msrs.secondary_ctls_high is updated by cpuid, but this > bit is not taken from vmcs_conf->cpu_based_2nd_exec_ctrl by nested_vmx_setup_ctls_msrs() > for vmcs_config->nested.secondary_ctls_high. > > The failure can be fixed, simply by adding SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE in > nested_vmx_setup_ctls_msrs(), e.g. Assuming KVM actually supports user wait/pause in L2, this is an orthogonal bug to the CPUID-based manipulation above. KVM simply neglects to advertise to userspace that ENABLE_USR_WAIT_PAUSE is supported for nested virtualization. If KVM doesn't correctly support virtualizing user wait/pause for L2, then the correct location to fix this is in vmx_secondary_exec_control(). > > > Another question is about the setting of secondary_ctls_high in > > > nested_vmx_setup_ctls_msrs(). I saw there's a comment saying: > > > "Do not include those that depend on CPUID bits, they are > > > added later by vmx_vcpu_after_set_cpuid.". > > > > That's a stale comment, see the very next commit, 8805875aa473 ("Revert "KVM: nVMX: > > Do not expose MPX VMX controls when guest MPX disabled""), as well as the slightly > > later commit 9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL > > VM-{Entry,Exit} control""). > > > > So the comment can be and shall be removed, right? Yep. > > > if (cpu_has_vmx_vmfunc()) { > > > msrs->secondary_ctls_high |= > > > SECONDARY_EXEC_ENABLE_VMFUNC; > > > > This one is still required. KVM never enables VMFUNC for itself, i.e. it won't > > be set in KVM's VMCS configuration. > > > > My understanding is that, for VMFUNC, altough KVM does not support it, > SECONDARY_EXEC_ENABLE_VMFUNC is still set in the secondary proc-based > vm execution ctrol. KVM just uses different handlers for VMFUNC exits > from L1(to inject the #UD) and L2(to emulate the eptp switching). So > it doesn't matter if we do not clear this bit for vmcs_config->nested. > procbased_ctls_high. Ah, you're right, I didn't realize KVM enables VMFUNC in L1. Enabling VMFUNC for L1 is silly though, it's trivial to clear the feature in vmx_secondary_exec_control(). That said, enabling VMFUNC in vmcs01 is an orthogonal topic, and it _is_ indeed easier to keep the feature in the reference config. Now that the nested config is derived from the non-nested config, nested_vmx_setup_ctls_msrs() can do: diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 61a2e551640a..751cc67aa1a9 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -6811,7 +6811,8 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps) SECONDARY_EXEC_ENABLE_INVPCID | SECONDARY_EXEC_RDSEED_EXITING | SECONDARY_EXEC_XSAVES | - SECONDARY_EXEC_TSC_SCALING; + SECONDARY_EXEC_TSC_SCALING | + SECONDARY_EXEC_ENABLE_VMFUNC; /* * We can emulate "VMCS shadowing," even if the hardware @@ -6842,17 +6843,12 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps) } } - if (cpu_has_vmx_vmfunc()) { - msrs->secondary_ctls_high |= - SECONDARY_EXEC_ENABLE_VMFUNC; - /* - * Advertise EPTP switching unconditionally - * since we emulate it - */ - if (enable_ept) - msrs->vmfunc_controls = - VMX_VMFUNC_EPTP_SWITCHING; - } + /* + * Advertise EPTP switching if VMFUNC and EPT are supported, KVM + * emulates the actual EPTP switch in software. + */ + if (cpu_has_vmx_vmfunc() && enable_ept) + msrs->vmfunc_controls = VMX_VMFUNC_EPTP_SWITCHING; /* * Old versions of KVM use the single-context version without