linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Anirudh Rayabharam <anrayabh@linux.microsoft.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Michael Kelley <mikelley@microsoft.com>,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
Date: Fri, 19 Aug 2022 10:06:53 +0200	[thread overview]
Message-ID: <87sflssllu.fsf@redhat.com> (raw)
In-Reply-To: <Yv5zn4qTl0aiaQvh@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
>> Enlightened VMCS v1 got updated and now includes the required fields
>> for TSC scaling and loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT
>> features. For Hyper-V on KVM enablement, KVM can just observe VMX control
>> MSRs and use the features (with or without eVMCS) when
>> possible. Hyper-V on KVM case is trickier because of live migration:
>> the new features require explicit enablement from VMM to not break
>> it. Luckily, the updated eVMCS revision comes with a feature bit in
>> CPUID.0x4000000A.EBX.
>
> The refactor to implement the 2-d array should go in a prep patch.  Unless you
> (or anyone else) objects to the feedback below, I'll do the split myself, post v6,
> and queue this up (without patch 1, and assuming there're no major issues in the
> back half of the series).

No objections from me, thanks!

>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kvm/hyperv.c     |  2 +-
>>  arch/x86/kvm/vmx/evmcs.c  | 88 +++++++++++++++++++++++++++++++--------
>>  arch/x86/kvm/vmx/evmcs.h  | 17 ++------
>>  arch/x86/kvm/vmx/nested.c |  2 +-
>>  arch/x86/kvm/vmx/vmx.c    |  2 +-
>>  5 files changed, 78 insertions(+), 33 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index 1098915360ae..8a2b24f9bbf6 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -2552,7 +2552,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>>  		case HYPERV_CPUID_NESTED_FEATURES:
>>  			ent->eax = evmcs_ver;
>>  			ent->eax |= HV_X64_NESTED_MSR_BITMAP;
>> -
>> +			ent->ebx |= HV_X64_NESTED_EVMCS1_2022_UPDATE;
>>  			break;
>>  
>>  		case HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS:
>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>> index 8bea5dea0341..e8497f9854a1 100644
>> --- a/arch/x86/kvm/vmx/evmcs.c
>> +++ b/arch/x86/kvm/vmx/evmcs.c
>> @@ -368,7 +368,60 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>>  	return 0;
>>  }
>>  
>> -void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
>> +enum evmcs_revision {
>> +	EVMCSv1_2016,
>> +	EVMCSv1_2022,
>> +	EVMCS_REVISION_MAX,
>
> "MAX" is technically wrong, the "max" entry is usually the last valid entry.  This
> should be NR_EVMCS_REVISIONS, and then NR_EVMCS_CTRLS below.
>
>> +};
>> +
>> +enum evmcs_unsupported_ctrl_type {
>
> I think drop the "unsupported" here, because the naming gets weird when trying to
> use it for something other than indexing evmcs_unsupported_ctls (see bottom).
>
>> +	EVMCS_EXIT_CTLS,
>> +	EVMCS_ENTRY_CTLS,
>
> s/CTLS/CTRLS for consistency.
>
>> +	EVMCS_2NDEXEC,
>> +	EVMCS_PINCTRL,
>> +	EVMCS_VMFUNC,
>> +	EVMCS_CTRL_MAX,
>> +};
>> +
>> +static u32 evmcs_unsupported_ctls[EVMCS_CTRL_MAX][EVMCS_REVISION_MAX] = {
>
> This can be "const".  And s/ctls/ctrls for consistency.
>
>> +	[EVMCS_EXIT_CTLS] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL,
>> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL,
>> +	},
>> +	[EVMCS_ENTRY_CTLS] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMENTRY_CTRL | VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
>> +		[EVMCSv1_2022] =  EVMCS1_UNSUPPORTED_VMENTRY_CTRL,
>> +	},
>> +	[EVMCS_2NDEXEC] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_2NDEXEC | SECONDARY_EXEC_TSC_SCALING,
>> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_2NDEXEC,
>> +	},
>> +	[EVMCS_PINCTRL] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_PINCTRL,
>> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_PINCTRL,
>> +	},
>> +	[EVMCS_VMFUNC] = {
>> +		[EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMFUNC,
>> +		[EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMFUNC,
>> +	},
>> +};
>> +
>> +static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu,
>> +				      enum evmcs_unsupported_ctrl_type ctrl_type)
>> +{
>> +	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>> +	enum evmcs_revision evmcs_rev = EVMCSv1_2016;
>> +
>> +	if (!hv_vcpu)
>
> This is a functiontal change, and I don't think it's correct either.  Previously,
> KVM would apply the EVMCSv1_2016 filter irrespective of whether or not
> vcpu->arch.hyperv is non-NULL.  nested_enable_evmcs() doesn't require a Hyper-V
> vCPU, and AFAICT nothing requires a Hyper-V vCPU to use eVMCS.

Indeed, this *is* correct after PATCH11 when we get rid of VMX feature
MSR filtering for KVM-on-Hyper-V as the remaining use for
evmcs_get_unsupported_ctls() is Hyper-V on KVM and hv_vcpu is not NULL
there. As of this patch, this is incorrect as we're breaking e.g. Linux
guests on KVM on Hyper-V.

>
> So I believe this should be:
>
> 	if (hv_vcpu &&
> 	    hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
> 		evmcs_rev = EVMCSv1_2022;

This looks good to me, thanks!

>
>> +		return 0;
>> +
>> +	if (hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
>> +		evmcs_rev = EVMCSv1_2022;
>> +
>> +	return evmcs_unsupported_ctls[ctrl_type][evmcs_rev];
>> +}
>> +
>
> ...
>
>> -int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
>> +int nested_evmcs_check_controls(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>  {
>>  	int ret = 0;
>>  	u32 unsupp_ctl;
>>  
>>  	unsupp_ctl = vmcs12->pin_based_vm_exec_control &
>> -		EVMCS1_UNSUPPORTED_PINCTRL;
>> +		evmcs_get_unsupported_ctls(vcpu, EVMCS_PINCTRL);
>>  	if (unsupp_ctl) {
>>  		trace_kvm_nested_vmenter_failed(
>> -			"eVMCS: unsupported pin-based VM-execution controls",
>> +			"eVMCS: unsupported pin-based VM-execution controls: ",
>>  			unsupp_ctl);
>
> Egad!  This is all broken, at least in theory.  The second param to
> trace_kvm_nested_vmenter_failed() is the error code, not the bad value.  It mostly
> works because __print_symbolic() falls back to printing the hex value on error,
> but that relies on there being no collisions/matches between unsupp_ctl and the
> set of VMX_VMENTER_INSTRUCTION_ERRORS.  E.g. if there's a collision then the
> tracepoint will pretty print a string and cause confusion.
>
> And checking every control even after one fails seems unnecessary subtle.  It
> provides marginal benefit while increasing the probability that we screw up and
> squash "ret" to zero.  Yeah, it might save some onion peeling, but if that happens
> during production and now during initial development of a feature, then someone
> has much bigger problems to worry about.
>
> Unless there are objections, I'll fold in a patch to yield:
>
> #define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
>
> static bool nested_evmcs_is_valid_controls(enum evmcs_ctrl_type type, u32 val)
> {
> 	return val & evmcs_get_unsupported_ctls(type);
> }
>
> int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
> {
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_PINCTRL,
> 					       vmcs12->pin_based_vm_exec_control)))
> 		return -EINVAL;
>
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_2NDEXEC,
> 					       vmcs12->secondary_vm_exec_control)))
> 		return -EINVAL;
>
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_EXIT_CTRLS,
> 					       vmcs12->vm_exit_controls)))
> 		return -EINVAL;
>
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_ENTRY_CTRLS,
> 					       vmcs12->vm_entry_controls)))
> 		return -EINVAL;
>
> 	if (CC(!nested_evmcs_is_valid_controls(EVMCS_VMFUNC,
> 					       vmcs12->vm_function_control)))
> 		return -EINVAL;
>
> 	return 0;
> }


Well, it's actually nice to see all the controls where KVM screwed up
without the need to instrument kernel, this way when someone comes with
an issue you can immediately see what's wrong in the trace
log. Honestly, I don't remember these firing outside of my development
environment, your patch to make things correct looks good to me.

-- 
Vitaly


  reply	other threads:[~2022-08-19  8:07 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags Vitaly Kuznetsov
2022-08-18 15:14   ` Sean Christopherson
2022-08-18 15:20     ` Vitaly Kuznetsov
2022-08-18 15:49       ` Sean Christopherson
2022-08-18 15:59         ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 02/26] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 03/26] x86/hyperv: Update " Vitaly Kuznetsov
2022-08-18 15:21   ` Sean Christopherson
2022-08-18 15:29     ` Vitaly Kuznetsov
2022-08-18 17:57       ` Sean Christopherson
2022-08-22  9:18         ` Vitaly Kuznetsov
2022-08-22 15:55           ` Sean Christopherson
2022-08-22 16:21             ` Vitaly Kuznetsov
2022-08-22 17:01               ` Sean Christopherson
2022-08-22 17:46                 ` Vitaly Kuznetsov
2022-08-22 18:32                   ` Sean Christopherson
2022-08-23  7:33                     ` Vitaly Kuznetsov
2022-08-23 15:00                       ` Sean Christopherson
2022-08-23 15:31                         ` Sean Christopherson
2022-08-23 16:54                         ` Vitaly Kuznetsov
2022-08-23 20:16                           ` Sean Christopherson
2022-08-22 16:13           ` Sean Christopherson
2022-08-22 16:24             ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 04/26] KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 05/26] KVM: nVMX: Support several new fields in eVMCSv1 Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 06/26] KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 07/26] KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 08/26] KVM: selftests: Switch to updated eVMCSv1 definition Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS Vitaly Kuznetsov
2022-08-18 17:15   ` Sean Christopherson
2022-08-19  8:06     ` Vitaly Kuznetsov [this message]
2022-08-19 17:02       ` Sean Christopherson
2022-08-22  8:47         ` Vitaly Kuznetsov
2022-08-22 16:50           ` Sean Christopherson
2022-08-22 17:49             ` Vitaly Kuznetsov
2022-08-18 17:19   ` Sean Christopherson
2022-08-19  7:42     ` Vitaly Kuznetsov
2022-08-19 14:49       ` Sean Christopherson
2022-08-19 15:07         ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 10/26] KVM: selftests: Enable TSC scaling in evmcs selftest Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 11/26] KVM: VMX: Get rid of eVMCS specific VMX controls sanitization Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 12/26] KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config() Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 13/26] KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING " Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 14/26] KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING " Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 15/26] KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVM Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 16/26] KVM: VMX: Extend VMX controls macro shenanigans Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 17/26] KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of setup_vmcs_config() Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 18/26] KVM: VMX: Add missing VMEXIT controls to vmcs_config Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 19/26] KVM: VMX: Add missing CPU based VM execution " Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 20/26] KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not setup Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 21/26] KVM: x86: VMX: Replace some Intel model numbers with mnemonics Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 22/26] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config() Vitaly Kuznetsov
2022-08-18 17:49   ` Sean Christopherson
2022-08-19  7:48     ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 23/26] KVM: nVMX: Always set required-1 bits of pinbased_ctls to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 24/26] KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 25/26] KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 26/26] KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up nested MSR Vitaly Kuznetsov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sflssllu.fsf@redhat.com \
    --to=vkuznets@redhat.com \
    --cc=anrayabh@linux.microsoft.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mikelley@microsoft.com \
    --cc=mlevitsk@redhat.com \
    --cc=nathan@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).