Re: [PATCH 2/2] KVM: x86: Fix INIT signal handling in various CPU states

From: Liran Alon <liran.alon@oracle.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: pbonzini@redhat.com, rkrcmar@redhat.com, kvm@vger.kernel.org,
	jmattson@google.com, vkuznets@redhat.com,
	Joao Martins <joao.m.martins@oracle.com>,
	Nikita Leshenko <nikita.leshchenko@oracle.com>
Subject: Re: [PATCH 2/2] KVM: x86: Fix INIT signal handling in various CPU states
Date: Mon, 26 Aug 2019 21:26:25 +0300	[thread overview]
Message-ID: <221B019B-D38D-401E-9C6B-17D512B61345@oracle.com> (raw)
In-Reply-To: <20190826160301.GC19381@linux.intel.com>

> On 26 Aug 2019, at 19:03, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Mon, Aug 26, 2019 at 01:24:49PM +0300, Liran Alon wrote:
>> Commit cd7764fe9f73 ("KVM: x86: latch INITs while in system management mode")
>> changed code to latch INIT while vCPU is in SMM and process latched INIT
>> when leaving SMM. It left a subtle remark in commit message that similar
>> treatment should also be done while vCPU is in VMX non-root-mode.
>> 
>> However, INIT signals should actually be latched in various vCPU states:
>> (*) For both Intel and AMD, INIT signals should be latched while vCPU
>> is in SMM.
>> (*) For Intel, INIT should also be latched while vCPU is in VMX
>> operation and later processed when vCPU leaves VMX operation by
>> executing VMXOFF.
> 
> I think there's a {get,set}_nested_state() component missing, e.g. the SMM
> case exposes and consumes a pending INIT via events->smi.latched_init.
> Similar functionality is needed to preserve a pending INIT for post-VMXON
> across migration.

Good catch. I have missed this.

It seems that kvm_vcpu_ioctl_x86_get_vcpu_events() sets events->smi.latched_init
just based on KVM_APIC_INIT being set in LAPIC pending events.
Therefore, in case INIT is latched outside of SMM (e.g. By vCPU being in VMX operation),
it should also record it in this variable.
However, kvm_vcpu_ioctl_x86_set_vcpu_events() seems to only treat this
flag in case vCPU is in SMM (As indicated by events->smi.smm).

I would actually like to add a new field (that deprecates this events->smi.latched_init)
of events->pending_lapic_events that holds pending LAPIC events.
Regardless of specific CPU state we are at. I would need to use
one of the reserved fields in struct kvm_vcpu_events and introduce
a new KVM_VCPUEVENT_VALID_APIC_EVENTS flags to indicate
this field is specified by userspace.
I will define that when this field is specified, the events->smi.latched_init is ignored.

An alternative could be to just add a flag to events->flags that modifies
behaviour to treat events->smi.latched_init as just events->latched_init.
But I prefer the previous option.

I don’t want to introduce a new flag for "pending INIT during VMX operation" as
there are various vendor-specific CPU states that latch INIT (As handled by this commit…).

Do you agree?

> 
>> (*) For AMD, INIT should also be latched while vCPU runs with GIF=0
>> or in guest-mode with intercept defined on INIT signal.
>> 
>> To fix this:
>> 1) Add kvm_x86_ops->apic_init_signal_blocked() such that each CPU vendor
>> can define the various CPU states in which INIT signals should be
>> blocked and modify kvm_apic_accept_events() to use it.
> 
> kvm_arch_vcpu_ioctl_set_mpstate() should also use the new helper, e.g.
> userspace shouldn't be able to put the vCPU into KVM_MP_STATE_SIPI_RECEIVED
> or KVM_MP_STATE_INIT_RECEIVED if it's post-VMXON.

Good catch. I have missed this as-well.
Will fix in v2.

> 
>> 2) Modify vmx_check_nested_events() to check for pending INIT signal
>> while vCPU in guest-mode. If so, emualte vmexit on
>> EXIT_REASON_INIT_SIGNAL. Note that nSVM should have similar behaviour
>> but is currently left as a TODO comment to implement in the future
>> because nSVM don't yet implement svm_check_nested_events().
>> 
>> Note: Currently KVM nVMX implementation don't support VMX wait-for-SIPI
>> activity state as specified in MSR_IA32_VMX_MISC bits 6:8 exposed to
>> guest (See nested_vmx_setup_ctls_msrs()).
>> If and when support for this activity state will be implemented,
>> kvm_check_nested_events() would need to avoid emulating vmexit on
>> INIT signal in case activity-state is wait-for-SIPI. In addition,
>> kvm_apic_accept_events() would need to be modified to avoid discarding
>> SIPI in case VMX activity-state is wait-for-SIPI but instead delay
>> SIPI processing to vmx_check_nested_events() that would clear
>> pending APIC events and emulate vmexit on SIPI.
>> 
>> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
>> Co-developed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> ---
>> arch/x86/include/asm/kvm_host.h |  2 ++
>> arch/x86/kvm/lapic.c            | 11 +++++++----
>> arch/x86/kvm/svm.c              | 20 ++++++++++++++++++++
>> arch/x86/kvm/vmx/nested.c       | 14 ++++++++++++++
>> arch/x86/kvm/vmx/vmx.c          |  6 ++++++
>> 5 files changed, 49 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 74e88e5edd9c..158483538181 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1209,6 +1209,8 @@ struct kvm_x86_ops {
>> 	uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
>> 
>> 	bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
>> +
>> +	bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
>> };
>> 
>> struct kvm_arch_async_pf {
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 685d17c11461..9620fe5ce8d1 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -2702,11 +2702,14 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>> 		return;
>> 
>> 	/*
>> -	 * INITs are latched while in SMM.  Because an SMM CPU cannot
>> -	 * be in KVM_MP_STATE_INIT_RECEIVED state, just eat SIPIs
>> -	 * and delay processing of INIT until the next RSM.
>> +	 * INITs are latched while CPU is in specific states.
>> +	 * Because a CPU cannot be in these states immediately
>> +	 * after it have processed an INIT signal (and thus in
>> +	 * KVM_MP_STATE_INIT_RECEIVED state), just eat SIPIs
>> +	 * and delay processing of INIT until CPU leaves
>> +	 * the state which latch INIT signal.
>> 	 */
>> -	if (is_smm(vcpu)) {
>> +	if (kvm_x86_ops->apic_init_signal_blocked(vcpu)) {
> 
> We may want to keep the SMM+INIT outside of the helper so as to avoid
> splitting the SMI+INIT logic across common x86 and vendor specific code.
> E.g. kvm_arch_vcpu_ioctl_set_mpstate() needs to handle the SMI pending
> scenario and kvm_vcpu_ioctl_x86_set_vcpu_events() must prevent putting
> the vCPU into SMM or pending an SMI when the vCPU is already in
> KVM_MP_STATE_INIT_RECEIVED.

I thought it will be nicer to have all conditions for INIT latching defined in a single place.
i.e. In the per-vendor callback.

Why does kvm_arch_vcpu_ioctl_set_mpstate() needs to prevent setting mpstate
to INIT_RECEIVED/SIPI_RECIEVED in case a SMI is pending?
I would expect it to just prevent to do so in case CPU is in a state where INIT is latched.

The commit which introduced this behaviour
28bf28887976 ("KVM: x86: fix user triggerable warning in kvm_apic_accept_events()")
Seems to just want to avoid WARN_ON_ONCE() at kvm_apic_accept_events().
Which in order to do so, smi_pending is not relevant.
Same argument goes for kvm_vcpu_ioctl_x86_set_vcpu_events() modified by same commit.

Am I missing something?

> 
>> 		WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
>> 		if (test_bit(KVM_APIC_SIPI, &apic->pending_events))
>> 			clear_bit(KVM_APIC_SIPI, &apic->pending_events);
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 6462c386015d..0e43acf7bea4 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -7205,6 +7205,24 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
>> 	return false;
>> }
>> 
>> +static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +	/*
>> +	 * TODO: Last condition latch INIT signals on vCPU when
>> +	 * vCPU is in guest-mode and vmcb12 defines intercept on INIT.
>> +	 * To properly emulate INIT intercept, SVM should also implement
>> +	 * kvm_x86_ops->check_nested_events() and process pending INIT
>> +	 * signal to cause nested_svm_vmexit(). As SVM currently don't
>> +	 * implement check_nested_events(), this work is delayed
>> +	 * for future improvement.
>> +	 */
>> +	return is_smm(vcpu) ||
>> +		   !gif_set(svm) ||
>> +		   (svm->vmcb->control.intercept & (1ULL << INTERCEPT_INIT));
>> +}
>> +
>> static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>> 	.cpu_has_kvm_support = has_svm,
>> 	.disabled_by_bios = is_disabled,
>> @@ -7341,6 +7359,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
>> 	.nested_get_evmcs_version = nested_get_evmcs_version,
>> 
>> 	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
>> +
>> +	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
>> };
>> 
>> static int __init svm_init(void)
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index ced9fba32598..d655fcd04c01 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -3401,6 +3401,15 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>> 	unsigned long exit_qual;
>> 	bool block_nested_events =
>> 	    vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
>> +	struct kvm_lapic *apic = vcpu->arch.apic;
>> +
>> +	if (lapic_in_kernel(vcpu) &&
>> +		test_bit(KVM_APIC_INIT, &apic->pending_events)) {
> 
> Prefer aligned identation.

Actually I now see that I can just use kvm_lapic_latched_init() instead.
Therefore, will replace to that.

-Liran

> 
>> +		if (block_nested_events)
>> +			return -EBUSY;
>> +		nested_vmx_vmexit(vcpu, EXIT_REASON_INIT_SIGNAL, 0, 0);
>> +		return 0;
>> +	}
>> 
>> 	if (vcpu->arch.exception.pending &&
>> 		nested_vmx_check_exception(vcpu, &exit_qual)) {
>> @@ -4466,7 +4475,12 @@ static int handle_vmoff(struct kvm_vcpu *vcpu)
>> {
>> 	if (!nested_vmx_check_permission(vcpu))
>> 		return 1;
>> +
>> 	free_nested(vcpu);
>> +
>> +	/* Process a latched INIT during time CPU was in VMX operation */
>> +	kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +
>> 	return nested_vmx_succeed(vcpu);
>> }
>> 
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index b5b5b2e5dac5..5a1aa0640f2a 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -7479,6 +7479,11 @@ static bool vmx_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
>> 	return false;
>> }
>> 
>> +static bool vmx_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
>> +{
>> +	return is_smm(vcpu) || to_vmx(vcpu)->nested.vmxon;
>> +}
>> +
>> static __init int hardware_setup(void)
>> {
>> 	unsigned long host_bndcfgs;
>> @@ -7803,6 +7808,7 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
>> 	.get_vmcs12_pages = NULL,
>> 	.nested_enable_evmcs = NULL,
>> 	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>> +	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>> };
>> 
>> static void vmx_cleanup_l1d_flush(void)
>> -- 
>> 2.20.1
>>