Re: [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts

From: "Radim Krčmář" <rkrcmar@redhat.com>
To: Liran Alon <LIRAN.ALON@ORACLE.COM>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, jmattson@google.com,
	wanpeng.li@hotmail.com, idan.brown@ORACLE.COM,
	Krish Sadhukhan <krish.sadhukhan@ORACLE.COM>,
	Konrad Rzeszutek Wilk <konrad.wilk@ORACLE.COM>
Subject: Re: [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts
Date: Thu, 7 Dec 2017 17:41:16 +0100	[thread overview]
Message-ID: <20171207164115.GB29615@flask> (raw)
In-Reply-To: <5A2923A5.3010906@ORACLE.COM>

2017-12-07 13:19+0200, Liran Alon:
> On 06/12/17 22:20, Radim Krčmář wrote:
> > 2017-12-05 10:16+0200, Liran Alon:
> > > Before each vmentry to guest, vcpu_enter_guest() calls sync_pir_to_irr()
> > > which calls vmx_hwapic_irr_update() to update RVI.
> > > Currently, vmx_hwapic_irr_update() contains a tweak in case it is called
> > > when CPU is running L2 and L1 don't intercept external-interrupts.
> > > In that case, code injects interrupt directly into L2 instead of
> > > updating RVI.
> > > 
> > > Besides being hacky (wouldn't expect function updating RVI to also
> > > inject interrupt), it also doesn't handle this case correctly.
> > > The code contains several issues:
> > > 1. When code calls kvm_queue_interrupt() it just passes it max_irr which
> > > represents the highest IRR currently pending in L1 LAPIC.
> > > This is problematic as interrupt was injected to guest but it's bit is
> > > still set in LAPIC IRR instead of being cleared from IRR and set in ISR.
> > > 2. Code doesn't check if LAPIC PPR is set to accept an interrupt of
> > > max_irr priority. It just checks if interrupts are enabled in guest with
> > > vmx_interrupt_allowed().
> > 
> > Good catch.
> > 
> > > To fix the above issues:
> > > 1. Simplify vmx_hwapic_irr_update() to just update RVI.
> > > Note that this shouldn't happen when CPU is running L2
> > > (See comment in code).
> > > 2. Since now vmx_hwapic_irr_update() only does logic for L1
> > > virtual-interrupt-delivery, inject_pending_event() should be the
> > > one responsible for injecting the interrupt directly into L2.
> > > Therefore, change kvm_cpu_has_injectable_intr() to check L1
> > > LAPIC when CPU is running L2.
> > > 
> > > This is enough to fix the issue since commit ("KVM: nVMX: Re-evaluate
> > > L1 pending events when running L2 and L1 got posted-interrupt")
> > > made vcpu_enter_guest() to set KVM_REQ_EVENT when L1 has a pending
> > > injectable interrupt.
> > 
> > Right, that answers my question from the first patch.
> > 
> > > Fixes: 963fee165660 ("KVM: nVMX: Fix virtual interrupt delivery
> > > injection")
> > > 
> > > Signed-off-by: Liran Alon <liran.alon@oracle.com>
> > > Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> > > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > > Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > @@ -9051,6 +9039,10 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > >   		 * If we are running L2 and L1 has a new pending interrupt
> > >   		 * which can be injected, we should re-evaluate
> > >   		 * what should be done with this new L1 interrupt.
> > > +		 * If L1 intercepts external-interrupts, we should
> > > +		 * exit from L2 to L1. Otherwise, interrupt should be
> > > +		 * delivered directly to L2.
> > > +		 * These cases are handled correctly by KVM_REQ_EVENT.
> > >   		 */
> > >   		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
> > >   			kvm_make_request(KVM_REQ_EVENT, vcpu);
> > 
> > 
> > KVM_REQ_EVENT is not needed when we are intercepting the interrupt,
> > because we just want to run vmx_check_nested_events(), but it is needed
> > when delivering L1's interrupt inside L2.
> 
> OK. If vmcs12 intercept external interrupts, I will just call
> kvm_vcpu_exiting_guest_mode(). Otherwise, I will use KVM_REQ_EVENT.
> 
> > 
> > When vmcs12 doesn't intercept external interrupts, I'm thinking we could
> > pass virtual interrupts from vmcs01 and use RVI normally, as well as
> > delivering L1 posted interrupts to L2 without a VM exit.
> > 
> 
> Maybe I am missing something but I am not sure this is possible.
> 
> As the comment I added to vmx_hwapic_irr_update() specifies,
> virtual-interrupt-delivery cannot be enabled if exit on external-interrupts
> is disabled. This can be seen in the following code in
> nested_vmx_check_apicv_controls():
> /*
> * If virtual interrupt delivery is enabled,
> * we must exit on external interrupts.
> */
> if (nested_cpu_has_vid(vmcs12) &&
>     !nested_exit_on_intr(vcpu))
>         return -EINVAL;
> 
> Therefore, in case vmcs12 disables exit on
> external-interrupts then vmcs12 SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY must be
> cleared. As prepare_vmcs02() currently takes value of
> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY as-is from vmcs12 to vmcs02, it means
> vmcs02 doesn't have virtual-interrupt-delivery in this case. Therefore, in
> this case, we should not update vmcs02 RVI.

Right, vmcs12 can't configure it, but we already have
PIN_BASED_EXT_INTR_MASK in vmcs02, so we'd also enable
VIRTUAL_INTR_DELIVERY in vmcs02 if we detect that vmcs12 disables
PIN_BASED_EXT_INTR_MASK.  And because vmcs12 can't use it, we're free to
configure vmcs02 with data from vmcs01, so interrupt delivery would
behave as if there was no nesting.
(Dislaimer: I haven't verified that it would actually work.)

For now, I think that throwing KVM_REQ_EVENT in this case is perfectly
acceptable and we can optimize later if we notice that it is being
widely used.  (Windows with device guard come to mind.)

Thanks.