From mboxrd@z Thu Jan  1 00:00:00 1970
From: Radim =?utf-8?B?S3LEjW3DocWZ?= <rkrcmar@redhat.com>
Subject: Re: [PATCH v2 3/5] KVM: nVMX: Fix injection to L2 when L1 don't
 intercept external-interrupts
Date: Thu, 7 Dec 2017 17:41:16 +0100
Message-ID: <20171207164115.GB29615@flask>
References: <1512461786-6465-1-git-send-email-liran.alon@oracle.com>
 <1512461786-6465-4-git-send-email-liran.alon@oracle.com>
 <20171206202052.GC3644@flask>
 <5A2923A5.3010906@ORACLE.COM>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, jmattson@google.com,
        wanpeng.li@hotmail.com, idan.brown@ORACLE.COM,
        Krish Sadhukhan <krish.sadhukhan@ORACLE.COM>,
        Konrad Rzeszutek Wilk <konrad.wilk@ORACLE.COM>
To: Liran Alon <LIRAN.ALON@ORACLE.COM>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:49912 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752634AbdLGQlW (ORCPT <rfc822;kvm@vger.kernel.org>);
        Thu, 7 Dec 2017 11:41:22 -0500
Content-Disposition: inline
In-Reply-To: <5A2923A5.3010906@ORACLE.COM>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

2017-12-07 13:19+0200, Liran Alon:
> On 06/12/17 22:20, Radim Krčmář wrote:
> > 2017-12-05 10:16+0200, Liran Alon:
> > > Before each vmentry to guest, vcpu_enter_guest() calls sync_pir_to_irr()
> > > which calls vmx_hwapic_irr_update() to update RVI.
> > > Currently, vmx_hwapic_irr_update() contains a tweak in case it is called
> > > when CPU is running L2 and L1 don't intercept external-interrupts.
> > > In that case, code injects interrupt directly into L2 instead of
> > > updating RVI.
> > > 
> > > Besides being hacky (wouldn't expect function updating RVI to also
> > > inject interrupt), it also doesn't handle this case correctly.
> > > The code contains several issues:
> > > 1. When code calls kvm_queue_interrupt() it just passes it max_irr which
> > > represents the highest IRR currently pending in L1 LAPIC.
> > > This is problematic as interrupt was injected to guest but it's bit is
> > > still set in LAPIC IRR instead of being cleared from IRR and set in ISR.
> > > 2. Code doesn't check if LAPIC PPR is set to accept an interrupt of
> > > max_irr priority. It just checks if interrupts are enabled in guest with
> > > vmx_interrupt_allowed().
> > 
> > Good catch.
> > 
> > > To fix the above issues:
> > > 1. Simplify vmx_hwapic_irr_update() to just update RVI.
> > > Note that this shouldn't happen when CPU is running L2
> > > (See comment in code).
> > > 2. Since now vmx_hwapic_irr_update() only does logic for L1
> > > virtual-interrupt-delivery, inject_pending_event() should be the
> > > one responsible for injecting the interrupt directly into L2.
> > > Therefore, change kvm_cpu_has_injectable_intr() to check L1
> > > LAPIC when CPU is running L2.
> > > 
> > > This is enough to fix the issue since commit ("KVM: nVMX: Re-evaluate
> > > L1 pending events when running L2 and L1 got posted-interrupt")
> > > made vcpu_enter_guest() to set KVM_REQ_EVENT when L1 has a pending
> > > injectable interrupt.
> > 
> > Right, that answers my question from the first patch.
> > 
> > > Fixes: 963fee165660 ("KVM: nVMX: Fix virtual interrupt delivery
> > > injection")
> > > 
> > > Signed-off-by: Liran Alon <liran.alon@oracle.com>
> > > Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> > > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > > Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > ---
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > @@ -9051,6 +9039,10 @@ static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
> > >   		 * If we are running L2 and L1 has a new pending interrupt
> > >   		 * which can be injected, we should re-evaluate
> > >   		 * what should be done with this new L1 interrupt.
> > > +		 * If L1 intercepts external-interrupts, we should
> > > +		 * exit from L2 to L1. Otherwise, interrupt should be
> > > +		 * delivered directly to L2.
> > > +		 * These cases are handled correctly by KVM_REQ_EVENT.
> > >   		 */
> > >   		if (is_guest_mode(vcpu) && (max_irr > prev_max_irr))
> > >   			kvm_make_request(KVM_REQ_EVENT, vcpu);
> > 
> > 
> > KVM_REQ_EVENT is not needed when we are intercepting the interrupt,
> > because we just want to run vmx_check_nested_events(), but it is needed
> > when delivering L1's interrupt inside L2.
> 
> OK. If vmcs12 intercept external interrupts, I will just call
> kvm_vcpu_exiting_guest_mode(). Otherwise, I will use KVM_REQ_EVENT.
> 
> > 
> > When vmcs12 doesn't intercept external interrupts, I'm thinking we could
> > pass virtual interrupts from vmcs01 and use RVI normally, as well as
> > delivering L1 posted interrupts to L2 without a VM exit.
> > 
> 
> Maybe I am missing something but I am not sure this is possible.
> 
> As the comment I added to vmx_hwapic_irr_update() specifies,
> virtual-interrupt-delivery cannot be enabled if exit on external-interrupts
> is disabled. This can be seen in the following code in
> nested_vmx_check_apicv_controls():
> /*
> * If virtual interrupt delivery is enabled,
> * we must exit on external interrupts.
> */
> if (nested_cpu_has_vid(vmcs12) &&
>     !nested_exit_on_intr(vcpu))
>         return -EINVAL;
> 
> Therefore, in case vmcs12 disables exit on
> external-interrupts then vmcs12 SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY must be
> cleared. As prepare_vmcs02() currently takes value of
> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY as-is from vmcs12 to vmcs02, it means
> vmcs02 doesn't have virtual-interrupt-delivery in this case. Therefore, in
> this case, we should not update vmcs02 RVI.

Right, vmcs12 can't configure it, but we already have
PIN_BASED_EXT_INTR_MASK in vmcs02, so we'd also enable
VIRTUAL_INTR_DELIVERY in vmcs02 if we detect that vmcs12 disables
PIN_BASED_EXT_INTR_MASK.  And because vmcs12 can't use it, we're free to
configure vmcs02 with data from vmcs01, so interrupt delivery would
behave as if there was no nesting.
(Dislaimer: I haven't verified that it would actually work.)

For now, I think that throwing KVM_REQ_EVENT in this case is perfectly
acceptable and we can optimize later if we notice that it is being
widely used.  (Windows with device guard come to mind.)

Thanks.