From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Nadav Har'El"
Subject: [PATCH 20/24] Correct handling of interrupt injection
Date: Sun, 13 Jun 2010 15:32:48 +0300
Message-ID: <201006131232.o5DCWmCr013127@rice.haifa.ibm.com>
References: <1276431753-nyh@il.ibm.com>
Cc: kvm@vger.kernel.org
To: avi@redhat.com
Return-path:
Received: from mtagate6.de.ibm.com ([195.212.17.166]:59255 "EHLO
	mtagate6.de.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753443Ab0FMMcw (ORCPT );
	Sun, 13 Jun 2010 08:32:52 -0400
Received: from d12nrmr1607.megacenter.de.ibm.com
	(d12nrmr1607.megacenter.de.ibm.com [9.149.167.49])
	by mtagate6.de.ibm.com (8.13.1/8.13.1) with ESMTP id o5DCWpj6011391
	for ; Sun, 13 Jun 2010 12:32:51 GMT
Received: from d12av01.megacenter.de.ibm.com
	(d12av01.megacenter.de.ibm.com [9.149.165.212])
	by d12nrmr1607.megacenter.de.ibm.com (8.13.8/8.13.8/NCO v10.0)
	with ESMTP id o5DCWo1F1061114
	for ; Sun, 13 Jun 2010 14:32:50 +0200
Received: from d12av01.megacenter.de.ibm.com (loopback [127.0.0.1])
	by d12av01.megacenter.de.ibm.com (8.12.11.20060308/8.13.3)
	with ESMTP id o5DCWosw004570
	for ; Sun, 13 Jun 2010 14:32:50 +0200
Sender: kvm-owner@vger.kernel.org
List-ID:

When KVM wants to inject an interrupt, the guest should think a real
interrupt has happened. Normally (in the non-nested case) this means
checking that the guest doesn't block interrupts (and if it does, deferring
the injection until it no longer does, using the "interrupt window" VMX
mechanism), and setting up the appropriate VMCS fields for the guest to
receive the interrupt.

However, when we are running a nested guest (L2) and its hypervisor (L1)
requested exits on interrupts (as most hypervisors do), the most efficient
thing to do is to exit L2, telling L1 that the exit was caused by an
interrupt, the one we were injecting. Only when L1 has asked not to be
notified of interrupts should we inject it directly into the running guest
L2 (i.e., the normal code path).

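To make the ideal policy described above concrete, here is a minimal
sketch of the decision in plain C. It is only an illustration under stated
assumptions: struct toy_vcpu, enum intr_action and decide_intr_action()
are hypothetical names invented for this note, not KVM code, and the
"guest blocks interrupts" condition is collapsed into a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome names, for illustration only. */
enum intr_action {
	INTR_INJECT_NOW,	/* set up injection for the running guest */
	INTR_WAIT_FOR_WINDOW,	/* guest blocks interrupts: use the interrupt window */
	INTR_EXIT_TO_L1,	/* exit L2 and report the interrupt to L1 */
};

/* Toy state; these fields are assumptions, not KVM's struct kvm_vcpu. */
struct toy_vcpu {
	bool in_l2;		/* a nested guest (L2) is currently running */
	bool l1_exits_on_intr;	/* L1 requested exits on external interrupts */
	bool guest_blocks_intr;	/* the running guest currently blocks interrupts */
};

/* Decide what to do with a pending interrupt, following the text above. */
static enum intr_action decide_intr_action(const struct toy_vcpu *v)
{
	assert(v);

	/* Nested case where L1 wants to see interrupts: leave L2. */
	if (v->in_l2 && v->l1_exits_on_intr)
		return INTR_EXIT_TO_L1;

	/* Normal path (L1 itself, or L2 when L1 did not ask for exits). */
	if (v->guest_blocks_intr)
		return INTR_WAIT_FOR_WINDOW;
	return INTR_INJECT_NOW;
}
```

In this sketch the nested check comes first, so an L2 that blocks
interrupts still results in an exit to L1 when L1 asked for such exits.
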
However, properly doing what is described above requires invasive changes
to the flow of the existing code, which we elected not to do at this stage.
Instead we do something more simplistic and less efficient: we modify
vmx_interrupt_allowed(), which KVM calls to see if it can inject the
interrupt now, to exit from L2 to L1 before continuing with the normal code.
The normal KVM code then notices that L1 is blocking interrupts, and sets
the interrupt window to inject the interrupt later to L1. Shortly after, L1
gets the interrupt while it is itself running, not as an exit from L2. The
cost is an extra L1 exit (the interrupt window).

Signed-off-by: Nadav Har'El
---
--- .before/arch/x86/kvm/vmx.c	2010-06-13 15:01:30.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2010-06-13 15:01:30.000000000 +0300
@@ -3591,9 +3591,29 @@ out:
 	return ret;
 }
 
+/* In nested virtualization, check if L1 asked to exit on external interrupts.
+ * For most existing hypervisors, this will always return true.
+ */
+static bool nested_exit_on_intr(struct kvm_vcpu *vcpu)
+{
+	int ret;
+	if (!nested_map_current(vcpu))
+		return 0;
+	ret = get_shadow_vmcs(vcpu)->pin_based_vm_exec_control &
+		PIN_BASED_EXT_INTR_MASK;
+	nested_unmap_current(vcpu);
+	return ret;
+}
+
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
 	u32 cpu_based_vm_exec_control;
+	if (to_vmx(vcpu)->nested.nested_mode && nested_exit_on_intr(vcpu))
+		/* We can get here when nested_run_pending caused
+		 * vmx_interrupt_allowed() to return false. In this case, do
+		 * nothing - the interrupt will be injected later.
+		 */
+		return;
 
 	cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
 	cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
@@ -3718,6 +3738,13 @@ static int nested_vmx_vmexit(struct kvm_
 
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
+	if (to_vmx(vcpu)->nested.nested_mode && nested_exit_on_intr(vcpu)) {
+		if (to_vmx(vcpu)->nested.nested_run_pending)
+			return 0;
+		nested_vmx_vmexit(vcpu, true);
+		/* fall through to normal code, but now in L1, not L2 */
+	}
+
 	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
 		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
 			(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
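
For readers who want to trace the simplified flow without a kernel tree,
the following is a small user-space model of the two functions the patch
touches. It is a sketch under assumptions, not the kernel code: struct
toy_vmx and the toy_* functions are hypothetical, the VMCS reads are
replaced by plain flags, and the interruptibility-state check is folded
into a single "interrupts enabled" bit per level.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the vmx state used by the patch (all fields assumed). */
struct toy_vmx {
	bool nested_mode;		/* true while L2 is running */
	bool nested_run_pending;	/* an L2 entry is pending and must happen first */
	bool l1_exits_on_intr;		/* models PIN_BASED_EXT_INTR_MASK in L1's controls */
	bool l1_intr_enabled;		/* models RFLAGS.IF etc. while in L1 */
	bool l2_intr_enabled;		/* models RFLAGS.IF etc. while in L2 */
	bool irq_window_requested;	/* models CPU_BASED_VIRTUAL_INTR_PENDING */
	int exits_to_l1;		/* counts modelled L2->L1 exits */
};

/* Models nested_vmx_vmexit(): switch from L2 back to L1. */
static void toy_nested_vmexit(struct toy_vmx *v)
{
	v->nested_mode = false;
	v->exits_to_l1++;
}

/* Models the patched vmx_interrupt_allowed(). */
static bool toy_interrupt_allowed(struct toy_vmx *v)
{
	assert(v);
	if (v->nested_mode && v->l1_exits_on_intr) {
		if (v->nested_run_pending)
			return false;
		toy_nested_vmexit(v);
		/* fall through to the normal check, now for L1 */
	}
	return v->nested_mode ? v->l2_intr_enabled : v->l1_intr_enabled;
}

/* Models the patched enable_irq_window(). */
static void toy_enable_irq_window(struct toy_vmx *v)
{
	assert(v);
	if (v->nested_mode && v->l1_exits_on_intr)
		return;	/* nested_run_pending case: try again later */
	v->irq_window_requested = true;
}
```

With L2 running, L1 asking for interrupt exits, and L1 itself blocking
interrupts, toy_interrupt_allowed() switches to L1 and returns false; a
following toy_enable_irq_window() then requests the window for L1, which
corresponds to the extra L1 exit mentioned in the description.
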