From: "Nadav Har'El"
Subject: Re: [PATCH 23/31] nVMX: Correct handling of interrupt injection
Date: Wed, 25 May 2011 15:33:55 +0300
To: "Tian, Kevin"
Cc: "kvm@vger.kernel.org", "gleb@redhat.com", "avi@redhat.com"

On Wed, May 25, 2011, Tian, Kevin wrote about "RE: [PATCH 23/31] nVMX: Correct handling of interrupt injection":
> >  static void enable_irq_window(struct kvm_vcpu *vcpu)
> >  {
> >  	u32 cpu_based_vm_exec_control;
> > +	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
> > +		/* We can get here when nested_run_pending caused
> > +		 * vmx_interrupt_allowed() to return false. In this case, do
> > +		 * nothing - the interrupt will be injected later.
> > +		 */
>
> I think this is not a rare path? When the vcpu is in guest mode with L2 as
> the current vmx context, this function could be invoked multiple times,
> since the kvm thread can be scheduled out/in randomly.

As I wrote in this comment, this can only happen on nested_run_pending
(i.e., VMLAUNCH/VMRESUME emulation), because if !nested_run_pending and
nested_exit_on_intr(), vmx_interrupt_allowed() would have already exited
L2, and we wouldn't be in this case.

I don't know whether to classify this as a "rare" path - it's definitely
not very common. But what does it matter whether it's rare or common?

> > +	if (to_vmx(vcpu)->nested.nested_run_pending)
> > +		return 0;
>
> Well, now I can see why you require this special 'nested_run_pending' flag,
> because there are places where L0 injects virtual interrupts right after
> VMLAUNCH/VMRESUME emulation and before entering L2. :-)

Indeed. I tried to explain that in the patch description, where I wrote:

   We keep a new flag, "nested_run_pending", which can override the decision
   of which should run next, L1 or L2. nested_run_pending=1 means that we
   *must* run L2 next, not L1. This is necessary in particular when L1 did a
   VMLAUNCH of L2 and therefore expects L2 to be run (and perhaps be injected
   with an event it specified, etc.). Nested_run_pending is especially
   intended to avoid switching to L1 in the injection decision-point
   described above.
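To spell out the decision that nested_run_pending overrides, here is a
stand-alone C model of the vmx_interrupt_allowed() logic in this patch.
The struct and names below are my simplification for illustration only -
the real code operates on struct kvm_vcpu and checks more guest state
(e.g. the interruptibility info), so treat this as a sketch, not KVM code:

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified stand-in for the vcpu state the real code reads. */
    struct vcpu_model {
        bool guest_mode;         /* currently running L2? */
        bool nested_run_pending; /* VMLAUNCH/VMRESUME emulated; L2 *must* run next */
        bool exit_on_intr;       /* L1 asked to exit on external interrupts */
        bool if_flag;            /* guest RFLAGS.IF - interrupts enabled */
    };

    /* Models vmx_interrupt_allowed(): can we inject an interrupt now?
     * When L2 is running and L1 wants the interrupt, this is also where
     * the patch performs the L2->L1 exit (modeled here by a message). */
    static bool interrupt_allowed(struct vcpu_model *v)
    {
        if (v->guest_mode && v->exit_on_intr) {
            if (v->nested_run_pending)
                return false;   /* L2 must run first; inject later */
            printf("exiting L2 -> L1 (EXTERNAL_INTERRUPT)\n");
            v->guest_mode = false;
            /* fall through: now decide for L1 */
        }
        return v->if_flag;
    }

    int main(void)
    {
        struct vcpu_model v = { true, true, true, true };
        printf("allowed: %d\n", interrupt_allowed(&v)); /* 0: run pending */
        v.nested_run_pending = false;
        printf("allowed: %d\n", interrupt_allowed(&v)); /* exits to L1, then 1 */
        return 0;
    }

When injection is refused because of nested_run_pending, enable_irq_window()
is what gets called next, which is why the hunk quoted at the top needs the
corresponding is_guest_mode() && nested_exit_on_intr() check.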
> > +		nested_vmx_vmexit(vcpu);
> > +		vmcs12 = get_vmcs12(vcpu);
> > +		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
> > +		vmcs12->vm_exit_intr_info = 0;
> > +		/* fall through to normal code, but now in L1, not L2 */
> > +	}
> > +
>
> This is a bad place to add this logic. vmx_interrupt_allowed() is simply a
> query function, but you make it an absolute trigger point for switching
> from L2 to L1. This is fine for now, as the only point calling
> vmx_interrupt_allowed() is when there's a vNMI pending. But it's dangerous
> to build such an assumption about pending events into
> vmx_interrupt_allowed().

Now you're beating a dead horse ;-)

Gleb, and to some degree Avi, already argued that this is the wrong place to
do this exit, and that if anything the exit should be done (or just decided
on) in enable_irq_window().

My counter-argument was that the right way is *neither* of these approaches:
any attempt to "commandeer" one of the existing x86 ops, be it
vmx_interrupt_allowed() or enable_irq_window(), to do in the L2 case things
it was never designed to do is both ugly, and dangerous if the call sites
change at some time in the future.

So rather than changing one ugly abuse of one function into the (arguably
also ugly) abuse of another function, what I'd like to see is a better
overall design, where the call sites in x86.c know about the possibility of
a nested guest (they already do - as we previously discussed, an
is_guest_mode() function was recently added), and when they need to, *they*
call an exit-to-L1 function, rather than calling a function named
"enable_irq_window" or "vmx_interrupt_allowed" which mysteriously does the
exit. (I sketch what I mean in the P.S. below.)

> On the other hand, I think there's one area which is not handled in a
> timely manner. I think you need to kick an L2->L1 transition when L0 wants
> to inject a virtual interrupt. Consider your current logic:
>
> a) L2 is running on cpu1
> b) L0 on cpu0 decides to post a virtual interrupt to L1. An IPI is issued
>    to cpu1 after updating the virqchip
> c) L2 on cpu1 vmexits to L0, which checks whether L0 or L1 should handle
>    the event. As it's an external interrupt, L0 will handle it. As it's a
>    notification IPI, nothing is required.
> d) L0 on cpu1 then decides to resume, and finds KVM_REQ_EVENT
>
> At this point you only add logic to enable_irq_window(), but there's no
> action to trigger the L2->L1 transition. So what will happen? Will the
> event be injected into L2 instead, or stay pending until the next switch
> happens due to some other cause?

I'm afraid I'm missing something in your explanation... In step d, L0 finds
an interrupt in the injection queue, so isn't the first thing it does to
call vmx_interrupt_allowed(), to check whether injection is allowed now?
In our code, vmx_interrupt_allowed() was bastardized to exit to L1 in this
case. Isn't that the missing exit you were looking for?

--
Nadav Har'El                        |    Wednesday, May 25 2011, 21 Iyyar 5771
nyh@math.technion.ac.il             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Experience is what lets you recognize a
http://nadav.harel.org.il           |mistake when you make it again.
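P.S. To make the alternative design above a bit more concrete: the idea is
that the injection path in x86.c would name the L2->L1 exit explicitly,
through a dedicated op, instead of hiding it inside vmx_interrupt_allowed().
All of the names below are invented for illustration (this is a sketch of
the shape of the interface, not a patch against the real kvm_x86_ops):

    #include <stdbool.h>

    struct kvm_vcpu;  /* opaque here */

    /* Hypothetical nested ops each vendor module (vmx/svm) would fill in. */
    struct nested_ops_sketch {
        bool (*exit_on_intr)(struct kvm_vcpu *vcpu);
        bool (*run_pending)(struct kvm_vcpu *vcpu);
        void (*exit_to_l1)(struct kvm_vcpu *vcpu, unsigned int exit_reason);
    };

    /* Stand-ins for things x86.c already has; all names are made up. */
    extern struct nested_ops_sketch *nested_ops;
    extern bool is_guest_mode_sketch(struct kvm_vcpu *vcpu);
    extern bool interrupt_allowed_sketch(struct kvm_vcpu *vcpu);
    extern void inject_interrupt_sketch(struct kvm_vcpu *vcpu);

    #define EXTERNAL_INTERRUPT_SKETCH 1

    /* The call site in x86.c then says explicitly what it wants: */
    void inject_pending_interrupt_sketch(struct kvm_vcpu *vcpu)
    {
        if (is_guest_mode_sketch(vcpu) && nested_ops->exit_on_intr(vcpu) &&
            !nested_ops->run_pending(vcpu))
            nested_ops->exit_to_l1(vcpu, EXTERNAL_INTERRUPT_SKETCH);

        if (interrupt_allowed_sketch(vcpu))
            inject_interrupt_sketch(vcpu);
    }

The point is only that the L2->L1 transition becomes an explicit, named step
at the call site, instead of a side effect of a query function.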