From: "Tian, Kevin" <kevin.tian@intel.com>
To: Nadav Har'El <nyh@math.technion.ac.il>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"gleb@redhat.com" <gleb@redhat.com>,
	"avi@redhat.com" <avi@redhat.com>
Subject: RE: [PATCH 23/31] nVMX: Correct handling of interrupt injection
Date: Wed, 25 May 2011 20:55:39 +0800
Message-ID: <625BA99ED14B2D499DC4E29D8138F1505C9BFA3B57@shsmsx502.ccr.corp.intel.com>
In-Reply-To: <20110525123355.GC16418@fermat.math.technion.ac.il>

> From: Nadav Har'El [mailto:nyh@math.technion.ac.il]
> Sent: Wednesday, May 25, 2011 8:34 PM
> 
> On Wed, May 25, 2011, Tian, Kevin wrote about "RE: [PATCH 23/31] nVMX:
> Correct handling of interrupt injection":
> > >  static void enable_irq_window(struct kvm_vcpu *vcpu)
> > >  {
> > >  	u32 cpu_based_vm_exec_control;
> > > +	if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
> > > +		/* We can get here when nested_run_pending caused
> > > +		 * vmx_interrupt_allowed() to return false. In this case, do
> > > +		 * nothing - the interrupt will be injected later.
> > > +		 */
> >
> > I think this is not a rare path? when vcpu is in guest mode with L2 as current
> > vmx context, this function could be invoked multiple times since kvm thread
> > can be scheduled out/in randomly.
> 
> As I wrote in this comment, this can only happen on nested_run_pending
> (i.e., VMLAUNCH/VMRESUME emulation), because if !nested_run_pending,
> and nested_exit_on_intr(), vmx_interrupt_allowed() would have already
> exited L2, and we wouldn't be in this case.
> 
> I don't know if to classify this as a "rare" path - it's definitely not
> very common. But what does it matter if it's rare or common?

It doesn't matter much. I was just trying to understand your comment.

> 
> 
> > > +		if (to_vmx(vcpu)->nested.nested_run_pending)
> > > +			return 0;
> >
> > Well, now I can see why you require this special 'nested_run_pending' flag
> > because there're places where L0 injects virtual interrupts right after
> > VMLAUNCH/VMRESUME emulation and before entering L2. :-)
> 
> Indeed. I tried to explain that in the patch description, where I wrote
> 
>  We keep a new flag, "nested_run_pending", which can override the decision of
>  which should run next, L1 or L2. nested_run_pending=1 means that we *must* run
>  L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2
>  and therefore expects L2 to be run (and perhaps be injected with an event it
>  specified, etc.). Nested_run_pending is especially intended to avoid switching
>  to L1 in the injection decision-point described above.

At the point where nested_run_pending is first introduced, its usage is so
simple that I thought the field might not be required. But several later key
patches do depend on this flag for correctness. :-)
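
To check my understanding of the flag, here is a minimal, self-contained model
of the decision as I read it from the hunks quoted above. It is only a sketch,
not the kernel code: struct vcpu_model, enum intr_decision and
interrupt_decision() are hypothetical names, and irqs_enabled is a simplified
stand-in for the normal "can the current context take an interrupt" check.
Only the three conditions (guest mode, exit-on-interrupt, nested_run_pending)
come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few vcpu fields this discussion touches. */
struct vcpu_model {
	bool guest_mode;         /* L2 is the current context (is_guest_mode) */
	bool exit_on_intr;       /* L1 asked for exits on external interrupts */
	bool nested_run_pending; /* VMLAUNCH/VMRESUME emulated, L2 not entered yet */
	bool irqs_enabled;       /* simplified: current context can take an interrupt */
};

enum intr_decision {
	INTR_DEFER,       /* L2 must run first; the interrupt is injected later */
	INTR_EXIT_TO_L1,  /* leave L2 so that L1 can receive the interrupt */
	INTR_INJECT,      /* inject into the current context now */
	INTR_WAIT_WINDOW, /* current context has interrupts blocked */
};

static enum intr_decision interrupt_decision(const struct vcpu_model *v)
{
	assert(v != NULL);

	if (v->guest_mode && v->exit_on_intr) {
		/* nested_run_pending overrides the wish to switch to L1. */
		if (v->nested_run_pending)
			return INTR_DEFER;
		return INTR_EXIT_TO_L1;
	}
	return v->irqs_enabled ? INTR_INJECT : INTR_WAIT_WINDOW;
}
```

In this model the flag matters only in the first branch, which matches the
point that it exists to avoid switching to L1 at the injection decision point.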

> 
> > > +		nested_vmx_vmexit(vcpu);
> > > +		vmcs12 = get_vmcs12(vcpu);
> > > +		vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
> > > +		vmcs12->vm_exit_intr_info = 0;
> > > +		/* fall through to normal code, but now in L1, not L2 */
> > > +	}
> > > +
> >
> > This is a bad place to add this logic. vmx_interrupt_allowed is simply a
> > query function but you make it an absolute trigger point for switching from
> > L2 to L1. This is fine as now only point calling vmx_interrupt_allowed is
> > when there's vNMI pending. But it's dangerous to have such assumption
> > for pending events inside vmx_interrupt_allowed.
> 
> Now you're beating a dead horse ;-)
> 
> Gleb, and to some degree Avi, already argued that this is the wrong place
> to do this exit, and if anything the exit should be done (or just decided on)
> in enable_irq_window().
> 
> My counter-argument was that the right way is *neither* of these approaches -
> any attempt to "commandeer" one of the existing x86 ops, be they
> vmx_interrupt_allowed() or enable_irq_window() to do in the L2 case things
> they were never designed to do is both ugly, and dangerous if the call sites
> change at some time in the future.
> 
> So rather than changing one ugly abuse of one function, to the (arguably
> also ugly) abuse of another function, what I'd like to see is a better overall
> design, where the call sites in x86.c know about the possibility of a nested
> guest (they already do - like we previously discussed, an is_guest_mode()
> function was recently added), and when they need, *they* will call an
> exit-to-L1 function, rather than calling a function called "enable_irq_window"
> or "vmx_interrupt_allowed" which mysteriously will do the exit.
> 

I agree with your point here.
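
Just to make the shape of that design concrete, below is a rough sketch under
my own assumptions. None of these names exist in the series:
struct vcpu_model, must_exit_to_l1(), exit_to_l1() and
handle_pending_interrupt() are all hypothetical, and the two irqs_enabled
fields are simplified stand-ins for the real per-level checks. The point is
only that the query takes a const pointer and cannot change state, while the
L2->L1 switch is done explicitly by the call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state involved; not the real vcpu layout. */
struct vcpu_model {
	bool guest_mode;         /* L2 is the current context */
	bool exit_on_intr;       /* L1 wants exits on external interrupts */
	bool nested_run_pending; /* L2 must be entered before anything else */
	bool l1_irqs_enabled;    /* simplified: L1 can take an interrupt */
	bool l2_irqs_enabled;    /* simplified: L2 can take an interrupt */
	int  exits_to_l1;        /* counts L2->L1 switches, for illustration */
};

/* Pure query: the const pointer means it cannot trigger an exit. */
static bool interrupt_allowed(const struct vcpu_model *v)
{
	assert(v != NULL);

	if (v->guest_mode && v->exit_on_intr && v->nested_run_pending)
		return false;
	return v->guest_mode ? v->l2_irqs_enabled : v->l1_irqs_enabled;
}

/* Pure query: does this interrupt require leaving L2 first? */
static bool must_exit_to_l1(const struct vcpu_model *v)
{
	return v->guest_mode && v->exit_on_intr && !v->nested_run_pending;
}

/* The only function here that changes which level is current. */
static void exit_to_l1(struct vcpu_model *v)
{
	v->guest_mode = false;
	v->exits_to_l1++;
}

/*
 * Hypothetical call site, playing the role of the code in x86.c.
 * Returns true if the caller may inject now, false if it has to wait.
 */
static bool handle_pending_interrupt(struct vcpu_model *v)
{
	if (must_exit_to_l1(v))
		exit_to_l1(v);  /* explicit and visible at the call site */

	return interrupt_allowed(v);
}
```

With this split, calling interrupt_allowed() from a new place can never cause
a surprise exit, which is the danger I was worried about.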

> 
> > On the other hand, I think there's one area which is not handled timely.
> > I think you need to kick a L2->L1 transition when L0 wants to inject
> > virtual interrupt. Consider your current logic:
> >
> > a) L2 is running on cpu1
> > b) L0 on cpu 0 decides to post a virtual interrupt to L1. An IPI is issued to
> > cpu1 after updating virqchip
> > c) L2 on cpu1 vmexits to L0, which checks whether L0 or L1 should handle
> > the event. As it's an external interrupt, L0 will handle it. As it's a notification
> > IPI, nothing is required.
> > d) L0 on cpu1 then decides to resume, and finds KVM_REQ_EVENT
> >
> > At this point you only add logic to enable_irq_window, but there's no
> > action to trigger L2->L1 transition. So what will happen? Will the event
> > be injected into L2 instead or pend until next switch happens due to
> > other cause?
> 
> I'm afraid I'm missing something in your explanation... In step d, L0 finds
> an interrupt in the injection queue, so isn't the first thing it does is to
> call vmx_interrupt_allowed(), to check if injection is allowed now?
> In our code, "vmx_interrupt_allowed()" was bastardized to exit to L1 in
> this case. Isn't that the missing exit you were looking for?
> 

This was a false alarm. In my earlier search I thought that
vmx_interrupt_allowed was only invoked in vmx.c for the pending vNMI check,
which led me to suspect a bigger problem. But .interrupt_allowed is in fact
checked in the common path, as expected. So this was my own mistake. :-)
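
For the record, this is roughly the flow I had missed, written as a small
stand-alone model. It is a sketch under my own assumptions and not the x86.c
code: struct ops_model, struct vcpu_model, model_interrupt_allowed() and
prepare_guest_entry() are hypothetical names, req_event merely stands in for
KVM_REQ_EVENT, and the callback here always answers "allowed" so that the
call path is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vcpu_model;

/* Hypothetical ops table with a single per-vendor callback. */
struct ops_model {
	bool (*interrupt_allowed)(struct vcpu_model *v);
};

struct vcpu_model {
	const struct ops_model *ops;
	bool req_event;     /* stands in for KVM_REQ_EVENT */
	bool irq_pending;   /* an interrupt is queued for injection */
	int  allowed_calls; /* how often the callback ran, for illustration */
};

/* Stand-in for the vendor callback; it only records that it was called. */
static bool model_interrupt_allowed(struct vcpu_model *v)
{
	v->allowed_calls++;
	return true;
}

static const struct ops_model model_ops = {
	.interrupt_allowed = model_interrupt_allowed,
};

/*
 * Hypothetical common path that runs before each guest entry.
 * Returns true if an interrupt was injected on this pass.
 */
static bool prepare_guest_entry(struct vcpu_model *v)
{
	assert(v != NULL && v->ops != NULL);

	if (!v->req_event)
		return false;
	v->req_event = false;

	/* The callback is consulted for any pending interrupt, not only vNMI. */
	if (v->irq_pending && v->ops->interrupt_allowed(v)) {
		v->irq_pending = false;
		return true;
	}
	return false;
}
```

So in step d of my scenario, finding the event request leads straight to the
callback, and that is where your patch performs the exit to L1.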

Thanks
Kevin

