From: "Tian, Kevin" <kevin.tian@intel.com>
To: Nadav Har'El <nyh@math.technion.ac.il>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"gleb@redhat.com" <gleb@redhat.com>,
	"avi@redhat.com" <avi@redhat.com>
Subject: RE: [PATCH 20/31] nVMX: Exiting from L2 to L1
Date: Wed, 25 May 2011 08:55:13 +0800
Message-ID: <625BA99ED14B2D499DC4E29D8138F1505C9BFA35FD@shsmsx502.ccr.corp.intel.com>
In-Reply-To: <20110524134302.GA10363@fermat.math.technion.ac.il>

> From: Nadav Har'El [mailto:nyh@math.technion.ac.il]
> Sent: Tuesday, May 24, 2011 9:43 PM
> 
> On Tue, May 24, 2011, Tian, Kevin wrote about "RE: [PATCH 20/31] nVMX:
> Exiting from L2 to L1":
> > > +vmcs12_guest_cr0(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> > > +{
> > > +	/*
> > > +	 * As explained above, we take a bit from GUEST_CR0 if we allowed the
> > > +	 * guest to modify it untrapped (vcpu->arch.cr0_guest_owned_bits), or
> > > +	 * if we did trap it - if we did so because L1 asked to trap this bit
> > > +	 * (vmcs12->cr0_guest_host_mask). Otherwise (bits we trapped but L1
> > > +	 * didn't expect us to trap) we read from CR0_READ_SHADOW.
> > > +	 */
> > > +	unsigned long guest_cr0_bits =
> > > +		vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask;
> > > +	return (vmcs_readl(GUEST_CR0) & guest_cr0_bits) |
> > > +	       (vmcs_readl(CR0_READ_SHADOW) & ~guest_cr0_bits);
> > > +}
> >
> > Hi, Nadav,
> >
> > Not sure whether I get above operation wrong.
> 
> This is one of the trickiest functions in nested VMX, which is why I added
> 15 lines of comments (!) on just two statements of code.

I read the comment carefully, and the scenario I described is not covered there.

> 
> > But it looks not exactly correct to me
> > in a glimpse. Say a bit set both in L0/L1's cr0_guest_host_mask. In such case that
> > bit from vmcs12_GUEST_CR0 resides in vmcs02_CR0_READ_SHADOW, however above
> > operation will make vmcs02_GUEST_CR0 bit returned instead.
> 
> This behavior is correct: If a bit is set in L1's cr0_guest_host_mask (and
> in particular, if it is set in both L0's and L1's), we always exit to L1 when
> L2 changes this bit, and this bit cannot change while L2 is running, so
> naturally after the run vmcs02.guest_cr0 and vmcs12.guest_cr0 are still
> identical in that bit.

Are you sure this is the case? Loading vmcs12.guest_cr0 when you prepare
vmcs02 is equivalent to L1 trying to update GUEST_CR0, which is why you use
vmx_set_cr0(vcpu, vmcs12->guest_cr0) in prepare_vmcs02. If a bit is set in
L0's cr0_guest_host_mask, the corresponding bit of vmcs12.guest_cr0 will be
cached in vmcs02.cr0_read_shadow anyway. This has nothing to do with whether
L2 changes that bit.

IOW, I disagree that when L0 and L1 set the same bit in cr0_guest_host_mask,
the bit is identical in vmcs02.guest_cr0 and vmcs12.guest_cr0, because in
that case L1 has no permission to actually set the bit.
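
To make the case concrete, here is a minimal standalone model in plain C (not
kernel code). It assumes that for every bit L0 traps, the hardware GUEST_CR0
bit is whatever L0 decided, regardless of what L1 requested. The function
model_hw_guest_cr0 and the parameter l0_forced are hypothetical names used
only for this sketch.

```c
#define X86_CR0_PE 0x00000001UL  /* Protection Enable bit of CR0 */
#define X86_CR0_TS 0x00000008UL  /* Task Switched bit of CR0 */

/*
 * Hypothetical model of the value that ends up in vmcs02.guest_cr0:
 * bits outside L0's mask follow what L1 asked for (vmcs12.guest_cr0),
 * bits inside L0's mask are forced to L0's choice. This is an assumption
 * of the sketch, not a description of vmx_set_cr0().
 */
unsigned long model_hw_guest_cr0(unsigned long vmcs12_guest_cr0,
				 unsigned long l0_mask,
				 unsigned long l0_forced)
{
	return (vmcs12_guest_cr0 & ~l0_mask) | (l0_forced & l0_mask);
}
```

If L1 requests PE=1 and TS=0 while L0 masks TS and forces it to 1, the modelled
hardware value has TS set, so it differs from vmcs12.guest_cr0 in a bit that L1
may also be masking. L2 never touched that bit.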

> Copying that bit from vmcs02_CR0_READ_SHADOW, like you suggested, would be
> completely wrong in this case: When L1 set a bit in cr0_guest_host_mask,
> the vmcs02->cr0_read_shadow is vmcs12->cr0_read_shadow (see nested_read_cr0),
> and is just a pretense that L1 set up for L2 - it is NOT the real bit of
> guest_cr0, so copying it into guest_cr0 would be wrong.

So what I'm suggesting is to keep that bit from vmcs12.guest_cr0 when it is
set in vmcs12.cr0_guest_host_mask, which is the natural result.

> 
> Note that this function is completely different from nested_read_cr0 (the
> new name), which behaves similar to what you suggested but serves a completely
> different (and in some respect, opposite) function.
> 
> I think my comments in the code are clearer than what I just wrote here, so
> please take a look at them again, and let me know if you find any errors.
> 
> > Instead of constructing vmcs12_GUEST_CR0 completely from vmcs02_GUEST_CR0,
> > why not just updating bits which can be altered while keeping the rest bits from
> > vmcs12_GUEST_CR0? Say something like:
> >
> > vmcs12->guest_cr0 &= vmcs12->cr0_guest_host_mask; /* keep unchanged bits */
> > vmcs12->guest_cr0 |= (vmcs_readl(GUEST_CR0) & vcpu->arch.cr0_guest_owned_bits) |
> > 	(vmcs_readl(CR0_READ_SHADOW) &
> > 	 ~(vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask))
> 
> I guess I could do something like this, but do you think it's clearer?
> I don't. Behind all the details, my formula emphasises that MOST cr0 bits
> can be just copied from vmcs02 to vmcs12 as is - and we only have to do
> something strange for special bits - where L0 wanted to trap but L1 didn't.
> In your formula, it looks like there are 3 different cases instead of 2.

But my formula is clearer, since it sticks to the meaning of
cr0_guest_host_mask: you only need to update the cr0 bits that L2 can modify
without a trap, and simply keep the rest.
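
The three-way merge suggested above can be sketched as a pure function on plain
integers. This is an illustrative standalone model, not kernel code:
merge_guest_cr0 and its parameter names are hypothetical, and the comments
describe the intended reading of each term rather than confirmed behaviour.

```c
/*
 * In the patch the inputs would come from, in order: vmcs12->guest_cr0,
 * vmcs_readl(GUEST_CR0), vmcs_readl(CR0_READ_SHADOW),
 * vcpu->arch.cr0_guest_owned_bits and vmcs12->cr0_guest_host_mask.
 */
unsigned long merge_guest_cr0(unsigned long old_vmcs12_guest_cr0,
			      unsigned long hw_guest_cr0,
			      unsigned long hw_read_shadow,
			      unsigned long guest_owned_bits,
			      unsigned long l1_mask)
{
	/* Bits L1 traps: L2 could not change them, so keep L1's value. */
	unsigned long kept = old_vmcs12_guest_cr0 & l1_mask;
	/* Bits L2 may modify without a trap: take the hardware value. */
	unsigned long owned = hw_guest_cr0 & guest_owned_bits;
	/* Bits only L0 traps: take what L2 sees in the read shadow. */
	unsigned long shadowed = hw_read_shadow & ~(guest_owned_bits | l1_mask);

	return kept | owned | shadowed;
}
```

For example, with l1_mask = 0x1, guest_owned_bits = 0x8, an old vmcs12 value of
0x1, a hardware GUEST_CR0 of 0xA and a read shadow of 0x0, the hardware bit 0x2
is ignored because neither L1 nor L2 owns it, and the result is 0x9.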

> 
> In any case, your formula is definitely not more correct, because the formulas
> are in fact equivalent - let me prove:
> 
> If, instead of taking the "unchanged bits" (as you call them) from
> vmcs12->guest_cr0, you take them from vmcs02->guest_cr0 (you can,
> because they couldn't have changed), you end up with *exactly* the same
> formula I used. Here is the proof:
> 
>  yourformula =
> 	(vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask) |
> 	(vmcs_readl(GUEST_CR0) & vcpu->arch.cr0_guest_owned_bits) |
> 	(vmcs_readl(CR0_READ_SHADOW) &
> 	 ~(vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask))
> 
> Now because of the "unchanged bits",
> 	(vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask) ==
> 	(vmcs02->guest_cr0 & vmcs12->cr0_guest_host_mask) ==
> 
>           (and note that vmcs02->guest_cr0 is vmcs_readl(GUEST_CR0))

This is the problem:

	(vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask) !=
	(vmcs02->guest_cr0 & vmcs12->cr0_guest_host_mask)

Only the equation below holds:

	(vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask & ~L0->cr0_guest_host_mask) ==
	(vmcs02->guest_cr0 & vmcs12->cr0_guest_host_mask & ~L0->cr0_guest_host_mask)

When a bit of vmcs12->cr0_guest_host_mask is set, it simply implies that L1
wants to control the bit instead of L2. However, whether L1 can really control
that bit still depends on whether L0 allows it to!
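
The difference between the two equalities can be checked with a small
standalone sketch. The helper names below are hypothetical, and l0_mask stands
for L0's own cr0_guest_host_mask. The complement is written with `~`, since in
C `!x` yields 0 or 1 rather than a bitwise inverse.

```c
/* Claim from the proof: every bit L1 masks is the same in vmcs12 and vmcs02. */
int unrestricted_equal(unsigned long vmcs12_guest_cr0,
		       unsigned long vmcs02_guest_cr0,
		       unsigned long l1_mask)
{
	return (vmcs12_guest_cr0 & l1_mask) == (vmcs02_guest_cr0 & l1_mask);
}

/* Weaker claim: only bits L1 masks and L0 does NOT mask are compared. */
int restricted_equal(unsigned long vmcs12_guest_cr0,
		     unsigned long vmcs02_guest_cr0,
		     unsigned long l1_mask,
		     unsigned long l0_mask)
{
	unsigned long both = l1_mask & ~l0_mask;

	return (vmcs12_guest_cr0 & both) == (vmcs02_guest_cr0 & both);
}
```

Take l1_mask = 0x9 and l0_mask = 0x8, with vmcs12.guest_cr0 = 0x0 and a
hardware value of 0x8 because L0 forced bit 3. The unrestricted comparison
fails, while the restricted one still holds because bit 3 is excluded.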

> 
> so substituting this into yourformula, it becomes
>
>  yourformula =
> 	(vmcs_readl(GUEST_CR0) & vmcs12->cr0_guest_host_mask) |
> 	(vmcs_readl(GUEST_CR0) & vcpu->arch.cr0_guest_owned_bits) |
> 	(vmcs_readl(CR0_READ_SHADOW) &
> 	 ~(vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask))
>
> or, simplifying
>
>  yourformula =
> 	(vmcs_readl(GUEST_CR0) & (vmcs12->cr0_guest_host_mask |
> 	 vcpu->arch.cr0_guest_owned_bits)) |
> 	(vmcs_readl(CR0_READ_SHADOW) &
> 	 ~(vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask))
> 
> now, using the name I used:
> 	unsigned long guest_cr0_bits =
> 		vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask;
> 
> you end up with
> 
>  yourformula =
> 	(vmcs_readl(GUEST_CR0) & guest_cr0_bits) |
>  	(vmcs_readl(CR0_READ_SHADOW) & ~guest_cr0_bits )
> 
> Which is, believe it or not, exactly my formula :-)
> 

So under my interpretation the two formulas are different, because you
misuse vmcs12.cr0_guest_host_mask. :-)
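
A side-by-side sketch of the two formulas shows where they can diverge. Both
functions are standalone models on plain integers with hypothetical names
(formula_patch, formula_keep). The divergence relies on the assumption that
L0 can leave a hardware GUEST_CR0 bit different from what L1 requested.

```c
/* Formula from the patch: bits L1 masks are read back from GUEST_CR0. */
unsigned long formula_patch(unsigned long hw_guest_cr0,
			    unsigned long hw_read_shadow,
			    unsigned long guest_owned_bits,
			    unsigned long l1_mask)
{
	unsigned long guest_cr0_bits = guest_owned_bits | l1_mask;

	return (hw_guest_cr0 & guest_cr0_bits) |
	       (hw_read_shadow & ~guest_cr0_bits);
}

/* Alternative: bits L1 masks are kept from the previous vmcs12 value. */
unsigned long formula_keep(unsigned long old_vmcs12_guest_cr0,
			   unsigned long hw_guest_cr0,
			   unsigned long hw_read_shadow,
			   unsigned long guest_owned_bits,
			   unsigned long l1_mask)
{
	return (old_vmcs12_guest_cr0 & l1_mask) |
	       (hw_guest_cr0 & guest_owned_bits) |
	       (hw_read_shadow & ~(guest_owned_bits | l1_mask));
}
```

With l1_mask = 0x8, no guest-owned bits, an old vmcs12 value of 0x0 and a
hardware GUEST_CR0 of 0x8, formula_patch yields 0x8 while formula_keep yields
0x0. When the hardware value agrees with vmcs12 on the bits L1 masks, the two
give the same result.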

Thanks
Kevin
