All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gleb Natapov <gleb@redhat.com>
To: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Cc: Jan Kiszka <jan.kiszka@web.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
Date: Mon, 8 Jul 2013 15:37:54 +0300	[thread overview]
Message-ID: <20130708123753.GF5113@redhat.com> (raw)
In-Reply-To: <A9667DDFB95DB7438FA9D7D576C3D87E0A8B16EE@SHSMSX104.ccr.corp.intel.com>

On Thu, Jul 04, 2013 at 08:42:53AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2013-07-02:
> > On Tue, Jul 02, 2013 at 05:34:56PM +0200, Jan Kiszka wrote:
> >> On 2013-07-02 17:15, Gleb Natapov wrote:
> >>> On Tue, Jul 02, 2013 at 04:28:56PM +0200, Jan Kiszka wrote:
> >>>> On 2013-07-02 15:59, Gleb Natapov wrote:
> >>>>> On Tue, Jul 02, 2013 at 03:01:24AM +0000, Zhang, Yang Z wrote:
> >>>>>> Since this series is pending in mail list for long time. And
> >>>>>> it's really a big feature for Nested. Also, I doubt the
> >>>>>> original authors(Jun and Nahav)should not have enough time to continue it.
> >>>>>> So I will pick it up. :)
> >>>>>> 
> >>>>>> See comments below:
> >>>>>> 
> >>>>>> Paolo Bonzini wrote on 2013-05-20:
> >>>>>>> Il 19/05/2013 06:52, Jun Nakajima ha scritto:
> >>>>>>>> From: Nadav Har'El <nyh@il.ibm.com>
> >>>>>>>> 
> >>>>>>>> Recent KVM, since
> >>>>>>>> http://kerneltrap.org/mailarchive/linux-kvm/2010/5/2/6261577
> >>>>>>>> switch the EFER MSR when EPT is used and the host and guest have
> >>>>>>>> different NX bits. So if we add support for nested EPT (L1 guest
> >>>>>>>> using EPT to run L2) and want to be able to run recent KVM as L1,
> >>>>>>>> we need to allow L1 to use this EFER switching feature.
> >>>>>>>> 
> >>>>>>>> To do this EFER switching, KVM uses VM_ENTRY/EXIT_LOAD_IA32_EFER
> >>>>>>>> if available, and if it isn't, it uses the generic
> >>>>>>>> VM_ENTRY/EXIT_MSR_LOAD. This patch adds support for the former
> >>>>>>>> (the latter is still unsupported).
> >>>>>>>> 
> >>>>>>>> Nested entry and exit emulation (prepare_vmcs_02 and
> >>>>>>>> load_vmcs12_host_state, respectively) already handled
> >>>>>>>> VM_ENTRY/EXIT_LOAD_IA32_EFER correctly. So all that's left to do
> >>>>>>>> in this patch is to properly advertise this feature to L1.
> >>>>>>>> 
> >>>>>>>> Note that vmcs12's VM_ENTRY/EXIT_LOAD_IA32_EFER are emulated by
> >>>>>>>> L0, by using vmx_set_efer (which itself sets one of several
> >>>>>>>> vmcs02 fields), so we always support this feature, regardless of
> >>>>>>>> whether the host supports it.
> >>>>>>>> 
> >>>>>>>> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> >>>>>>>> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
> >>>>>>>> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
> >>>>>>>> ---
> >>>>>>>>  arch/x86/kvm/vmx.c | 23 ++++++++++++++++-------
> >>>>>>>>  1 file changed, 16 insertions(+), 7 deletions(-)
> >>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> >>>>>>>> 260a919..fb9cae5 100644
> >>>>>>>> --- a/arch/x86/kvm/vmx.c
> >>>>>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>>>>> @@ -2192,7 +2192,8 @@ static __init void
> >>>>>>>> nested_vmx_setup_ctls_msrs(void)  #else
> >>>>>>>>  	nested_vmx_exit_ctls_high = 0;  #endif
> >>>>>>>> -	nested_vmx_exit_ctls_high |= VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>>>>> +	nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR
> >>>>>>>> | +				      VM_EXIT_LOAD_IA32_EFER);
> >>>>>>>> 
> >>>>>>>>  	/* entry controls */
> >>>>>>>>  	rdmsr(MSR_IA32_VMX_ENTRY_CTLS, @@ -2201,8 +2202,8
> > @@ static
> >>>>>>>> __init void nested_vmx_setup_ctls_msrs(void)
> >>>>>>>>  	nested_vmx_entry_ctls_low = VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
> >>>>>>>>  	nested_vmx_entry_ctls_high &= 		VM_ENTRY_LOAD_IA32_PAT |
> >>>>>>>>  VM_ENTRY_IA32E_MODE;
> >>>>>>>> -	nested_vmx_entry_ctls_high |=
> >>>>>>>> VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR; -
> >>>>>>>> +	nested_vmx_entry_ctls_high |=
> >>>>>>>> (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | +				      
> >>>>>>>> VM_ENTRY_LOAD_IA32_EFER);
> >>>>>>>>  	/* cpu-based controls */
> >>>>>>>>  	rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
> >>>>>>>>  		nested_vmx_procbased_ctls_low,
> >>>>>>>> nested_vmx_procbased_ctls_high); @@ -7492,10 +7493,18 @@ static
> >>>>>>>> void prepare_vmcs02(struct kvm_vcpu *vcpu,
> >>>>>>> struct vmcs12 *vmcs12)
> >>>>>>>>  	vcpu->arch.cr0_guest_owned_bits &=
> >>>>>>>>  ~vmcs12->cr0_guest_host_mask; 	vmcs_writel(CR0_GUEST_HOST_MASK,
> >>>>>>> ~vcpu->arch.cr0_guest_owned_bits);
> >>>>>>>> 
> >>>>>>>> -	/* Note: IA32_MODE, LOAD_IA32_EFER are modified by
> > vmx_set_efer
> >>>>>>> below */
> >>>>>>>> -	vmcs_write32(VM_EXIT_CONTROLS, -		vmcs12->vm_exit_controls |
> >>>>>>>> vmcs_config.vmexit_ctrl); -	vmcs_write32(VM_ENTRY_CONTROLS,
> >>>>>>>> vmcs12->vm_entry_controls | +	/* L2->L1 exit controls are
> >>>>>>>> emulated - the hardware exit is +to L0 so +	 * we should use its
> >>>>>>>> exit controls. Note that IA32_MODE, LOAD_IA32_EFER +	 * bits are
> >>>>>>>> further modified by vmx_set_efer() below. +	 */
> >>>>>>>> +	vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
> >>>>>> This is wrong. We cannot use L0 exit control directly.
> >>>>>> LOAD_PERF_GLOBAL_CTRL, LOAD_HOST_EFE, LOAD_HOST_PAT,
> > ACK_INTR_ON_EXIT should use host's exit control. But others, still
> > need use (vmcs12|host).
> >>>>>> 
> >>>>> I do not see why. We always intercept DR7/PAT/EFER, so save is
> >>>>> emulated too. Host address space size always come from L0 and
> >>>>> preemption timer is not supported for nested IIRC and when it
> >>>>> will be host will have to save it on exit anyway for correct emulation.
> >>>> 
> >>>> Preemption timer is already supported and works fine as far as I tested.
> >>>> KVM doesn't use it for L1, so we do not need to save/restore it - IIRC.
> >>>> 
> >>> So what happens if L1 configures it to value X after X/2 ticks L0
> >>> exit happen and L0 gets back to L2 directly. The counter will be X
> >>> again instead of X/2.
> >> 
> >> Likely. Yes, we need to improve our emulation by setting "Save
> >> VMX-preemption timer value" or emulate this in software if the
> >> hardware lacks support for it (was this flag introduced after the
> >> preemption timer itself?).
> >> 
> > Not sure, but my point was that for correct emulation host needs to
> > set "save preempt timer on vmexit" anyway so all VM_EXIT_CONTROLS are
> > indeed emulated as far as I see.
> >
> Ok, here is my summary, please correct me if I am wrong:
> bit 2: Save debug controls, the first processor only support 1-setting on it, so just use host's setting is enough
Not because first processor only supported 1-setting, but because L0
intercepts all changes to DR7 and DEBUGCTL MSR, so L2 cannot change them
behind L0 back. If L1 asks to save them in vmcs12 L0 can do it during
vmexit emulation.

> bit 9: Host address space size, it indicate the host's state, so must use host's setting.
> bit 12: Load IA32_PERF_GLOBAL_CTRL: same as above.
Not sure what "above" do you mean. It is fully emulated during vmexit
emulation. We do not want PERF_GLOBAL_CTRL to be loaded during L2->L0
vmexit, we want it to be loaded in L1 after L0 emulates L2->L1 vmexit.
But I think there is a bug in current emulation. The code looks like
this:

        if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
                vmcs_write64(GUEST_IA32_PERF_GLOBAL_CTRL,
                        vmcs12->host_ia32_perf_global_ctrl);

But GUEST_IA32_PERF_GLOBAL_CTRL will not be loaded during L1 entry unless
VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL is set. Why do we assume it is set
here?

> bit 15 : Acknowledge interrupt on exit: same as above.
Same as bit 9.

> bit 19: Load IA32_PAT: same as above.
> bit 20: Load IA32_EFER: same as above.
Those two are same as bit 12.

> bit 18: Save IA32_PAT, Didn't expose it to L1, so use host' setting is ok.
> bit 19: Save IA32_EFER, same as above.
Those two are the same a bit 2.

> bit 22: Save VMXpreemption timer value, I don't see KVM expose it to L1, but Jan said it's working. Strange! And according gleb's suggestion, it better to always set it.
>
It exposed it in nested_vmx_setup_ctls_msrs:
 
        nested_vmx_pinbased_ctls_high &= PIN_BASED_EXT_INTR_MASK |
                PIN_BASED_NMI_EXITING | PIN_BASED_VIRTUAL_NMIS |
                PIN_BASED_VMX_PREEMPTION_TIMER;

According to Gleb's suggestion current emulation is broken and to fix it
the bit will have to be set on each L2 entry if L1 is using preemption timer.

> So, currently, only use host' exit_control for L2 is enough.
> 
> Best regards,
> Yang
> 

--
			Gleb.

  reply	other threads:[~2013-07-08 12:37 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-19  4:52 [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 Jun Nakajima
2013-05-19  4:52 ` [PATCH v3 02/13] nEPT: Move gpte_access() and prefetch_invalid_gpte() to paging_tmpl.h Jun Nakajima
2013-05-20 12:34   ` Paolo Bonzini
2013-05-19  4:52 ` [PATCH v3 03/13] nEPT: Add EPT tables support " Jun Nakajima
2013-05-21  7:52   ` Xiao Guangrong
2013-05-21  8:30     ` Xiao Guangrong
2013-05-21  9:01       ` Gleb Natapov
2013-05-21 11:05         ` Xiao Guangrong
2013-05-21 22:26           ` Nakajima, Jun
2013-05-22  1:10             ` Xiao Guangrong
2013-05-22  6:16             ` Gleb Natapov
2013-06-11 11:32     ` Gleb Natapov
2013-06-17 12:11       ` Xiao Guangrong
2013-06-18 10:57         ` Gleb Natapov
2013-06-18 12:51           ` Xiao Guangrong
2013-06-18 13:01             ` Gleb Natapov
2013-05-19  4:52 ` [PATCH v3 04/13] nEPT: Define EPT-specific link_shadow_page() Jun Nakajima
2013-05-20 12:43   ` Paolo Bonzini
2013-05-21  8:15   ` Xiao Guangrong
2013-05-21 21:44     ` Nakajima, Jun
2013-05-19  4:52 ` [PATCH v3 05/13] nEPT: MMU context for nested EPT Jun Nakajima
2013-05-21  8:50   ` Xiao Guangrong
2013-05-21 22:30     ` Nakajima, Jun
2013-05-19  4:52 ` [PATCH v3 06/13] nEPT: Fix cr3 handling in nested exit and entry Jun Nakajima
2013-05-20 13:19   ` Paolo Bonzini
2013-06-12 12:42   ` Gleb Natapov
2013-05-19  4:52 ` [PATCH v3 07/13] nEPT: Fix wrong test in kvm_set_cr3 Jun Nakajima
2013-05-20 13:17   ` Paolo Bonzini
2013-05-19  4:52 ` [PATCH v3 08/13] nEPT: Some additional comments Jun Nakajima
2013-05-20 13:21   ` Paolo Bonzini
2013-05-19  4:52 ` [PATCH v3 09/13] nEPT: Advertise EPT to L1 Jun Nakajima
2013-05-20 13:05   ` Paolo Bonzini
2013-05-19  4:52 ` [PATCH v3 10/13] nEPT: Nested INVEPT Jun Nakajima
2013-05-20 12:46   ` Paolo Bonzini
2013-05-21  9:16   ` Xiao Guangrong
2013-05-19  4:52 ` [PATCH v3 11/13] nEPT: Miscelleneous cleanups Jun Nakajima
2013-05-19  4:52 ` [PATCH v3 12/13] nEPT: Move is_rsvd_bits_set() to paging_tmpl.h Jun Nakajima
2013-05-19  4:52 ` [PATCH v3 13/13] nEPT: Inject EPT violation/misconfigration Jun Nakajima
2013-05-20 13:09   ` Paolo Bonzini
2013-05-21 10:56   ` Xiao Guangrong
2013-05-20 12:33 ` [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 Paolo Bonzini
2013-07-02  3:01   ` Zhang, Yang Z
2013-07-02 13:59     ` Gleb Natapov
2013-07-02 14:28       ` Jan Kiszka
2013-07-02 15:15         ` Gleb Natapov
2013-07-02 15:34           ` Jan Kiszka
2013-07-02 15:43             ` Gleb Natapov
2013-07-04  8:42               ` Zhang, Yang Z
2013-07-08 12:37                 ` Gleb Natapov [this message]
2013-07-08 14:28                   ` Zhang, Yang Z
2013-07-08 16:08                     ` Gleb Natapov
  -- strict thread matches above, loose matches on Subject: below --
2013-05-09  0:53 Jun Nakajima
2013-05-13 12:25 ` Gleb Natapov
2013-05-19  4:57   ` Nakajima, Jun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130708123753.GF5113@redhat.com \
    --to=gleb@redhat.com \
    --cc=jan.kiszka@web.de \
    --cc=jun.nakajima@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=yang.z.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.