All of lore.kernel.org
 help / color / mirror / Atom feed
From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>,
	kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v3 03/13] nEPT: Add EPT tables support to paging_tmpl.h
Date: Tue, 18 Jun 2013 20:51:25 +0800	[thread overview]
Message-ID: <51C057CD.6010907@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130618105745.GF5832@redhat.com>

On 06/18/2013 06:57 PM, Gleb Natapov wrote:
> On Mon, Jun 17, 2013 at 08:11:03PM +0800, Xiao Guangrong wrote:
>> On 06/11/2013 07:32 PM, Gleb Natapov wrote:
>>> On Tue, May 21, 2013 at 03:52:12PM +0800, Xiao Guangrong wrote:
>>>> On 05/19/2013 12:52 PM, Jun Nakajima wrote:
>>>>> From: Nadav Har'El <nyh@il.ibm.com>
>>>>>
>>>>> This is the first patch in a series which adds nested EPT support to KVM's
>>>>> nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use
>>>>> EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest
>>>>> to set its own cr3 and take its own page faults without either of L0 or L1
>>>>> getting involved. This often significanlty improves L2's performance over the
>>>>> previous two alternatives (shadow page tables over EPT, and shadow page
>>>>> tables over shadow page tables).
>>>>>
>>>>> This patch adds EPT support to paging_tmpl.h.
>>>>>
>>>>> paging_tmpl.h contains the code for reading and writing page tables. The code
>>>>> for 32-bit and 64-bit tables is very similar, but not identical, so
>>>>> paging_tmpl.h is #include'd twice in mmu.c, once with PTTTYPE=32 and once
>>>>> with PTTYPE=64, and this generates the two sets of similar functions.
>>>>>
>>>>> There are subtle but important differences between the format of EPT tables
>>>>> and that of ordinary x86 64-bit page tables, so for nested EPT we need a
>>>>> third set of functions to read the guest EPT table and to write the shadow
>>>>> EPT table.
>>>>>
>>>>> So this patch adds third PTTYPE, PTTYPE_EPT, which creates functions (prefixed
>>>>> with "EPT") which correctly read and write EPT tables.
>>>>>
>>>>> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
>>>>> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
>>>>> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
>>>>> ---
>>>>>  arch/x86/kvm/mmu.c         |  5 +++++
>>>>>  arch/x86/kvm/paging_tmpl.h | 43 +++++++++++++++++++++++++++++++++++++++++--
>>>>>  2 files changed, 46 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>>> index 117233f..6c1670f 100644
>>>>> --- a/arch/x86/kvm/mmu.c
>>>>> +++ b/arch/x86/kvm/mmu.c
>>>>> @@ -3397,6 +3397,11 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gp
>>>>>  	return mmu->last_pte_bitmap & (1 << index);
>>>>>  }
>>>>>
>>>>> +#define PTTYPE_EPT 18 /* arbitrary */
>>>>> +#define PTTYPE PTTYPE_EPT
>>>>> +#include "paging_tmpl.h"
>>>>> +#undef PTTYPE
>>>>> +
>>>>>  #define PTTYPE 64
>>>>>  #include "paging_tmpl.h"
>>>>>  #undef PTTYPE
>>>>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>>>>> index df34d4a..4c45654 100644
>>>>> --- a/arch/x86/kvm/paging_tmpl.h
>>>>> +++ b/arch/x86/kvm/paging_tmpl.h
>>>>> @@ -50,6 +50,22 @@
>>>>>  	#define PT_LEVEL_BITS PT32_LEVEL_BITS
>>>>>  	#define PT_MAX_FULL_LEVELS 2
>>>>>  	#define CMPXCHG cmpxchg
>>>>> +#elif PTTYPE == PTTYPE_EPT
>>>>> +	#define pt_element_t u64
>>>>> +	#define guest_walker guest_walkerEPT
>>>>> +	#define FNAME(name) EPT_##name
>>>>> +	#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
>>>>> +	#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
>>>>> +	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
>>>>> +	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
>>>>> +	#define PT_LEVEL_BITS PT64_LEVEL_BITS
>>>>> +	#ifdef CONFIG_X86_64
>>>>> +	#define PT_MAX_FULL_LEVELS 4
>>>>> +	#define CMPXCHG cmpxchg
>>>>> +	#else
>>>>> +	#define CMPXCHG cmpxchg64
>>>>
>>>> CMPXHG is only used in FNAME(cmpxchg_gpte), but you commented it later.
>>>> Do we really need it?
>>>>
>>>>> +	#define PT_MAX_FULL_LEVELS 2
>>>>
>>>> And the SDM says:
>>>>
>>>> "It uses a page-walk length of 4, meaning that at most 4 EPT paging-structure
>>>> entriesare accessed to translate a guest-physical address.", Is My SDM obsolete?
>>>> Which kind of process supports page-walk length = 2?
>>>>
>>>> It seems your patch is not able to handle the case that the guest uses walk-lenght = 2
>>>> which is running on the host with walk-lenght = 4.
>>>> (plrease refer to how to handle sp->role.quadrant in FNAME(get_level1_sp_gpa) in
>>>> the current code.)
>>>>
>>> But since EPT always has 4 levels on all existing cpus it is not an issue and the only case
>>> that we should worry about is guest walk-lenght == host walk-lenght == 4, or have I
>>
>> Yes. I totally agree with you, but...
>>
>>> misunderstood what you mean here?
>>
>> What confused me is that this patch defines "#define PT_MAX_FULL_LEVELS 2", so i asked the
>> question: "Which kind of process supports page-walk length = 2".
>> Sorry, there is a typo in my origin comments. "process" should be "processor" or "CPU".
>>
> That is how I understood it, but then the discussion moved to dropping
> of nEPT support on 32-bit host. What's the connection? Even on 32bit

If EPT supports "walk-level = 2" on 32bit host (maybe it is not true), i thought dropping
32bit support to reduce the complex is worthwhile, otherwise:
a) we need to handle different page size between L0 and L2 and
b) we need to carefully review the code due to lacking PDPT supporting on nept on L2.

I remember that the origin version of NEPT did not support PAE-32bit L2 guest. I have
found it out:
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/95395

It seems no changes in this version, I have no idea how it was fixed in this version.

> host the walk is 4 levels. Doesn't shadow page code support 4 levels on
> 32bit host?

Yes, it does. 4 levels is fine on 32bit host.

If EPT only supports 4 levels on both 32bit and 64bit hosts, there is no big difference
to support nept on 32bit L2 and 64bit L2.



  reply	other threads:[~2013-06-18 12:51 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-19  4:52 [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 Jun Nakajima
2013-05-19  4:52 ` [PATCH v3 02/13] nEPT: Move gpte_access() and prefetch_invalid_gpte() to paging_tmpl.h Jun Nakajima
2013-05-20 12:34   ` Paolo Bonzini
2013-05-19  4:52 ` [PATCH v3 03/13] nEPT: Add EPT tables support " Jun Nakajima
2013-05-21  7:52   ` Xiao Guangrong
2013-05-21  8:30     ` Xiao Guangrong
2013-05-21  9:01       ` Gleb Natapov
2013-05-21 11:05         ` Xiao Guangrong
2013-05-21 22:26           ` Nakajima, Jun
2013-05-22  1:10             ` Xiao Guangrong
2013-05-22  6:16             ` Gleb Natapov
2013-06-11 11:32     ` Gleb Natapov
2013-06-17 12:11       ` Xiao Guangrong
2013-06-18 10:57         ` Gleb Natapov
2013-06-18 12:51           ` Xiao Guangrong [this message]
2013-06-18 13:01             ` Gleb Natapov
2013-05-19  4:52 ` [PATCH v3 04/13] nEPT: Define EPT-specific link_shadow_page() Jun Nakajima
2013-05-20 12:43   ` Paolo Bonzini
2013-05-21  8:15   ` Xiao Guangrong
2013-05-21 21:44     ` Nakajima, Jun
2013-05-19  4:52 ` [PATCH v3 05/13] nEPT: MMU context for nested EPT Jun Nakajima
2013-05-21  8:50   ` Xiao Guangrong
2013-05-21 22:30     ` Nakajima, Jun
2013-05-19  4:52 ` [PATCH v3 06/13] nEPT: Fix cr3 handling in nested exit and entry Jun Nakajima
2013-05-20 13:19   ` Paolo Bonzini
2013-06-12 12:42   ` Gleb Natapov
2013-05-19  4:52 ` [PATCH v3 07/13] nEPT: Fix wrong test in kvm_set_cr3 Jun Nakajima
2013-05-20 13:17   ` Paolo Bonzini
2013-05-19  4:52 ` [PATCH v3 08/13] nEPT: Some additional comments Jun Nakajima
2013-05-20 13:21   ` Paolo Bonzini
2013-05-19  4:52 ` [PATCH v3 09/13] nEPT: Advertise EPT to L1 Jun Nakajima
2013-05-20 13:05   ` Paolo Bonzini
2013-05-19  4:52 ` [PATCH v3 10/13] nEPT: Nested INVEPT Jun Nakajima
2013-05-20 12:46   ` Paolo Bonzini
2013-05-21  9:16   ` Xiao Guangrong
2013-05-19  4:52 ` [PATCH v3 11/13] nEPT: Miscelleneous cleanups Jun Nakajima
2013-05-19  4:52 ` [PATCH v3 12/13] nEPT: Move is_rsvd_bits_set() to paging_tmpl.h Jun Nakajima
2013-05-19  4:52 ` [PATCH v3 13/13] nEPT: Inject EPT violation/misconfigration Jun Nakajima
2013-05-20 13:09   ` Paolo Bonzini
2013-05-21 10:56   ` Xiao Guangrong
2013-05-20 12:33 ` [PATCH v3 01/13] nEPT: Support LOAD_IA32_EFER entry/exit controls for L1 Paolo Bonzini
2013-07-02  3:01   ` Zhang, Yang Z
2013-07-02 13:59     ` Gleb Natapov
2013-07-02 14:28       ` Jan Kiszka
2013-07-02 15:15         ` Gleb Natapov
2013-07-02 15:34           ` Jan Kiszka
2013-07-02 15:43             ` Gleb Natapov
2013-07-04  8:42               ` Zhang, Yang Z
2013-07-08 12:37                 ` Gleb Natapov
2013-07-08 14:28                   ` Zhang, Yang Z
2013-07-08 16:08                     ` Gleb Natapov
  -- strict thread matches above, loose matches on Subject: below --
2013-05-09  0:53 Jun Nakajima
2013-05-09  0:53 ` [PATCH v3 02/13] nEPT: Move gpte_access() and prefetch_invalid_gpte() to paging_tmpl.h Jun Nakajima
2013-05-09  0:53   ` [PATCH v3 03/13] nEPT: Add EPT tables support " Jun Nakajima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51C057CD.6010907@linux.vnet.ibm.com \
    --to=xiaoguangrong@linux.vnet.ibm.com \
    --cc=gleb@redhat.com \
    --cc=jun.nakajima@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.