From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Jun Nakajima <jun.nakajima@intel.com>
Cc: kvm@vger.kernel.org, Gleb Natapov <gleb@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v3 03/13] nEPT: Add EPT tables support to paging_tmpl.h
Date: Tue, 21 May 2013 15:52:12 +0800
Message-ID: <519B27AC.6090903@linux.vnet.ibm.com>
In-Reply-To: <1368939152-11406-3-git-send-email-jun.nakajima@intel.com>

On 05/19/2013 12:52 PM, Jun Nakajima wrote:
> From: Nadav Har'El <nyh@il.ibm.com>
> 
> This is the first patch in a series which adds nested EPT support to KVM's
> nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use
> EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest
> to set its own cr3 and take its own page faults without either of L0 or L1
> getting involved. This often significantly improves L2's performance over the
> previous two alternatives (shadow page tables over EPT, and shadow page
> tables over shadow page tables).
> 
> This patch adds EPT support to paging_tmpl.h.
> 
> paging_tmpl.h contains the code for reading and writing page tables. The code
> for 32-bit and 64-bit tables is very similar, but not identical, so
> paging_tmpl.h is #include'd twice in mmu.c, once with PTTYPE=32 and once
> with PTTYPE=64, and this generates the two sets of similar functions.
> 
> There are subtle but important differences between the format of EPT tables
> and that of ordinary x86 64-bit page tables, so for nested EPT we need a
> third set of functions to read the guest EPT table and to write the shadow
> EPT table.
> 
> So this patch adds a third PTTYPE, PTTYPE_EPT, which creates functions (prefixed
> with "EPT") which correctly read and write EPT tables.
> 
> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
> Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
> ---
>  arch/x86/kvm/mmu.c         |  5 +++++
>  arch/x86/kvm/paging_tmpl.h | 43 +++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 117233f..6c1670f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3397,6 +3397,11 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gp
>  	return mmu->last_pte_bitmap & (1 << index);
>  }
> 
> +#define PTTYPE_EPT 18 /* arbitrary */
> +#define PTTYPE PTTYPE_EPT
> +#include "paging_tmpl.h"
> +#undef PTTYPE
> +
>  #define PTTYPE 64
>  #include "paging_tmpl.h"
>  #undef PTTYPE
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index df34d4a..4c45654 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -50,6 +50,22 @@
>  	#define PT_LEVEL_BITS PT32_LEVEL_BITS
>  	#define PT_MAX_FULL_LEVELS 2
>  	#define CMPXCHG cmpxchg
> +#elif PTTYPE == PTTYPE_EPT
> +	#define pt_element_t u64
> +	#define guest_walker guest_walkerEPT
> +	#define FNAME(name) EPT_##name
> +	#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
> +	#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
> +	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
> +	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
> +	#define PT_LEVEL_BITS PT64_LEVEL_BITS
> +	#ifdef CONFIG_X86_64
> +	#define PT_MAX_FULL_LEVELS 4
> +	#define CMPXCHG cmpxchg
> +	#else
> +	#define CMPXCHG cmpxchg64

CMPXCHG is only used in FNAME(cmpxchg_gpte), which you compile out for EPT later
in this patch. Do we really need to define it here?

> +	#define PT_MAX_FULL_LEVELS 2

And the SDM says:

"It uses a page-walk length of 4, meaning that at most 4 EPT paging-structure
entriesare accessed to translate a guest-physical address.", Is My SDM obsolete?
Which kind of process supports page-walk length = 2?

It seems your patch is not able to handle the case where the guest uses walk-length = 2
while running on a host with walk-length = 4.
(Please refer to how sp->role.quadrant is handled in FNAME(get_level1_sp_gpa) in
the current code.)
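
For reference, the SDM encodes the EPT page-walk length in EPTP bits 5:3 as
(length - 1), so a conforming L1 has to program the value 3 there. A minimal
illustrative helper (hypothetical, not part of this patch) to read it back:

static inline int ept_walk_length(u64 eptp)
{
	/* EPTP bits 5:3 hold (page-walk length - 1); only 4 is architecturally defined. */
	return ((eptp >> 3) & 0x7) + 1;
}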

> +	#endif
>  #else
>  	#error Invalid PTTYPE value
>  #endif
> @@ -80,6 +96,10 @@ static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl)
>  	return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT;
>  }
> 
> +#if PTTYPE != PTTYPE_EPT
> +/*
> + *  Comment out this for EPT because update_accessed_dirty_bits() is not used.
> + */
>  static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  			       pt_element_t __user *ptep_user, unsigned index,
>  			       pt_element_t orig_pte, pt_element_t new_pte)
> @@ -102,6 +122,7 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> 
>  	return (ret != orig_pte);
>  }
> +#endif
> 
>  static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
>  				  struct kvm_mmu_page *sp, u64 *spte,
> @@ -126,13 +147,21 @@ no_present:
>  static inline unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
>  {
>  	unsigned access;
> -
> +#if PTTYPE == PTTYPE_EPT
> +	access = (gpte & (VMX_EPT_READABLE_MASK | VMX_EPT_WRITABLE_MASK |
> +			  VMX_EPT_EXECUTABLE_MASK));

This seems wrong. The ACC_XXX definitions are:

#define ACC_EXEC_MASK    1
#define ACC_WRITE_MASK   PT_WRITABLE_MASK
#define ACC_USER_MASK    PT_USER_MASK
#define ACC_ALL          (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)

These bits are different from the bits used in EPT page tables; for example,
your code always sees execution as not allowed.
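
Something along these lines would translate the EPT permission bits into the
ACC_* encoding that the rest of the MMU expects (just a sketch, untested; how
the readable bit should map, here to ACC_USER_MASK, needs more thought):

	access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
		 ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
		 ((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0);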

> +#else
>  	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
>  	access &= ~(gpte >> PT64_NX_SHIFT);
> +#endif
> 
>  	return access;
>  }
> 
> +#if PTTYPE != PTTYPE_EPT
> +/*
> + * EPT A/D bit support is not implemented.
> + */
>  static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>  					     struct kvm_mmu *mmu,
>  					     struct guest_walker *walker,
> @@ -169,6 +198,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>  	}
>  	return 0;
>  }
> +#endif
> 
>  /*
>   * Fetch a guest pte for a guest virtual address
> @@ -177,7 +207,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  				    struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  				    gva_t addr, u32 access)
>  {
> -	int ret;
>  	pt_element_t pte;
>  	pt_element_t __user *uninitialized_var(ptep_user);
>  	gfn_t table_gfn;
> @@ -192,7 +221,9 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	gfn_t gfn;
> 
>  	trace_kvm_mmu_pagetable_walk(addr, access);
> +#if PTTYPE != PTTYPE_EPT
>  retry_walk:
> +#endif
>  	walker->level = mmu->root_level;
>  	pte           = mmu->get_cr3(vcpu);
> 
> @@ -277,6 +308,7 @@ retry_walk:
> 
>  	walker->gfn = real_gpa >> PAGE_SHIFT;
> 
> +#if PTTYPE != PTTYPE_EPT
>  	if (!write_fault)
>  		protect_clean_gpte(&pte_access, pte);
>  	else
> @@ -287,12 +319,15 @@ retry_walk:
>  		accessed_dirty &= pte >> (PT_DIRTY_SHIFT - PT_ACCESSED_SHIFT);
> 
>  	if (unlikely(!accessed_dirty)) {
> +		int ret;
> +
>  		ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker, write_fault);
>  		if (unlikely(ret < 0))
>  			goto error;
>  		else if (ret)
>  			goto retry_walk;
>  	}
> +#endif

There is a lot of code in paging_tmpl.h that depends on PT_ACCESSED_MASK/PT_DIRTY_MASK,
and I do not see those other parts being adjusted in your patch.

How about redefining PT_ACCESSED_MASK / PT_DIRTY_MASK, something like:

#if PTTYPE == 32
	#define PT_ACCESS PT_ACCESSED_MASK
	......
#elif PTTYPE == 64
	#define PT_ACCESS PT_ACCESSED_MASK
	......
#elif PTTYPE == PTTYPE_EPT
	#define PT_ACCESS 0
#else
	......
#endif

I guess the compiler can drop the unnecessary branches when PT_ACCESS == 0 (see the
sketch below). Also, it would help us remove the untidy "#if PTTYPE != PTTYPE_EPT" blocks.
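
A trivial illustration, assuming PT_ACCESS is defined per PTTYPE as above:

	if (pte & PT_ACCESS) {
		/* accessed/dirty handling */
	}

For the EPT instantiation this becomes "if (pte & 0)", a compile-time constant,
so gcc throws the whole block away and no runtime check is left.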

> 
>  	walker->pt_access = pt_access;
>  	walker->pte_access = pte_access;
> @@ -323,6 +358,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
>  					access);
>  }
> 
> +#if PTTYPE != PTTYPE_EPT
>  static int FNAME(walk_addr_nested)(struct guest_walker *walker,
>  				   struct kvm_vcpu *vcpu, gva_t addr,
>  				   u32 access)
> @@ -330,6 +366,7 @@ static int FNAME(walk_addr_nested)(struct guest_walker *walker,
>  	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.nested_mmu,
>  					addr, access);
>  }
> +#endif
> 
>  static bool
>  FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> @@ -754,6 +791,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
>  	return gpa;
>  }
> 
> +#if PTTYPE != PTTYPE_EPT
>  static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
>  				      u32 access,
>  				      struct x86_exception *exception)
> @@ -772,6 +810,7 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
> 
>  	return gpa;
>  }
> +#endif

Strange!

Why does nested EPT not need these functions? How do we emulate an instruction that
faults in L2?



