Re: [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words

* Re: [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
  2012-12-19 19:44 ` [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words Xiantao Zhang
@ 2012-12-19  8:19   ` Jan Beulich
  2012-12-19  8:40     ` Zhang, Xiantao
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-12-19  8:19 UTC
  To: Xiantao Zhang; +Cc: keir, tim, eddie.dong, jun.nakajima, xen-devel

>>> On 19.12.12 at 20:44, Xiantao Zhang <xiantao.zhang@intel.com> wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
> 
> VMX doesn't have the concept about host cr3 for nested p2m,
> and only SVM has, so change it to netural words.
> 
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>

You already had an ack from Tim on this one, so unless you had to
drop it because of non-trivial changes (and I see no indication of
any, either here or in 00/10), you should add it below your S-o-b
line, so that committers don't have to hunt for earlier acks in the
list archives. The same applies to at least 02/10 and 06/10.

Jan

* Re: [PATCH v2 08/10] nEPT: handle invept instruction from L1 VMM
  2012-12-19 19:44 ` [PATCH v2 08/10] nEPT: handle invept instruction from L1 VMM Xiantao Zhang
@ 2012-12-19  8:28   ` Jan Beulich
  2012-12-20  2:39     ` Zhang, Xiantao
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2012-12-19  8:28 UTC
  To: Xiantao Zhang; +Cc: keir, tim, eddie.dong, jun.nakajima, xen-devel

>>> On 19.12.12 at 20:44, Xiantao Zhang <xiantao.zhang@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2572,11 +2572,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>          if ( nvmx_handle_vmresume(regs) == X86EMUL_OKAY )
>              update_guest_eip();
>          break;
> -
> +    case EXIT_REASON_INVEPT:
> +        if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
> +            update_guest_eip();
> +        break;

In switch statements that are (or may become) long, please don't
drop the blank lines between individual cases. Instead of dropping
the line here, you would want to insert another one below the new
separately handled case.
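To illustrate the requested layout, here is a minimal stand-alone sketch. The enum values and handler functions are hypothetical stand-ins, not Xen's definitions. Each separately handled case ends with `break;` followed by a blank line, while cases that share one body stay grouped without blank lines between them.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real exit reasons and handlers. */
enum exit_reason {
    EXIT_VMRESUME,
    EXIT_INVEPT,
    EXIT_MWAIT,
    EXIT_GETSEC,
};

static bool handle_vmresume(void) { return true; }
static bool handle_invept(void)   { return true; }

/* Returns 1 if the guest instruction pointer should be advanced,
 * 0 if nothing was emulated, -1 for exits that should never occur. */
static int dispatch_exit(enum exit_reason reason)
{
    switch ( reason )
    {
    case EXIT_VMRESUME:
        if ( handle_vmresume() )
            return 1;
        break;

    case EXIT_INVEPT:             /* new case keeps its own blank line */
        if ( handle_invept() )
            return 1;
        break;

    case EXIT_MWAIT:              /* grouped cases share one body */
    case EXIT_GETSEC:
        return -1;
    }

    return 0;
}
```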

>      case EXIT_REASON_MWAIT_INSTRUCTION:
>      case EXIT_REASON_MONITOR_INSTRUCTION:
>      case EXIT_REASON_GETSEC:
> -    case EXIT_REASON_INVEPT:
>      case EXIT_REASON_INVVPID:
>          /*
>           * We should never exit on GETSEC because CR4.SMXE is always 0 when
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1356,6 +1356,45 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
>      return X86EMUL_OKAY;
>  }
>  
> +int nvmx_handle_invept(struct cpu_user_regs *regs)
> +{
> +    struct vmx_inst_decoded decode;
> +    unsigned long eptp;
> +    u64 inv_type;
> +
> +    if ( !cpu_has_vmx_ept )
> +        return X86EMUL_EXCEPTION;
> +
> +    if ( decode_vmx_inst(regs, &decode, &eptp, 0)
> +             != X86EMUL_OKAY )
> +        return X86EMUL_EXCEPTION;
> +
> +    inv_type = reg_read(regs, decode.reg2);
> +    gdprintk(XENLOG_DEBUG,"inv_type:%ld, eptp:%lx\n", inv_type, eptp);

An unconditional printk() on an operation potentially happening
quite frequently? Even with XENLOG_DEBUG this is not acceptable
imo.
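If some tracing is still wanted during development, one option is to compile it out by default rather than print on every INVEPT. A minimal stand-alone sketch follows; `NEPT_DEBUG`, `nept_dprintk()` and `trace_invept()` are hypothetical names, and `printf()` stands in for Xen's `gdprintk()`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical build-time switch: build with -DNEPT_DEBUG=1 to trace. */
#ifndef NEPT_DEBUG
#define NEPT_DEBUG 0
#endif

/*
 * The condition is a compile-time constant, so the compiler can drop the
 * call entirely when NEPT_DEBUG is 0, yet the call is still parsed and
 * compiled in every build, so it cannot bit-rot unnoticed.
 */
#define nept_dprintk(...)                \
    do {                                 \
        if ( NEPT_DEBUG )                \
            printf(__VA_ARGS__);         \
    } while ( 0 )

/* Trace one INVEPT request; returns whether anything was printed. */
static bool trace_invept(unsigned long long inv_type, unsigned long eptp)
{
    nept_dprintk("inv_type:%llu, eptp:%lx\n", inv_type, eptp);
    return NEPT_DEBUG != 0;
}
```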

> +
> +    switch ( inv_type ) {
> +    case INVEPT_SINGLE_CONTEXT:
> +        {
> +            struct p2m_domain *p2m = vcpu_nestedhvm(current).nv_p2m;
> +            if ( p2m )
> +            {
> +	            p2m_flush(current, p2m);

Despite your comment in 00/10, there are still whitespace issues,
at least here (I didn't look that closely elsewhere).
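A whitespace issue like the one on the `p2m_flush()` line (a hard tab in the indentation) is easy to catch mechanically. A minimal sketch, assuming the rule is simply "indent with spaces only"; `leading_tab_column()` is a hypothetical helper, not part of any Xen tooling.

```c
#include <stddef.h>

/* Return the 0-based column of the first tab within the leading
 * whitespace of `line`, or -1 if the indentation contains no tab. */
static int leading_tab_column(const char *line)
{
    for ( size_t i = 0; line[i] == ' ' || line[i] == '\t'; i++ )
        if ( line[i] == '\t' )
            return (int)i;
    return -1;
}
```

A tab that appears after the first non-blank character (for example inside a string literal) is deliberately ignored, since only indentation is being checked.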

> +                ept_sync_domain(p2m);
> +            }
> +        }
> +        break;
> +    case INVEPT_ALL_CONTEXT:
> +        p2m_flush_nestedp2m(current->domain);
> +        __invept(INVEPT_ALL_CONTEXT, 0, 0);
> +        break;
> +    default:
> +        return X86EMUL_EXCEPTION;
> +    }
> +    vmreturn(regs, VMSUCCEED);
> +    return X86EMUL_OKAY;
> +}
> +
> +
>  #define __emul_value(enable1, default1) \
>      ((enable1 | default1) << 32 | (default1))
>  
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1465,7 +1465,7 @@ p2m_flush_table(struct p2m_domain *p2m)
>  void
>  p2m_flush(struct vcpu *v, struct p2m_domain *p2m)
>  {
> -    ASSERT(v->domain == p2m->domain);
> +    ASSERT(p2m && v->domain == p2m->domain);

How is this change related to the rest of the patch?

Jan

>      vcpu_nestedhvm(v).nv_p2m = NULL;
>      p2m_flush_table(p2m);
>      hvm_asid_flush_vcpu(v);

* Re: [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
  2012-12-19  8:19   ` Jan Beulich
@ 2012-12-19  8:40     ` Zhang, Xiantao
  0 siblings, 0 replies; 18+ messages in thread
From: Zhang, Xiantao @ 2012-12-19  8:40 UTC
  To: Jan Beulich
  Cc: keir, Dong, Eddie, tim, xen-devel, Nakajima, Jun, Zhang, Xiantao

Thanks, I will send them again.
Xiantao

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, December 19, 2012 4:20 PM
> To: Zhang, Xiantao
> Cc: Dong, Eddie; Nakajima, Jun; xen-devel@lists.xen.org; keir@xen.org;
> tim@xen.org
> Subject: Re: [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to
> meaningful words
> 
> >>> On 19.12.12 at 20:44, Xiantao Zhang <xiantao.zhang@intel.com> wrote:
> > From: Zhang Xiantao <xiantao.zhang@intel.com>
> >
> > VMX doesn't have the concept about host cr3 for nested p2m, and only
> > SVM has, so change it to netural words.
> >
> > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
> 
> You had an ack on this from Tim already, so unless you needed to drop it
> because of non-trivial changes (which I see no indication of either here or in
> 00/10), you should be adding such below your S-o-b line to avoid committers
> from having to go hunt for prior acks in the list archives. Same for at least
> 02/10 and 06/10.
> 
> Jan

* Re: [PATCH v2 03/10] nested_ept: Implement guest ept's walker
  2012-12-19 19:44 ` [PATCH v2 03/10] nested_ept: Implement guest ept's walker Xiantao Zhang
@ 2012-12-19 16:42   ` Nakajima, Jun
  0 siblings, 0 replies; 18+ messages in thread
From: Nakajima, Jun @ 2012-12-19 16:42 UTC
  To: Xiantao Zhang; +Cc: tim, keir, eddie.dong, JBeulich, xen-devel

On Wed, Dec 19, 2012 at 11:44 AM, Xiantao Zhang <xiantao.zhang@intel.com> wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
>
> Implment guest EPT PT walker, some logic is based on shadow's
> ia32e PT walker. During the PT walking, if the target pages are
> not in memory, use RETRY mechanism and get a chance to let the
> target page back.

It's just a matter of programming style, but I would add a 'break' to
the default case in the switch statements. Also, some of the switch
statements should set an error code there; we should not depend on
printed messages for error handling or debugging.

>
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
> ---
>  xen/arch/x86/hvm/hvm.c              |    1 +
>  xen/arch/x86/hvm/vmx/vvmx.c         |   42 +++++-
>  xen/arch/x86/mm/guest_walk.c        |   16 ++-
>  xen/arch/x86/mm/hap/Makefile        |    1 +
>  xen/arch/x86/mm/hap/nested_ept.c    |  276 +++++++++++++++++++++++++++++++++++
>  xen/arch/x86/mm/hap/nested_hap.c    |    2 +-
>  xen/arch/x86/mm/shadow/multi.c      |    2 +-
>  xen/include/asm-x86/guest_pt.h      |    8 +
>  xen/include/asm-x86/hvm/nestedhvm.h |    1 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h  |    1 +
>  xen/include/asm-x86/hvm/vmx/vmx.h   |   28 ++++
>  xen/include/asm-x86/hvm/vmx/vvmx.h  |   14 ++
>  12 files changed, 382 insertions(+), 10 deletions(-)
>  create mode 100644 xen/arch/x86/mm/hap/nested_ept.c
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 1cae8a8..3cd0075 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -1324,6 +1324,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>                                               access_r, access_w, access_x);
>          switch (rv) {
>          case NESTEDHVM_PAGEFAULT_DONE:
> +        case NESTEDHVM_PAGEFAULT_RETRY:
>              return 1;
>          case NESTEDHVM_PAGEFAULT_L1_ERROR:
>              /* An error occured while translating gpa from
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 4495dd6..76cf757 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -906,9 +906,18 @@ static void sync_vvmcs_ro(struct vcpu *v)
>  {
>      int i;
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> +    void *vvmcs = nvcpu->nv_vvmcx;
>
>      for ( i = 0; i < ARRAY_SIZE(vmcs_ro_field); i++ )
>          shadow_to_vvmcs(nvcpu->nv_vvmcx, vmcs_ro_field[i]);
> +
> +    /* Adjust exit_reason/exit_qualifciation for violation case */
> +    if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) ==
> +                EXIT_REASON_EPT_VIOLATION ) {
> +        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual);
> +        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason);
> +    }
>  }
>
>  static void load_vvmcs_host_state(struct vcpu *v)
> @@ -1454,8 +1463,37 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
>                        unsigned int *page_order,
>                        bool_t access_r, bool_t access_w, bool_t access_x)
>  {
> -    /*TODO:*/
> -    return 0;
> +    uint64_t exit_qual = __vmread(EXIT_QUALIFICATION);
> +    uint32_t exit_reason = EXIT_REASON_EPT_VIOLATION;
> +    int rc;
> +    unsigned long gfn;
> +    uint32_t rwx_rights = (access_x << 2) | (access_w << 1) | access_r;
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> +
> +    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn,
> +                                &exit_qual, &exit_reason);
> +    switch ( rc ) {
> +        case EPT_TRANSLATE_SUCCEED:
> +            *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
> +            rc = NESTEDHVM_PAGEFAULT_DONE;
> +            break;
> +        case EPT_TRANSLATE_VIOLATION:
> +        case EPT_TRANSLATE_MISCONFIG:
> +            rc = NESTEDHVM_PAGEFAULT_INJECT;
> +            nvmx->ept_exit.exit_reason = exit_reason;
> +            nvmx->ept_exit.exit_qual = exit_qual;
> +            break;
> +        case EPT_TRANSLATE_RETRY:
> +            rc = NESTEDHVM_PAGEFAULT_RETRY;
> +            break;
> +        case EPT_TRANSLATE_ERR_PAGE:
> +            rc = NESTEDHVM_PAGEFAULT_L1_ERROR;
> +            break;
> +        default:
> +            gdprintk(XENLOG_ERR, "GUEST EPT translation error!\n");

Add 'break;' here, and set rc to an error value.
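A stand-alone sketch of this mapping with an explicit default. The EPT_TRANSLATE_* values and NESTEDHVM_PAGEFAULT_L1_ERROR/L0_ERROR/RETRY are copied from the patch; NESTEDHVM_PAGEFAULT_DONE and _INJECT are not visible in this hunk and are assumed to be 0 and 1. Returning L0_ERROR from the default is also an assumption, not something the review specifies. Without an assignment in the default, an unexpected translate code would leave the switch unchanged and could be misread by the caller as one of the NESTEDHVM_PAGEFAULT_* codes (for example, a stray 5 would look like RETRY).

```c
/* Translation results, as defined in the patch's vvmx.h. */
#define EPT_TRANSLATE_SUCCEED   0
#define EPT_TRANSLATE_VIOLATION 1
#define EPT_TRANSLATE_ERR_PAGE  2
#define EPT_TRANSLATE_MISCONFIG 3
#define EPT_TRANSLATE_RETRY     4

/* Page-fault results: DONE and INJECT values are assumed here,
 * the others are taken from the patch's nestedhvm.h. */
#define NESTEDHVM_PAGEFAULT_DONE       0
#define NESTEDHVM_PAGEFAULT_INJECT     1
#define NESTEDHVM_PAGEFAULT_L1_ERROR   2
#define NESTEDHVM_PAGEFAULT_L0_ERROR   3
#define NESTEDHVM_PAGEFAULT_RETRY      5

/* Map a nested-EPT translation result onto a page-fault result. */
static int nept_rc_to_pagefault(int rc)
{
    switch ( rc )
    {
    case EPT_TRANSLATE_SUCCEED:
        rc = NESTEDHVM_PAGEFAULT_DONE;
        break;
    case EPT_TRANSLATE_VIOLATION:
    case EPT_TRANSLATE_MISCONFIG:
        rc = NESTEDHVM_PAGEFAULT_INJECT;
        break;
    case EPT_TRANSLATE_RETRY:
        rc = NESTEDHVM_PAGEFAULT_RETRY;
        break;
    case EPT_TRANSLATE_ERR_PAGE:
        rc = NESTEDHVM_PAGEFAULT_L1_ERROR;
        break;
    default:
        /* Assumption: an unknown result is an internal (L0) error, so
         * report it as such instead of passing the raw code through. */
        rc = NESTEDHVM_PAGEFAULT_L0_ERROR;
        break;
    }
    return rc;
}
```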

> +    }
> +
> +    return rc;
>  }
>
>  void nvmx_idtv_handling(void)
> diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
> index 0f08fb0..1c165c6 100644
> --- a/xen/arch/x86/mm/guest_walk.c
> +++ b/xen/arch/x86/mm/guest_walk.c
> @@ -88,18 +88,19 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
>
>  /* If the map is non-NULL, we leave this function having
>   * acquired an extra ref on mfn_to_page(*mfn) */
> -static inline void *map_domain_gfn(struct p2m_domain *p2m,
> -                                   gfn_t gfn,
> +void *map_domain_gfn(struct p2m_domain *p2m,
> +                                   gfn_t gfn,
>                                     mfn_t *mfn,
>                                     p2m_type_t *p2mt,
> -                                   uint32_t *rc)
> +                                   p2m_query_t q,
> +                                   uint32_t *rc)
>  {
>      struct page_info *page;
>      void *map;
>
>      /* Translate the gfn, unsharing if shared */
>      page = get_page_from_gfn_p2m(p2m->domain, p2m, gfn_x(gfn), p2mt, NULL,
> -                                  P2M_ALLOC | P2M_UNSHARE);
> +                                  q);
>      if ( p2m_is_paging(*p2mt) )
>      {
>          ASSERT(!p2m_is_nestedp2m(p2m));
> @@ -128,7 +129,6 @@ static inline void *map_domain_gfn(struct p2m_domain *p2m,
>      return map;
>  }
>
> -
>  /* Walk the guest pagetables, after the manner of a hardware walker. */
>  /* Because the walk is essentially random, it can cause a deadlock
>   * warning in the p2m locking code. Highly unlikely this is an actual
> @@ -149,6 +149,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>      uint32_t gflags, mflags, iflags, rc = 0;
>      int smep;
>      bool_t pse1G = 0, pse2M = 0;
> +    p2m_query_t qt = P2M_ALLOC | P2M_UNSHARE;
>
>      perfc_incr(guest_walk);
>      memset(gw, 0, sizeof(*gw));
> @@ -188,7 +189,8 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>      l3p = map_domain_gfn(p2m,
>                           guest_l4e_get_gfn(gw->l4e),
>                           &gw->l3mfn,
> -                         &p2mt,
> +                         &p2mt,
> +                         qt,
>                           &rc);
>      if(l3p == NULL)
>          goto out;
> @@ -249,6 +251,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>                           guest_l3e_get_gfn(gw->l3e),
>                           &gw->l2mfn,
>                           &p2mt,
> +                         qt,
>                           &rc);
>      if(l2p == NULL)
>          goto out;
> @@ -322,6 +325,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
>                               guest_l2e_get_gfn(gw->l2e),
>                               &gw->l1mfn,
>                               &p2mt,
> +                             qt,
>                               &rc);
>          if(l1p == NULL)
>              goto out;
> diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
> index 80a6bec..68f2bb5 100644
> --- a/xen/arch/x86/mm/hap/Makefile
> +++ b/xen/arch/x86/mm/hap/Makefile
> @@ -3,6 +3,7 @@ obj-y += guest_walk_2level.o
>  obj-y += guest_walk_3level.o
>  obj-$(x86_64) += guest_walk_4level.o
>  obj-y += nested_hap.o
> +obj-y += nested_ept.o
>
>  guest_walk_%level.o: guest_walk.c Makefile
>         $(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
> diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
> new file mode 100644
> index 0000000..5f80d82
> --- /dev/null
> +++ b/xen/arch/x86/mm/hap/nested_ept.c
> @@ -0,0 +1,276 @@
> +/*
> + * nested_ept.c: Handling virtulized EPT for guest in nested case.
> + *
> + * Copyright (c) 2012, Intel Corporation
> + *  Xiantao Zhang <xiantao.zhang@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
> + * Place - Suite 330, Boston, MA 02111-1307 USA.
> + */
> +#include <asm/domain.h>
> +#include <asm/page.h>
> +#include <asm/paging.h>
> +#include <asm/p2m.h>
> +#include <asm/mem_event.h>
> +#include <public/mem_event.h>
> +#include <asm/mem_sharing.h>
> +#include <xen/event.h>
> +#include <asm/hap.h>
> +#include <asm/hvm/support.h>
> +
> +#include <asm/hvm/nestedhvm.h>
> +
> +#include "private.h"
> +
> +#include <asm/hvm/vmx/vmx.h>
> +#include <asm/hvm/vmx/vvmx.h>
> +
> +/* EPT always use 4-level paging structure */
> +#define GUEST_PAGING_LEVELS 4
> +#include <asm/guest_pt.h>
> +
> +/* Must reserved bits in all level entries  */
> +#define EPT_MUST_RSV_BITS (((1ull << PADDR_BITS) -1) & \
> +                     ~((1ull << paddr_bits) - 1))
> +
> +/*
> + *TODO: Just leave it as 0 here for compile pass, will
> + * define real capabilities in the subsequent patches.
> + */
> +#define NEPT_VPID_CAP_BITS 0
> +
> +
> +#define NEPT_1G_ENTRY_FLAG (1 << 11)
> +#define NEPT_2M_ENTRY_FLAG (1 << 10)
> +#define NEPT_4K_ENTRY_FLAG (1 << 9)
> +
> +bool_t nept_sp_entry(ept_entry_t e)
> +{
> +    return !!(e.sp);
> +}
> +
> +static bool_t nept_rsv_bits_check(ept_entry_t e, uint32_t level)
> +{
> +    uint64_t rsv_bits = EPT_MUST_RSV_BITS;
> +
> +    switch ( level ) {
> +    case 1:
> +        break;
> +    case 2 ... 3:
> +        if (nept_sp_entry(e))
> +            rsv_bits |=  ((1ull << (9 * (level -1 ))) -1) << PAGE_SHIFT;
> +        else
> +            rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK;
> +        break;
> +    case 4:
> +        rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK | EPTE_SUPER_PAGE_MASK;
> +    break;
> +    default:
> +        printk("Unsupported EPT paging level: %d\n", level);

Add a 'break' (or return) here; we also need to take a definitive
action in this case.
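One possible definitive action is to treat an unsupported level as a misconfiguration, so the walker reports EPT_TRANSLATE_MISCONFIG instead of carrying on. The sketch below is a stand-alone variant of `nept_rsv_bits_check()` under stated assumptions: the entry is a plain `uint64_t`, `paddr_bits` is passed in rather than read from Xen's global, PADDR_BITS is taken as 52, and the EMT/IGMT masks are derived from EPTE_EMT_SHIFT (3) and EPTE_IGMT_SHIFT (6) in the patch's vmx.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT            12
#define PADDR_BITS            52             /* assumed architectural maximum */
#define EPTE_EMT_MASK         (7ull << 3)    /* bits 5:3, memory type */
#define EPTE_IGMT_MASK        (1ull << 6)    /* bit 6, ignore PAT */
#define EPTE_SUPER_PAGE_MASK  0x80ull        /* bit 7, as in vmx.h */

/* Return true if entry `e` at paging level `level` (1..4) sets any
 * reserved bit. `paddr_bits` is the CPU's physical-address width. */
static bool nept_rsv_bits_check(uint64_t e, unsigned int level,
                                unsigned int paddr_bits)
{
    uint64_t rsv_bits = ((1ull << PADDR_BITS) - 1) &
                        ~((1ull << paddr_bits) - 1);

    switch ( level )
    {
    case 1:
        break;

    case 2:
    case 3:
        if ( e & EPTE_SUPER_PAGE_MASK )
            /* Superpage: address bits below the page size must be 0. */
            rsv_bits |= ((1ull << (9 * (level - 1))) - 1) << PAGE_SHIFT;
        else
            rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK;
        break;

    case 4:
        rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK | EPTE_SUPER_PAGE_MASK;
        break;

    default:
        /* Unsupported level: fail safe by reporting the entry as invalid. */
        return true;
    }

    return (e & rsv_bits) != 0;
}
```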

> +    }
> +    return !!(e.epte & rsv_bits);
> +}
> +
> +/* EMT checking*/
> +static bool_t nept_emt_bits_check(ept_entry_t e, uint32_t level)
> +{
> +    if ( e.sp || level == 1 ) {
> +        if ( e.emt == EPT_EMT_RSV0 || e.emt == EPT_EMT_RSV1 ||
> +                e.emt == EPT_EMT_RSV2 )
> +            return 1;
> +    }
> +    return 0;
> +}
> +
> +static bool_t nept_rwx_bits_check(ept_entry_t e) {
> +    /*write only or write/execute only*/
> +    uint8_t rwx_bits = e.epte & EPTE_RWX_MASK;
> +
> +    if ( rwx_bits == ept_access_w || rwx_bits == ept_access_wx )
> +        return 1;
> +
> +    if ( rwx_bits == ept_access_x && !(NEPT_VPID_CAP_BITS &
> +                        VMX_EPT_EXEC_ONLY_SUPPORTED))
> +        return 1;
> +
> +    return 0;
> +}
> +
> +/* nept's misconfiguration check */
> +static bool_t nept_misconfiguration_check(ept_entry_t e, uint32_t level)
> +{
> +    return (nept_rsv_bits_check(e, level) ||
> +                nept_emt_bits_check(e, level) ||
> +                nept_rwx_bits_check(e));
> +}
> +
> +static bool_t nept_permission_check(uint32_t rwx_acc, uint32_t rwx_bits)
> +{
> +    return !(EPTE_RWX_MASK & rwx_acc & ~rwx_bits);
> +}
> +
> +/* nept's non-present check */
> +static bool_t nept_non_present_check(ept_entry_t e)
> +{
> +    if (e.epte & EPTE_RWX_MASK)
> +        return 0;
> +    return 1;
> +}
> +
> +uint64_t nept_get_ept_vpid_cap(void)
> +{
> +    return NEPT_VPID_CAP_BITS;
> +}
> +
> +static int ept_lvl_table_offset(unsigned long gpa, int lvl)
> +{
> +    return (gpa >>(EPT_L4_PAGETABLE_SHIFT -(4 - lvl) * 9)) &
> +                (EPT_PAGETABLE_ENTRIES -1 );
> +}
> +
> +static uint32_t
> +nept_walk_tables(struct vcpu *v, unsigned long l2ga, ept_walk_t *gw)
> +{
> +    int lvl;
> +    p2m_type_t p2mt;
> +    uint32_t rc = 0, ret = 0, gflags;
> +    struct domain *d = v->domain;
> +    struct p2m_domain *p2m = d->arch.p2m;
> +    gfn_t base_gfn = _gfn(nhvm_vcpu_p2m_base(v) >> PAGE_SHIFT);
> +    mfn_t lxmfn;
> +    ept_entry_t *lxp = NULL;
> +
> +    memset(gw, 0, sizeof(*gw));
> +
> +    for (lvl = 4; lvl > 0; lvl--)
> +    {
> +        lxp = map_domain_gfn(p2m, base_gfn, &lxmfn, &p2mt, P2M_ALLOC, &rc);
> +        if ( !lxp )
> +            goto map_err;
> +        gw->lxe[lvl] = lxp[ept_lvl_table_offset(l2ga, lvl)];
> +        unmap_domain_page(lxp);
> +        put_page(mfn_to_page(mfn_x(lxmfn)));
> +
> +        if (nept_non_present_check(gw->lxe[lvl]))
> +            goto non_present;
> +
> +        if (nept_misconfiguration_check(gw->lxe[lvl], lvl))
> +            goto misconfig_err;
> +
> +        if ( (lvl == 2 || lvl == 3) && nept_sp_entry(gw->lxe[lvl]) )
> +        {
> +            /* Generate a fake l1 table entry so callers don't all
> +             * have to understand superpages. */
> +            unsigned long gfn_lvl_mask =  (1ull << ((lvl - 1) * 9)) - 1;
> +            gfn_t start = _gfn(gw->lxe[lvl].mfn);
> +            /* Increment the pfn by the right number of 4k pages. */
> +            start = _gfn((gfn_x(start) & ~gfn_lvl_mask) +
> +                     ((l2ga >> PAGE_SHIFT) & gfn_lvl_mask));
> +            gflags = (gw->lxe[lvl].epte & EPTE_FLAG_MASK) |
> +                    (lvl == 3 ? NEPT_1G_ENTRY_FLAG: NEPT_2M_ENTRY_FLAG);
> +            gw->lxe[0].epte = (gfn_x(start) << PAGE_SHIFT) | gflags;
> +            goto done;
> +        }
> +        if ( lvl > 1 )
> +            base_gfn = _gfn(gw->lxe[lvl].mfn);
> +    }
> +
> +    /* If this is not a super entry, we can reach here. */
> +    gflags = (gw->lxe[1].epte & EPTE_FLAG_MASK) | NEPT_4K_ENTRY_FLAG;
> +    gw->lxe[0].epte = (gw->lxe[1].epte & PAGE_MASK) | gflags;
> +
> +done:
> +    ret = EPT_TRANSLATE_SUCCEED;
> +    goto out;
> +
> +map_err:
> +    if ( rc == _PAGE_PAGED )
> +        ret = EPT_TRANSLATE_RETRY;
> +    else
> +        ret = EPT_TRANSLATE_ERR_PAGE;
> +    goto out;
> +
> +misconfig_err:
> +    ret =  EPT_TRANSLATE_MISCONFIG;
> +    goto out;
> +
> +non_present:
> +    ret = EPT_TRANSLATE_VIOLATION;
> +    /* fall through. */
> +out:
> +    return ret;
> +}
> +
> +/* Translate a L2 guest address to L1 gpa via L1 EPT paging structure */
> +
> +int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
> +                        unsigned int *page_order, uint32_t rwx_acc,
> +                        unsigned long *l1gfn, uint64_t *exit_qual,
> +                        uint32_t *exit_reason)
> +{
> +    uint32_t rc, rwx_bits = 0;
> +    ept_walk_t gw;
> +    rwx_acc &= EPTE_RWX_MASK;
> +
> +    *l1gfn = INVALID_GFN;
> +
> +    rc = nept_walk_tables(v, l2ga, &gw);
> +    switch ( rc ) {
> +    case EPT_TRANSLATE_SUCCEED:
> +        if ( likely(gw.lxe[0].epte & NEPT_2M_ENTRY_FLAG) )
> +        {
> +            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte & gw.lxe[2].epte &
> +                            EPTE_RWX_MASK;
> +            *page_order = 9;
> +        }
> +        else if ( gw.lxe[0].epte & NEPT_4K_ENTRY_FLAG ) {
> +            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte & gw.lxe[2].epte &
> +                    gw.lxe[1].epte & EPTE_RWX_MASK;
> +            *page_order = 0;
> +        }
> +        else if ( gw.lxe[0].epte & NEPT_1G_ENTRY_FLAG  )
> +        {
> +            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte  & EPTE_RWX_MASK;
> +            *page_order = 18;
> +        }
> +        else
> +        {
> +            gdprintk(XENLOG_ERR, "Uncorrect l1 entry!\n");
> +            BUG();
> +        }
> +        if ( nept_permission_check(rwx_acc, rwx_bits) )
> +        {
> +            *l1gfn = gw.lxe[0].mfn;
> +            break;
> +        }
> +        rc = EPT_TRANSLATE_VIOLATION;
> +    /* Fall through to EPT violation if permission check fails. */
> +    case EPT_TRANSLATE_VIOLATION:
> +        *exit_qual = (*exit_qual & 0xffffffc0) | (rwx_bits << 3) | rwx_acc;
> +        *exit_reason = EXIT_REASON_EPT_VIOLATION;
> +        break;
> +
> +    case EPT_TRANSLATE_ERR_PAGE:
> +        break;
> +    case EPT_TRANSLATE_MISCONFIG:
> +        rc = EPT_TRANSLATE_MISCONFIG;
> +        *exit_qual = 0;
> +        *exit_reason = EXIT_REASON_EPT_MISCONFIG;
> +        break;
> +    case EPT_TRANSLATE_RETRY:
> +        break;
> +    default:
> +        gdprintk(XENLOG_ERR, "Unsupported ept translation type!:%d\n", rc);

Same here.

> +    }
> +    return rc;
> +}
> diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
> index 8787c91..6d1264b 100644
> --- a/xen/arch/x86/mm/hap/nested_hap.c
> +++ b/xen/arch/x86/mm/hap/nested_hap.c
> @@ -217,7 +217,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
>      /* let caller to handle these two cases */
>      switch (rv) {
>      case NESTEDHVM_PAGEFAULT_INJECT:
> -        return rv;
> +    case NESTEDHVM_PAGEFAULT_RETRY:
>      case NESTEDHVM_PAGEFAULT_L1_ERROR:
>          return rv;
>      case NESTEDHVM_PAGEFAULT_DONE:
> diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> index 4967da1..409198c 100644
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -4582,7 +4582,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v,
>      /* Translate the GFN to an MFN */
>      ASSERT(!paging_locked_by_me(v->domain));
>      mfn = get_gfn(v->domain, _gfn(gfn), &p2mt);
> -
> +
>      if ( p2m_is_readonly(p2mt) )
>      {
>          put_gfn(v->domain, gfn);
> diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
> index 4e1dda0..db8a0b6 100644
> --- a/xen/include/asm-x86/guest_pt.h
> +++ b/xen/include/asm-x86/guest_pt.h
> @@ -315,6 +315,14 @@ guest_walk_to_page_order(walk_t *gw)
>  #define GPT_RENAME2(_n, _l) _n ## _ ## _l ## _levels
>  #define GPT_RENAME(_n, _l) GPT_RENAME2(_n, _l)
>  #define guest_walk_tables GPT_RENAME(guest_walk_tables, GUEST_PAGING_LEVELS)
> +#define map_domain_gfn GPT_RENAME(map_domain_gfn, GUEST_PAGING_LEVELS)
> +
> +extern void *map_domain_gfn(struct p2m_domain *p2m,
> +                                   gfn_t gfn,
> +                                   mfn_t *mfn,
> +                                   p2m_type_t *p2mt,
> +                                   p2m_query_t q,
> +                                   uint32_t *rc);
>
>  extern uint32_t
>  guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, unsigned long va,
> diff --git a/xen/include/asm-x86/hvm/nestedhvm.h b/xen/include/asm-x86/hvm/nestedhvm.h
> index 91fde0b..649c511 100644
> --- a/xen/include/asm-x86/hvm/nestedhvm.h
> +++ b/xen/include/asm-x86/hvm/nestedhvm.h
> @@ -52,6 +52,7 @@ bool_t nestedhvm_vcpu_in_guestmode(struct vcpu *v);
>  #define NESTEDHVM_PAGEFAULT_L1_ERROR   2
>  #define NESTEDHVM_PAGEFAULT_L0_ERROR   3
>  #define NESTEDHVM_PAGEFAULT_MMIO       4
> +#define NESTEDHVM_PAGEFAULT_RETRY      5
>  int nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
>      bool_t access_r, bool_t access_w, bool_t access_x);
>
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index ef2c9c9..9a728b6 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -194,6 +194,7 @@ extern u32 vmx_secondary_exec_control;
>
>  extern bool_t cpu_has_vmx_ins_outs_instr_info;
>
> +#define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
>  #define VMX_EPT_WALK_LENGTH_4_SUPPORTED         0x00000040
>  #define VMX_EPT_MEMORY_TYPE_UC                  0x00000100
>  #define VMX_EPT_MEMORY_TYPE_WB                  0x00004000
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index aa5b080..feaaa80 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -51,6 +51,11 @@ typedef union {
>      u64 epte;
>  } ept_entry_t;
>
> +typedef struct {
> +    /*use lxe[0] to save result */
> +    ept_entry_t lxe[5];
> +} ept_walk_t;
> +
>  #define EPT_TABLE_ORDER         9
>  #define EPTE_SUPER_PAGE_MASK    0x80
>  #define EPTE_MFN_MASK           0xffffffffff000ULL
> @@ -60,6 +65,28 @@ typedef union {
>  #define EPTE_AVAIL1_SHIFT       8
>  #define EPTE_EMT_SHIFT          3
>  #define EPTE_IGMT_SHIFT         6
> +#define EPTE_RWX_MASK           0x7
> +#define EPTE_FLAG_MASK          0x7f
> +
> +#define EPT_EMT_UC              0
> +#define EPT_EMT_WC              1
> +#define EPT_EMT_RSV0            2
> +#define EPT_EMT_RSV1            3
> +#define EPT_EMT_WT              4
> +#define EPT_EMT_WP              5
> +#define EPT_EMT_WB              6
> +#define EPT_EMT_RSV2            7
> +
> +typedef enum {
> +    ept_access_n     = 0, /* No access permissions allowed */
> +    ept_access_r     = 1,
> +    ept_access_w     = 2,
> +    ept_access_rw    = 3,
> +    ept_access_x     = 4,
> +    ept_access_rx    = 5,
> +    ept_access_wx    = 6,
> +    ept_access_all   = 7,
> +} ept_access_t;
>
>  void vmx_asm_vmexit_handler(struct cpu_user_regs);
>  void vmx_asm_do_vmentry(void);
> @@ -419,6 +446,7 @@ void update_guest_eip(void);
>  #define _EPT_GLA_FAULT              8
>  #define EPT_GLA_FAULT               (1UL<<_EPT_GLA_FAULT)
>
> +#define EPT_L4_PAGETABLE_SHIFT      39
>  #define EPT_PAGETABLE_ENTRIES       512
>
>  #endif /* __ASM_X86_HVM_VMX_VMX_H__ */
> diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
> index 422f006..8eb377b 100644
> --- a/xen/include/asm-x86/hvm/vmx/vvmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
> @@ -32,6 +32,10 @@ struct nestedvmx {
>          unsigned long intr_info;
>          u32           error_code;
>      } intr;
> +    struct {
> +        uint32_t exit_reason;
> +        uint32_t exit_qual;
> +    } ept_exit;
>  };
>
>  #define vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx)
> @@ -109,6 +113,12 @@ void nvmx_domain_relinquish_resources(struct domain *d);
>  int nvmx_handle_vmxon(struct cpu_user_regs *regs);
>  int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
>
> +#define EPT_TRANSLATE_SUCCEED   0
> +#define EPT_TRANSLATE_VIOLATION 1
> +#define EPT_TRANSLATE_ERR_PAGE  2
> +#define EPT_TRANSLATE_MISCONFIG 3
> +#define EPT_TRANSLATE_RETRY     4
> +
>  int
>  nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
>                        unsigned int *page_order,
> @@ -192,5 +202,9 @@ u64 nvmx_get_tsc_offset(struct vcpu *v);
>  int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
>                            unsigned int exit_reason);
>
> +int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
> +                        unsigned int *page_order, uint32_t rwx_acc,
> +                        unsigned long *l1gfn, uint64_t *exit_qual,
> +                        uint32_t *exit_reason);
>  #endif /* __ASM_X86_HVM_VVMX_H__ */
>
> --
> 1.7.1
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel



-- 
Jun
Intel Open Source Technology Center

* Re: [PATCH v2 05/10] nEPT: Try to enable EPT paging for L2 guest.
  2012-12-19 19:44 ` [PATCH v2 05/10] nEPT: Try to enable EPT paging for L2 guest Xiantao Zhang
@ 2012-12-19 17:16   ` Nakajima, Jun
  2012-12-20  1:27     ` Zhang, Xiantao
  0 siblings, 1 reply; 18+ messages in thread
From: Nakajima, Jun @ 2012-12-19 17:16 UTC
  To: Xiantao Zhang; +Cc: keir, tim, eddie.dong, JBeulich, xen-devel

Minor comments below.

On Wed, Dec 19, 2012 at 11:44 AM, Xiantao Zhang <xiantao.zhang@intel.com> wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
>
> Once found EPT is enabled by L1 VMM, enabled nested EPT support
> for L2 guest.
>
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c         |   16 +++++++++--
>  xen/arch/x86/hvm/vmx/vvmx.c        |   48 +++++++++++++++++++++++++++--------
>  xen/include/asm-x86/hvm/vmx/vvmx.h |    5 +++-
>  3 files changed, 54 insertions(+), 15 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index d74aae0..e5be5a2 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1461,6 +1461,7 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
>      .nhvm_vcpu_guestcr3   = nvmx_vcpu_guestcr3,
>      .nhvm_vcpu_p2m_base   = nvmx_vcpu_eptp_base,
>      .nhvm_vcpu_asid       = nvmx_vcpu_asid,
> +    .nhvm_vmcx_hap_enabled = nvmx_ept_enabled,
>      .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception,
>      .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap,
>      .nhvm_intr_blocked    = nvmx_intr_blocked,
> @@ -2003,6 +2004,7 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
>      unsigned long gla, gfn = gpa >> PAGE_SHIFT;
>      mfn_t mfn;
>      p2m_type_t p2mt;
> +    int ret;
>      struct domain *d = current->domain;
>
>      if ( tb_init_done )
> @@ -2017,18 +2019,26 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
>          _d.gpa = gpa;
>          _d.qualification = qualification;
>          _d.mfn = mfn_x(get_gfn_query_unlocked(d, gfn, &_d.p2mt));
> -
> +
>          __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
>      }
>
> -    if ( hvm_hap_nested_page_fault(gpa,
> +    ret = hvm_hap_nested_page_fault(gpa,
>                                     qualification & EPT_GLA_VALID       ? 1 : 0,
>                                     qualification & EPT_GLA_VALID
>                                       ? __vmread(GUEST_LINEAR_ADDRESS) : ~0ull,
>                                     qualification & EPT_READ_VIOLATION  ? 1 : 0,
>                                     qualification & EPT_WRITE_VIOLATION ? 1 : 0,
> -                                   qualification & EPT_EXEC_VIOLATION  ? 1 : 0) )
> +                                   qualification & EPT_EXEC_VIOLATION  ? 1 : 0);
> +    switch ( ret ) {
> +    case 0:
> +        break;
> +    case 1:
>          return;
> +    case -1:
> +        vcpu_nestedhvm(current).nv_vmexit_pending = 1;

I think we should add some comments for this case (e.g. what it means,
what to do).


> +        return;
> +    }
>
>      /* Everything else is an error. */
>      mfn = get_gfn_query_unlocked(d, gfn, &p2mt);
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 76cf757..c100730 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -41,6 +41,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>          gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs failed\n");
>         goto out;
>      }
> +    nvmx->ept.enabled = 0;
>      nvmx->vmxon_region_pa = 0;
>      nvcpu->nv_vvmcx = NULL;
>      nvcpu->nv_vvmcxaddr = VMCX_EADDR;
> @@ -96,9 +97,11 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v)
>
>  uint64_t nvmx_vcpu_eptp_base(struct vcpu *v)
>  {
> -    /* TODO */
> -    ASSERT(0);
> -    return 0;
> +    uint64_t eptp_base;
> +    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +
> +    eptp_base = __get_vvmcs(nvcpu->nv_vvmcx, EPT_POINTER);
> +    return eptp_base & PAGE_MASK;
>  }
>
>  uint32_t nvmx_vcpu_asid(struct vcpu *v)
> @@ -108,6 +111,13 @@ uint32_t nvmx_vcpu_asid(struct vcpu *v)
>      return 0;
>  }
>
> +bool_t nvmx_ept_enabled(struct vcpu *v)
> +{
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> +
> +    return !!(nvmx->ept.enabled);
> +}
> +
>  static const enum x86_segment sreg_to_index[] = {
>      [VMX_SREG_ES] = x86_seg_es,
>      [VMX_SREG_CS] = x86_seg_cs,
> @@ -503,14 +513,16 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
>  }
>
>  void nvmx_update_secondary_exec_control(struct vcpu *v,
> -                                            unsigned long value)
> +                                            unsigned long host_cntrl)
>  {
>      u32 shadow_cntrl;
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
>
>      shadow_cntrl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL);
> -    shadow_cntrl |= value;
> -    set_shadow_control(v, SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
> +    nvmx->ept.enabled = !!(shadow_cntrl & SECONDARY_EXEC_ENABLE_EPT);
> +    shadow_cntrl |= host_cntrl;
> +    __vmwrite(SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
>  }
>
>  static void nvmx_update_pin_control(struct vcpu *v, unsigned long host_cntrl)
> @@ -818,6 +830,17 @@ static void load_shadow_guest_state(struct vcpu *v)
>      /* TODO: CR3 target control */
>  }
>
> +
> +static uint64_t get_shadow_eptp(struct vcpu *v)
> +{
> +    uint64_t np2m_base = nvmx_vcpu_eptp_base(v);
> +    struct p2m_domain *p2m = p2m_get_nestedp2m(v, np2m_base);
> +    struct ept_data *ept = &p2m->ept;
> +
> +    ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> +    return ept_get_eptp(ept);
> +}
> +
>  static void virtual_vmentry(struct cpu_user_regs *regs)
>  {
>      struct vcpu *v = current;
> @@ -862,7 +885,10 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
>      /* updating host cr0 to sync TS bit */
>      __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
>
> -    /* TODO: EPT_POINTER */
> +    /* Setup virtual ETP for L2 guest*/
> +    if ( nestedhvm_paging_mode_hap(v) )
> +        __vmwrite(EPT_POINTER, get_shadow_eptp(v));
> +
>  }
>
>  static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
> @@ -915,8 +941,8 @@ static void sync_vvmcs_ro(struct vcpu *v)
>      /* Adjust exit_reason/exit_qualifciation for violation case */
>      if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) ==
>                  EXIT_REASON_EPT_VIOLATION ) {
> -        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual);
> -        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason);
> +        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept.exit_qual);
> +        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept.exit_reason);
>      }
>  }
>
> @@ -1480,8 +1506,8 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
>          case EPT_TRANSLATE_VIOLATION:
>          case EPT_TRANSLATE_MISCONFIG:
>              rc = NESTEDHVM_PAGEFAULT_INJECT;
> -            nvmx->ept_exit.exit_reason = exit_reason;
> -            nvmx->ept_exit.exit_qual = exit_qual;
> +            nvmx->ept.exit_reason = exit_reason;
> +            nvmx->ept.exit_qual = exit_qual;
>              break;
>          case EPT_TRANSLATE_RETRY:
>              rc = NESTEDHVM_PAGEFAULT_RETRY;
> diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
> index 8eb377b..661cd8a 100644
> --- a/xen/include/asm-x86/hvm/vmx/vvmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
> @@ -33,9 +33,10 @@ struct nestedvmx {
>          u32           error_code;
>      } intr;
>      struct {
> +        char     enabled;

I think we should use bool_t, not char.

>          uint32_t exit_reason;
>          uint32_t exit_qual;
> -    } ept_exit;
> +    } ept;
>  };
>
>  #define vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx)
> @@ -110,6 +111,8 @@ int nvmx_intercepts_exception(struct vcpu *v,
>                                unsigned int trap, int error_code);
>  void nvmx_domain_relinquish_resources(struct domain *d);
>
> +bool_t nvmx_ept_enabled(struct vcpu *v);
> +
>  int nvmx_handle_vmxon(struct cpu_user_regs *regs);
>  int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
>
> --
> 1.7.1
>



-- 
Jun
Intel Open Source Technology Center


* [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM
@ 2012-12-19 19:44 Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words Xiantao Zhang
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

With virtual EPT support, the L1 hypervisor can use EPT hardware
for the L2 guest's memory virtualization, which can improve the
L2 guest's performance significantly. In our testing, some
benchmarks show a more than 5x performance gain.
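
Conceptually, the table L0 builds for an L2 guest composes two translations. L1's EPT maps L2 guest-physical addresses to L1 guest-physical addresses, and L0's p2m maps those to host-physical addresses. The toy sketch below assumes flat, single-level tables. The arrays and l2_gpa_to_hpa() are hypothetical illustrations, not Xen code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NPAGES     8
#define INVALID    UINT64_MAX

/* Toy single-level tables: index = input frame, value = output frame.
 * Hypothetical data, for illustration only. */
static const uint64_t l1_ept[NPAGES] = { 3, 5, INVALID, 0, 1, 2, 4, 6 };   /* L2 gfn -> L1 gfn   */
static const uint64_t l0_p2m[NPAGES] = { 10, 11, 12, 13, 14, 15, 16, 17 }; /* L1 gfn -> host mfn */

/* Translate an L2 guest-physical address to a host-physical address by
 * walking the table owned by L1 first and then the table owned by L0.
 * Returns INVALID when L1 has no mapping (the real code reflects that
 * case back to L1 as an EPT violation). */
uint64_t l2_gpa_to_hpa(uint64_t l2_gpa)
{
    uint64_t l2_gfn = l2_gpa >> PAGE_SHIFT;
    uint64_t offset = l2_gpa & ((1ULL << PAGE_SHIFT) - 1);
    uint64_t l1_gfn;

    if ( l2_gfn >= NPAGES || l1_ept[l2_gfn] == INVALID )
        return INVALID;
    l1_gfn = l1_ept[l2_gfn];
    /* L1's table is guest-controlled, so its output must be range-checked. */
    if ( l1_gfn >= NPAGES )
        return INVALID;
    /* L0's own table is trusted and fully populated in this toy. */
    assert(l0_p2m[l1_gfn] != INVALID);
    return (l0_p2m[l1_gfn] << PAGE_SHIFT) | offset;
}
```

For example, l2_gpa_to_hpa(0x1234) follows L2 frame 1 to L1 frame 5 and then to host frame 15, returning 0xf234. In the series, the composed result is not recomputed on every access. Instead it is cached in a nested p2m whose root is loaded into the L2 guest's EPT_POINTER.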

Changes from v1:
Update the patches according to Tim's comments.
1. Patch 03: Enhance the virtual EPT walker logic.
2. Patch 04: Add a new field to struct p2m_domain and use it to store
   EPT-specific data. For the host p2m it stores the L1 VMM's EPT data,
   and for a nested p2m it stores the nested EPT's data.
3. Patch 07: Strictly check the host p2m's access type.
4. Other patches: Fix whitespace mangling.
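
For reference, the Intel SDM defines the layout of an EPT pointer as follows:
- Bits 2:0 hold the paging-structure memory type.
- Bits 5:3 hold the page-walk length minus one.
- Bits 12 and up hold the physical address of the root table.

The sketch below shows how per-p2m EPT data might be assembled into an EPTP. struct toy_ept_data and toy_get_eptp() are hypothetical names, not the definitions used in patch 04, and the accessed/dirty enable bit (bit 6) is left clear.

```c
#include <assert.h>
#include <stdint.h>

#define EPT_MT_WB      6   /* bits 2:0: write-back paging-structure memory type */
#define EPT_WALK_LEN_4 4   /* four paging levels; encoded as 4 - 1 in bits 5:3 */

/* Hypothetical per-p2m EPT data; not Xen's struct ept_data. */
struct toy_ept_data {
    unsigned int mt;        /* memory type used to access the EPT tables */
    unsigned int walk_len;  /* number of paging levels */
    uint64_t     root_mfn;  /* frame number of the top-level EPT table */
};

/* Pack the fields into a 64-bit EPT pointer (A/D enable bit 6 left clear). */
uint64_t toy_get_eptp(const struct toy_ept_data *ept)
{
    assert(ept->mt < 8 && ept->walk_len >= 1 && ept->walk_len <= 8);
    return (uint64_t)ept->mt |
           ((uint64_t)(ept->walk_len - 1) << 3) |
           (ept->root_mfn << 12);
}
```

Consider write-back memory type, a four-level walk and a root table in frame 0x1234. This yields 0x123401e. Masking off the low 12 bits, as nvmx_vcpu_eptp_base() does with PAGE_MASK, recovers the root table address.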

Zhang Xiantao (10):
  nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
  nestedhap: Change nested p2m's walker to vendor-specific
  nested_ept: Implement guest ept's walker
  EPT: Make ept data structure or operations neutral
  nEPT: Try to enable EPT paging for L2 guest.
  nEPT: Sync PDPTR fields if L2 guest in PAE paging mode
  nEPT: Use minimal permission for nested p2m.
  nEPT: handle invept instruction from L1 VMM
  nVMX: virtualize VPID capability to nested VMM.
  nEPT: expose EPT & VPID capabilities to L1 VMM

 xen/arch/x86/hvm/hvm.c                  |    7 +-
 xen/arch/x86/hvm/svm/nestedsvm.c        |   31 ++++
 xen/arch/x86/hvm/svm/svm.c              |    3 +-
 xen/arch/x86/hvm/vmx/vmcs.c             |    9 +-
 xen/arch/x86/hvm/vmx/vmx.c              |   90 ++++-------
 xen/arch/x86/hvm/vmx/vvmx.c             |  213 ++++++++++++++++++++++--
 xen/arch/x86/mm/guest_walk.c            |   16 +-
 xen/arch/x86/mm/hap/Makefile            |    1 +
 xen/arch/x86/mm/hap/nested_ept.c        |  282 +++++++++++++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c        |   95 ++++++-----
 xen/arch/x86/mm/mm-locks.h              |    2 +-
 xen/arch/x86/mm/p2m-ept.c               |  104 +++++++++---
 xen/arch/x86/mm/p2m.c                   |   51 ++++---
 xen/arch/x86/mm/shadow/multi.c          |    2 +-
 xen/include/asm-x86/guest_pt.h          |    8 +
 xen/include/asm-x86/hvm/hvm.h           |    9 +-
 xen/include/asm-x86/hvm/nestedhvm.h     |    1 +
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    3 +
 xen/include/asm-x86/hvm/vmx/vmcs.h      |   24 ++--
 xen/include/asm-x86/hvm/vmx/vmx.h       |   38 ++++-
 xen/include/asm-x86/hvm/vmx/vvmx.h      |   29 +++-
 xen/include/asm-x86/p2m.h               |   20 ++-
 22 files changed, 843 insertions(+), 195 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/nested_ept.c


* [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  2012-12-19  8:19   ` Jan Beulich
  2012-12-19 19:44 ` [PATCH v2 02/10] nestedhap: Change nested p2m's walker to vendor-specific Xiantao Zhang
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

VMX has no concept of a host cr3 for the nested p2m; only SVM
does. So rename it to neutral terms.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c             |    6 +++---
 xen/arch/x86/hvm/svm/svm.c         |    2 +-
 xen/arch/x86/hvm/vmx/vmx.c         |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c        |    2 +-
 xen/arch/x86/mm/hap/nested_hap.c   |   15 ++++++++-------
 xen/arch/x86/mm/mm-locks.h         |    2 +-
 xen/arch/x86/mm/p2m.c              |   26 +++++++++++++-------------
 xen/include/asm-x86/hvm/hvm.h      |    4 ++--
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 +-
 xen/include/asm-x86/p2m.h          |   16 ++++++++--------
 10 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 40c1ab2..1cae8a8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4536,10 +4536,10 @@ uint64_t nhvm_vcpu_guestcr3(struct vcpu *v)
     return -EOPNOTSUPP;
 }
 
-uint64_t nhvm_vcpu_hostcr3(struct vcpu *v)
+uint64_t nhvm_vcpu_p2m_base(struct vcpu *v)
 {
-    if (hvm_funcs.nhvm_vcpu_hostcr3)
-        return hvm_funcs.nhvm_vcpu_hostcr3(v);
+    if (hvm_funcs.nhvm_vcpu_p2m_base)
+        return hvm_funcs.nhvm_vcpu_p2m_base(v);
     return -EOPNOTSUPP;
 }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 55a5ae5..2c8504a 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2003,7 +2003,7 @@ static struct hvm_function_table __read_mostly svm_function_table = {
     .nhvm_vcpu_vmexit = nsvm_vcpu_vmexit_inject,
     .nhvm_vcpu_vmexit_trap = nsvm_vcpu_vmexit_trap,
     .nhvm_vcpu_guestcr3 = nsvm_vcpu_guestcr3,
-    .nhvm_vcpu_hostcr3 = nsvm_vcpu_hostcr3,
+    .nhvm_vcpu_p2m_base = nsvm_vcpu_hostcr3,
     .nhvm_vcpu_asid = nsvm_vcpu_asid,
     .nhvm_vmcx_guest_intercepts_trap = nsvm_vmcb_guest_intercepts_trap,
     .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index aee1f9e..98309da 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1504,7 +1504,7 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
     .nhvm_vcpu_destroy    = nvmx_vcpu_destroy,
     .nhvm_vcpu_reset      = nvmx_vcpu_reset,
     .nhvm_vcpu_guestcr3   = nvmx_vcpu_guestcr3,
-    .nhvm_vcpu_hostcr3    = nvmx_vcpu_hostcr3,
+    .nhvm_vcpu_p2m_base   = nvmx_vcpu_eptp_base,
     .nhvm_vcpu_asid       = nvmx_vcpu_asid,
     .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception,
     .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap,
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index b005816..6d1a736 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -94,7 +94,7 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v)
     return 0;
 }
 
-uint64_t nvmx_vcpu_hostcr3(struct vcpu *v)
+uint64_t nvmx_vcpu_eptp_base(struct vcpu *v)
 {
     /* TODO */
     ASSERT(0);
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 317875d..f9a5edc 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -48,9 +48,10 @@
  *    1. If #NPF is from L1 guest, then we crash the guest VM (same as old 
  *       code)
  *    2. If #NPF is from L2 guest, then we continue from (3)
- *    3. Get h_cr3 from L1 guest. Map h_cr3 into L0 hypervisor address space.
- *    4. Walk the h_cr3 page table
- *    5.    - if not present, then we inject #NPF back to L1 guest and 
+ *    3. Get np2m base from L1 guest. Map np2m base into L0 hypervisor address space.
+ *    4. Walk the np2m's  page table
+ *    5.    - if not present or permission check failure, then we inject #NPF back to 
+ *    L1 guest and 
  *            re-launch L1 guest (L1 guest will either treat this #NPF as MMIO,
  *            or fix its p2m table for L2 guest)
  *    6.    - if present, then we will get the a new translated value L1-GPA 
@@ -89,7 +90,7 @@ nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
 
     if (old_flags & _PAGE_PRESENT)
         flush_tlb_mask(p2m->dirty_cpumask);
-    
+
     paging_unlock(d);
 }
 
@@ -110,7 +111,7 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
     /* If this p2m table has been flushed or recycled under our feet, 
      * leave it alone.  We'll pick up the right one as we try to 
      * vmenter the guest. */
-    if ( p2m->cr3 == nhvm_vcpu_hostcr3(v) )
+    if ( p2m->np2m_base == nhvm_vcpu_p2m_base(v) )
     {
         unsigned long gfn, mask;
         mfn_t mfn;
@@ -186,7 +187,7 @@ nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
     uint32_t pfec;
     unsigned long nested_cr3, gfn;
     
-    nested_cr3 = nhvm_vcpu_hostcr3(v);
+    nested_cr3 = nhvm_vcpu_p2m_base(v);
 
     pfec = PFEC_user_mode | PFEC_page_present;
     if (access_w)
@@ -221,7 +222,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     p2m_type_t p2mt_10;
 
     p2m = p2m_get_hostp2m(d); /* L0 p2m */
-    nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_hostcr3(v));
+    nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
 
     /* walk the L1 P2M table */
     rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21,
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 3700e32..1817f81 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -249,7 +249,7 @@ declare_mm_order_constraint(per_page_sharing)
  * A per-domain lock that protects the mapping from nested-CR3 to 
  * nested-p2m.  In particular it covers:
  * - the array of nested-p2m tables, and all LRU activity therein; and
- * - setting the "cr3" field of any p2m table to a non-CR3_EADDR value. 
+ * - setting the "cr3" field of any p2m table to a non-P2M_BASE_EAADR value. 
  *   (i.e. assigning a p2m table to be the shadow of that cr3 */
 
 /* PoD lock (per-p2m-table)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 258f46e..6a4bdd9 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -69,7 +69,7 @@ static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     p2m->domain = d;
     p2m->default_access = p2m_access_rwx;
 
-    p2m->cr3 = CR3_EADDR;
+    p2m->np2m_base = P2M_BASE_EADDR;
 
     if ( hap_enabled(d) && cpu_has_vmx )
         ept_p2m_init(p2m);
@@ -1433,7 +1433,7 @@ p2m_flush_table(struct p2m_domain *p2m)
     ASSERT(page_list_empty(&p2m->pod.single));
 
     /* This is no longer a valid nested p2m for any address space */
-    p2m->cr3 = CR3_EADDR;
+    p2m->np2m_base = P2M_BASE_EADDR;
     
     /* Zap the top level of the trie */
     top = mfn_to_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
@@ -1471,7 +1471,7 @@ p2m_flush_nestedp2m(struct domain *d)
 }
 
 struct p2m_domain *
-p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
+p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 {
     /* Use volatile to prevent gcc to cache nv->nv_p2m in a cpu register as
      * this may change within the loop by an other (v)cpu.
@@ -1480,8 +1480,8 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
     struct domain *d;
     struct p2m_domain *p2m;
 
-    /* Mask out low bits; this avoids collisions with CR3_EADDR */
-    cr3 &= ~(0xfffull);
+    /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
+    np2m_base &= ~(0xfffull);
 
     if (nv->nv_flushp2m && nv->nv_p2m) {
         nv->nv_p2m = NULL;
@@ -1493,14 +1493,14 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
     if ( p2m ) 
     {
         p2m_lock(p2m);
-        if ( p2m->cr3 == cr3 || p2m->cr3 == CR3_EADDR )
+        if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
         {
             nv->nv_flushp2m = 0;
             p2m_getlru_nestedp2m(d, p2m);
             nv->nv_p2m = p2m;
-            if (p2m->cr3 == CR3_EADDR)
+            if (p2m->np2m_base == P2M_BASE_EADDR)
                 hvm_asid_flush_vcpu(v);
-            p2m->cr3 = cr3;
+            p2m->np2m_base = np2m_base;
             cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
             p2m_unlock(p2m);
             nestedp2m_unlock(d);
@@ -1515,7 +1515,7 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
     p2m_flush_table(p2m);
     p2m_lock(p2m);
     nv->nv_p2m = p2m;
-    p2m->cr3 = cr3;
+    p2m->np2m_base = np2m_base;
     nv->nv_flushp2m = 0;
     hvm_asid_flush_vcpu(v);
     cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
@@ -1531,7 +1531,7 @@ p2m_get_p2m(struct vcpu *v)
     if (!nestedhvm_is_n2(v))
         return p2m_get_hostp2m(v->domain);
 
-    return p2m_get_nestedp2m(v, nhvm_vcpu_hostcr3(v));
+    return p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
 }
 
 unsigned long paging_gva_to_gfn(struct vcpu *v,
@@ -1549,15 +1549,15 @@ unsigned long paging_gva_to_gfn(struct vcpu *v,
         struct p2m_domain *p2m;
         const struct paging_mode *mode;
         uint32_t pfec_21 = *pfec;
-        uint64_t ncr3 = nhvm_vcpu_hostcr3(v);
+        uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 
         /* translate l2 guest va into l2 guest gfn */
-        p2m = p2m_get_nestedp2m(v, ncr3);
+        p2m = p2m_get_nestedp2m(v, np2m_base);
         mode = paging_get_nestedmode(v);
         gfn = mode->gva_to_gfn(v, p2m, va, pfec);
 
         /* translate l2 guest gfn into l1 guest gfn */
-        return hostmode->p2m_ga_to_gfn(v, hostp2m, ncr3,
+        return hostmode->p2m_ga_to_gfn(v, hostp2m, np2m_base,
                                        gfn << PAGE_SHIFT, &pfec_21, NULL);
     }
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index fdb0f58..d3535b6 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -170,7 +170,7 @@ struct hvm_function_table {
                                 uint64_t exitcode);
     int (*nhvm_vcpu_vmexit_trap)(struct vcpu *v, struct hvm_trap *trap);
     uint64_t (*nhvm_vcpu_guestcr3)(struct vcpu *v);
-    uint64_t (*nhvm_vcpu_hostcr3)(struct vcpu *v);
+    uint64_t (*nhvm_vcpu_p2m_base)(struct vcpu *v);
     uint32_t (*nhvm_vcpu_asid)(struct vcpu *v);
     int (*nhvm_vmcx_guest_intercepts_trap)(struct vcpu *v, 
                                unsigned int trapnr, int errcode);
@@ -475,7 +475,7 @@ uint64_t nhvm_vcpu_guestcr3(struct vcpu *v);
 /* returns l1 guest's cr3 that points to the page table used to
  * translate l2 guest physical address to l1 guest physical address.
  */
-uint64_t nhvm_vcpu_hostcr3(struct vcpu *v);
+uint64_t nhvm_vcpu_p2m_base(struct vcpu *v);
 /* returns the asid number l1 guest wants to use to run the l2 guest */
 uint32_t nhvm_vcpu_asid(struct vcpu *v);
 
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index dce2cd8..d97011d 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -99,7 +99,7 @@ int nvmx_vcpu_initialise(struct vcpu *v);
 void nvmx_vcpu_destroy(struct vcpu *v);
 int nvmx_vcpu_reset(struct vcpu *v);
 uint64_t nvmx_vcpu_guestcr3(struct vcpu *v);
-uint64_t nvmx_vcpu_hostcr3(struct vcpu *v);
+uint64_t nvmx_vcpu_eptp_base(struct vcpu *v);
 uint32_t nvmx_vcpu_asid(struct vcpu *v);
 enum hvm_intblk nvmx_intr_blocked(struct vcpu *v);
 int nvmx_intercepts_exception(struct vcpu *v, 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 2bd2048..ce26594 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -197,17 +197,17 @@ struct p2m_domain {
 
     struct domain     *domain;   /* back pointer to domain */
 
-    /* Nested p2ms only: nested-CR3 value that this p2m shadows. 
-     * This can be cleared to CR3_EADDR under the per-p2m lock but
+    /* Nested p2ms only: nested p2m base value that this p2m shadows. 
+     * This can be cleared to P2M_BASE_EADDR under the per-p2m lock but
      * needs both the per-p2m lock and the per-domain nestedp2m lock
      * to set it to any other value. */
-#define CR3_EADDR     (~0ULL)
-    uint64_t           cr3;
+#define P2M_BASE_EADDR     (~0ULL)
+    uint64_t           np2m_base;
 
     /* Nested p2ms: linked list of n2pms allocated to this domain. 
      * The host p2m hasolds the head of the list and the np2ms are 
      * threaded on in LRU order. */
-    struct list_head np2m_list; 
+    struct list_head   np2m_list; 
 
 
     /* Host p2m: when this flag is set, don't flush all the nested-p2m 
@@ -282,11 +282,11 @@ struct p2m_domain {
 /* get host p2m table */
 #define p2m_get_hostp2m(d)      ((d)->arch.p2m)
 
-/* Get p2m table (re)usable for specified cr3.
+/* Get p2m table (re)usable for specified np2m base.
  * Automatically destroys and re-initializes a p2m if none found.
- * If cr3 == 0 then v->arch.hvm_vcpu.guest_cr[3] is used.
+ * If np2m_base == 0 then v->arch.hvm_vcpu.guest_cr[3] is used.
  */
-struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3);
+struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
 
 /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m().
  * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 
1.7.1
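
The renamed key drives nested p2m lookup in p2m_get_nestedp2m(). The function first masks off the low 12 bits of np2m_base. It then reuses the vCPU's cached table if that table already shadows the base or is unassigned (P2M_BASE_EADDR). Otherwise it flushes and recycles the least-recently-used table.

A simplified sketch of that policy follows. struct toy_np2m, toy_domain, toy_vcpu and toy_get_nestedp2m() are hypothetical. The sketch uses round-robin instead of a real LRU list and leaves out locking, table flushing and ASID flushes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P2M_BASE_EADDR (~0ULL)   /* table is not shadowing any base */
#define NR_NP2M        4

/* Hypothetical stand-ins: only the lookup key is modeled. */
struct toy_np2m   { uint64_t np2m_base; };
struct toy_domain { struct toy_np2m pool[NR_NP2M]; size_t next_victim; };
struct toy_vcpu   { struct toy_np2m *cached; };

struct toy_np2m *toy_get_nestedp2m(struct toy_domain *d, struct toy_vcpu *v,
                                   uint64_t np2m_base)
{
    struct toy_np2m *p2m = v->cached;

    /* Mask out the low bits; a real base can then never equal the sentinel. */
    np2m_base &= ~0xfffULL;
    assert(np2m_base != P2M_BASE_EADDR);

    /* Reuse the vCPU's table if it shadows this base or is unassigned. */
    if ( p2m && (p2m->np2m_base == np2m_base ||
                 p2m->np2m_base == P2M_BASE_EADDR) )
    {
        p2m->np2m_base = np2m_base;
        return p2m;
    }

    /* Otherwise recycle a table (round-robin here instead of true LRU);
     * the real code flushes its entries before reuse. */
    p2m = &d->pool[d->next_victim];
    d->next_victim = (d->next_victim + 1) % NR_NP2M;
    p2m->np2m_base = np2m_base;
    v->cached = p2m;
    return p2m;
}
```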


* [PATCH v2 02/10] nestedhap: Change nested p2m's walker to vendor-specific
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 03/10] nested_ept: Implement guest ept's walker Xiantao Zhang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

EPT and NPT use different entry formats at each paging level,
so make the walker functions vendor-specific.
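
To make the format difference concrete:
- An EPT entry has no present bit. It carries read, write and execute permissions in bits 0-2.
- An NPT entry uses the ordinary x86-64 page-table layout: a present bit (bit 0), a writable bit (bit 1) and a no-execute bit (bit 63).

The sketch below decodes a single entry of each kind. struct toy_perms and both decoders are hypothetical. They ignore the U/S bit, reserved-bit checks, CPU capability checks and how permissions combine across levels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of one page-table entry. */
struct toy_perms {
    int present, readable, writable, executable, misconfig;
};

/* EPT: bit 0 = read, bit 1 = write, bit 2 = execute. There is no
 * separate present bit: R=W=X=0 means not present, and a write-only
 * entry (W=1, R=0) is an EPT misconfiguration. Execute-only entries are
 * treated as valid here, which assumes the CPU supports them. */
struct toy_perms decode_ept_entry(uint64_t e)
{
    struct toy_perms p;

    p.readable   = !!(e & (1ULL << 0));
    p.writable   = !!(e & (1ULL << 1));
    p.executable = !!(e & (1ULL << 2));
    p.present    = p.readable || p.writable || p.executable;
    p.misconfig  = p.writable && !p.readable;
    return p;
}

/* NPT: ordinary x86-64 layout. Bit 0 = present, bit 1 = writable,
 * bit 63 = no-execute (assuming EFER.NXE is set). */
struct toy_perms decode_npt_entry(uint64_t e)
{
    struct toy_perms p;

    p.present    = !!(e & (1ULL << 0));
    p.readable   = p.present;            /* present implies readable */
    p.writable   = p.present && (e & (1ULL << 1));
    p.executable = p.present && !(e & (1ULL << 63));
    p.misconfig  = 0;                    /* reserved-bit checks omitted */
    assert(!p.writable || p.readable);   /* no write-only encoding exists */
    return p;
}
```

For example, decode_ept_entry(0x4) yields an execute-only mapping, which has no NPT equivalent. decode_ept_entry(0x2) is flagged as a misconfiguration. This is one reason the SVM approach of walking the nested table "just as if it were a pagetable" cannot be reused for EPT.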

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/svm/nestedsvm.c        |   31 +++++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c              |    1 +
 xen/arch/x86/hvm/vmx/vmx.c              |    3 +-
 xen/arch/x86/hvm/vmx/vvmx.c             |   13 +++++++++
 xen/arch/x86/mm/hap/nested_hap.c        |   46 +++++++++++--------------------
 xen/include/asm-x86/hvm/hvm.h           |    5 +++
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    3 ++
 xen/include/asm-x86/hvm/vmx/vvmx.h      |    5 +++
 8 files changed, 76 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index ed0faa6..5dcb354 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1171,6 +1171,37 @@ nsvm_vmcb_hap_enabled(struct vcpu *v)
     return vcpu_nestedsvm(v).ns_hap_enabled;
 }
 
+/* This function uses L2_gpa to walk the P2M page table in L1. If the 
+ * walk is successful, the translated value is returned in
+ * L1_gpa. The result value tells what to do next.
+ */
+int
+nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x)
+{
+    uint32_t pfec;
+    unsigned long nested_cr3, gfn;
+    
+    nested_cr3 = nhvm_vcpu_p2m_base(v);
+
+    pfec = PFEC_user_mode | PFEC_page_present;
+    if (access_w)
+        pfec |= PFEC_write_access;
+    if (access_x)
+        pfec |= PFEC_insn_fetch;
+
+    /* Walk the guest-supplied NPT table, just as if it were a pagetable */
+    gfn = paging_ga_to_gfn_cr3(v, nested_cr3, L2_gpa, &pfec, page_order);
+
+    if ( gfn == INVALID_GFN ) 
+        return NESTEDHVM_PAGEFAULT_INJECT;
+
+    *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
+    return NESTEDHVM_PAGEFAULT_DONE;
+}
+
+
 enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
 {
     struct nestedsvm *svm = &vcpu_nestedsvm(v);
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 2c8504a..acd2d49 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2008,6 +2008,7 @@ static struct hvm_function_table __read_mostly svm_function_table = {
     .nhvm_vmcx_guest_intercepts_trap = nsvm_vmcb_guest_intercepts_trap,
     .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
     .nhvm_intr_blocked = nsvm_intr_blocked,
+    .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
 };
 
 void svm_vmexit_handler(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 98309da..4abfa90 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1511,7 +1511,8 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
     .nhvm_intr_blocked    = nvmx_intr_blocked,
     .nhvm_domain_relinquish_resources = nvmx_domain_relinquish_resources,
     .update_eoi_exit_bitmap = vmx_update_eoi_exit_bitmap,
-    .virtual_intr_delivery_enabled = vmx_virtual_intr_delivery_enabled
+    .virtual_intr_delivery_enabled = vmx_virtual_intr_delivery_enabled,
+    .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
 };
 
 struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 6d1a736..4495dd6 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1445,6 +1445,19 @@ int nvmx_msr_write_intercept(unsigned int msr, u64 msr_content)
     return 1;
 }
 
+/* This function uses L2_gpa to walk the P2M page table in L1. If the 
+ * walk is successful, the translated value is returned in
+ * L1_gpa. The result value tells what to do next.
+ */
+int
+nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x)
+{
+    /*TODO:*/
+    return 0;
+}
+
 void nvmx_idtv_handling(void)
 {
     struct vcpu *v = current;
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index f9a5edc..8787c91 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -136,6 +136,22 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
     }
 }
 
+/* This function uses L2_gpa to walk the P2M page table in L1. If the 
+ * walk is successful, the translated value is returned in
+ * L1_gpa. The result value tells what to do next.
+ */
+static int
+nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x)
+{
+    ASSERT(hvm_funcs.nhvm_hap_walk_L1_p2m);
+
+    return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order,
+        access_r, access_w, access_x);
+}
+
+
 /* This function uses L1_gpa to walk the P2M table in L0 hypervisor. If the
  * walk is successful, the translated value is returned in L0_gpa. The return 
  * value tells the upper level what to do.
@@ -175,36 +191,6 @@ out:
     return rc;
 }
 
-/* This function uses L2_gpa to walk the P2M page table in L1. If the 
- * walk is successful, the translated value is returned in
- * L1_gpa. The result value tells what to do next.
- */
-static int
-nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
-                      bool_t access_r, bool_t access_w, bool_t access_x)
-{
-    uint32_t pfec;
-    unsigned long nested_cr3, gfn;
-    
-    nested_cr3 = nhvm_vcpu_p2m_base(v);
-
-    pfec = PFEC_user_mode | PFEC_page_present;
-    if (access_w)
-        pfec |= PFEC_write_access;
-    if (access_x)
-        pfec |= PFEC_insn_fetch;
-
-    /* Walk the guest-supplied NPT table, just as if it were a pagetable */
-    gfn = paging_ga_to_gfn_cr3(v, nested_cr3, L2_gpa, &pfec, page_order);
-
-    if ( gfn == INVALID_GFN ) 
-        return NESTEDHVM_PAGEFAULT_INJECT;
-
-    *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
-    return NESTEDHVM_PAGEFAULT_DONE;
-}
-
 /*
  * The following function, nestedhap_page_fault(), is for steps (3)--(10).
  *
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index d3535b6..80f07e9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -183,6 +183,11 @@ struct hvm_function_table {
     /* Virtual interrupt delivery */
     void (*update_eoi_exit_bitmap)(struct vcpu *v, u8 vector, u8 trig);
     int (*virtual_intr_delivery_enabled)(void);
+
+    /*Walk nested p2m  */
+    int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x);
 };
 
 extern struct hvm_function_table hvm_funcs;
diff --git a/xen/include/asm-x86/hvm/svm/nestedsvm.h b/xen/include/asm-x86/hvm/svm/nestedsvm.h
index fa83023..0c90f30 100644
--- a/xen/include/asm-x86/hvm/svm/nestedsvm.h
+++ b/xen/include/asm-x86/hvm/svm/nestedsvm.h
@@ -133,6 +133,9 @@ int nsvm_wrmsr(struct vcpu *v, unsigned int msr, uint64_t msr_content);
 void svm_vmexit_do_clgi(struct cpu_user_regs *regs, struct vcpu *v);
 void svm_vmexit_do_stgi(struct cpu_user_regs *regs, struct vcpu *v);
 bool_t nestedsvm_gif_isset(struct vcpu *v);
+int nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x);
 
 #define NSVM_INTR_NOTHANDLED     3
 #define NSVM_INTR_NOTINTERCEPTED 2
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index d97011d..422f006 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -108,6 +108,11 @@ void nvmx_domain_relinquish_resources(struct domain *d);
 
 int nvmx_handle_vmxon(struct cpu_user_regs *regs);
 int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
+
+int
+nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x);
 /*
  * Virtual VMCS layout
  *
-- 
1.7.1

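Patch 02 moves the nested p2m walk behind a new hvm_function_table hook (nhvm_hap_walk_L1_p2m), so the generic nested HAP code can stay vendor-neutral while nsvm_hap_walk_L1_p2m() and nvmx_hap_walk_L1_p2m() each turn their own walk result into a NESTEDHVM_PAGEFAULT_* code. A rough sketch of that shape follows. It is not Xen code: the toy walker, the trimmed function table and all enum names are hypothetical stand-ins for the real constants in nestedhvm.h and vvmx.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for NESTEDHVM_PAGEFAULT_* and EPT_TRANSLATE_*. */
enum pf_result { PF_DONE, PF_INJECT, PF_L1_ERROR, PF_L0_ERROR, PF_RETRY };
enum ept_result { EPT_OK, EPT_VIOLATION, EPT_ERR_PAGE, EPT_MISCONFIG,
                  EPT_RETRY_WALK };

/* Toy vendor walker: L1 maps every L2 frame 0x100 frames higher,
 * frame 0x7 is "paged out", and frames >= 0x1000 are not mapped. */
static enum ept_result toy_ept_walk(uint64_t l2_gpa, uint64_t *l1_gfn)
{
    uint64_t gfn = l2_gpa >> 12;

    if ( gfn == 0x7 )
        return EPT_RETRY_WALK;
    if ( gfn >= 0x1000 )
        return EPT_VIOLATION;
    *l1_gfn = gfn + 0x100;
    return EPT_OK;
}

/* Vendor hook shaped like nvmx_hap_walk_L1_p2m(): map the vendor result
 * onto the generic nested page-fault result. */
static int toy_vmx_walk_L1_p2m(uint64_t l2_gpa, uint64_t *l1_gpa)
{
    uint64_t gfn = 0;

    assert(l1_gpa);
    switch ( toy_ept_walk(l2_gpa, &gfn) )
    {
    case EPT_OK:
        *l1_gpa = (gfn << 12) | (l2_gpa & 0xfff);
        return PF_DONE;
    case EPT_VIOLATION:
    case EPT_MISCONFIG:
        return PF_INJECT;
    case EPT_RETRY_WALK:
        return PF_RETRY;
    case EPT_ERR_PAGE:
        return PF_L1_ERROR;
    }
    return PF_L0_ERROR;   /* unknown vendor result: fail explicitly */
}

/* Hypothetical, trimmed-down function table holding only the new hook. */
struct toy_hvm_funcs {
    int (*nhvm_hap_walk_L1_p2m)(uint64_t l2_gpa, uint64_t *l1_gpa);
};

static const struct toy_hvm_funcs toy_funcs = {
    .nhvm_hap_walk_L1_p2m = toy_vmx_walk_L1_p2m,
};
```

One difference worth noting: the sketch returns an explicit error for an unrecognized walker result, whereas the default branch of nvmx_hap_walk_L1_p2m() in patch 03 only logs and then returns the raw EPT_TRANSLATE_* value in rc.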

* [PATCH v2 03/10] nested_ept: Implement guest ept's walker
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 02/10] nestedhap: Change nested p2m's walker to vendor-specific Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  2012-12-19 16:42   ` Nakajima, Jun
  2012-12-19 19:44 ` [PATCH v2 04/10] EPT: Make ept data structure or operations neutral Xiantao Zhang
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

Implement the guest EPT page-table walker; some of the logic is based
on shadow's ia32e PT walker. During the walk, if a target page is not
in memory (e.g. paged out), return RETRY so that the page can be
brought back in before the access is retried.
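
A minimal sketch of the walk described above, in plain C outside Xen and under stated assumptions: L1's memory is a toy array of 512-entry tables (toy_mem, toy_paged_out and the WALK_* codes are hypothetical, not Xen APIs), and the misconfiguration checks are left out. It reuses the level-index arithmetic of ept_lvl_table_offset(), synthesizes a 4K-granular gfn for 2M/1G superpages the way nept_walk_tables() does, and returns RETRY when a table page is not resident, which in the patch corresponds to map_domain_gfn() reporting _PAGE_PAGED.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes standing in for EPT_TRANSLATE_*. */
enum { WALK_SUCCEED, WALK_VIOLATION, WALK_ERR_PAGE, WALK_RETRY };

#define TOY_PAGE_SHIFT 12
#define TOY_ENTRIES    512
#define TOY_RWX_MASK   0x7ULL
#define TOY_SP_BIT     (1ULL << 7)              /* superpage bit */
#define TOY_ADDR_MASK  0x000ffffffffff000ULL    /* frame bits 51:12 */

/* Toy L1 "guest memory": frame n holds one 512-entry table. */
#define TOY_FRAMES 8
static uint64_t toy_mem[TOY_FRAMES][TOY_ENTRIES];
static int toy_paged_out[TOY_FRAMES];  /* nonzero: frame not resident */

static void toy_reset(void)
{
    memset(toy_mem, 0, sizeof(toy_mem));
    memset(toy_paged_out, 0, sizeof(toy_paged_out));
}

/* Same arithmetic as ept_lvl_table_offset(): 9 index bits per level,
 * starting at bit 39 for level 4. */
static unsigned int toy_lvl_offset(uint64_t gpa, int lvl)
{
    return (unsigned int)(gpa >> (39 - (4 - lvl) * 9)) & (TOY_ENTRIES - 1);
}

static int toy_walk(uint64_t root_gfn, uint64_t gpa,
                    uint64_t *gfn_out, unsigned int *order_out)
{
    uint64_t base = root_gfn;

    assert(gfn_out && order_out);
    for ( int lvl = 4; lvl > 0; lvl-- )
    {
        if ( base >= TOY_FRAMES )
            return WALK_ERR_PAGE;      /* table frame not backed by memory */
        if ( toy_paged_out[base] )
            return WALK_RETRY;         /* let the page come back, then retry */

        uint64_t e = toy_mem[base][toy_lvl_offset(gpa, lvl)];
        if ( !(e & TOY_RWX_MASK) )
            return WALK_VIOLATION;     /* non-present entry */

        uint64_t frame = (e & TOY_ADDR_MASK) >> TOY_PAGE_SHIFT;
        if ( (lvl == 2 || lvl == 3) && (e & TOY_SP_BIT) )
        {
            /* Superpage: add the 4K index within the 2M/1G page. */
            unsigned int order = (unsigned int)(lvl - 1) * 9;
            uint64_t mask = (1ULL << order) - 1;
            *gfn_out = (frame & ~mask) | ((gpa >> TOY_PAGE_SHIFT) & mask);
            *order_out = order;
            return WALK_SUCCEED;
        }
        base = frame;
    }
    *gfn_out = base;                   /* frame from the level-1 entry */
    *order_out = 0;
    return WALK_SUCCEED;
}
```

For example, with L4[0] -> L3[1] -> L2[1] -> L1[1] pointing at gfn 0x55, toy_walk(0, 0x40201000, ...) yields gfn 0x55 with order 0; marking the level-1 table frame as paged out turns the same call into WALK_RETRY.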

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c              |    1 +
 xen/arch/x86/hvm/vmx/vvmx.c         |   42 +++++-
 xen/arch/x86/mm/guest_walk.c        |   16 ++-
 xen/arch/x86/mm/hap/Makefile        |    1 +
 xen/arch/x86/mm/hap/nested_ept.c    |  276 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c    |    2 +-
 xen/arch/x86/mm/shadow/multi.c      |    2 +-
 xen/include/asm-x86/guest_pt.h      |    8 +
 xen/include/asm-x86/hvm/nestedhvm.h |    1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |    1 +
 xen/include/asm-x86/hvm/vmx/vmx.h   |   28 ++++
 xen/include/asm-x86/hvm/vmx/vvmx.h  |   14 ++
 12 files changed, 382 insertions(+), 10 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/nested_ept.c

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1cae8a8..3cd0075 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1324,6 +1324,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
                                              access_r, access_w, access_x);
         switch (rv) {
         case NESTEDHVM_PAGEFAULT_DONE:
+        case NESTEDHVM_PAGEFAULT_RETRY:
             return 1;
         case NESTEDHVM_PAGEFAULT_L1_ERROR:
             /* An error occured while translating gpa from
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 4495dd6..76cf757 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -906,9 +906,18 @@ static void sync_vvmcs_ro(struct vcpu *v)
 {
     int i;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+    void *vvmcs = nvcpu->nv_vvmcx;
 
     for ( i = 0; i < ARRAY_SIZE(vmcs_ro_field); i++ )
         shadow_to_vvmcs(nvcpu->nv_vvmcx, vmcs_ro_field[i]);
+
+    /* Adjust exit_reason/exit_qualification for the EPT violation case */
+    if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) ==
+                EXIT_REASON_EPT_VIOLATION ) {
+        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual);
+        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason);
+    }
 }
 
 static void load_vvmcs_host_state(struct vcpu *v)
@@ -1454,8 +1463,37 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
                       unsigned int *page_order,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
-    /*TODO:*/
-    return 0;
+    uint64_t exit_qual = __vmread(EXIT_QUALIFICATION);
+    uint32_t exit_reason = EXIT_REASON_EPT_VIOLATION;
+    int rc;
+    unsigned long gfn;
+    uint32_t rwx_rights = (access_x << 2) | (access_w << 1) | access_r;
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+
+    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn,
+                                &exit_qual, &exit_reason);
+    switch ( rc ) {
+        case EPT_TRANSLATE_SUCCEED:
+            *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
+            rc = NESTEDHVM_PAGEFAULT_DONE;
+            break;
+        case EPT_TRANSLATE_VIOLATION:
+        case EPT_TRANSLATE_MISCONFIG:
+            rc = NESTEDHVM_PAGEFAULT_INJECT;
+            nvmx->ept_exit.exit_reason = exit_reason;
+            nvmx->ept_exit.exit_qual = exit_qual;
+            break;
+        case EPT_TRANSLATE_RETRY:
+            rc = NESTEDHVM_PAGEFAULT_RETRY;
+            break;
+        case EPT_TRANSLATE_ERR_PAGE:
+            rc = NESTEDHVM_PAGEFAULT_L1_ERROR;
+            break;
+        default:
+            gdprintk(XENLOG_ERR, "GUEST EPT translation error!\n");
+    }
+
+    return rc;
 }
 
 void nvmx_idtv_handling(void)
diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 0f08fb0..1c165c6 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -88,18 +88,19 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
 
 /* If the map is non-NULL, we leave this function having 
  * acquired an extra ref on mfn_to_page(*mfn) */
-static inline void *map_domain_gfn(struct p2m_domain *p2m,
-                                   gfn_t gfn, 
+void *map_domain_gfn(struct p2m_domain *p2m,
+                                   gfn_t gfn,
                                    mfn_t *mfn,
                                    p2m_type_t *p2mt,
-                                   uint32_t *rc) 
+                                   p2m_query_t q,
+                                   uint32_t *rc)
 {
     struct page_info *page;
     void *map;
 
     /* Translate the gfn, unsharing if shared */
     page = get_page_from_gfn_p2m(p2m->domain, p2m, gfn_x(gfn), p2mt, NULL,
-                                  P2M_ALLOC | P2M_UNSHARE);
+                                  q);
     if ( p2m_is_paging(*p2mt) )
     {
         ASSERT(!p2m_is_nestedp2m(p2m));
@@ -128,7 +129,6 @@ static inline void *map_domain_gfn(struct p2m_domain *p2m,
     return map;
 }
 
-
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -149,6 +149,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     uint32_t gflags, mflags, iflags, rc = 0;
     int smep;
     bool_t pse1G = 0, pse2M = 0;
+    p2m_query_t qt = P2M_ALLOC | P2M_UNSHARE;
 
     perfc_incr(guest_walk);
     memset(gw, 0, sizeof(*gw));
@@ -188,7 +189,8 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     l3p = map_domain_gfn(p2m, 
                          guest_l4e_get_gfn(gw->l4e), 
                          &gw->l3mfn,
-                         &p2mt, 
+                         &p2mt,
+                         qt, 
                          &rc); 
     if(l3p == NULL)
         goto out;
@@ -249,6 +251,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                          guest_l3e_get_gfn(gw->l3e), 
                          &gw->l2mfn,
                          &p2mt, 
+                         qt,
                          &rc); 
     if(l2p == NULL)
         goto out;
@@ -322,6 +325,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                              guest_l2e_get_gfn(gw->l2e), 
                              &gw->l1mfn,
                              &p2mt,
+                             qt,
                              &rc);
         if(l1p == NULL)
             goto out;
diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
index 80a6bec..68f2bb5 100644
--- a/xen/arch/x86/mm/hap/Makefile
+++ b/xen/arch/x86/mm/hap/Makefile
@@ -3,6 +3,7 @@ obj-y += guest_walk_2level.o
 obj-y += guest_walk_3level.o
 obj-$(x86_64) += guest_walk_4level.o
 obj-y += nested_hap.o
+obj-y += nested_ept.o
 
 guest_walk_%level.o: guest_walk.c Makefile
 	$(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
new file mode 100644
index 0000000..5f80d82
--- /dev/null
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -0,0 +1,276 @@
+/*
+ * nested_ept.c: Handling virtualized EPT for the guest in the nested case.
+ *
+ * Copyright (c) 2012, Intel Corporation
+ *  Xiantao Zhang <xiantao.zhang@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+#include <asm/domain.h>
+#include <asm/page.h>
+#include <asm/paging.h>
+#include <asm/p2m.h>
+#include <asm/mem_event.h>
+#include <public/mem_event.h>
+#include <asm/mem_sharing.h>
+#include <xen/event.h>
+#include <asm/hap.h>
+#include <asm/hvm/support.h>
+
+#include <asm/hvm/nestedhvm.h>
+
+#include "private.h"
+
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vvmx.h>
+
+/* EPT always uses a 4-level paging structure */
+#define GUEST_PAGING_LEVELS 4
+#include <asm/guest_pt.h>
+
+/* Reserved bits that must be zero in entries at all levels */
+#define EPT_MUST_RSV_BITS (((1ull << PADDR_BITS) -1) & \
+                     ~((1ull << paddr_bits) - 1))
+
+/*
+ *TODO: Just leave it as 0 here for compile pass, will
+ * define real capabilities in the subsequent patches.
+ */
+#define NEPT_VPID_CAP_BITS 0
+
+
+#define NEPT_1G_ENTRY_FLAG (1 << 11)
+#define NEPT_2M_ENTRY_FLAG (1 << 10)
+#define NEPT_4K_ENTRY_FLAG (1 << 9)
+
+bool_t nept_sp_entry(ept_entry_t e)
+{
+    return !!(e.sp);
+}
+
+static bool_t nept_rsv_bits_check(ept_entry_t e, uint32_t level)
+{
+    uint64_t rsv_bits = EPT_MUST_RSV_BITS;
+
+    switch ( level ) {
+    case 1:
+        break;
+    case 2 ... 3:
+        if (nept_sp_entry(e))
+            rsv_bits |=  ((1ull << (9 * (level -1 ))) -1) << PAGE_SHIFT;
+        else
+            rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK;
+        break;
+    case 4:
+        rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK | EPTE_SUPER_PAGE_MASK;
+        break;
+    default:
+        printk("Unsupported EPT paging level: %d\n", level);
+    }
+    return !!(e.epte & rsv_bits);
+}
+
+/* EMT checking */
+static bool_t nept_emt_bits_check(ept_entry_t e, uint32_t level)
+{
+    if ( e.sp || level == 1 ) {
+        if ( e.emt == EPT_EMT_RSV0 || e.emt == EPT_EMT_RSV1 ||
+                e.emt == EPT_EMT_RSV2 )
+            return 1;
+    }
+    return 0;
+}
+
+static bool_t nept_rwx_bits_check(ept_entry_t e) {
+    /* Write-only and write/execute-only entries are misconfigured. */
+    uint8_t rwx_bits = e.epte & EPTE_RWX_MASK;
+
+    if ( rwx_bits == ept_access_w || rwx_bits == ept_access_wx )
+        return 1;
+
+    if ( rwx_bits == ept_access_x && !(NEPT_VPID_CAP_BITS &
+                        VMX_EPT_EXEC_ONLY_SUPPORTED))
+        return 1;
+
+    return 0;
+}
+
+/* nept's misconfiguration check */
+static bool_t nept_misconfiguration_check(ept_entry_t e, uint32_t level)
+{
+    return (nept_rsv_bits_check(e, level) ||
+                nept_emt_bits_check(e, level) ||
+                nept_rwx_bits_check(e));
+}
+
+static bool_t nept_permission_check(uint32_t rwx_acc, uint32_t rwx_bits)
+{
+    return !(EPTE_RWX_MASK & rwx_acc & ~rwx_bits);
+}
+
+/* nept's non-present check */
+static bool_t nept_non_present_check(ept_entry_t e)
+{
+    if (e.epte & EPTE_RWX_MASK)
+        return 0;
+    return 1;
+}
+
+uint64_t nept_get_ept_vpid_cap(void)
+{
+    return NEPT_VPID_CAP_BITS;
+}
+
+static int ept_lvl_table_offset(unsigned long gpa, int lvl)
+{
+    return (gpa >>(EPT_L4_PAGETABLE_SHIFT -(4 - lvl) * 9)) &
+                (EPT_PAGETABLE_ENTRIES -1 );
+}
+
+static uint32_t
+nept_walk_tables(struct vcpu *v, unsigned long l2ga, ept_walk_t *gw)
+{
+    int lvl;
+    p2m_type_t p2mt;
+    uint32_t rc = 0, ret = 0, gflags;
+    struct domain *d = v->domain;
+    struct p2m_domain *p2m = d->arch.p2m;
+    gfn_t base_gfn = _gfn(nhvm_vcpu_p2m_base(v) >> PAGE_SHIFT);
+    mfn_t lxmfn;
+    ept_entry_t *lxp = NULL;
+
+    memset(gw, 0, sizeof(*gw));
+
+    for (lvl = 4; lvl > 0; lvl--)
+    {
+        lxp = map_domain_gfn(p2m, base_gfn, &lxmfn, &p2mt, P2M_ALLOC, &rc);
+        if ( !lxp )
+            goto map_err;
+        gw->lxe[lvl] = lxp[ept_lvl_table_offset(l2ga, lvl)];
+        unmap_domain_page(lxp);
+        put_page(mfn_to_page(mfn_x(lxmfn)));
+
+        if (nept_non_present_check(gw->lxe[lvl]))
+            goto non_present;
+
+        if (nept_misconfiguration_check(gw->lxe[lvl], lvl))
+            goto misconfig_err;
+
+        if ( (lvl == 2 || lvl == 3) && nept_sp_entry(gw->lxe[lvl]) )
+        {
+            /* Generate a fake l1 table entry so callers don't all
+             * have to understand superpages. */
+            unsigned long gfn_lvl_mask =  (1ull << ((lvl - 1) * 9)) - 1;
+            gfn_t start = _gfn(gw->lxe[lvl].mfn);
+            /* Increment the pfn by the right number of 4k pages. */
+            start = _gfn((gfn_x(start) & ~gfn_lvl_mask) +
+                     ((l2ga >> PAGE_SHIFT) & gfn_lvl_mask));
+            gflags = (gw->lxe[lvl].epte & EPTE_FLAG_MASK) |
+                    (lvl == 3 ? NEPT_1G_ENTRY_FLAG: NEPT_2M_ENTRY_FLAG);
+            gw->lxe[0].epte = (gfn_x(start) << PAGE_SHIFT) | gflags;
+            goto done;
+        }
+        if ( lvl > 1 )
+            base_gfn = _gfn(gw->lxe[lvl].mfn);
+    }
+
+    /* Only reached for a non-superpage (4K) leaf entry. */
+    gflags = (gw->lxe[1].epte & EPTE_FLAG_MASK) | NEPT_4K_ENTRY_FLAG;
+    gw->lxe[0].epte = (gw->lxe[1].epte & PAGE_MASK) | gflags;
+
+done:
+    ret = EPT_TRANSLATE_SUCCEED;
+    goto out;
+
+map_err:
+    if ( rc == _PAGE_PAGED )
+        ret = EPT_TRANSLATE_RETRY;
+    else
+        ret = EPT_TRANSLATE_ERR_PAGE;
+    goto out;
+
+misconfig_err:
+    ret =  EPT_TRANSLATE_MISCONFIG;
+    goto out;
+
+non_present:
+    ret = EPT_TRANSLATE_VIOLATION;
+    /* fall through. */
+out:
+    return ret;
+}
+
+/* Translate an L2 guest address to an L1 gpa via L1's EPT paging structures */
+
+int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
+                        unsigned int *page_order, uint32_t rwx_acc,
+                        unsigned long *l1gfn, uint64_t *exit_qual,
+                        uint32_t *exit_reason)
+{
+    uint32_t rc, rwx_bits = 0;
+    ept_walk_t gw;
+    rwx_acc &= EPTE_RWX_MASK;
+
+    *l1gfn = INVALID_GFN;
+
+    rc = nept_walk_tables(v, l2ga, &gw);
+    switch ( rc ) {
+    case EPT_TRANSLATE_SUCCEED:
+        if ( likely(gw.lxe[0].epte & NEPT_2M_ENTRY_FLAG) )
+        {
+            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte & gw.lxe[2].epte &
+                            EPTE_RWX_MASK;
+            *page_order = 9;
+        }
+        else if ( gw.lxe[0].epte & NEPT_4K_ENTRY_FLAG ) {
+            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte & gw.lxe[2].epte &
+                    gw.lxe[1].epte & EPTE_RWX_MASK;
+            *page_order = 0;
+        }
+        else if ( gw.lxe[0].epte & NEPT_1G_ENTRY_FLAG  )
+        {
+            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte  & EPTE_RWX_MASK;
+            *page_order = 18;
+        }
+        else
+        {
+            gdprintk(XENLOG_ERR, "Incorrect l1 entry!\n");
+            BUG();
+        }
+        if ( nept_permission_check(rwx_acc, rwx_bits) )
+        {
+            *l1gfn = gw.lxe[0].mfn;
+            break;
+        }
+        rc = EPT_TRANSLATE_VIOLATION;
+    /* Fall through to EPT violation if permission check fails. */
+    case EPT_TRANSLATE_VIOLATION:
+        *exit_qual = (*exit_qual & 0xffffffc0) | (rwx_bits << 3) | rwx_acc;
+        *exit_reason = EXIT_REASON_EPT_VIOLATION;
+        break;
+
+    case EPT_TRANSLATE_ERR_PAGE:
+        break;
+    case EPT_TRANSLATE_MISCONFIG:
+        rc = EPT_TRANSLATE_MISCONFIG;
+        *exit_qual = 0;
+        *exit_reason = EXIT_REASON_EPT_MISCONFIG;
+        break;
+    case EPT_TRANSLATE_RETRY:
+        break;
+    default:
+        gdprintk(XENLOG_ERR, "Unsupported ept translation type!:%d\n", rc);
+    }
+    return rc;
+}
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 8787c91..6d1264b 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -217,7 +217,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     /* let caller to handle these two cases */
     switch (rv) {
     case NESTEDHVM_PAGEFAULT_INJECT:
-        return rv;
+    case NESTEDHVM_PAGEFAULT_RETRY:
     case NESTEDHVM_PAGEFAULT_L1_ERROR:
         return rv;
     case NESTEDHVM_PAGEFAULT_DONE:
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 4967da1..409198c 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4582,7 +4582,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v,
     /* Translate the GFN to an MFN */
     ASSERT(!paging_locked_by_me(v->domain));
     mfn = get_gfn(v->domain, _gfn(gfn), &p2mt);
-        
+
     if ( p2m_is_readonly(p2mt) )
     {
         put_gfn(v->domain, gfn);
diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 4e1dda0..db8a0b6 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -315,6 +315,14 @@ guest_walk_to_page_order(walk_t *gw)
 #define GPT_RENAME2(_n, _l) _n ## _ ## _l ## _levels
 #define GPT_RENAME(_n, _l) GPT_RENAME2(_n, _l)
 #define guest_walk_tables GPT_RENAME(guest_walk_tables, GUEST_PAGING_LEVELS)
+#define map_domain_gfn GPT_RENAME(map_domain_gfn, GUEST_PAGING_LEVELS)
+
+extern void *map_domain_gfn(struct p2m_domain *p2m,
+                                   gfn_t gfn,
+                                   mfn_t *mfn,
+                                   p2m_type_t *p2mt,
+                                   p2m_query_t q,
+                                   uint32_t *rc);
 
 extern uint32_t 
 guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, unsigned long va,
diff --git a/xen/include/asm-x86/hvm/nestedhvm.h b/xen/include/asm-x86/hvm/nestedhvm.h
index 91fde0b..649c511 100644
--- a/xen/include/asm-x86/hvm/nestedhvm.h
+++ b/xen/include/asm-x86/hvm/nestedhvm.h
@@ -52,6 +52,7 @@ bool_t nestedhvm_vcpu_in_guestmode(struct vcpu *v);
 #define NESTEDHVM_PAGEFAULT_L1_ERROR   2
 #define NESTEDHVM_PAGEFAULT_L0_ERROR   3
 #define NESTEDHVM_PAGEFAULT_MMIO       4
+#define NESTEDHVM_PAGEFAULT_RETRY      5
 int nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     bool_t access_r, bool_t access_w, bool_t access_x);
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index ef2c9c9..9a728b6 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -194,6 +194,7 @@ extern u32 vmx_secondary_exec_control;
 
 extern bool_t cpu_has_vmx_ins_outs_instr_info;
 
+#define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
 #define VMX_EPT_WALK_LENGTH_4_SUPPORTED         0x00000040
 #define VMX_EPT_MEMORY_TYPE_UC                  0x00000100
 #define VMX_EPT_MEMORY_TYPE_WB                  0x00004000
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index aa5b080..feaaa80 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -51,6 +51,11 @@ typedef union {
     u64 epte;
 } ept_entry_t;
 
+typedef struct {
+    /* lxe[1..4] hold the entry read at each level; lxe[0] holds the result */
+    ept_entry_t lxe[5];
+} ept_walk_t;
+
 #define EPT_TABLE_ORDER         9
 #define EPTE_SUPER_PAGE_MASK    0x80
 #define EPTE_MFN_MASK           0xffffffffff000ULL
@@ -60,6 +65,28 @@ typedef union {
 #define EPTE_AVAIL1_SHIFT       8
 #define EPTE_EMT_SHIFT          3
 #define EPTE_IGMT_SHIFT         6
+#define EPTE_RWX_MASK           0x7
+#define EPTE_FLAG_MASK          0x7f
+
+#define EPT_EMT_UC              0
+#define EPT_EMT_WC              1
+#define EPT_EMT_RSV0            2
+#define EPT_EMT_RSV1            3
+#define EPT_EMT_WT              4
+#define EPT_EMT_WP              5
+#define EPT_EMT_WB              6
+#define EPT_EMT_RSV2            7
+
+typedef enum {
+    ept_access_n     = 0, /* No access permissions allowed */
+    ept_access_r     = 1,
+    ept_access_w     = 2,
+    ept_access_rw    = 3,
+    ept_access_x     = 4,
+    ept_access_rx    = 5,
+    ept_access_wx    = 6,
+    ept_access_all   = 7,
+} ept_access_t;
 
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
@@ -419,6 +446,7 @@ void update_guest_eip(void);
 #define _EPT_GLA_FAULT              8
 #define EPT_GLA_FAULT               (1UL<<_EPT_GLA_FAULT)
 
+#define EPT_L4_PAGETABLE_SHIFT      39
 #define EPT_PAGETABLE_ENTRIES       512
 
 #endif /* __ASM_X86_HVM_VMX_VMX_H__ */
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 422f006..8eb377b 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -32,6 +32,10 @@ struct nestedvmx {
         unsigned long intr_info;
         u32           error_code;
     } intr;
+    struct {
+        uint32_t exit_reason;
+        uint32_t exit_qual;
+    } ept_exit;
 };
 
 #define vcpu_2_nvmx(v)	(vcpu_nestedhvm(v).u.nvmx)
@@ -109,6 +113,12 @@ void nvmx_domain_relinquish_resources(struct domain *d);
 int nvmx_handle_vmxon(struct cpu_user_regs *regs);
 int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
 
+#define EPT_TRANSLATE_SUCCEED   0
+#define EPT_TRANSLATE_VIOLATION 1
+#define EPT_TRANSLATE_ERR_PAGE  2
+#define EPT_TRANSLATE_MISCONFIG 3
+#define EPT_TRANSLATE_RETRY     4
+
 int
 nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
                       unsigned int *page_order,
@@ -192,5 +202,9 @@ u64 nvmx_get_tsc_offset(struct vcpu *v);
 int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
                           unsigned int exit_reason);
 
+int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, 
+                        unsigned int *page_order, uint32_t rwx_acc,
+                        unsigned long *l1gfn, uint64_t *exit_qual,
+                        uint32_t *exit_reason);
 #endif /* __ASM_X86_HVM_VVMX_H__ */
 
-- 
1.7.1

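The permission and misconfiguration rules in nept_translate_l2ga() and nept_rwx_bits_check() can be exercised in isolation. The sketch below (all sk_* names are hypothetical) follows the patch: R/W/X is ANDed across every entry used by the walk, an access succeeds only if every requested bit is granted, write without read is always a misconfiguration, execute-only is one unless execute-only support is advertised, and on a violation the low six bits of the exit qualification carry the attempted access (bits 2:0) and the effective permissions (bits 5:3), as the Intel SDM describes for EPT violations. Reserved-bit and EMT checks are omitted. One assumption deliberately differs: the sketch clears only bits 5:0 of the old qualification, while the patch's 0xffffffc0 mask, being a 32-bit constant, also clears bits 63:32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_RWX_MASK 0x7u
enum { SK_R = 1, SK_W = 2, SK_X = 4 };

/* Same test as nept_permission_check(): every requested bit must be granted. */
static bool sk_permission_ok(uint32_t rwx_acc, uint32_t rwx_bits)
{
    return !(SK_RWX_MASK & rwx_acc & ~rwx_bits);
}

/* Same rule as nept_rwx_bits_check(): W without R is always misconfigured;
 * X-only is misconfigured unless execute-only entries are supported. */
static bool sk_rwx_misconfigured(uint32_t rwx, bool exec_only_supported)
{
    rwx &= SK_RWX_MASK;
    if ( rwx == SK_W || rwx == (SK_W | SK_X) )
        return true;
    return rwx == SK_X && !exec_only_supported;
}

/* Effective permission is the AND of R/W/X over every entry walked. */
static uint32_t sk_effective_rwx(const uint64_t *entries, int n)
{
    uint64_t acc = SK_RWX_MASK;

    assert(entries || n == 0);
    for ( int i = 0; i < n; i++ )
        acc &= entries[i];
    return (uint32_t)(acc & SK_RWX_MASK);
}

/* EPT-violation exit qualification: bits 2:0 = attempted access,
 * bits 5:3 = effective R/W/X; all other bits of old_qual are kept. */
static uint64_t sk_violation_qual(uint64_t old_qual, uint32_t rwx_acc,
                                  uint32_t rwx_bits)
{
    return (old_qual & ~0x3fULL) |
           ((uint64_t)(rwx_bits & SK_RWX_MASK) << 3) |
           (rwx_acc & SK_RWX_MASK);
}
```

For instance, a write through entries whose combined permissions are read-only fails sk_permission_ok(), and sk_violation_qual(0x180, SK_W, SK_R) produces 0x18a.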

* [PATCH v2 04/10] EPT: Make ept data structure or operations neutral
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
                   ` (2 preceding siblings ...)
  2012-12-19 19:44 ` [PATCH v2 03/10] nested_ept: Implement guest ept's walker Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 05/10] nEPT: Try to enable EPT paging for L2 guest Xiantao Zhang
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

Share the current EPT logic with the nested EPT case by making the
related data structures and operations neutral, so that they can
serve both common EPT and nested EPT.
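
With this change the EPT state moves from the per-domain ept_control into a per-p2m struct ept_data, so the host p2m and each nested p2m can carry their own EPT pointer (compare ept_get_eptp(ept) in the construct_vmcs() hunk). The sketch below shows how an EPTP is packed from such fields following the EPTP layout in the Intel SDM: bits 2:0 memory type, bits 5:3 page-walk length minus one, bits 12 and up the frame of the top-level table. struct sk_ept_data and sk_get_eptp() are hypothetical and do not reproduce Xen's actual struct ept_data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-p2m EPT state. */
struct sk_ept_data {
    unsigned int mt;   /* paging-structure memory type, e.g. 6 = WB */
    unsigned int wl;   /* page-walk length minus one: 3 for 4 levels */
    uint64_t asr;      /* frame number of the top-level (PML4) table */
};

/* Pack the fields: bits 2:0 memory type, bits 5:3 walk length - 1,
 * bits 11:6 left zero here, bits 12+ top-level table address. */
static uint64_t sk_get_eptp(const struct sk_ept_data *ept)
{
    assert(ept->mt <= 7 && ept->wl <= 7);
    return (uint64_t)ept->mt | ((uint64_t)ept->wl << 3) | (ept->asr << 12);
}
```

Because each p2m owns its own instance, a nested p2m with a different root table yields a different EPTP while sharing the same memory type and walk length, e.g. { 6, 3, 0x12345 } packs to 0x1234501e.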

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |    9 +++-
 xen/arch/x86/hvm/vmx/vmx.c         |   53 ++-----------------
 xen/arch/x86/mm/p2m-ept.c          |  104 ++++++++++++++++++++++++++++--------
 xen/arch/x86/mm/p2m.c              |   23 ++++++---
 xen/include/asm-x86/hvm/vmx/vmcs.h |   23 ++++----
 xen/include/asm-x86/hvm/vmx/vmx.h  |   10 +++-
 xen/include/asm-x86/p2m.h          |    4 ++
 7 files changed, 133 insertions(+), 93 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9adc7a4..379b75c 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -941,8 +941,13 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(TPR_THRESHOLD, 0);
     }
 
-    if ( paging_mode_hap(d) )
-        __vmwrite(EPT_POINTER, d->arch.hvm_domain.vmx.ept_control.eptp);
+    if ( paging_mode_hap(d) ) {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+        struct ept_data *ept = &p2m->ept;
+
+        ept->asr  = pagetable_get_pfn(p2m_get_pagetable(p2m));
+        __vmwrite(EPT_POINTER, ept_get_eptp(ept));
+    }
 
     if ( cpu_has_vmx_pat && paging_mode_hap(d) )
     {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4abfa90..d74aae0 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -74,38 +74,19 @@ static void vmx_fpu_dirty_intercept(void);
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
-static void __ept_sync_domain(void *info);
 
 static int vmx_domain_initialise(struct domain *d)
 {
     int rc;
 
-    /* Set the memory type used when accessing EPT paging structures. */
-    d->arch.hvm_domain.vmx.ept_control.ept_mt = EPT_DEFAULT_MT;
-
-    /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
-    d->arch.hvm_domain.vmx.ept_control.ept_wl = 3;
-
-    d->arch.hvm_domain.vmx.ept_control.asr  =
-        pagetable_get_pfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
-
-    if ( !zalloc_cpumask_var(&d->arch.hvm_domain.vmx.ept_synced) )
-        return -ENOMEM;
-
     if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
-    {
-        free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced);
         return rc;
-    }
 
     return 0;
 }
 
 static void vmx_domain_destroy(struct domain *d)
 {
-    if ( paging_mode_hap(d) )
-        on_each_cpu(__ept_sync_domain, d, 1);
-    free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced);
     vmx_free_vlapic_mapping(d);
 }
 
@@ -641,6 +622,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
 {
     struct domain *d = v->domain;
     unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features;
+    struct ept_data *ept_data = &p2m_get_hostp2m(d)->ept;
 
     /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */
     if ( old_cr4 != new_cr4 )
@@ -650,10 +632,10 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
     {
         unsigned int cpu = smp_processor_id();
         /* Test-and-test-and-set this CPU in the EPT-is-synced mask. */
-        if ( !cpumask_test_cpu(cpu, d->arch.hvm_domain.vmx.ept_synced) &&
+        if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) &&
              !cpumask_test_and_set_cpu(cpu,
-                                       d->arch.hvm_domain.vmx.ept_synced) )
-            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0);
+                                       ept_get_synced_mask(ept_data)) )
+            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
     }
 
     vmx_restore_guest_msrs(v);
@@ -1216,33 +1198,6 @@ static void vmx_update_guest_efer(struct vcpu *v)
                    (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
 }
 
-static void __ept_sync_domain(void *info)
-{
-    struct domain *d = info;
-    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0);
-}
-
-void ept_sync_domain(struct domain *d)
-{
-    /* Only if using EPT and this domain has some VCPUs to dirty. */
-    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
-        return;
-
-    ASSERT(local_irq_is_enabled());
-
-    /*
-     * Flush active cpus synchronously. Flush others the next time this domain
-     * is scheduled onto them. We accept the race of other CPUs adding to
-     * the ept_synced mask before on_selected_cpus() reads it, resulting in
-     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
-     */
-    cpumask_and(d->arch.hvm_domain.vmx.ept_synced,
-                d->domain_dirty_cpumask, &cpu_online_map);
-
-    on_selected_cpus(d->arch.hvm_domain.vmx.ept_synced,
-                     __ept_sync_domain, d, 1);
-}
-
 void nvmx_enqueue_n2_exceptions(struct vcpu *v, 
             unsigned long intr_fields, int error_code)
 {
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index c964f54..e33f415 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -291,9 +291,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     int need_modify_vtd_table = 1;
     int vtd_pte_present = 0;
     int needs_sync = 1;
-    struct domain *d = p2m->domain;
     ept_entry_t old_entry = { .epte = 0 };
+    struct ept_data *ept = &p2m->ept;
+    struct domain *d = p2m->domain;
 
+    ASSERT(ept);
     /*
      * the caller must make sure:
      * 1. passing valid gfn and mfn at order boundary.
@@ -301,17 +303,17 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
      * 3. passing a valid order.
      */
     if ( ((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) ||
-         ((u64)gfn >> ((ept_get_wl(d) + 1) * EPT_TABLE_ORDER)) ||
+         ((u64)gfn >> ((ept_get_wl(ept) + 1) * EPT_TABLE_ORDER)) ||
          (order % EPT_TABLE_ORDER) )
         return 0;
 
-    ASSERT((target == 2 && hvm_hap_has_1gb(d)) ||
-           (target == 1 && hvm_hap_has_2mb(d)) ||
+    ASSERT((target == 2 && hvm_hap_has_1gb()) ||
+           (target == 1 && hvm_hap_has_2mb()) ||
            (target == 0));
 
-    table = map_domain_page(ept_get_asr(d));
+    table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
 
-    for ( i = ept_get_wl(d); i > target; i-- )
+    for ( i = ept_get_wl(ept); i > target; i-- )
     {
         ret = ept_next_level(p2m, 0, &table, &gfn_remainder, i);
         if ( !ret )
@@ -439,9 +441,11 @@ out:
     unmap_domain_page(table);
 
     if ( needs_sync )
-        ept_sync_domain(p2m->domain);
+        ept_sync_domain(p2m);
 
-    if ( rv && iommu_enabled && need_iommu(p2m->domain) && need_modify_vtd_table )
+    /* For a non-nested p2m, the VT-d page table may also need updating. */
+    if ( rv && !p2m_is_nestedp2m(p2m) && iommu_enabled && need_iommu(p2m->domain) &&
+                need_modify_vtd_table )
     {
         if ( iommu_hap_pt_share )
             iommu_pte_flush(d, gfn, (u64*)ept_entry, order, vtd_pte_present);
@@ -488,14 +492,14 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
                            unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
                            p2m_query_t q, unsigned int *page_order)
 {
-    struct domain *d = p2m->domain;
-    ept_entry_t *table = map_domain_page(ept_get_asr(d));
+    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
     ept_entry_t *ept_entry;
     u32 index;
     int i;
     int ret = 0;
     mfn_t mfn = _mfn(INVALID_MFN);
+    struct ept_data *ept = &p2m->ept;
 
     *t = p2m_mmio_dm;
     *a = p2m_access_n;
@@ -506,7 +510,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
 
     /* Should check if gfn obeys GAW here. */
 
-    for ( i = ept_get_wl(d); i > 0; i-- )
+    for ( i = ept_get_wl(ept); i > 0; i-- )
     {
     retry:
         ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
@@ -588,19 +592,20 @@ out:
 static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m,
     unsigned long gfn, int *level)
 {
-    ept_entry_t *table = map_domain_page(ept_get_asr(p2m->domain));
+    ept_entry_t *table =  map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
     ept_entry_t *ept_entry;
     ept_entry_t content = { .epte = 0 };
     u32 index;
     int i;
     int ret=0;
+    struct ept_data *ept = &p2m->ept;
 
     /* This pfn is higher than the highest the p2m map currently holds */
     if ( gfn > p2m->max_mapped_pfn )
         goto out;
 
-    for ( i = ept_get_wl(p2m->domain); i > 0; i-- )
+    for ( i = ept_get_wl(ept); i > 0; i-- )
     {
         ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
         if ( !ret || ret == GUEST_TABLE_POD_PAGE )
@@ -622,7 +627,8 @@ static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m,
 void ept_walk_table(struct domain *d, unsigned long gfn)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    ept_entry_t *table = map_domain_page(ept_get_asr(d));
+    struct ept_data *ept = &p2m->ept;
+    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
 
     int i;
@@ -638,7 +644,7 @@ void ept_walk_table(struct domain *d, unsigned long gfn)
         goto out;
     }
 
-    for ( i = ept_get_wl(d); i >= 0; i-- )
+    for ( i = ept_get_wl(ept); i >= 0; i-- )
     {
         ept_entry_t *ept_entry, *next;
         u32 index;
@@ -778,24 +784,76 @@ static void ept_change_entry_type_page(mfn_t ept_page_mfn, int ept_page_level,
 static void ept_change_entry_type_global(struct p2m_domain *p2m,
                                          p2m_type_t ot, p2m_type_t nt)
 {
-    struct domain *d = p2m->domain;
-    if ( ept_get_asr(d) == 0 )
+    struct ept_data *ept = &p2m->ept;
+    if ( ept_get_asr(ept) == 0 )
         return;
 
     BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
     BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
 
-    ept_change_entry_type_page(_mfn(ept_get_asr(d)), ept_get_wl(d), ot, nt);
+    ept_change_entry_type_page(_mfn(ept_get_asr(ept)),
+            ept_get_wl(ept), ot, nt);
+
+    ept_sync_domain(p2m);
+}
+
+static void __ept_sync_domain(void *info)
+{
+    struct ept_data *ept = &((struct p2m_domain *)info)->ept;
 
-    ept_sync_domain(d);
+    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
 }
 
-void ept_p2m_init(struct p2m_domain *p2m)
+void ept_sync_domain(struct p2m_domain *p2m)
 {
+    struct domain *d = p2m->domain;
+    struct ept_data *ept = &p2m->ept;
+    /* Only if using EPT and this domain has some VCPUs to dirty. */
+    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
+        return;
+
+    ASSERT(local_irq_is_enabled());
+
+    /*
+     * Flush active cpus synchronously. Flush others the next time this domain
+     * is scheduled onto them. We accept the race of other CPUs adding to
+     * the ept_synced mask before on_selected_cpus() reads it, resulting in
+     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
+     */
+    cpumask_and(ept_get_synced_mask(ept),
+                d->domain_dirty_cpumask, &cpu_online_map);
+
+    on_selected_cpus(ept_get_synced_mask(ept),
+                     __ept_sync_domain, p2m, 1);
+}
+
+int ept_p2m_init(struct p2m_domain *p2m)
+{
+    struct ept_data *ept = &p2m->ept;
+
     p2m->set_entry = ept_set_entry;
     p2m->get_entry = ept_get_entry;
     p2m->change_entry_type_global = ept_change_entry_type_global;
     p2m->audit_p2m = NULL;
+
+    /* Set the memory type used when accessing EPT paging structures. */
+    ept->ept_mt = EPT_DEFAULT_MT;
+
+    /* Set the EPT page-walk length; the field holds (walk length - 1), i.e. 3. */
+    ept->ept_wl = 3;
+
+    if ( !zalloc_cpumask_var(&ept->synced_mask) )
+        return -ENOMEM;
+
+    on_each_cpu(__ept_sync_domain, p2m, 1);
+
+    return 0;
+}
+
+void ept_p2m_uninit(struct p2m_domain *p2m)
+{
+    struct ept_data *ept = &p2m->ept;
+    free_cpumask_var(ept->synced_mask);
 }
 
 static void ept_dump_p2m_table(unsigned char key)
@@ -811,6 +869,7 @@ static void ept_dump_p2m_table(unsigned char key)
     unsigned long gfn, gfn_remainder;
     unsigned long record_counter = 0;
     struct p2m_domain *p2m;
+    struct ept_data *ept;
 
     for_each_domain(d)
     {
@@ -818,15 +877,16 @@ static void ept_dump_p2m_table(unsigned char key)
             continue;
 
         p2m = p2m_get_hostp2m(d);
+        ept = &p2m->ept;
         printk("\ndomain%d EPT p2m table: \n", d->domain_id);
 
         for ( gfn = 0; gfn <= p2m->max_mapped_pfn; gfn += (1 << order) )
         {
             gfn_remainder = gfn;
             mfn = _mfn(INVALID_MFN);
-            table = map_domain_page(ept_get_asr(d));
+            table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
 
-            for ( i = ept_get_wl(d); i > 0; i-- )
+            for ( i = ept_get_wl(ept); i > 0; i-- )
             {
                 ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
                 if ( ret != GUEST_TABLE_NORMAL_PAGE )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6a4bdd9..1f59410 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -57,8 +57,10 @@ boolean_param("hap_2mb", opt_hap_2mb);
 
 
 /* Init the datastructures for later use by the p2m code */
-static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
+static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
 {
+    int ret = 0;
+
     mm_rwlock_init(&p2m->lock);
     mm_lock_init(&p2m->pod.lock);
     INIT_LIST_HEAD(&p2m->np2m_list);
@@ -72,11 +74,11 @@ static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     p2m->np2m_base = P2M_BASE_EADDR;
 
     if ( hap_enabled(d) && cpu_has_vmx )
-        ept_p2m_init(p2m);
+        ret = ept_p2m_init(p2m);
     else
         p2m_pt_init(p2m);
 
-    return;
+    return ret;
 }
 
 static int
@@ -119,7 +121,7 @@ int p2m_init(struct domain *d)
      * since nestedhvm_enabled(d) returns false here.
      * (p2m_init runs too early for HVM_PARAM_* options) */
     rc = p2m_init_nestedp2m(d);
-    if ( rc ) 
+    if ( rc )
         p2m_final_teardown(d);
     return rc;
 }
@@ -424,12 +426,16 @@ void p2m_teardown(struct p2m_domain *p2m)
 static void p2m_teardown_nestedp2m(struct domain *d)
 {
     uint8_t i;
+    struct p2m_domain *p2m;
 
     for (i = 0; i < MAX_NESTEDP2M; i++) {
         if ( !d->arch.nested_p2m[i] )
             continue;
-        free_cpumask_var(d->arch.nested_p2m[i]->dirty_cpumask);
-        xfree(d->arch.nested_p2m[i]);
+        p2m = d->arch.nested_p2m[i];
+        free_cpumask_var(p2m->dirty_cpumask);
+        if ( hap_enabled(d) && cpu_has_vmx )
+            ept_p2m_uninit(p2m);
+        xfree(p2m);
         d->arch.nested_p2m[i] = NULL;
     }
 }
@@ -437,9 +443,12 @@ static void p2m_teardown_nestedp2m(struct domain *d)
 void p2m_final_teardown(struct domain *d)
 {
     /* Iterate over all p2m tables per domain */
-    if ( d->arch.p2m )
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    if ( p2m )
     {
         free_cpumask_var(d->arch.p2m->dirty_cpumask);
+        if ( hap_enabled(d) && cpu_has_vmx )
+            ept_p2m_uninit(p2m);
         xfree(d->arch.p2m);
         d->arch.p2m = NULL;
     }
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 9a728b6..2d38b43 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -56,26 +56,27 @@ struct vmx_msr_state {
 
 #define EPT_DEFAULT_MT      MTRR_TYPE_WRBACK
 
-struct vmx_domain {
-    unsigned long apic_access_mfn;
+struct ept_data {
     union {
-        struct {
+        struct {
             u64 ept_mt :3,
                 ept_wl :3,
                 rsvd   :6,
                 asr    :52;
         };
         u64 eptp;
-    } ept_control;
-    cpumask_var_t ept_synced;
+    };
+    cpumask_var_t synced_mask;
+};
+
+struct vmx_domain {
+    unsigned long apic_access_mfn;
 };
 
-#define ept_get_wl(d)   \
-    ((d)->arch.hvm_domain.vmx.ept_control.ept_wl)
-#define ept_get_asr(d)  \
-    ((d)->arch.hvm_domain.vmx.ept_control.asr)
-#define ept_get_eptp(d) \
-    ((d)->arch.hvm_domain.vmx.ept_control.eptp)
+#define ept_get_wl(ept)   ((ept)->ept_wl)
+#define ept_get_asr(ept)  ((ept)->asr)
+#define ept_get_eptp(ept) ((ept)->eptp)
+#define ept_get_synced_mask(ept) ((ept)->synced_mask)
 
 struct arch_vmx_struct {
     /* Virtual address of VMCS. */
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index feaaa80..2600694 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -360,7 +360,7 @@ static inline void ept_sync_all(void)
     __invept(INVEPT_ALL_CONTEXT, 0, 0);
 }
 
-void ept_sync_domain(struct domain *d);
+void ept_sync_domain(struct p2m_domain *p2m);
 
 static inline void vpid_sync_vcpu_gva(struct vcpu *v, unsigned long gva)
 {
@@ -422,12 +422,18 @@ void vmx_get_segment_register(struct vcpu *, enum x86_segment,
 void vmx_inject_extint(int trap);
 void vmx_inject_nmi(void);
 
-void ept_p2m_init(struct p2m_domain *p2m);
+int ept_p2m_init(struct p2m_domain *p2m);
+void ept_p2m_uninit(struct p2m_domain *p2m);
+
 void ept_walk_table(struct domain *d, unsigned long gfn);
 void setup_ept_dump(void);
 
 void update_guest_eip(void);
 
+int alloc_p2m_hap_data(struct p2m_domain *p2m);
+void free_p2m_hap_data(struct p2m_domain *p2m);
+void p2m_init_hap_data(struct p2m_domain *p2m);
+
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION         0
 #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index ce26594..b6a84b6 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -277,6 +277,10 @@ struct p2m_domain {
         mm_lock_t        lock;         /* Locking of private pod structs,   *
                                         * not relying on the p2m lock.      */
     } pod;
+    union {
+        struct ept_data ept;
+        /* NPT-equivalent structure could be added here. */
+    };
 };
 
 /* get host p2m table */
-- 
1.7.1
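The `struct ept_data` union in the patch overlays bitfields on the raw `eptp` word. Assuming GCC's x86 bitfield allocation (the first-declared field occupies the least significant bits), the same word can be composed and decoded with explicit shifts. This is a minimal sketch with hypothetical helper names; 6 is the MTRR write-back type that `EPT_DEFAULT_MT` resolves to, and a walk-length field of 3 encodes a four-level walk, as the `ept_p2m_init()` comment says.

```c
#include <assert.h>
#include <stdint.h>

/* Layout mirrored from struct ept_data:
 * bits 2:0 memory type, bits 5:3 walk length - 1,
 * bits 11:6 reserved, bits 63:12 root table frame number (asr). */
#define EPTP_WL_SHIFT   3
#define EPTP_ASR_SHIFT  12

/* Hypothetical helper: build an EPTP value from its three fields. */
static uint64_t eptp_encode(unsigned int mt, unsigned int wl, uint64_t asr_pfn)
{
    assert(mt <= 7 && wl <= 7);          /* both are 3-bit fields */
    assert(asr_pfn < (1ULL << 52));      /* asr is a 52-bit field */
    return (uint64_t)mt |
           ((uint64_t)wl << EPTP_WL_SHIFT) |
           (asr_pfn << EPTP_ASR_SHIFT);
}

/* Hypothetical accessors, the shift-based analogues of ept_get_*(). */
static unsigned int eptp_mt(uint64_t eptp)  { return eptp & 0x7; }
static unsigned int eptp_wl(uint64_t eptp)  { return (eptp >> EPTP_WL_SHIFT) & 0x7; }
static uint64_t     eptp_asr(uint64_t eptp) { return eptp >> EPTP_ASR_SHIFT; }
```

For example, `eptp_encode(6, 3, 0x1234)` yields `0x123401E`, and each accessor recovers the corresponding input.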

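With this patch `ept_sync_domain()` takes a p2m, so each p2m, nested ones included, keeps its own `synced_mask`. The core of the function is a set intersection (CPUs the domain has dirtied AND online CPUs) followed by a flush on exactly those CPUs. The sketch below models that with a 64-bit word as a hypothetical stand-in for `cpumask_t` and a recording function in place of the `__invept()` IPI; it does not model the deferred flush on later scheduling that the patch comment mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per physical CPU, at most 64 CPUs. */
typedef uint64_t cpumask64_t;

/* Records which CPUs were asked to flush (stand-in for __ept_sync_domain). */
static cpumask64_t flushed;

static void flush_on_cpu(unsigned int cpu)
{
    assert(cpu < 64);
    flushed |= (cpumask64_t)1 << cpu;
}

/* Same shape as ept_sync_domain(): intersect dirty and online CPUs,
 * keep the result as the synced mask, and flush only those CPUs. */
static cpumask64_t model_ept_sync(cpumask64_t dirty, cpumask64_t online)
{
    cpumask64_t synced = dirty & online;

    for ( unsigned int cpu = 0; cpu < 64; cpu++ )
        if ( synced & ((cpumask64_t)1 << cpu) )
            flush_on_cpu(cpu);

    return synced;
}
```

With dirty CPUs `0xF0` and online CPUs `0x3C`, only CPUs 4 and 5 (mask `0x30`) are flushed.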

* [PATCH v2 05/10] nEPT: Try to enable EPT paging for L2 guest.
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
                   ` (3 preceding siblings ...)
  2012-12-19 19:44 ` [PATCH v2 04/10] EPT: Make ept data structure or operations neutral Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  2012-12-19 17:16   ` Nakajima, Jun
  2012-12-19 19:44 ` [PATCH v2 06/10] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode Xiantao Zhang
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

Once EPT is found to be enabled by the L1 VMM, enable nested EPT support
for the L2 guest.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         |   16 +++++++++--
 xen/arch/x86/hvm/vmx/vvmx.c        |   48 +++++++++++++++++++++++++++--------
 xen/include/asm-x86/hvm/vmx/vvmx.h |    5 +++-
 3 files changed, 54 insertions(+), 15 deletions(-)
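In `ept_handle_violation()` the exit qualification is unpacked into GLA-valid and read/write/execute flags, and the return value of `hvm_hap_nested_page_fault()` now selects one of three outcomes. The sketch assumes the bit positions behind Xen's `EPT_*_VIOLATION` and `EPT_GLA_VALID` macros (bits 0, 1, 2 and 7 of the qualification, as in the Intel SDM); the struct, enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match Xen's _EPT_*_VIOLATION / _EPT_GLA_VALID bit numbers. */
#define EPT_READ_VIOLATION   (1UL << 0)
#define EPT_WRITE_VIOLATION  (1UL << 1)
#define EPT_EXEC_VIOLATION   (1UL << 2)
#define EPT_GLA_VALID        (1UL << 7)

struct npf_access { bool gla_valid, r, w, x; };

/* Decode the qualification the way the hvm_hap_nested_page_fault() call does. */
static struct npf_access decode_qual(unsigned long q)
{
    struct npf_access a = {
        .gla_valid = (q & EPT_GLA_VALID) != 0,
        .r = (q & EPT_READ_VIOLATION) != 0,
        .w = (q & EPT_WRITE_VIOLATION) != 0,
        .x = (q & EPT_EXEC_VIOLATION) != 0,
    };
    return a;
}

/* Hypothetical labels for the three outcomes handled by the new switch. */
enum npf_action { NPF_REPORT_ERROR, NPF_HANDLED, NPF_INJECT_VMEXIT_TO_L1 };

static enum npf_action dispatch_npf(int ret)
{
    switch ( ret )
    {
    case 1:  return NPF_HANDLED;             /* fault resolved, resume guest */
    case -1: return NPF_INJECT_VMEXIT_TO_L1; /* set nv_vmexit_pending */
    default: return NPF_REPORT_ERROR;        /* 0: fall through to error path */
    }
}
```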

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d74aae0..e5be5a2 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1461,6 +1461,7 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
     .nhvm_vcpu_guestcr3   = nvmx_vcpu_guestcr3,
     .nhvm_vcpu_p2m_base   = nvmx_vcpu_eptp_base,
     .nhvm_vcpu_asid       = nvmx_vcpu_asid,
+    .nhvm_vmcx_hap_enabled = nvmx_ept_enabled,
     .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception,
     .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap,
     .nhvm_intr_blocked    = nvmx_intr_blocked,
@@ -2003,6 +2004,7 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
     unsigned long gla, gfn = gpa >> PAGE_SHIFT;
     mfn_t mfn;
     p2m_type_t p2mt;
+    int ret;
     struct domain *d = current->domain;
 
     if ( tb_init_done )
@@ -2017,18 +2019,26 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
         _d.gpa = gpa;
         _d.qualification = qualification;
         _d.mfn = mfn_x(get_gfn_query_unlocked(d, gfn, &_d.p2mt));
-        
+
         __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
     }
 
-    if ( hvm_hap_nested_page_fault(gpa,
+    ret = hvm_hap_nested_page_fault(gpa,
                                    qualification & EPT_GLA_VALID       ? 1 : 0,
                                    qualification & EPT_GLA_VALID
                                      ? __vmread(GUEST_LINEAR_ADDRESS) : ~0ull,
                                    qualification & EPT_READ_VIOLATION  ? 1 : 0,
                                    qualification & EPT_WRITE_VIOLATION ? 1 : 0,
-                                   qualification & EPT_EXEC_VIOLATION  ? 1 : 0) )
+                                   qualification & EPT_EXEC_VIOLATION  ? 1 : 0);
+    switch ( ret ) {
+    case 0:
+        break;
+    case 1:
         return;
+    case -1:
+        vcpu_nestedhvm(current).nv_vmexit_pending = 1;
+        return;
+    }
 
     /* Everything else is an error. */
     mfn = get_gfn_query_unlocked(d, gfn, &p2mt);
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 76cf757..c100730 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -41,6 +41,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
         gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs failed\n");
 	goto out;
     }
+    nvmx->ept.enabled = 0;
     nvmx->vmxon_region_pa = 0;
     nvcpu->nv_vvmcx = NULL;
     nvcpu->nv_vvmcxaddr = VMCX_EADDR;
@@ -96,9 +97,11 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v)
 
 uint64_t nvmx_vcpu_eptp_base(struct vcpu *v)
 {
-    /* TODO */
-    ASSERT(0);
-    return 0;
+    uint64_t eptp_base;
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+
+    eptp_base = __get_vvmcs(nvcpu->nv_vvmcx, EPT_POINTER);
+    return eptp_base & PAGE_MASK;
 }
 
 uint32_t nvmx_vcpu_asid(struct vcpu *v)
@@ -108,6 +111,13 @@ uint32_t nvmx_vcpu_asid(struct vcpu *v)
     return 0;
 }
 
+bool_t nvmx_ept_enabled(struct vcpu *v)
+{
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+
+    return !!(nvmx->ept.enabled);
+}
+
 static const enum x86_segment sreg_to_index[] = {
     [VMX_SREG_ES] = x86_seg_es,
     [VMX_SREG_CS] = x86_seg_cs,
@@ -503,14 +513,16 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
 }
 
 void nvmx_update_secondary_exec_control(struct vcpu *v,
-                                            unsigned long value)
+                                            unsigned long host_cntrl)
 {
     u32 shadow_cntrl;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
     shadow_cntrl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL);
-    shadow_cntrl |= value;
-    set_shadow_control(v, SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
+    nvmx->ept.enabled = !!(shadow_cntrl & SECONDARY_EXEC_ENABLE_EPT);
+    shadow_cntrl |= host_cntrl;
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
 }
 
 static void nvmx_update_pin_control(struct vcpu *v, unsigned long host_cntrl)
@@ -818,6 +830,17 @@ static void load_shadow_guest_state(struct vcpu *v)
     /* TODO: CR3 target control */
 }
 
+
+static uint64_t get_shadow_eptp(struct vcpu *v)
+{
+    uint64_t np2m_base = nvmx_vcpu_eptp_base(v);
+    struct p2m_domain *p2m = p2m_get_nestedp2m(v, np2m_base);
+    struct ept_data *ept = &p2m->ept;
+
+    ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+    return ept_get_eptp(ept);
+}
+
 static void virtual_vmentry(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -862,7 +885,10 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
     /* updating host cr0 to sync TS bit */
     __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
 
-    /* TODO: EPT_POINTER */
+    /* Set up the virtual EPT pointer for the L2 guest. */
+    if ( nestedhvm_paging_mode_hap(v) )
+        __vmwrite(EPT_POINTER, get_shadow_eptp(v));
+
 }
 
 static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
@@ -915,8 +941,8 @@ static void sync_vvmcs_ro(struct vcpu *v)
     /* Adjust exit_reason/exit_qualifciation for violation case */
     if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) ==
                 EXIT_REASON_EPT_VIOLATION ) {
-        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual);
-        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason);
+        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept.exit_qual);
+        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept.exit_reason);
     }
 }
 
@@ -1480,8 +1506,8 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
         case EPT_TRANSLATE_VIOLATION:
         case EPT_TRANSLATE_MISCONFIG:
             rc = NESTEDHVM_PAGEFAULT_INJECT;
-            nvmx->ept_exit.exit_reason = exit_reason;
-            nvmx->ept_exit.exit_qual = exit_qual;
+            nvmx->ept.exit_reason = exit_reason;
+            nvmx->ept.exit_qual = exit_qual;
             break;
         case EPT_TRANSLATE_RETRY:
             rc = NESTEDHVM_PAGEFAULT_RETRY;
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 8eb377b..661cd8a 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -33,9 +33,10 @@ struct nestedvmx {
         u32           error_code;
     } intr;
     struct {
+        char     enabled;
         uint32_t exit_reason;
         uint32_t exit_qual;
-    } ept_exit;
+    } ept;
 };
 
 #define vcpu_2_nvmx(v)	(vcpu_nestedhvm(v).u.nvmx)
@@ -110,6 +111,8 @@ int nvmx_intercepts_exception(struct vcpu *v,
                               unsigned int trap, int error_code);
 void nvmx_domain_relinquish_resources(struct domain *d);
 
+bool_t nvmx_ept_enabled(struct vcpu *v);
+
 int nvmx_handle_vmxon(struct cpu_user_regs *regs);
 int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
 
-- 
1.7.1
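Two small computations in this patch can be checked in isolation: `nvmx_vcpu_eptp_base()` strips the low 12 control bits of L1's EPT pointer with `PAGE_MASK`, and `nvmx_update_secondary_exec_control()` records whether L1 itself set the "enable EPT" control before the host's controls are ORed in. A sketch assuming 4 KiB pages and bit 1 of the secondary processor-based controls for "enable EPT" (per the Intel SDM); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  (~((UINT64_C(1) << PAGE_SHIFT) - 1))
/* Secondary processor-based VM-execution control, bit 1: enable EPT. */
#define SECONDARY_EXEC_ENABLE_EPT 0x00000002u

/* Page-aligned root of L1's EPT tables, used as the nested p2m key. */
static uint64_t eptp_base(uint64_t eptp)
{
    return eptp & PAGE_MASK;
}

/* Sampled from L1's own controls, before host_cntrl is merged in. */
static bool l1_ept_enabled(uint32_t l1_secondary_ctls)
{
    return (l1_secondary_ctls & SECONDARY_EXEC_ENABLE_EPT) != 0;
}
```

Because `ept.enabled` is computed before `host_cntrl` is merged, it reflects L1's choice rather than L0's.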


* [PATCH v2 06/10] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
                   ` (4 preceding siblings ...)
  2012-12-19 19:44 ` [PATCH v2 05/10] nEPT: Try to enable EPT paging for L2 guest Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 07/10] nEPT: Use minimal permission for nested p2m Xiantao Zhang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

For a PAE L2 guest, the GUEST_PDPTR fields need to be synced on each virtual
VM entry.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
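The PDPTE registers are consulted only by 32-bit PAE paging, that is with CR0.PG=1, CR4.PAE=1 and IA32_EFER.LMA=0; in IA-32e mode translation starts from the PML4 instead. A minimal predicate for when the four `GUEST_PDPTRn` fields are meaningful, using the architectural bit positions and a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG   (1ULL << 31)  /* paging enabled */
#define X86_CR4_PAE  (1ULL << 5)   /* physical address extension */
#define EFER_LMA     (1ULL << 10)  /* long mode active */

/* True only for PAE paging outside IA-32e mode. */
static bool uses_pae_pdptrs(uint64_t cr0, uint64_t cr4, uint64_t efer)
{
    bool paging = (cr0 & X86_CR0_PG) != 0;
    bool pae    = (cr4 & X86_CR4_PAE) != 0;
    bool lma    = (efer & EFER_LMA) != 0;

    return paging && pae && !lma;
}
```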

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index c100730..7c55c51 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -826,7 +826,14 @@ static void load_shadow_guest_state(struct vcpu *v)
     vvmcs_to_shadow(vvmcs, CR0_GUEST_HOST_MASK);
     vvmcs_to_shadow(vvmcs, CR4_GUEST_HOST_MASK);
 
-    /* TODO: PDPTRs for nested ept */
+    if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
+         !(v->arch.hvm_vcpu.guest_efer & EFER_LMA) ) {
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
+    }
+
     /* TODO: CR3 target control */
 }
 
-- 
1.7.1


* [PATCH v2 07/10] nEPT: Use minimal permission for nested p2m.
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
                   ` (5 preceding siblings ...)
  2012-12-19 19:44 ` [PATCH v2 06/10] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 08/10] nEPT: handle invept instruction from L1 VMM Xiantao Zhang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

Emulate permission checks for the nested p2m. The current solution uses
the minimal permission; when a permission violation is hit in L0,
determine whether it was caused by the guest EPT or the host EPT.
---
 xen/arch/x86/hvm/svm/nestedsvm.c        |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c             |    4 +-
 xen/arch/x86/mm/hap/nested_ept.c        |    5 ++-
 xen/arch/x86/mm/hap/nested_hap.c        |   38 +++++++++++++++++++++++-------
 xen/include/asm-x86/hvm/hvm.h           |    2 +-
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h      |    6 ++--
 7 files changed, 40 insertions(+), 19 deletions(-)
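`nvmx_hap_walk_L1_p2m()` packs the faulting access as `(x << 2) | (w << 1) | r`, the same bit order EPT entries use for read/write/execute, and `nept_translate_l2ga()` now also reports the rights it found through `p2m_acc`. The body of `nept_permission_check()` is not shown in this series, so the subset test below is a hypothetical illustration of one plausible semantics, not its actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack an access the way nvmx_hap_walk_L1_p2m() builds rwx_rights. */
static uint32_t pack_rwx(bool r, bool w, bool x)
{
    return ((uint32_t)x << 2) | ((uint32_t)w << 1) | (uint32_t)r;
}

/* Hypothetical check: every requested right must be granted by L1's EPT. */
static bool rights_cover(uint32_t requested, uint32_t granted)
{
    return (requested & ~granted & 0x7) == 0;
}
```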

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 5dcb354..ab455a9 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1177,7 +1177,7 @@ nsvm_vmcb_hap_enabled(struct vcpu *v)
  */
 int
 nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     uint32_t pfec;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 7c55c51..cb2c6e7 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1493,7 +1493,7 @@ int nvmx_msr_write_intercept(unsigned int msr, u64 msr_content)
  */
 int
 nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     uint64_t exit_qual = __vmread(EXIT_QUALIFICATION);
@@ -1503,7 +1503,7 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
     uint32_t rwx_rights = (access_x << 2) | (access_w << 1) | access_r;
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
-    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn,
+    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn, p2m_acc,
                                 &exit_qual, &exit_reason);
     switch ( rc ) {
         case EPT_TRANSLATE_SUCCEED:
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
index 5f80d82..4b99281 100644
--- a/xen/arch/x86/mm/hap/nested_ept.c
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -215,8 +215,8 @@ out:
 
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
                         unsigned int *page_order, uint32_t rwx_acc,
-                        unsigned long *l1gfn, uint64_t *exit_qual,
-                        uint32_t *exit_reason)
+                        unsigned long *l1gfn, uint8_t *p2m_acc,
+                        uint64_t *exit_qual, uint32_t *exit_reason)
 {
     uint32_t rc, rwx_bits = 0;
     ept_walk_t gw;
@@ -251,6 +251,7 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
         if ( nept_permission_check(rwx_acc, rwx_bits) )
         {
             *l1gfn = gw.lxe[0].mfn;
+            *p2m_acc = (uint8_t)rwx_bits;
             break;
         }
         rc = EPT_TRANSLATE_VIOLATION;
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 6d1264b..84dbf15 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -142,12 +142,12 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
  */
 static int
 nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     ASSERT(hvm_funcs.nhvm_hap_walk_L1_p2m);
 
-    return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order,
+    return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order, p2m_acc,
         access_r, access_w, access_x);
 }
 
@@ -158,16 +158,15 @@ nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
  */
 static int
 nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
-                      p2m_type_t *p2mt,
+                      p2m_type_t *p2mt, p2m_access_t *p2ma,
                       unsigned int *page_order,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     mfn_t mfn;
-    p2m_access_t p2ma;
     int rc;
 
     /* walk L0 P2M table */
-    mfn = get_gfn_type_access(p2m, L1_gpa >> PAGE_SHIFT, p2mt, &p2ma, 
+    mfn = get_gfn_type_access(p2m, L1_gpa >> PAGE_SHIFT, p2mt, p2ma,
                               0, page_order);
 
     rc = NESTEDHVM_PAGEFAULT_MMIO;
@@ -206,12 +205,14 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     struct p2m_domain *p2m, *nested_p2m;
     unsigned int page_order_21, page_order_10, page_order_20;
     p2m_type_t p2mt_10;
+    p2m_access_t p2ma_10 = p2m_access_rwx;
+    uint8_t p2ma_21;
 
     p2m = p2m_get_hostp2m(d); /* L0 p2m */
     nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
 
     /* walk the L1 P2M table */
-    rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21,
+    rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, &p2ma_21,
         access_r, access_w, access_x);
 
     /* let caller to handle these two cases */
@@ -229,7 +230,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
 
     /* ==> we have to walk L0 P2M */
     rv = nestedhap_walk_L0_p2m(p2m, L1_gpa, &L0_gpa,
-        &p2mt_10, &page_order_10,
+        &p2mt_10, &p2ma_10, &page_order_10,
         access_r, access_w, access_x);
 
     /* let upper level caller to handle these two cases */
@@ -250,10 +251,29 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
 
     page_order_20 = min(page_order_21, page_order_10);
 
+    ASSERT(p2ma_10 <= p2m_access_n2rwx);
+    /* NOTE: if the assertion fails, the new access type must be handled here. */
+
+    switch ( p2ma_10 ) {
+    case p2m_access_n ... p2m_access_rwx:
+        break;
+    case p2m_access_rx2rw:
+        p2ma_10 = p2m_access_rx;
+        break;
+    case p2m_access_n2rwx:
+        p2ma_10 = p2m_access_n;
+        break;
+    default:
+        p2ma_10 = p2m_access_n;
+        /* For safety, remove all permissions. */
+        gdprintk(XENLOG_ERR, "Unhandled p2m access type:%d\n", p2ma_10);
+    }
+    /* Use minimal permission for nested p2m. */
+    p2ma_10 &= (p2m_access_t)p2ma_21;
+
     /* fix p2m_get_pagetable(nested_p2m) */
     nestedhap_fix_p2m(v, nested_p2m, *L2_gpa, L0_gpa, page_order_20,
-        p2mt_10,
-        p2m_access_rwx /* FIXME: Should use minimum permission. */);
+        p2mt_10, p2ma_10);
 
     return NESTEDHVM_PAGEFAULT_DONE;
 }
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 80f07e9..889e3c9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -186,7 +186,7 @@ struct hvm_function_table {
 
     /*Walk nested p2m  */
     int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x);
 };
 
diff --git a/xen/include/asm-x86/hvm/svm/nestedsvm.h b/xen/include/asm-x86/hvm/svm/nestedsvm.h
index 0c90f30..748cc04 100644
--- a/xen/include/asm-x86/hvm/svm/nestedsvm.h
+++ b/xen/include/asm-x86/hvm/svm/nestedsvm.h
@@ -134,7 +134,7 @@ void svm_vmexit_do_clgi(struct cpu_user_regs *regs, struct vcpu *v);
 void svm_vmexit_do_stgi(struct cpu_user_regs *regs, struct vcpu *v);
 bool_t nestedsvm_gif_isset(struct vcpu *v);
 int nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x);
 
 #define NSVM_INTR_NOTHANDLED     3
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 661cd8a..55c0ad1 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -124,7 +124,7 @@ int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
 
 int
 nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x);
 /*
  * Virtual VMCS layout
@@ -207,7 +207,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
 
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, 
                         unsigned int *page_order, uint32_t rwx_acc,
-                        unsigned long *l1gfn, uint64_t *exit_qual,
-                        uint32_t *exit_reason);
+                        unsigned long *l1gfn, uint8_t *p2m_acc,
+                        uint64_t *exit_qual, uint32_t *exit_reason);
 #endif /* __ASM_X86_HVM_VVMX_H__ */
 
-- 
1.7.1
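The new code in `nestedhvm_hap_nested_page_fault()` first collapses L0's special access types to their current, stricter equivalents and then intersects the result with the rights from L1's EPT entry. The AND works because, in both encodings, bits 0 to 2 mean read, write and execute. The enum below is a local copy that assumes Xen's `p2m_access_t` ordering at the time (n=0 through rwx=7, then rx2rw and n2rwx); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the ordering of Xen's p2m_access_t. */
typedef enum {
    acc_n = 0, acc_r = 1, acc_w = 2, acc_rw = 3,
    acc_x = 4, acc_rx = 5, acc_wx = 6, acc_rwx = 7,
    acc_rx2rw = 8, acc_n2rwx = 9,
} access_t;

/* Minimal permission for the nested p2m entry. */
static access_t min_nested_access(access_t l0, uint8_t l1_rwx)
{
    switch ( l0 )
    {
    case acc_rx2rw: l0 = acc_rx; break;  /* current rights before the upgrade */
    case acc_n2rwx: l0 = acc_n;  break;
    default:
        if ( l0 > acc_rwx )
            l0 = acc_n;                  /* unknown type: drop all rights */
        break;
    }
    return (access_t)(l0 & (l1_rwx & 0x7));
}
```

For example, an L0 entry of `rx2rw` combined with L1 rights `rw` (3) yields `r` (1).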


* [PATCH v2 08/10] nEPT: handle invept instruction from L1 VMM
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
                   ` (6 preceding siblings ...)
  2012-12-19 19:44 ` [PATCH v2 07/10] nEPT: Use minimal permission for nested p2m Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  2012-12-19  8:28   ` Jan Beulich
  2012-12-19 19:44 ` [PATCH v2 09/10] nVMX: virutalize VPID capability to nested VMM Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 10/10] nEPT: expost EPT & VPID capablities to L1 VMM Xiantao Zhang
  9 siblings, 1 reply; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

Add the INVEPT instruction emulation logic.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         |    6 +++-
 xen/arch/x86/hvm/vmx/vvmx.c        |   39 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c              |    2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h |    1 +
 4 files changed, 45 insertions(+), 3 deletions(-)
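`nvmx_handle_invept()` dispatches on the INVEPT type held in the register operand. A sketch of that decision, assuming the SDM's type values (1 for single-context, 2 for all-context); the enum and function names are hypothetical and only label the actions the patch takes.

```c
#include <assert.h>
#include <stdint.h>

/* INVEPT type values from the Intel SDM. */
#define INVEPT_SINGLE_CONTEXT 1
#define INVEPT_ALL_CONTEXT    2

/* Hypothetical labels for what nvmx_handle_invept() does per type. */
enum invept_action {
    FLUSH_CURRENT_NESTED_P2M,  /* p2m_flush() + ept_sync_domain() */
    FLUSH_ALL_NESTED_P2MS,     /* p2m_flush_nestedp2m() + global INVEPT */
    RAISE_EXCEPTION,           /* return X86EMUL_EXCEPTION */
};

static enum invept_action classify_invept(uint64_t type)
{
    switch ( type )
    {
    case INVEPT_SINGLE_CONTEXT: return FLUSH_CURRENT_NESTED_P2M;
    case INVEPT_ALL_CONTEXT:    return FLUSH_ALL_NESTED_P2MS;
    default:                    return RAISE_EXCEPTION;
    }
}
```

Note that the single-context path in the patch flushes whichever nested p2m the vCPU is currently using; the decoded EPTP operand is logged but not otherwise used.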

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e5be5a2..7af92cc 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2572,11 +2572,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         if ( nvmx_handle_vmresume(regs) == X86EMUL_OKAY )
             update_guest_eip();
         break;
-
+    case EXIT_REASON_INVEPT:
+        if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
+            update_guest_eip();
+        break;
     case EXIT_REASON_MWAIT_INSTRUCTION:
     case EXIT_REASON_MONITOR_INSTRUCTION:
     case EXIT_REASON_GETSEC:
-    case EXIT_REASON_INVEPT:
     case EXIT_REASON_INVVPID:
         /*
          * We should never exit on GETSEC because CR4.SMXE is always 0 when
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index cb2c6e7..b7c3639 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1356,6 +1356,45 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
     return X86EMUL_OKAY;
 }
 
+int nvmx_handle_invept(struct cpu_user_regs *regs)
+{
+    struct vmx_inst_decoded decode;
+    unsigned long eptp;
+    u64 inv_type;
+
+    if ( !cpu_has_vmx_ept )
+        return X86EMUL_EXCEPTION;
+
+    if ( decode_vmx_inst(regs, &decode, &eptp, 0)
+             != X86EMUL_OKAY )
+        return X86EMUL_EXCEPTION;
+
+    inv_type = reg_read(regs, decode.reg2);
+    gdprintk(XENLOG_DEBUG,"inv_type:%ld, eptp:%lx\n", inv_type, eptp);
+
+    switch ( inv_type ) {
+    case INVEPT_SINGLE_CONTEXT:
+        {
+            struct p2m_domain *p2m = vcpu_nestedhvm(current).nv_p2m;
+            if ( p2m )
+            {
+	            p2m_flush(current, p2m);
+                ept_sync_domain(p2m);
+            }
+        }
+        break;
+    case INVEPT_ALL_CONTEXT:
+        p2m_flush_nestedp2m(current->domain);
+        __invept(INVEPT_ALL_CONTEXT, 0, 0);
+        break;
+    default:
+        return X86EMUL_EXCEPTION;
+    }
+    vmreturn(regs, VMSUCCEED);
+    return X86EMUL_OKAY;
+}
+
+
 #define __emul_value(enable1, default1) \
     ((enable1 | default1) << 32 | (default1))
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1f59410..17903ee 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1465,7 +1465,7 @@ p2m_flush_table(struct p2m_domain *p2m)
 void
 p2m_flush(struct vcpu *v, struct p2m_domain *p2m)
 {
-    ASSERT(v->domain == p2m->domain);
+    ASSERT(p2m && v->domain == p2m->domain);
     vcpu_nestedhvm(v).nv_p2m = NULL;
     p2m_flush_table(p2m);
     hvm_asid_flush_vcpu(v);
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 55c0ad1..cf5ed9a 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -190,6 +190,7 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs);
 int nvmx_handle_vmwrite(struct cpu_user_regs *regs);
 int nvmx_handle_vmresume(struct cpu_user_regs *regs);
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs);
+int nvmx_handle_invept(struct cpu_user_regs *regs);
 int nvmx_msr_read_intercept(unsigned int msr,
                                 u64 *msr_content);
 int nvmx_msr_write_intercept(unsigned int msr,
-- 
1.7.1
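The INVEPT dispatch above can be exercised outside the hypervisor. What follows is a minimal, self-contained sketch, not Xen code: struct mock_vcpu, struct mock_p2m, mock_flush_one() and the SKETCH_ names are hypothetical stand-ins for vcpu_nestedhvm(current).nv_p2m, p2m_flush()/ept_sync_domain() and p2m_flush_nestedp2m()/__invept(). The type values 1 (single-context) and 2 (all-context) follow the INVEPT encoding in the Intel SDM. The NULL check sits in the caller, which is the approach agreed in the review of this patch, so the flush helper never sees a NULL p2m.

```c
#include <stddef.h>

/* INVEPT types, as encoded in the Intel SDM. */
#define SKETCH_INVEPT_SINGLE_CONTEXT 1UL
#define SKETCH_INVEPT_ALL_CONTEXT    2UL

/* Hypothetical results standing in for X86EMUL_OKAY / X86EMUL_EXCEPTION. */
enum sketch_rc { SKETCH_OKAY, SKETCH_EXCEPTION };

/* Hypothetical stand-in for one nested p2m table. */
struct mock_p2m {
    unsigned int flushes;          /* times this table was flushed */
};

/* Hypothetical stand-in for the per-vCPU nested state. */
struct mock_vcpu {
    struct mock_p2m *nv_p2m;       /* active nested p2m; may be NULL */
    unsigned int global_flushes;   /* "flush every nested p2m" events */
};

/*
 * Flush one nested p2m. Like p2m_flush(), which dereferences p2m
 * unconditionally, it assumes the caller already checked for NULL.
 */
static void mock_flush_one(struct mock_vcpu *v, struct mock_p2m *p2m)
{
    p2m->flushes++;
    v->nv_p2m = NULL;              /* detach the vCPU, as p2m_flush() does */
}

static enum sketch_rc sketch_handle_invept(struct mock_vcpu *v,
                                           int host_has_ept,
                                           unsigned long inv_type)
{
    if ( !host_has_ept )
        return SKETCH_EXCEPTION;

    switch ( inv_type )
    {
    case SKETCH_INVEPT_SINGLE_CONTEXT:
        /* The NULL check lives in the caller, not in an ASSERT. */
        if ( v->nv_p2m )
            mock_flush_one(v, v->nv_p2m);
        break;

    case SKETCH_INVEPT_ALL_CONTEXT:
        v->global_flushes++;
        break;

    default:
        /* Unsupported type: the emulation fails. */
        return SKETCH_EXCEPTION;
    }

    return SKETCH_OKAY;
}
```

With a non-NULL nv_p2m, a single-context request flushes that table once and detaches it; a second single-context request is then a no-op, and an unknown type fails with SKETCH_EXCEPTION.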


* [PATCH v2 09/10] nVMX: virtualize VPID capability to nested VMM.
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
                   ` (7 preceding siblings ...)
  2012-12-19 19:44 ` [PATCH v2 08/10] nEPT: handle invept instruction from L1 VMM Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  2012-12-19 19:44 ` [PATCH v2 10/10] nEPT: expose EPT & VPID capabilities to L1 VMM Xiantao Zhang
  9 siblings, 0 replies; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

Virtualize VPID for the nested VMM, using the host's VPID
to emulate the guest's VPID. On each virtual VM entry, if
the guest's VPID has changed, allocate a new host VPID for
the L2 guest.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c         |   10 +++++-
 xen/arch/x86/hvm/vmx/vvmx.c        |   55 +++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 +
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7af92cc..2144820 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2576,10 +2576,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
             update_guest_eip();
         break;
+    case EXIT_REASON_INVVPID:
+        if ( nvmx_handle_invvpid(regs) == X86EMUL_OKAY )
+            update_guest_eip();
+        break;
     case EXIT_REASON_MWAIT_INSTRUCTION:
     case EXIT_REASON_MONITOR_INSTRUCTION:
     case EXIT_REASON_GETSEC:
-    case EXIT_REASON_INVVPID:
         /*
          * We should never exit on GETSEC because CR4.SMXE is always 0 when
          * running in guest context, and the CPU checks that before getting
@@ -2697,8 +2700,11 @@ void vmx_vmenter_helper(void)
 
     if ( !cpu_has_vmx_vpid )
         goto out;
+    if ( nestedhvm_vcpu_in_guestmode(curr) )
+        p_asid = &vcpu_nestedhvm(curr).nv_n2asid;
+    else
+        p_asid = &curr->arch.hvm_vcpu.n1asid;
 
-    p_asid = &curr->arch.hvm_vcpu.n1asid;
     old_asid = p_asid->asid;
     need_flush = hvm_asid_handle_vmenter(p_asid);
     new_asid = p_asid->asid;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index b7c3639..f2d7039 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -42,6 +42,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
 	goto out;
     }
     nvmx->ept.enabled = 0;
+    nvmx->guest_vpid = 0;
     nvmx->vmxon_region_pa = 0;
     nvcpu->nv_vvmcx = NULL;
     nvcpu->nv_vvmcxaddr = VMCX_EADDR;
@@ -848,6 +849,16 @@ static uint64_t get_shadow_eptp(struct vcpu *v)
     return ept_get_eptp(ept);
 }
 
+static bool_t nvmx_vpid_enabled(struct nestedvcpu *nvcpu)
+{
+    uint32_t second_cntl;
+
+    second_cntl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL);
+    if ( second_cntl & SECONDARY_EXEC_ENABLE_VPID )
+        return 1;
+    return 0;
+}
+
 static void virtual_vmentry(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -896,6 +907,18 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
     if ( nestedhvm_paging_mode_hap(v) )
         __vmwrite(EPT_POINTER, get_shadow_eptp(v));
 
+    /* nested VPID support! */
+    if ( cpu_has_vmx_vpid && nvmx_vpid_enabled(nvcpu) )
+    {
+        struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+        uint32_t new_vpid =  __get_vvmcs(vvmcs, VIRTUAL_PROCESSOR_ID);
+        if ( nvmx->guest_vpid != new_vpid )
+        {
+            hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(v).nv_n2asid);
+            nvmx->guest_vpid = new_vpid;
+        }
+    }
+
 }
 
 static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
@@ -1187,7 +1210,7 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {
         vmreturn (regs, VMFAIL_INVALID);
-        return X86EMUL_OKAY;        
+        return X86EMUL_OKAY;
     }
 
     launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
@@ -1402,6 +1425,36 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
     (((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
     ((uint32_t)(__emul_value(enable1, default1) | host_value)))
 
+int nvmx_handle_invvpid(struct cpu_user_regs *regs)
+{
+    struct vmx_inst_decoded decode;
+    unsigned long vpid;
+    u64 inv_type;
+
+    if ( !cpu_has_vmx_vpid )
+        return X86EMUL_EXCEPTION;
+
+    if ( decode_vmx_inst(regs, &decode, &vpid, 0) != X86EMUL_OKAY )
+        return X86EMUL_EXCEPTION;
+
+    inv_type = reg_read(regs, decode.reg2);
+    gdprintk(XENLOG_DEBUG,"inv_type:%ld, vpid:%lx\n", inv_type, vpid);
+
+    switch ( inv_type ) {
+        /* Just invalidate all tlb entries for all types! */
+        case INVVPID_INDIVIDUAL_ADDR:
+        case INVVPID_SINGLE_CONTEXT:
+        case INVVPID_ALL_CONTEXT:
+            hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(current).nv_n2asid);
+            break;
+        default:
+            return X86EMUL_EXCEPTION;
+    }
+    vmreturn(regs, VMSUCCEED);
+
+    return X86EMUL_OKAY;
+}
+
 /*
  * Capability reporting
  */
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index cf5ed9a..28dd727 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -37,6 +37,7 @@ struct nestedvmx {
         uint32_t exit_reason;
         uint32_t exit_qual;
     } ept;
+    uint32_t guest_vpid;
 };
 
 #define vcpu_2_nvmx(v)	(vcpu_nestedhvm(v).u.nvmx)
@@ -191,6 +192,7 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs);
 int nvmx_handle_vmresume(struct cpu_user_regs *regs);
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs);
 int nvmx_handle_invept(struct cpu_user_regs *regs);
+int nvmx_handle_invvpid(struct cpu_user_regs *regs);
 int nvmx_msr_read_intercept(unsigned int msr,
                                 u64 *msr_content);
 int nvmx_msr_write_intercept(unsigned int msr,
-- 
1.7.1
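The VPID scheme in this patch boils down to two rules: a virtual VM entry invalidates L2's host VPID only when the VPID that L1 programmed has changed, and every INVVPID type is over-approximated by a full flush of L2's host VPID. A minimal sketch under those assumptions follows; struct mock_nvmx, struct mock_n2asid and the SKETCH_ names are hypothetical, with mock_flush_n2asid() standing in for hvm_asid_flush_vcpu_asid(). The type values follow the INVVPID encoding in the Intel SDM. Unlike nvmx_handle_invvpid() as posted, the sketch also accepts type 3 (single-context, retaining globals), since 10/10 advertises VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL; invalidating more than requested is permitted.

```c
#include <stdint.h>

/* INVVPID types, as encoded in the Intel SDM. */
#define SKETCH_INVVPID_INDIVIDUAL_ADDR           0UL
#define SKETCH_INVVPID_SINGLE_CONTEXT            1UL
#define SKETCH_INVVPID_ALL_CONTEXT               2UL
#define SKETCH_INVVPID_SINGLE_CONTEXT_RETAIN_GLB 3UL

/* Hypothetical stand-in for the host ASID/VPID tracking used for L2. */
struct mock_n2asid {
    uint32_t generation;     /* 0 means "allocate a fresh host VPID" */
};

/* Hypothetical stand-in for struct nestedvmx plus nested vCPU state. */
struct mock_nvmx {
    uint32_t guest_vpid;     /* last VPID L1 programmed into its VMCS */
    struct mock_n2asid n2asid;
    unsigned int flushes;    /* how often L2's host VPID was invalidated */
};

/* Models hvm_asid_flush_vcpu_asid(): force a new host VPID next entry. */
static void mock_flush_n2asid(struct mock_nvmx *n)
{
    n->n2asid.generation = 0;
    n->flushes++;
}

/* Virtual VM entry: only a change of L1's VPID forces a new host VPID. */
static void sketch_virtual_vmentry(struct mock_nvmx *n, int l1_vpid_enabled,
                                   uint32_t new_vpid)
{
    if ( !l1_vpid_enabled )
        return;

    if ( n->guest_vpid != new_vpid )
    {
        mock_flush_n2asid(n);
        n->guest_vpid = new_vpid;
    }
}

/* INVVPID: each supported type is handled by a full flush. */
static int sketch_handle_invvpid(struct mock_nvmx *n, unsigned long inv_type)
{
    switch ( inv_type )
    {
    case SKETCH_INVVPID_INDIVIDUAL_ADDR:
    case SKETCH_INVVPID_SINGLE_CONTEXT:
    case SKETCH_INVVPID_ALL_CONTEXT:
    case SKETCH_INVVPID_SINGLE_CONTEXT_RETAIN_GLB:
        mock_flush_n2asid(n);
        return 0;

    default:
        return -1;           /* unsupported type: raise an exception */
    }
}
```

The design trade-off is the same as in the patch: tracking only "did L1's VPID change" keeps the state to a single uint32_t, at the cost of flushing more often than a per-VPID mapping would.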


* [PATCH v2 10/10] nEPT: expose EPT & VPID capabilities to L1 VMM
  2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
                   ` (8 preceding siblings ...)
  2012-12-19 19:44 ` [PATCH v2 09/10] nVMX: virtualize VPID capability to nested VMM Xiantao Zhang
@ 2012-12-19 19:44 ` Xiantao Zhang
  9 siblings, 0 replies; 18+ messages in thread
From: Xiantao Zhang @ 2012-12-19 19:44 UTC (permalink / raw)
  To: xen-devel; +Cc: keir, jun.nakajima, tim, eddie.dong, JBeulich, Zhang Xiantao

From: Zhang Xiantao <xiantao.zhang@intel.com>

Expose the basic EPT and VPID features to the L1 VMM.
For EPT, the EPT A/D bit feature is not supported.
For VPID, all features are exposed to the L1 VMM.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c        |   17 +++++++++++++++--
 xen/arch/x86/mm/hap/nested_ept.c   |   19 ++++++++++++-------
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 ++
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index f2d7039..0da81e3 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1485,6 +1485,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
         break;
     case MSR_IA32_VMX_PROCBASED_CTLS:
     case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+    {
+        u32 default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
         /* 1-seetings */
         data = CPU_BASED_HLT_EXITING |
                CPU_BASED_VIRTUAL_INTR_PENDING |
@@ -1506,12 +1508,20 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
                CPU_BASED_PAUSE_EXITING |
                CPU_BASED_RDPMC_EXITING |
                CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
-        data = gen_vmx_msr(data, VMX_PROCBASED_CTLS_DEFAULT1, host_data);
+
+        if ( msr == MSR_IA32_VMX_TRUE_PROCBASED_CTLS )
+            default1_bits &= ~(CPU_BASED_CR3_LOAD_EXITING |
+                    CPU_BASED_CR3_STORE_EXITING | CPU_BASED_INVLPG_EXITING);
+
+        data = gen_vmx_msr(data, default1_bits, host_data);
         break;
+    }
     case MSR_IA32_VMX_PROCBASED_CTLS2:
         /* 1-seetings */
         data = SECONDARY_EXEC_DESCRIPTOR_TABLE_EXITING |
-               SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+               SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+               SECONDARY_EXEC_ENABLE_VPID |
+               SECONDARY_EXEC_ENABLE_EPT;
         data = gen_vmx_msr(data, 0, host_data);
         break;
     case MSR_IA32_VMX_EXIT_CTLS:
@@ -1564,6 +1574,9 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
     case MSR_IA32_VMX_MISC:
         gdprintk(XENLOG_WARNING, "VMX MSR %x not fully supported yet.\n", msr);
         break;
+    case MSR_IA32_VMX_EPT_VPID_CAP:
+        data = nept_get_ept_vpid_cap();
+        break;
     default:
         r = 0;
         break;
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
index 4b99281..5b60f37 100644
--- a/xen/arch/x86/mm/hap/nested_ept.c
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -43,12 +43,15 @@
 #define EPT_MUST_RSV_BITS (((1ull << PADDR_BITS) -1) & \
                      ~((1ull << paddr_bits) - 1))
 
-/*
- *TODO: Just leave it as 0 here for compile pass, will
- * define real capabilities in the subsequent patches.
- */
-#define NEPT_VPID_CAP_BITS 0
-
+#define NEPT_VPID_CAP_BITS  \
+        (VMX_EPT_INVEPT_ALL_CONTEXT | VMX_EPT_INVEPT_SINGLE_CONTEXT |   \
+        VMX_EPT_INVEPT_INSTRUCTION | VMX_EPT_SUPERPAGE_1GB |            \
+        VMX_EPT_SUPERPAGE_2MB | VMX_EPT_MEMORY_TYPE_WB |                \
+        VMX_EPT_MEMORY_TYPE_UC | VMX_EPT_WALK_LENGTH_4_SUPPORTED |      \
+        VMX_EPT_EXEC_ONLY_SUPPORTED | VMX_VPID_INVVPID_INSTRUCTION |    \
+        VMX_VPID_INVVPID_INDIVIDUAL_ADDR |                              \
+        VMX_VPID_INVVPID_SINGLE_CONTEXT | VMX_VPID_INVVPID_ALL_CONTEXT |\
+        VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL)
 
 #define NEPT_1G_ENTRY_FLAG (1 << 11)
 #define NEPT_2M_ENTRY_FLAG (1 << 10)
@@ -129,7 +132,9 @@ static bool_t nept_non_present_check(ept_entry_t e)
 
 uint64_t nept_get_ept_vpid_cap(void)
 {
-    return NEPT_VPID_CAP_BITS;
+    if ( cpu_has_vmx_ept && cpu_has_vmx_vpid )
+        return NEPT_VPID_CAP_BITS;
+    return 0;
 }
 
 static int ept_lvl_table_offset(unsigned long gpa, int lvl)
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 28dd727..1e7a6d7 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -208,6 +208,8 @@ u64 nvmx_get_tsc_offset(struct vcpu *v);
 int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
                           unsigned int exit_reason);
 
+uint64_t nept_get_ept_vpid_cap(void);
+
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, 
                         unsigned int *page_order, uint32_t rwx_acc,
                         unsigned long *l1gfn, uint8_t *p2m_acc,
-- 
1.7.1
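The value built by NEPT_VPID_CAP_BITS can be checked by hand. Below is a standalone sketch that rebuilds an equivalent mask from the bit positions of IA32_VMX_EPT_VPID_CAP documented in the Intel SDM. The SKETCH_ macro names are hypothetical, and the sketch assumes that Xen's VMX_EPT_* and VMX_VPID_* macros map to these same architectural bits (not verified here). It also shows that bit 21 (EPT accessed/dirty flags) stays clear, matching the commit message.

```c
#include <stdint.h>

/* IA32_VMX_EPT_VPID_CAP bit positions per the Intel SDM (names are ours). */
#define SKETCH_EPT_EXEC_ONLY        (1ULL << 0)
#define SKETCH_EPT_WALK_LENGTH_4    (1ULL << 6)
#define SKETCH_EPT_MEMTYPE_UC       (1ULL << 8)
#define SKETCH_EPT_MEMTYPE_WB       (1ULL << 14)
#define SKETCH_EPT_SUPERPAGE_2MB    (1ULL << 16)
#define SKETCH_EPT_SUPERPAGE_1GB    (1ULL << 17)
#define SKETCH_EPT_INVEPT           (1ULL << 20)
#define SKETCH_EPT_AD_BITS          (1ULL << 21)  /* deliberately not exposed */
#define SKETCH_EPT_INVEPT_SINGLE    (1ULL << 25)
#define SKETCH_EPT_INVEPT_ALL       (1ULL << 26)
#define SKETCH_VPID_INVVPID         (1ULL << 32)
#define SKETCH_VPID_INDIVIDUAL_ADDR (1ULL << 40)
#define SKETCH_VPID_SINGLE_CONTEXT  (1ULL << 41)
#define SKETCH_VPID_ALL_CONTEXT     (1ULL << 42)
#define SKETCH_VPID_SINGLE_RETAIN   (1ULL << 43)

/* Same feature list as NEPT_VPID_CAP_BITS in the patch. */
#define SKETCH_NEPT_VPID_CAP_BITS                               \
    (SKETCH_EPT_EXEC_ONLY | SKETCH_EPT_WALK_LENGTH_4 |          \
     SKETCH_EPT_MEMTYPE_UC | SKETCH_EPT_MEMTYPE_WB |            \
     SKETCH_EPT_SUPERPAGE_2MB | SKETCH_EPT_SUPERPAGE_1GB |      \
     SKETCH_EPT_INVEPT | SKETCH_EPT_INVEPT_SINGLE |             \
     SKETCH_EPT_INVEPT_ALL | SKETCH_VPID_INVVPID |              \
     SKETCH_VPID_INDIVIDUAL_ADDR | SKETCH_VPID_SINGLE_CONTEXT | \
     SKETCH_VPID_ALL_CONTEXT | SKETCH_VPID_SINGLE_RETAIN)

/*
 * Same shape as nept_get_ept_vpid_cap(): advertise nothing unless the
 * host itself has both EPT and VPID.
 */
static uint64_t sketch_get_ept_vpid_cap(int host_has_ept, int host_has_vpid)
{
    if ( host_has_ept && host_has_vpid )
        return SKETCH_NEPT_VPID_CAP_BITS;
    return 0;
}
```

Under these bit positions the full mask works out to 0x00000F0106134141: the EPT bits occupy the low word and the four INVVPID type bits plus the INVVPID instruction bit occupy the high word.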


* Re: [PATCH v2 05/10] nEPT: Try to enable EPT paging for L2 guest.
  2012-12-19 17:16   ` Nakajima, Jun
@ 2012-12-20  1:27     ` Zhang, Xiantao
  0 siblings, 0 replies; 18+ messages in thread
From: Zhang, Xiantao @ 2012-12-20  1:27 UTC (permalink / raw)
  To: Nakajima, Jun; +Cc: keir, Dong, Eddie, tim, xen-devel, JBeulich, Zhang, Xiantao

Hi, Jun
Thanks, I will update the patches according to your comments.
Xiantao

> -----Original Message-----
> From: Nakajima, Jun [mailto:jun.nakajima@intel.com]
> Sent: Thursday, December 20, 2012 1:16 AM
> To: Zhang, Xiantao
> Cc: xen-devel@lists.xen.org; Dong, Eddie; keir@xen.org; JBeulich@suse.com;
> tim@xen.org
> Subject: Re: [PATCH v2 05/10] nEPT: Try to enable EPT paging for L2 guest.
> 
> Minor comments below.
> 
> On Wed, Dec 19, 2012 at 11:44 AM, Xiantao Zhang <xiantao.zhang@intel.com>
> wrote:
> > From: Zhang Xiantao <xiantao.zhang@intel.com>
> >
> > Once EPT is found to be enabled by the L1 VMM, enable nested EPT support
> > for the L2 guest.
> >
> > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
> > ---
> >  xen/arch/x86/hvm/vmx/vmx.c         |   16 +++++++++--
> >  xen/arch/x86/hvm/vmx/vvmx.c        |   48 +++++++++++++++++++++++++++--------
> >  xen/include/asm-x86/hvm/vmx/vvmx.h |    5 +++-
> >  3 files changed, 54 insertions(+), 15 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index d74aae0..e5be5a2 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1461,6 +1461,7 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
> >      .nhvm_vcpu_guestcr3   = nvmx_vcpu_guestcr3,
> >      .nhvm_vcpu_p2m_base   = nvmx_vcpu_eptp_base,
> >      .nhvm_vcpu_asid       = nvmx_vcpu_asid,
> > +    .nhvm_vmcx_hap_enabled = nvmx_ept_enabled,
> >      .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception,
> >      .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap,
> >      .nhvm_intr_blocked    = nvmx_intr_blocked,
> > @@ -2003,6 +2004,7 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
> >      unsigned long gla, gfn = gpa >> PAGE_SHIFT;
> >      mfn_t mfn;
> >      p2m_type_t p2mt;
> > +    int ret;
> >      struct domain *d = current->domain;
> >
> >      if ( tb_init_done )
> > @@ -2017,18 +2019,26 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
> >          _d.gpa = gpa;
> >          _d.qualification = qualification;
> >          _d.mfn = mfn_x(get_gfn_query_unlocked(d, gfn, &_d.p2mt));
> > -
> > +
> >          __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
> >      }
> >
> > -    if ( hvm_hap_nested_page_fault(gpa,
> > +    ret = hvm_hap_nested_page_fault(gpa,
> >                                     qualification & EPT_GLA_VALID       ? 1 : 0,
> >                                     qualification & EPT_GLA_VALID
> >                                       ? __vmread(GUEST_LINEAR_ADDRESS) : ~0ull,
> >                                     qualification & EPT_READ_VIOLATION  ? 1 : 0,
> >                                     qualification & EPT_WRITE_VIOLATION ? 1 : 0,
> > -                                   qualification & EPT_EXEC_VIOLATION  ? 1 : 0) )
> > +                                   qualification & EPT_EXEC_VIOLATION  ? 1 : 0);
> > +    switch ( ret ) {
> > +    case 0:
> > +        break;
> > +    case 1:
> >          return;
> > +    case -1:
> > +        vcpu_nestedhvm(current).nv_vmexit_pending = 1;
> 
> I think we should add some comments for this case (e.g. what it means, what
> to do).
> 
> 
> > +        return;
> > +    }
> >
> >      /* Everything else is an error. */
> >      mfn = get_gfn_query_unlocked(d, gfn, &p2mt);
> > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > index 76cf757..c100730 100644
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -41,6 +41,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
> >          gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs failed\n");
> >         goto out;
> >      }
> > +    nvmx->ept.enabled = 0;
> >      nvmx->vmxon_region_pa = 0;
> >      nvcpu->nv_vvmcx = NULL;
> >      nvcpu->nv_vvmcxaddr = VMCX_EADDR;
> > @@ -96,9 +97,11 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v)
> >
> >  uint64_t nvmx_vcpu_eptp_base(struct vcpu *v)
> >  {
> > -    /* TODO */
> > -    ASSERT(0);
> > -    return 0;
> > +    uint64_t eptp_base;
> > +    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> > +
> > +    eptp_base = __get_vvmcs(nvcpu->nv_vvmcx, EPT_POINTER);
> > +    return eptp_base & PAGE_MASK;
> >  }
> >
> >  uint32_t nvmx_vcpu_asid(struct vcpu *v)
> > @@ -108,6 +111,13 @@ uint32_t nvmx_vcpu_asid(struct vcpu *v)
> >      return 0;
> >  }
> >
> > +bool_t nvmx_ept_enabled(struct vcpu *v)
> > +{
> > +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> > +
> > +    return !!(nvmx->ept.enabled);
> > +}
> > +
> >  static const enum x86_segment sreg_to_index[] = {
> >      [VMX_SREG_ES] = x86_seg_es,
> >      [VMX_SREG_CS] = x86_seg_cs,
> > @@ -503,14 +513,16 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
> >  }
> >
> >  void nvmx_update_secondary_exec_control(struct vcpu *v,
> > -                                            unsigned long value)
> > +                                            unsigned long host_cntrl)
> >  {
> >      u32 shadow_cntrl;
> >      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> > +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> >
> >      shadow_cntrl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL);
> > -    shadow_cntrl |= value;
> > -    set_shadow_control(v, SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
> > +    nvmx->ept.enabled = !!(shadow_cntrl & SECONDARY_EXEC_ENABLE_EPT);
> > +    shadow_cntrl |= host_cntrl;
> > +    __vmwrite(SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
> >  }
> >
> >  static void nvmx_update_pin_control(struct vcpu *v, unsigned long host_cntrl)
> > @@ -818,6 +830,17 @@ static void load_shadow_guest_state(struct vcpu *v)
> >      /* TODO: CR3 target control */
> >  }
> >
> > +
> > +static uint64_t get_shadow_eptp(struct vcpu *v)
> > +{
> > +    uint64_t np2m_base = nvmx_vcpu_eptp_base(v);
> > +    struct p2m_domain *p2m = p2m_get_nestedp2m(v, np2m_base);
> > +    struct ept_data *ept = &p2m->ept;
> > +
> > +    ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> > +    return ept_get_eptp(ept);
> > +}
> > +
> >  static void virtual_vmentry(struct cpu_user_regs *regs)
> >  {
> >      struct vcpu *v = current;
> > @@ -862,7 +885,10 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
> >      /* updating host cr0 to sync TS bit */
> >      __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
> >
> > -    /* TODO: EPT_POINTER */
> > +    /* Setup virtual ETP for L2 guest*/
> > +    if ( nestedhvm_paging_mode_hap(v) )
> > +        __vmwrite(EPT_POINTER, get_shadow_eptp(v));
> > +
> >  }
> >
> >  static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
> > @@ -915,8 +941,8 @@ static void sync_vvmcs_ro(struct vcpu *v)
> >      /* Adjust exit_reason/exit_qualifciation for violation case */
> >      if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) ==
> >                  EXIT_REASON_EPT_VIOLATION ) {
> > -        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual);
> > -        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason);
> > +        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept.exit_qual);
> > +        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept.exit_reason);
> >      }
> >  }
> >
> > @@ -1480,8 +1506,8 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
> >          case EPT_TRANSLATE_VIOLATION:
> >          case EPT_TRANSLATE_MISCONFIG:
> >              rc = NESTEDHVM_PAGEFAULT_INJECT;
> > -            nvmx->ept_exit.exit_reason = exit_reason;
> > -            nvmx->ept_exit.exit_qual = exit_qual;
> > +            nvmx->ept.exit_reason = exit_reason;
> > +            nvmx->ept.exit_qual = exit_qual;
> >              break;
> >          case EPT_TRANSLATE_RETRY:
> >              rc = NESTEDHVM_PAGEFAULT_RETRY;
> > diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
> > index 8eb377b..661cd8a 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vvmx.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
> > @@ -33,9 +33,10 @@ struct nestedvmx {
> >          u32           error_code;
> >      } intr;
> >      struct {
> > +        char     enabled;
> 
> I think we should use bool_t, not char.
> 
> >          uint32_t exit_reason;
> >          uint32_t exit_qual;
> > -    } ept_exit;
> > +    } ept;
> >  };
> >
> >  #define vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx)
> > @@ -110,6 +111,8 @@ int nvmx_intercepts_exception(struct vcpu *v,
> >                                unsigned int trap, int error_code);
> >  void nvmx_domain_relinquish_resources(struct domain *d);
> >
> > +bool_t nvmx_ept_enabled(struct vcpu *v);
> > +
> >  int nvmx_handle_vmxon(struct cpu_user_regs *regs);
> >  int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
> >
> > --
> > 1.7.1
> >
> 
> 
> 
> --
> Jun
> Intel Open Source Technology Center
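For readers following get_shadow_eptp() and nvmx_vcpu_eptp_base() in the quoted patch, here is a minimal sketch of the EPTP layout they rely on, per the Intel SDM: bits 2:0 hold the memory type, bits 5:3 hold the page-walk length minus one, and the PML4 table address starts at bit 12. The helper and macro names are hypothetical; Xen's own ept_get_eptp() may differ in detail.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))

/* EPTP memory types from the Intel SDM. */
#define SKETCH_EPT_MT_UC 0
#define SKETCH_EPT_MT_WB 6

/*
 * Hypothetical: compose an EPTP from the root table's frame number.
 * Bits 2:0 = memory type, bits 5:3 = page-walk length minus one,
 * bits 12 and up = physical address of the PML4 table.
 */
static uint64_t sketch_make_eptp(uint64_t root_pfn, unsigned int mem_type,
                                 unsigned int walk_levels)
{
    return (root_pfn << SKETCH_PAGE_SHIFT) |
           ((uint64_t)(walk_levels - 1) << 3) |
           (mem_type & 7);
}

/*
 * Like nvmx_vcpu_eptp_base(): strip the low control bits to recover the
 * L1-physical address of L1's EPT root, which keys the nested p2m.
 */
static uint64_t sketch_eptp_base(uint64_t eptp)
{
    return eptp & SKETCH_PAGE_MASK;
}
```

For example, a root at frame 0x12345 with write-back memory type and a 4-level walk encodes as 0x1234501E, and masking with the page mask gives back 0x12345000.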


* Re: [PATCH v2 08/10] nEPT: handle invept instruction from L1 VMM
  2012-12-19  8:28   ` Jan Beulich
@ 2012-12-20  2:39     ` Zhang, Xiantao
  0 siblings, 0 replies; 18+ messages in thread
From: Zhang, Xiantao @ 2012-12-20  2:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir, Dong, Eddie, tim, xen-devel, Nakajima, Jun, Zhang, Xiantao

Thanks, Jan! 
>>: handle invept instruction from L1 VMM
> 
> >>> On 19.12.12 at 20:44, Xiantao Zhang <xiantao.zhang@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -2572,11 +2572,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >          if ( nvmx_handle_vmresume(regs) == X86EMUL_OKAY )
> >              update_guest_eip();
> >          break;
> > -
> > +    case EXIT_REASON_INVEPT:
> > +        if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
> > +            update_guest_eip();
> > +        break;
> 
> In (potentially going to become) long switch statements, please don't drop
> the blank lines between individual cases - instead of dropping the line here,
> you would want to insert another one below the new separately handled case.

Okay. 

> >      case EXIT_REASON_MWAIT_INSTRUCTION:
> >      case EXIT_REASON_MONITOR_INSTRUCTION:
> >      case EXIT_REASON_GETSEC:
> > -    case EXIT_REASON_INVEPT:
> >      case EXIT_REASON_INVVPID:
> >          /*
> >           * We should never exit on GETSEC because CR4.SMXE is always 0 when
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -1356,6 +1356,45 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
> >      return X86EMUL_OKAY;
> >  }
> >
> > +int nvmx_handle_invept(struct cpu_user_regs *regs)
> > +{
> > +    struct vmx_inst_decoded decode;
> > +    unsigned long eptp;
> > +    u64 inv_type;
> > +
> > +    if ( !cpu_has_vmx_ept )
> > +        return X86EMUL_EXCEPTION;
> > +
> > +    if ( decode_vmx_inst(regs, &decode, &eptp, 0)
> > +             != X86EMUL_OKAY )
> > +        return X86EMUL_EXCEPTION;
> > +
> > +    inv_type = reg_read(regs, decode.reg2);
> > +    gdprintk(XENLOG_DEBUG,"inv_type:%ld, eptp:%lx\n", inv_type, eptp);
> 
> An unconditional printk() on an operation potentially happening quite
> frequently? Even with XENLOG_DEBUG this is not acceptable imo.

Okay, I will remove it. 

> > +
> > +    switch ( inv_type ) {
> > +    case INVEPT_SINGLE_CONTEXT:
> > +        {
> > +            struct p2m_domain *p2m = vcpu_nestedhvm(current).nv_p2m;
> > +            if ( p2m )
> > +            {
> > +	            p2m_flush(current, p2m);
> 
> Despite your comment in 00/10, there is still a whitespace issue at least here
> (I didn't look that closely elsewhere).

Fixed.

> > +                ept_sync_domain(p2m);
> > +            }
> > +        }
> > +        break;
> > +    case INVEPT_ALL_CONTEXT:
> > +        p2m_flush_nestedp2m(current->domain);
> > +        __invept(INVEPT_ALL_CONTEXT, 0, 0);
> > +        break;
> > +    default:
> > +        return X86EMUL_EXCEPTION;
> > +    }
> > +    vmreturn(regs, VMSUCCEED);
> > +    return X86EMUL_OKAY;
> > +}
> > +
> > +
> >  #define __emul_value(enable1, default1) \
> >      ((enable1 | default1) << 32 | (default1))
> >
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -1465,7 +1465,7 @@ p2m_flush_table(struct p2m_domain *p2m)
> >  void
> >  p2m_flush(struct vcpu *v, struct p2m_domain *p2m)
> >  {
> > -    ASSERT(v->domain == p2m->domain);
> > +    ASSERT(p2m && v->domain == p2m->domain);
> 
> How is this change related to the rest of the patch? 
I will remove it and let the caller check whether p2m is NULL. Originally, the change was made to fix a Xen boot issue.
Xiantao



Thread overview: 18+ messages
2012-12-19 19:44 [PATCH v2 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM Xiantao Zhang
2012-12-19 19:44 ` [PATCH v2 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words Xiantao Zhang
2012-12-19  8:19   ` Jan Beulich
2012-12-19  8:40     ` Zhang, Xiantao
2012-12-19 19:44 ` [PATCH v2 02/10] nestedhap: Change nested p2m's walker to vendor-specific Xiantao Zhang
2012-12-19 19:44 ` [PATCH v2 03/10] nested_ept: Implement guest ept's walker Xiantao Zhang
2012-12-19 16:42   ` Nakajima, Jun
2012-12-19 19:44 ` [PATCH v2 04/10] EPT: Make ept data structure or operations neutral Xiantao Zhang
2012-12-19 19:44 ` [PATCH v2 05/10] nEPT: Try to enable EPT paging for L2 guest Xiantao Zhang
2012-12-19 17:16   ` Nakajima, Jun
2012-12-20  1:27     ` Zhang, Xiantao
2012-12-19 19:44 ` [PATCH v2 06/10] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode Xiantao Zhang
2012-12-19 19:44 ` [PATCH v2 07/10] nEPT: Use minimal permission for nested p2m Xiantao Zhang
2012-12-19 19:44 ` [PATCH v2 08/10] nEPT: handle invept instruction from L1 VMM Xiantao Zhang
2012-12-19  8:28   ` Jan Beulich
2012-12-20  2:39     ` Zhang, Xiantao
2012-12-19 19:44 ` [PATCH v2 09/10] nVMX: virtualize VPID capability to nested VMM Xiantao Zhang
2012-12-19 19:44 ` [PATCH v2 10/10] nEPT: expose EPT & VPID capabilities to L1 VMM Xiantao Zhang
