From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Shuai Ruan <shuai.ruan@linux.intel.com>, xen-devel@lists.xen.org
Cc: kevin.tian@intel.com, wei.liu2@citrix.com,
	Ian.Campbell@citrix.com, stefano.stabellini@eu.citrix.com,
	eddie.dong@intel.com, ian.jackson@eu.citrix.com,
	jbeulich@suse.com, jun.nakajima@intel.com, keir@xen.org
Subject: Re: [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
Date: Wed, 5 Aug 2015 18:51:29 +0100
Message-ID: <55C24D21.50209@citrix.com>
In-Reply-To: <1438739842-31658-2-git-send-email-shuai.ruan@linux.intel.com>

On 05/08/15 02:57, Shuai Ruan wrote:
> This patch emulates the xsaves/xrstors instructions and
> XSS MSR accesses.
>
> The xsaves/xrstors instructions and XSS MSR accesses
> may only be executed in ring 0, so emulation is
> needed when a PV guest uses them.
>
> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> ---
>  xen/arch/x86/domain.c           |   3 +
>  xen/arch/x86/traps.c            | 138 ++++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/x86_64/mm.c        |  52 +++++++++++++++
>  xen/arch/x86/xstate.c           |  39 ++++++++++++
>  xen/include/asm-x86/domain.h    |   1 +
>  xen/include/asm-x86/mm.h        |   1 +
>  xen/include/asm-x86/msr-index.h |   2 +
>  xen/include/asm-x86/xstate.h    |   3 +
>  8 files changed, 239 insertions(+)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 045f6ff..e8b8d67 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -426,6 +426,7 @@ int vcpu_initialise(struct vcpu *v)
>  
>      /* By default, do not emulate */
>      v->arch.vm_event.emulate_flags = 0;
> +    v->arch.msr_ia32_xss = 0;

The backing memory for struct vcpu is zeroed when allocated.  There is
no need to explicitly zero this field here.

>  
>      rc = mapcache_vcpu_init(v);
>      if ( rc )
> @@ -1529,6 +1530,8 @@ static void __context_switch(void)
>              if ( xcr0 != get_xcr0() && !set_xcr0(xcr0) )
>                  BUG();
>          }
> +        if ( cpu_has_xsaves )
> +            wrmsr_safe(MSR_IA32_XSS, n->arch.msr_ia32_xss);

This mustn't throw away potential errors.  It should not be possible for
n->arch.msr_ia32_xss to be invalid by this point, so a straight wrmsr()
would be correct.

However, you will want to implement lazy context switching, exactly like
get/set_xcr0().
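
For reference, a minimal sketch of such lazy handling, modelled on
get/set_xcr0() (the per-cpu variable and helper here are illustrative,
not existing symbols):

static DEFINE_PER_CPU(uint64_t, xss);

/* Only hit the MSR when the value actually changes. */
static void set_msr_xss(uint64_t xss)
{
    uint64_t *this_xss = &this_cpu(xss);

    if ( *this_xss != xss )
    {
        wrmsrl(MSR_IA32_XSS, xss);
        *this_xss = xss;
    }
}

after which the hunk above becomes

if ( cpu_has_xsaves )
    set_msr_xss(n->arch.msr_ia32_xss);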

>          vcpu_restore_fpu_eager(n);
>          n->arch.ctxt_switch_to(n);
>      }
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 6a03582..c1fea77 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2353,6 +2353,131 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          }
>          break;
>  
> +    case 0xc7:

This case should be sorted numerically, so it should sit between the
CPUID and default cases.

> +    {
> +        void *xsave_addr;
> +        int not_page_aligned = 0;

bool_t

> +        u32 guest_xsaves_size = xstate_ctxt_size_compact(v->arch.xcr0);
> +
> +        switch ( insn_fetch(u8, code_base, eip, code_limit) )
> +        {
> +            case 0x2f:/* XSAVES */
> +            {
> +                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
> +                                          X86_CR4_OSXSAVE))

I would format this as

if ( !cpu_has_xsaves ||
     !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) )

to associate the bit test more clearly.

> +                {
> +                    do_guest_trap(TRAP_invalid_op, regs, 0);
> +                    goto skip;
> +                }
> +
> +                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
> +                {
> +                    do_guest_trap(TRAP_nmi, regs, 0);

TRAP_no_device surely?

> +                    goto skip;
> +                }
> +
> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )

What does edi have to do with xsaves?  Only edx:eax are special
according to the manual.

> +                    goto fail;
> +
> +                if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
> +                {
> +                    mfn_t mfn_list[2];
> +                    void *va;

This fails to account for the xsaves size growing to > PAGE_SIZE in
future processors.

/* TODO - expand for future processors. */
BUG_ON(guest_xsaves_size > PAGE_SIZE);

might be acceptable.  However, it is better fixed by...

> +
> +                    not_page_aligned = 1;
> +                    mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
> +                    mfn_list[1] = _mfn(do_page_walk_mfn(v,
> +                                       PAGE_ALIGN(regs->edi)));
> +
> +                    va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
> +                    xsave_addr = (void *)((unsigned long)va +
> +                                         (regs->edi & ~PAGE_MASK));

... introducing a new helper such as

void *vmap_guest_linear(void *va, size_t bytes, uint32_t pfec);

which takes care of the internal details of making a linear area of
guest virtual address space appear linear in Xen virtual address space
as well.

You also need to take care to respect non-writable pages, and to have
an access_ok() check, or a guest could (ab)use this emulation to write
to Xen code/data areas.
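
As an illustrative sketch only - the helper name is hypothetical, and
the access_ok()/writability checks are noted but not implemented here:

void *vmap_guest_linear(struct vcpu *v, unsigned long va, size_t bytes)
{
    unsigned long offs = va & ~PAGE_MASK;
    unsigned int i, nr = PFN_UP(offs + bytes);
    mfn_t mfns[2];
    void *ptr;

    /* access_ok() and PTE writability checks belong here. */

    if ( nr > ARRAY_SIZE(mfns) )
        return NULL; /* Lift this limit for areas larger than a page. */

    for ( i = 0; i < nr; ++i )
    {
        unsigned long mfn = do_page_walk_mfn(v, (va & PAGE_MASK) +
                                             i * PAGE_SIZE);

        if ( !mfn )
            return NULL;
        mfns[i] = _mfn(mfn);
    }

    ptr = __vmap(mfns, 1, nr, PAGE_SIZE, PAGE_HYPERVISOR);

    return ptr ? ptr + offs : NULL;
}

This also gives a single place to fix the two-page assumption when the
xsaves size grows.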

> +                }
> +                else
> +                    xsave_addr = do_page_walk(v, regs->edi);
> +
> +                if ( !xsave_addr )
> +                    goto fail;
> +
> +                xsaves(regs->eax, regs->edx, xsave_addr);
> +
> +                if ( not_page_aligned )
> +                    vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
> +                else
> +                    unmap_domain_page(xsave_addr);
> +                break;
> +            }

Blank line here please.

> +            case 0x1f:/* XRSTORS */
> +            {
> +                struct xsave_struct guest_xsave_area;

Not on the stack please, but I don't believe you need this.

> +
> +                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
> +                                          X86_CR4_OSXSAVE))
> +                {
> +                    do_guest_trap(TRAP_invalid_op, regs, 0);
> +                    goto skip;
> +                }
> +
> +                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
> +                {
> +                    do_guest_trap(TRAP_nmi, regs, 0);
> +                    goto skip;
> +                }
> +
> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
> +                    goto fail;
> +
> +                if ( (rc = copy_from_user(&guest_xsave_area, (void *) regs->edi,
> +                                          sizeof(struct xsave_struct))) !=0 )
> +                {
> +                    propagate_page_fault(regs->edi +
> +				         sizeof(struct xsave_struct) - rc, 0);
> +                    goto skip;

Surely you just need xstate_bv and xcomp_bv?
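
i.e. something like (sketch only, assuming the xsave_hdr layout
introduced in patch 2 of this series):

uint64_t bv[2]; /* xstate_bv, xcomp_bv */

if ( copy_from_user(bv,
                    (void *)(regs->edi +
                             offsetof(struct xsave_struct, xsave_hdr)),
                    sizeof(bv)) )
{
    propagate_page_fault(regs->edi +
                         offsetof(struct xsave_struct, xsave_hdr), 0);
    goto skip;
}

which avoids both the large stack object and copying state you never
inspect.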

> +                }
> +                else
> +                if ( !(guest_xsave_area.xsave_hdr.xcomp_bv & 1l << 63) ||

1l << 63 is undefined behaviour, as the shift reaches the sign bit.
Perhaps you meant 1ul << 63?

> +                     (guest_xsave_area.xsave_hdr.xstate_bv |
> +                     guest_xsave_area.xsave_hdr.xcomp_bv) !=
> +                     guest_xsave_area.xsave_hdr.xcomp_bv ||

xcomp_bv is not introduced until patch 2, which means that this patch
doesn't build.  Please make absolutely sure that the entire series is
bisectable.

> +                     ((guest_xsave_area.xsave_hdr.xcomp_bv & ~(1l << 63)) |
> +                     v->arch.xcr0 | v->arch.msr_ia32_xss ) !=
> +                     (v->arch.xcr0 | v->arch.msr_ia32_xss) )
> +                    goto fail;
> +
> +                if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
> +                {
> +                    mfn_t mfn_list[2];
> +                    void *va;
> +
> +                    not_page_aligned = 1;
> +                    mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
> +                    mfn_list[1] = _mfn(do_page_walk_mfn(v,
> +                                       PAGE_ALIGN(regs->edi)));
> +
> +                    va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
> +                    xsave_addr = (void *)((unsigned long)va +
> +                                         (regs->edi & ~PAGE_MASK));
> +                }
> +                else
> +                    xsave_addr = do_page_walk(v, regs->edi);
> +
> +                if ( !xsave_addr )
> +                    goto fail;
> +
> +                xrstors(regs->eax, regs->edx, xsave_addr);
> +
> +                if ( not_page_aligned )
> +                    vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
> +                else
> +                    unmap_domain_page(xsave_addr);
> +                break;
> +            }
> +            default:
> +                goto fail;
> +        }
> +        break;
> +    }
> +
>      case 0x06: /* CLTS */
>          (void)do_fpu_taskswitch(0);
>          break;
> @@ -2663,6 +2788,13 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>                  }
>                  break;
>              }
> +        case MSR_IA32_XSS:
> +            if ( !cpu_has_xsaves )
> +                goto fail;
> +            if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
> +                goto fail;

You can join these two if() statements together.  Also, it is
technically regs->_ecx.
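
i.e.:

case MSR_IA32_XSS:
    if ( !cpu_has_xsaves ||
         wrmsr_safe(regs->_ecx, msr_content) != 0 )
        goto fail;
    v->arch.msr_ia32_xss = msr_content;
    break;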

> +            v->arch.msr_ia32_xss = msr_content;
> +            break;
>              /*FALLTHROUGH*/

Observe the comment below: the previous case deliberately falls
through to default:, and inserting this case here breaks that.

>  
>          default:
> @@ -2798,6 +2930,12 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>                  }
>                  break;
>              }
> +        case MSR_IA32_XSS:
> +            if ( !cpu_has_xsaves )
> +                goto fail;
> +            regs->eax = v->arch.msr_ia32_xss;
> +            regs->edx = v->arch.msr_ia32_xss >> 32;

val = v->arch.msr_ia32_xss;
goto rdmsr_writeback;

This code currently incorrectly clobbers the upper 32 bits of rax.
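
i.e.:

case MSR_IA32_XSS:
    if ( !cpu_has_xsaves )
        goto fail;
    val = v->arch.msr_ia32_xss;
    goto rdmsr_writeback;

letting the existing writeback path split val into eax/edx with the
correct truncation.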

> +            break;
>              /*FALLTHROUGH*/

Again - another bad fall-through case.

>  
>          default:
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 98310f3..de94ac1 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -48,6 +48,58 @@ l2_pgentry_t __section(".bss.page_aligned") l2_bootmap[L2_PAGETABLE_ENTRIES];
>  
>  l2_pgentry_t *compat_idle_pg_table_l2;
>  
> +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)

What is this function?  Why is it useful?  Something like this belongs
in its own patch along with a description of why it is being introduced.

> +{
> +    unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
> +    l4_pgentry_t l4e, *l4t;
> +    l3_pgentry_t l3e, *l3t;
> +    l2_pgentry_t l2e, *l2t;
> +    l1_pgentry_t l1e, *l1t;
> +
> +    if ( !is_pv_vcpu(v) || !is_canonical_address(addr) )
> +        return 0;
> +
> +    l4t = map_domain_page(_mfn(mfn));
> +    l4e = l4t[l4_table_offset(addr)];
> +    unmap_domain_page(l4t);
> +    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
> +        return 0;
> +
> +    l3t = map_l3t_from_l4e(l4e);
> +    l3e = l3t[l3_table_offset(addr)];
> +    unmap_domain_page(l3t);
> +    mfn = l3e_get_pfn(l3e);
> +    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> +        return 0;
> +    if ( (l3e_get_flags(l3e) & _PAGE_PSE) )
> +    {
> +        mfn += PFN_DOWN(addr & ((1UL << L3_PAGETABLE_SHIFT) - 1));
> +        goto ret;
> +    }
> +
> +    l2t = map_domain_page(_mfn(mfn));
> +    l2e = l2t[l2_table_offset(addr)];
> +    unmap_domain_page(l2t);
> +    mfn = l2e_get_pfn(l2e);
> +    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> +        return 0;
> +    if ( (l2e_get_flags(l2e) & _PAGE_PSE) )
> +    {
> +        mfn += PFN_DOWN(addr & ((1UL << L2_PAGETABLE_SHIFT) - 1));
> +        goto ret;
> +    }
> +
> +    l1t = map_domain_page(_mfn(mfn));
> +    l1e = l1t[l1_table_offset(addr)];
> +    unmap_domain_page(l1t);
> +    mfn = l1e_get_pfn(l1e);
> +    if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> +        return 0;
> +
> + ret:
> +    return mfn;
> +}
> +
>  void *do_page_walk(struct vcpu *v, unsigned long addr)
>  {
>      unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> index d5f5e3b..e34eda3 100644
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -65,6 +65,31 @@ uint64_t get_xcr0(void)
>      return this_cpu(xcr0);
>  }
>  
> +void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)

Should take a 64-bit mask.
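
e.g. a sketch keeping the existing opcode bytes, with the split done
internally:

void xsaves(uint64_t mask, struct xsave_struct *ptr)
{
    uint32_t lmask = mask, hmask = mask >> 32;

    asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
                   : "=m" (*ptr)
                   : "a" (lmask), "d" (hmask), "D" (ptr) );
}

and similarly for xrstors().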

> +{
> +    asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
> +                    : "=m" (*ptr)
> +                    : "a" (lmask), "d" (hmask), "D" (ptr) );
> +}
> +
> +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
> +{
> +    asm volatile ( "1: .byte 0x48,0x0f,0xc7,0x1f\n"
> +                   ".section .fixup,\"ax\"      \n"
> +                   "2: mov %5,%%ecx             \n"
> +                   "   xor %1,%1                \n"
> +                   "   rep stosb                \n"
> +                   "   lea %2,%0                \n"
> +                   "   mov %3,%1                \n"
> +                   "   jmp 1b                   \n"
> +                   ".previous                   \n"
> +                   _ASM_EXTABLE(1b, 2b)
> +                   : "+&D" (ptr), "+&a" (lmask)
> +                   : "m" (*ptr), "g" (lmask), "d" (hmask),
> +                     "m" (xsave_cntxt_size)
> +                   : "ecx" );
> +}
> +

Neither of these two helpers has anything like sufficient checking to
be safely used on guest state.

>  void xsave(struct vcpu *v, uint64_t mask)
>  {
>      struct xsave_struct *ptr = v->arch.xsave_area;
> @@ -268,6 +293,20 @@ static unsigned int _xstate_ctxt_size(u64 xcr0)
>      return ebx;
>  }
>  
> +unsigned int xstate_ctxt_size_compact(u64 xcr0)
> +{
> +    u64 act_xcr0 = get_xcr0();
> +    u32 eax, ebx = 0, ecx, edx;
> +    bool_t ok = set_xcr0(xcr0);
> +
> +    ASSERT(ok);
> +    cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
> +    ok = set_xcr0(act_xcr0);
> +    ASSERT(ok);
> +
> +    return ebx;
> +}
> +
>  /* Fastpath for common xstate size requests, avoiding reloads of xcr0. */
>  unsigned int xstate_ctxt_size(u64 xcr0)
>  {
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 7a9e96f..aee781b 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -496,6 +496,7 @@ struct arch_vcpu
>       */
>      struct xsave_struct *xsave_area;
>      uint64_t xcr0;
> +    u64 msr_ia32_xss;

uint64_t please, for consistency.

~Andrew

>      /* Accumulated eXtended features mask for using XSAVE/XRESTORE by Xen
>       * itself, as we can never know whether guest OS depends on content
>       * preservation whenever guest OS clears one feature flag (for example,
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index 8595c38..94a590e 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -524,6 +524,7 @@ void make_cr3(struct vcpu *v, unsigned long mfn);
>  void update_cr3(struct vcpu *v);
>  int vcpu_destroy_pagetables(struct vcpu *);
>  struct trap_bounce *propagate_page_fault(unsigned long addr, u16 error_code);
> +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr);
>  void *do_page_walk(struct vcpu *v, unsigned long addr);
>  
>  int __sync_local_execstate(void);
> diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
> index 5425f77..365d995 100644
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -58,6 +58,8 @@
>  
>  #define MSR_IA32_BNDCFGS		0x00000D90
>  
> +#define MSR_IA32_XSS			0x00000da0
> +
>  #define MSR_MTRRfix64K_00000		0x00000250
>  #define MSR_MTRRfix16K_80000		0x00000258
>  #define MSR_MTRRfix16K_A0000		0x00000259
> diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
> index 4c690db..59c7156 100644
> --- a/xen/include/asm-x86/xstate.h
> +++ b/xen/include/asm-x86/xstate.h
> @@ -82,6 +82,8 @@ struct __packed __attribute__((aligned (64))) xsave_struct
>  /* extended state operations */
>  bool_t __must_check set_xcr0(u64 xfeatures);
>  uint64_t get_xcr0(void);
> +void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
> +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
>  void xsave(struct vcpu *v, uint64_t mask);
>  void xrstor(struct vcpu *v, uint64_t mask);
>  bool_t xsave_enabled(const struct vcpu *v);
> @@ -92,6 +94,7 @@ int __must_check handle_xsetbv(u32 index, u64 new_bv);
>  void xstate_free_save_area(struct vcpu *v);
>  int xstate_alloc_save_area(struct vcpu *v);
>  void xstate_init(bool_t bsp);
> +unsigned int xstate_ctxt_size_compact(u64 xcr0);
>  unsigned int xstate_ctxt_size(u64 xcr0);
>  
>  #endif /* __ASM_XSTATE_H */
