* [PATCH V3 0/6] add xsaves/xrstors support
@ 2015-08-05  1:57 Shuai Ruan
  2015-08-05  1:57 ` [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest Shuai Ruan
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-05  1:57 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, eddie.dong, jbeulich,
	keir

Changes in v3:
* Address comments from Jan/Konrad
* Change the bits of eax/ebx/ecx/edx passed to the guest in patch 4.
* Add code to verify whether the host supports xsaves in patch 1.

Changes in v2:
* Address comments from Andrew/Chao/Jan, mainly:
* Add detailed information about the xsaves/xrstors feature.
* Test migration between an xsaves-supporting machine and a non-xsaves-supporting machine.
* Take XGETBV1/XSAVEC/XSAVEOPT out of the 'else' in patch 3.
* Change the macro name XGETBV to XGETBV1 in patch 4.

This patchset enables the xsaves/xrstors feature, which will be available
on Intel Skylake and later platforms. Like xsave/xrstor, the xsaves/xrstors
feature saves and loads processor state from a region of memory called the
XSAVE area. Unlike xsave/xrstor, xsaves/xrstors:

a) use the compacted format only for the extended region
   of the XSAVE area, which saves memory;
b) can operate on supervisor state components, so the feature
   is a prerequisite for supporting new supervisor state components;
c) execute only at CPL=0.

Detailed hardware specifications can be found in chapter 13 (sections 13.11
and 13.12) of the Intel SDM [1].
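
For reference, the sub-features and the compacted-area size are enumerated
through CPUID leaf 0xD, sub-leaf 1. A minimal stand-alone sketch (not part
of this series) of reading that sub-leaf:

#include <stdint.h>
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    uint32_t eax, ebx, ecx, edx;

    /*
     * CPUID.(EAX=0DH, ECX=1): EAX enumerates XSAVEOPT/XSAVEC/XGETBV1/XSAVES,
     * EBX reports the size of the XSAVE area in the compacted format for the
     * states currently enabled in XCR0 | IA32_XSS.
     */
    __cpuid_count(0xd, 1, eax, ebx, ecx, edx);

    printf("xsaveopt=%u xsavec=%u xgetbv1=%u xsaves=%u compacted-size=%u\n",
           eax & 1, (eax >> 1) & 1, (eax >> 2) & 1, (eax >> 3) & 1, ebx);

    return 0;
}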

patch1: add xsaves/xrstors support for pv guest
patch2: add xsaves/xrstors support for xen
patch3-5: add xsaves/xrstors support for hvm guest
patch6: switch on detection of xsaves/xrstors/xgetbv in xen

[1] Intel SDM (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf)


Shuai Ruan (6):
  x86/xsaves: enable xsaves/xrstors for pv guest
  x86/xsaves: enable xsaves/xrstors in xen
  x86/xsaves: enable xsaves/xrstors for hvm guest
  libxc: expose xsaves/xgetbv1/xsavec to hvm guest
  x86/xsaves: support compact format for hvm save/restore
  x86/xsaves: detect xsaves/xgetbv1 in xen

 tools/libxc/xc_cpuid_x86.c         |  13 +-
 xen/arch/x86/domain.c              |   6 +
 xen/arch/x86/hvm/hvm.c             |  60 ++++++++-
 xen/arch/x86/hvm/vmx/vmcs.c        |   7 +-
 xen/arch/x86/hvm/vmx/vmx.c         |  18 +++
 xen/arch/x86/i387.c                |   5 +
 xen/arch/x86/traps.c               | 138 ++++++++++++++++++++
 xen/arch/x86/x86_64/mm.c           |  52 ++++++++
 xen/arch/x86/xstate.c              | 259 +++++++++++++++++++++++++++++++------
 xen/include/asm-x86/domain.h       |   1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   5 +
 xen/include/asm-x86/hvm/vmx/vmx.h  |   2 +
 xen/include/asm-x86/mm.h           |   1 +
 xen/include/asm-x86/msr-index.h    |   2 +
 xen/include/asm-x86/xstate.h       |  15 ++-
 15 files changed, 536 insertions(+), 48 deletions(-)

-- 
1.9.1


* [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
  2015-08-05  1:57 [PATCH V3 0/6] add xsaves/xrstors support Shuai Ruan
@ 2015-08-05  1:57 ` Shuai Ruan
  2015-08-05 17:51   ` Andrew Cooper
  2015-08-05  1:57 ` [PATCH V3 2/6] x86/xsaves: enable xsaves/xrstors in xen Shuai Ruan
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Shuai Ruan @ 2015-08-05  1:57 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, eddie.dong, jbeulich,
	keir

This patch emulates the xsaves/xrstors instructions and
XSS MSR accesses.

The xsaves/xrstors instructions and XSS MSR accesses may
only be executed in ring 0, so emulation is needed when a
PV guest uses these instructions.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
---
 xen/arch/x86/domain.c           |   3 +
 xen/arch/x86/traps.c            | 138 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/mm.c        |  52 +++++++++++++++
 xen/arch/x86/xstate.c           |  39 ++++++++++++
 xen/include/asm-x86/domain.h    |   1 +
 xen/include/asm-x86/mm.h        |   1 +
 xen/include/asm-x86/msr-index.h |   2 +
 xen/include/asm-x86/xstate.h    |   3 +
 8 files changed, 239 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..e8b8d67 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -426,6 +426,7 @@ int vcpu_initialise(struct vcpu *v)
 
     /* By default, do not emulate */
     v->arch.vm_event.emulate_flags = 0;
+    v->arch.msr_ia32_xss = 0;
 
     rc = mapcache_vcpu_init(v);
     if ( rc )
@@ -1529,6 +1530,8 @@ static void __context_switch(void)
             if ( xcr0 != get_xcr0() && !set_xcr0(xcr0) )
                 BUG();
         }
+        if ( cpu_has_xsaves )
+            wrmsr_safe(MSR_IA32_XSS, n->arch.msr_ia32_xss);
         vcpu_restore_fpu_eager(n);
         n->arch.ctxt_switch_to(n);
     }
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 6a03582..c1fea77 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2353,6 +2353,131 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         }
         break;
 
+    case 0xc7:
+    {
+        void *xsave_addr;
+        int not_page_aligned = 0;
+        u32 guest_xsaves_size = xstate_ctxt_size_compact(v->arch.xcr0);
+
+        switch ( insn_fetch(u8, code_base, eip, code_limit) )
+        {
+            case 0x2f:/* XSAVES */
+            {
+                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
+                                          X86_CR4_OSXSAVE))
+                {
+                    do_guest_trap(TRAP_invalid_op, regs, 0);
+                    goto skip;
+                }
+
+                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
+                {
+                    do_guest_trap(TRAP_nmi, regs, 0);
+                    goto skip;
+                }
+
+                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
+                    goto fail;
+
+                if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
+                {
+                    mfn_t mfn_list[2];
+                    void *va;
+
+                    not_page_aligned = 1;
+                    mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
+                    mfn_list[1] = _mfn(do_page_walk_mfn(v,
+                                       PAGE_ALIGN(regs->edi)));
+
+                    va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
+                    xsave_addr = (void *)((unsigned long)va +
+                                         (regs->edi & ~PAGE_MASK));
+                }
+                else
+                    xsave_addr = do_page_walk(v, regs->edi);
+
+                if ( !xsave_addr )
+                    goto fail;
+
+                xsaves(regs->eax, regs->edx, xsave_addr);
+
+                if ( not_page_aligned )
+                    vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
+                else
+                    unmap_domain_page(xsave_addr);
+                break;
+            }
+            case 0x1f:/* XRSTORS */
+            {
+                struct xsave_struct guest_xsave_area;
+
+                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
+                                          X86_CR4_OSXSAVE))
+                {
+                    do_guest_trap(TRAP_invalid_op, regs, 0);
+                    goto skip;
+                }
+
+                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
+                {
+                    do_guest_trap(TRAP_nmi, regs, 0);
+                    goto skip;
+                }
+
+                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
+                    goto fail;
+
+                if ( (rc = copy_from_user(&guest_xsave_area, (void *) regs->edi,
+                                          sizeof(struct xsave_struct))) !=0 )
+                {
+                    propagate_page_fault(regs->edi +
+				         sizeof(struct xsave_struct) - rc, 0);
+                    goto skip;
+                }
+                else
+                if ( !(guest_xsave_area.xsave_hdr.xcomp_bv & 1l << 63) ||
+                     (guest_xsave_area.xsave_hdr.xstate_bv |
+                     guest_xsave_area.xsave_hdr.xcomp_bv) !=
+                     guest_xsave_area.xsave_hdr.xcomp_bv ||
+                     ((guest_xsave_area.xsave_hdr.xcomp_bv & ~(1l << 63)) |
+                     v->arch.xcr0 | v->arch.msr_ia32_xss ) !=
+                     (v->arch.xcr0 | v->arch.msr_ia32_xss) )
+                    goto fail;
+
+                if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
+                {
+                    mfn_t mfn_list[2];
+                    void *va;
+
+                    not_page_aligned = 1;
+                    mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
+                    mfn_list[1] = _mfn(do_page_walk_mfn(v,
+                                       PAGE_ALIGN(regs->edi)));
+
+                    va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
+                    xsave_addr = (void *)((unsigned long)va +
+                                         (regs->edi & ~PAGE_MASK));
+                }
+                else
+                    xsave_addr = do_page_walk(v, regs->edi);
+
+                if ( !xsave_addr )
+                    goto fail;
+
+                xrstors(regs->eax, regs->edx, xsave_addr);
+
+                if ( not_page_aligned )
+                    vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
+                else
+                    unmap_domain_page(xsave_addr);
+                break;
+            }
+            default:
+                goto fail;
+        }
+        break;
+    }
+
     case 0x06: /* CLTS */
         (void)do_fpu_taskswitch(0);
         break;
@@ -2663,6 +2788,13 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                 }
                 break;
             }
+        case MSR_IA32_XSS:
+            if ( !cpu_has_xsaves )
+                goto fail;
+            if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
+                goto fail;
+            v->arch.msr_ia32_xss = msr_content;
+            break;
             /*FALLTHROUGH*/
 
         default:
@@ -2798,6 +2930,12 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
                 }
                 break;
             }
+        case MSR_IA32_XSS:
+            if ( !cpu_has_xsaves )
+                goto fail;
+            regs->eax = v->arch.msr_ia32_xss;
+            regs->edx = v->arch.msr_ia32_xss >> 32;
+            break;
             /*FALLTHROUGH*/
 
         default:
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 98310f3..de94ac1 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -48,6 +48,58 @@ l2_pgentry_t __section(".bss.page_aligned") l2_bootmap[L2_PAGETABLE_ENTRIES];
 
 l2_pgentry_t *compat_idle_pg_table_l2;
 
+unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)
+{
+    unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
+    l4_pgentry_t l4e, *l4t;
+    l3_pgentry_t l3e, *l3t;
+    l2_pgentry_t l2e, *l2t;
+    l1_pgentry_t l1e, *l1t;
+
+    if ( !is_pv_vcpu(v) || !is_canonical_address(addr) )
+        return 0;
+
+    l4t = map_domain_page(_mfn(mfn));
+    l4e = l4t[l4_table_offset(addr)];
+    unmap_domain_page(l4t);
+    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
+        return 0;
+
+    l3t = map_l3t_from_l4e(l4e);
+    l3e = l3t[l3_table_offset(addr)];
+    unmap_domain_page(l3t);
+    mfn = l3e_get_pfn(l3e);
+    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
+        return 0;
+    if ( (l3e_get_flags(l3e) & _PAGE_PSE) )
+    {
+        mfn += PFN_DOWN(addr & ((1UL << L3_PAGETABLE_SHIFT) - 1));
+        goto ret;
+    }
+
+    l2t = map_domain_page(_mfn(mfn));
+    l2e = l2t[l2_table_offset(addr)];
+    unmap_domain_page(l2t);
+    mfn = l2e_get_pfn(l2e);
+    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
+        return 0;
+    if ( (l2e_get_flags(l2e) & _PAGE_PSE) )
+    {
+        mfn += PFN_DOWN(addr & ((1UL << L2_PAGETABLE_SHIFT) - 1));
+        goto ret;
+    }
+
+    l1t = map_domain_page(_mfn(mfn));
+    l1e = l1t[l1_table_offset(addr)];
+    unmap_domain_page(l1t);
+    mfn = l1e_get_pfn(l1e);
+    if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
+        return 0;
+
+ ret:
+    return mfn;
+}
+
 void *do_page_walk(struct vcpu *v, unsigned long addr)
 {
     unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index d5f5e3b..e34eda3 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -65,6 +65,31 @@ uint64_t get_xcr0(void)
     return this_cpu(xcr0);
 }
 
+void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
+{
+    asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
+                    : "=m" (*ptr)
+                    : "a" (lmask), "d" (hmask), "D" (ptr) );
+}
+
+void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
+{
+    asm volatile ( "1: .byte 0x48,0x0f,0xc7,0x1f\n"
+                   ".section .fixup,\"ax\"      \n"
+                   "2: mov %5,%%ecx             \n"
+                   "   xor %1,%1                \n"
+                   "   rep stosb                \n"
+                   "   lea %2,%0                \n"
+                   "   mov %3,%1                \n"
+                   "   jmp 1b                   \n"
+                   ".previous                   \n"
+                   _ASM_EXTABLE(1b, 2b)
+                   : "+&D" (ptr), "+&a" (lmask)
+                   : "m" (*ptr), "g" (lmask), "d" (hmask),
+                     "m" (xsave_cntxt_size)
+                   : "ecx" );
+}
+
 void xsave(struct vcpu *v, uint64_t mask)
 {
     struct xsave_struct *ptr = v->arch.xsave_area;
@@ -268,6 +293,20 @@ static unsigned int _xstate_ctxt_size(u64 xcr0)
     return ebx;
 }
 
+unsigned int xstate_ctxt_size_compact(u64 xcr0)
+{
+    u64 act_xcr0 = get_xcr0();
+    u32 eax, ebx = 0, ecx, edx;
+    bool_t ok = set_xcr0(xcr0);
+
+    ASSERT(ok);
+    cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
+    ok = set_xcr0(act_xcr0);
+    ASSERT(ok);
+
+    return ebx;
+}
+
 /* Fastpath for common xstate size requests, avoiding reloads of xcr0. */
 unsigned int xstate_ctxt_size(u64 xcr0)
 {
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 7a9e96f..aee781b 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -496,6 +496,7 @@ struct arch_vcpu
      */
     struct xsave_struct *xsave_area;
     uint64_t xcr0;
+    u64 msr_ia32_xss;
     /* Accumulated eXtended features mask for using XSAVE/XRESTORE by Xen
      * itself, as we can never know whether guest OS depends on content
      * preservation whenever guest OS clears one feature flag (for example,
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 8595c38..94a590e 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -524,6 +524,7 @@ void make_cr3(struct vcpu *v, unsigned long mfn);
 void update_cr3(struct vcpu *v);
 int vcpu_destroy_pagetables(struct vcpu *);
 struct trap_bounce *propagate_page_fault(unsigned long addr, u16 error_code);
+unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr);
 void *do_page_walk(struct vcpu *v, unsigned long addr);
 
 int __sync_local_execstate(void);
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 5425f77..365d995 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -58,6 +58,8 @@
 
 #define MSR_IA32_BNDCFGS		0x00000D90
 
+#define MSR_IA32_XSS			0x00000da0
+
 #define MSR_MTRRfix64K_00000		0x00000250
 #define MSR_MTRRfix16K_80000		0x00000258
 #define MSR_MTRRfix16K_A0000		0x00000259
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 4c690db..59c7156 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -82,6 +82,8 @@ struct __packed __attribute__((aligned (64))) xsave_struct
 /* extended state operations */
 bool_t __must_check set_xcr0(u64 xfeatures);
 uint64_t get_xcr0(void);
+void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
+void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
 void xsave(struct vcpu *v, uint64_t mask);
 void xrstor(struct vcpu *v, uint64_t mask);
 bool_t xsave_enabled(const struct vcpu *v);
@@ -92,6 +94,7 @@ int __must_check handle_xsetbv(u32 index, u64 new_bv);
 void xstate_free_save_area(struct vcpu *v);
 int xstate_alloc_save_area(struct vcpu *v);
 void xstate_init(bool_t bsp);
+unsigned int xstate_ctxt_size_compact(u64 xcr0);
 unsigned int xstate_ctxt_size(u64 xcr0);
 
 #endif /* __ASM_XSTATE_H */
-- 
1.9.1


* [PATCH V3 2/6] x86/xsaves: enable xsaves/xrstors in xen
  2015-08-05  1:57 [PATCH V3 0/6] add xsaves/xrstors support Shuai Ruan
  2015-08-05  1:57 ` [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest Shuai Ruan
@ 2015-08-05  1:57 ` Shuai Ruan
  2015-08-05 17:57   ` Andrew Cooper
  2015-08-05  1:57 ` [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest Shuai Ruan
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Shuai Ruan @ 2015-08-05  1:57 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, eddie.dong, jbeulich,
	keir

This patch uses xsaves/xrstors instead of xsaveopt/xrstor
to perform the xsave_area switching so that xen itself
can benefit from them when available.

Please note that xsaves/xrstors only use the compacted format.
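
Since only the compacted format is used, any save area later handed to
xrstors must have the compaction bit set in its header; a minimal sketch
using the field and flag names introduced by this patch:

/* Sketch: mark a freshly allocated save area as using the compacted format. */
static void mark_compacted(struct xsave_struct *xsave, uint64_t xstates)
{
    xsave->xsave_hdr.xstate_bv = 0;                      /* nothing saved yet */
    xsave->xsave_hdr.xcomp_bv  = xstates | XSTATE_COMPACTION_ENABLED;
}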

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
---
 xen/arch/x86/i387.c          |  5 ++++
 xen/arch/x86/xstate.c        | 70 +++++++++++++++++++++++++-------------------
 xen/include/asm-x86/xstate.h |  4 ++-
 3 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index 14f2a79..9071374 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -309,7 +309,12 @@ int vcpu_init_fpu(struct vcpu *v)
         return rc;
 
     if ( v->arch.xsave_area )
+    {
         v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
+
+        if ( cpu_has_xsaves )
+            v->arch.xsave_area->xsave_hdr.xcomp_bv |= XSTATE_COMPACTION_ENABLED;
+    }
     else
     {
         v->arch.fpu_ctxt = _xzalloc(sizeof(v->arch.xsave_area->fpu_sse), 16);
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index e34eda3..571ce13 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -102,7 +102,9 @@ void xsave(struct vcpu *v, uint64_t mask)
         typeof(ptr->fpu_sse.fip.sel) fcs = ptr->fpu_sse.fip.sel;
         typeof(ptr->fpu_sse.fdp.sel) fds = ptr->fpu_sse.fdp.sel;
 
-        if ( cpu_has_xsaveopt )
+        if ( cpu_has_xsaves )
+            xsaves(lmask, hmask, ptr);
+        else if ( cpu_has_xsaveopt )
         {
             /*
              * xsaveopt may not write the FPU portion even when the respective
@@ -155,7 +157,9 @@ void xsave(struct vcpu *v, uint64_t mask)
     }
     else
     {
-        if ( cpu_has_xsaveopt )
+        if ( cpu_has_xsaves )
+            xsaves(lmask, hmask, ptr);
+        else if ( cpu_has_xsaveopt )
             asm volatile ( ".byte 0x0f,0xae,0x37"
                            : "=m" (*ptr)
                            : "a" (lmask), "d" (hmask), "D" (ptr) );
@@ -198,36 +202,42 @@ void xrstor(struct vcpu *v, uint64_t mask)
     switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
     {
     default:
-        asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
-                       ".section .fixup,\"ax\"      \n"
-                       "2: mov %5,%%ecx             \n"
-                       "   xor %1,%1                \n"
-                       "   rep stosb                \n"
-                       "   lea %2,%0                \n"
-                       "   mov %3,%1                \n"
-                       "   jmp 1b                   \n"
-                       ".previous                   \n"
-                       _ASM_EXTABLE(1b, 2b)
-                       : "+&D" (ptr), "+&a" (lmask)
-                       : "m" (*ptr), "g" (lmask), "d" (hmask),
-                         "m" (xsave_cntxt_size)
-                       : "ecx" );
+        if ( cpu_has_xsaves )
+            xrstors(lmask, hmask, ptr);
+        else
+            asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
+                           ".section .fixup,\"ax\"      \n"
+                           "2: mov %5,%%ecx             \n"
+                           "   xor %1,%1                \n"
+                           "   rep stosb                \n"
+                           "   lea %2,%0                \n"
+                           "   mov %3,%1                \n"
+                           "   jmp 1b                   \n"
+                           ".previous                   \n"
+                           _ASM_EXTABLE(1b, 2b)
+                           : "+&D" (ptr), "+&a" (lmask)
+                           : "m" (*ptr), "g" (lmask), "d" (hmask),
+                             "m" (xsave_cntxt_size)
+                           : "ecx" );
         break;
     case 4: case 2:
-        asm volatile ( "1: .byte 0x0f,0xae,0x2f\n"
-                       ".section .fixup,\"ax\" \n"
-                       "2: mov %5,%%ecx        \n"
-                       "   xor %1,%1           \n"
-                       "   rep stosb           \n"
-                       "   lea %2,%0           \n"
-                       "   mov %3,%1           \n"
-                       "   jmp 1b              \n"
-                       ".previous              \n"
-                       _ASM_EXTABLE(1b, 2b)
-                       : "+&D" (ptr), "+&a" (lmask)
-                       : "m" (*ptr), "g" (lmask), "d" (hmask),
-                         "m" (xsave_cntxt_size)
-                       : "ecx" );
+        if ( cpu_has_xsaves )
+            xrstors(lmask, hmask, ptr);
+        else
+            asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
+                           ".section .fixup,\"ax\"      \n"
+                           "2: mov %5,%%ecx             \n"
+                           "   xor %1,%1                \n"
+                           "   rep stosb                \n"
+                           "   lea %2,%0                \n"
+                           "   mov %3,%1                \n"
+                           "   jmp 1b                   \n"
+                           ".previous                   \n"
+                           _ASM_EXTABLE(1b, 2b)
+                           : "+&D" (ptr), "+&a" (lmask)
+                           : "m" (*ptr), "g" (lmask), "d" (hmask),
+                             "m" (xsave_cntxt_size)
+                           : "ecx" );
         break;
     }
 }
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 59c7156..47c2f59 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -41,6 +41,7 @@
 #define XSTATE_ALL     (~(1ULL << 63))
 #define XSTATE_NONLAZY (XSTATE_LWP | XSTATE_BNDREGS | XSTATE_BNDCSR)
 #define XSTATE_LAZY    (XSTATE_ALL & ~XSTATE_NONLAZY)
+#define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
 extern u64 xfeature_mask;
 extern bool_t cpu_has_xsaves, cpu_has_xgetbv1;
@@ -72,7 +73,8 @@ struct __packed __attribute__((aligned (64))) xsave_struct
 
     struct {
         u64 xstate_bv;
-        u64 reserved[7];
+        u64 xcomp_bv;
+        u64 reserved[6];
     } xsave_hdr;                             /* The 64-byte header */
 
     struct { char x[XSTATE_YMM_SIZE]; } ymm; /* YMM */
-- 
1.9.1


* [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
  2015-08-05  1:57 [PATCH V3 0/6] add xsaves/xrstors support Shuai Ruan
  2015-08-05  1:57 ` [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest Shuai Ruan
  2015-08-05  1:57 ` [PATCH V3 2/6] x86/xsaves: enable xsaves/xrstors in xen Shuai Ruan
@ 2015-08-05  1:57 ` Shuai Ruan
  2015-08-05 18:17   ` Andrew Cooper
  2015-08-05  1:57 ` [PATCH V3 4/6] libxc: expose xsaves/xgetbv1/xsavec to " Shuai Ruan
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Shuai Ruan @ 2015-08-05  1:57 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, eddie.dong, jbeulich,
	keir

This patch enables xsaves for HVM guests. It includes:
1. handling xsaves VMCS initialisation and VM exits;
2. adding logic to read/write the XSS MSR.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
---
 xen/arch/x86/hvm/hvm.c             | 44 ++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmcs.c        |  7 +++++-
 xen/arch/x86/hvm/vmx/vmx.c         | 18 ++++++++++++++++
 xen/arch/x86/xstate.c              |  4 ++--
 xen/include/asm-x86/hvm/vmx/vmcs.h |  5 +++++
 xen/include/asm-x86/hvm/vmx/vmx.h  |  2 ++
 xen/include/asm-x86/xstate.h       |  2 +-
 7 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c07e3ef..e5cf761 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4370,6 +4370,10 @@ void hvm_hypervisor_cpuid_leaf(uint32_t sub_idx,
     }
 }
 
+#define XSAVEOPT	(1 << 0)
+#define XSAVEC		(1 << 1)
+#define XGETBV1	(1 << 2)
+#define XSAVES		(1 << 3)
 void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                                    unsigned int *ecx, unsigned int *edx)
 {
@@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                     *ebx = _eax + _ebx;
             }
         }
+        if ( count == 1 )
+        {
+            if ( cpu_has_xsaves )
+            {
+                *ebx = XSTATE_AREA_MIN_SIZE;
+                if ( v->arch.xcr0 | v->arch.msr_ia32_xss )
+                    for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
+                    {
+                        if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss)
+			   & (1ULL << sub_leaf)) )
+                            continue;
+                        domain_cpuid(d, input, sub_leaf, &_eax, &_ebx, &_ecx,
+                                     &_edx);
+                        *ebx =  *ebx + _eax;
+                    }
+            }
+            else
+            {
+                *eax &= ~XSAVES;
+                *ebx = *ecx = *edx = 0;
+            }
+            if ( !cpu_has_xgetbv1 )
+                *eax &= ~XGETBV1;
+            if ( !cpu_has_xsavec )
+                *eax &= ~XSAVEC;
+            if ( !cpu_has_xsaveopt )
+                *eax &= ~XSAVEOPT;
+        }
         break;
 
     case 0x80000001:
@@ -4555,6 +4587,12 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         *msr_content = v->arch.hvm_vcpu.guest_efer;
         break;
 
+    case MSR_IA32_XSS:
+        if ( !cpu_has_vmx_xsaves )
+            goto gp_fault;
+        *msr_content = v->arch.msr_ia32_xss;
+        break;
+
     case MSR_IA32_TSC:
         *msr_content = _hvm_rdtsc_intercept();
         break;
@@ -4687,6 +4725,12 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
            return X86EMUL_EXCEPTION;
         break;
 
+    case MSR_IA32_XSS:
+        if ( !cpu_has_vmx_xsaves )
+            goto gp_fault;
+        v->arch.msr_ia32_xss = msr_content;
+        break;
+
     case MSR_IA32_TSC:
         hvm_set_guest_tsc(v, msr_content);
         break;
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4c5ceb5..8e61e3f 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -230,7 +230,8 @@ static int vmx_init_vmcs_config(void)
                SECONDARY_EXEC_ENABLE_EPT |
                SECONDARY_EXEC_ENABLE_RDTSCP |
                SECONDARY_EXEC_PAUSE_LOOP_EXITING |
-               SECONDARY_EXEC_ENABLE_INVPCID);
+               SECONDARY_EXEC_ENABLE_INVPCID |
+               SECONDARY_EXEC_XSAVES);
         rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
         if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
             opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
@@ -921,6 +922,7 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
     virtual_vmcs_exit(vvmcs);
 }
 
+#define VMX_XSS_EXIT_BITMAP 0
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(GUEST_PAT, guest_pat);
     }
 
+    if ( cpu_has_vmx_xsaves )
+        __vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
+
     vmx_vmcs_exit(v);
 
     /* PVH: paging mode is updated by arch_set_info_guest(). */
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d3183a8..64ff63b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2708,6 +2708,16 @@ static int vmx_handle_apic_write(void)
     return vlapic_apicv_write(current, exit_qualification & 0xfff);
 }
 
+static void vmx_handle_xsaves(void)
+{
+    WARN();
+}
+
+static void vmx_handle_xrstors(void)
+{
+    WARN();
+}
+
 void vmx_vmexit_handler(struct cpu_user_regs *regs)
 {
     unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0;
@@ -3226,6 +3236,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_vcpu_flush_pml_buffer(v);
         break;
 
+    case EXIT_REASON_XSAVES:
+        vmx_handle_xsaves();
+        break;
+
+    case EXIT_REASON_XRSTORS:
+        vmx_handle_xrstors();
+        break;
+
     case EXIT_REASON_ACCESS_GDTR_OR_IDTR:
     case EXIT_REASON_ACCESS_LDTR_OR_TR:
     case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 571ce13..699058d 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -14,8 +14,8 @@
 #include <asm/xstate.h>
 #include <asm/asm_defns.h>
 
-static bool_t __read_mostly cpu_has_xsaveopt;
-static bool_t __read_mostly cpu_has_xsavec;
+bool_t __read_mostly cpu_has_xsaveopt;
+bool_t __read_mostly cpu_has_xsavec;
 bool_t __read_mostly cpu_has_xgetbv1;
 bool_t __read_mostly cpu_has_xsaves;
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 3132644..d969a6d 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -225,6 +225,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_VMFUNC            0x00002000
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
+#define SECONDARY_EXEC_XSAVES                   0x00100000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
@@ -287,6 +288,8 @@ extern u32 vmx_secondary_exec_control;
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
 #define cpu_has_vmx_pml \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
+#define cpu_has_vmx_xsaves \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_XSAVES)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
@@ -356,6 +359,8 @@ enum vmcs_field {
 #define EOI_EXIT_BITMAP(n) (EOI_EXIT_BITMAP0 + (n) * 2) /* n = 0...3 */
     VMREAD_BITMAP                   = 0x00002026,
     VMWRITE_BITMAP                  = 0x00002028,
+    XSS_EXIT_BITMAP                 = 0x0000202c,
+    XSS_EXIT_BITMAP_HIGH            = 0x0000202d,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     VMCS_LINK_POINTER               = 0x00002800,
     GUEST_IA32_DEBUGCTL             = 0x00002802,
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index c5f3d24..07036f6 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -188,6 +188,8 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 #define EXIT_REASON_APIC_WRITE          56
 #define EXIT_REASON_INVPCID             58
 #define EXIT_REASON_PML_FULL            62
+#define EXIT_REASON_XSAVES              63
+#define EXIT_REASON_XRSTORS             64
 
 /*
  * Interruption-information format
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 47c2f59..75fa6eb 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -44,7 +44,7 @@
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
 extern u64 xfeature_mask;
-extern bool_t cpu_has_xsaves, cpu_has_xgetbv1;
+extern bool_t cpu_has_xsaves, cpu_has_xgetbv1 ,cpu_has_xsavec, cpu_has_xsaveopt;
 
 /* extended state save area */
 struct __packed __attribute__((aligned (64))) xsave_struct
-- 
1.9.1


* [PATCH V3 4/6] libxc: expose xsaves/xgetbv1/xsavec to hvm guest
  2015-08-05  1:57 [PATCH V3 0/6] add xsaves/xrstors support Shuai Ruan
                   ` (2 preceding siblings ...)
  2015-08-05  1:57 ` [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest Shuai Ruan
@ 2015-08-05  1:57 ` Shuai Ruan
  2015-08-05  8:37   ` Ian Campbell
  2015-08-05  1:57 ` [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore Shuai Ruan
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Shuai Ruan @ 2015-08-05  1:57 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, eddie.dong, jbeulich,
	keir

This patch exposes xsaves/xgetbv1/xsavec to HVM guests.
The reserved bits of eax/ebx/ecx/edx must be cleared
when cpuid(0dh) is called with sub-leaf 1 or 2..63.

According to the spec the following bits must be reserved:
For sub-leaf 1, bits 03-04 and 08-31 of ecx are reserved; edx is reserved.
For sub-leaves 2...63, bits 01-31 of ecx are reserved; edx is reserved.
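
As an illustration, the ECX masks used in the hunk below correspond to
(hypothetical macro names, not part of the patch):

/* sub-leaf 1:       keep ECX bits 00-02 and 05-07 -> mask 0xe7 */
/* sub-leaves 2..63: keep ECX bit  00 only         -> mask 0x1  */
#define XSTATE_CPUID_LEAF1_ECX_MASK    0xe7u
#define XSTATE_CPUID_SUBLEAF_ECX_MASK  0x01u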

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
---
 tools/libxc/xc_cpuid_x86.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index c97f91a..b69676a 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -211,6 +211,9 @@ static void intel_xc_cpuid_policy(
 }
 
 #define XSAVEOPT        (1 << 0)
+#define XSAVEC          (1 << 1)
+#define XGETBV1         (1 << 2)
+#define XSAVES          (1 << 3)
 /* Configure extended state enumeration leaves (0x0000000D for xsave) */
 static void xc_cpuid_config_xsave(
     xc_interface *xch, domid_t domid, uint64_t xfeature_mask,
@@ -247,8 +250,9 @@ static void xc_cpuid_config_xsave(
         regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
         break;
     case 1: /* leaf 1 */
-        regs[0] &= XSAVEOPT;
-        regs[1] = regs[2] = regs[3] = 0;
+        regs[0] &= (XSAVEOPT | XSAVEC | XGETBV1 | XSAVES);
+        regs[2] &= 0xe7;
+        regs[3] = 0;
         break;
     case 2 ... 63: /* sub-leaves */
         if ( !(xfeature_mask & (1ULL << input[1])) )
@@ -256,8 +260,9 @@ static void xc_cpuid_config_xsave(
             regs[0] = regs[1] = regs[2] = regs[3] = 0;
             break;
         }
-        /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
-        regs[2] = regs[3] = 0;
+        /* Don't touch EAX, EBX. Also cleanup EDX. Cleanup bits 01-32 of ECX*/
+        regs[2] &= 0x1;
+        regs[3] = 0;
         break;
     }
 }
-- 
1.9.1


* [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore
  2015-08-05  1:57 [PATCH V3 0/6] add xsaves/xrstors support Shuai Ruan
                   ` (3 preceding siblings ...)
  2015-08-05  1:57 ` [PATCH V3 4/6] libxc: expose xsaves/xgetbv1/xsavec to " Shuai Ruan
@ 2015-08-05  1:57 ` Shuai Ruan
  2015-08-05 18:45   ` Andrew Cooper
  2015-08-05  1:57 ` [PATCH V3 6/6] x86/xsaves: detect xsaves/xgetbv1 in xen Shuai Ruan
  2015-08-05 16:38 ` [PATCH V3 0/6] add xsaves/xrstors support Andrew Cooper
  6 siblings, 1 reply; 27+ messages in thread
From: Shuai Ruan @ 2015-08-05  1:57 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, eddie.dong, jbeulich,
	keir

xsaves/xrstors only use the compacted format, so format conversion
is needed when performing save/restore.
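
As background, in the compacted format a component's offset is not the fixed
standard-format offset reported by its CPUID sub-leaf; it is accumulated from
the sizes of the lower-numbered components that are present. A rough sketch of
that calculation (alignment requirements are ignored for brevity, and the
sizes[] array is assumed to have been filled from CPUID leaf 0xD):

/*
 * Sketch only: offset of component 'i' (i >= 2) within a compacted-format
 * area, given per-component sizes from CPUID leaf 0xD and the set of
 * components recorded in xcomp_bv.
 */
static unsigned int compacted_offset(unsigned int i, uint64_t xcomp_bv,
                                     const unsigned int *sizes)
{
    unsigned int j, off = 512 + 64;   /* legacy region + XSAVE header */

    for ( j = 2; j < i; j++ )
        if ( xcomp_bv & (1ULL << j) )
            off += sizes[j];

    return off;
}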

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
---
 xen/arch/x86/domain.c        |   3 +
 xen/arch/x86/hvm/hvm.c       |  16 +++--
 xen/arch/x86/xstate.c        | 138 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/xstate.h |   6 ++
 4 files changed, 158 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e8b8d67..083b70d 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -845,6 +845,9 @@ int arch_set_info_guest(
         memcpy(v->arch.fpu_ctxt, &c.nat->fpu_ctxt, sizeof(c.nat->fpu_ctxt));
         if ( v->arch.xsave_area )
              v->arch.xsave_area->xsave_hdr.xstate_bv = XSTATE_FP_SSE;
+        if ( cpu_has_xsaves )
+            v->arch.xsave_area->xsave_hdr.xcomp_bv = v->arch.xcr0_accum |
+                                                     XSTATE_COMPACTION_ENABLED;
     }
 
     if ( !compat )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e5cf761..8495938 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2127,8 +2127,11 @@ static int hvm_save_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
         ctxt->xfeature_mask = xfeature_mask;
         ctxt->xcr0 = v->arch.xcr0;
         ctxt->xcr0_accum = v->arch.xcr0_accum;
-        memcpy(&ctxt->save_area, v->arch.xsave_area,
-               size - offsetof(struct hvm_hw_cpu_xsave, save_area));
+        if ( cpu_has_xsaves )
+            save_xsave_states(v, (u8 *)&ctxt->save_area);
+        else
+            memcpy(&ctxt->save_area, v->arch.xsave_area,
+                   size - offsetof(struct hvm_hw_cpu_xsave, save_area));
     }
 
     return 0;
@@ -2227,9 +2230,12 @@ static int hvm_load_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
     v->arch.xcr0_accum = ctxt->xcr0_accum;
     if ( ctxt->xcr0_accum & XSTATE_NONLAZY )
         v->arch.nonlazy_xstate_used = 1;
-    memcpy(v->arch.xsave_area, &ctxt->save_area,
-           min(desc->length, size) - offsetof(struct hvm_hw_cpu_xsave,
-           save_area));
+    if ( cpu_has_xsaves )
+        load_xsave_states(v, (u8 *)&ctxt->save_area);
+    else
+        memcpy(v->arch.xsave_area, &ctxt->save_area,
+               min(desc->length, size) - offsetof(struct hvm_hw_cpu_xsave,
+               save_area));
 
     return 0;
 }
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 699058d..0eea146 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -29,6 +29,9 @@ static u32 __read_mostly xsave_cntxt_size;
 /* A 64-bit bitmask of the XSAVE/XRSTOR features supported by processor. */
 u64 __read_mostly xfeature_mask;
 
+static unsigned int *xstate_offsets, *xstate_sizes;
+static unsigned int xstate_features;
+static unsigned int xstate_comp_offsets[sizeof(xfeature_mask)*8];
 /* Cached xcr0 for fast read */
 static DEFINE_PER_CPU(uint64_t, xcr0);
 
@@ -65,6 +68,135 @@ uint64_t get_xcr0(void)
     return this_cpu(xcr0);
 }
 
+static void setup_xstate_features(void)
+{
+    unsigned int eax, ebx, ecx, edx, leaf = 0x2;
+
+    xstate_features = fls(xfeature_mask);
+    xstate_offsets = _xzalloc(xstate_features, sizeof(int));
+    xstate_sizes = _xzalloc(xstate_features, sizeof(int));
+
+    do {
+        cpuid_count(XSTATE_CPUID, leaf, &eax, &ebx, &ecx, &edx);
+
+        if ( eax == 0 )
+            break;
+
+        xstate_offsets[leaf] = ebx;
+        xstate_sizes[leaf] = eax;
+
+        leaf++;
+    } while (1);
+}
+
+static void setup_xstate_comp(void)
+{
+    unsigned int xstate_comp_sizes[sizeof(xfeature_mask)*8];
+    int i;
+
+    /*
+     * The FP xstates and SSE xstates are legacy states. They are always
+     * in the fixed offsets in the xsave area in either compacted form
+     * or standard form.
+     */
+    xstate_comp_offsets[0] = 0;
+    xstate_comp_offsets[1] = XSAVE_SSE_OFFSET;
+
+    xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+
+    for (i = 2; i < xstate_features; i++)
+    {
+        if ( 1 << i & xfeature_mask )
+            xstate_comp_sizes[i] = xstate_sizes[i];
+        else
+            xstate_comp_sizes[i] = 0;
+
+        if ( i > 2 )
+            xstate_comp_offsets[i] = xstate_comp_offsets[i-1]
+                                    + xstate_comp_sizes[i-1];
+    }
+}
+
+static void *get_xsave_addr(struct xsave_struct *xsave, int xstate)
+{
+    int feature = fls(xstate) - 1;
+    if ( !(1 << feature & xfeature_mask) )
+        return NULL;
+
+    return (void *)xsave + xstate_comp_offsets[feature];
+}
+
+void save_xsave_states(struct vcpu *v, u8 *dest)
+{
+    struct xsave_struct *xsave = v->arch.xsave_area;
+    u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
+    u64 valid;
+
+    /*
+     * Copy legacy XSAVE area, to avoid complications with CPUID
+     * leaves 0 and 1 in the loop below.
+     */
+    memcpy(dest, xsave, XSAVE_HDR_OFFSET);
+
+    /* Set XSTATE_BV */
+    *(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv;
+
+    /*
+     * Copy each region from the possibly compacted offset to the
+     * non-compacted offset.
+     */
+    valid = xstate_bv & ~XSTATE_FP_SSE;
+    while ( valid )
+    {
+        u64 feature = valid & -valid;
+        int index = fls(feature) - 1;
+        void *src = get_xsave_addr(xsave, feature);
+
+        if ( src )
+            memcpy(dest + xstate_offsets[index], src, xstate_sizes[index]);
+        else
+            WARN_ON(1);
+
+        valid -= feature;
+    }
+}
+
+void load_xsave_states(struct vcpu *v, u8 *src)
+{
+    struct xsave_struct *xsave = v->arch.xsave_area;
+    u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
+    u64 valid;
+
+    /*
+     * Copy legacy XSAVE area, to avoid complications with CPUID
+     * leaves 0 and 1 in the loop below.
+     */
+    memcpy(xsave, src, XSAVE_HDR_OFFSET);
+
+    /* Set XSTATE_BV and possibly XCOMP_BV.  */
+    xsave->xsave_hdr.xstate_bv = xstate_bv;
+    xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | XSTATE_COMPACTION_ENABLED;
+
+    /*
+     * Copy each region from the non-compacted offset to the
+     * possibly compacted offset.
+     */
+    valid = xstate_bv & ~XSTATE_FP_SSE;
+    while ( valid )
+    {
+        u64 feature = valid & -valid;
+        int index = fls(feature) - 1;
+        void *dest = get_xsave_addr(xsave, feature);
+
+        if (dest)
+            memcpy(dest, src + xstate_offsets[index], xstate_sizes[index]);
+	else
+            WARN_ON(1);
+
+        valid -= feature;
+    }
+}
+
 void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
 {
     asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
@@ -388,6 +520,12 @@ void xstate_init(bool_t bsp)
         /* XXX BUG_ON(!cpu_has_xgetbv1 != !(eax & XSTATE_FEATURE_XGETBV1)); */
         /* XXX BUG_ON(!cpu_has_xsaves != !(eax & XSTATE_FEATURE_XSAVES)); */
     }
+
+    if ( cpu_has_xsaves )
+    {
+        setup_xstate_features();
+        setup_xstate_comp();
+    }
 }
 
 static bool_t valid_xcr0(u64 xcr0)
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 75fa6eb..454454d 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -22,7 +22,11 @@
 
 #define XCR_XFEATURE_ENABLED_MASK 0x00000000  /* index of XCR0 */
 
+#define XSAVE_HDR_SIZE            64
+#define XSAVE_SSE_OFFSET          160
 #define XSTATE_YMM_SIZE           256
+#define FXSAVE_SIZE               512
+#define XSAVE_HDR_OFFSET          FXSAVE_SIZE
 #define XSTATE_AREA_MIN_SIZE      (512 + 64)  /* FP/SSE + XSAVE.HEADER */
 
 #define XSTATE_FP      (1ULL << 0)
@@ -86,6 +90,8 @@ bool_t __must_check set_xcr0(u64 xfeatures);
 uint64_t get_xcr0(void);
 void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
 void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
+void save_xsave_states(struct vcpu *v, u8 *dest);
+void load_xsave_states(struct vcpu *v, u8 *src);
 void xsave(struct vcpu *v, uint64_t mask);
 void xrstor(struct vcpu *v, uint64_t mask);
 bool_t xsave_enabled(const struct vcpu *v);
-- 
1.9.1


* [PATCH V3 6/6] x86/xsaves: detect xsaves/xgetbv1 in xen
  2015-08-05  1:57 [PATCH V3 0/6] add xsaves/xrstors support Shuai Ruan
                   ` (4 preceding siblings ...)
  2015-08-05  1:57 ` [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore Shuai Ruan
@ 2015-08-05  1:57 ` Shuai Ruan
  2015-08-05 16:38 ` [PATCH V3 0/6] add xsaves/xrstors support Andrew Cooper
  6 siblings, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-05  1:57 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, eddie.dong, jbeulich,
	keir

As xsaves/xgetbv1 are already supported, switch on their detection.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
---
 xen/arch/x86/xstate.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 0eea146..a3f5c67 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -510,15 +510,15 @@ void xstate_init(bool_t bsp)
     {
         cpu_has_xsaveopt = !!(eax & XSTATE_FEATURE_XSAVEOPT);
         cpu_has_xsavec = !!(eax & XSTATE_FEATURE_XSAVEC);
-        /* XXX cpu_has_xgetbv1 = !!(eax & XSTATE_FEATURE_XGETBV1); */
-        /* XXX cpu_has_xsaves = !!(eax & XSTATE_FEATURE_XSAVES); */
+        cpu_has_xgetbv1 = !!(eax & XSTATE_FEATURE_XGETBV1);
+        cpu_has_xsaves = !!(eax & XSTATE_FEATURE_XSAVES);
     }
     else
     {
         BUG_ON(!cpu_has_xsaveopt != !(eax & XSTATE_FEATURE_XSAVEOPT));
         BUG_ON(!cpu_has_xsavec != !(eax & XSTATE_FEATURE_XSAVEC));
-        /* XXX BUG_ON(!cpu_has_xgetbv1 != !(eax & XSTATE_FEATURE_XGETBV1)); */
-        /* XXX BUG_ON(!cpu_has_xsaves != !(eax & XSTATE_FEATURE_XSAVES)); */
+        BUG_ON(!cpu_has_xgetbv1 != !(eax & XSTATE_FEATURE_XGETBV1));
+        BUG_ON(!cpu_has_xsaves != !(eax & XSTATE_FEATURE_XSAVES));
     }
 
     if ( cpu_has_xsaves )
-- 
1.9.1


* Re: [PATCH V3 4/6] libxc: expose xsaves/xgetbv1/xsavec to hvm guest
  2015-08-05  1:57 ` [PATCH V3 4/6] libxc: expose xsaves/xgetbv1/xsavec to " Shuai Ruan
@ 2015-08-05  8:37   ` Ian Campbell
  2015-08-07  8:23     ` Shuai Ruan
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2015-08-05  8:37 UTC (permalink / raw)
  To: Shuai Ruan, xen-devel
  Cc: kevin.tian, wei.liu2, jbeulich, stefano.stabellini,
	andrew.cooper3, ian.jackson, eddie.dong, jun.nakajima, keir

On Wed, 2015-08-05 at 09:57 +0800, Shuai Ruan wrote:
> This patch exposes xsaves/xgetbv1/xsavec to hvm guest.
> The reserved bits of eax/ebx/ecx/edx must be cleaned up
> when call cpuid(0dh) with leaf 1 or 2..63.
> 
> According to the spec the following bits must be reseved:

"reserved"

> For leaf 1, bits 03-04/08-31 of ecx is reserved. Edx is reserved.
> For leaf 2...63, bits 01-31 of ecx is reserved. Edx is reserved.
> 
> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>

Although this is toolstack code, I think this really ought to be acked by
the hypervisor x86 maintainers; if they are happy with it then I am too,
and in that case you may add:

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> ---
>  tools/libxc/xc_cpuid_x86.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> index c97f91a..b69676a 100644
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -211,6 +211,9 @@ static void intel_xc_cpuid_policy(
>  }
>  
>  #define XSAVEOPT        (1 << 0)
> +#define XSAVEC          (1 << 1)
> +#define XGETBV1         (1 << 2)
> +#define XSAVES          (1 << 3)
>  /* Configure extended state enumeration leaves (0x0000000D for xsave) */
>  static void xc_cpuid_config_xsave(
>      xc_interface *xch, domid_t domid, uint64_t xfeature_mask,
> @@ -247,8 +250,9 @@ static void xc_cpuid_config_xsave(
>          regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
>          break;
>      case 1: /* leaf 1 */
> -        regs[0] &= XSAVEOPT;
> -        regs[1] = regs[2] = regs[3] = 0;
> +        regs[0] &= (XSAVEOPT | XSAVEC | XGETBV1 | XSAVES);
> +        regs[2] &= 0xe7;
> +        regs[3] = 0;
>          break;
>      case 2 ... 63: /* sub-leaves */
>          if ( !(xfeature_mask & (1ULL << input[1])) )
> @@ -256,8 +260,9 @@ static void xc_cpuid_config_xsave(
>              regs[0] = regs[1] = regs[2] = regs[3] = 0;
>              break;
>          }
> -        /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
> -        regs[2] = regs[3] = 0;
> +        /* Don't touch EAX, EBX. Also cleanup EDX. Cleanup bits 01-32 of 
> ECX*/
> +        regs[2] &= 0x1;
> +        regs[3] = 0;
>          break;
>      }
>  }


* Re: [PATCH V3 0/6] add xsaves/xrstors support
  2015-08-05  1:57 [PATCH V3 0/6] add xsaves/xrstors support Shuai Ruan
                   ` (5 preceding siblings ...)
  2015-08-05  1:57 ` [PATCH V3 6/6] x86/xsaves: detect xsaves/xgetbv1 in xen Shuai Ruan
@ 2015-08-05 16:38 ` Andrew Cooper
  2015-08-07  8:25   ` Shuai Ruan
  6 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2015-08-05 16:38 UTC (permalink / raw)
  To: Shuai Ruan, xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	eddie.dong, ian.jackson, jbeulich, jun.nakajima, keir

On 05/08/15 02:57, Shuai Ruan wrote:
> Detail hardware spec can be found in chapter 13 (section 13.11 13.12) of the 
> Intel SDM [1].
>
> patch1: add xsaves/xrstors support for pv guest
> patch2: add xsaves/xrstors support for xen
> patch3-5: add xsaves/xrstors support for hvm guest
> patch6: swtich on detection of xsaves/xrstors/xgetbv in xen

This order of operations seems backwards.

Can I suggest starting with a patch which adds various xsaves/etc
defines/helper functions/etc (rather than having them spread through the
series), then a patch which allows Xen to start using the features, then
adding support to PV and HVM guests.

~Andrew


* Re: [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
  2015-08-05  1:57 ` [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest Shuai Ruan
@ 2015-08-05 17:51   ` Andrew Cooper
  2015-08-07  8:00     ` Shuai Ruan
       [not found]     ` <20150807080008.GA2976@shuai.ruan@linux.intel.com>
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2015-08-05 17:51 UTC (permalink / raw)
  To: Shuai Ruan, xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	eddie.dong, ian.jackson, jbeulich, jun.nakajima, keir

On 05/08/15 02:57, Shuai Ruan wrote:
> This patch emualtes xsaves/xrstors instructions and
> XSS msr access.
>
> As xsaves/xrstors instructions and XSS msr access
> required be executed only in ring0. So emulation are
> needed when pv guest uses these instructions.
>
> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> ---
>  xen/arch/x86/domain.c           |   3 +
>  xen/arch/x86/traps.c            | 138 ++++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/x86_64/mm.c        |  52 +++++++++++++++
>  xen/arch/x86/xstate.c           |  39 ++++++++++++
>  xen/include/asm-x86/domain.h    |   1 +
>  xen/include/asm-x86/mm.h        |   1 +
>  xen/include/asm-x86/msr-index.h |   2 +
>  xen/include/asm-x86/xstate.h    |   3 +
>  8 files changed, 239 insertions(+)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 045f6ff..e8b8d67 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -426,6 +426,7 @@ int vcpu_initialise(struct vcpu *v)
>  
>      /* By default, do not emulate */
>      v->arch.vm_event.emulate_flags = 0;
> +    v->arch.msr_ia32_xss = 0;

The backing memory for struct vcpu is zeroed when allocated.  There is
no need to explicitly zero this field here.

>  
>      rc = mapcache_vcpu_init(v);
>      if ( rc )
> @@ -1529,6 +1530,8 @@ static void __context_switch(void)
>              if ( xcr0 != get_xcr0() && !set_xcr0(xcr0) )
>                  BUG();
>          }
> +        if ( cpu_has_xsaves )
> +            wrmsr_safe(MSR_IA32_XSS, n->arch.msr_ia32_xss);

This mustn't throw away potential errors.  It should not be possible for
n->arch.msr_ia32_xss to be invalid by this point, so a straight wrmsr()
would be correct.

However, you will want to implement lazy context switching, exactly like
get/set_xcr0().
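
A rough sketch of what such lazy switching might look like (set_msr_xss()
and get_msr_xss() are hypothetical helpers, mirroring the per-cpu xcr0
cache in xstate.c):

/* Hypothetical sketch only: cache the last IA32_XSS value written per cpu. */
static DEFINE_PER_CPU(uint64_t, xss);

void set_msr_xss(uint64_t val)
{
    uint64_t *this_xss = &this_cpu(xss);

    if ( *this_xss != val )
    {
        wrmsrl(MSR_IA32_XSS, val);
        *this_xss = val;
    }
}

uint64_t get_msr_xss(void)
{
    return this_cpu(xss);
}

__context_switch() would then simply call set_msr_xss(n->arch.msr_ia32_xss)
when cpu_has_xsaves, and the wrmsr is skipped whenever the value is unchanged.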

>          vcpu_restore_fpu_eager(n);
>          n->arch.ctxt_switch_to(n);
>      }
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 6a03582..c1fea77 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2353,6 +2353,131 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>          }
>          break;
>  
> +    case 0xc7:

This case should be sorted numerically, so should be between CPUID and
default.

> +    {
> +        void *xsave_addr;
> +        int not_page_aligned = 0;

bool_t

> +        u32 guest_xsaves_size = xstate_ctxt_size_compact(v->arch.xcr0);
> +
> +        switch ( insn_fetch(u8, code_base, eip, code_limit) )
> +        {
> +            case 0x2f:/* XSAVES */
> +            {
> +                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
> +                                          X86_CR4_OSXSAVE))

I would format this as

if ( !cpu_has_xsaves ||
     !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) )

to associate the bit test more clearly.

> +                {
> +                    do_guest_trap(TRAP_invalid_op, regs, 0);
> +                    goto skip;
> +                }
> +
> +                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
> +                {
> +                    do_guest_trap(TRAP_nmi, regs, 0);

TRAP_no_device surely?

> +                    goto skip;
> +                }
> +
> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )

What does edi have to do with xsaves?  Only edx:eax are special
according to the manual.

> +                    goto fail;
> +
> +                if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
> +                {
> +                    mfn_t mfn_list[2];
> +                    void *va;

This fails to account for the xsaves size growing to > PAGE_SIZE in
future processors.

/* TODO - expand for future processors. */
BUG_ON(guest_xsaves_size > PAGE_SIZE);

might be acceptable.  However, it is better fixed by...

> +
> +                    not_page_aligned = 1;
> +                    mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
> +                    mfn_list[1] = _mfn(do_page_walk_mfn(v,
> +                                       PAGE_ALIGN(regs->edi)));
> +
> +                    va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
> +                    xsave_addr = (void *)((unsigned long)va +
> +                                         (regs->edi & ~PAGE_MASK));

... introducing a new helper such as

void *vmap_guest_linear(void *va, size_t bytes, uint32_t PFEC);

which takes care of the internal details of making a linear area of
guest virtual address space appear linear in Xen virtual address space
as well.

You also need to take care to respect non-writable pages, and have an
access_ok() check or a guest could (ab)use this emulation to write to
Xen code/data areas.
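
A very rough sketch of such a helper, reusing the page-walk and vmap
primitives from this patch (the name and shape are illustrative only, and
the PFEC/write-permission handling still needs to be added):

static void *vmap_guest_linear(struct vcpu *v, unsigned long va, size_t bytes)
{
    unsigned int i, nr = PFN_UP((va & ~PAGE_MASK) + bytes);
    mfn_t mfn[2];
    void *map;

    /* Reject hypervisor addresses and anything spanning more than 2 pages. */
    if ( nr > ARRAY_SIZE(mfn) || !access_ok(va, bytes) )
        return NULL;

    for ( i = 0; i < nr; i++ )
    {
        mfn[i] = _mfn(do_page_walk_mfn(v, (va & PAGE_MASK) + i * PAGE_SIZE));
        if ( !mfn_x(mfn[i]) )
            return NULL;
    }

    map = __vmap(mfn, 1, nr, PAGE_SIZE, PAGE_HYPERVISOR);

    return map ? map + (va & ~PAGE_MASK) : NULL;
}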

> +                }
> +                else
> +                    xsave_addr = do_page_walk(v, regs->edi);
> +
> +                if ( !xsave_addr )
> +                    goto fail;
> +
> +                xsaves(regs->eax, regs->edx, xsave_addr);
> +
> +                if ( not_page_aligned )
> +                    vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
> +                else
> +                    unmap_domain_page(xsave_addr);
> +                break;
> +            }

Blank line here please.

> +            case 0x1f:/* XRSTORS */
> +            {
> +                struct xsave_struct guest_xsave_area;

Not on the stack please, but I don't believe you need this.

> +
> +                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
> +                                          X86_CR4_OSXSAVE))
> +                {
> +                    do_guest_trap(TRAP_invalid_op, regs, 0);
> +                    goto skip;
> +                }
> +
> +                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
> +                {
> +                    do_guest_trap(TRAP_nmi, regs, 0);
> +                    goto skip;
> +                }
> +
> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
> +                    goto fail;
> +
> +                if ( (rc = copy_from_user(&guest_xsave_area, (void *) regs->edi,
> +                                          sizeof(struct xsave_struct))) !=0 )
> +                {
> +                    propagate_page_fault(regs->edi +
> +				         sizeof(struct xsave_struct) - rc, 0);
> +                    goto skip;

Surely you just need the xstate_bv and xcomp_bv ?

> +                }
> +                else
> +                if ( !(guest_xsave_area.xsave_hdr.xcomp_bv & 1l << 63) ||

1l << 63 is undefined behaviour, as it shifts a set bit into the sign bit.
Perhaps you meant 1ul << 63 ?

> +                     (guest_xsave_area.xsave_hdr.xstate_bv |
> +                     guest_xsave_area.xsave_hdr.xcomp_bv) !=
> +                     guest_xsave_area.xsave_hdr.xcomp_bv ||

xcomp_bv is not introduced until patch 2, which means that this patch
doesn't build.  Please make absolutely sure that the entire series is
bisectable.

> +                     ((guest_xsave_area.xsave_hdr.xcomp_bv & ~(1l << 63)) |
> +                     v->arch.xcr0 | v->arch.msr_ia32_xss ) !=
> +                     (v->arch.xcr0 | v->arch.msr_ia32_xss) )
> +                    goto fail;
> +
> +                if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
> +                {
> +                    mfn_t mfn_list[2];
> +                    void *va;
> +
> +                    not_page_aligned = 1;
> +                    mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
> +                    mfn_list[1] = _mfn(do_page_walk_mfn(v,
> +                                       PAGE_ALIGN(regs->edi)));
> +
> +                    va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
> +                    xsave_addr = (void *)((unsigned long)va +
> +                                         (regs->edi & ~PAGE_MASK));
> +                }
> +                else
> +                    xsave_addr = do_page_walk(v, regs->edi);
> +
> +                if ( !xsave_addr )
> +                    goto fail;
> +
> +                xrstors(regs->eax, regs->edx, xsave_addr);
> +
> +                if ( not_page_aligned )
> +                    vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
> +                else
> +                    unmap_domain_page(xsave_addr);
> +                break;
> +            }
> +            default:
> +                goto fail;
> +        }
> +        break;
> +    }
> +
>      case 0x06: /* CLTS */
>          (void)do_fpu_taskswitch(0);
>          break;
> @@ -2663,6 +2788,13 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>                  }
>                  break;
>              }
> +        case MSR_IA32_XSS:
> +            if ( !cpu_has_xsaves )
> +                goto fail;
> +            if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
> +                goto fail;

You can join these two if() statements together.  Also, it is
technically regs->_ecx.
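
I.e. something like:

if ( !cpu_has_xsaves || wrmsr_safe(regs->_ecx, msr_content) != 0 )
    goto fail;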

> +            v->arch.msr_ia32_xss = msr_content;
> +            break;
>              /*FALLTHROUGH*/

Observe the comment which indicates that the previous case: deliberately
falls through to default:

>  
>          default:
> @@ -2798,6 +2930,12 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
>                  }
>                  break;
>              }
> +        case MSR_IA32_XSS:
> +            if ( !cpu_has_xsaves )
> +                goto fail;
> +            regs->eax = v->arch.msr_ia32_xss;
> +            regs->edx = v->arch.msr_ia32_xss >> 32;

val = v->arch.msr_ia32_xss;
goto rdmsr_writeback;

This code currently incorrectly clobbers the upper 32 bits of rax.

> +            break;
>              /*FALLTHROUGH*/

Again - another bad fall through case.

>  
>          default:
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 98310f3..de94ac1 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -48,6 +48,58 @@ l2_pgentry_t __section(".bss.page_aligned") l2_bootmap[L2_PAGETABLE_ENTRIES];
>  
>  l2_pgentry_t *compat_idle_pg_table_l2;
>  
> +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)

What is this function?  Why is it useful?  Something like this belongs
in its own patch along with a description of why it is being introduced.

> +{
> +    unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
> +    l4_pgentry_t l4e, *l4t;
> +    l3_pgentry_t l3e, *l3t;
> +    l2_pgentry_t l2e, *l2t;
> +    l1_pgentry_t l1e, *l1t;
> +
> +    if ( !is_pv_vcpu(v) || !is_canonical_address(addr) )
> +        return 0;
> +
> +    l4t = map_domain_page(_mfn(mfn));
> +    l4e = l4t[l4_table_offset(addr)];
> +    unmap_domain_page(l4t);
> +    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
> +        return 0;
> +
> +    l3t = map_l3t_from_l4e(l4e);
> +    l3e = l3t[l3_table_offset(addr)];
> +    unmap_domain_page(l3t);
> +    mfn = l3e_get_pfn(l3e);
> +    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> +        return 0;
> +    if ( (l3e_get_flags(l3e) & _PAGE_PSE) )
> +    {
> +        mfn += PFN_DOWN(addr & ((1UL << L3_PAGETABLE_SHIFT) - 1));
> +        goto ret;
> +    }
> +
> +    l2t = map_domain_page(_mfn(mfn));
> +    l2e = l2t[l2_table_offset(addr)];
> +    unmap_domain_page(l2t);
> +    mfn = l2e_get_pfn(l2e);
> +    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> +        return 0;
> +    if ( (l2e_get_flags(l2e) & _PAGE_PSE) )
> +    {
> +        mfn += PFN_DOWN(addr & ((1UL << L2_PAGETABLE_SHIFT) - 1));
> +        goto ret;
> +    }
> +
> +    l1t = map_domain_page(_mfn(mfn));
> +    l1e = l1t[l1_table_offset(addr)];
> +    unmap_domain_page(l1t);
> +    mfn = l1e_get_pfn(l1e);
> +    if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> +        return 0;
> +
> + ret:
> +    return mfn;
> +}
> +
>  void *do_page_walk(struct vcpu *v, unsigned long addr)
>  {
>      unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> index d5f5e3b..e34eda3 100644
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -65,6 +65,31 @@ uint64_t get_xcr0(void)
>      return this_cpu(xcr0);
>  }
>  
> +void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)

Should take a 64bit mask.

> +{
> +    asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
> +                    : "=m" (*ptr)
> +                    : "a" (lmask), "d" (hmask), "D" (ptr) );
> +}
> +
> +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
> +{
> +    asm volatile ( "1: .byte 0x48,0x0f,0xc7,0x1f\n"
> +                   ".section .fixup,\"ax\"      \n"
> +                   "2: mov %5,%%ecx             \n"
> +                   "   xor %1,%1                \n"
> +                   "   rep stosb                \n"
> +                   "   lea %2,%0                \n"
> +                   "   mov %3,%1                \n"
> +                   "   jmp 1b                   \n"
> +                   ".previous                   \n"
> +                   _ASM_EXTABLE(1b, 2b)
> +                   : "+&D" (ptr), "+&a" (lmask)
> +                   : "m" (*ptr), "g" (lmask), "d" (hmask),
> +                     "m" (xsave_cntxt_size)
> +                   : "ecx" );
> +}
> +

Neither of these two helpers has anything like sufficient checking to
be safely used on guest state.

>  void xsave(struct vcpu *v, uint64_t mask)
>  {
>      struct xsave_struct *ptr = v->arch.xsave_area;
> @@ -268,6 +293,20 @@ static unsigned int _xstate_ctxt_size(u64 xcr0)
>      return ebx;
>  }
>  
> +unsigned int xstate_ctxt_size_compact(u64 xcr0)
> +{
> +    u64 act_xcr0 = get_xcr0();
> +    u32 eax, ebx = 0, ecx, edx;
> +    bool_t ok = set_xcr0(xcr0);
> +
> +    ASSERT(ok);
> +    cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
> +    ok = set_xcr0(act_xcr0);
> +    ASSERT(ok);
> +
> +    return ebx;
> +}
> +
>  /* Fastpath for common xstate size requests, avoiding reloads of xcr0. */
>  unsigned int xstate_ctxt_size(u64 xcr0)
>  {
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 7a9e96f..aee781b 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -496,6 +496,7 @@ struct arch_vcpu
>       */
>      struct xsave_struct *xsave_area;
>      uint64_t xcr0;
> +    u64 msr_ia32_xss;

uint64_t please, for consistency.

~Andrew

>      /* Accumulated eXtended features mask for using XSAVE/XRESTORE by Xen
>       * itself, as we can never know whether guest OS depends on content
>       * preservation whenever guest OS clears one feature flag (for example,
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index 8595c38..94a590e 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -524,6 +524,7 @@ void make_cr3(struct vcpu *v, unsigned long mfn);
>  void update_cr3(struct vcpu *v);
>  int vcpu_destroy_pagetables(struct vcpu *);
>  struct trap_bounce *propagate_page_fault(unsigned long addr, u16 error_code);
> +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr);
>  void *do_page_walk(struct vcpu *v, unsigned long addr);
>  
>  int __sync_local_execstate(void);
> diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
> index 5425f77..365d995 100644
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -58,6 +58,8 @@
>  
>  #define MSR_IA32_BNDCFGS		0x00000D90
>  
> +#define MSR_IA32_XSS			0x00000da0
> +
>  #define MSR_MTRRfix64K_00000		0x00000250
>  #define MSR_MTRRfix16K_80000		0x00000258
>  #define MSR_MTRRfix16K_A0000		0x00000259
> diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
> index 4c690db..59c7156 100644
> --- a/xen/include/asm-x86/xstate.h
> +++ b/xen/include/asm-x86/xstate.h
> @@ -82,6 +82,8 @@ struct __packed __attribute__((aligned (64))) xsave_struct
>  /* extended state operations */
>  bool_t __must_check set_xcr0(u64 xfeatures);
>  uint64_t get_xcr0(void);
> +void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
> +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
>  void xsave(struct vcpu *v, uint64_t mask);
>  void xrstor(struct vcpu *v, uint64_t mask);
>  bool_t xsave_enabled(const struct vcpu *v);
> @@ -92,6 +94,7 @@ int __must_check handle_xsetbv(u32 index, u64 new_bv);
>  void xstate_free_save_area(struct vcpu *v);
>  int xstate_alloc_save_area(struct vcpu *v);
>  void xstate_init(bool_t bsp);
> +unsigned int xstate_ctxt_size_compact(u64 xcr0);
>  unsigned int xstate_ctxt_size(u64 xcr0);
>  
>  #endif /* __ASM_XSTATE_H */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 2/6] x86/xsaves: enable xsaves/xrstors in xen
  2015-08-05  1:57 ` [PATCH V3 2/6] x86/xsaves: enable xsaves/xrstors in xen Shuai Ruan
@ 2015-08-05 17:57   ` Andrew Cooper
  0 siblings, 0 replies; 27+ messages in thread
From: Andrew Cooper @ 2015-08-05 17:57 UTC (permalink / raw)
  To: Shuai Ruan, xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	eddie.dong, ian.jackson, jbeulich, jun.nakajima, keir

On 05/08/15 02:57, Shuai Ruan wrote:
> This patch uses xsaves/xrstors instead of xsaveopt/xrstor
> to perform the xsave_area switching so that xen itself
> can benefit from them when available.
>
> Please note that xsaves/xrstors only use compact format.
>
> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> ---
>  xen/arch/x86/i387.c          |  5 ++++
>  xen/arch/x86/xstate.c        | 70 +++++++++++++++++++++++++-------------------
>  xen/include/asm-x86/xstate.h |  4 ++-
>  3 files changed, 48 insertions(+), 31 deletions(-)
>
> diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
> index 14f2a79..9071374 100644
> --- a/xen/arch/x86/i387.c
> +++ b/xen/arch/x86/i387.c
> @@ -309,7 +309,12 @@ int vcpu_init_fpu(struct vcpu *v)
>          return rc;
>  
>      if ( v->arch.xsave_area )
> +    {
>          v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
> +
> +        if ( cpu_has_xsaves )
> +            v->arch.xsave_area->xsave_hdr.xcomp_bv |= XSTATE_COMPACTION_ENABLED;
> +    }

By making this change, you need to make arch_set_info_guest() able to
interpret and unpack a compressed xsave area without hardware support.

Otherwise, you have broken migration of PV vcpus between Skylake and
non-Skylake hardware.

~Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
  2015-08-05  1:57 ` [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest Shuai Ruan
@ 2015-08-05 18:17   ` Andrew Cooper
  2015-08-07  8:22     ` Shuai Ruan
       [not found]     ` <20150807082244.GB2976@shuai.ruan@linux.intel.com>
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2015-08-05 18:17 UTC (permalink / raw)
  To: Shuai Ruan, xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	eddie.dong, ian.jackson, jbeulich, jun.nakajima, keir

On 05/08/15 02:57, Shuai Ruan wrote:
> This patch enables xsaves for hvm guest, includes:
> 1.handle xsaves vmcs init and vmexit.
> 2.add logic to write/read the XSS msr.
>
> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> ---
>  xen/arch/x86/hvm/hvm.c             | 44 ++++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/vmx/vmcs.c        |  7 +++++-
>  xen/arch/x86/hvm/vmx/vmx.c         | 18 ++++++++++++++++
>  xen/arch/x86/xstate.c              |  4 ++--
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 +++++
>  xen/include/asm-x86/hvm/vmx/vmx.h  |  2 ++
>  xen/include/asm-x86/xstate.h       |  2 +-
>  7 files changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index c07e3ef..e5cf761 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4370,6 +4370,10 @@ void hvm_hypervisor_cpuid_leaf(uint32_t sub_idx,
>      }
>  }
>  
> +#define XSAVEOPT	(1 << 0)
> +#define XSAVEC		(1 << 1)
> +#define XGETBV1	(1 << 2)
> +#define XSAVES		(1 << 3)

These should be in cpufeature.h, not here.

>  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>                                     unsigned int *ecx, unsigned int *edx)
>  {
> @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>                      *ebx = _eax + _ebx;
>              }
>          }
> +        if ( count == 1 )
> +        {
> +            if ( cpu_has_xsaves )
> +            {
> +                *ebx = XSTATE_AREA_MIN_SIZE;
> +                if ( v->arch.xcr0 | v->arch.msr_ia32_xss )
> +                    for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
> +                    {
> +                        if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss)
> +			   & (1ULL << sub_leaf)) )
> +                            continue;
> +                        domain_cpuid(d, input, sub_leaf, &_eax, &_ebx, &_ecx,
> +                                     &_edx);
> +                        *ebx =  *ebx + _eax;
> +                    }
> +            }
> +            else
> +            {
> +                *eax &= ~XSAVES;
> +                *ebx = *ecx = *edx = 0;
> +            }
> +            if ( !cpu_has_xgetbv1 )
> +                *eax &= ~XGETBV1;
> +            if ( !cpu_has_xsavec )
> +                *eax &= ~XSAVEC;
> +            if ( !cpu_has_xsaveopt )
> +                *eax &= ~XSAVEOPT;
> +        }

Urgh - I really need to get domain cpuid fixed in Xen.  This is
currently making a very bad situation a little worse.

>          break;
>  
>      case 0x80000001:
> @@ -4555,6 +4587,12 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
>          *msr_content = v->arch.hvm_vcpu.guest_efer;
>          break;
>  
> +    case MSR_IA32_XSS:
> +        if ( !cpu_has_vmx_xsaves )

vmx_xsaves has nothing to do with this here.  I presume you mean
cpu_has_xsave?

> +            goto gp_fault;
> +        *msr_content = v->arch.msr_ia32_xss;
> +        break;
> +
>      case MSR_IA32_TSC:
>          *msr_content = _hvm_rdtsc_intercept();
>          break;
> @@ -4687,6 +4725,12 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
>             return X86EMUL_EXCEPTION;
>          break;
>  
> +    case MSR_IA32_XSS:
> +        if ( !cpu_has_vmx_xsaves )
> +            goto gp_fault;
> +        v->arch.msr_ia32_xss = msr_content;

You must validate msr_content here and possibly hand a gp fault back to
the guest.
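
Nothing in this series hands a supervisor state component to guests, so as a
first cut it would be enough to reject any set bit, e.g. (sketch only, not
the final check):

if ( msr_content != 0 )
    goto gp_fault;
v->arch.msr_ia32_xss = msr_content;
break;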

> +        break;
> +
>      case MSR_IA32_TSC:
>          hvm_set_guest_tsc(v, msr_content);
>          break;
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 4c5ceb5..8e61e3f 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -230,7 +230,8 @@ static int vmx_init_vmcs_config(void)
>                 SECONDARY_EXEC_ENABLE_EPT |
>                 SECONDARY_EXEC_ENABLE_RDTSCP |
>                 SECONDARY_EXEC_PAUSE_LOOP_EXITING |
> -               SECONDARY_EXEC_ENABLE_INVPCID);
> +               SECONDARY_EXEC_ENABLE_INVPCID |
> +               SECONDARY_EXEC_XSAVES);
>          rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
>          if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
>              opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
> @@ -921,6 +922,7 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
>      virtual_vmcs_exit(vvmcs);
>  }
>  
> +#define VMX_XSS_EXIT_BITMAP 0

This define definitely doesn't live here.

>  static int construct_vmcs(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> @@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v)
>          __vmwrite(GUEST_PAT, guest_pat);
>      }
>  
> +    if ( cpu_has_vmx_xsaves )
> +        __vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
> +
>      vmx_vmcs_exit(v);
>  
>      /* PVH: paging mode is updated by arch_set_info_guest(). */
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index d3183a8..64ff63b 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2708,6 +2708,16 @@ static int vmx_handle_apic_write(void)
>      return vlapic_apicv_write(current, exit_qualification & 0xfff);
>  }
>  
> +static void vmx_handle_xsaves(void)
> +{
> +    WARN();
> +}
> +
> +static void vmx_handle_xrstors(void)
> +{
> +    WARN();
> +}
> +

What are these supposed to do?  They are not appropriate handlers.

~Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore
  2015-08-05  1:57 ` [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore Shuai Ruan
@ 2015-08-05 18:45   ` Andrew Cooper
       [not found]     ` <20150811080143.GC14406@shuai.ruan@linux.intel.com>
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2015-08-05 18:45 UTC (permalink / raw)
  To: Shuai Ruan, xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	eddie.dong, ian.jackson, jbeulich, jun.nakajima, keir

On 05/08/15 02:57, Shuai Ruan wrote:
> xsaves/xrstors only use compact format, so format convertion
> is needed when perform save/restore.
>
> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> ---
>  xen/arch/x86/domain.c        |   3 +
>  xen/arch/x86/hvm/hvm.c       |  16 +++--
>  xen/arch/x86/xstate.c        | 138 +++++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/xstate.h |   6 ++
>  4 files changed, 158 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index e8b8d67..083b70d 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -845,6 +845,9 @@ int arch_set_info_guest(
>          memcpy(v->arch.fpu_ctxt, &c.nat->fpu_ctxt, sizeof(c.nat->fpu_ctxt));
>          if ( v->arch.xsave_area )
>               v->arch.xsave_area->xsave_hdr.xstate_bv = XSTATE_FP_SSE;
> +        if ( cpu_has_xsaves )
> +            v->arch.xsave_area->xsave_hdr.xcomp_bv = v->arch.xcr0_accum |
> +                                                     XSTATE_COMPACTION_ENABLED;
>      }
>  
>      if ( !compat )
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index e5cf761..8495938 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2127,8 +2127,11 @@ static int hvm_save_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
>          ctxt->xfeature_mask = xfeature_mask;
>          ctxt->xcr0 = v->arch.xcr0;
>          ctxt->xcr0_accum = v->arch.xcr0_accum;
> -        memcpy(&ctxt->save_area, v->arch.xsave_area,
> -               size - offsetof(struct hvm_hw_cpu_xsave, save_area));
> +        if ( cpu_has_xsaves )
> +            save_xsave_states(v, (u8 *)&ctxt->save_area);

This absolutely needs to take a size parameter, and looks like it should
take a void pointer.
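
I.e. a prototype along the lines of:

void save_xsave_states(struct vcpu *v, void *dest, unsigned int size);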

> +        else
> +            memcpy(&ctxt->save_area, v->arch.xsave_area,
> +                   size - offsetof(struct hvm_hw_cpu_xsave, save_area));
>      }
>  
>      return 0;
> @@ -2227,9 +2230,12 @@ static int hvm_load_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
>      v->arch.xcr0_accum = ctxt->xcr0_accum;
>      if ( ctxt->xcr0_accum & XSTATE_NONLAZY )
>          v->arch.nonlazy_xstate_used = 1;
> -    memcpy(v->arch.xsave_area, &ctxt->save_area,
> -           min(desc->length, size) - offsetof(struct hvm_hw_cpu_xsave,
> -           save_area));
> +    if ( cpu_has_xsaves )
> +        load_xsave_states(v, (u8 *)&ctxt->save_area);
> +    else
> +        memcpy(v->arch.xsave_area, &ctxt->save_area,
> +               min(desc->length, size) - offsetof(struct hvm_hw_cpu_xsave,
> +               save_area));
>  
>      return 0;
>  }
> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> index 699058d..0eea146 100644
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -29,6 +29,9 @@ static u32 __read_mostly xsave_cntxt_size;
>  /* A 64-bit bitmask of the XSAVE/XRSTOR features supported by processor. */
>  u64 __read_mostly xfeature_mask;
>  
> +static unsigned int *xstate_offsets, *xstate_sizes;
> +static unsigned int xstate_features;
> +static unsigned int xstate_comp_offsets[sizeof(xfeature_mask)*8];
>  /* Cached xcr0 for fast read */
>  static DEFINE_PER_CPU(uint64_t, xcr0);
>  
> @@ -65,6 +68,135 @@ uint64_t get_xcr0(void)
>      return this_cpu(xcr0);
>  }
>  
> +static void setup_xstate_features(void)
> +{
> +    unsigned int eax, ebx, ecx, edx, leaf = 0x2;
> +
> +    xstate_features = fls(xfeature_mask);
> +    xstate_offsets = _xzalloc(xstate_features, sizeof(int));
> +    xstate_sizes = _xzalloc(xstate_features, sizeof(int));

Don't mix and match types.  xzalloc_array() is what you should use.
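
E.g. (allocation failure handling is still needed):

xstate_offsets = xzalloc_array(unsigned int, xstate_features);
xstate_sizes = xzalloc_array(unsigned int, xstate_features);
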

> +
> +    do {
> +        cpuid_count(XSTATE_CPUID, leaf, &eax, &ebx, &ecx, &edx);
> +
> +        if ( eax == 0 )
> +            break;
> +
> +        xstate_offsets[leaf] = ebx;
> +        xstate_sizes[leaf] = eax;
> +
> +        leaf++;
> +    } while (1);

This is erroneous if there is a break in the feature bits, and liable to
wander off the end of the array.

This loop should be a for loop over set bits in xfeature_mask, not an
infinite while loop.
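
Something along these lines (sketch, reusing the locals already declared in
the patch):

for ( leaf = 2; leaf < xstate_features; leaf++ )
{
    if ( !(xfeature_mask & (1ULL << leaf)) )
        continue;

    cpuid_count(XSTATE_CPUID, leaf, &eax, &ebx, &ecx, &edx);
    xstate_offsets[leaf] = ebx;
    xstate_sizes[leaf] = eax;
}
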

> +}
> +
> +static void setup_xstate_comp(void)
> +{
> +    unsigned int xstate_comp_sizes[sizeof(xfeature_mask)*8];
> +    int i;

unsigned int.

> +
> +    /*
> +     * The FP xstates and SSE xstates are legacy states. They are always
> +     * in the fixed offsets in the xsave area in either compacted form
> +     * or standard form.
> +     */
> +    xstate_comp_offsets[0] = 0;
> +    xstate_comp_offsets[1] = XSAVE_SSE_OFFSET;
> +
> +    xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
> +
> +    for (i = 2; i < xstate_features; i++)

This loop will run off the end of xstate_comp_sizes[] for any processor
supporting AVX512 or greater.

> +    {
> +        if ( 1 << i & xfeature_mask )
You definitely need some brackets here.

> +            xstate_comp_sizes[i] = xstate_sizes[i];
> +        else
> +            xstate_comp_sizes[i] = 0;
> +
> +        if ( i > 2 )
> +            xstate_comp_offsets[i] = xstate_comp_offsets[i-1]
> +                                    + xstate_comp_sizes[i-1];
> +    }
> +}
> +
> +static void *get_xsave_addr(struct xsave_struct *xsave, int xstate)
> +{
> +    int feature = fls(xstate) - 1;
> +    if ( !(1 << feature & xfeature_mask) )
> +        return NULL;
> +
> +    return (void *)xsave + xstate_comp_offsets[feature];
> +}
> +
> +void save_xsave_states(struct vcpu *v, u8 *dest)
> +{
> +    struct xsave_struct *xsave = v->arch.xsave_area;
> +    u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
> +    u64 valid;
> +
> +    /*
> +     * Copy legacy XSAVE area, to avoid complications with CPUID
> +     * leaves 0 and 1 in the loop below.
> +     */
> +    memcpy(dest, xsave, XSAVE_HDR_OFFSET);
> +
> +    /* Set XSTATE_BV */
> +    *(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv;
> +
> +    /*
> +     * Copy each region from the possibly compacted offset to the
> +     * non-compacted offset.
> +     */
> +    valid = xstate_bv & ~XSTATE_FP_SSE;
> +    while ( valid )
> +    {
> +        u64 feature = valid & -valid;
> +        int index = fls(feature) - 1;
> +        void *src = get_xsave_addr(xsave, feature);
> +
> +        if ( src )
> +            memcpy(dest + xstate_offsets[index], src, xstate_sizes[index]);
> +        else
> +            WARN_ON(1);

These WARN_ON()s are of no use whatsoever.  They should either be
dropped, or turned to BUG() after printing some emergency state.
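
For example (illustrative only):

if ( !src )
{
    printk(XENLOG_ERR
           "xstate: feature %d set in xstate_bv but unknown to Xen\n", index);
    BUG();
}
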

> +
> +        valid -= feature;
> +    }
> +}
> +
> +void load_xsave_states(struct vcpu *v, u8 *src)
> +{
> +    struct xsave_struct *xsave = v->arch.xsave_area;
> +    u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
> +    u64 valid;
> +
> +    /*
> +     * Copy legacy XSAVE area, to avoid complications with CPUID
> +     * leaves 0 and 1 in the loop below.
> +     */
> +    memcpy(xsave, src, XSAVE_HDR_OFFSET);
> +
> +    /* Set XSTATE_BV and possibly XCOMP_BV.  */
> +    xsave->xsave_hdr.xstate_bv = xstate_bv;
> +    xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | XSTATE_COMPACTION_ENABLED;
> +
> +    /*
> +     * Copy each region from the non-compacted offset to the
> +     * possibly compacted offset.
> +     */
> +    valid = xstate_bv & ~XSTATE_FP_SSE;
> +    while ( valid )
> +    {
> +        u64 feature = valid & -valid;
> +        int index = fls(feature) - 1;
> +        void *dest = get_xsave_addr(xsave, feature);
> +
> +        if (dest)
> +            memcpy(dest, src + xstate_offsets[index], xstate_sizes[index]);
> +	else
> +            WARN_ON(1);

Tabs.

~Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
  2015-08-05 17:51   ` Andrew Cooper
@ 2015-08-07  8:00     ` Shuai Ruan
       [not found]     ` <20150807080008.GA2976@shuai.ruan@linux.intel.com>
  1 sibling, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-07  8:00 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, ian.jackson, eddie.dong, xen-devel, jbeulich, keir

On Wed, Aug 05, 2015 at 06:51:29PM +0100, Andrew Cooper wrote:
> On 05/08/15 02:57, Shuai Ruan wrote:
> > This patch emualtes xsaves/xrstors instructions and
> > XSS msr access.
> >
> > As xsaves/xrstors instructions and XSS msr access
> > required be executed only in ring0. So emulation are
> > needed when pv guest uses these instructions.
> >
> > Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> > ---
> >  xen/arch/x86/domain.c           |   3 +
> >  xen/arch/x86/traps.c            | 138 ++++++++++++++++++++++++++++++++++++++++
> >  xen/arch/x86/x86_64/mm.c        |  52 +++++++++++++++
> >  xen/arch/x86/xstate.c           |  39 ++++++++++++
> >  xen/include/asm-x86/domain.h    |   1 +
> >  xen/include/asm-x86/mm.h        |   1 +
> >  xen/include/asm-x86/msr-index.h |   2 +
> >  xen/include/asm-x86/xstate.h    |   3 +
> >  8 files changed, 239 insertions(+)
> >
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index 045f6ff..e8b8d67 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -426,6 +426,7 @@ int vcpu_initialise(struct vcpu *v)
> >  
> >      /* By default, do not emulate */
> >      v->arch.vm_event.emulate_flags = 0;
> > +    v->arch.msr_ia32_xss = 0;
> 
> The backing memory for struct vcpu is zeroed when allocated.  There is
> no need to explicitly zero this field here.
> 
Ok.
> >  
> >      rc = mapcache_vcpu_init(v);
> >      if ( rc )
> > @@ -1529,6 +1530,8 @@ static void __context_switch(void)
> >              if ( xcr0 != get_xcr0() && !set_xcr0(xcr0) )
> >                  BUG();
> >          }
> > +        if ( cpu_has_xsaves )
> > +            wrmsr_safe(MSR_IA32_XSS, n->arch.msr_ia32_xss);
> 
> This musn't throw away potential errors.  It should not be possible for
> n->arch.msr_ia32_xss to be invalid by this point, so a straight wrmsr()
> would be correct.
> 
> However, you will want to implement lazy context switching, exactly like
> get/set_xcr0().
> 
Ok.
> >          vcpu_restore_fpu_eager(n);
> >          n->arch.ctxt_switch_to(n);
> >      }
> > diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> > index 6a03582..c1fea77 100644
> > --- a/xen/arch/x86/traps.c
> > +++ b/xen/arch/x86/traps.c
> > @@ -2353,6 +2353,131 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
> >          }
> >          break;
> >  
> > +    case 0xc7:
> 
> This case should be sorted numerically, so should be between CPUID and
> default.
> 
Ok.
> > +    {
> > +        void *xsave_addr;
> > +        int not_page_aligned = 0;
> 
> bool_t
> 
Sorry.
> > +        u32 guest_xsaves_size = xstate_ctxt_size_compact(v->arch.xcr0);
> > +
> > +        switch ( insn_fetch(u8, code_base, eip, code_limit) )
> > +        {
> > +            case 0x2f:/* XSAVES */
> > +            {
> > +                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
> > +                                          X86_CR4_OSXSAVE))
> 
> I would format this as
> 
> if ( !cpu_has_xsaves ||
>      !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) )
> 
> to associate the bit test more clearly.
> 
Ok.
> > +                {
> > +                    do_guest_trap(TRAP_invalid_op, regs, 0);
> > +                    goto skip;
> > +                }
> > +
> > +                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
> > +                {
> > +                    do_guest_trap(TRAP_nmi, regs, 0);
> 
> TRAP_no_device surely?
> 
Yes. According to SDM Volume 1 Section 13.11, for xsaves, if CR0.TS[bit 3]
is 1, a device-not-available exception (#NM) occurs.
> > +                    goto skip;
> > +                }
> > +
> > +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
> 
> What does edi have to do with xsaves?  only edx:eax are special
> according to the manual.
> 
regs->edi holds the guest linear address.
> > +                    goto fail;
> > +
> > +                if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
> > +                {
> > +                    mfn_t mfn_list[2];
> > +                    void *va;
> 
> This fails to account for the xsaves size growing to > PAGE_SIZE in
> future processors.
> 
> /* TODO - expand for future processors .*/
> BUG_ON(guest_xsaves_size <= PAGE_SIZE);
> 
Yes, I agree.
> might be acceptable.  However, it is better fixed by...
> 
> > +
> > +                    not_page_aligned = 1;
> > +                    mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
> > +                    mfn_list[1] = _mfn(do_page_walk_mfn(v,
> > +                                       PAGE_ALIGN(regs->edi)));
> > +
> > +                    va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
> > +                    xsave_addr = (void *)((unsigned long)va +
> > +                                         (regs->edi & ~PAGE_MASK));
> 
> ... introducing a new helper such as
> 
> void *vmap_guest_linear(void *va, size_t bytes,, uint32_t PFEC);
> 
> which takes care of the internal details of making a linear area of
> guest virtual address space appear linear in Xen virtual address space
> as well.
> 
> You also need to take care to respect non-writable pages, and have an
> access_ok() check or a guest could (ab)use this emulation to write to
> Xen code/data areas.
> 
I will introduce a new helper in the next version.
> > +                }
> > +                else
> > +                    xsave_addr = do_page_walk(v, regs->edi);
> > +
> > +                if ( !xsave_addr )
> > +                    goto fail;
> > +
> > +                xsaves(regs->eax, regs->edx, xsave_addr);
> > +
> > +                if ( not_page_aligned )
> > +                    vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
> > +                else
> > +                    unmap_domain_page(xsave_addr);
> > +                break;
> > +            }
> 
> Blank line here please.
> 
Sorry.
> > +            case 0x1f:/* XRSTORS */
> > +            {
> > +                struct xsave_struct guest_xsave_area;
> 
> Not on the stack please, but I don't believe you need this.
> 
guest_xsave_area is needed for basic checking before performing xrstors.
> > +
> > +                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
> > +                                          X86_CR4_OSXSAVE))
> > +                {
> > +                    do_guest_trap(TRAP_invalid_op, regs, 0);
> > +                    goto skip;
> > +                }
> > +
> > +                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
> > +                {
> > +                    do_guest_trap(TRAP_nmi, regs, 0);
> > +                    goto skip;
> > +                }
> > +
> > +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
> > +                    goto fail;
> > +
> > +                if ( (rc = copy_from_user(&guest_xsave_area, (void *) regs->edi,
> > +                                          sizeof(struct xsave_struct))) !=0 )
> > +                {
> > +                    propagate_page_fault(regs->edi +
> > +				         sizeof(struct xsave_struct) - rc, 0);
> > +                    goto skip;
> 
> Surely you just need the xstate_bv and xcomp_bv ?
> 
I will dig into the SDM to see whether I am missing some checks.
> > +                }
> > +                else
> > +                if ( !(guest_xsave_area.xsave_hdr.xcomp_bv & 1l << 63) ||
> 
> 1l << 63 is undefined behaviour by altering the sign bit with a shift. 
> Perhaps you meant 1ul << 63 ?
Yes, sorry.
> 
> > +                     (guest_xsave_area.xsave_hdr.xstate_bv |
> > +                     guest_xsave_area.xsave_hdr.xcomp_bv) !=
> > +                     guest_xsave_area.xsave_hdr.xcomp_bv ||
> 
> xcomp_bv is not introduced until patch 2, which means that this patch
> doesn't build.  Please make absolutely sure that the entire series is
> bisectable.
> 
Ok, I will fix this in the next version.
> > +                     ((guest_xsave_area.xsave_hdr.xcomp_bv & ~(1l << 63)) |
> > +                     v->arch.xcr0 | v->arch.msr_ia32_xss ) !=
> > +                     (v->arch.xcr0 | v->arch.msr_ia32_xss) )
> > +                    goto fail;
> > +
> > +                if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
> > +                {
> > +                    mfn_t mfn_list[2];
> > +                    void *va;
> > +
> > +                    not_page_aligned = 1;
> > +                    mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
> > +                    mfn_list[1] = _mfn(do_page_walk_mfn(v,
> > +                                       PAGE_ALIGN(regs->edi)));
> > +
> > +                    va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
> > +                    xsave_addr = (void *)((unsigned long)va +
> > +                                         (regs->edi & ~PAGE_MASK));
> > +                }
> > +                else
> > +                    xsave_addr = do_page_walk(v, regs->edi);
> > +
> > +                if ( !xsave_addr )
> > +                    goto fail;
> > +
> > +                xrstors(regs->eax, regs->edx, xsave_addr);
> > +
> > +                if ( not_page_aligned )
> > +                    vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
> > +                else
> > +                    unmap_domain_page(xsave_addr);
> > +                break;
> > +            }
> > +            default:
> > +                goto fail;
> > +        }
> > +        break;
> > +    }
> > +
> >      case 0x06: /* CLTS */
> >          (void)do_fpu_taskswitch(0);
> >          break;
> > @@ -2663,6 +2788,13 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
> >                  }
> >                  break;
> >              }
> > +        case MSR_IA32_XSS:
> > +            if ( !cpu_has_xsaves )
> > +                goto fail;
> > +            if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
> > +                goto fail;
> 
> You can join these two if() statements together.  Also, it is
> technically regs->_ecx.
> 
Ok.
> > +            v->arch.msr_ia32_xss = msr_content;
> > +            break;
> >              /*FALLTHROUGH*/
> 
> Observe the comment which indicates that the previous case: deliberately
> falls through to default:
> 
Sorry.
> >  
> >          default:
> > @@ -2798,6 +2930,12 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
> >                  }
> >                  break;
> >              }
> > +        case MSR_IA32_XSS:
> > +            if ( !cpu_has_xsaves )
> > +                goto fail;
> > +            regs->eax = v->arch.msr_ia32_xss;
> > +            regs->edx = v->arch.msr_ia32_xss >> 32;
> 
> val = v->arch.msr_ia32_xss;
> goto rdmsr_writeback;
> 
> This code currently incorrectly clobbers the upper 32 bits of rax.
> 
Ok.
> > +            break;
> >              /*FALLTHROUGH*/
> 
> Again - another bad fall through case.
> 
> >  
> >          default:
> > diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> > index 98310f3..de94ac1 100644
> > --- a/xen/arch/x86/x86_64/mm.c
> > +++ b/xen/arch/x86/x86_64/mm.c
> > @@ -48,6 +48,58 @@ l2_pgentry_t __section(".bss.page_aligned") l2_bootmap[L2_PAGETABLE_ENTRIES];
> >  
> >  l2_pgentry_t *compat_idle_pg_table_l2;
> >  
> > +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)
> 
> What is this function?  Why is it useful?  Something like this belongs
> in its own patch along with a description of why it is being introduced.
> 
The function is used to get the mfn associated with a guest linear address.
Is there another existing function I can use that does the same thing?
Can you give me a suggestion?
> > +{
> > +    unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
> > +    l4_pgentry_t l4e, *l4t;
> > +    l3_pgentry_t l3e, *l3t;
> > +    l2_pgentry_t l2e, *l2t;
> > +    l1_pgentry_t l1e, *l1t;
> > +
> > +    if ( !is_pv_vcpu(v) || !is_canonical_address(addr) )
> > +        return 0;
> > +
> > +    l4t = map_domain_page(_mfn(mfn));
> > +    l4e = l4t[l4_table_offset(addr)];
> > +    unmap_domain_page(l4t);
> > +    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
> > +        return 0;
> > +
> > +    l3t = map_l3t_from_l4e(l4e);
> > +    l3e = l3t[l3_table_offset(addr)];
> > +    unmap_domain_page(l3t);
> > +    mfn = l3e_get_pfn(l3e);
> > +    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> > +        return 0;
> > +    if ( (l3e_get_flags(l3e) & _PAGE_PSE) )
> > +    {
> > +        mfn += PFN_DOWN(addr & ((1UL << L3_PAGETABLE_SHIFT) - 1));
> > +        goto ret;
> > +    }
> > +
> > +    l2t = map_domain_page(_mfn(mfn));
> > +    l2e = l2t[l2_table_offset(addr)];
> > +    unmap_domain_page(l2t);
> > +    mfn = l2e_get_pfn(l2e);
> > +    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> > +        return 0;
> > +    if ( (l2e_get_flags(l2e) & _PAGE_PSE) )
> > +    {
> > +        mfn += PFN_DOWN(addr & ((1UL << L2_PAGETABLE_SHIFT) - 1));
> > +        goto ret;
> > +    }
> > +
> > +    l1t = map_domain_page(_mfn(mfn));
> > +    l1e = l1t[l1_table_offset(addr)];
> > +    unmap_domain_page(l1t);
> > +    mfn = l1e_get_pfn(l1e);
> > +    if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> > +        return 0;
> > +
> > + ret:
> > +    return mfn;
> > +}
> > +
> >  void *do_page_walk(struct vcpu *v, unsigned long addr)
> >  {
> >      unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
> > diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> > index d5f5e3b..e34eda3 100644
> > --- a/xen/arch/x86/xstate.c
> > +++ b/xen/arch/x86/xstate.c
> > @@ -65,6 +65,31 @@ uint64_t get_xcr0(void)
> >      return this_cpu(xcr0);
> >  }
> >  
> > +void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
> 
> Should take a 64bit mask.
> 
Function xsaves will be called by function xsave. In xsave, the parameter
'mask' has already been split into 'lmask' and 'hmask', so I think using
lmask and hmask as parameters is appropriate.
> > +{
> > +    asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
> > +                    : "=m" (*ptr)
> > +                    : "a" (lmask), "d" (hmask), "D" (ptr) );
> > +}
> > +
> > +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
> > +{
> > +    asm volatile ( "1: .byte 0x48,0x0f,0xc7,0x1f\n"
> > +                   ".section .fixup,\"ax\"      \n"
> > +                   "2: mov %5,%%ecx             \n"
> > +                   "   xor %1,%1                \n"
> > +                   "   rep stosb                \n"
> > +                   "   lea %2,%0                \n"
> > +                   "   mov %3,%1                \n"
> > +                   "   jmp 1b                   \n"
> > +                   ".previous                   \n"
> > +                   _ASM_EXTABLE(1b, 2b)
> > +                   : "+&D" (ptr), "+&a" (lmask)
> > +                   : "m" (*ptr), "g" (lmask), "d" (hmask),
> > +                     "m" (xsave_cntxt_size)
> > +                   : "ecx" );
> > +}
> > +
> 
> Neither of these two helpers have anything like sufficient checking to
> be safely used on guest state.
> 
Basic checking is done before these two helpers are called.
> >  void xsave(struct vcpu *v, uint64_t mask)
> >  {
> >      struct xsave_struct *ptr = v->arch.xsave_area;
> > @@ -268,6 +293,20 @@ static unsigned int _xstate_ctxt_size(u64 xcr0)
> >      return ebx;
> >  }
> >  
> > +unsigned int xstate_ctxt_size_compact(u64 xcr0)
> > +{
> > +    u64 act_xcr0 = get_xcr0();
> > +    u32 eax, ebx = 0, ecx, edx;
> > +    bool_t ok = set_xcr0(xcr0);
> > +
> > +    ASSERT(ok);
> > +    cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
> > +    ok = set_xcr0(act_xcr0);
> > +    ASSERT(ok);
> > +
> > +    return ebx;
> > +}
> > +
> >  /* Fastpath for common xstate size requests, avoiding reloads of xcr0. */
> >  unsigned int xstate_ctxt_size(u64 xcr0)
> >  {
> > diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> > index 7a9e96f..aee781b 100644
> > --- a/xen/include/asm-x86/domain.h
> > +++ b/xen/include/asm-x86/domain.h
> > @@ -496,6 +496,7 @@ struct arch_vcpu
> >       */
> >      struct xsave_struct *xsave_area;
> >      uint64_t xcr0;
> > +    u64 msr_ia32_xss;
> 
> uint64_t please, for consistency.
Sorry.
> 
> ~Andrew
> 
Thanks for your review, Andrew.
> >      /* Accumulated eXtended features mask for using XSAVE/XRESTORE by Xen
> >       * itself, as we can never know whether guest OS depends on content
> >       * preservation whenever guest OS clears one feature flag (for example,
> > diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> > index 8595c38..94a590e 100644
> > --- a/xen/include/asm-x86/mm.h
> > +++ b/xen/include/asm-x86/mm.h
> > @@ -524,6 +524,7 @@ void make_cr3(struct vcpu *v, unsigned long mfn);
> >  void update_cr3(struct vcpu *v);
> >  int vcpu_destroy_pagetables(struct vcpu *);
> >  struct trap_bounce *propagate_page_fault(unsigned long addr, u16 error_code);
> > +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr);
> >  void *do_page_walk(struct vcpu *v, unsigned long addr);
> >  
> >  int __sync_local_execstate(void);
> > diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
> > index 5425f77..365d995 100644
> > --- a/xen/include/asm-x86/msr-index.h
> > +++ b/xen/include/asm-x86/msr-index.h
> > @@ -58,6 +58,8 @@
> >  
> >  #define MSR_IA32_BNDCFGS		0x00000D90
> >  
> > +#define MSR_IA32_XSS			0x00000da0
> > +
> >  #define MSR_MTRRfix64K_00000		0x00000250
> >  #define MSR_MTRRfix16K_80000		0x00000258
> >  #define MSR_MTRRfix16K_A0000		0x00000259
> > diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
> > index 4c690db..59c7156 100644
> > --- a/xen/include/asm-x86/xstate.h
> > +++ b/xen/include/asm-x86/xstate.h
> > @@ -82,6 +82,8 @@ struct __packed __attribute__((aligned (64))) xsave_struct
> >  /* extended state operations */
> >  bool_t __must_check set_xcr0(u64 xfeatures);
> >  uint64_t get_xcr0(void);
> > +void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
> > +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
> >  void xsave(struct vcpu *v, uint64_t mask);
> >  void xrstor(struct vcpu *v, uint64_t mask);
> >  bool_t xsave_enabled(const struct vcpu *v);
> > @@ -92,6 +94,7 @@ int __must_check handle_xsetbv(u32 index, u64 new_bv);
> >  void xstate_free_save_area(struct vcpu *v);
> >  int xstate_alloc_save_area(struct vcpu *v);
> >  void xstate_init(bool_t bsp);
> > +unsigned int xstate_ctxt_size_compact(u64 xcr0);
> >  unsigned int xstate_ctxt_size(u64 xcr0);
> >  
> >  #endif /* __ASM_XSTATE_H */
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
  2015-08-05 18:17   ` Andrew Cooper
@ 2015-08-07  8:22     ` Shuai Ruan
       [not found]     ` <20150807082244.GB2976@shuai.ruan@linux.intel.com>
  1 sibling, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-07  8:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, ian.jackson, eddie.dong, xen-devel, jbeulich, keir

On Wed, Aug 05, 2015 at 07:17:44PM +0100, Andrew Cooper wrote:
> On 05/08/15 02:57, Shuai Ruan wrote:
> > This patch enables xsaves for hvm guest, includes:
> > 1.handle xsaves vmcs init and vmexit.
> > 2.add logic to write/read the XSS msr.
> >
> > Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> > ---
> >  xen/arch/x86/hvm/hvm.c             | 44 ++++++++++++++++++++++++++++++++++++++
> >  xen/arch/x86/hvm/vmx/vmcs.c        |  7 +++++-
> >  xen/arch/x86/hvm/vmx/vmx.c         | 18 ++++++++++++++++
> >  xen/arch/x86/xstate.c              |  4 ++--
> >  xen/include/asm-x86/hvm/vmx/vmcs.h |  5 +++++
> >  xen/include/asm-x86/hvm/vmx/vmx.h  |  2 ++
> >  xen/include/asm-x86/xstate.h       |  2 +-
> >  7 files changed, 78 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> > index c07e3ef..e5cf761 100644
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -4370,6 +4370,10 @@ void hvm_hypervisor_cpuid_leaf(uint32_t sub_idx,
> >      }
> >  }
> >  
> > +#define XSAVEOPT	(1 << 0)
> > +#define XSAVEC		(1 << 1)
> > +#define XGETBV1	(1 << 2)
> > +#define XSAVES		(1 << 3)
> 
> These should be in cpufeature.h, not here.
> 
Ok.
> >  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
> >                                     unsigned int *ecx, unsigned int *edx)
> >  {
> > @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
> >                      *ebx = _eax + _ebx;
> >              }
> >          }
> > +        if ( count == 1 )
> > +        {
> > +            if ( cpu_has_xsaves )
> > +            {
> > +                *ebx = XSTATE_AREA_MIN_SIZE;
> > +                if ( v->arch.xcr0 | v->arch.msr_ia32_xss )
> > +                    for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
> > +                    {
> > +                        if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss)
> > +			   & (1ULL << sub_leaf)) )
> > +                            continue;
> > +                        domain_cpuid(d, input, sub_leaf, &_eax, &_ebx, &_ecx,
> > +                                     &_edx);
> > +                        *ebx =  *ebx + _eax;
> > +                    }
> > +            }
> > +            else
> > +            {
> > +                *eax &= ~XSAVES;
> > +                *ebx = *ecx = *edx = 0;
> > +            }
> > +            if ( !cpu_has_xgetbv1 )
> > +                *eax &= ~XGETBV1;
> > +            if ( !cpu_has_xsavec )
> > +                *eax &= ~XSAVEC;
> > +            if ( !cpu_has_xsaveopt )
> > +                *eax &= ~XSAVEOPT;
> > +        }
> 
> Urgh - I really need to get domain cpuid fixed in Xen.  This is
> currently making a very bad situation a little worse.
> 
In patch 4, I expose xsaves/xsavec/xsaveopt and need to check whether the
hardware supports them. What's your suggestion about this?
> >          break;
> >  
> >      case 0x80000001:
> > @@ -4555,6 +4587,12 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
> >          *msr_content = v->arch.hvm_vcpu.guest_efer;
> >          break;
> >  
> > +    case MSR_IA32_XSS:
> > +        if ( !cpu_has_vmx_xsaves )
> 
> vmx_xsaves has nothing to do with this here.  I presume you mean
> cpu_has_xsave?
> 
> > +            goto gp_fault;
> > +        *msr_content = v->arch.msr_ia32_xss;
> > +        break;
> > +
> >      case MSR_IA32_TSC:
> >          *msr_content = _hvm_rdtsc_intercept();
> >          break;
> > @@ -4687,6 +4725,12 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
> >             return X86EMUL_EXCEPTION;
> >          break;
> >  
> > +    case MSR_IA32_XSS:
> > +        if ( !cpu_has_vmx_xsaves )
> > +            goto gp_fault;
> > +        v->arch.msr_ia32_xss = msr_content;
> 
> You must validate msr_content here and possibly hand a gp fault back to
> the guest.
> 
Ok, I will fix it.
> > +        break;
> > +
> >      case MSR_IA32_TSC:
> >          hvm_set_guest_tsc(v, msr_content);
> >          break;
> > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> > index 4c5ceb5..8e61e3f 100644
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -230,7 +230,8 @@ static int vmx_init_vmcs_config(void)
> >                 SECONDARY_EXEC_ENABLE_EPT |
> >                 SECONDARY_EXEC_ENABLE_RDTSCP |
> >                 SECONDARY_EXEC_PAUSE_LOOP_EXITING |
> > -               SECONDARY_EXEC_ENABLE_INVPCID);
> > +               SECONDARY_EXEC_ENABLE_INVPCID |
> > +               SECONDARY_EXEC_XSAVES);
> >          rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
> >          if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
> >              opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
> > @@ -921,6 +922,7 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
> >      virtual_vmcs_exit(vvmcs);
> >  }
> >  
> > +#define VMX_XSS_EXIT_BITMAP 0
> 
> This define definitely doesn't live here.
> 
Ok.
> >  static int construct_vmcs(struct vcpu *v)
> >  {
> >      struct domain *d = v->domain;
> > @@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v)
> >          __vmwrite(GUEST_PAT, guest_pat);
> >      }
> >  
> > +    if ( cpu_has_vmx_xsaves )
> > +        __vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
> > +
> >      vmx_vmcs_exit(v);
> >  
> >      /* PVH: paging mode is updated by arch_set_info_guest(). */
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index d3183a8..64ff63b 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -2708,6 +2708,16 @@ static int vmx_handle_apic_write(void)
> >      return vlapic_apicv_write(current, exit_qualification & 0xfff);
> >  }
> >  
> > +static void vmx_handle_xsaves(void)
> > +{
> > +    WARN();
> > +}
> > +
> > +static void vmx_handle_xrstors(void)
> > +{
> > +    WARN();
> > +}
> > +
> 
> What is these supposed to do?  They are not an appropriate handlers.
> 
These two handlers do nothing here. Performing xsaves in an HVM guest will
not trap into the hypervisor in this patch (as XSS_EXIT_BITMAP is set to
zero). However, it may trap in the future. See SDM Volume 3 Section 25.1.3
for detailed information.
> ~Andrew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 4/6] libxc: expose xsaves/xgetbv1/xsavec to hvm guest
  2015-08-05  8:37   ` Ian Campbell
@ 2015-08-07  8:23     ` Shuai Ruan
  0 siblings, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-07  8:23 UTC (permalink / raw)
  To: Ian Campbell
  Cc: kevin.tian, wei.liu2, eddie.dong, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, xen-devel, jbeulich,
	keir

On Wed, Aug 05, 2015 at 09:37:22AM +0100, Ian Campbell wrote:
> On Wed, 2015-08-05 at 09:57 +0800, Shuai Ruan wrote:
> > This patch exposes xsaves/xgetbv1/xsavec to hvm guest.
> > The reserved bits of eax/ebx/ecx/edx must be cleaned up
> > when call cpuid(0dh) with leaf 1 or 2..63.
> > 
> > According to the spec the following bits must be reseved:
> 
> "reserved"
> 
Sorry. :)
> > For leaf 1, bits 03-04/08-31 of ecx is reserved. Edx is reserved.
> > For leaf 2...63, bits 01-31 of ecx is reserved. Edx is reserved.
> > 
> > Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> 
> Although this is toolstack code I think this really ought to be acked by
> the hypervisor x86 maintainers, if they are happy with it then I am too,
> and in that case you may add:
> 
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
Ok. Thanks for your review, Ian.
> > ---
> >  tools/libxc/xc_cpuid_x86.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> > index c97f91a..b69676a 100644
> > --- a/tools/libxc/xc_cpuid_x86.c
> > +++ b/tools/libxc/xc_cpuid_x86.c
> > @@ -211,6 +211,9 @@ static void intel_xc_cpuid_policy(
> >  }
> >  
> >  #define XSAVEOPT        (1 << 0)
> > +#define XSAVEC          (1 << 1)
> > +#define XGETBV1         (1 << 2)
> > +#define XSAVES          (1 << 3)
> >  /* Configure extended state enumeration leaves (0x0000000D for xsave) */
> >  static void xc_cpuid_config_xsave(
> >      xc_interface *xch, domid_t domid, uint64_t xfeature_mask,
> > @@ -247,8 +250,9 @@ static void xc_cpuid_config_xsave(
> >          regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
> >          break;
> >      case 1: /* leaf 1 */
> > -        regs[0] &= XSAVEOPT;
> > -        regs[1] = regs[2] = regs[3] = 0;
> > +        regs[0] &= (XSAVEOPT | XSAVEC | XGETBV1 | XSAVES);
> > +        regs[2] &= 0xe7;
> > +        regs[3] = 0;
> >          break;
> >      case 2 ... 63: /* sub-leaves */
> >          if ( !(xfeature_mask & (1ULL << input[1])) )
> > @@ -256,8 +260,9 @@ static void xc_cpuid_config_xsave(
> >              regs[0] = regs[1] = regs[2] = regs[3] = 0;
> >              break;
> >          }
> > -        /* Don't touch EAX, EBX. Also cleanup ECX and EDX */
> > -        regs[2] = regs[3] = 0;
> > +        /* Don't touch EAX, EBX. Also cleanup EDX. Cleanup bits 01-32 of 
> > ECX*/
> > +        regs[2] &= 0x1;
> > +        regs[3] = 0;
> >          break;
> >      }
> >  }
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 0/6] add xsaves/xrstors support
  2015-08-05 16:38 ` [PATCH V3 0/6] add xsaves/xrstors support Andrew Cooper
@ 2015-08-07  8:25   ` Shuai Ruan
  0 siblings, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-07  8:25 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, ian.jackson, eddie.dong, xen-devel, jbeulich, keir

On Wed, Aug 05, 2015 at 05:38:21PM +0100, Andrew Cooper wrote:
> On 05/08/15 02:57, Shuai Ruan wrote:
> > Detail hardware spec can be found in chapter 13 (section 13.11 13.12) of the 
> > Intel SDM [1].
> >
> > patch1: add xsaves/xrstors support for pv guest
> > patch2: add xsaves/xrstors support for xen
> > patch3-5: add xsaves/xrstors support for hvm guest
> > patch6: swtich on detection of xsaves/xrstors/xgetbv in xen
> 
> This order of operations seems backwards.
> 
> Can I suggest starting with a patch which adds various xsaves/etc
> defines/helper functions/etc (rather than having them spread through the
> series), then a patch which allows Xen to start using the features, then
> adding support to PV and HVM guests.
> 
> ~Andrew
> 
OK, I will do this in the next version.

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
       [not found]     ` <20150807080008.GA2976@shuai.ruan@linux.intel.com>
@ 2015-08-07 12:44       ` Andrew Cooper
  2015-08-11  7:50         ` Shuai Ruan
       [not found]         ` <20150811075039.GA14406@shuai.ruan@linux.intel.com>
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2015-08-07 12:44 UTC (permalink / raw)
  To: Shuai Ruan
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, ian.jackson, eddie.dong, xen-devel, jbeulich, keir

On 07/08/15 09:00, Shuai Ruan wrote:
>
>>> +                    goto skip;
>>> +                }
>>> +
>>> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
>> What does edi have to do with xsaves?  only edx:eax are special
>> according to the manual.
>>
> regs->edi is the guest_linear_address

Whyso?  xsaves takes an unconditional memory parameter,  not a pointer
in %rdi.  (regs->edi is only correct for ins/outs because the pointer is
architecturally required to be in %rdi.)

There is nothing currently in emulate_privileged_op() which does ModRM
decoding for memory references, nor SIB decoding.  xsaves/xrstors would
be the first such operations.

I am also not sure that adding arbitrary memory decode here is sensible.

In an ideal world, we would have what is currently x86_emulate() split
in 3 stages.

Stage 1 does straight instruction decode to some internal representation.

Stage 2 does an audit to see whether the decoded instruction is
plausible for the reason why an emulation was needed.  We have had a
number of security issues with emulation in the past where guests cause
one instruction to trap for emulation, then rewrite the instruction to
be something else, and exploit a bug in the emulator.

Stage 3 performs the actions required for emulation.

Currently, x86_emulate() is limited to instructions which might
legitimately fault for emulation, but with the advent of VM
introspection, this is proving to be insufficient.  With my x86
maintainers hat on, I would like to avoid the current situation we have
with multiple bits of code doing x86 instruction decode and emulation
(which are all different).

I think the 3-step approach above caters suitably to all usecases, but
it is a large project itself.  It allows the introspection people to
have a full and complete x86 emulation infrastructure, while also
preventing areas like the shadow paging from being opened up to
potential vulnerabilities in unrelated areas of the x86 architecture.

I would even go so far as to say that it is probably ok not to support
xsaves/xrestors in PV guests until something along the above lines is
sorted.  The first feature in XSS is processor trace which a PV guest
couldn't use anyway.  I suspect the same applies to most of the other
XSS features, or they wouldn't need to be privileged in the first place.

>
>>> +
>>> +                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
>>> +                                          X86_CR4_OSXSAVE))
>>> +                {
>>> +                    do_guest_trap(TRAP_invalid_op, regs, 0);
>>> +                    goto skip;
>>> +                }
>>> +
>>> +                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
>>> +                {
>>> +                    do_guest_trap(TRAP_nmi, regs, 0);
>>> +                    goto skip;
>>> +                }
>>> +
>>> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
>>> +                    goto fail;
>>> +
>>> +                if ( (rc = copy_from_user(&guest_xsave_area, (void *) regs->edi,
>>> +                                          sizeof(struct xsave_struct))) !=0 )
>>> +                {
>>> +                    propagate_page_fault(regs->edi +
>>> +				         sizeof(struct xsave_struct) - rc, 0);
>>> +                    goto skip;
>> Surely you just need the xstate_bv and xcomp_bv ?
>>
> I will dig into SDM to see whether I missing some checkings.

What I mean by this is that xstate_bv and xcomp_bv are all that you are
checking, so you just need two uint64_t's, rather than a full xsave_struct.
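A minimal sketch of that suggestion, reusing the series' FXSAVE_SIZE constant
and Xen's copy_from_user (the structure and helper name here are hypothetical):

    /* Sketch only: fetch just the 16 header bytes that are actually
     * checked (xstate_bv and xcomp_bv), which sit immediately after the
     * 512-byte legacy region, instead of a whole struct xsave_struct. */
    struct guest_xsave_hdr {
        uint64_t xstate_bv;
        uint64_t xcomp_bv;
    };

    static int read_guest_xsave_hdr(unsigned long addr,
                                    struct guest_xsave_hdr *hdr)
    {
        return copy_from_user(hdr, (void *)(addr + FXSAVE_SIZE),
                              sizeof(*hdr)) ? -EFAULT : 0;
    }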

>
>>>  
>>>          default:
>>> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
>>> index 98310f3..de94ac1 100644
>>> --- a/xen/arch/x86/x86_64/mm.c
>>> +++ b/xen/arch/x86/x86_64/mm.c
>>> @@ -48,6 +48,58 @@ l2_pgentry_t __section(".bss.page_aligned") l2_bootmap[L2_PAGETABLE_ENTRIES];
>>>  
>>>  l2_pgentry_t *compat_idle_pg_table_l2;
>>>  
>>> +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)
>> What is this function?  Why is it useful?  Something like this belongs
>> in its own patch along with a description of why it is being introduced.
>>
> The fucntion is used for getting the mfn related to guest linear address.
> Is there an another existing function I can use that can do the same
> thing? Can you give me a suggestion.

do_page_walk() and use virt_to_mfn() on the result?  (I am just
guessing, but

>
>>> +{
>>> +    asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
>>> +                    : "=m" (*ptr)
>>> +                    : "a" (lmask), "d" (hmask), "D" (ptr) );
>>> +}
>>> +
>>> +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
>>> +{
>>> +    asm volatile ( "1: .byte 0x48,0x0f,0xc7,0x1f\n"
>>> +                   ".section .fixup,\"ax\"      \n"
>>> +                   "2: mov %5,%%ecx             \n"
>>> +                   "   xor %1,%1                \n"
>>> +                   "   rep stosb                \n"
>>> +                   "   lea %2,%0                \n"
>>> +                   "   mov %3,%1                \n"
>>> +                   "   jmp 1b                   \n"
>>> +                   ".previous                   \n"
>>> +                   _ASM_EXTABLE(1b, 2b)
>>> +                   : "+&D" (ptr), "+&a" (lmask)
>>> +                   : "m" (*ptr), "g" (lmask), "d" (hmask),
>>> +                     "m" (xsave_cntxt_size)
>>> +                   : "ecx" );
>>> +}
>>> +
>> Neither of these two helpers have anything like sufficient checking to
>> be safely used on guest state.
>>
> Basic checking is done before these two helpers.

But this isn't the only place where they are used.

~Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
       [not found]     ` <20150807082244.GB2976@shuai.ruan@linux.intel.com>
@ 2015-08-07 13:04       ` Andrew Cooper
  2015-08-11  7:59         ` Shuai Ruan
       [not found]         ` <20150811075909.GB14406@shuai.ruan@linux.intel.com>
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2015-08-07 13:04 UTC (permalink / raw)
  To: Shuai Ruan
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, ian.jackson, eddie.dong, xen-devel, jbeulich, keir

On 07/08/15 09:22, Shuai Ruan wrote:
>
>>>  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>                                     unsigned int *ecx, unsigned int *edx)
>>>  {
>>> @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>                      *ebx = _eax + _ebx;
>>>              }
>>>          }
>>> +        if ( count == 1 )
>>> +        {
>>> +            if ( cpu_has_xsaves )
>>> +            {
>>> +                *ebx = XSTATE_AREA_MIN_SIZE;
>>> +                if ( v->arch.xcr0 | v->arch.msr_ia32_xss )
>>> +                    for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
>>> +                    {
>>> +                        if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss)
>>> +			   & (1ULL << sub_leaf)) )
>>> +                            continue;
>>> +                        domain_cpuid(d, input, sub_leaf, &_eax, &_ebx, &_ecx,
>>> +                                     &_edx);
>>> +                        *ebx =  *ebx + _eax;
>>> +                    }
>>> +            }
>>> +            else
>>> +            {
>>> +                *eax &= ~XSAVES;
>>> +                *ebx = *ecx = *edx = 0;
>>> +            }
>>> +            if ( !cpu_has_xgetbv1 )
>>> +                *eax &= ~XGETBV1;
>>> +            if ( !cpu_has_xsavec )
>>> +                *eax &= ~XSAVEC;
>>> +            if ( !cpu_has_xsaveopt )
>>> +                *eax &= ~XSAVEOPT;
>>> +        }
>> Urgh - I really need to get domain cpuid fixed in Xen.  This is
>> currently making a very bad situation a little worse.
>>
> In patch 4, I expose the xsaves/xsavec/xsaveopt and need to check
> whether the hardware supoort it. What's your suggestion about this?

Calling into domain_cpuid() in the loop is not useful as nothing will
set the subleaves up.  As a first pass, reading from
xstate_{offsets,sizes} will be better than nothing, as it will at least
match reality until the domain is migrated.
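A rough sketch of that first pass, assuming the xstate_sizes[] table added
later in this series and the XSTATE_AREA_MIN_SIZE constant from patch 3:

    /* Sketch only: the compacted-area size reported in EBX is the legacy
     * region plus the XSAVE header plus the sizes of all components
     * currently enabled in XCR0 | IA32_XSS for this vcpu. */
    uint64_t mask = v->arch.xcr0 | v->arch.msr_ia32_xss;
    unsigned int i, size = XSTATE_AREA_MIN_SIZE;

    for ( i = 2; i < 63; i++ )
        if ( mask & (1ULL << i) )
            size += xstate_sizes[i];

    *ebx = size;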

Longterm, I plan to overhaul the cpuid infrastructure to allow it to
properly represent per-core and per-package data, as well as move it
into the Xen architectural migration state, to avoid any host specific
values leaking into guest state.  This however is also a lot of work,
which you don't want to depend on.

>
>>>  static int construct_vmcs(struct vcpu *v)
>>>  {
>>>      struct domain *d = v->domain;
>>> @@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v)
>>>          __vmwrite(GUEST_PAT, guest_pat);
>>>      }
>>>  
>>> +    if ( cpu_has_vmx_xsaves )
>>> +        __vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
>>> +
>>>      vmx_vmcs_exit(v);
>>>  
>>>      /* PVH: paging mode is updated by arch_set_info_guest(). */
>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>> index d3183a8..64ff63b 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -2708,6 +2708,16 @@ static int vmx_handle_apic_write(void)
>>>      return vlapic_apicv_write(current, exit_qualification & 0xfff);
>>>  }
>>>  
>>> +static void vmx_handle_xsaves(void)
>>> +{
>>> +    WARN();
>>> +}
>>> +
>>> +static void vmx_handle_xrstors(void)
>>> +{
>>> +    WARN();
>>> +}
>>> +
>> What is these supposed to do?  They are not an appropriate handlers.
>>
> These two handlers do nothing here. Perform xsaves in HVM guest will 
> not trap in hypersior in this patch (by setting XSS_EXIT_BITMAP zero). 
> However it may trap in the future. See SDM Volume 3 Section 25.1.3 
> for detail information.

in which case use domain_crash().  WARN() here will allow a guest to DoS
Xen.
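i.e. something along these lines (a sketch only; the message text is
illustrative):

    static void vmx_handle_xsaves(void)
    {
        /* The XSS exit bitmap is zero, so this exit should be impossible;
         * treat it as a fatal guest error rather than warning in Xen. */
        gdprintk(XENLOG_ERR, "Unexpected xsaves VM exit\n");
        domain_crash(current->domain);
    }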

~Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
  2015-08-07 12:44       ` Andrew Cooper
@ 2015-08-11  7:50         ` Shuai Ruan
       [not found]         ` <20150811075039.GA14406@shuai.ruan@linux.intel.com>
  1 sibling, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-11  7:50 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	eddie.dong, ian.jackson, xen-devel, jbeulich, jun.nakajima, keir

On Fri, Aug 07, 2015 at 01:44:41PM +0100, Andrew Cooper wrote:
> On 07/08/15 09:00, Shuai Ruan wrote:
> >
> >>> +                    goto skip;
> >>> +                }
> >>> +
> >>> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
> >> What does edi have to do with xsaves?  only edx:eax are special
> >> according to the manual.
> >>
> > regs->edi is the guest_linear_address
> 
> Whyso?  xsaves takes an unconditional memory parameter,  not a pointer
> in %rdi.  (regs->edi is only correct for ins/outs because the pointer is
> architecturally required to be in %rdi.)
You are right. The linear_address should be decoded from the instruction.
> 
> There is nothing currently in emulate_privileged_op() which does ModRM
> decoding for memory references, nor SIB decoding.  xsaves/xrstors would
> be the first such operations.
> 
> I am also not sure that adding arbitrary memory decode here is sensible.
> 
> In an ideal world, we would have what is currently x86_emulate() split
> in 3 stages.
> 
> Stage 1 does straight instruction decode to some internal representation.
> 
> Stage 2 does an audit to see whether the decoded instruction is
> plausible for the reason why an emulation was needed.  We have had a
> number of security issues with emulation in the past where guests cause
> one instruction to trap for emulation, then rewrite the instruction to
> be something else, and exploit a bug in the emulator.
> 
> Stage 3 performs the actions required for emulation.
> 
> Currently, x86_emulate() is limited to instructions which might
> legitimately fault for emulation, but with the advent of VM
> introspection, this is proving to be insufficient.  With my x86
> maintainers hat on, I would like to avoid the current situation we have
> with multiple bits of code doing x86 instruction decode and emulation
> (which are all different).
> 
> I think the 3-step approach above caters suitably to all usecases, but
> it is a large project itself.  It allows the introspection people to
> have a full and complete x86 emulation infrastructure, while also
> preventing areas like the shadow paging from being opened up to
> potential vulnerabilities in unrelated areas of the x86 architecture.
> 
> I would even go so far as to say that it is probably ok not to support
> xsaves/xrestors in PV guests until something along the above lines is
> sorted.  The first feature in XSS is processor trace which a PV guest
> couldn't use anyway.  I suspect the same applies to most of the other
Why couldn't a PV guest use processor trace?
> XSS features, or they wouldn't need to be privileged in the first place.
> 
Thanks for your detailed suggestions.
xsaves/xrstors would also bring other benefits for a PV guest, such as
saving memory in the XSAVE area. If we do not support xsaves/xrstors in PV,
PV guests would lose these benefits. What is your opinion on this?
> >
> >>> +
> >>> +                if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
> >>> +                                          X86_CR4_OSXSAVE))
> >>> +                {
> >>> +                    do_guest_trap(TRAP_invalid_op, regs, 0);
> >>> +                    goto skip;
> >>> +                }
> >>> +
> >>> +                if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
> >>> +                {
> >>> +                    do_guest_trap(TRAP_nmi, regs, 0);
> >>> +                    goto skip;
> >>> +                }
> >>> +
> >>> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
> >>> +                    goto fail;
> >>> +
> >>> +                if ( (rc = copy_from_user(&guest_xsave_area, (void *) regs->edi,
> >>> +                                          sizeof(struct xsave_struct))) !=0 )
> >>> +                {
> >>> +                    propagate_page_fault(regs->edi +
> >>> +				         sizeof(struct xsave_struct) - rc, 0);
> >>> +                    goto skip;
> >> Surely you just need the xstate_bv and xcomp_bv ?
> >>
> > I will dig into SDM to see whether I missing some checkings.
> 
> What I mean by this is that xstate_bv and xcomp_bv are all that you are
> checking, so you just need two uint64_t's, rather than a full xsave_struct.
> 
Sorry, I misunderstood your meaning.
> >
> >>>  
> >>>          default:
> >>> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> >>> index 98310f3..de94ac1 100644
> >>> --- a/xen/arch/x86/x86_64/mm.c
> >>> +++ b/xen/arch/x86/x86_64/mm.c
> >>> @@ -48,6 +48,58 @@ l2_pgentry_t __section(".bss.page_aligned") l2_bootmap[L2_PAGETABLE_ENTRIES];
> >>>  
> >>>  l2_pgentry_t *compat_idle_pg_table_l2;
> >>>  
> >>> +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)
> >> What is this function?  Why is it useful?  Something like this belongs
> >> in its own patch along with a description of why it is being introduced.
> >>
> > The fucntion is used for getting the mfn related to guest linear address.
> > Is there an another existing function I can use that can do the same
> > thing? Can you give me a suggestion.
> 
> do_page_walk() and use virt_to_mfn() on the result?  (I am just
> guessing, but
> 
> >
> >>> +{
> >>> +    asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
> >>> +                    : "=m" (*ptr)
> >>> +                    : "a" (lmask), "d" (hmask), "D" (ptr) );
> >>> +}
> >>> +
> >>> +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
> >>> +{
> >>> +    asm volatile ( "1: .byte 0x48,0x0f,0xc7,0x1f\n"
> >>> +                   ".section .fixup,\"ax\"      \n"
> >>> +                   "2: mov %5,%%ecx             \n"
> >>> +                   "   xor %1,%1                \n"
> >>> +                   "   rep stosb                \n"
> >>> +                   "   lea %2,%0                \n"
> >>> +                   "   mov %3,%1                \n"
> >>> +                   "   jmp 1b                   \n"
> >>> +                   ".previous                   \n"
> >>> +                   _ASM_EXTABLE(1b, 2b)
> >>> +                   : "+&D" (ptr), "+&a" (lmask)
> >>> +                   : "m" (*ptr), "g" (lmask), "d" (hmask),
> >>> +                     "m" (xsave_cntxt_size)
> >>> +                   : "ecx" );
> >>> +}
> >>> +
> >> Neither of these two helpers have anything like sufficient checking to
> >> be safely used on guest state.
> >>
> > Basic checking is done before these two helpers.
> 
> But this isn't the only place where they are used.
> 
> ~Andrew
> 
Thanks for your review, Andrew.

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
  2015-08-07 13:04       ` Andrew Cooper
@ 2015-08-11  7:59         ` Shuai Ruan
       [not found]         ` <20150811075909.GB14406@shuai.ruan@linux.intel.com>
  1 sibling, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-11  7:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	eddie.dong, ian.jackson, xen-devel, jbeulich, jun.nakajima, keir

On Fri, Aug 07, 2015 at 02:04:51PM +0100, Andrew Cooper wrote:
> On 07/08/15 09:22, Shuai Ruan wrote:
> >
> >>>  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
> >>>                                     unsigned int *ecx, unsigned int *edx)
> >>>  {
> >>> @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
> >>>                      *ebx = _eax + _ebx;
> >>>              }
> >>>          }
> >>> +        if ( count == 1 )
> >>> +        {
> >>> +            if ( cpu_has_xsaves )
> >>> +            {
> >>> +                *ebx = XSTATE_AREA_MIN_SIZE;
> >>> +                if ( v->arch.xcr0 | v->arch.msr_ia32_xss )
> >>> +                    for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
> >>> +                    {
> >>> +                        if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss)
> >>> +			   & (1ULL << sub_leaf)) )
> >>> +                            continue;
> >>> +                        domain_cpuid(d, input, sub_leaf, &_eax, &_ebx, &_ecx,
> >>> +                                     &_edx);
> >>> +                        *ebx =  *ebx + _eax;
> >>> +                    }
> >>> +            }
> >>> +            else
> >>> +            {
> >>> +                *eax &= ~XSAVES;
> >>> +                *ebx = *ecx = *edx = 0;
> >>> +            }
> >>> +            if ( !cpu_has_xgetbv1 )
> >>> +                *eax &= ~XGETBV1;
> >>> +            if ( !cpu_has_xsavec )
> >>> +                *eax &= ~XSAVEC;
> >>> +            if ( !cpu_has_xsaveopt )
> >>> +                *eax &= ~XSAVEOPT;
> >>> +        }
> >> Urgh - I really need to get domain cpuid fixed in Xen.  This is
> >> currently making a very bad situation a little worse.
> >>
> > In patch 4, I expose the xsaves/xsavec/xsaveopt and need to check
> > whether the hardware supoort it. What's your suggestion about this?
> 
> Calling into domain_cpuid() in the loop is not useful as nothing will
> set the subleaves up.  As a first pass, reading from
> xstate_{offsets,sizes} will be better than nothing, as it will at least
What do you mean by xstate_{offsets,sizes}?
> match reality until the domain is migrated.
> 
For CPUID(eax=0dh) with subleaf 1, the value of ebx changes
according to v->arch.xcr0 | v->arch.msr_ia32_xss, so adding
code in the hvm_cpuid function is the best way I can think of. Your
suggestions? :)
> Longterm, I plan to overhaul the cpuid infrastructure to allow it to
> properly represent per-core and per-package data, as well as move it
> into the Xen architectural migration state, to avoid any host specific
> values leaking into guest state.  This however is also a lot of work,
> which you don't want to dependent on.
> 
> >
> >>>  static int construct_vmcs(struct vcpu *v)
> >>>  {
> >>>      struct domain *d = v->domain;
> >>> @@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v)
> >>>          __vmwrite(GUEST_PAT, guest_pat);
> >>>      }
> >>>  
> >>> +    if ( cpu_has_vmx_xsaves )
> >>> +        __vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
> >>> +
> >>>      vmx_vmcs_exit(v);
> >>>  
> >>>      /* PVH: paging mode is updated by arch_set_info_guest(). */
> >>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >>> index d3183a8..64ff63b 100644
> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>> @@ -2708,6 +2708,16 @@ static int vmx_handle_apic_write(void)
> >>>      return vlapic_apicv_write(current, exit_qualification & 0xfff);
> >>>  }
> >>>  
> >>> +static void vmx_handle_xsaves(void)
> >>> +{
> >>> +    WARN();
> >>> +}
> >>> +
> >>> +static void vmx_handle_xrstors(void)
> >>> +{
> >>> +    WARN();
> >>> +}
> >>> +
> >> What is these supposed to do?  They are not an appropriate handlers.
> >>
> > These two handlers do nothing here. Perform xsaves in HVM guest will 
> > not trap in hypersior in this patch (by setting XSS_EXIT_BITMAP zero). 
> > However it may trap in the future. See SDM Volume 3 Section 25.1.3 
> > for detail information.
> 
> in which case use domain_crash().  WARN() here will allow a guest to DoS
> Xen.
I will change this in the next version.
> 
> ~Andrew
> 
Thanks for your review, Andrew.
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore
       [not found]     ` <20150811080143.GC14406@shuai.ruan@linux.intel.com>
@ 2015-08-11  9:27       ` Andrew Cooper
  2015-08-12 11:23         ` Shuai Ruan
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2015-08-11  9:27 UTC (permalink / raw)
  To: Shuai Ruan; +Cc: Xen-devel List

On 11/08/15 09:01, Shuai Ruan wrote:
>
>>> +
>>> +    /*
>>> +     * The FP xstates and SSE xstates are legacy states. They are always
>>> +     * in the fixed offsets in the xsave area in either compacted form
>>> +     * or standard form.
>>> +     */
>>> +    xstate_comp_offsets[0] = 0;
>>> +    xstate_comp_offsets[1] = XSAVE_SSE_OFFSET;
>>> +
>>> +    xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
>>> +
>>> +    for (i = 2; i < xstate_features; i++)
>> This loop will run off the end of xstate_comp_sizes[] for any processor
>> supporting AVX512 or greater.
>>
> For the length of xsate_comp_sizes is 64, I think the case you mentioned
> above will not happen.

xstate_features is a bitmap.  The comparison "i < xstate_features" is
bogus, and loops many more times than you intend.
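A sketch of a bounded loop over the component bits (names follow patch 5 and
xstate.c; the 63-component limit matches CPUID leaf 0xD):

    /* Sketch only: iterate over the component indices explicitly instead of
     * comparing the loop index against the feature bitmap.  In the compacted
     * format, offsets only advance past components enabled in xfeature_mask. */
    unsigned int i;

    xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
    for ( i = 3; i < 63; i++ )
        xstate_comp_offsets[i] = xstate_comp_offsets[i - 1] +
            ((xfeature_mask & (1ULL << (i - 1))) ? xstate_comp_sizes[i - 1]
                                                 : 0);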

~Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
       [not found]         ` <20150811075909.GB14406@shuai.ruan@linux.intel.com>
@ 2015-08-11  9:37           ` Andrew Cooper
  2015-08-12 11:17             ` Shuai Ruan
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2015-08-11  9:37 UTC (permalink / raw)
  To: Shuai Ruan
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	eddie.dong, ian.jackson, xen-devel, jbeulich, jun.nakajima, keir

On 11/08/15 08:59, Shuai Ruan wrote:
> On Fri, Aug 07, 2015 at 02:04:51PM +0100, Andrew Cooper wrote:
>> On 07/08/15 09:22, Shuai Ruan wrote:
>>>>>  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>>>                                     unsigned int *ecx, unsigned int *edx)
>>>>>  {
>>>>> @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
>>>>>                      *ebx = _eax + _ebx;
>>>>>              }
>>>>>          }
>>>>> +        if ( count == 1 )
>>>>> +        {
>>>>> +            if ( cpu_has_xsaves )
>>>>> +            {
>>>>> +                *ebx = XSTATE_AREA_MIN_SIZE;
>>>>> +                if ( v->arch.xcr0 | v->arch.msr_ia32_xss )
>>>>> +                    for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
>>>>> +                    {
>>>>> +                        if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss)
>>>>> +			   & (1ULL << sub_leaf)) )
>>>>> +                            continue;
>>>>> +                        domain_cpuid(d, input, sub_leaf, &_eax, &_ebx, &_ecx,
>>>>> +                                     &_edx);
>>>>> +                        *ebx =  *ebx + _eax;
>>>>> +                    }
>>>>> +            }
>>>>> +            else
>>>>> +            {
>>>>> +                *eax &= ~XSAVES;
>>>>> +                *ebx = *ecx = *edx = 0;
>>>>> +            }
>>>>> +            if ( !cpu_has_xgetbv1 )
>>>>> +                *eax &= ~XGETBV1;
>>>>> +            if ( !cpu_has_xsavec )
>>>>> +                *eax &= ~XSAVEC;
>>>>> +            if ( !cpu_has_xsaveopt )
>>>>> +                *eax &= ~XSAVEOPT;
>>>>> +        }
>>>> Urgh - I really need to get domain cpuid fixed in Xen.  This is
>>>> currently making a very bad situation a little worse.
>>>>
>>> In patch 4, I expose the xsaves/xsavec/xsaveopt and need to check
>>> whether the hardware supoort it. What's your suggestion about this?
>> Calling into domain_cpuid() in the loop is not useful as nothing will
>> set the subleaves up.  As a first pass, reading from
>> xstate_{offsets,sizes} will be better than nothing, as it will at least
> What do you mean by xstate_{offsets,sizes}?

Shorthand for xstate_offsets and xstate_sizes, per standard shell brace expansion.

>> match reality until the domain is migrated.
>>
> For CPUID(eax=0dh) with subleaf 1, the value of ebx will change
> according to the v->arch.xcr0 | v->arch.msr_ia32_xss. So add
> code in hvm_cpuid function is the best way I can think of. Your
> suggestions :)?

Which is liable to change on different hardware.  Once a vm has
migrated, Xen may not legitimately execute another cpuid instruction as
part of emulating guest cpuid, as it is not necessarily accurate.

Xen does not currently have proper cpuid encapsulation, which
causes host-specific details to leak into guests across migrate.  I have
a longterm plan to fix it, but it is not simple or quick to do.

In this case, reading from xstate_{offsets,sizes} is better than
nothing, but will need fixing in the longterm.

~Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
       [not found]         ` <20150811075039.GA14406@shuai.ruan@linux.intel.com>
@ 2015-08-11 10:24           ` Andrew Cooper
  2015-08-12  3:01             ` Shuai Ruan
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2015-08-11 10:24 UTC (permalink / raw)
  To: Shuai Ruan
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	eddie.dong, ian.jackson, xen-devel, jbeulich, jun.nakajima, keir

On 11/08/15 08:50, Shuai Ruan wrote:
> On Fri, Aug 07, 2015 at 01:44:41PM +0100, Andrew Cooper wrote:
>> On 07/08/15 09:00, Shuai Ruan wrote:
>>>>> +                    goto skip;
>>>>> +                }
>>>>> +
>>>>> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
>>>> What does edi have to do with xsaves?  only edx:eax are special
>>>> according to the manual.
>>>>
>>> regs->edi is the guest_linear_address
>> Whyso?  xsaves takes an unconditional memory parameter,  not a pointer
>> in %rdi.  (regs->edi is only correct for ins/outs because the pointer is
>> architecturally required to be in %rdi.)
> You are right. The linear_address should be decoded from the instruction.
>> There is nothing currently in emulate_privileged_op() which does ModRM
>> decoding for memory references, nor SIB decoding.  xsaves/xrstors would
>> be the first such operations.
>>
>> I am also not sure that adding arbitrary memory decode here is sensible.
>>
>> In an ideal world, we would have what is currently x86_emulate() split
>> in 3 stages.
>>
>> Stage 1 does straight instruction decode to some internal representation.
>>
>> Stage 2 does an audit to see whether the decoded instruction is
>> plausible for the reason why an emulation was needed.  We have had a
>> number of security issues with emulation in the past where guests cause
>> one instruction to trap for emulation, then rewrite the instruction to
>> be something else, and exploit a bug in the emulator.
>>
>> Stage 3 performs the actions required for emulation.
>>
>> Currently, x86_emulate() is limited to instructions which might
>> legitimately fault for emulation, but with the advent of VM
>> introspection, this is proving to be insufficient.  With my x86
>> maintainers hat on, I would like to avoid the current situation we have
>> with multiple bits of code doing x86 instruction decode and emulation
>> (which are all different).
>>
>> I think the 3-step approach above caters suitably to all usecases, but
>> it is a large project itself.  It allows the introspection people to
>> have a full and complete x86 emulation infrastructure, while also
>> preventing areas like the shadow paging from being opened up to
>> potential vulnerabilities in unrelated areas of the x86 architecture.
>>
>> I would even go so far as to say that it is probably ok not to support
>> xsaves/xrestors in PV guests until something along the above lines is
>> sorted.  The first feature in XSS is processor trace which a PV guest
>> couldn't use anyway.  I suspect the same applies to most of the other
> Why PV guest couldn't use precessor trace?

After more consideration, Xen should not expose xsaves/xrstors to PV
guests at all.

>> XSS features, or they wouldn't need to be privileged in the first place.
>>
> Thanks for your such detail suggestions.
> For xsaves/xrstors would also bring other benefits for PV guest such as
> saving memory of XSAVE area. If we do not support xsaves/xrstors in PV , 
> PV guest would lose these benefits. What's your opinions toward this?

PV guests running under Xen are exactly the same as regular user
processes running under Linux.

There is a reason everything covered by xsaves/xrstors is restricted to
ring0; it would be a security hole to allow guests to configure the
features themselves.

Features such as Processor Trace would need a hypercall interface for
guests to use.

~Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
  2015-08-11 10:24           ` Andrew Cooper
@ 2015-08-12  3:01             ` Shuai Ruan
  0 siblings, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-12  3:01 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, eddie.dong, ian.jackson, xen-devel, jbeulich, keir

On Tue, Aug 11, 2015 at 11:24:24AM +0100, Andrew Cooper wrote:
> On 11/08/15 08:50, Shuai Ruan wrote:
> > On Fri, Aug 07, 2015 at 01:44:41PM +0100, Andrew Cooper wrote:
> >> On 07/08/15 09:00, Shuai Ruan wrote:
> >>>>> +                    goto skip;
> >>>>> +                }
> >>>>> +
> >>>>> +                if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
> >>>> What does edi have to do with xsaves?  only edx:eax are special
> >>>> according to the manual.
> >>>>
> >>> regs->edi is the guest_linear_address
> >> Whyso?  xsaves takes an unconditional memory parameter,  not a pointer
> >> in %rdi.  (regs->edi is only correct for ins/outs because the pointer is
> >> architecturally required to be in %rdi.)
> > You are right. The linear_address should be decoded from the instruction.
> >> There is nothing currently in emulate_privileged_op() which does ModRM
> >> decoding for memory references, nor SIB decoding.  xsaves/xrstors would
> >> be the first such operations.
> >>
> >> I am also not sure that adding arbitrary memory decode here is sensible.
> >>
> >> In an ideal world, we would have what is currently x86_emulate() split
> >> in 3 stages.
> >>
> >> Stage 1 does straight instruction decode to some internal representation.
> >>
> >> Stage 2 does an audit to see whether the decoded instruction is
> >> plausible for the reason why an emulation was needed.  We have had a
> >> number of security issues with emulation in the past where guests cause
> >> one instruction to trap for emulation, then rewrite the instruction to
> >> be something else, and exploit a bug in the emulator.
> >>
> >> Stage 3 performs the actions required for emulation.
> >>
> >> Currently, x86_emulate() is limited to instructions which might
> >> legitimately fault for emulation, but with the advent of VM
> >> introspection, this is proving to be insufficient.  With my x86
> >> maintainers hat on, I would like to avoid the current situation we have
> >> with multiple bits of code doing x86 instruction decode and emulation
> >> (which are all different).
> >>
> >> I think the 3-step approach above caters suitably to all usecases, but
> >> it is a large project itself.  It allows the introspection people to
> >> have a full and complete x86 emulation infrastructure, while also
> >> preventing areas like the shadow paging from being opened up to
> >> potential vulnerabilities in unrelated areas of the x86 architecture.
> >>
> >> I would even go so far as to say that it is probably ok not to support
> >> xsaves/xrestors in PV guests until something along the above lines is
> >> sorted.  The first feature in XSS is processor trace which a PV guest
> >> couldn't use anyway.  I suspect the same applies to most of the other
> > Why PV guest couldn't use precessor trace?
> 
> After more consideration, Xen should not expose xsaves/xrstors to PV
> guests at all.
> 
OK. In the next version, I will follow your suggestion and not add support for PV guests.
> >> XSS features, or they wouldn't need to be privileged in the first place.
> >>
> > Thanks for your such detail suggestions.
> > For xsaves/xrstors would also bring other benefits for PV guest such as
> > saving memory of XSAVE area. If we do not support xsaves/xrstors in PV , 
> > PV guest would lose these benefits. What's your opinions toward this?
> 
> PV guests running under Xen are exactly the same as regular user
> processes running under Linux.
> 
> There is a reason everything covered by xsaves/xrstors is restricted to
> ring0; it would be a security hole to allow guests to configure the
> features themselves.
> 
> Features such as Processor Trace would need a hypercall interface for
> guests to use.
> 
Ok. Thanks. 
> ~Andrew
Thanks for your review, Andrew.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest
  2015-08-11  9:37           ` Andrew Cooper
@ 2015-08-12 11:17             ` Shuai Ruan
  0 siblings, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-12 11:17 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, eddie.dong, ian.jackson, xen-devel, jbeulich, keir

On Tue, Aug 11, 2015 at 10:37:56AM +0100, Andrew Cooper wrote:
> On 11/08/15 08:59, Shuai Ruan wrote:
> > On Fri, Aug 07, 2015 at 02:04:51PM +0100, Andrew Cooper wrote:
> >> On 07/08/15 09:22, Shuai Ruan wrote:
> >>>>>  void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
> >>>>>                                     unsigned int *ecx, unsigned int *edx)
> >>>>>  {
> >>>>> @@ -4456,6 +4460,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
> >>>>>                      *ebx = _eax + _ebx;
> >>>>>              }
> >>>>>          }
> >>>>> +        if ( count == 1 )
> >>>>> +        {
> >>>>> +            if ( cpu_has_xsaves )
> >>>>> +            {
> >>>>> +                *ebx = XSTATE_AREA_MIN_SIZE;
> >>>>> +                if ( v->arch.xcr0 | v->arch.msr_ia32_xss )
> >>>>> +                    for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
> >>>>> +                    {
> >>>>> +                        if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss)
> >>>>> +			   & (1ULL << sub_leaf)) )
> >>>>> +                            continue;
> >>>>> +                        domain_cpuid(d, input, sub_leaf, &_eax, &_ebx, &_ecx,
> >>>>> +                                     &_edx);
> >>>>> +                        *ebx =  *ebx + _eax;
> >>>>> +                    }
> >>>>> +            }
> >>>>> +            else
> >>>>> +            {
> >>>>> +                *eax &= ~XSAVES;
> >>>>> +                *ebx = *ecx = *edx = 0;
> >>>>> +            }
> >>>>> +            if ( !cpu_has_xgetbv1 )
> >>>>> +                *eax &= ~XGETBV1;
> >>>>> +            if ( !cpu_has_xsavec )
> >>>>> +                *eax &= ~XSAVEC;
> >>>>> +            if ( !cpu_has_xsaveopt )
> >>>>> +                *eax &= ~XSAVEOPT;
> >>>>> +        }
> >>>> Urgh - I really need to get domain cpuid fixed in Xen.  This is
> >>>> currently making a very bad situation a little worse.
> >>>>
> >>> In patch 4, I expose the xsaves/xsavec/xsaveopt and need to check
> >>> whether the hardware supoort it. What's your suggestion about this?
> >> Calling into domain_cpuid() in the loop is not useful as nothing will
> >> set the subleaves up.  As a first pass, reading from
> >> xstate_{offsets,sizes} will be better than nothing, as it will at least
> > What do you mean by xstate_{offsets,sizes}?
> 
> Shorthand for xstate_offsets xstate_sizes, per the standard shell expansion.
> 
You mean the xstate_offsets and xstate_sizes I introduced in patch 5.
> >> match reality until the domain is migrated.
> >>
> > For CPUID(eax=0dh) with subleaf 1, the value of ebx will change
> > according to the v->arch.xcr0 | v->arch.msr_ia32_xss. So add
> > code in hvm_cpuid function is the best way I can think of. Your
> > suggestions :)?
> 
> Which is liable to change on different hardware.  Once a vm has
> migrated, Xen may not legitimately execute another cpuid instruction as
> part of emulating guest cpuid, as it is not necessarily accurate.
> 
> Xen currently does not currently have proper cpuid encapsulation, which
> causes host-specific details to leak into guests across migrate.  I have
> a longterm plan to fix it, but it is not simple or quick to do.
> 
> In this case, reading from xstate_{offsets,sizes} is better than
> nothing, but will need fixing in the longterm.
> 
OK, I will fix this in the next version.
> ~Andrew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore
  2015-08-11  9:27       ` Andrew Cooper
@ 2015-08-12 11:23         ` Shuai Ruan
  0 siblings, 0 replies; 27+ messages in thread
From: Shuai Ruan @ 2015-08-12 11:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel List

On Tue, Aug 11, 2015 at 10:27:20AM +0100, Andrew Cooper wrote:
> On 11/08/15 09:01, Shuai Ruan wrote:
> >
> >>> +
> >>> +    /*
> >>> +     * The FP xstates and SSE xstates are legacy states. They are always
> >>> +     * in the fixed offsets in the xsave area in either compacted form
> >>> +     * or standard form.
> >>> +     */
> >>> +    xstate_comp_offsets[0] = 0;
> >>> +    xstate_comp_offsets[1] = XSAVE_SSE_OFFSET;
> >>> +
> >>> +    xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
> >>> +
> >>> +    for (i = 2; i < xstate_features; i++)
> >> This loop will run off the end of xstate_comp_sizes[] for any processor
> >> supporting AVX512 or greater.
> >>
> > For the length of xsate_comp_sizes is 64, I think the case you mentioned
> > above will not happen.
> 
> xstate_features is a bitmap.  The comparison "i < xstate_features" is
> bogus, and loops many more times than you intend.
> 
Ok. Thanks for your review, Andrew
> ~Andrew
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2015-08-12 11:23 UTC | newest]

Thread overview: 27+ messages
2015-08-05  1:57 [PATCH V3 0/6] add xsaves/xrstors support Shuai Ruan
2015-08-05  1:57 ` [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest Shuai Ruan
2015-08-05 17:51   ` Andrew Cooper
2015-08-07  8:00     ` Shuai Ruan
     [not found]     ` <20150807080008.GA2976@shuai.ruan@linux.intel.com>
2015-08-07 12:44       ` Andrew Cooper
2015-08-11  7:50         ` Shuai Ruan
     [not found]         ` <20150811075039.GA14406@shuai.ruan@linux.intel.com>
2015-08-11 10:24           ` Andrew Cooper
2015-08-12  3:01             ` Shuai Ruan
2015-08-05  1:57 ` [PATCH V3 2/6] x86/xsaves: enable xsaves/xrstors in xen Shuai Ruan
2015-08-05 17:57   ` Andrew Cooper
2015-08-05  1:57 ` [PATCH V3 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest Shuai Ruan
2015-08-05 18:17   ` Andrew Cooper
2015-08-07  8:22     ` Shuai Ruan
     [not found]     ` <20150807082244.GB2976@shuai.ruan@linux.intel.com>
2015-08-07 13:04       ` Andrew Cooper
2015-08-11  7:59         ` Shuai Ruan
     [not found]         ` <20150811075909.GB14406@shuai.ruan@linux.intel.com>
2015-08-11  9:37           ` Andrew Cooper
2015-08-12 11:17             ` Shuai Ruan
2015-08-05  1:57 ` [PATCH V3 4/6] libxc: expose xsaves/xgetbv1/xsavec to " Shuai Ruan
2015-08-05  8:37   ` Ian Campbell
2015-08-07  8:23     ` Shuai Ruan
2015-08-05  1:57 ` [PATCH V3 5/6] x86/xsaves: support compact format for hvm save/restore Shuai Ruan
2015-08-05 18:45   ` Andrew Cooper
     [not found]     ` <20150811080143.GC14406@shuai.ruan@linux.intel.com>
2015-08-11  9:27       ` Andrew Cooper
2015-08-12 11:23         ` Shuai Ruan
2015-08-05  1:57 ` [PATCH V3 6/6] x86/xsaves: detect xsaves/xgetbv1 in xen Shuai Ruan
2015-08-05 16:38 ` [PATCH V3 0/6] add xsaves/xrstors support Andrew Cooper
2015-08-07  8:25   ` Shuai Ruan
