* [V11 0/3] add xsaves/xrstors support
@ 2015-11-20  1:17 Shuai Ruan
  2015-11-20  1:18 ` [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen Shuai Ruan
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Shuai Ruan @ 2015-11-20  1:17 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, jbeulich, keir

From: Shuai Ruan <shuai.ruan@intel.com>

Changes in v11:
* Address comments from Jan:
* Using alternative asm on the xrstor side.
  For the xsave side (alternative asm), I will send a separate patch to do this.

Changes in v10:
* Address comments from Jan:
* Add two new macros to handle alternative asm with fixup code in patch1.
* Fix a coding style error in patch2.

Changes in v9:
* Address comments from Jan:
* Add MSR_IA32_XSS save/restore support in patch2.
* Change xrstor to alternative asm in patch1.
* Use memcpy() to copy the save header to avoid one ugly cast in patch1.
* Fix coding style errors.

Changes in v8:
* Address comments from Jan/Andrew:
* Add optimisation in set_msr_xss in patch1.
* Change from blacklisting features to whitelisting in pv_cpuid in patch2.
* Add MSR_IA32_XSS save/restore support in patch2.
* Fix some code errors and coding style errors.

Changes in v7:
* Address comments from Jan/Andrew:
* Move macro XSTATE_FIXUP into patch3.
* Change lazy write to xss_msr in patch1.
* Drop inner set of brackets and stray cast in patch2.
* Do not move the condition and the memcpy() into the newly called functions in patch2.
* Rebase patch4 on Andrew's "tools/libxc: Improve efficiency of xc_cpuid_apply_policy()".
* As no XSS features are currently supported, for leaf 2...63 ecx remains zero at
  the moment in patch4.

Changes in v6:
* Address comments from Jan.
* Rebase the code on Andrew's patch "xen/x86: Record xsave features in c->x86_capabilities".
* Re-split the patch to avoid build errors. Move some function definitions from patch1 to patch2.
* Change the function names load/save_xsave_states to compress/expand_xsave_states in patch2.
* Add macro to handle redundancy in xrstor.
* Fix some code errors and coding style errors.
* Add some descriptions in commit message.

Changes in v5:
* Address comments from Andrew/Jan, mainly:
* Add lazy writes of the xss_msr.
* Add xsave_area check when saving/restoring the guest.
* Add xsavec support.
* Use word 2 in xen/include/asm-x86/cpufeature.h.
* Fix some code errors.

Changes in v4:
* Address comments from Andrew, mainly:
* No xsaves support for pv guests.
* Use xstate_sizes instead of domain_cpuid in hvm_cpuid in patch 3.
* Add support for format translation when performing pv guest migration in patch 2.
* Fix some code errors.

Changes in v3:
* Address comments from Jan/Konrad
* Change the bits of eax/ebx/ecx/edx passed to guest in patch 4.
* Add code to verify whether host supports xsaves or not in patch 1.

Changes in v2:
* Address comments from Andrew/chao/Jan, mainly:
* Add detailed information on the xsaves/xrstors feature.
* Test migration between an xsaves-supporting machine and a non-xsaves-supporting machine.
* Remove XGETBV1/XSAVEC/XSAVEOPT out of 'else' in patch 3.
* Change macro name XGETBV to XGETBV1 in patch 4.

This patchset enables the xsaves/xrstors feature, which will be available on
Intel Skylake and later platforms. Like xsave/xrstor, the xsaves/xrstors
feature saves and loads processor state from a region of memory called the
XSAVE area. Unlike xsave/xrstor, xsaves/xrstors:

a) use only the compacted format for the extended region
   of the XSAVE area, which saves memory (see the illustrative
   sketch below);
b) can operate on supervisor state components, so the feature
   is a prerequisite for supporting new supervisor state components;
c) execute only if CPL=0.

Detailed hardware specification can be found in chapter 13 (sections 13.11
and 13.12) of the Intel SDM [1].


[1] Intel SDM (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf)
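
For illustration only, here is a minimal C sketch (not part of the patches)
of how the compacted extended region is laid out: enabled components are
packed back-to-back right after the legacy FP/SSE area and the XSAVE header,
instead of sitting at their fixed standard-format offsets (the optional
64-byte alignment some components can request is ignored here). This mirrors
the idea behind setup_xstate_comp() in patch 1; the component sizes and the
enabled-feature mask below are made-up values, whereas the real code derives
them from CPUID leaf 0xD.

    #include <stdint.h>
    #include <stdio.h>

    #define FXSAVE_SIZE    512   /* legacy FP/SSE region */
    #define XSAVE_HDR_SIZE  64   /* XSAVE header */

    /*
     * Compacted format: component idx lives right after all lower-numbered
     * components that are enabled in xcomp_bv; disabled components take no
     * space.  Components 0/1 (FP/SSE) always stay in the legacy area.
     */
    static unsigned int compacted_offset(const unsigned int *sizes,
                                         uint64_t xcomp_bv, unsigned int idx)
    {
        unsigned int off = FXSAVE_SIZE + XSAVE_HDR_SIZE;
        unsigned int i;

        for ( i = 2; i < idx; i++ )
            if ( xcomp_bv & (1ULL << i) )
                off += sizes[i];

        return off;
    }

    int main(void)
    {
        /* Hypothetical per-component sizes (CPUID.0xD[i].EAX in reality). */
        const unsigned int sizes[] = { 160, 256, 256, 64, 64, 64, 512, 1024 };
        uint64_t xcomp_bv = (1ULL << 2) | (1ULL << 7);

        printf("component 7 sits at offset %u in the compacted layout\n",
               compacted_offset(sizes, xcomp_bv, 7));
        return 0;
    }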

Shuai Ruan (3):
  x86/xsaves: enable xsaves/xrstors/xsavec in xen
  x86/xsaves: enable xsaves/xrstors for hvm guest
  libxc: expose xsaves/xgetbv1/xsavec to hvm guest

 tools/libxc/xc_cpuid_x86.c         |  10 +-
 xen/arch/x86/domain.c              |   8 ++
 xen/arch/x86/domctl.c              |  30 ++++-
 xen/arch/x86/hvm/hvm.c             |  45 ++++++-
 xen/arch/x86/hvm/vmx/vmcs.c        |   5 +-
 xen/arch/x86/hvm/vmx/vmx.c         |  35 ++++-
 xen/arch/x86/i387.c                |   4 +
 xen/arch/x86/traps.c               |   6 +-
 xen/arch/x86/xstate.c              | 265 ++++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/alternative.h  |   3 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   4 +
 xen/include/asm-x86/hvm/vmx/vmx.h  |   2 +
 xen/include/asm-x86/xstate.h       |   3 +
 13 files changed, 372 insertions(+), 48 deletions(-)

-- 
1.9.1


* [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen
  2015-11-20  1:17 [V11 0/3] add xsaves/xrstors support Shuai Ruan
@ 2015-11-20  1:18 ` Shuai Ruan
  2015-11-20 11:35   ` Jan Beulich
  2015-11-20  1:18 ` [V11 2/3] x86/xsaves: enable xsaves/xrstors for hvm guest Shuai Ruan
  2015-11-20  1:18 ` [V11 3/3] libxc: expose xsaves/xgetbv1/xsavec to " Shuai Ruan
  2 siblings, 1 reply; 9+ messages in thread
From: Shuai Ruan @ 2015-11-20  1:18 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, jbeulich, keir

This patch uses xsaves/xrstors/xsavec instead of xsaveopt/xrstor
to perform the xsave_area switching so that xen itself
can benefit from them when available.

xsaves/xrstors/xsavec use the compacted format only, so format
conversion support is added for guest OS migration. Also, pv guests
will not support xsaves/xrstors.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
---
 xen/arch/x86/domain.c             |   8 ++
 xen/arch/x86/domctl.c             |  30 ++++-
 xen/arch/x86/hvm/hvm.c            |  18 ++-
 xen/arch/x86/i387.c               |   4 +
 xen/arch/x86/traps.c              |   6 +-
 xen/arch/x86/xstate.c             | 265 +++++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/alternative.h |   3 +
 xen/include/asm-x86/xstate.h      |   2 +
 8 files changed, 292 insertions(+), 44 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 3f9e5d2..9476869 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -910,7 +910,12 @@ int arch_set_info_guest(
     {
         memcpy(v->arch.fpu_ctxt, &c.nat->fpu_ctxt, sizeof(c.nat->fpu_ctxt));
         if ( v->arch.xsave_area )
+        {
              v->arch.xsave_area->xsave_hdr.xstate_bv = XSTATE_FP_SSE;
+             if ( cpu_has_xsaves || cpu_has_xsavec )
+                  v->arch.xsave_area->xsave_hdr.xcomp_bv = XSTATE_FP_SSE |
+                                                           XSTATE_COMPACTION_ENABLED;
+        }
     }
 
     if ( !compat )
@@ -1594,6 +1599,9 @@ static void __context_switch(void)
 
             if ( xcr0 != get_xcr0() && !set_xcr0(xcr0) )
                 BUG();
+
+            if ( cpu_has_xsaves && has_hvm_container_vcpu(n) )
+                set_msr_xss(n->arch.hvm_vcpu.msr_xss);
         }
         vcpu_restore_fpu_eager(n);
         n->arch.ctxt_switch_to(n);
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 0f6fdb9..95b0747 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -897,9 +897,29 @@ long arch_do_domctl(
                 ret = -EFAULT;
 
             offset += sizeof(v->arch.xcr0_accum);
-            if ( !ret && copy_to_guest_offset(evc->buffer, offset,
-                                              (void *)v->arch.xsave_area,
-                                              size - 2 * sizeof(uint64_t)) )
+            if ( !ret && (cpu_has_xsaves || cpu_has_xsavec) )
+            {
+                void *xsave_area;
+
+                xsave_area = xmalloc_bytes(size);
+                if ( !xsave_area )
+                {
+                    ret = -ENOMEM;
+                    vcpu_unpause(v);
+                    goto vcpuextstate_out;
+                }
+
+                expand_xsave_states(v, xsave_area,
+                                    size - 2 * sizeof(uint64_t));
+
+                if ( copy_to_guest_offset(evc->buffer, offset, xsave_area,
+                                          size - 2 * sizeof(uint64_t)) )
+                     ret = -EFAULT;
+                xfree(xsave_area);
+           }
+           else if ( !ret && copy_to_guest_offset(evc->buffer, offset,
+                                                  (void *)v->arch.xsave_area,
+                                                  size - 2 * sizeof(uint64_t)) )
                 ret = -EFAULT;
 
             vcpu_unpause(v);
@@ -955,8 +975,8 @@ long arch_do_domctl(
                 v->arch.xcr0_accum = _xcr0_accum;
                 if ( _xcr0_accum & XSTATE_NONLAZY )
                     v->arch.nonlazy_xstate_used = 1;
-                memcpy(v->arch.xsave_area, _xsave_area,
-                       evc->size - 2 * sizeof(uint64_t));
+                compress_xsave_states(v, _xsave_area,
+                                      evc->size - 2 * sizeof(uint64_t));
                 vcpu_unpause(v);
             }
             else
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ea982e2..20148f5 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2092,6 +2092,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
 
         memcpy(v->arch.xsave_area, ctxt.fpu_regs, sizeof(ctxt.fpu_regs));
         xsave_area->xsave_hdr.xstate_bv = XSTATE_FP_SSE;
+        if ( cpu_has_xsaves || cpu_has_xsavec )
+            xsave_area->xsave_hdr.xcomp_bv = XSTATE_FP_SSE |
+                                             XSTATE_COMPACTION_ENABLED;
     }
     else
         memcpy(v->arch.fpu_ctxt, ctxt.fpu_regs, sizeof(ctxt.fpu_regs));
@@ -2161,8 +2164,8 @@ static int hvm_save_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
         ctxt->xfeature_mask = xfeature_mask;
         ctxt->xcr0 = v->arch.xcr0;
         ctxt->xcr0_accum = v->arch.xcr0_accum;
-        memcpy(&ctxt->save_area, v->arch.xsave_area,
-               size - offsetof(struct hvm_hw_cpu_xsave, save_area));
+        expand_xsave_states(v, &ctxt->save_area,
+                            size - offsetof(typeof(*ctxt), save_area));
     }
 
     return 0;
@@ -2261,9 +2264,9 @@ static int hvm_load_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
     v->arch.xcr0_accum = ctxt->xcr0_accum;
     if ( ctxt->xcr0_accum & XSTATE_NONLAZY )
         v->arch.nonlazy_xstate_used = 1;
-    memcpy(v->arch.xsave_area, &ctxt->save_area,
-           min(desc->length, size) - offsetof(struct hvm_hw_cpu_xsave,
-           save_area));
+    compress_xsave_states(v, &ctxt->save_area,
+                          min(desc->length, size) -
+                          offsetof(struct hvm_hw_cpu_xsave,save_area));
 
     return 0;
 }
@@ -5503,7 +5506,12 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip)
     fpu_ctxt->fcw = FCW_RESET;
     fpu_ctxt->mxcsr = MXCSR_DEFAULT;
     if ( v->arch.xsave_area )
+    {
         v->arch.xsave_area->xsave_hdr.xstate_bv = XSTATE_FP;
+        if ( cpu_has_xsaves || cpu_has_xsavec )
+            v->arch.xsave_area->xsave_hdr.xcomp_bv = XSTATE_FP |
+                                                     XSTATE_COMPACTION_ENABLED;
+    }
 
     v->arch.vgc_flags = VGCF_online;
     memset(&v->arch.user_regs, 0, sizeof(v->arch.user_regs));
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index 66b51cb..b661d39 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -282,7 +282,11 @@ int vcpu_init_fpu(struct vcpu *v)
         return rc;
 
     if ( v->arch.xsave_area )
+    {
         v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
+        if ( cpu_has_xsaves || cpu_has_xsavec )
+            v->arch.xsave_area->xsave_hdr.xcomp_bv = XSTATE_COMPACTION_ENABLED;
+    }
     else
     {
         v->arch.fpu_ctxt = _xzalloc(sizeof(v->arch.xsave_area->fpu_sse), 16);
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index e21fb78..075f98e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -939,9 +939,9 @@ void pv_cpuid(struct cpu_user_regs *regs)
             goto unsupported;
         if ( regs->_ecx == 1 )
         {
-            a &= boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_XSAVEOPT)];
-            if ( !cpu_has_xsaves )
-                b = c = d = 0;
+            a &= (boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_XSAVEOPT)] &
+                  ~cpufeat_mask(X86_FEATURE_XSAVES));
+            b = c = d = 0;
         }
         break;
 
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 827e0e1..f3ae285 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -24,6 +24,11 @@ static u32 __read_mostly xsave_cntxt_size;
 /* A 64-bit bitmask of the XSAVE/XRSTOR features supported by processor. */
 u64 __read_mostly xfeature_mask;
 
+static unsigned int * __read_mostly xstate_offsets;
+static unsigned int * __read_mostly xstate_sizes;
+static unsigned int __read_mostly xstate_features;
+static unsigned int __read_mostly xstate_comp_offsets[sizeof(xfeature_mask)*8];
+
 /* Cached xcr0 for fast read */
 static DEFINE_PER_CPU(uint64_t, xcr0);
 
@@ -79,6 +84,161 @@ uint64_t get_msr_xss(void)
     return this_cpu(xss);
 }
 
+static bool_t xsave_area_compressed(const struct xsave_struct *xsave_area)
+{
+     return xsave_area && (xsave_area->xsave_hdr.xcomp_bv
+                           & XSTATE_COMPACTION_ENABLED);
+}
+
+static int setup_xstate_features(bool_t bsp)
+{
+    unsigned int leaf, tmp, eax, ebx;
+
+    if ( bsp )
+    {
+        xstate_features = fls(xfeature_mask);
+        xstate_offsets = xzalloc_array(unsigned int, xstate_features);
+        if ( !xstate_offsets )
+            return -ENOMEM;
+
+        xstate_sizes = xzalloc_array(unsigned int, xstate_features);
+        if ( !xstate_sizes )
+            return -ENOMEM;
+    }
+
+    for ( leaf = 2; leaf < xstate_features; leaf++ )
+    {
+        if ( bsp )
+            cpuid_count(XSTATE_CPUID, leaf, &xstate_sizes[leaf],
+                        &xstate_offsets[leaf], &tmp, &tmp);
+        else
+        {
+            cpuid_count(XSTATE_CPUID, leaf, &eax,
+                        &ebx, &tmp, &tmp);
+            BUG_ON(eax != xstate_sizes[leaf]);
+            BUG_ON(ebx != xstate_offsets[leaf]);
+        }
+    }
+
+    return 0;
+}
+
+static void __init setup_xstate_comp(void)
+{
+    unsigned int i;
+
+    /*
+     * The FP xstates and SSE xstates are legacy states. They are always
+     * in the fixed offsets in the xsave area in either compacted form
+     * or standard form.
+     */
+    xstate_comp_offsets[0] = 0;
+    xstate_comp_offsets[1] = XSAVE_SSE_OFFSET;
+
+    xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+
+    for ( i = 3; i < xstate_features; i++ )
+    {
+        xstate_comp_offsets[i] = xstate_comp_offsets[i - 1] +
+                                 (((1ul << i) & xfeature_mask)
+                                  ? xstate_sizes[i - 1] : 0);
+        ASSERT(xstate_comp_offsets[i] + xstate_sizes[i] <= xsave_cntxt_size);
+    }
+}
+
+static void *get_xsave_addr(void *xsave, unsigned int xfeature_idx)
+{
+    if ( !((1ul << xfeature_idx) & xfeature_mask) )
+        return NULL;
+
+    return xsave + xstate_comp_offsets[xfeature_idx];
+}
+
+void expand_xsave_states(struct vcpu *v, void *dest, unsigned int size)
+{
+    struct xsave_struct *xsave = v->arch.xsave_area;
+    u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
+    u64 valid;
+
+    if ( !cpu_has_xsaves && !cpu_has_xsavec )
+    {
+        memcpy(dest, xsave, size);
+        return;
+    }
+
+    ASSERT(xsave_area_compressed(xsave));
+    /*
+     * Copy legacy XSAVE area and XSAVE hdr area.
+     */
+    memcpy(dest, xsave, XSTATE_AREA_MIN_SIZE);
+
+    ((struct xsave_struct *)dest)->xsave_hdr.xcomp_bv =  0;
+
+    /*
+     * Copy each region from the possibly compacted offset to the
+     * non-compacted offset.
+     */
+    valid = xstate_bv & ~XSTATE_FP_SSE;
+    while ( valid )
+    {
+        u64 feature = valid & -valid;
+        unsigned int index = fls(feature) - 1;
+        const void *src = get_xsave_addr(xsave, index);
+
+        if ( src )
+        {
+            ASSERT((xstate_offsets[index] + xstate_sizes[index]) <= size);
+            memcpy(dest + xstate_offsets[index], src, xstate_sizes[index]);
+        }
+
+        valid &= ~feature;
+    }
+}
+
+void compress_xsave_states(struct vcpu *v, const void *src, unsigned int size)
+{
+    struct xsave_struct *xsave = v->arch.xsave_area;
+    u64 xstate_bv = ((const struct xsave_struct *)src)->xsave_hdr.xstate_bv;
+    u64 valid;
+
+    if ( !cpu_has_xsaves && !cpu_has_xsavec )
+    {
+        memcpy(xsave, src, size);
+        return;
+    }
+
+    ASSERT(!xsave_area_compressed(src));
+    /*
+     * Copy legacy XSAVE area, to avoid complications with CPUID
+     * leaves 0 and 1 in the loop below.
+     */
+    memcpy(xsave, src, FXSAVE_SIZE);
+
+    /* Set XSTATE_BV and XCOMP_BV.  */
+    xsave->xsave_hdr.xstate_bv = xstate_bv;
+    xsave->xsave_hdr.xcomp_bv = v->arch.xcr0_accum | XSTATE_COMPACTION_ENABLED;
+
+    /*
+     * Copy each region from the non-compacted offset to the
+     * possibly compacted offset.
+     */
+    valid = xstate_bv & ~XSTATE_FP_SSE;
+    while ( valid )
+    {
+        u64 feature = valid & -valid;
+        unsigned int index = fls(feature) - 1;
+        void *dest = get_xsave_addr(xsave, index);
+
+        if ( dest )
+        {
+            ASSERT((xstate_offsets[index] + xstate_sizes[index]) <= size);
+            memcpy(dest, src + xstate_offsets[index], xstate_sizes[index]);
+        }
+
+        valid &= ~feature;
+    }
+}
+
 void xsave(struct vcpu *v, uint64_t mask)
 {
     struct xsave_struct *ptr = v->arch.xsave_area;
@@ -91,7 +251,15 @@ void xsave(struct vcpu *v, uint64_t mask)
         typeof(ptr->fpu_sse.fip.sel) fcs = ptr->fpu_sse.fip.sel;
         typeof(ptr->fpu_sse.fdp.sel) fds = ptr->fpu_sse.fdp.sel;
 
-        if ( cpu_has_xsaveopt )
+        if ( cpu_has_xsaves )
+            asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
+                           : "=m" (*ptr)
+                           : "a" (lmask), "d" (hmask), "D" (ptr) );
+        else if ( cpu_has_xsavec )
+            asm volatile ( ".byte 0x48,0x0f,0xc7,0x27"
+                           : "=m" (*ptr)
+                           : "a" (lmask), "d" (hmask), "D" (ptr) );
+        else if ( cpu_has_xsaveopt )
         {
             /*
              * xsaveopt may not write the FPU portion even when the respective
@@ -144,7 +312,15 @@ void xsave(struct vcpu *v, uint64_t mask)
     }
     else
     {
-        if ( cpu_has_xsaveopt )
+        if ( cpu_has_xsaves )
+            asm volatile ( ".byte 0x0f,0xc7,0x2f"
+                           : "=m" (*ptr)
+                           : "a" (lmask), "d" (hmask), "D" (ptr) );
+        else if ( cpu_has_xsavec )
+            asm volatile ( ".byte 0x0f,0xc7,0x27"
+                           : "=m" (*ptr)
+                           : "a" (lmask), "d" (hmask), "D" (ptr) );
+        else if ( cpu_has_xsaveopt )
             asm volatile ( ".byte 0x0f,0xae,0x37"
                            : "=m" (*ptr)
                            : "a" (lmask), "d" (hmask), "D" (ptr) );
@@ -187,36 +363,56 @@ void xrstor(struct vcpu *v, uint64_t mask)
     switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
     {
     default:
-        asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
-                       ".section .fixup,\"ax\"      \n"
-                       "2: mov %5,%%ecx             \n"
-                       "   xor %1,%1                \n"
-                       "   rep stosb                \n"
-                       "   lea %2,%0                \n"
-                       "   mov %3,%1                \n"
-                       "   jmp 1b                   \n"
-                       ".previous                   \n"
-                       _ASM_EXTABLE(1b, 2b)
-                       : "+&D" (ptr), "+&a" (lmask)
-                       : "m" (*ptr), "g" (lmask), "d" (hmask),
-                         "m" (xsave_cntxt_size)
-                       : "ecx" );
+        alternative_io( "1: "".byte 0x48,0x0f,0xae,0x2f \n"
+                        ".section .fixup,\"ax\"         \n"
+                        "2: mov %6,%%ecx                \n"
+                        "   xor %1,%1                   \n"
+                        "   rep stosb                   \n"
+                        "   lea %3,%0                   \n"
+                        "   mov %4,%1                   \n"
+                        "   jmp 1b                      \n"
+                        ".previous                      \n"
+                        _ASM_EXTABLE(1b, 2b),
+                        ".byte 0x48,0x0f,0xc7,0x1f      \n"
+                        ".section .fixup,\"ax\"         \n"
+                        "2: mov %6,%%ecx                \n"
+                        "   xor %1,%1                   \n"
+                        "   rep stosb                   \n"
+                        "   lea %3,%0                   \n"
+                        "   mov %4,%1                   \n"
+                        "   jmp 1b                      \n"
+                        ".previous                      \n"
+                        _ASM_EXTABLE(1b, 2b),
+                        X86_FEATURE_XSAVES,
+                        ASM_OUTPUT2("+&D" (ptr), "+&a" (lmask)),
+                        "m" (*ptr), "g" (lmask), "d" (hmask), "m" (xsave_cntxt_size)
+                        : "ecx" );
         break;
     case 4: case 2:
-        asm volatile ( "1: .byte 0x0f,0xae,0x2f\n"
-                       ".section .fixup,\"ax\" \n"
-                       "2: mov %5,%%ecx        \n"
-                       "   xor %1,%1           \n"
-                       "   rep stosb           \n"
-                       "   lea %2,%0           \n"
-                       "   mov %3,%1           \n"
-                       "   jmp 1b              \n"
-                       ".previous              \n"
-                       _ASM_EXTABLE(1b, 2b)
-                       : "+&D" (ptr), "+&a" (lmask)
-                       : "m" (*ptr), "g" (lmask), "d" (hmask),
-                         "m" (xsave_cntxt_size)
-                       : "ecx" );
+        alternative_io( "1: "".byte 0x0f,0xae,0x2f \n"
+                        ".section .fixup,\"ax\"    \n"
+                        "2: mov %6,%%ecx           \n"
+                        "   xor %1,%1              \n"
+                        "   rep stosb              \n"
+                        "   lea %3,%0              \n"
+                        "   mov %4,%1              \n"
+                        "   jmp 1b                 \n"
+                        ".previous                 \n"
+                        _ASM_EXTABLE(1b, 2b),
+                        ".byte 0x0f,0xc7,0x1f      \n"
+                        ".section .fixup,\"ax\"    \n"
+                        "2: mov %6,%%ecx           \n"
+                        "   xor %1,%1              \n"
+                        "   rep stosb              \n"
+                        "   lea %3,%0              \n"
+                        "   mov %4,%1              \n"
+                        "   jmp 1b                 \n"
+                        ".previous                 \n"
+                        _ASM_EXTABLE(1b, 2b),
+                        X86_FEATURE_XSAVES,
+                        ASM_OUTPUT2("+&D" (ptr), "+&a" (lmask)),
+                        "m" (*ptr), "g" (lmask), "d" (hmask), "m" (xsave_cntxt_size)
+                        : "ecx" );
         break;
     }
 }
@@ -343,11 +539,18 @@ void xstate_init(struct cpuinfo_x86 *c)
 
     /* Mask out features not currently understood by Xen. */
     eax &= (cpufeat_mask(X86_FEATURE_XSAVEOPT) |
-            cpufeat_mask(X86_FEATURE_XSAVEC));
+            cpufeat_mask(X86_FEATURE_XSAVEC) |
+            cpufeat_mask(X86_FEATURE_XGETBV1) |
+            cpufeat_mask(X86_FEATURE_XSAVES));
 
     c->x86_capability[cpufeat_word(X86_FEATURE_XSAVEOPT)] = eax;
 
     BUG_ON(eax != boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_XSAVEOPT)]);
+
+    if ( setup_xstate_features(bsp) && bsp )
+        BUG();
+    if ( bsp && (cpu_has_xsaves || cpu_has_xsavec) )
+        setup_xstate_comp();
 }
 
 static bool_t valid_xcr0(u64 xcr0)
diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index 23c9b9f..7ddbd76 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -94,6 +94,9 @@ extern void alternative_instructions(void);
 	asm volatile (ALTERNATIVE(oldinstr, newinstr, feature)		\
 		: output : "i" (0), ## input)
 
+/* Use this macro(s) if you need more than one output parameter. */
+#define ASM_OUTPUT2(a...) a
+
 #endif  /*  __ASSEMBLY__  */
 
 #endif /* __X86_ALTERNATIVE_H__ */
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index b95a5b5..414cc99 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -91,6 +91,8 @@ void xrstor(struct vcpu *v, uint64_t mask);
 bool_t xsave_enabled(const struct vcpu *v);
 int __must_check validate_xstate(u64 xcr0, u64 xcr0_accum, u64 xstate_bv);
 int __must_check handle_xsetbv(u32 index, u64 new_bv);
+void expand_xsave_states(struct vcpu *v, void *dest, unsigned int size);
+void compress_xsave_states(struct vcpu *v, const void *src, unsigned int size);
 
 /* extended state init and cleanup functions */
 void xstate_free_save_area(struct vcpu *v);
-- 
1.9.1


* [V11 2/3] x86/xsaves: enable xsaves/xrstors for hvm guest
  2015-11-20  1:17 [V11 0/3] add xsaves/xrstors support Shuai Ruan
  2015-11-20  1:18 ` [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen Shuai Ruan
@ 2015-11-20  1:18 ` Shuai Ruan
  2015-11-20  1:18 ` [V11 3/3] libxc: expose xsaves/xgetbv1/xsavec to " Shuai Ruan
  2 siblings, 0 replies; 9+ messages in thread
From: Shuai Ruan @ 2015-11-20  1:18 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, jbeulich, keir

This patch enables xsaves for hvm guests. It includes:
1. handling xsaves VMCS init and vmexit;
2. adding logic to write/read the XSS MSR.

Add MSR_IA32_XSS save/restore support.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c             | 27 +++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vmcs.c        |  5 ++++-
 xen/arch/x86/hvm/vmx/vmx.c         | 35 ++++++++++++++++++++++++++++++++++-
 xen/arch/x86/xstate.c              |  2 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |  4 ++++
 xen/include/asm-x86/hvm/vmx/vmx.h  |  2 ++
 xen/include/asm-x86/xstate.h       |  1 +
 7 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 20148f5..108e3e4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4606,6 +4606,20 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                     *ebx = _eax + _ebx;
             }
         }
+        if ( count == 1 )
+        {
+            if ( cpu_has_xsaves && cpu_has_vmx_xsaves )
+            {
+                *ebx = XSTATE_AREA_MIN_SIZE;
+                if ( v->arch.xcr0 | v->arch.hvm_vcpu.msr_xss )
+                    for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
+                        if ( (v->arch.xcr0 | v->arch.hvm_vcpu.msr_xss) &
+                             (1ULL << sub_leaf) )
+                            *ebx += xstate_sizes[sub_leaf];
+            }
+            else
+                *ebx = *ecx = *edx = 0;
+        }
         break;
 
     case 0x80000001:
@@ -4705,6 +4719,12 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         *msr_content = v->arch.hvm_vcpu.guest_efer;
         break;
 
+    case MSR_IA32_XSS:
+        if ( !cpu_has_xsaves )
+            goto gp_fault;
+        *msr_content = v->arch.hvm_vcpu.msr_xss;
+        break;
+
     case MSR_IA32_TSC:
         *msr_content = _hvm_rdtsc_intercept();
         break;
@@ -4837,6 +4857,13 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
            return X86EMUL_EXCEPTION;
         break;
 
+    case MSR_IA32_XSS:
+        /* No XSS features currently supported for guests. */
+        if ( !cpu_has_xsaves || msr_content != 0 )
+            goto gp_fault;
+        v->arch.hvm_vcpu.msr_xss = msr_content;
+        break;
+
     case MSR_IA32_TSC:
         hvm_set_guest_tsc(v, msr_content);
         break;
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 53207e6..cf9c14d 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -240,7 +240,8 @@ static int vmx_init_vmcs_config(void)
                SECONDARY_EXEC_PAUSE_LOOP_EXITING |
                SECONDARY_EXEC_ENABLE_INVPCID |
                SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
-               SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
+               SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
+               SECONDARY_EXEC_XSAVES);
         rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
         if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
             opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
@@ -1249,6 +1250,8 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(HOST_PAT, host_pat);
         __vmwrite(GUEST_PAT, guest_pat);
     }
+    if ( cpu_has_vmx_xsaves )
+        __vmwrite(XSS_EXIT_BITMAP, 0);
 
     vmx_vmcs_exit(v);
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index eb6248e..f4a6cca 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -625,7 +625,7 @@ static int vmx_load_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
 
 static unsigned int __init vmx_init_msr(void)
 {
-    return !!cpu_has_mpx;
+    return !!cpu_has_mpx + !!cpu_has_xsaves;
 }
 
 static void vmx_save_msr(struct vcpu *v, struct hvm_msr *ctxt)
@@ -640,6 +640,13 @@ static void vmx_save_msr(struct vcpu *v, struct hvm_msr *ctxt)
     }
 
     vmx_vmcs_exit(v);
+
+    if ( cpu_has_xsaves )
+    {
+        ctxt->msr[ctxt->count].val = v->arch.hvm_vcpu.msr_xss;
+        if ( ctxt->msr[ctxt->count].val )
+            ctxt->msr[ctxt->count++].index = MSR_IA32_XSS;
+    }
 }
 
 static int vmx_load_msr(struct vcpu *v, struct hvm_msr *ctxt)
@@ -659,6 +666,12 @@ static int vmx_load_msr(struct vcpu *v, struct hvm_msr *ctxt)
             else
                 err = -ENXIO;
             break;
+        case MSR_IA32_XSS:
+            if ( cpu_has_xsaves )
+                v->arch.hvm_vcpu.msr_xss = ctxt->msr[i].val;
+            else
+                err = -ENXIO;
+            break;
         default:
             continue;
         }
@@ -2809,6 +2822,18 @@ static void vmx_idtv_reinject(unsigned long idtv_info)
     }
 }
 
+static void vmx_handle_xsaves(void)
+{
+    gdprintk(XENLOG_ERR, "xsaves should not cause vmexit\n");
+    domain_crash(current->domain);
+}
+
+static void vmx_handle_xrstors(void)
+{
+    gdprintk(XENLOG_ERR, "xrstors should not cause vmexit\n");
+    domain_crash(current->domain);
+}
+
 static int vmx_handle_apic_write(void)
 {
     unsigned long exit_qualification;
@@ -3386,6 +3411,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
         vmx_vcpu_flush_pml_buffer(v);
         break;
 
+    case EXIT_REASON_XSAVES:
+        vmx_handle_xsaves();
+        break;
+
+    case EXIT_REASON_XRSTORS:
+        vmx_handle_xrstors();
+        break;
+
     case EXIT_REASON_ACCESS_GDTR_OR_IDTR:
     case EXIT_REASON_ACCESS_LDTR_OR_TR:
     case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index f3ae285..195544c 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -25,7 +25,7 @@ static u32 __read_mostly xsave_cntxt_size;
 u64 __read_mostly xfeature_mask;
 
 static unsigned int * __read_mostly xstate_offsets;
-static unsigned int * __read_mostly xstate_sizes;
+unsigned int * __read_mostly xstate_sizes;
 static unsigned int __read_mostly xstate_features;
 static unsigned int __read_mostly xstate_comp_offsets[sizeof(xfeature_mask)*8];
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 865d9fc..1a5d827 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -225,6 +225,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS   0x00040000
+#define SECONDARY_EXEC_XSAVES                   0x00100000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED                         0x00000001
@@ -290,6 +291,8 @@ extern u64 vmx_ept_vpid_cap;
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS)
 #define cpu_has_vmx_pml \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
+#define cpu_has_vmx_xsaves \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_XSAVES)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
@@ -364,6 +367,7 @@ enum vmcs_field {
     VMREAD_BITMAP                   = 0x00002026,
     VMWRITE_BITMAP                  = 0x00002028,
     VIRT_EXCEPTION_INFO             = 0x0000202a,
+    XSS_EXIT_BITMAP                 = 0x0000202c,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     VMCS_LINK_POINTER               = 0x00002800,
     GUEST_IA32_DEBUGCTL             = 0x00002802,
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index f16a306..0491750 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -188,6 +188,8 @@ static inline unsigned long pi_get_pir(struct pi_desc *pi_desc, int group)
 #define EXIT_REASON_INVPCID             58
 #define EXIT_REASON_VMFUNC              59
 #define EXIT_REASON_PML_FULL            62
+#define EXIT_REASON_XSAVES              63
+#define EXIT_REASON_XRSTORS             64
 
 /*
  * Interruption-information format
diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
index 414cc99..12d939b 100644
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -45,6 +45,7 @@
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
 extern u64 xfeature_mask;
+extern unsigned int *xstate_sizes;
 
 /* extended state save area */
 struct __packed __attribute__((aligned (64))) xsave_struct
-- 
1.9.1


* [V11 3/3] libxc: expose xsaves/xgetbv1/xsavec to hvm guest
  2015-11-20  1:17 [V11 0/3] add xsaves/xrstors support Shuai Ruan
  2015-11-20  1:18 ` [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen Shuai Ruan
  2015-11-20  1:18 ` [V11 2/3] x86/xsaves: enable xsaves/xrstors for hvm guest Shuai Ruan
@ 2015-11-20  1:18 ` Shuai Ruan
  2 siblings, 0 replies; 9+ messages in thread
From: Shuai Ruan @ 2015-11-20  1:18 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	jun.nakajima, andrew.cooper3, ian.jackson, jbeulich, keir

From: Shuai Ruan <shuai.ruan@intel.com>

This patch exposes xsaves/xgetbv1/xsavec to hvm guests.
The reserved bits of eax/ebx/ecx/edx must be cleaned up
when calling cpuid(0dh) with leaf 1 or 2..63.

According to the spec the following bits must be reserved:
For leaf 1, bits 03-04 and 08-31 of ecx are reserved; edx is reserved.
For leaf 2...63, bits 01-31 of ecx are reserved; edx is reserved.

But as no XSS features are currently supported, even in HVM guests,
for leaf 2...63 ecx should be zero at the moment.

Signed-off-by: Shuai Ruan <shuai.ruan@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/xc_cpuid_x86.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 031c848..8882c01 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -282,6 +282,9 @@ static void intel_xc_cpuid_policy(xc_interface *xch,
 }
 
 #define XSAVEOPT        (1 << 0)
+#define XSAVEC          (1 << 1)
+#define XGETBV1         (1 << 2)
+#define XSAVES          (1 << 3)
 /* Configure extended state enumeration leaves (0x0000000D for xsave) */
 static void xc_cpuid_config_xsave(xc_interface *xch,
                                   const struct cpuid_domain_info *info,
@@ -318,8 +321,11 @@ static void xc_cpuid_config_xsave(xc_interface *xch,
         regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
         break;
     case 1: /* leaf 1 */
-        regs[0] &= XSAVEOPT;
-        regs[1] = regs[2] = regs[3] = 0;
+        regs[0] &= (XSAVEOPT | XSAVEC | XGETBV1 | XSAVES);
+        if ( !info->hvm )
+            regs[0] &= ~XSAVES;
+        regs[2] &= info->xfeature_mask;
+        regs[3] = 0;
         break;
     case 2 ... 63: /* sub-leaves */
         if ( !(info->xfeature_mask & (1ULL << input[1])) )
-- 
1.9.1


* Re: [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen
  2015-11-20  1:18 ` [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen Shuai Ruan
@ 2015-11-20 11:35   ` Jan Beulich
  2015-11-23  5:22     ` Shuai Ruan
       [not found]     ` <20151123052255.GA12378@shuai.ruan@linux.intel.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Beulich @ 2015-11-20 11:35 UTC (permalink / raw)
  To: Shuai Ruan
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, jun.nakajima, keir

>>> On 20.11.15 at 02:18, <shuai.ruan@linux.intel.com> wrote:
> @@ -187,36 +363,56 @@ void xrstor(struct vcpu *v, uint64_t mask)
>      switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
>      {
>      default:
> -        asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
> -                       ".section .fixup,\"ax\"      \n"
> -                       "2: mov %5,%%ecx             \n"
> -                       "   xor %1,%1                \n"
> -                       "   rep stosb                \n"
> -                       "   lea %2,%0                \n"
> -                       "   mov %3,%1                \n"
> -                       "   jmp 1b                   \n"
> -                       ".previous                   \n"
> -                       _ASM_EXTABLE(1b, 2b)
> -                       : "+&D" (ptr), "+&a" (lmask)
> -                       : "m" (*ptr), "g" (lmask), "d" (hmask),
> -                         "m" (xsave_cntxt_size)
> -                       : "ecx" );
> +        alternative_io( "1: "".byte 0x48,0x0f,0xae,0x2f \n"
> +                        ".section .fixup,\"ax\"         \n"
> +                        "2: mov %6,%%ecx                \n"
> +                        "   xor %1,%1                   \n"
> +                        "   rep stosb                   \n"
> +                        "   lea %3,%0                   \n"
> +                        "   mov %4,%1                   \n"
> +                        "   jmp 1b                      \n"
> +                        ".previous                      \n"
> +                        _ASM_EXTABLE(1b, 2b),
> +                        ".byte 0x48,0x0f,0xc7,0x1f      \n"
> +                        ".section .fixup,\"ax\"         \n"
> +                        "2: mov %6,%%ecx                \n"
> +                        "   xor %1,%1                   \n"
> +                        "   rep stosb                   \n"
> +                        "   lea %3,%0                   \n"
> +                        "   mov %4,%1                   \n"
> +                        "   jmp 1b                      \n"
> +                        ".previous                      \n"
> +                        _ASM_EXTABLE(1b, 2b),
> +                        X86_FEATURE_XSAVES,
> +                        ASM_OUTPUT2("+&D" (ptr), "+&a" (lmask)),
> +                        "m" (*ptr), "g" (lmask), "d" (hmask), "m" (xsave_cntxt_size)
> +                        : "ecx" );

So I had specifically asked for _not_ altering the indentation (to help
review), but you still modified the whole block. Which, if I hadn't
looked closely, would have hidden the %5 -> %6 and similar other
changes. I realize that's due to the dummy input alternative_io()
inserts. So I see three options for you (in order of my preference):

1) Do the conversion properly, splitting things out into a macro, in
a separate, prereq patch. "Properly" here meaning to convert from
numbered to named operands.

2) Fix alternative_{io,input}() to no longer have the - afaict -
pointless dummy input. The comment says it's for API compatibility,
which we (other than Linux from where it was taken) don't care
about. The only current user of alternative_io() should be unaffected,
as it uses named operands already. (This would again be in a prereq
patch, and the main patch would leave indentation unaltered.)

3) Stay with what you have, but leave the original indentation and
add a comment explaining the apparently odd numbering.

Albeit if 1 or 2 was chosen, 3 would seem to be a good idea anyway.

Jan


* Re: [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen
  2015-11-20 11:35   ` Jan Beulich
@ 2015-11-23  5:22     ` Shuai Ruan
       [not found]     ` <20151123052255.GA12378@shuai.ruan@linux.intel.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Shuai Ruan @ 2015-11-23  5:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, jun.nakajima, keir

On Fri, Nov 20, 2015 at 04:35:00AM -0700, Jan Beulich wrote:
> >>> On 20.11.15 at 02:18, <shuai.ruan@linux.intel.com> wrote:
> > @@ -187,36 +363,56 @@ void xrstor(struct vcpu *v, uint64_t mask)
> >      switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
> >      {
> >      default:
> > -        asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
> > -                       ".section .fixup,\"ax\"      \n"
> > -                       "2: mov %5,%%ecx             \n"
> > -                       "   xor %1,%1                \n"
> > -                       "   rep stosb                \n"
> > -                       "   lea %2,%0                \n"
> > -                       "   mov %3,%1                \n"
> > -                       "   jmp 1b                   \n"
> > -                       ".previous                   \n"
> > -                       _ASM_EXTABLE(1b, 2b)
> > -                       : "+&D" (ptr), "+&a" (lmask)
> > -                       : "m" (*ptr), "g" (lmask), "d" (hmask),
> > -                         "m" (xsave_cntxt_size)
> > -                       : "ecx" );
> > +        alternative_io( "1: "".byte 0x48,0x0f,0xae,0x2f \n"
> > +                        ".section .fixup,\"ax\"         \n"
> > +                        "2: mov %6,%%ecx                \n"
> > +                        "   xor %1,%1                   \n"
> > +                        "   rep stosb                   \n"
> > +                        "   lea %3,%0                   \n"
> > +                        "   mov %4,%1                   \n"
> > +                        "   jmp 1b                      \n"
> > +                        ".previous                      \n"
> > +                        _ASM_EXTABLE(1b, 2b),
> > +                        ".byte 0x48,0x0f,0xc7,0x1f      \n"
> > +                        ".section .fixup,\"ax\"         \n"
> > +                        "2: mov %6,%%ecx                \n"
> > +                        "   xor %1,%1                   \n"
> > +                        "   rep stosb                   \n"
> > +                        "   lea %3,%0                   \n"
> > +                        "   mov %4,%1                   \n"
> > +                        "   jmp 1b                      \n"
> > +                        ".previous                      \n"
> > +                        _ASM_EXTABLE(1b, 2b),
> > +                        X86_FEATURE_XSAVES,
> > +                        ASM_OUTPUT2("+&D" (ptr), "+&a" (lmask)),
> > +                        "m" (*ptr), "g" (lmask), "d" (hmask), "m" (xsave_cntxt_size)
> > +                        : "ecx" );
> 
> So I had specifically asked for _not_ altering the indentation (to help
> review), but you still modified the whole block. Which, if I hadn't
> looked closely, would have hidden the %5 -> %6 and similar other
> changes. I realize that's due to the dummy input alternative_io()
> inserts. So I see three options for you (in order of my preference):
> 
> 1) Do the conversion properly, splitting things out into a macro, in
> a separate, prereq patch. "Properly" here meaning to convert from
> numbered to named operands.
I prefer to use this option.

The original code is below (without the xsaves patch):
        asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
                       ".section .fixup,\"ax\"      \n"
                       "2: mov %5,%%ecx             \n"
                       "   xor %1,%1                \n"
                       "   rep stosb                \n"
                       "   lea %2,%0                \n"
                       "   mov %3,%1                \n"
                       "   jmp 1b                   \n"
                       ".previous                   \n"
                       _ASM_EXTABLE(1b, 2b)
                      : "+&D" (ptr), "+&a" (lmask)
                      : "m" (*ptr), "g" (lmask), "d" (hmask),
                      "m" (xsave_cntxt_size)
                      : "ecx" );

Then my code using named operands is below (without the xsaves patch).

        asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	\n"
                       ".section .fixup,\"ax\"          \n"
                       "2: mov %[SIZE],%%ecx            \n"
                       "   xor %[LMASK_A],%[LMASK_A]    \n"
                       "   rep stosb                    \n"
                       "   lea %[PTR_M],%[PTR]          \n"
                       "   mov %[LMASK_G],%[LMASK_A]    \n"
                       "   jmp 1b                   	\n"
                       ".previous                   	\n"
                       _ASM_EXTABLE(1b, 2b)
                      : [PTR] "+&D" (ptr),  [LMASK_A] "+&a" (lmask)
                      : [PTR_M] "m" (*ptr), [LMASK_G] "g" (lmask), [HMASK_D] "d" (hmask),
                        [SIZE] "m" (xsave_cntxt_size)
                      : "ecx" );

Is that ok with you?

Also, you asked about splitting things out into a macro? I do not quite
understand your meaning. Does it mean defining a macro to handle fixup
code like below?
#define XRSTOR_FIXUP
        ".section .fixup,\"ax\"          \n" \
        "2: mov %[SIZE],%%ecx            \n" \
        "   xor %[LMASK_A],%[LMASK_A]    \n" \
        "   rep stosb                    \n" \
        "   lea %[PTR_M],%[PTR]          \n" \
        "   mov %[LMASK_G],%[LMASK_A]    \n" \
        "   jmp 1b                   	\n"  \
        ".previous                   	\n"  \
        _ASM_EXTABLE(1b, 2b)

Then xrstor side can be:
        asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	\n"
		       XRSTOR_FIXUP
                      : [PTR] "+&D" (ptr), [LMASK_A] "+&a" (lmask)
                      : [PTR_M] "m" (*ptr), [LMASK_G] "g" (lmask), [HMASK_D] "d" (hmask),
                        [SIZE] "m" (xsave_cntxt_size)
                      : "ecx" );

Thanks
Shuai
	


* Re: [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen
       [not found]     ` <20151123052255.GA12378@shuai.ruan@linux.intel.com>
@ 2015-11-23 10:06       ` Jan Beulich
  2015-11-23 11:06         ` Shuai Ruan
       [not found]         ` <20151123110624.GA13622@shuai.ruan@linux.intel.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Beulich @ 2015-11-23 10:06 UTC (permalink / raw)
  To: Shuai Ruan
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, jun.nakajima, keir

>>> On 23.11.15 at 06:22, <shuai.ruan@linux.intel.com> wrote:
> On Fri, Nov 20, 2015 at 04:35:00AM -0700, Jan Beulich wrote:
>> >>> On 20.11.15 at 02:18, <shuai.ruan@linux.intel.com> wrote:
>> > @@ -187,36 +363,56 @@ void xrstor(struct vcpu *v, uint64_t mask)
>> >      switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
>> >      {
>> >      default:
>> > -        asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
>> > -                       ".section .fixup,\"ax\"      \n"
>> > -                       "2: mov %5,%%ecx             \n"
>> > -                       "   xor %1,%1                \n"
>> > -                       "   rep stosb                \n"
>> > -                       "   lea %2,%0                \n"
>> > -                       "   mov %3,%1                \n"
>> > -                       "   jmp 1b                   \n"
>> > -                       ".previous                   \n"
>> > -                       _ASM_EXTABLE(1b, 2b)
>> > -                       : "+&D" (ptr), "+&a" (lmask)
>> > -                       : "m" (*ptr), "g" (lmask), "d" (hmask),
>> > -                         "m" (xsave_cntxt_size)
>> > -                       : "ecx" );
>> > +        alternative_io( "1: "".byte 0x48,0x0f,0xae,0x2f \n"
>> > +                        ".section .fixup,\"ax\"         \n"
>> > +                        "2: mov %6,%%ecx                \n"
>> > +                        "   xor %1,%1                   \n"
>> > +                        "   rep stosb                   \n"
>> > +                        "   lea %3,%0                   \n"
>> > +                        "   mov %4,%1                   \n"
>> > +                        "   jmp 1b                      \n"
>> > +                        ".previous                      \n"
>> > +                        _ASM_EXTABLE(1b, 2b),
>> > +                        ".byte 0x48,0x0f,0xc7,0x1f      \n"
>> > +                        ".section .fixup,\"ax\"         \n"
>> > +                        "2: mov %6,%%ecx                \n"
>> > +                        "   xor %1,%1                   \n"
>> > +                        "   rep stosb                   \n"
>> > +                        "   lea %3,%0                   \n"
>> > +                        "   mov %4,%1                   \n"
>> > +                        "   jmp 1b                      \n"
>> > +                        ".previous                      \n"
>> > +                        _ASM_EXTABLE(1b, 2b),
>> > +                        X86_FEATURE_XSAVES,
>> > +                        ASM_OUTPUT2("+&D" (ptr), "+&a" (lmask)),
>> > +                        "m" (*ptr), "g" (lmask), "d" (hmask), "m" 
> (xsave_cntxt_size)
>> > +                        : "ecx" );
>> 
>> So I had specifically asked for _not_ altering the indentation (to help
>> review), but you still modified the whole block. Which, if I hadn't
>> looked closely, would have hidden the %5 -> %6 and similar other
>> changes. I realize that's due to the dummy input alternative_io()
>> inserts. So I see three options for you (in order of my preference):
>> 
>> 1) Do the conversion properly, splitting things out into a macro, in
>> a separate, prereq patch. "Properly" here meaning to convert from
>> numbered to named operands.
> I prefer to use this option.
> 
> The original code is below(without xsaves patch) 
>         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
>                        ".section .fixup,\"ax\"      \n"
>                        "2: mov %5,%%ecx             \n"
>                        "   xor %1,%1                \n"
>                        "   rep stosb                \n"
>                        "   lea %2,%0                \n"
>                        "   mov %3,%1                \n"
>                        "   jmp 1b                   \n"
>                        ".previous                   \n"
>                        _ASM_EXTABLE(1b, 2b)
>                       : "+&D" (ptr), "+&a" (lmask)
>                       : "m" (*ptr), "g" (lmask), "d" (hmask),
>                       "m" (xsave_cntxt_size)
>                       : "ecx" );
> 
> Then My code to using named operands is below(without xaves patch).
> 
>         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	\n"
>                        ".section .fixup,\"ax\"          \n"
>                        "2: mov %[SIZE],%%ecx            \n"
>                        "   xor %[LMASK_A],%[LMASK_A]    \n"
>                        "   rep stosb                    \n"
>                        "   lea %[PTR_M],%[PTR]          \n"
>                        "   mov %[LMASK_G],%[LMASK_A]    \n"
>                        "   jmp 1b                   	\n"
>                        ".previous                   	\n"
>                        _ASM_EXTABLE(1b, 2b)
>                       : [PTR] "+&D" (ptr),  [LMASK_A] "+&a" (lmask)
>                       : [PTR_M] "m" (*ptr), [LMASK_G] "g" (lmask), [HMASK_D] "d" (hmask),
>                         [SIZE] "m" (xsave_cntxt_size)
>                       : "ecx" );
> 
> Is that ok to you ?

Well, kind of. The variable naming is quite strange: Why upper case?
And why those strange _A, _G, etc suffixes (I think you instead mean
_out and _in or some such respectively; but see also below - the fewer
dependencies between macro and code surrounding the use sites, the
less important the selection of asm() operand names)?

> Also , You ask to splitting things out into a macro ? I do not quite
> unstandand your meaning ? Does it mean define Macro to handle fixup
> code like below ?
> #define XRSTOR_FIXUP
>         ".section .fixup,\"ax\"          \n" \
>         "2: mov %[SIZE],%%ecx            \n" \
>         "   xor %[LMASK_A],%[LMASK_A]    \n" \
>         "   rep stosb                    \n" \
>         "   lea %[PTR_M],%[PTR]          \n" \
>         "   mov %[LMASK_G],%[LMASK_A]    \n" \
>         "   jmp 1b                   	\n"  \
>         ".previous                   	\n"  \
>         _ASM_EXTABLE(1b, 2b)
> 
> Then xrstor side can be:
>         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	\n"
> 		       XRSTOR_FIXUP
>                       : [PTR] "+&D" (ptr), [LMASK_A] "+&a" (lmask)
>                       : [PTR_M] "m" (*ptr), [LMASK_G] "g" (lmask), [HMASK_D] "d" (hmask),
>                         [SIZE] "m" (xsave_cntxt_size)
>                       : "ecx" );

Goes in the right direction, but I think all operands related to the fixup
should also get moved to the macro. Of course you'll have to use your
judgment with your actual patches in mind. (To be more precise, the
macro should have as much inside it as possible, so that as few
dependencies as possible exist between it and the code surrounding the
macro invocation. This likely requires adding a parameter or two to the
macro.)

Jan


* Re: [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen
  2015-11-23 10:06       ` Jan Beulich
@ 2015-11-23 11:06         ` Shuai Ruan
       [not found]         ` <20151123110624.GA13622@shuai.ruan@linux.intel.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Shuai Ruan @ 2015-11-23 11:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, jun.nakajima, keir

On Mon, Nov 23, 2015 at 03:06:01AM -0700, Jan Beulich wrote:
> >> numbered to named operands.
> > I prefer to use this option.
> > 
> > The original code is below(without xsaves patch) 
> >         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
> >                        ".section .fixup,\"ax\"      \n"
> >                        "2: mov %5,%%ecx             \n"
> >                        "   xor %1,%1                \n"
> >                        "   rep stosb                \n"
> >                        "   lea %2,%0                \n"
> >                        "   mov %3,%1                \n"
> >                        "   jmp 1b                   \n"
> >                        ".previous                   \n"
> >                        _ASM_EXTABLE(1b, 2b)
> >                       : "+&D" (ptr), "+&a" (lmask)
> >                       : "m" (*ptr), "g" (lmask), "d" (hmask),
> >                       "m" (xsave_cntxt_size)
> >                       : "ecx" );
> > 
> > Then my code using named operands is below (without the xsaves patch):
> > 
> >         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	\n"
> >                        ".section .fixup,\"ax\"          \n"
> >                        "2: mov %[SIZE],%%ecx            \n"
> >                        "   xor %[LMASK_A],%[LMASK_A]    \n"
> >                        "   rep stosb                    \n"
> >                        "   lea %[PTR_M],%[PTR]          \n"
> >                        "   mov %[LMASK_G],%[LMASK_A]    \n"
> >                        "   jmp 1b                   	\n"
> >                        ".previous                   	\n"
> >                        _ASM_EXTABLE(1b, 2b)
> >                       : [PTR] "+&D" (ptr),  [LMASK_A] "+&a" (lmask)
> >                       : [PTR_M] "m" (*ptr), [LMASK_G] "g" (lmask), [HMASK_D] "d" (hmask),
> >                         [SIZE] "m" (xsave_cntxt_size)
> >                       : "ecx" );
> > 
> > Is that OK with you?
> 
> Well, kind of. The variable naming is quite strange: Why upper case?
Lower case will be used.
> And why those strange _A, _G, etc suffixes (I think you instead mean
> _out and _in or some such respectively; but see also below - the fewer
> dependencies between macro and code surrounding the use sites, the
> less important the selection of asm() operand names)?
I intended _A, _G to name the register each operand uses, hoping to make
the code more readable, but it actually makes it look stranger.
I now intend to add a suffix to each operand instead (like below).

         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	   \n"
                        ".section .fixup,\"ax\"            \n"
                        "2: mov %[size_in],%%ecx           \n"
                        "   xor %[lmask_out],%[lmask_out]  \n"
                        "   rep stosb                      \n"
                        "   lea %[ptr_in],%[ptr_out]       \n"
                        "   mov %[lmask_in],%[lmask_out]   \n"
                        "   jmp 1b                   	   \n"
                        ".previous                   	   \n"
                        _ASM_EXTABLE(1b, 2b)
                        : [ptr_out] "+&D" (ptr),  [lmask_out] "+&a" (lmask)
                        : [ptr_in] "m" (*ptr), [lmask_in] "g" (lmask), [hmask_in] "d" (hmask),
                        [size_in] "m" (xsave_cntxt_size)
                       : "ecx" );

This seems to make the XRSTOR_FIXUP macro below depend more on the code
surrounding its use site. But the macro is actually only used on the xrstor
side, so I think using it like this may be acceptable.
Is naming the operands this way OK with you?

> > Also, you asked about splitting things out into a macro? I do not quite
> > understand your meaning. Do you mean defining a macro to handle the fixup
> > code like below?
> > #define XRSTOR_FIXUP \
> >         ".section .fixup,\"ax\"          \n" \
> >         "2: mov %[SIZE],%%ecx            \n" \
> >         "   xor %[LMASK_A],%[LMASK_A]    \n" \
> >         "   rep stosb                    \n" \
> >         "   lea %[PTR_M],%[PTR]          \n" \
> >         "   mov %[LMASK_G],%[LMASK_A]    \n" \
> >         "   jmp 1b                   	\n"  \
> >         ".previous                   	\n"  \
> >         _ASM_EXTABLE(1b, 2b)
> > 
> > Then the xrstor side can be:
> >         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	\n"
> > 		       XRSTOR_FIXUP
> >                       : [PTR] "+&D" (ptr), [LMASK_A] "+&a" (lmask)
> >                       : [PTR_M] "m" (*ptr), [LMASK_G] "g" (lmask), [HMASK_D] "d" (hmask),
> >                         [SIZE] "m" (xsave_cntxt_size)
> >                       : "ecx" );
> 
> Goes in the right direction, but I think all operands related to the fixup
> should also get moved into the macro. Of course you'll have to use your
> judgment with your actual patches in mind. (To be more precise, the
> macro should have as much inside it as possible, so that as few
> dependencies as possible exist between it and the code surrounding the
> macro invocation. This likely requires adding a parameter or two to the
> macro.)
Moving all the operands related to the fixup into the macro is OK here.
But as I will use alternative asm on the xrstor side (in the xsaves patch),
the alternative will separate the fixup code from the output/input operand
definitions as above, and I also intend to reuse XRSTOR_FIXUP on that side.
So in my opinion separating them like this is OK.
Perhaps there is some point I am not aware of. Please let me know if you
have a better way to do this.

Thanks for your time.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen
       [not found]         ` <20151123110624.GA13622@shuai.ruan@linux.intel.com>
@ 2015-11-23 11:24           ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2015-11-23 11:24 UTC (permalink / raw)
  To: Shuai Ruan
  Cc: kevin.tian, wei.liu2, Ian.Campbell, stefano.stabellini,
	andrew.cooper3, ian.jackson, xen-devel, jun.nakajima, keir

>>> On 23.11.15 at 12:06, <shuai.ruan@linux.intel.com> wrote:
> On Mon, Nov 23, 2015 at 03:06:01AM -0700, Jan Beulich wrote:
>> >> numbered to named operands.
>> > I prefer to use this option.
>> > 
>> > The original code is below (without the xsaves patch):
>> >         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
>> >                        ".section .fixup,\"ax\"      \n"
>> >                        "2: mov %5,%%ecx             \n"
>> >                        "   xor %1,%1                \n"
>> >                        "   rep stosb                \n"
>> >                        "   lea %2,%0                \n"
>> >                        "   mov %3,%1                \n"
>> >                        "   jmp 1b                   \n"
>> >                        ".previous                   \n"
>> >                        _ASM_EXTABLE(1b, 2b)
>> >                       : "+&D" (ptr), "+&a" (lmask)
>> >                       : "m" (*ptr), "g" (lmask), "d" (hmask),
>> >                       "m" (xsave_cntxt_size)
>> >                       : "ecx" );
>> > 
>> > Then my code using named operands is below (without the xsaves patch):
>> > 
>> >         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	\n"
>> >                        ".section .fixup,\"ax\"          \n"
>> >                        "2: mov %[SIZE],%%ecx            \n"
>> >                        "   xor %[LMASK_A],%[LMASK_A]    \n"
>> >                        "   rep stosb                    \n"
>> >                        "   lea %[PTR_M],%[PTR]          \n"
>> >                        "   mov %[LMASK_G],%[LMASK_A]    \n"
>> >                        "   jmp 1b                   	\n"
>> >                        ".previous                   	\n"
>> >                        _ASM_EXTABLE(1b, 2b)
>> >                       : [PTR] "+&D" (ptr),  [LMASK_A] "+&a" (lmask)
>> >                       : [PTR_M] "m" (*ptr), [LMASK_G] "g" (lmask), 
> [HMASK_D] "d" (hmask),
>> >                         [SIZE] "m" (xsave_cntxt_size)
>> >                       : "ecx" );
>> > 
>> > Is that OK with you?
>> 
>> Well, kind of. The variable naming is quite strange: Why upper case?
> Lower case will be used.
>> And why those strange _A, _G, etc suffixes (I think you instead mean
>> _out and _in or some such respectively; but see also below - the fewer
>> dependencies between macro and code surrounding the use sites, the
>> less important the selection of asm() operand names)?
> I intended _A, _G to name the register each operand uses, hoping to make
> the code more readable, but it actually makes it look stranger.
> I now intend to add a suffix to each operand instead (like below).
> 
>          asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	   \n"
>                         ".section .fixup,\"ax\"            \n"
>                         "2: mov %[size_in],%%ecx           \n"
>                         "   xor %[lmask_out],%[lmask_out]  \n"
>                         "   rep stosb                      \n"
>                         "   lea %[ptr_in],%[ptr_out]       \n"
>                         "   mov %[lmask_in],%[lmask_out]   \n"
>                         "   jmp 1b                   	   \n"
>                         ".previous                   	   \n"
>                         _ASM_EXTABLE(1b, 2b)
>                         : [ptr_out] "+&D" (ptr),  [lmask_out] "+&a" (lmask)
>                         : [ptr_in] "m" (*ptr), [lmask_in] "g" (lmask), [hmask_in] "d" (hmask),
>                         [size_in] "m" (xsave_cntxt_size)
>                        : "ecx" );
> 
> This seems to make the XRSTOR_FIXUP macro below depend more on the code
> surrounding its use site. But the macro is actually only used on the xrstor
> side, so I think using it like this may be acceptable.
> Is naming the operands this way OK with you?

Mostly. Please at least omit the suffixes where they aren't needed to
disambiguate two operands. Also I don't think ptr_in properly reflects
the argument's purpose (which is a block of memory, not a pointer to
such).
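
Applied to the quoted asm(), those two remarks might yield something like
the sketch below (illustrative only; [mem] and [size] are just possible
names, not taken from the series), where only lmask keeps suffixes because
it appears both as an output and as an input:

    asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
                   ".section .fixup,\"ax\"          \n"
                   "2: mov %[size],%%ecx            \n"
                   "   xor %[lmask_out],%[lmask_out]\n"
                   "   rep stosb                    \n"
                   "   lea %[mem],%[ptr]            \n"
                   "   mov %[lmask_in],%[lmask_out] \n"
                   "   jmp 1b                       \n"
                   ".previous                       \n"
                   _ASM_EXTABLE(1b, 2b)
                   : [ptr] "+&D" (ptr), [lmask_out] "+&a" (lmask)
                   : [mem] "m" (*ptr), [lmask_in] "g" (lmask),
                     [hmask] "d" (hmask), [size] "m" (xsave_cntxt_size)
                   : "ecx" );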

>> > Also, you asked about splitting things out into a macro? I do not quite
>> > understand your meaning. Do you mean defining a macro to handle the fixup
>> > code like below?
>> > #define XRSTOR_FIXUP \
>> >         ".section .fixup,\"ax\"          \n" \
>> >         "2: mov %[SIZE],%%ecx            \n" \
>> >         "   xor %[LMASK_A],%[LMASK_A]    \n" \
>> >         "   rep stosb                    \n" \
>> >         "   lea %[PTR_M],%[PTR]          \n" \
>> >         "   mov %[LMASK_G],%[LMASK_A]    \n" \
>> >         "   jmp 1b                   	\n"  \
>> >         ".previous                   	\n"  \
>> >         _ASM_EXTABLE(1b, 2b)
>> > 
>> > Then the xrstor side can be:
>> >         asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f	\n"
>> > 		       XRSTOR_FIXUP
>> >                       : [PTR] "+&D" (ptr), [LMASK_A] "+&a" (lmask)
>> >                       : [PTR_M] "m" (*ptr), [LMASK_G] "g" (lmask), 
> [HMASK_D] "d" (hmask),
>> >                         [SIZE] "m" (xsave_cntxt_size)
>> >                       : "ecx" );
>> 
>> Goes in the right direction, but I think all operands related to the fixup
>> should also get moved into the macro. Of course you'll have to use your
>> judgment with your actual patches in mind. (To be more precise, the
>> macro should have as much inside it as possible, so that as few
>> dependencies as possible exist between it and the code surrounding the
>> macro invocation. This likely requires adding a parameter or two to the
>> macro.)
> Moving all the operands related to the fixup into the macro is OK here.
> But as I will use alternative asm on the xrstor side (in the xsaves patch),
> the alternative will separate the fixup code from the output/input operand
> definitions as above, and I also intend to reuse XRSTOR_FIXUP on that side.
> So in my opinion separating them like this is OK.
> Perhaps there is some point I am not aware of. Please let me know if you
> have a better way to do this.

I can't judge on this without seeing the result. I can only again defer
to your judgment. As said, the restriction I'm placing on the set of
choices you have is that there should be no disconnect between
names chosen inside the macro and names chosen by the invoking
code. I.e. anything the scope of which is the macro alone can be
established by the macro, while anything the scope of which is
wider than the macro alone should be passed as an argument by the
invoking code.
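
Read as code, that rule might look like the sketch below (purely
illustrative; the macro name, its parameter list and the operand names are
assumptions, not taken from the series). The fixup's own operand names are
fixed inside the macro, while the caller's variables come in as arguments;
the "1:" label and xsave_cntxt_size (a file-scope variable) remain the only
outside names it relies on:

    #define XRSTOR_FIXUP(p, lm, hm)                                    \
            ".section .fixup,\"ax\"          \n"                       \
            "2: mov %[size],%%ecx            \n"                       \
            "   xor %[lmask_out],%[lmask_out]\n"                       \
            "   rep stosb                    \n"                       \
            "   lea %[mem],%[ptr]            \n"                       \
            "   mov %[lmask_in],%[lmask_out] \n"                       \
            "   jmp 1b                       \n"                       \
            ".previous                       \n"                       \
            _ASM_EXTABLE(1b, 2b)                                       \
            : [ptr] "+&D" (p), [lmask_out] "+&a" (lm)                  \
            : [mem] "m" (*(p)), [lmask_in] "g" (lm), [hmask] "d" (hm), \
              [size] "m" (xsave_cntxt_size)                            \
            : "ecx"

    /* Unlike a macro that hard-codes ptr/lmask/hmask, the caller now
     * decides which variables to pass: */
    asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
                   XRSTOR_FIXUP(ptr, lmask, hmask) );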

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-11-23 11:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-20  1:17 [V11 0/3] add xsaves/xrstors support Shuai Ruan
2015-11-20  1:18 ` [V11 1/3] x86/xsaves: enable xsaves/xrstors/xsavec in xen Shuai Ruan
2015-11-20 11:35   ` Jan Beulich
2015-11-23  5:22     ` Shuai Ruan
     [not found]     ` <20151123052255.GA12378@shuai.ruan@linux.intel.com>
2015-11-23 10:06       ` Jan Beulich
2015-11-23 11:06         ` Shuai Ruan
     [not found]         ` <20151123110624.GA13622@shuai.ruan@linux.intel.com>
2015-11-23 11:24           ` Jan Beulich
2015-11-20  1:18 ` [V11 2/3] x86/xsaves: enable xsaves/xrstors for hvm guest Shuai Ruan
2015-11-20  1:18 ` [V11 3/3] libxc: expose xsaves/xgetbv1/xsavec to " Shuai Ruan
