* [PATCH v3 0/5] x86: improve PDX <-> PFN and alike translations
       [not found]       ` <5AA7D98302000078001DD5F0@prv1-mh.provo.novell.com>
@ 2018-08-17  7:06         ` Jan Beulich
  2018-08-17  7:20           ` [PATCH v3 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h Jan Beulich
                             ` (4 more replies)
  2018-09-18 12:34         ` [PATCH v4 0/4] x86: improve PDX <-> PFN and alike translations Jan Beulich
  1 sibling, 5 replies; 24+ messages in thread
From: Jan Beulich @ 2018-08-17  7:06 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

1: remove page.h and processor.h inclusion from asm_defns.h
2: use PDEP/PEXT for maddr/direct-map-offset conversion when available
3: use PDEP/PEXT for PFN/PDX conversion when available
4: use MOV for PFN/PDX conversion when possible
5: use PDEP for PTE flags insertion when available

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Re-order series. Re-base.
v2: Extensive re-base.




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h
  2018-08-17  7:06         ` [PATCH v3 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
@ 2018-08-17  7:20           ` Jan Beulich
  2018-08-17  8:39             ` Andrew Cooper
  2018-08-17  7:21           ` [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available Jan Beulich
                             ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-17  7:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Subsequent changes require this (too wide anyway imo) dependency to be
dropped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -5,6 +5,7 @@
 #include <asm/desc.h>
 #include <asm/fixmap.h>
 #include <asm/page.h>
+#include <asm/processor.h>
 #include <asm/msr.h>
 #include <asm/cpufeature.h>
 #include <public/elfnote.h>
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -9,6 +9,7 @@
 #include <asm/asm_defns.h>
 #include <asm/apicdef.h>
 #include <asm/page.h>
+#include <asm/processor.h>
 #include <asm/desc.h>
 #include <public/xen.h>
 #include <irq_vectors.h>
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -11,6 +11,7 @@
 #include <asm/asm_defns.h>
 #include <asm/apicdef.h>
 #include <asm/page.h>
+#include <asm/processor.h>
 #include <public/xen.h>
 #include <irq_vectors.h>
 
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -7,9 +7,8 @@
 #include <asm/asm-offsets.h>
 #endif
 #include <asm/bug.h>
-#include <asm/page.h>
-#include <asm/processor.h>
 #include <asm/percpu.h>
+#include <asm/x86-defns.h>
 #include <xen/stringify.h>
 #include <asm/cpufeature.h>
 #include <asm/alternative.h>
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -59,6 +59,7 @@ int init_domain_cpuid_policy(struct doma
 /* Clamp the CPUID policy to reality. */
 void recalculate_cpuid_policy(struct domain *d);
 
+struct vcpu;
 void guest_cpuid(const struct vcpu *v, uint32_t leaf,
                  uint32_t subleaf, struct cpuid_leaf *res);
 
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -13,6 +13,7 @@
 
 #include <asm/asm_defns.h>
 #include <asm/cpufeature.h>
+#include <asm/processor.h>
 
 #define rdmsr(msr,val1,val2) \
      __asm__ __volatile__("rdmsr" \






* [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-08-17  7:06         ` [PATCH v3 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
  2018-08-17  7:20           ` [PATCH v3 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h Jan Beulich
@ 2018-08-17  7:21           ` Jan Beulich
  2018-08-17  8:59             ` Andrew Cooper
  2018-08-17  7:22           ` [PATCH v3 3/5] x86: use PDEP/PEXT for PFN/PDX " Jan Beulich
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-17  7:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

This allows folding 6 instructions into a single one, reducing code size
quite a bit, especially when not considering the fallback functions
(which won't ever need to be brought into iCache or their mappings into
iTLB on systems supporting BMI2).

Make use of gcc's new V operand modifier, even if that results in a
slightly odd dependency in the sources (but I also didn't want to
introduce yet another manifest constant).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Move infrastructure pieces here from "x86: use PDEP for PTE flags
    insertion when available".
v2: Avoid quoted symbols; use gcc's new V operand modifier instead.
    Re-base.

--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -224,6 +224,12 @@ void init_or_livepatch apply_alternative
         /* 0xe8/0xe9 are relative branches; fix the offset. */
         if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
             *(int32_t *)(buf + 1) += repl - orig;
+        /* RIP-relative addressing is easy to check for in VEX-encoded insns. */
+        else if ( a->repl_len >= 8 &&
+                  (*buf & ~1) == 0xc4 &&
+                  a->repl_len >= 9 - (*buf & 1) &&
+                  (buf[4 - (*buf & 1)] & ~0x38) == 0x05 )
+            *(int32_t *)(buf + 5 - (*buf & 1)) += repl - orig;
 
         add_nops(buf + a->repl_len, total_len - a->repl_len);
         text_poke(orig, buf, total_len);
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -390,6 +390,25 @@ void __init arch_init_memory(void)
 #endif
 }
 
+paddr_t __read_mostly ma_real_mask = ~0UL;
+
+#ifndef CONFIG_INDIRECT_THUNK /* V modifier unavailable? */
+
+/* Conversion between machine address and direct map offset. */
+paddr_t do2ma(unsigned long off)
+{
+    return (off & ma_va_bottom_mask) |
+           ((off << pfn_pdx_hole_shift) & ma_top_mask);
+}
+
+unsigned long ma2do(paddr_t ma)
+{
+    return (ma & ma_va_bottom_mask) |
+           ((ma & ma_top_mask) >> pfn_pdx_hole_shift);
+}
+
+#endif
+
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 {
     uint64_t maddr = pfn_to_paddr(mfn);
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -450,6 +450,8 @@ void __init srat_parse_regions(u64 addr)
 	}
 
 	pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
+
+	ma_real_mask = ma_top_mask | ma_va_bottom_mask;
 }
 
 /* Use the information discovered above to actually set up the nodes. */
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -66,6 +66,7 @@ SECTIONS
         _stext = .;            /* Text and read-only data */
        *(.text)
        *(.text.__x86_indirect_thunk_*)
+       *(.gnu.linkonce.t.*)
        *(.text.page_aligned)
 
        . = ALIGN(PAGE_SIZE);
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -186,6 +186,20 @@ void ret_from_intr(void);
         UNLIKELY_END_SECTION "\n"          \
         ".Llikely." #tag ".%=:"
 
+#define LINKONCE_PROLOGUE(sym)                    \
+        ".ifndef " sym() "\n\t"                   \
+        ".pushsection " sym(.gnu.linkonce.t.) "," \
+                      "\"ax\",@progbits\n\t"      \
+        ".p2align 4\n"                            \
+        sym() ":"
+
+#define LINKONCE_EPILOGUE(sym)                    \
+        ".weak " sym() "\n\t"                     \
+        ".type " sym() ", @function\n\t"          \
+        ".size " sym() ", . - " sym() "\n\t"      \
+        ".popsection\n\t"                         \
+        ".endif"
+
 #endif
 
 /* "Raw" instruction opcodes */
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -42,11 +42,18 @@ static inline unsigned long canonicalise
         return addr & ~CANONICAL_MASK;
 }
 
+#include <asm/alternative.h>
+#include <asm/asm_defns.h>
+#include <asm/cpufeature.h>
 #include <asm/types.h>
 
 #include <xen/pdx.h>
 
 extern unsigned long xen_virt_end;
+extern paddr_t ma_real_mask;
+
+paddr_t do2ma(unsigned long);
+unsigned long ma2do(paddr_t);
 
 /*
  * Note: These are solely for the use by page_{get,set}_owner(), and
@@ -57,8 +64,10 @@ extern unsigned long xen_virt_end;
 #define pdx_to_virt(pdx) ((void *)(DIRECTMAP_VIRT_START + \
                                    ((unsigned long)(pdx) << PAGE_SHIFT)))
 
-static inline unsigned long __virt_to_maddr(unsigned long va)
+static always_inline paddr_t __virt_to_maddr(unsigned long va)
 {
+    paddr_t ma;
+
     ASSERT(va < DIRECTMAP_VIRT_END);
     if ( va >= DIRECTMAP_VIRT_START )
         va -= DIRECTMAP_VIRT_START;
@@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma
 
         va += xen_phys_start - XEN_VIRT_START;
     }
-    return (va & ma_va_bottom_mask) |
-           ((va << pfn_pdx_hole_shift) & ma_top_mask);
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
+    alternative_io("call " SYMNAME() "\n\t"
+                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                   "mov %[shift], %%ecx\n\t"
+                   "mov %[off], %[ma]\n\t"
+                   "and %[bmask], %[ma]\n\t"
+                   "shl %%cl, %[off]\n\t"
+                   "and %[tmask], %[off]\n\t"
+                   "or %[off], %[ma]\n\t"
+                   "ret\n\t"
+                   LINKONCE_EPILOGUE(SYMNAME),
+                   "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
+                   ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
+                   [mask] "m" (ma_real_mask),
+                   [shift] "m" (pfn_pdx_hole_shift),
+                   [bmask] "m" (ma_va_bottom_mask),
+                   [tmask] "m" (ma_top_mask)
+                   : "ecx");
+#undef SYMNAME
+#else
+    alternative_io("call do2ma",
+                   /* pdep ma_real_mask(%rip), %rdi, %rax */
+                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
+                   ".long ma_real_mask - 4 - .",
+                   X86_FEATURE_BMI2,
+                   ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
+                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return ma;
 }
 
-static inline void *__maddr_to_virt(unsigned long ma)
+static always_inline void *__maddr_to_virt(paddr_t ma)
 {
+    unsigned long off;
+
     ASSERT(pfn_to_pdx(ma >> PAGE_SHIFT) < (DIRECTMAP_SIZE >> PAGE_SHIFT));
-    return (void *)(DIRECTMAP_VIRT_START +
-                    ((ma & ma_va_bottom_mask) |
-                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "ma2do_%V[off]_%V[ma]"
+    alternative_io("call " SYMNAME() "\n\t"
+                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                   "mov %[tmask], %[off]\n\t"
+                   "mov %[shift], %%ecx\n\t"
+                   "and %[ma], %[off]\n\t"
+                   "and %[bmask], %[ma]\n\t"
+                   "shr %%cl, %[off]\n\t"
+                   "or %[ma], %[off]\n\t"
+                   "ret\n\t"
+                   LINKONCE_EPILOGUE(SYMNAME),
+                   "pext %[mask], %[ma], %[off]", X86_FEATURE_BMI2,
+                   ASM_OUTPUT2([off] "=&r" (off), [ma] "+r" (ma)),
+                   [mask] "m" (ma_real_mask),
+                   [shift] "m" (pfn_pdx_hole_shift),
+                   [bmask] "m" (ma_va_bottom_mask),
+                   [tmask] "m" (ma_top_mask)
+                   : "ecx");
+#undef SYMNAME
+#else
+    alternative_io("call ma2do",
+                   /* pext ma_real_mask(%rip), %rdi, %rax */
+                   ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
+                   ".long ma_real_mask - 4 - .",
+                   X86_FEATURE_BMI2,
+                   ASM_OUTPUT2("=a" (off), "+D" (ma)), "m" (ma_real_mask)
+                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return (void *)DIRECTMAP_VIRT_START + off;
 }
 
 /* read access (should only be used for debug printk's) */





* [PATCH v3 3/5] x86: use PDEP/PEXT for PFN/PDX conversion when available
  2018-08-17  7:06         ` [PATCH v3 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
  2018-08-17  7:20           ` [PATCH v3 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h Jan Beulich
  2018-08-17  7:21           ` [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available Jan Beulich
@ 2018-08-17  7:22           ` Jan Beulich
  2018-08-17  7:23           ` [PATCH v3 4/5] x86: use MOV for PFN/PDX conversion when possible Jan Beulich
  2018-08-17  7:24           ` [PATCH v3 5/5] x86: use PDEP for PTE flags insertion when available Jan Beulich
  4 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-08-17  7:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Both replace 6 instructions by a single one, further reducing code size,
cache, and TLB footprint (in particular on systems supporting BMI2).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Avoid quoted symbols; use gcc's new V operand modifier instead.
    Re-base.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -391,6 +391,7 @@ void __init arch_init_memory(void)
 }
 
 paddr_t __read_mostly ma_real_mask = ~0UL;
+unsigned long __read_mostly pfn_real_mask = ~0UL;
 
 #ifndef CONFIG_INDIRECT_THUNK /* V modifier unavailable? */
 
@@ -407,6 +408,17 @@ unsigned long ma2do(paddr_t ma)
            ((ma & ma_top_mask) >> pfn_pdx_hole_shift);
 }
 
+/* Conversion between PDX and PFN. */
+unsigned long pdx2pfn(unsigned long pdx)
+{
+    return generic_pdx_to_pfn(pdx);
+}
+
+unsigned long pfn2pdx(unsigned long pfn)
+{
+    return generic_pfn_to_pdx(pfn);
+}
+
 #endif
 
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -452,6 +452,7 @@ void __init srat_parse_regions(u64 addr)
 	pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
 
 	ma_real_mask = ma_top_mask | ma_va_bottom_mask;
+	pfn_real_mask = pfn_top_mask | pfn_pdx_bottom_mask;
 }
 
 /* Use the information discovered above to actually set up the nodes. */
--- /dev/null
+++ b/xen/include/asm-arm/pdx.h
@@ -0,0 +1,16 @@
+#ifndef __ASM_ARM_PDX_H__
+#define __ASM_ARM_PDX_H__
+
+#define pdx_to_pfn generic_pdx_to_pfn
+#define pfn_to_pdx generic_pfn_to_pdx
+
+#endif /* __ASM_ARM_PDX_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- /dev/null
+++ b/xen/include/asm-x86/pdx.h
@@ -0,0 +1,93 @@
+#ifndef __ASM_X86_PDX_H__
+#define __ASM_X86_PDX_H__
+
+#include <asm/alternative.h>
+#include <asm/asm_defns.h>
+#include <asm/cpufeature.h>
+
+extern unsigned long pfn_real_mask;
+
+static always_inline unsigned long pdx_to_pfn(unsigned long pdx)
+{
+    unsigned long pfn;
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "pdx2pfn_%V[pfn]_%V[pdx]"
+    alternative_io("call " SYMNAME() "\n\t"
+                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                   "mov %[shift], %%ecx\n\t"
+                   "mov %[pdx], %[pfn]\n\t"
+                   "and %[bmask], %[pfn]\n\t"
+                   "shl %%cl, %[pdx]\n\t"
+                   "and %[tmask], %[pdx]\n\t"
+                   "or %[pdx], %[pfn]\n\t"
+                   "ret\n\t"
+                   LINKONCE_EPILOGUE(SYMNAME),
+                   "pdep %[mask], %[pdx], %[pfn]", X86_FEATURE_BMI2,
+                   ASM_OUTPUT2([pfn] "=&r" (pfn), [pdx] "+r" (pdx)),
+                   [mask] "m" (pfn_real_mask),
+                   [shift] "m" (pfn_pdx_hole_shift),
+                   [bmask] "m" (pfn_pdx_bottom_mask),
+                   [tmask] "m" (pfn_top_mask)
+                   : "ecx");
+#undef SYMNAME
+#else
+    alternative_io("call pdx2pfn",
+                   /* pdep pfn_real_mask(%rip), %rdi, %rax */
+                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
+                   ".long pfn_real_mask - 4 - .",
+                   X86_FEATURE_BMI2,
+                   ASM_OUTPUT2("=a" (pfn), "+D" (pdx)), "m" (pfn_real_mask)
+                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return pfn;
+}
+
+static always_inline unsigned long pfn_to_pdx(unsigned long pfn)
+{
+    unsigned long pdx;
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "pfn2pdx_%V[pdx]_%V[pfn]"
+    alternative_io("call " SYMNAME() "\n\t"
+                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                   "mov %[tmask], %[pdx]\n\t"
+                   "mov %[shift], %%ecx\n\t"
+                   "and %[pfn], %[pdx]\n\t"
+                   "and %[bmask], %[pfn]\n\t"
+                   "shr %%cl, %[pdx]\n\t"
+                   "or %[pfn], %[pdx]\n\t"
+                   "ret\n\t"
+                   LINKONCE_EPILOGUE(SYMNAME),
+                   "pext %[mask], %[pfn], %[pdx]", X86_FEATURE_BMI2,
+                   ASM_OUTPUT2([pdx] "=&r" (pdx), [pfn] "+r" (pfn)),
+                   [mask] "m" (pfn_real_mask),
+                   [shift] "m" (pfn_pdx_hole_shift),
+                   [bmask] "m" (pfn_pdx_bottom_mask),
+                   [tmask] "m" (pfn_top_mask)
+                   : "ecx");
+#undef SYMNAME
+#else
+    alternative_io("call pfn2pdx",
+                   /* pext pfn_real_mask(%rip), %rdi, %rax */
+                   ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
+                   ".long pfn_real_mask - 4 - .",
+                   X86_FEATURE_BMI2,
+                   ASM_OUTPUT2("=a" (pdx), "+D" (pfn)), "m" (pfn_real_mask)
+                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return pdx;
+}
+
+#endif /* __ASM_X86_PDX_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -23,13 +23,13 @@ extern void set_pdx_range(unsigned long
 
 bool __mfn_valid(unsigned long mfn);
 
-static inline unsigned long pfn_to_pdx(unsigned long pfn)
+static inline unsigned long generic_pfn_to_pdx(unsigned long pfn)
 {
     return (pfn & pfn_pdx_bottom_mask) |
            ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
 }
 
-static inline unsigned long pdx_to_pfn(unsigned long pdx)
+static inline unsigned long generic_pdx_to_pfn(unsigned long pdx)
 {
     return (pdx & pfn_pdx_bottom_mask) |
            ((pdx << pfn_pdx_hole_shift) & pfn_top_mask);
@@ -40,6 +40,8 @@ static inline unsigned long pdx_to_pfn(u
 
 extern void pfn_pdx_hole_setup(unsigned long);
 
+#include <asm/pdx.h>
+
 #endif /* HAS_PDX */
 #endif /* __XEN_PDX_H__ */
 





* [PATCH v3 4/5] x86: use MOV for PFN/PDX conversion when possible
  2018-08-17  7:06         ` [PATCH v3 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
                             ` (2 preceding siblings ...)
  2018-08-17  7:22           ` [PATCH v3 3/5] x86: use PDEP/PEXT for PFN/PDX " Jan Beulich
@ 2018-08-17  7:23           ` Jan Beulich
  2018-08-17  8:41             ` Andrew Cooper
  2018-08-17  7:24           ` [PATCH v3 5/5] x86: use PDEP for PTE flags insertion when available Jan Beulich
  4 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-17  7:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

... and (of course) also maddr / direct-map-offset ones.

Most x86 systems don't actually require the use of PDX compression. Now
that we have patching for the conversion code in place anyway, extend it
to use simple MOV when possible. Introduce a new pseudo-CPU-feature to
key the patching off of.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
The indentation of the alternative_io_2() uses below ends up wrong, but
re-indenting here seems undesirable. I could either add a follow-up
re-formatting patch, or simply slightly bodge the indentation in the
earlier patches, for it to come out right here.
---
v3: Re-base.
v2: Avoid quoted symbols; use gcc's new V operand modifier instead.
    Re-base.

--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1439,6 +1439,9 @@ void __init noreturn __start_xen(unsigne
 
     numa_initmem_init(0, raw_max_page);
 
+    if ( !pfn_pdx_hole_shift )
+        setup_force_cpu_cap(X86_FEATURE_PFN_PDX_IDENT);
+
     if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
     {
         unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -33,3 +33,4 @@ XEN_CPUFEATURE(SC_RSB_HVM,      (FSCAPIN
 XEN_CPUFEATURE(NO_XPTI,         (FSCAPINTS+0)*32+20) /* XPTI mitigation not in use */
 XEN_CPUFEATURE(SC_MSR_IDLE,     (FSCAPINTS+0)*32+21) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
 XEN_CPUFEATURE(XEN_LBR,         (FSCAPINTS+0)*32+22) /* Xen uses MSR_DEBUGCTL.LBR */
+XEN_CPUFEATURE(PFN_PDX_IDENT,   (FSCAPINTS+0)*32+23) /* PFN <-> PDX mapping is 1:1 */
--- a/xen/include/asm-x86/pdx.h
+++ b/xen/include/asm-x86/pdx.h
@@ -13,7 +13,7 @@ static always_inline unsigned long pdx_t
 
 #ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
 #define SYMNAME(pfx...) #pfx "pdx2pfn_%V[pfn]_%V[pdx]"
-    alternative_io("call " SYMNAME() "\n\t"
+    alternative_io_2("call " SYMNAME() "\n\t"
                    LINKONCE_PROLOGUE(SYMNAME) "\n\t"
                    "mov %[shift], %%ecx\n\t"
                    "mov %[pdx], %[pfn]\n\t"
@@ -24,6 +24,7 @@ static always_inline unsigned long pdx_t
                    "ret\n\t"
                    LINKONCE_EPILOGUE(SYMNAME),
                    "pdep %[mask], %[pdx], %[pfn]", X86_FEATURE_BMI2,
+                   "mov %[pdx], %[pfn]", X86_FEATURE_PFN_PDX_IDENT,
                    ASM_OUTPUT2([pfn] "=&r" (pfn), [pdx] "+r" (pdx)),
                    [mask] "m" (pfn_real_mask),
                    [shift] "m" (pfn_pdx_hole_shift),
@@ -32,11 +33,12 @@ static always_inline unsigned long pdx_t
                    : "ecx");
 #undef SYMNAME
 #else
-    alternative_io("call pdx2pfn",
+    alternative_io_2("call pdx2pfn",
                    /* pdep pfn_real_mask(%rip), %rdi, %rax */
                    ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
                    ".long pfn_real_mask - 4 - .",
                    X86_FEATURE_BMI2,
+                   "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                    ASM_OUTPUT2("=a" (pfn), "+D" (pdx)), "m" (pfn_real_mask)
                    : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif
@@ -50,7 +52,7 @@ static always_inline unsigned long pfn_t
 
 #ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
 #define SYMNAME(pfx...) #pfx "pfn2pdx_%V[pdx]_%V[pfn]"
-    alternative_io("call " SYMNAME() "\n\t"
+    alternative_io_2("call " SYMNAME() "\n\t"
                    LINKONCE_PROLOGUE(SYMNAME) "\n\t"
                    "mov %[tmask], %[pdx]\n\t"
                    "mov %[shift], %%ecx\n\t"
@@ -61,6 +63,7 @@ static always_inline unsigned long pfn_t
                    "ret\n\t"
                    LINKONCE_EPILOGUE(SYMNAME),
                    "pext %[mask], %[pfn], %[pdx]", X86_FEATURE_BMI2,
+                   "mov %[pfn], %[pdx]", X86_FEATURE_PFN_PDX_IDENT,
                    ASM_OUTPUT2([pdx] "=&r" (pdx), [pfn] "+r" (pfn)),
                    [mask] "m" (pfn_real_mask),
                    [shift] "m" (pfn_pdx_hole_shift),
@@ -69,11 +72,12 @@ static always_inline unsigned long pfn_t
                    : "ecx");
 #undef SYMNAME
 #else
-    alternative_io("call pfn2pdx",
+    alternative_io_2("call pfn2pdx",
                    /* pext pfn_real_mask(%rip), %rdi, %rax */
                    ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
                    ".long pfn_real_mask - 4 - .",
                    X86_FEATURE_BMI2,
+                   "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                    ASM_OUTPUT2("=a" (pdx), "+D" (pfn)), "m" (pfn_real_mask)
                    : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -83,7 +83,7 @@ static always_inline paddr_t __virt_to_m
 
 #ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
 #define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
-    alternative_io("call " SYMNAME() "\n\t"
+    alternative_io_2("call " SYMNAME() "\n\t"
                    LINKONCE_PROLOGUE(SYMNAME) "\n\t"
                    "mov %[shift], %%ecx\n\t"
                    "mov %[off], %[ma]\n\t"
@@ -94,6 +94,7 @@ static always_inline paddr_t __virt_to_m
                    "ret\n\t"
                    LINKONCE_EPILOGUE(SYMNAME),
                    "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
+                   "mov %[off], %[ma]", X86_FEATURE_PFN_PDX_IDENT,
                    ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
                    [mask] "m" (ma_real_mask),
                    [shift] "m" (pfn_pdx_hole_shift),
@@ -102,11 +103,12 @@ static always_inline paddr_t __virt_to_m
                    : "ecx");
 #undef SYMNAME
 #else
-    alternative_io("call do2ma",
+    alternative_io_2("call do2ma",
                    /* pdep ma_real_mask(%rip), %rdi, %rax */
                    ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
                    ".long ma_real_mask - 4 - .",
                    X86_FEATURE_BMI2,
+                   "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                    ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
                    : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif
@@ -122,7 +124,7 @@ static always_inline void *__maddr_to_vi
 
 #ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
 #define SYMNAME(pfx...) #pfx "ma2do_%V[off]_%V[ma]"
-    alternative_io("call " SYMNAME() "\n\t"
+    alternative_io_2("call " SYMNAME() "\n\t"
                    LINKONCE_PROLOGUE(SYMNAME) "\n\t"
                    "mov %[tmask], %[off]\n\t"
                    "mov %[shift], %%ecx\n\t"
@@ -133,6 +135,7 @@ static always_inline void *__maddr_to_vi
                    "ret\n\t"
                    LINKONCE_EPILOGUE(SYMNAME),
                    "pext %[mask], %[ma], %[off]", X86_FEATURE_BMI2,
+                   "mov %[ma], %[off]", X86_FEATURE_PFN_PDX_IDENT,
                    ASM_OUTPUT2([off] "=&r" (off), [ma] "+r" (ma)),
                    [mask] "m" (ma_real_mask),
                    [shift] "m" (pfn_pdx_hole_shift),
@@ -141,11 +144,12 @@ static always_inline void *__maddr_to_vi
                    : "ecx");
 #undef SYMNAME
 #else
-    alternative_io("call ma2do",
+    alternative_io_2("call ma2do",
                    /* pext ma_real_mask(%rip), %rdi, %rax */
                    ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
                    ".long ma_real_mask - 4 - .",
                    X86_FEATURE_BMI2,
+                   "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                    ASM_OUTPUT2("=a" (off), "+D" (ma)), "m" (ma_real_mask)
                    : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif





* [PATCH v3 5/5] x86: use PDEP for PTE flags insertion when available
  2018-08-17  7:06         ` [PATCH v3 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
                             ` (3 preceding siblings ...)
  2018-08-17  7:23           ` [PATCH v3 4/5] x86: use MOV for PFN/PDX conversion when possible Jan Beulich
@ 2018-08-17  7:24           ` Jan Beulich
  2018-08-18  1:08             ` Rich Persaud
  4 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-17  7:24 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

This replaces 5 instructions by a single one, further reducing code size,
cache, and TLB footprint (in particular on systems supporting BMI2).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Irrespective of the note regarding a possible alternative route, I think
the change here is an improvement until someone takes on investigating
whether the proposed dropping of the 24-bit flags representation would be
a worthwhile improvement of generated code.
---
v3: Move this patch last in series. Re-base.
v2: Avoid quoted symbols; use gcc's new V operand modifier instead.
    Re-base.
---
TBD: Also change get_pte_flags() (after having introduced test_pte_flags())?

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -393,6 +393,8 @@ void __init arch_init_memory(void)
 paddr_t __read_mostly ma_real_mask = ~0UL;
 unsigned long __read_mostly pfn_real_mask = ~0UL;
 
+const intpte_t pte_flags_mask = ~(PADDR_MASK & PAGE_MASK);
+
 #ifndef CONFIG_INDIRECT_THUNK /* V modifier unavailable? */
 
 /* Conversion between machine address and direct map offset. */
@@ -419,6 +421,11 @@ unsigned long pfn2pdx(unsigned long pfn)
     return generic_pfn_to_pdx(pfn);
 }
 
+intpte_t put_pte_flags_v(unsigned int flags)
+{
+    return put_pte_flags_c(flags);
+}
+
 #endif
 
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -205,15 +205,53 @@ typedef l4_pgentry_t root_pgentry_t;
 
 /* Extract flags into 24-bit integer, or turn 24-bit flags into a pte mask. */
 #ifndef __ASSEMBLY__
+extern const intpte_t pte_flags_mask;
+intpte_t __attribute_const__ put_pte_flags_v(unsigned int x);
+
 static inline unsigned int get_pte_flags(intpte_t x)
 {
     return ((x >> 40) & ~0xfff) | (x & 0xfff);
 }
 
-static inline intpte_t put_pte_flags(unsigned int x)
+static inline intpte_t put_pte_flags_c(unsigned int x)
 {
     return (((intpte_t)x & ~0xfff) << 40) | (x & 0xfff);
 }
+
+static always_inline intpte_t put_pte_flags(unsigned int x)
+{
+    intpte_t pte;
+
+    if ( __builtin_constant_p(x) )
+        return put_pte_flags_c(x);
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "put_pte_flags_%V[pte]_%V[flags]"
+    alternative_io("call " SYMNAME() "\n\t"
+                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                   "mov %[flags], %k[pte]\n\t"
+                   "and $0xfff000, %[flags]\n\t"
+                   "and $0x000fff, %k[pte]\n\t"
+                   "shl $40, %q[flags]\n\t"
+                   "or %q[flags], %[pte]\n\t"
+                   "ret\n\t"
+                   LINKONCE_EPILOGUE(SYMNAME),
+                   "pdep %[mask], %q[flags], %[pte]", X86_FEATURE_BMI2,
+                   ASM_OUTPUT2([pte] "=&r" (pte), [flags] "+r" (x)),
+                   [mask] "m" (pte_flags_mask));
+#undef SYMNAME
+#else
+    alternative_io("call put_pte_flags_v",
+                   /* pdep pte_flags_mask(%rip), %rdi, %rax */
+                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
+                   ".long pte_flags_mask - 4 - .",
+                   X86_FEATURE_BMI2,
+                   ASM_OUTPUT2("=a" (pte), "+D" (x)), "m" (pte_flags_mask)
+                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return pte;
+}
 #endif
 
 /*

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h
  2018-08-17  7:20           ` [PATCH v3 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h Jan Beulich
@ 2018-08-17  8:39             ` Andrew Cooper
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2018-08-17  8:39 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 17/08/2018 08:20, Jan Beulich wrote:
> Subsequent changes require this (too wide anyway imo) dependency to be
> dropped.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v3 4/5] x86: use MOV for PFN/PDX conversion when possible
  2018-08-17  7:23           ` [PATCH v3 4/5] x86: use MOV for PFN/PDX conversion when possible Jan Beulich
@ 2018-08-17  8:41             ` Andrew Cooper
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2018-08-17  8:41 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 17/08/2018 08:23, Jan Beulich wrote:
> ... and (of course) also maddr / direct-map-offset ones.
>
> Most x86 systems don't actually require the use of PDX compression. Now
> that we have patching for the conversion code in place anyway, extend it
> to use simple MOV when possible. Introduce a new pseudo-CPU-feature to
> key the patching off of.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> The indentation of the alternative_io_2() uses below ends up wrong, but
> re-indenting here seems undesirable. I could either add a follow-up
> re-formatting patch, or simply slightly bodge the indentation in the
> earlier patches, for it to come out right here.

I've used the "bodge the indentation in the earlier patches" trick
before, and it is probably the cleanest approach.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>


* Re: [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-08-17  7:21           ` [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available Jan Beulich
@ 2018-08-17  8:59             ` Andrew Cooper
  2018-08-17  9:38               ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2018-08-17  8:59 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 17/08/2018 08:21, Jan Beulich wrote:
> --- a/xen/include/asm-x86/asm_defns.h
> +++ b/xen/include/asm-x86/asm_defns.h
> @@ -186,6 +186,20 @@ void ret_from_intr(void);
>          UNLIKELY_END_SECTION "\n"          \
>          ".Llikely." #tag ".%=:"
>  
> +#define LINKONCE_PROLOGUE(sym)                    \
> +        ".ifndef " sym() "\n\t"                   \
> +        ".pushsection " sym(.gnu.linkonce.t.) "," \

This definitely warrants a comment and a change of name, seeing as sym
> isn't a symbol.  It's a macro which gives you a string back.

> +                      "\"ax\",@progbits\n\t"      \
> +        ".p2align 4\n"                            \
> +        sym() ":"
> +
> +#define LINKONCE_EPILOGUE(sym)                    \
> +        ".weak " sym() "\n\t"                     \
> +        ".type " sym() ", @function\n\t"          \
> +        ".size " sym() ", . - " sym() "\n\t"      \
> +        ".popsection\n\t"                         \
> +        ".endif"
> +
>  #endif
>  
>  /* "Raw" instruction opcodes */
> --- a/xen/include/asm-x86/x86_64/page.h
> +++ b/xen/include/asm-x86/x86_64/page.h
> @@ -57,8 +64,10 @@ extern unsigned long xen_virt_end;
>  #define pdx_to_virt(pdx) ((void *)(DIRECTMAP_VIRT_START + \
>                                     ((unsigned long)(pdx) << PAGE_SHIFT)))
>  
> -static inline unsigned long __virt_to_maddr(unsigned long va)
> +static always_inline paddr_t __virt_to_maddr(unsigned long va)
>  {
> +    paddr_t ma;
> +
>      ASSERT(va < DIRECTMAP_VIRT_END);
>      if ( va >= DIRECTMAP_VIRT_START )
>          va -= DIRECTMAP_VIRT_START;
> @@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma
>  
>          va += xen_phys_start - XEN_VIRT_START;
>      }
> -    return (va & ma_va_bottom_mask) |
> -           ((va << pfn_pdx_hole_shift) & ma_top_mask);
> +
> +#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
> +#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
> +    alternative_io("call " SYMNAME() "\n\t"
> +                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
> +                   "mov %[shift], %%ecx\n\t"
> +                   "mov %[off], %[ma]\n\t"
> +                   "and %[bmask], %[ma]\n\t"
> +                   "shl %%cl, %[off]\n\t"
> +                   "and %[tmask], %[off]\n\t"
> +                   "or %[off], %[ma]\n\t"
> +                   "ret\n\t"
> +                   LINKONCE_EPILOGUE(SYMNAME),
> +                   "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,

The compiler understanding V doesn't imply that the assembler
understands pdep.

> +                   ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
> +                   [mask] "m" (ma_real_mask),
> +                   [shift] "m" (pfn_pdx_hole_shift),
> +                   [bmask] "m" (ma_va_bottom_mask),
> +                   [tmask] "m" (ma_top_mask)
> +                   : "ecx");
> +#undef SYMNAME
> +#else
> +    alternative_io("call do2ma",
> +                   /* pdep ma_real_mask(%rip), %rdi, %rax */
> +                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
> +                   ".long ma_real_mask - 4 - .",
> +                   X86_FEATURE_BMI2,
> +                   ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
> +                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
> +#endif

This is a massive clobber list in a function you've forced always
inline, and I can't see it doing nice things to the callsites.  TBH,
this still feels over-complicated for what it wants to be.

Why not implement one single function in assembly that doesn't have
usual C calling conventions and can clobber %ecx and one other, and use
that?

It avoids the need for potentially 256 almost-identical copies of the
function in the linkonce section, and avoids having the multiple
implementations in C/asm, avoids the need for any logic derived from
CONFIG_INDIRECT_THUNK, and avoids the need for massive clobber lists.

~Andrew


* Re: [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-08-17  8:59             ` Andrew Cooper
@ 2018-08-17  9:38               ` Jan Beulich
  2018-09-07 16:17                 ` Andrew Cooper
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-08-17  9:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 17.08.18 at 10:59, <andrew.cooper3@citrix.com> wrote:
> On 17/08/2018 08:21, Jan Beulich wrote:
>> --- a/xen/include/asm-x86/asm_defns.h
>> +++ b/xen/include/asm-x86/asm_defns.h
>> @@ -186,6 +186,20 @@ void ret_from_intr(void);
>>          UNLIKELY_END_SECTION "\n"          \
>>          ".Llikely." #tag ".%=:"
>>  
>> +#define LINKONCE_PROLOGUE(sym)                    \
>> +        ".ifndef " sym() "\n\t"                   \
>> +        ".pushsection " sym(.gnu.linkonce.t.) "," \
> 
> This definitely warrants a comment and a change of name, seeing as sym
> isn't a symbol.  Its a macro which gives you a string back.

It's still a symbol name; don't forget that in the end section names
are symbol names too. I'd have gone with a better name if I had
any better idea. It's also not really clear to me what you want a
comment to say here.

>> @@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma
>>  
>>          va += xen_phys_start - XEN_VIRT_START;
>>      }
>> -    return (va & ma_va_bottom_mask) |
>> -           ((va << pfn_pdx_hole_shift) & ma_top_mask);
>> +
>> +#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
>> +#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
>> +    alternative_io("call " SYMNAME() "\n\t"
>> +                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
>> +                   "mov %[shift], %%ecx\n\t"
>> +                   "mov %[off], %[ma]\n\t"
>> +                   "and %[bmask], %[ma]\n\t"
>> +                   "shl %%cl, %[off]\n\t"
>> +                   "and %[tmask], %[off]\n\t"
>> +                   "or %[off], %[ma]\n\t"
>> +                   "ret\n\t"
>> +                   LINKONCE_EPILOGUE(SYMNAME),
>> +                   "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
> 
> The compiler understanding V doesn't imply that the assembler
> understands pdep

I've been considering this case, and in fact in an early version I had
a -DHAVE_AS_BMI2 addition. But I then decided to drop it, as I couldn't
really imagine V support to be backported to gcc pre-dating BMI2
support in gas (available as of 2.22, i.e. gcc 4.5 / 4.6 timeframe).
But yes, if you really think we need to cope with that, I could re-add
this. Just let me know if you indeed think this is needed.

>> +                   ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
>> +                   [mask] "m" (ma_real_mask),
>> +                   [shift] "m" (pfn_pdx_hole_shift),
>> +                   [bmask] "m" (ma_va_bottom_mask),
>> +                   [tmask] "m" (ma_top_mask)
>> +                   : "ecx");
>> +#undef SYMNAME
>> +#else
>> +    alternative_io("call do2ma",
>> +                   /* pdep ma_real_mask(%rip), %rdi, %rax */
>> +                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
>> +                   ".long ma_real_mask - 4 - .",
>> +                   X86_FEATURE_BMI2,
>> +                   ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
>> +                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
>> +#endif
> 
> This is a massive clobber list in a function you've forced always
> inline, and I can't see it doing nice things to the callsites.  TBH,
> this still feels over-complicated for what it wants to be.
> 
> Why not implement one single function in assembly that doesn't have
> usual C calling conventions and can clobber %ecx and one other, and use
> that?
> 
> It avoids the need for potentially 256 almost-identical copies of the
> function in the linkonce section, and avoids having the multiple
> implementations in C/asm, avoids the need for any logic derived from
> CONFIG_INDIRECT_THUNK, and avoids the need for massive clobber lists.

Your response mixes things a bit too much for me to sort out what
exactly you're concerned about: The massive clobber list exists only
in the !CONFIG_INDIRECT_THUNK case. In that case though there
aren't going to be up to 225 instances of the function. I'd be fine
implementing the single one in assembly to reduce the clobber list,
it was just to keep down assembly code size and also to have
compiler generated code to compare against. I have to admit though
that I'm not overly concerned about the !CONFIG_INDIRECT_THUNK
case in the first place, so I also didn't see much reason to try to
optimize it.

For the CONFIG_INDIRECT_THUNK case, otoh, I'd really like to
avoid dictating register allocation to the compiler. Hence the solution
with the (possibly many) function instances. Overall code size still
decreases with this approach, and on modern hardware the entire
region of the image in which they live will remain cold. (Additionally
not affecting register allocation here has the benefit of making it
far easier to compare pre/post generated code.)

Jan



* Re: [PATCH v3 5/5] x86: use PDEP for PTE flags insertion when available
  2018-08-17  7:24           ` [PATCH v3 5/5] x86: use PDEP for PTE flags insertion when available Jan Beulich
@ 2018-08-18  1:08             ` Rich Persaud
  2018-08-20  7:25               ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Rich Persaud @ 2018-08-18  1:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper


On Aug 17, 2018, at 03:24, Jan Beulich <JBeulich@suse.com> wrote:
> 
> This replaces 5 instructions by a single one, further reducing code size,
> cache, and TLB footprint (in particular on systems supporting BMI2).

This link claims that BMI2 may be less performant/consistent on AMD Ryzen than Intel:
https://www.reddit.com/r/Amd/comments/60i6er/ryzen_and_bmi2_strange_behavior_and_high_latencies/

Would this patch series have any benefit to L0 hypervisors/rootkits (e.g. Bromium, Bareflank or similar hypervisors) which could be monitoring L1 Xen?  Or Xen as L0 hypervisor and Hyper-V as L1 hypervisor?

Rich


* Re: [PATCH v3 5/5] x86: use PDEP for PTE flags insertion when available
  2018-08-18  1:08             ` Rich Persaud
@ 2018-08-20  7:25               ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-08-20  7:25 UTC (permalink / raw)
  To: Brian Woods, Rich Persaud; +Cc: Andrew Cooper, xen-devel

>>> On 18.08.18 at 03:08, <persaur@gmail.com> wrote:
> On Aug 17, 2018, at 03:24, Jan Beulich <JBeulich@suse.com> wrote:
>> 
>> This replaces 5 instructions by a single one, further reducing code size,
>> cache, and TLB footprint (in particular on systems supporting BMI2).
> 
> This link claims that BMI2 may be less performant/consistent on AMD Ryzen 
> than Intel:
> https://www.reddit.com/r/Amd/comments/60i6er/ryzen_and_bmi2_strange_behavior_and_high_latencies/

Hmm, interesting. Brian - any word as to whether we'd better avoid
using PDEP/PEXT for now on AMD?

> Would this patch series have any benefit to L0 hypervisors/rootkits (e.g. 
> Bromium, Bareflank or similar hypervisors) which could be monitoring L1 Xen?  
> Or Xen as L0 hypervisor and Hyper-V as L1 hypervisor?

The insns aren't used on any secrets, so I don't see the connection.
But then again I'm not an expert here at all.

Jan




* Re: [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-08-17  9:38               ` Jan Beulich
@ 2018-09-07 16:17                 ` Andrew Cooper
  2018-09-10 10:00                   ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2018-09-07 16:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 17/08/18 10:38, Jan Beulich wrote:
>>>> On 17.08.18 at 10:59, <andrew.cooper3@citrix.com> wrote:
>> On 17/08/2018 08:21, Jan Beulich wrote:
>>> --- a/xen/include/asm-x86/asm_defns.h
>>> +++ b/xen/include/asm-x86/asm_defns.h
>>> @@ -186,6 +186,20 @@ void ret_from_intr(void);
>>>          UNLIKELY_END_SECTION "\n"          \
>>>          ".Llikely." #tag ".%=:"
>>>  
>>> +#define LINKONCE_PROLOGUE(sym)                    \
>>> +        ".ifndef " sym() "\n\t"                   \
>>> +        ".pushsection " sym(.gnu.linkonce.t.) "," \
>> This definitely warrants a comment and a change of name, seeing as sym
>> isn't a symbol.  It's a macro which gives you a string back.
> It's still a symbol name; don't forget that in the end section names
> are symbol names too. I'd have gone with a better name if I had
> any better idea. It's also not really clear to me what you want a
> comment to say here.

/* sym must be a macro which takes an optional symbol name prefix */ ?

A user of this macro, knowing only the prototype, would expect to use it
by passing a string literal.

>
>>> @@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma
>>>  
>>>          va += xen_phys_start - XEN_VIRT_START;
>>>      }
>>> -    return (va & ma_va_bottom_mask) |
>>> -           ((va << pfn_pdx_hole_shift) & ma_top_mask);
>>> +
>>> +#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
>>> +#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
>>> +    alternative_io("call " SYMNAME() "\n\t"
>>> +                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
>>> +                   "mov %[shift], %%ecx\n\t"
>>> +                   "mov %[off], %[ma]\n\t"
>>> +                   "and %[bmask], %[ma]\n\t"
>>> +                   "shl %%cl, %[off]\n\t"
>>> +                   "and %[tmask], %[off]\n\t"
>>> +                   "or %[off], %[ma]\n\t"
>>> +                   "ret\n\t"
>>> +                   LINKONCE_EPILOGUE(SYMNAME),
>>> +                   "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
>> The compiler understanding V doesn't imply that the assembler
>> understands pdep
> I've been considering this case, and in fact in an early version I had
> a -DHAVE_AS_BMI2 addition. But I then decided to drop it, as I couldn't
> really imagine V support to be backported to gcc pre-dating BMI2
> support in gas (available as of 2.22, i.e. gcc 4.5 / 4.6 timeframe).
> But yes, if you really think we need to cope with that, I could re-add
> this. Just let me know if you indeed think this is needed.

Didn't you say that you've got Retpoline backported to GCC 4.3?

Independently of my argument against using V below, I'm not overly
fussed especially as we are considering upping the requirements to
something a little more modern, so long as we accept that we may have to
retrofit a -DHAVE_AS_BMI2 after the fact.

>
>>> +                   ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
>>> +                   [mask] "m" (ma_real_mask),
>>> +                   [shift] "m" (pfn_pdx_hole_shift),
>>> +                   [bmask] "m" (ma_va_bottom_mask),
>>> +                   [tmask] "m" (ma_top_mask)
>>> +                   : "ecx");
>>> +#undef SYMNAME
>>> +#else
>>> +    alternative_io("call do2ma",
>>> +                   /* pdep ma_real_mask(%rip), %rdi, %rax */
>>> +                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
>>> +                   ".long ma_real_mask - 4 - .",
>>> +                   X86_FEATURE_BMI2,
>>> +                   ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
>>> +                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
>>> +#endif
>> This is a massive clobber list in a function you've forced always
>> inline, and I can't see it doing nice things to the callsites.  TBH,
>> this still feels over-complicated for what it wants to be.
>>
>> Why not implement one single function in assembly that doesn't have
>> usual C calling conventions and can clobber %ecx and one other, and use
>> that?
>>
>> It avoids the need for potentially 256 almost-identical copies of the
>> function in the linkonce section, and avoids having the multiple
>> implementations in C/asm, avoids the need for any logic derived from
>> CONFIG_INDIRECT_THUNK, and avoids the need for massive clobber lists.
> Your response mixes things a bit too much for me to sort out what
> exactly you're concerned about: The massive clobber list exists only
> in the !CONFIG_INDIRECT_THUNK case. In that case though there
> aren't going to be up to 225 instances of the function. I'd be fine
> implementing the single one in assembly to reduce the clobber list,
> it was just to keep down assembly code size and also to have
> compiler generated code to compare against. I have to admit though
> that I'm not overly concerned about the !CONFIG_INDIRECT_THUNK
> case in the first place, so I also didn't see much reason to try to
> optimize it.
>
> For the CONFIG_INDIRECT_THUNK case, otoh, I'd really like to
> avoid dictating register allocation to the compiler. Hence the solution
> with the (possibly many) function instances. Overall code size still
> decreases with this approach, and on modern hardware the entire
> region of the image in which they live will remain cold. (Additionally
> not affecting register allocation here has the benefit of making it
> far easier to compare pre/post generated code.)

How many passes through the hypervisor hit two or more of these functions?

I appreciate that for development, reducing the register perturbance can
be nice for diffing the resulting disassembly, but when it comes to
actually running the code, register renames at compile time are free,
whereas pulling multiple almost-identical copies of the stub into the
I-cache is not.

Implementing this as a single function looks to be simpler in terms of
the change, will compile to a smaller result, and will run faster.  It
seems to be a win-win-win overall.

~Andrew


* Re: [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-09-07 16:17                 ` Andrew Cooper
@ 2018-09-10 10:00                   ` Jan Beulich
  2018-09-25 17:15                     ` Andrew Cooper
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-09-10 10:00 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 07.09.18 at 18:17, <andrew.cooper3@citrix.com> wrote:
> On 17/08/18 10:38, Jan Beulich wrote:
>>>>> On 17.08.18 at 10:59, <andrew.cooper3@citrix.com> wrote:
>>> On 17/08/2018 08:21, Jan Beulich wrote:
>>>> --- a/xen/include/asm-x86/asm_defns.h
>>>> +++ b/xen/include/asm-x86/asm_defns.h
>>>> @@ -186,6 +186,20 @@ void ret_from_intr(void);
>>>>          UNLIKELY_END_SECTION "\n"          \
>>>>          ".Llikely." #tag ".%=:"
>>>>  
>>>> +#define LINKONCE_PROLOGUE(sym)                    \
>>>> +        ".ifndef " sym() "\n\t"                   \
>>>> +        ".pushsection " sym(.gnu.linkonce.t.) "," \
>>> This definitely warrants a comment and a change of name, seeing as sym
>>> isn't a symbol.  It's a macro which gives you a string back.
>> It's still a symbol name; don't forget that in the end section names
>> are symbol names too. I'd have gone with a better name if I had
>> any better idea. It's also not really clear to me what you want a
>> comment to say here.
> 
> /* sym must be a macro which takes an optional symbol name prefix */ ?

Thanks, I'll use this in slightly modified form.

>>>> @@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma
>>>>  
>>>>          va += xen_phys_start - XEN_VIRT_START;
>>>>      }
>>>> -    return (va & ma_va_bottom_mask) |
>>>> -           ((va << pfn_pdx_hole_shift) & ma_top_mask);
>>>> +
>>>> +#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
>>>> +#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
>>>> +    alternative_io("call " SYMNAME() "\n\t"
>>>> +                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
>>>> +                   "mov %[shift], %%ecx\n\t"
>>>> +                   "mov %[off], %[ma]\n\t"
>>>> +                   "and %[bmask], %[ma]\n\t"
>>>> +                   "shl %%cl, %[off]\n\t"
>>>> +                   "and %[tmask], %[off]\n\t"
>>>> +                   "or %[off], %[ma]\n\t"
>>>> +                   "ret\n\t"
>>>> +                   LINKONCE_EPILOGUE(SYMNAME),
>>>> +                   "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
>>> The compiler understanding V doesn't imply that the assembler
>>> understands pdep
>> I've been considering this case, and in fact in an early version I had
>> a -DHAVE_AS_BMI2 addition. But I then decided to drop it, as I couldn't
>> really imagine V support to be backported to gcc pre-dating BMI2
>> support in gas (available as of 2.22, i.e. gcc 4.5 / 4.6 timeframe).
>> But yes, if you really think we need to cope with that, I could re-add
>> this. Just let me know if you indeed think this is needed.
> 
> Didn't you say that you've got Retpoline backported to GCC 4.3?

Yes, with an underlying binutils 2.25.

> Independently of my argument against using V below, I'm not overly
> fussed especially as we are considering upping the requirements to
> something a little more modern, so long as we accept that we may have to
> retrofit a -DHAVE_AS_BMI2 after the fact.

I've added a respective sentence to the description.

>>>> +                   ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
>>>> +                   [mask] "m" (ma_real_mask),
>>>> +                   [shift] "m" (pfn_pdx_hole_shift),
>>>> +                   [bmask] "m" (ma_va_bottom_mask),
>>>> +                   [tmask] "m" (ma_top_mask)
>>>> +                   : "ecx");
>>>> +#undef SYMNAME
>>>> +#else
>>>> +    alternative_io("call do2ma",
>>>> +                   /* pdep ma_real_mask(%rip), %rdi, %rax */
>>>> +                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
>>>> +                   ".long ma_real_mask - 4 - .",
>>>> +                   X86_FEATURE_BMI2,
>>>> +                   ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
>>>> +                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
>>>> +#endif
>>> This is a massive clobber list in a function you've forced always
>>> inline, and I can't see it doing nice things to the callsites.  TBH,
>>> this still feels over-complicated for what it wants to be.
>>>
>>> Why not implement one single function in assembly that doesn't have
>>> usual C calling conventions and can clobber %ecx and one other, and use
>>> that?
>>>
>>> It avoids the need for potentially 256 almost-identical copies of the
>>> function in the linkonce section, and avoids having the multiple
>>> implementations in C/asm, avoids the need for any logic derived from
>>> CONFIG_INDIRECT_THUNK, and avoids the need for massive clobber lists.
>> Your response mixes things a bit too much for me to sort out what
>> exactly you're concerned about: The massive clobber list exists only
>> in the !CONFIG_INDIRECT_THUNK case. In that case though there
>> aren't going to be up to 225 instances of the function. I'd be fine
>> implementing the single one in assembly to reduce the clobber list,
>> it was just to keep down assembly code size and also to have
>> compiler generated code to compare against. I have to admit though
>> that I'm not overly concerned about the !CONFIG_INDIRECT_THUNK
>> case in the first place, so I also didn't see much reason to try to
>> optimize it.
>>
>> For the CONFIG_INDIRECT_THUNK case, otoh, I'd really like to
>> avoid dictating register allocation to the compiler. Hence the solution
>> with the (possibly many) function instances. Overall code size still
>> decreases with this approach, and on modern hardware the entire
>> region of the image in which they live will remain cold. (Additionally
>> not affecting register allocation here has the benefit of making it
>> far easier to compare pre/post generated code.)
> 
> How many passes through the hypervisor hit two or more of these functions?
> 
> I appreciate that for development, reducing the register perturbance can
> be nice for diffing the resulting disassembly, but when it comes to
> actually running the code, register renames at compile time are free,
> whereas pulling multiple almost-identical copies of the stub into the
> I-cache is not.
> 
> Implementing this as a single function looks to be simpler in terms of
> the change, will compile to a smaller result, and will run faster.  It
> seems to be a win-win-win overall.

Well, I continue to not really agree. First and foremost, as said before,
the common (exclusive?) case is going to be that with "x86: use MOV
for PFN/PDX conversion when possible" no calls will exist at runtime at
all. At that point all function instances could collectively be purged just
like .init.text, if we cared enough. And then, for this particular case,
leaving the compiler the widest possible choice of register allocation
still seems pretty desirable to me. I'd agree with your "register
renames at compile time are free" only if there weren't special uses of
quite a few of the registers.

As perhaps a prime example, consider the case where the
transformation here gets done in the course of setting up another
function's arguments. The compiler would have to avoid certain
registers (or generate extra MOVs) if it had to first pass the input
in a fixed register to the helper here (which then additionally also
needs to be assumed to clobber several other ones).

Jan



* [PATCH v4 0/4] x86: improve PDX <-> PFN and alike translations
       [not found]       ` <5AA7D98302000078001DD5F0@prv1-mh.provo.novell.com>
  2018-08-17  7:06         ` [PATCH v3 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
@ 2018-09-18 12:34         ` Jan Beulich
  2018-09-18 12:36           ` [PATCH v4 1/4] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available Jan Beulich
                             ` (3 more replies)
  1 sibling, 4 replies; 24+ messages in thread
From: Jan Beulich @ 2018-09-18 12:34 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu

1: use PDEP/PEXT for maddr/direct-map-offset conversion when available
2: use PDEP/PEXT for PFN/PDX conversion when available
3: use MOV for PFN/PDX conversion when possible
4: use PDEP for PTE flags insertion when available

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: Indentation adjustments. Comments.
v3: Re-order series. Re-base.
v2: Extensive re-base.


* [PATCH v4 1/4] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-09-18 12:34         ` [PATCH v4 0/4] x86: improve PDX <-> PFN and alike translations Jan Beulich
@ 2018-09-18 12:36           ` Jan Beulich
  2018-10-31 17:03             ` Wei Liu
  2018-09-18 12:36           ` [PATCH v4 2/4] x86: use PDEP/PEXT for PFN/PDX " Jan Beulich
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2018-09-18 12:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu

This allows folding 6 instructions into a single one, reducing code size
quite a bit, especially when not considering the fallback functions
(which won't ever need to be brought into iCache or their mappings into
iTLB on systems supporting BMI2).

Make use of gcc's new V operand modifier, even if that results in a
slightly odd dependency in the sources (but I also didn't want to
introduce yet another manifest constant). This assumes that backports of
support for this relatively new modifier have only gone into tool chains
with a sufficiently recent binutils (gas). If this assumption turns out
to be false, we'll have to add HAVE_AS_BMI2 as a qualifier.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: "Bodge" alternative_io() indentation here so that it'll come out
    right without re-indenting after patch 3. Add comment to
    LINKONCE_{PRO,EPI}LOGUE().
v3: Move infrastructure pieces here from "x86: use PDEP for PTE flags
    insertion when available".
v2: Avoid quoted symbols; use gcc's new V operand modifier instead.
    Re-base.

--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -243,6 +243,12 @@ void init_or_livepatch apply_alternative
         /* 0xe8/0xe9 are relative branches; fix the offset. */
         if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
             *(int32_t *)(buf + 1) += repl - orig;
+        /* RIP-relative addressing is easy to check for in VEX-encoded insns. */
+        else if ( a->repl_len >= 8 &&
+                  (*buf & ~1) == 0xc4 &&
+                  a->repl_len >= 9 - (*buf & 1) &&
+                  (buf[4 - (*buf & 1)] & ~0x38) == 0x05 )
+            *(int32_t *)(buf + 5 - (*buf & 1)) += repl - orig;
 
         add_nops(buf + a->repl_len, total_len - a->repl_len);
         text_poke(orig, buf, total_len);
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -390,6 +390,25 @@ void __init arch_init_memory(void)
 #endif
 }
 
+paddr_t __read_mostly ma_real_mask = ~0UL;
+
+#ifndef CONFIG_INDIRECT_THUNK /* V modifier unavailable? */
+
+/* Conversion between machine address and direct map offset. */
+paddr_t do2ma(unsigned long off)
+{
+    return (off & ma_va_bottom_mask) |
+           ((off << pfn_pdx_hole_shift) & ma_top_mask);
+}
+
+unsigned long ma2do(paddr_t ma)
+{
+    return (ma & ma_va_bottom_mask) |
+           ((ma & ma_top_mask) >> pfn_pdx_hole_shift);
+}
+
+#endif
+
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 {
     uint64_t maddr = pfn_to_paddr(mfn);
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -450,6 +450,8 @@ void __init srat_parse_regions(u64 addr)
 	}
 
 	pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
+
+	ma_real_mask = ma_top_mask | ma_va_bottom_mask;
 }
 
 /* Use the information discovered above to actually set up the nodes. */
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -66,6 +66,7 @@ SECTIONS
         _stext = .;            /* Text and read-only data */
        *(.text)
        *(.text.__x86_indirect_thunk_*)
+       *(.gnu.linkonce.t.*)
        *(.text.page_aligned)
 
        . = ALIGN(PAGE_SIZE);
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -186,6 +186,24 @@ void ret_from_intr(void);
         UNLIKELY_END_SECTION "\n"          \
         ".Llikely." #tag ".%=:"
 
+/*
+ * For both of the below, sym() must be a macro which takes an optional
+ * symbol name prefix.
+ */
+#define LINKONCE_PROLOGUE(sym)                    \
+        ".ifndef " sym() "\n\t"                   \
+        ".pushsection " sym(.gnu.linkonce.t.) "," \
+                      "\"ax\",@progbits\n\t"      \
+        ".p2align 4\n"                            \
+        sym() ":"
+
+#define LINKONCE_EPILOGUE(sym)                    \
+        ".weak " sym() "\n\t"                     \
+        ".type " sym() ", @function\n\t"          \
+        ".size " sym() ", . - " sym() "\n\t"      \
+        ".popsection\n\t"                         \
+        ".endif"
+
 #endif
 
 /* "Raw" instruction opcodes */
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -42,11 +42,18 @@ static inline unsigned long canonicalise
         return addr & ~CANONICAL_MASK;
 }
 
+#include <asm/alternative.h>
+#include <asm/asm_defns.h>
+#include <asm/cpufeature.h>
 #include <asm/types.h>
 
 #include <xen/pdx.h>
 
 extern unsigned long xen_virt_end;
+extern paddr_t ma_real_mask;
+
+paddr_t do2ma(unsigned long);
+unsigned long ma2do(paddr_t);
 
 /*
  * Note: These are solely for the use by page_{get,set}_owner(), and
@@ -57,8 +64,10 @@ extern unsigned long xen_virt_end;
 #define pdx_to_virt(pdx) ((void *)(DIRECTMAP_VIRT_START + \
                                    ((unsigned long)(pdx) << PAGE_SHIFT)))
 
-static inline unsigned long __virt_to_maddr(unsigned long va)
+static always_inline paddr_t __virt_to_maddr(unsigned long va)
 {
+    paddr_t ma;
+
     ASSERT(va < DIRECTMAP_VIRT_END);
     if ( va >= DIRECTMAP_VIRT_START )
         va -= DIRECTMAP_VIRT_START;
@@ -71,16 +80,77 @@ static inline unsigned long __virt_to_ma
 
         va += xen_phys_start - XEN_VIRT_START;
     }
-    return (va & ma_va_bottom_mask) |
-           ((va << pfn_pdx_hole_shift) & ma_top_mask);
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
+    alternative_io("call " SYMNAME() "\n\t"
+                     LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                     "mov %[shift], %%ecx\n\t"
+                     "mov %[off], %[ma]\n\t"
+                     "and %[bmask], %[ma]\n\t"
+                     "shl %%cl, %[off]\n\t"
+                     "and %[tmask], %[off]\n\t"
+                     "or %[off], %[ma]\n\t"
+                     "ret\n\t"
+                     LINKONCE_EPILOGUE(SYMNAME),
+                     "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
+                     ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
+                     [mask] "m" (ma_real_mask),
+                     [shift] "m" (pfn_pdx_hole_shift),
+                     [bmask] "m" (ma_va_bottom_mask),
+                     [tmask] "m" (ma_top_mask)
+                     : "ecx");
+#undef SYMNAME
+#else
+    alternative_io("call do2ma",
+                     /* pdep ma_real_mask(%rip), %rdi, %rax */
+                     ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
+                     ".long ma_real_mask - 4 - .",
+                     X86_FEATURE_BMI2,
+                     ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
+                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return ma;
 }
 
-static inline void *__maddr_to_virt(unsigned long ma)
+static always_inline void *__maddr_to_virt(paddr_t ma)
 {
+    unsigned long off;
+
     ASSERT(pfn_to_pdx(ma >> PAGE_SHIFT) < (DIRECTMAP_SIZE >> PAGE_SHIFT));
-    return (void *)(DIRECTMAP_VIRT_START +
-                    ((ma & ma_va_bottom_mask) |
-                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "ma2do_%V[off]_%V[ma]"
+    alternative_io("call " SYMNAME() "\n\t"
+                     LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                     "mov %[tmask], %[off]\n\t"
+                     "mov %[shift], %%ecx\n\t"
+                     "and %[ma], %[off]\n\t"
+                     "and %[bmask], %[ma]\n\t"
+                     "shr %%cl, %[off]\n\t"
+                     "or %[ma], %[off]\n\t"
+                     "ret\n\t"
+                     LINKONCE_EPILOGUE(SYMNAME),
+                     "pext %[mask], %[ma], %[off]", X86_FEATURE_BMI2,
+                     ASM_OUTPUT2([off] "=&r" (off), [ma] "+r" (ma)),
+                     [mask] "m" (ma_real_mask),
+                     [shift] "m" (pfn_pdx_hole_shift),
+                     [bmask] "m" (ma_va_bottom_mask),
+                     [tmask] "m" (ma_top_mask)
+                     : "ecx");
+#undef SYMNAME
+#else
+    alternative_io("call ma2do",
+                     /* pext ma_real_mask(%rip), %rdi, %rax */
+                     ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
+                     ".long ma_real_mask - 4 - .",
+                     X86_FEATURE_BMI2,
+                     ASM_OUTPUT2("=a" (off), "+D" (ma)), "m" (ma_real_mask)
+                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return (void *)DIRECTMAP_VIRT_START + off;
 }
 
 /* read access (should only be used for debug printk's) */





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 2/4] x86: use PDEP/PEXT for PFN/PDX conversion when available
  2018-09-18 12:34         ` [PATCH v4 0/4] x86: improve PDX <-> PFN and alike translations Jan Beulich
  2018-09-18 12:36           ` [PATCH v4 1/4] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available Jan Beulich
@ 2018-09-18 12:36           ` Jan Beulich
  2018-09-18 12:37           ` [PATCH v4 3/4] x86: use MOV for PFN/PDX conversion when possible Jan Beulich
  2018-09-18 12:37           ` [PATCH v4 4/4] x86: use PDEP for PTE flags insertion when available Jan Beulich
  3 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-09-18 12:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu

Both replace 6 instructions with a single one, further reducing code size,
cache, and TLB footprint (in particular on systems supporting BMI2).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: "Bodge" alternative_io() indentation here so that it'll come out
    right without re-indenting after patch 3.
v2: Avoid quoted symbols; use gcc's new V operand modifier instead.
    Re-base.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -391,6 +391,7 @@ void __init arch_init_memory(void)
 }
 
 paddr_t __read_mostly ma_real_mask = ~0UL;
+unsigned long __read_mostly pfn_real_mask = ~0UL;
 
 #ifndef CONFIG_INDIRECT_THUNK /* V modifier unavailable? */
 
@@ -407,6 +408,17 @@ unsigned long ma2do(paddr_t ma)
            ((ma & ma_top_mask) >> pfn_pdx_hole_shift);
 }
 
+/* Conversion between PDX and PFN. */
+unsigned long pdx2pfn(unsigned long pdx)
+{
+    return generic_pdx_to_pfn(pdx);
+}
+
+unsigned long pfn2pdx(unsigned long pfn)
+{
+    return generic_pfn_to_pdx(pfn);
+}
+
 #endif
 
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -452,6 +452,7 @@ void __init srat_parse_regions(u64 addr)
 	pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
 
 	ma_real_mask = ma_top_mask | ma_va_bottom_mask;
+	pfn_real_mask = pfn_top_mask | pfn_pdx_bottom_mask;
 }
 
 /* Use the information discovered above to actually set up the nodes. */
--- /dev/null
+++ b/xen/include/asm-arm/pdx.h
@@ -0,0 +1,16 @@
+#ifndef __ASM_ARM_PDX_H__
+#define __ASM_ARM_PDX_H__
+
+#define pdx_to_pfn generic_pdx_to_pfn
+#define pfn_to_pdx generic_pfn_to_pdx
+
+#endif /* __ASM_ARM_PDX_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- /dev/null
+++ b/xen/include/asm-x86/pdx.h
@@ -0,0 +1,93 @@
+#ifndef __ASM_X86_PDX_H__
+#define __ASM_X86_PDX_H__
+
+#include <asm/alternative.h>
+#include <asm/asm_defns.h>
+#include <asm/cpufeature.h>
+
+extern unsigned long pfn_real_mask;
+
+static always_inline unsigned long pdx_to_pfn(unsigned long pdx)
+{
+    unsigned long pfn;
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "pdx2pfn_%V[pfn]_%V[pdx]"
+    alternative_io("call " SYMNAME() "\n\t"
+                     LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                     "mov %[shift], %%ecx\n\t"
+                     "mov %[pdx], %[pfn]\n\t"
+                     "and %[bmask], %[pfn]\n\t"
+                     "shl %%cl, %[pdx]\n\t"
+                     "and %[tmask], %[pdx]\n\t"
+                     "or %[pdx], %[pfn]\n\t"
+                     "ret\n\t"
+                     LINKONCE_EPILOGUE(SYMNAME),
+                     "pdep %[mask], %[pdx], %[pfn]", X86_FEATURE_BMI2,
+                     ASM_OUTPUT2([pfn] "=&r" (pfn), [pdx] "+r" (pdx)),
+                     [mask] "m" (pfn_real_mask),
+                     [shift] "m" (pfn_pdx_hole_shift),
+                     [bmask] "m" (pfn_pdx_bottom_mask),
+                     [tmask] "m" (pfn_top_mask)
+                     : "ecx");
+#undef SYMNAME
+#else
+    alternative_io("call pdx2pfn",
+                     /* pdep pfn_real_mask(%rip), %rdi, %rax */
+                     ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
+                     ".long pfn_real_mask - 4 - .",
+                     X86_FEATURE_BMI2,
+                     ASM_OUTPUT2("=a" (pfn), "+D" (pdx)), "m" (pfn_real_mask)
+                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return pfn;
+}
+
+static always_inline unsigned long pfn_to_pdx(unsigned long pfn)
+{
+    unsigned long pdx;
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "pfn2pdx_%V[pdx]_%V[pfn]"
+    alternative_io("call " SYMNAME() "\n\t"
+                     LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                     "mov %[tmask], %[pdx]\n\t"
+                     "mov %[shift], %%ecx\n\t"
+                     "and %[pfn], %[pdx]\n\t"
+                     "and %[bmask], %[pfn]\n\t"
+                     "shr %%cl, %[pdx]\n\t"
+                     "or %[pfn], %[pdx]\n\t"
+                     "ret\n\t"
+                     LINKONCE_EPILOGUE(SYMNAME),
+                     "pext %[mask], %[pfn], %[pdx]", X86_FEATURE_BMI2,
+                     ASM_OUTPUT2([pdx] "=&r" (pdx), [pfn] "+r" (pfn)),
+                     [mask] "m" (pfn_real_mask),
+                     [shift] "m" (pfn_pdx_hole_shift),
+                     [bmask] "m" (pfn_pdx_bottom_mask),
+                     [tmask] "m" (pfn_top_mask)
+                     : "ecx");
+#undef SYMNAME
+#else
+    alternative_io("call pfn2pdx",
+                     /* pext pfn_real_mask(%rip), %rdi, %rax */
+                     ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
+                     ".long pfn_real_mask - 4 - .",
+                     X86_FEATURE_BMI2,
+                     ASM_OUTPUT2("=a" (pdx), "+D" (pfn)), "m" (pfn_real_mask)
+                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return pdx;
+}
+
+#endif /* __ASM_X86_PDX_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -23,13 +23,13 @@ extern void set_pdx_range(unsigned long
 
 bool __mfn_valid(unsigned long mfn);
 
-static inline unsigned long pfn_to_pdx(unsigned long pfn)
+static inline unsigned long generic_pfn_to_pdx(unsigned long pfn)
 {
     return (pfn & pfn_pdx_bottom_mask) |
            ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
 }
 
-static inline unsigned long pdx_to_pfn(unsigned long pdx)
+static inline unsigned long generic_pdx_to_pfn(unsigned long pdx)
 {
     return (pdx & pfn_pdx_bottom_mask) |
            ((pdx << pfn_pdx_hole_shift) & pfn_top_mask);
@@ -40,6 +40,8 @@ static inline unsigned long pdx_to_pfn(u
 
 extern void pfn_pdx_hole_setup(unsigned long);
 
+#include <asm/pdx.h>
+
 #endif /* HAS_PDX */
 #endif /* __XEN_PDX_H__ */
 





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 3/4] x86: use MOV for PFN/PDX conversion when possible
  2018-09-18 12:34         ` [PATCH v4 0/4] x86: improve PDX <-> PFN and alike translations Jan Beulich
  2018-09-18 12:36           ` [PATCH v4 1/4] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available Jan Beulich
  2018-09-18 12:36           ` [PATCH v4 2/4] x86: use PDEP/PEXT for PFN/PDX " Jan Beulich
@ 2018-09-18 12:37           ` Jan Beulich
  2018-09-18 12:37           ` [PATCH v4 4/4] x86: use PDEP for PTE flags insertion when available Jan Beulich
  3 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-09-18 12:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu

Most x86 systems don't actually require the use of PDX compression. Now
that we have patching for the conversion code in place anyway, extend it
to use simple MOV when possible. Introduce a new pseudo-CPU-feature to
key the patching off of.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
The indentation of the alternative_io_2() uses below ends up wrong, but
re-indenting here seems undesirable. I could either add a follow-up
re-formatting patch, or simply slightly bodge the indentation in the
earlier patches, for it to come out right here.
---
v3: Re-base.
v2: Avoid quoted symbols; use gcc's new V operand modifier instead.
    Re-base.

--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1442,6 +1442,9 @@ void __init noreturn __start_xen(unsigne
 
     numa_initmem_init(0, raw_max_page);
 
+    if ( !pfn_pdx_hole_shift )
+        setup_force_cpu_cap(X86_FEATURE_PFN_PDX_IDENT);
+
     if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
     {
         unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -30,5 +30,6 @@ XEN_CPUFEATURE(SC_MSR_PV,       (FSCAPIN
 XEN_CPUFEATURE(SC_MSR_HVM,      (FSCAPINTS+0)*32+17) /* MSR_SPEC_CTRL used by Xen for HVM */
 XEN_CPUFEATURE(SC_RSB_PV,       (FSCAPINTS+0)*32+18) /* RSB overwrite needed for PV */
 XEN_CPUFEATURE(SC_RSB_HVM,      (FSCAPINTS+0)*32+19) /* RSB overwrite needed for HVM */
+XEN_CPUFEATURE(PFN_PDX_IDENT,   (FSCAPINTS+0)*32+20) /* PFN <-> PDX mapping is 1:1 */
 XEN_CPUFEATURE(SC_MSR_IDLE,     (FSCAPINTS+0)*32+21) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
 XEN_CPUFEATURE(XEN_LBR,         (FSCAPINTS+0)*32+22) /* Xen uses MSR_DEBUGCTL.LBR */
--- a/xen/include/asm-x86/pdx.h
+++ b/xen/include/asm-x86/pdx.h
@@ -13,7 +13,7 @@ static always_inline unsigned long pdx_t
 
 #ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
 #define SYMNAME(pfx...) #pfx "pdx2pfn_%V[pfn]_%V[pdx]"
-    alternative_io("call " SYMNAME() "\n\t"
+    alternative_io_2("call " SYMNAME() "\n\t"
                      LINKONCE_PROLOGUE(SYMNAME) "\n\t"
                      "mov %[shift], %%ecx\n\t"
                      "mov %[pdx], %[pfn]\n\t"
@@ -24,6 +24,7 @@ static always_inline unsigned long pdx_t
                      "ret\n\t"
                      LINKONCE_EPILOGUE(SYMNAME),
                      "pdep %[mask], %[pdx], %[pfn]", X86_FEATURE_BMI2,
+                     "mov %[pdx], %[pfn]", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2([pfn] "=&r" (pfn), [pdx] "+r" (pdx)),
                      [mask] "m" (pfn_real_mask),
                      [shift] "m" (pfn_pdx_hole_shift),
@@ -32,11 +33,12 @@ static always_inline unsigned long pdx_t
                      : "ecx");
 #undef SYMNAME
 #else
-    alternative_io("call pdx2pfn",
+    alternative_io_2("call pdx2pfn",
                      /* pdep pfn_real_mask(%rip), %rdi, %rax */
                      ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
                      ".long pfn_real_mask - 4 - .",
                      X86_FEATURE_BMI2,
+                     "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2("=a" (pfn), "+D" (pdx)), "m" (pfn_real_mask)
                      : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif
@@ -50,7 +52,7 @@ static always_inline unsigned long pfn_t
 
 #ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
 #define SYMNAME(pfx...) #pfx "pfn2pdx_%V[pdx]_%V[pfn]"
-    alternative_io("call " SYMNAME() "\n\t"
+    alternative_io_2("call " SYMNAME() "\n\t"
                      LINKONCE_PROLOGUE(SYMNAME) "\n\t"
                      "mov %[tmask], %[pdx]\n\t"
                      "mov %[shift], %%ecx\n\t"
@@ -61,6 +63,7 @@ static always_inline unsigned long pfn_t
                      "ret\n\t"
                      LINKONCE_EPILOGUE(SYMNAME),
                      "pext %[mask], %[pfn], %[pdx]", X86_FEATURE_BMI2,
+                     "mov %[pfn], %[pdx]", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2([pdx] "=&r" (pdx), [pfn] "+r" (pfn)),
                      [mask] "m" (pfn_real_mask),
                      [shift] "m" (pfn_pdx_hole_shift),
@@ -69,11 +72,12 @@ static always_inline unsigned long pfn_t
                      : "ecx");
 #undef SYMNAME
 #else
-    alternative_io("call pfn2pdx",
+    alternative_io_2("call pfn2pdx",
                      /* pext pfn_real_mask(%rip), %rdi, %rax */
                      ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
                      ".long pfn_real_mask - 4 - .",
                      X86_FEATURE_BMI2,
+                     "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2("=a" (pdx), "+D" (pfn)), "m" (pfn_real_mask)
                      : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -83,7 +83,7 @@ static always_inline paddr_t __virt_to_m
 
 #ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
 #define SYMNAME(pfx...) #pfx "do2ma_%V[ma]_%V[off]"
-    alternative_io("call " SYMNAME() "\n\t"
+    alternative_io_2("call " SYMNAME() "\n\t"
                      LINKONCE_PROLOGUE(SYMNAME) "\n\t"
                      "mov %[shift], %%ecx\n\t"
                      "mov %[off], %[ma]\n\t"
@@ -94,6 +94,7 @@ static always_inline paddr_t __virt_to_m
                      "ret\n\t"
                      LINKONCE_EPILOGUE(SYMNAME),
                      "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
+                     "mov %[off], %[ma]", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
                      [mask] "m" (ma_real_mask),
                      [shift] "m" (pfn_pdx_hole_shift),
@@ -102,11 +103,12 @@ static always_inline paddr_t __virt_to_m
                      : "ecx");
 #undef SYMNAME
 #else
-    alternative_io("call do2ma",
+    alternative_io_2("call do2ma",
                      /* pdep ma_real_mask(%rip), %rdi, %rax */
                      ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
                      ".long ma_real_mask - 4 - .",
                      X86_FEATURE_BMI2,
+                     "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
                      : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif
@@ -122,7 +124,7 @@ static always_inline void *__maddr_to_vi
 
 #ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
 #define SYMNAME(pfx...) #pfx "ma2do_%V[off]_%V[ma]"
-    alternative_io("call " SYMNAME() "\n\t"
+    alternative_io_2("call " SYMNAME() "\n\t"
                      LINKONCE_PROLOGUE(SYMNAME) "\n\t"
                      "mov %[tmask], %[off]\n\t"
                      "mov %[shift], %%ecx\n\t"
@@ -133,6 +135,7 @@ static always_inline void *__maddr_to_vi
                      "ret\n\t"
                      LINKONCE_EPILOGUE(SYMNAME),
                      "pext %[mask], %[ma], %[off]", X86_FEATURE_BMI2,
+                     "mov %[ma], %[off]", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2([off] "=&r" (off), [ma] "+r" (ma)),
                      [mask] "m" (ma_real_mask),
                      [shift] "m" (pfn_pdx_hole_shift),
@@ -141,11 +144,12 @@ static always_inline void *__maddr_to_vi
                      : "ecx");
 #undef SYMNAME
 #else
-    alternative_io("call ma2do",
+    alternative_io_2("call ma2do",
                      /* pext ma_real_mask(%rip), %rdi, %rax */
                      ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
                      ".long ma_real_mask - 4 - .",
                      X86_FEATURE_BMI2,
+                     "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2("=a" (off), "+D" (ma)), "m" (ma_real_mask)
                      : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 4/4] x86: use PDEP for PTE flags insertion when available
  2018-09-18 12:34         ` [PATCH v4 0/4] x86: improve PDX <-> PFN and alike translations Jan Beulich
                             ` (2 preceding siblings ...)
  2018-09-18 12:37           ` [PATCH v4 3/4] x86: use MOV for PFN/PDX conversion when possible Jan Beulich
@ 2018-09-18 12:37           ` Jan Beulich
  3 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-09-18 12:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu

This replaces 5 instructions with a single one, further reducing code size,
cache, and TLB footprint (in particular on systems supporting BMI2).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Irrespective of the note regarding a possible alternative route, I think
the change here is an improvement until someone would take on
investigating whether the proposed dropping of the 24-bit flags
representation would be a worthwhile improvement of generated code.
---
v3: Move this patch last in series. Re-base.
v2: Avoid quoted symbols; use gcc's new V operand modifier instead.
    Re-base.
---
TBD: Also change get_pte_flags() (after having introduced test_pte_flags())?

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -393,6 +393,8 @@ void __init arch_init_memory(void)
 paddr_t __read_mostly ma_real_mask = ~0UL;
 unsigned long __read_mostly pfn_real_mask = ~0UL;
 
+const intpte_t pte_flags_mask = ~(PADDR_MASK & PAGE_MASK);
+
 #ifndef CONFIG_INDIRECT_THUNK /* V modifier unavailable? */
 
 /* Conversion between machine address and direct map offset. */
@@ -419,6 +421,11 @@ unsigned long pfn2pdx(unsigned long pfn)
     return generic_pfn_to_pdx(pfn);
 }
 
+intpte_t put_pte_flags_v(unsigned int flags)
+{
+    return put_pte_flags_c(flags);
+}
+
 #endif
 
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -205,15 +205,53 @@ typedef l4_pgentry_t root_pgentry_t;
 
 /* Extract flags into 24-bit integer, or turn 24-bit flags into a pte mask. */
 #ifndef __ASSEMBLY__
+extern const intpte_t pte_flags_mask;
+intpte_t __attribute_const__ put_pte_flags_v(unsigned int x);
+
 static inline unsigned int get_pte_flags(intpte_t x)
 {
     return ((x >> 40) & ~0xfff) | (x & 0xfff);
 }
 
-static inline intpte_t put_pte_flags(unsigned int x)
+static inline intpte_t put_pte_flags_c(unsigned int x)
 {
     return (((intpte_t)x & ~0xfff) << 40) | (x & 0xfff);
 }
+
+static always_inline intpte_t put_pte_flags(unsigned int x)
+{
+    intpte_t pte;
+
+    if ( __builtin_constant_p(x) )
+        return put_pte_flags_c(x);
+
+#ifdef CONFIG_INDIRECT_THUNK /* V modifier available? */
+#define SYMNAME(pfx...) #pfx "put_pte_flags_%V[pte]_%V[flags]"
+    alternative_io("call " SYMNAME() "\n\t"
+                   LINKONCE_PROLOGUE(SYMNAME) "\n\t"
+                   "mov %[flags], %k[pte]\n\t"
+                   "and $0xfff000, %[flags]\n\t"
+                   "and $0x000fff, %k[pte]\n\t"
+                   "shl $40, %q[flags]\n\t"
+                   "or %q[flags], %[pte]\n\t"
+                   "ret\n\t"
+                   LINKONCE_EPILOGUE(SYMNAME),
+                   "pdep %[mask], %q[flags], %[pte]", X86_FEATURE_BMI2,
+                   ASM_OUTPUT2([pte] "=&r" (pte), [flags] "+r" (x)),
+                   [mask] "m" (pte_flags_mask));
+#undef SYMNAME
+#else
+    alternative_io("call put_pte_flags_v",
+                   /* pdep pte_flags_mask(%rip), %rdi, %rax */
+                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
+                   ".long pte_flags_mask - 4 - .",
+                   X86_FEATURE_BMI2,
+                   ASM_OUTPUT2("=a" (pte), "+D" (x)), "m" (pte_flags_mask)
+                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return pte;
+}
 #endif
 
 /*






^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-09-10 10:00                   ` Jan Beulich
@ 2018-09-25 17:15                     ` Andrew Cooper
  2018-09-26  8:48                       ` Jan Beulich
  2018-09-27  6:44                       ` Jan Beulich
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Cooper @ 2018-09-25 17:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 10/09/18 11:00, Jan Beulich wrote:
>
>>>>> +                   ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
>>>>> +                   [mask] "m" (ma_real_mask),
>>>>> +                   [shift] "m" (pfn_pdx_hole_shift),
>>>>> +                   [bmask] "m" (ma_va_bottom_mask),
>>>>> +                   [tmask] "m" (ma_top_mask)
>>>>> +                   : "ecx");
>>>>> +#undef SYMNAME
>>>>> +#else
>>>>> +    alternative_io("call do2ma",
>>>>> +                   /* pdep ma_real_mask(%rip), %rdi, %rax */
>>>>> +                   ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
>>>>> +                   ".long ma_real_mask - 4 - .",
>>>>> +                   X86_FEATURE_BMI2,
>>>>> +                   ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
>>>>> +                   : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
>>>>> +#endif
>>>> This is a massive clobber list in a function you've forced always
>>>> inline, and I can't see it doing nice things to the callsites.  TBH,
>>>> this still feels over-complicated for what it wants to be.
>>>>
>>>> Why not implement one single function in assembly that doesn't have
>>>> usual C calling conventions and can clobber %ecx and one other, and use
>>>> that?
>>>>
>>>> It avoids the need for potentially 256 almost-identical copies of the
>>>> function in the linkonce section, and avoids having the multiple
>>>> implementations in C/asm, avoids the need for any logic derived from
>>>> CONFIG_INDIRECT_THUNK, and avoids the need for massive clobber lists.
>>> Your response mixes things a bit too much for me to sort out what
>>> exactly you're concerned about: The massive clobber list exists only
>>> in the !CONFIG_INDIRECT_THUNK case. In that case though there
>>> aren't going to be up to 225 instances of the function. I'd be fine
>>> implementing the single one in assembly to reduce the clobber list,
>>> it was just to keep down assembly code size and also to have
>>> compiler generated code to compare against. I have to admit though
>>> that I'm not overly concerned about the !CONFIG_INDIRECT_THUNK
>>> case in the first place, so I also didn't see much reason to try to
>>> optimize it.
>>>
>>> For the CONFIG_INDIRECT_THUNK case, otoh, I'd really like to
>>> avoid dictating register allocation to the compiler. Hence the solution
>>> with the (possibly many) function instances. Overall code size still
>>> decreases with this approach, and on modern hardware the entire
>>> region of the image in which they live will remain cold. (Additionally
>>> not affecting register allocation here has the benefit of making it
>>> far easier to compare pre/post generated code.)
>> How many passes through the hypervisor hit two or more of these functions?
>>
>> I appreciate that for development, reducing the register perturbance can
>> be nice for diffing the resulting disassembly, but when it comes to
>> actually running the code, register renames at compile time are free,
>> whereas pulling multiple almost-identical copies of the stub into the
>> I-cache is not.
>>
>> Implementing this as a single function looks to be simpler in terms of
>> the change, will compile to a smaller result, and will run faster.  It
>> seems to be a win-win-win overall.
> Well, I continue to not really agree. First and foremost, as said before,
> the common (exclusive?) case is going to be that with "x86: use MOV
> for PFN/PDX conversion when possible" no calls will exist at runtime at
> all.

Taking this one step further, why don't we drop PDX entirely?

I seem to recall you saying that the one system it was introduced for
never shipped, at which point, why bother keeping the code around?

A separate point which has only just occurred to me is the humongous
pipeline stall which occurs when mixing legacy and VEX SSE instructions
on SandyBridge and later hardware.  I severely doubt that a single
transformation from ALU operations to PDEP/PEXT is going to make up for
the pipeline stall if the guest is using legacy SSE, although given how
common the PDX conversions are, I could easily believe that the net is
in the same ballpark.
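[The ALU-vs-PEXT trade-off being debated above is easier to see with the conversion written out. Below is a portable C sketch, not Xen code: the hole parameters are made up for illustration, and pext_emul() merely models what a single BMI2 PEXT instruction computes.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory hole: PFN bits [32, 36) are never populated. */
#define HOLE_SHIFT 32
#define HOLE_BITS  4

/* Generic fallback: the mask/shift/OR sequence compilers emit today. */
static uint64_t pfn_to_pdx_alu(uint64_t pfn)
{
    uint64_t bottom = (1ULL << HOLE_SHIFT) - 1;

    return (pfn & bottom) | ((pfn >> HOLE_BITS) & ~bottom);
}

/* Software model of PEXT: on BMI2 hardware this entire loop is the
 * single instruction pext(src, mask). */
static uint64_t pext_emul(uint64_t src, uint64_t mask)
{
    uint64_t res = 0;
    unsigned int out = 0;

    for (unsigned int bit = 0; bit < 64; bit++)
        if (mask & (1ULL << bit))
            res |= ((src >> bit) & 1) << out++;
    return res;
}

/* BMI2 variant: compress out the hole bits with one PEXT. */
static uint64_t pfn_to_pdx_pext(uint64_t pfn)
{
    return pext_emul(pfn, ~(((1ULL << HOLE_BITS) - 1) << HOLE_SHIFT));
}
```

[Both helpers agree bit-for-bit; the disagreement in the thread is only about which encoding — a few ALU ops, one PEXT, or a plain MOV once the hole turns out to be empty — is cheaper at the many call sites.]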

> At that point all function instances could collectively be purged just
> like .init.text, if we cared enough. And then, for this particular case,
> leaving the compiler the widest possible choice of register allocation
> still seems pretty desirable to me. I'd agree with your "register
> renames at compile time are free" only if there weren't special uses of
> quite a few of the registers.
>
> As perhaps a prime example, consider the case where the
> transformation here gets done in the course of setting up another
> function's arguments. The compiler would have to avoid certain
> registers (or generate extra MOVs) if it had to first pass the input
> in a fixed register to the helper here (which then additionally also
> needs to be assumed to clobber several other ones).

Once again, I think you are focusing on the wrong aspect, and ending up
with something which is worse overall.

First, is there a single example here where the compiler sets up
registers before a function call? Given the sequence point, it is
distinctly non-trivial to optimise around.

Irrespective, a couple of extra movs (which are handled during register
renaming) is far less overhead than hitting a cold icache line, and
having 256 variations of this stub function is a very good way to hit a
lot of cold icache lines.

Ultimately, real performance numbers are the only way to say for sure,
but I expect you'll be surprised by the results you'd see.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-09-25 17:15                     ` Andrew Cooper
@ 2018-09-26  8:48                       ` Jan Beulich
  2018-09-27  6:44                       ` Jan Beulich
  1 sibling, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-09-26  8:48 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 25.09.18 at 19:15, <andrew.cooper3@citrix.com> wrote:
> On 10/09/18 11:00, Jan Beulich wrote:
>> Well, I continue to not really agree. First and foremost, as said before,
>> the common (exclusive?) case is going to be that with "x86: use MOV
>> for PFN/PDX conversion when possible" no calls will exist at runtime at
>> all.
> 
> Taking this one step further, why don't we drop PDX entirely?
> 
> I seem to recall you saying that the one system it was introduced for
> never shipped, at which point, why bother keeping the code around?

For one I don't know if they or anyone else have plans to ship something
like this in the future. And then ARM uses PDX as well.

> A separate point which has only just occurred to me is the humongous
> pipeline stall which occurs when mixing legacy and VEX SSE instructions
> on SandyBridge and later hardware.  I severely doubt that a single
> transformation from ALU operations to PDEP/PEXT is going to make up for
> the pipeline stall if the guest is using legacy SSE, although given how
> common the PDX conversions are, I could easily believe that the net is
> in the same ballpark.

I think you're mixing up SIMD and VEX-encoded GPR insns. I'm not
aware of the latter (which PDEP and PEXT belong to) falling into the
category where that SIMD register related stall would occur. That's
only related to non-VEX-encoded SIMD insns not updating the full
YMM / ZMM registers afaik (and iirc has been largely taken care of
in newer hardware).

>> At that point all function instances could collectively be purged just
>> like .init.text, if we cared enough. And then, for this particular case,
>> leaving the compiler the widest possible choice of register allocation
>> still seems pretty desirable to me. I'd agree with your "register
>> renames at compile time are free" only if there weren't special uses of
>> quite a few of the registers.
>>
>> As perhaps a prime example, consider the case where the
>> transformation here gets done in the course of setting up another
>> function's arguments. The compiler would have to avoid certain
>> registers (or generate extra MOVs) if it had to first pass the input
>> in a fixed register to the helper here (which then additionally also
>> needs to be assumed to clobber several other ones).
> 
> Once again, I think you are focusing on the wrong aspect, and ending up
> with something which is worse overall.
> 
> First, is there a single example here where the compiler sets up
> registers before a function call? Given the sequence point, it is
> distinctly non-trivial to optimise around.

First and foremost

#define __map_domain_page(pg)        map_domain_page(page_to_mfn(pg))

static inline void *__map_domain_page_global(const struct page_info *pg)
{
    return map_domain_page_global(page_to_mfn(pg));
}

Plus efi_rs_enter() has

    switch_cr3_cr4(virt_to_maddr(efi_l4_pgtable), read_cr4());

and gnttab_unpopulate_status_frames() has

                     : guest_physmap_remove_page(d, gfn,
                                                 page_to_mfn(pg), 0);

just to give a few examples, and while looking for some I've already
skipped printk() invocations, init-only code, and the like.

> Irrespective, a couple of extra movs (which are handled during register
> renaming) is far less overhead than hitting a cold icache line, and
> having 256 variations of this stub function is a very good way to hit a
> lot of cold icache lines.

Why do you continue to think that the called functions would be any
colder than the call sites? I'm not going to claim I perfectly know
all details of when and how prefetching works on the various CPU
models, but I think direct calls should be pretty easy to handle in
hardware.

> Ultimately, real performance numbers are the only way to say for sure,
> but I expect you'll be surprised by the results you'd see.

I don't think I would, because for there to be surprises I'd have to
have hardware where I actually end up using the stub functions (or
revive/reconstruct the patch I used to have to fake memory holes).

Jan


* Re: [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-09-25 17:15                     ` Andrew Cooper
  2018-09-26  8:48                       ` Jan Beulich
@ 2018-09-27  6:44                       ` Jan Beulich
  1 sibling, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-09-27  6:44 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 25.09.18 at 19:15, <andrew.cooper3@citrix.com> wrote:
> I seem to recall you saying that the one system it was introduced for
> never shipped, at which point, why bother keeping the code around?

Well, I've said _I think_ they've never shipped.

Irrespective of this, another thought here to come to a middle
ground: What about restricting support for discontiguous memory
maps to the BIGMEM case? At which point we might consider
making BMI2 a prereq for that, eliminating the need for stub
functions, and using alternatives just to replace PDEP/PEXT with
MOV when possible.
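[The middle ground proposed above — require BMI2 for the discontiguous case and patch PDEP/PEXT down to a plain MOV when no hole exists — can be modelled in portable C. The names, the boot-time hook, and the hole mask below are illustrative only, not Xen's actual interfaces.]

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for BMI2 PEXT (one instruction on real hardware). */
static uint64_t pext_emul(uint64_t src, uint64_t mask)
{
    uint64_t res = 0;
    unsigned int out = 0;

    for (unsigned int bit = 0; bit < 64; bit++)
        if (mask & (1ULL << bit))
            res |= ((src >> bit) & 1) << out++;
    return res;
}

static uint64_t pdx_keep_mask = ~0ULL; /* PFN bits surviving compression */

/* Boot-time setup: with no hole the conversion is the identity, which is
 * exactly what the proposed alternatives patching would turn into a MOV. */
static void pdx_init(uint64_t hole_mask)
{
    pdx_keep_mask = ~hole_mask;
}

static uint64_t pfn_to_pdx(uint64_t pfn)
{
    return pdx_keep_mask == ~0ULL
           ? pfn                             /* patched to plain MOV */
           : pext_emul(pfn, pdx_keep_mask);  /* PEXT on BMI2 hardware */
}
```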

Jan



* Re: [PATCH v4 1/4] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-09-18 12:36           ` [PATCH v4 1/4] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available Jan Beulich
@ 2018-10-31 17:03             ` Wei Liu
  2018-11-02  8:49               ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Wei Liu @ 2018-10-31 17:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Tue, Sep 18, 2018 at 06:36:21AM -0600, Jan Beulich wrote:
> This allows folding 6 instructions into a single one, reducing code size
> quite a bit, especially when not considering the fallback functions
> (which won't ever need to be brought into iCache or their mappings into
> iTLB on systems supporting BMI2).
> 
> Make use of gcc's new V operand modifier, even if that results in a
> slightly odd dependency in the sources (but I also didn't want to
> introduce yet another manifest constant). This assumes that backports of
> support for this relatively new modifier have only been done to tool
> chains with not too old a binutils (gas) version. If this turns out to
> be a false assumption, we'll have to add HAVE_AS_BMI2 as a qualifier.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I'm afraid this patch series smells like "micro optimisation gone too
far" to me. :-/

Wei.


* Re: [PATCH v4 1/4] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-10-31 17:03             ` Wei Liu
@ 2018-11-02  8:49               ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2018-11-02  8:49 UTC (permalink / raw)
  To: Wei Liu; +Cc: Andrew Cooper, xen-devel

>>> On 31.10.18 at 18:03, <wei.liu2@citrix.com> wrote:
> On Tue, Sep 18, 2018 at 06:36:21AM -0600, Jan Beulich wrote:
>> This allows folding 6 instructions into a single one, reducing code size
>> quite a bit, especially when not considering the fallback functions
>> (which won't ever need to be brought into iCache or their mappings into
>> iTLB on systems supporting BMI2).
>> 
>> Make use of gcc's new V operand modifier, even if that results in a
>> slightly odd dependency in the sources (but I also didn't want to
>> introduce yet another manifest constant). This assumes that backports of
>> support for this relatively new modifier have only been done to tool
>> chains with not too old a binutils (gas) version. If this turns out to
>> be a false assumption, we'll have to add HAVE_AS_BMI2 as a qualifier.
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I'm afraid this patch series smells like "micro optimisation gone too
> far" to me. :-/

Well, as explained in replies to Andrew as well, the primary motivation
here is two-fold:
1) The binary code involved in these translations is rather hard to
follow in some cases. Analysis of crashes would be a bit easier with
the simpler resulting insns.
2) The ultimate use of a single MOV is, considering the large number
of instances, quite certainly an improvement on the vast majority
of systems.

That said, I think I've decided to not pursue this series following
Andrew's comments, which in part have matched your remark.

Jan




end of thread, other threads:[~2018-11-02  8:49 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <5AA7E79302000078001B0FAB@prv1-mh.provo.novell.com>
     [not found] ` <5AA7D98302000000000F73C0@prv1-mh.provo.novell.com>
     [not found]   ` <5AA7D98302000078001CDC8C@prv1-mh.provo.novell.com>
     [not found]     ` <5AA7D98302000000000F8008@prv1-mh.provo.novell.com>
     [not found]       ` <5AA7D98302000078001DD5F0@prv1-mh.provo.novell.com>
2018-08-17  7:06         ` [PATCH v3 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
2018-08-17  7:20           ` [PATCH v3 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h Jan Beulich
2018-08-17  8:39             ` Andrew Cooper
2018-08-17  7:21           ` [PATCH v3 2/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available Jan Beulich
2018-08-17  8:59             ` Andrew Cooper
2018-08-17  9:38               ` Jan Beulich
2018-09-07 16:17                 ` Andrew Cooper
2018-09-10 10:00                   ` Jan Beulich
2018-09-25 17:15                     ` Andrew Cooper
2018-09-26  8:48                       ` Jan Beulich
2018-09-27  6:44                       ` Jan Beulich
2018-08-17  7:22           ` [PATCH v3 3/5] x86: use PDEP/PEXT for PFN/PDX " Jan Beulich
2018-08-17  7:23           ` [PATCH v3 4/5] x86: use MOV for PFN/PDX conversion when possible Jan Beulich
2018-08-17  8:41             ` Andrew Cooper
2018-08-17  7:24           ` [PATCH v3 5/5] x86: use PDEP for PTE flags insertion when available Jan Beulich
2018-08-18  1:08             ` Rich Persaud
2018-08-20  7:25               ` Jan Beulich
2018-09-18 12:34         ` [PATCH v4 0/4] x86: improve PDX <-> PFN and alike translations Jan Beulich
2018-09-18 12:36           ` [PATCH v4 1/4] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available Jan Beulich
2018-10-31 17:03             ` Wei Liu
2018-11-02  8:49               ` Jan Beulich
2018-09-18 12:36           ` [PATCH v4 2/4] x86: use PDEP/PEXT for PFN/PDX " Jan Beulich
2018-09-18 12:37           ` [PATCH v4 3/4] x86: use MOV for PFN/PDX conversion when possible Jan Beulich
2018-09-18 12:37           ` [PATCH v4 4/4] x86: use PDEP for PTE flags insertion when available Jan Beulich
