* [PATCH 0/5] x86: improve PDX <-> PFN and alike translations
@ 2018-02-28 13:51 Jan Beulich
  2018-02-28 13:56 ` [PATCH 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h Jan Beulich
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Jan Beulich @ 2018-02-28 13:51 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

1: remove page.h and processor.h inclusion from asm_defns.h
2: use PDEP for PTE flags insertion when available
3: use PDEP/PEXT for maddr/direct-map-offset conversion when available
4: use PDEP/PEXT for PFN/PDX conversion when available
5: use MOV for PFN/PDX conversion when possible

Signed-off-by: Jan Beulich <jbeulich@suse.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [PATCH 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h
  2018-02-28 13:51 [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
@ 2018-02-28 13:56 ` Jan Beulich
  2018-02-28 13:57 ` [PATCH 2/5] x86: use PDEP for PTE flags insertion when available Jan Beulich
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2018-02-28 13:56 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Subsequent changes require this dependency (too wide anyway, imo) to be
dropped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -5,6 +5,7 @@
 #include <asm/desc.h>
 #include <asm/fixmap.h>
 #include <asm/page.h>
+#include <asm/processor.h>
 #include <asm/msr.h>
 #include <asm/cpufeature.h>
 #include <public/elfnote.h>
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -9,6 +9,7 @@
 #include <asm/asm_defns.h>
 #include <asm/apicdef.h>
 #include <asm/page.h>
+#include <asm/processor.h>
 #include <asm/desc.h>
 #include <public/xen.h>
 #include <irq_vectors.h>
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -11,6 +11,7 @@
 #include <asm/asm_defns.h>
 #include <asm/apicdef.h>
 #include <asm/page.h>
+#include <asm/processor.h>
 #include <public/xen.h>
 #include <irq_vectors.h>
 
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -7,9 +7,8 @@
 #include <asm/asm-offsets.h>
 #endif
 #include <asm/bug.h>
-#include <asm/page.h>
-#include <asm/processor.h>
 #include <asm/percpu.h>
+#include <asm/x86-defns.h>
 #include <xen/stringify.h>
 #include <asm/cpufeature.h>
 #include <asm/alternative.h>
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -259,6 +259,7 @@ int init_domain_cpuid_policy(struct doma
 /* Clamp the CPUID policy to reality. */
 void recalculate_cpuid_policy(struct domain *d);
 
+struct vcpu;
 void guest_cpuid(const struct vcpu *v, uint32_t leaf,
                  uint32_t subleaf, struct cpuid_leaf *res);
 
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -11,6 +11,7 @@
 #include <asm/alternative.h>
 #include <asm/asm_defns.h>
 #include <asm/cpufeature.h>
+#include <asm/processor.h>
 
 #define rdmsr(msr,val1,val2) \
      __asm__ __volatile__("rdmsr" \





* [PATCH 2/5] x86: use PDEP for PTE flags insertion when available
  2018-02-28 13:51 [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
  2018-02-28 13:56 ` [PATCH 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h Jan Beulich
@ 2018-02-28 13:57 ` Jan Beulich
  2018-02-28 13:57 ` [PATCH 3/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion " Jan Beulich
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2018-02-28 13:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

This allows folding 5 instructions into a single one, reducing code size
quite a bit, especially when not counting the fallback functions
(which never need to be brought into the iCache, nor their mappings into
the iTLB, on systems supporting BMI2).
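
As an illustration only (not part of the patch), the equivalence between the
shift/mask sequence and a single PDEP can be checked with a portable
emulation. The sketch below assumes 52 physical address bits and 4 KiB pages,
so that ~(PADDR_MASK & PAGE_MASK) is 0xfff0000000000fff; all sketch_* names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: PADDR_MASK covers 52 bits and PAGE_MASK clears the low 12,
 * so the flags mask has bits 0-11 and 52-63 set.
 */
#define SKETCH_PTE_FLAGS_MASK 0xfff0000000000fffULL

/* Portable PDEP emulation: scatter the low bits of src to the set bits of mask. */
static uint64_t sketch_pdep(uint64_t src, uint64_t mask)
{
    uint64_t res = 0;

    for ( uint64_t bit = 1; mask; bit <<= 1 )
    {
        uint64_t lowest = mask & (~mask + 1); /* lowest set bit of mask */

        if ( src & bit )
            res |= lowest;
        mask &= mask - 1;                     /* clear that bit */
    }

    return res;
}

/* Same arithmetic as put_pte_flags_c() in the patch. */
static uint64_t sketch_put_pte_flags_c(unsigned int x)
{
    return (((uint64_t)x & ~0xfff) << 40) | (x & 0xfff);
}

/* Compare both forms over a spread of 24-bit flag values. */
static void sketch_check_pte_flags(void)
{
    for ( unsigned int x = 0; x < (1u << 24); x += 0x1001 )
        assert(sketch_pdep(x, SKETCH_PTE_FLAGS_MASK) ==
               sketch_put_pte_flags_c(x));
}
```

Calling sketch_check_pte_flags() exercises the comparison; on BMI2 hardware
the loop in sketch_pdep() is what the single PDEP instruction performs.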

Unfortunately gas support for quoted symbols has been introduced in
multiple steps, so we need to add a second check rather than being able
to re-use the already existing HAVE_GAS_QUOTED_SYM manifest constant.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: Also change get_pte_flags() (after having introduced test_pte_flags())?
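
For reference, the RIP-relative check added to the alternatives patching code
can be modelled in isolation. This is a minimal sketch with hypothetical
sketch_* helpers, assuming (as the check in the patch does) no legacy
prefixes and a one-byte opcode between the VEX prefix and the ModRM byte.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the offset of the 32-bit displacement if insn is VEX-encoded and
 * uses RIP-relative addressing, or 0 otherwise.
 */
static unsigned int sketch_vex_riprel_disp_offset(const uint8_t *insn,
                                                  unsigned int len)
{
    unsigned int two_byte;

    /* 0xc4 starts a 3-byte VEX prefix, 0xc5 a 2-byte one. */
    if ( len < 8 || (insn[0] & ~1) != 0xc4 )
        return 0;

    two_byte = insn[0] & 1;
    if ( len < 9 - two_byte )
        return 0;

    /* ModRM with mod == 00 and rm == 101 selects RIP-relative addressing. */
    if ( (insn[4 - two_byte] & ~0x38) != 0x05 )
        return 0;

    return 5 - two_byte;
}

/* Adjust the displacement by delta (replacement address minus target). */
static void sketch_relocate(uint8_t *insn, unsigned int len, uint32_t delta)
{
    unsigned int off = sketch_vex_riprel_disp_offset(insn, len);
    uint32_t disp;

    if ( !off )
        return;

    /* x86 stores the displacement little-endian. */
    disp = insn[off] | ((uint32_t)insn[off + 1] << 8) |
           ((uint32_t)insn[off + 2] << 16) | ((uint32_t)insn[off + 3] << 24);
    disp += delta;
    for ( unsigned int i = 0; i < 4; i++ )
        insn[off + i] = (uint8_t)(disp >> (8 * i));
}

/* The byte pattern is the hand-encoded PDEP from the fallback path. */
static void sketch_check_vex(void)
{
    uint8_t pdep[9] = { 0xc4, 0xe2, 0xc3, 0xf5, 0x05, 0, 0, 0, 0 };

    assert(sketch_vex_riprel_disp_offset(pdep, sizeof(pdep)) == 5);
}
```

The displacement moves by the distance between where the replacement was
assembled and where it ends up executing, which is what the added
"replacement - instr" adjustment expresses.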

--- a/xen/arch/x86/Rules.mk
+++ b/xen/arch/x86/Rules.mk
@@ -23,6 +23,7 @@ $(call as-option-add,CFLAGS,CC,"rdseed %
 $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
                      -U__OBJECT_LABEL__ -DHAVE_GAS_QUOTED_SYM \
                      '-D__OBJECT_LABEL__=$(subst $(BASEDIR)/,,$(CURDIR))/$$@')
+$(call as-option-add,CFLAGS,CC,".ifdef \"%\"; .endif",-DHAVE_GAS_QUOTED_EXPR_SYM)
 
 CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
 
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -192,6 +192,12 @@ void init_or_livepatch apply_alternative
         /* 0xe8/0xe9 are relative branches; fix the offset. */
         if ( a->replacementlen >= 5 && (*insnbuf & 0xfe) == 0xe8 )
             *(s32 *)(insnbuf + 1) += replacement - instr;
+        /* RIP-relative addressing is easy to check for in VEX-encoded insns. */
+        else if ( a->replacementlen >= 8 &&
+                  (*insnbuf & ~1) == 0xc4 &&
+                  a->replacementlen >= 9 - (*insnbuf & 1) &&
+                  (insnbuf[4 - (*insnbuf & 1)] & ~0x38) == 0x05 )
+            *(s32 *)(insnbuf + 5 - (*insnbuf & 1)) += replacement - instr;
 
         add_nops(insnbuf + a->replacementlen,
                  a->instrlen - a->replacementlen);
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -391,6 +391,15 @@ void __init arch_init_memory(void)
 #endif
 }
 
+const intpte_t pte_flags_mask = ~(PADDR_MASK & PAGE_MASK);
+
+#ifndef HAVE_GAS_QUOTED_EXPR_SYM
+intpte_t put_pte_flags_v(unsigned int flags)
+{
+    return put_pte_flags_c(flags);
+}
+#endif
+
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 {
     uint64_t maddr = pfn_to_paddr(mfn);
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -66,6 +66,7 @@ SECTIONS
         _stext = .;            /* Text and read-only data */
        *(.text)
        *(.text.__x86_indirect_thunk_*)
+       *(.gnu.linkonce.t.*)
        *(.text.page_aligned)
 
        . = ALIGN(PAGE_SIZE);
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -34,6 +34,9 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/alternative.h>
+#include <asm/asm_defns.h>
+#include <asm/cpufeature.h>
 #include <asm/types.h>
 
 #include <xen/pdx.h>
@@ -123,15 +126,55 @@ typedef l4_pgentry_t root_pgentry_t;
 
 /* Extract flags into 24-bit integer, or turn 24-bit flags into a pte mask. */
 #ifndef __ASSEMBLY__
+extern const intpte_t pte_flags_mask;
+intpte_t __attribute_const__ put_pte_flags_v(unsigned int x);
+
 static inline unsigned int get_pte_flags(intpte_t x)
 {
     return ((x >> 40) & ~0xfff) | (x & 0xfff);
 }
 
-static inline intpte_t put_pte_flags(unsigned int x)
+static inline intpte_t put_pte_flags_c(unsigned int x)
 {
     return (((intpte_t)x & ~0xfff) << 40) | (x & 0xfff);
 }
+
+static always_inline intpte_t put_pte_flags(unsigned int x)
+{
+    intpte_t pte;
+
+    if ( __builtin_constant_p(x) )
+        return put_pte_flags_c(x);
+
+#ifdef HAVE_GAS_QUOTED_EXPR_SYM
+#define SYMNAME(pfx...) "\"" #pfx "put_pte_flags_%[pte]_%q[flags]\""
+    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
+                     LINKONCE_PROLOGUE(SYMNAME)
+                     "mov %[flags], %k[pte]\n\t"
+                     "and $0xfff000, %[flags]\n\t"
+                     "and $0x000fff, %k[pte]\n\t"
+                     "shl $40, %q[flags]\n\t"
+                     "or %q[flags], %[pte]\n\t"
+                     "ret\n\t"
+                     LINKONCE_EPILOGUE(SYMNAME),
+                     "call " SYMNAME(), X86_FEATURE_ALWAYS,
+                     "pdep %[mask], %q[flags], %[pte]", X86_FEATURE_BMI2,
+                     ASM_OUTPUT2([pte] "=&r" (pte), [flags] "+r" (x)),
+                     [mask] "m" (pte_flags_mask));
+#undef SYMNAME
+#else
+    alternative_io_2("call put_pte_flags_v; " ASM_NOP4,
+                     "call put_pte_flags_v", X86_FEATURE_ALWAYS,
+                     /* pdep pte_flags_mask(%rip), %rdi, %rax */
+                     ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
+                     ".long pte_flags_mask - 4 - .",
+                     X86_FEATURE_BMI2,
+                     ASM_OUTPUT2("=a" (pte), "+D" (x)), "m" (pte_flags_mask)
+                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return pte;
+}
 #endif
 
 /*
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -187,6 +187,20 @@ void ret_from_intr(void);
         UNLIKELY_END_SECTION "\n"          \
         ".Llikely." #tag ".%=:"
 
+#define LINKONCE_PROLOGUE(sym)                    \
+        ".ifndef " sym() "\n\t"                   \
+        ".pushsection " sym(.gnu.linkonce.t.) "," \
+                      "\"ax\",@progbits\n\t"      \
+        ".p2align 4\n"                            \
+        sym() ":\n\t"
+
+#define LINKONCE_EPILOGUE(sym)                    \
+        ".weak " sym() "\n\t"                     \
+        ".type " sym() ", @function\n\t"          \
+        ".size " sym() ", . - " sym() "\n\t"      \
+        ".popsection\n\t"                         \
+        ".endif\n\t"
+
 #endif
 
 /* "Raw" instruction opcodes */




* [PATCH 3/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion when available
  2018-02-28 13:51 [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
  2018-02-28 13:56 ` [PATCH 1/5] x86: remove page.h and processor.h inclusion from asm_defns.h Jan Beulich
  2018-02-28 13:57 ` [PATCH 2/5] x86: use PDEP for PTE flags insertion when available Jan Beulich
@ 2018-02-28 13:57 ` Jan Beulich
  2018-02-28 13:58 ` [PATCH 4/5] x86: use PDEP/PEXT for PFN/PDX " Jan Beulich
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2018-02-28 13:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Both conversions replace 6 instructions with a single one, further reducing
code size as well as cache and TLB footprint (in particular on systems
supporting BMI2).
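
To illustrate why a single PDEP/PEXT with the combined mask matches the
existing shift-and-mask arithmetic, here is a minimal sketch. The mask layout
(a hole of hole_shift PFN bits starting at bit bottom_shift) and all sketch_*
names are assumptions made for the example, not code from this series.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

struct sketch_masks {
    uint64_t bottom, top, real; /* real = top | bottom, as in the patch */
    unsigned int hole_shift;
};

/* Hypothetical setup: PFN bits [bottom_shift, bottom_shift + hole_shift) are a hole. */
static struct sketch_masks sketch_setup(unsigned int bottom_shift,
                                        unsigned int hole_shift)
{
    struct sketch_masks m;
    uint64_t pfn_bottom = (1ULL << bottom_shift) - 1;
    uint64_t pfn_hole = ((1ULL << hole_shift) - 1) << bottom_shift;

    m.hole_shift = hole_shift;
    m.bottom = (pfn_bottom << SKETCH_PAGE_SHIFT) |
               ((1ULL << SKETCH_PAGE_SHIFT) - 1);
    m.top = ~(pfn_bottom | pfn_hole) << SKETCH_PAGE_SHIFT;
    m.real = m.top | m.bottom;

    return m;
}

/* Portable PDEP emulation: scatter low bits of src to the set bits of mask. */
static uint64_t sketch_pdep(uint64_t src, uint64_t mask)
{
    uint64_t res = 0;

    for ( uint64_t bit = 1; mask; bit <<= 1, mask &= mask - 1 )
        if ( src & bit )
            res |= mask & (~mask + 1);

    return res;
}

/* Portable PEXT emulation: gather the bits of src selected by mask. */
static uint64_t sketch_pext(uint64_t src, uint64_t mask)
{
    uint64_t res = 0;

    for ( uint64_t bit = 1; mask; bit <<= 1, mask &= mask - 1 )
        if ( src & mask & (~mask + 1) )
            res |= bit;

    return res;
}

/* Same arithmetic as the do2ma() / ma2do() fallbacks. */
static uint64_t sketch_do2ma(const struct sketch_masks *m, uint64_t off)
{
    return (off & m->bottom) | ((off << m->hole_shift) & m->top);
}

static uint64_t sketch_ma2do(const struct sketch_masks *m, uint64_t ma)
{
    return (ma & m->bottom) | ((ma & m->top) >> m->hole_shift);
}

/* Both directions agree with the single-instruction forms. */
static void sketch_check_ma(void)
{
    struct sketch_masks m = sketch_setup(20, 4);

    for ( uint64_t off = 0; off < (1ULL << 40); off += 0x123456789ULL )
    {
        uint64_t ma = sketch_do2ma(&m, off);

        assert(ma == sketch_pdep(off, m.real));
        assert(sketch_ma2do(&m, ma) == sketch_pext(ma, m.real));
    }
}
```

With bottom_shift 20 and hole_shift 4, machine address bits 32-35 form the
hole, so offset 0x123456789 maps to machine address 0x1023456789 and back.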

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -393,11 +393,26 @@ void __init arch_init_memory(void)
 
 const intpte_t pte_flags_mask = ~(PADDR_MASK & PAGE_MASK);
 
+paddr_t __read_mostly ma_real_mask = ~0UL;
+
 #ifndef HAVE_GAS_QUOTED_EXPR_SYM
 intpte_t put_pte_flags_v(unsigned int flags)
 {
     return put_pte_flags_c(flags);
 }
+
+/* Conversion between machine address and direct map offset. */
+paddr_t do2ma(unsigned long off)
+{
+    return (off & ma_va_bottom_mask) |
+           ((off << pfn_pdx_hole_shift) & ma_top_mask);
+}
+
+unsigned long ma2do(paddr_t ma)
+{
+    return (ma & ma_va_bottom_mask) |
+           ((ma & ma_top_mask) >> pfn_pdx_hole_shift);
+}
 #endif
 
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -446,6 +446,8 @@ void __init srat_parse_regions(u64 addr)
 	}
 
 	pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
+
+	ma_real_mask = ma_top_mask | ma_va_bottom_mask;
 }
 
 /* Use the information discovered above to actually set up the nodes. */
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -42,6 +42,10 @@
 #include <xen/pdx.h>
 
 extern unsigned long xen_virt_end;
+extern paddr_t ma_real_mask;
+
+paddr_t do2ma(unsigned long);
+unsigned long ma2do(paddr_t);
 
 /*
  * Note: These are solely for the use by page_{get,set}_owner(), and
@@ -52,8 +56,10 @@ extern unsigned long xen_virt_end;
 #define pdx_to_virt(pdx) ((void *)(DIRECTMAP_VIRT_START + \
                                    ((unsigned long)(pdx) << PAGE_SHIFT)))
 
-static inline unsigned long __virt_to_maddr(unsigned long va)
+static always_inline paddr_t __virt_to_maddr(unsigned long va)
 {
+    paddr_t ma;
+
     ASSERT(va < DIRECTMAP_VIRT_END);
     if ( va >= DIRECTMAP_VIRT_START )
         va -= DIRECTMAP_VIRT_START;
@@ -66,16 +72,81 @@ static inline unsigned long __virt_to_ma
 
         va += xen_phys_start - XEN_VIRT_START;
     }
-    return (va & ma_va_bottom_mask) |
-           ((va << pfn_pdx_hole_shift) & ma_top_mask);
+
+#ifdef HAVE_GAS_QUOTED_EXPR_SYM
+#define SYMNAME(pfx...) "\"" #pfx "do2ma_%[ma]_%[off]\""
+    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
+                     LINKONCE_PROLOGUE(SYMNAME)
+                     "mov %[shift], %%ecx\n\t"
+                     "mov %[off], %[ma]\n\t"
+                     "and %[bmask], %[ma]\n\t"
+                     "shl %%cl, %[off]\n\t"
+                     "and %[tmask], %[off]\n\t"
+                     "or %[off], %[ma]\n\t"
+                     "ret\n\t"
+                     LINKONCE_EPILOGUE(SYMNAME),
+                     "call " SYMNAME(), X86_FEATURE_ALWAYS,
+                     "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
+                     ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
+                     [mask] "m" (ma_real_mask),
+                     [shift] "m" (pfn_pdx_hole_shift),
+                     [bmask] "m" (ma_va_bottom_mask),
+                     [tmask] "m" (ma_top_mask)
+                     : "ecx");
+#undef SYMNAME
+#else
+    alternative_io_2("call do2ma; " ASM_NOP4,
+                     "call do2ma", X86_FEATURE_ALWAYS,
+                     /* pdep ma_real_mask(%rip), %rdi, %rax */
+                     ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
+                     ".long ma_real_mask - 4 - .",
+                     X86_FEATURE_BMI2,
+                     ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
+                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return ma;
 }
 
-static inline void *__maddr_to_virt(unsigned long ma)
+static always_inline void *__maddr_to_virt(paddr_t ma)
 {
+    unsigned long off;
+
     ASSERT(pfn_to_pdx(ma >> PAGE_SHIFT) < (DIRECTMAP_SIZE >> PAGE_SHIFT));
-    return (void *)(DIRECTMAP_VIRT_START +
-                    ((ma & ma_va_bottom_mask) |
-                     ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
+
+#ifdef HAVE_GAS_QUOTED_EXPR_SYM
+#define SYMNAME(pfx...) "\"" #pfx "ma2do_%[off]_%[ma]\""
+    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
+                     LINKONCE_PROLOGUE(SYMNAME)
+                     "mov %[tmask], %[off]\n\t"
+                     "mov %[shift], %%ecx\n\t"
+                     "and %[ma], %[off]\n\t"
+                     "and %[bmask], %[ma]\n\t"
+                     "shr %%cl, %[off]\n\t"
+                     "or %[ma], %[off]\n\t"
+                     "ret\n\t"
+                     LINKONCE_EPILOGUE(SYMNAME),
+                     "call " SYMNAME(), X86_FEATURE_ALWAYS,
+                     "pext %[mask], %[ma], %[off]", X86_FEATURE_BMI2,
+                     ASM_OUTPUT2([off] "=&r" (off), [ma] "+r" (ma)),
+                     [mask] "m" (ma_real_mask),
+                     [shift] "m" (pfn_pdx_hole_shift),
+                     [bmask] "m" (ma_va_bottom_mask),
+                     [tmask] "m" (ma_top_mask)
+                     : "ecx");
+#undef SYMNAME
+#else
+    alternative_io_2("call ma2do; " ASM_NOP4,
+                     "call ma2do", X86_FEATURE_ALWAYS,
+                     /* pext ma_real_mask(%rip), %rdi, %rax */
+                     ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
+                     ".long ma_real_mask - 4 - .",
+                     X86_FEATURE_BMI2,
+                     ASM_OUTPUT2("=a" (off), "+D" (ma)), "m" (ma_real_mask)
+                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return (void *)DIRECTMAP_VIRT_START + off;
 }
 
 /* read access (should only be used for debug printk's) */




* Re: [PATCH 4/5] x86: use PDEP/PEXT for PFN/PDX conversion when available
  2018-02-28 13:51 [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
                   ` (2 preceding siblings ...)
  2018-02-28 13:57 ` [PATCH 3/5] x86: use PDEP/PEXT for maddr/direct-map-offset conversion " Jan Beulich
@ 2018-02-28 13:58 ` Jan Beulich
  2018-02-28 14:35   ` Jan Beulich
  2018-02-28 13:59 ` [PATCH 5/5] x86: use MOV for PFN/PDX conversion when possible Jan Beulich
  2018-02-28 16:47 ` [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Andrew Cooper
  5 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2018-02-28 13:58 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Julien Grall, Stefano Stabellini

Both replace 6 instructions by a single one, further reducing code size,
cache, and TLB footprint (in particular on systems supporting BMI2).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -394,6 +394,7 @@ void __init arch_init_memory(void)
 const intpte_t pte_flags_mask = ~(PADDR_MASK & PAGE_MASK);
 
 paddr_t __read_mostly ma_real_mask = ~0UL;
+unsigned long __read_mostly pfn_real_mask = ~0UL;
 
 #ifndef HAVE_GAS_QUOTED_EXPR_SYM
 intpte_t put_pte_flags_v(unsigned int flags)
@@ -413,6 +414,17 @@ unsigned long ma2do(paddr_t ma)
     return (ma & ma_va_bottom_mask) |
            ((ma & ma_top_mask) >> pfn_pdx_hole_shift);
 }
+
+/* Conversion between PDX and PFN. */
+unsigned long pdx2pfn(unsigned long pdx)
+{
+    return generic_pdx_to_pfn(pdx);
+}
+
+unsigned long pfn2pdx(unsigned long pfn)
+{
+    return generic_pfn_to_pdx(pfn);
+}
 #endif
 
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -448,6 +448,7 @@ void __init srat_parse_regions(u64 addr)
 	pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
 
 	ma_real_mask = ma_top_mask | ma_va_bottom_mask;
+	pfn_real_mask = pfn_top_mask | pfn_pdx_bottom_mask;
 }
 
 /* Use the information discovered above to actually set up the nodes. */
--- /dev/null
+++ b/xen/include/asm-arm/pdx.h
@@ -0,0 +1,16 @@
+#ifndef __ASM_ARM_PDX_H__
+#define __ASM_ARM_PDX_H__
+
+#define pdx_to_pfn generic_pdx_to_pfn
+#define pfn_to_pdx generic_pfn_to_pdx
+
+#endif /* __ASM_ARM_PDX_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- /dev/null
+++ b/xen/include/asm-x86/pdx.h
@@ -0,0 +1,97 @@
+#ifndef __ASM_X86_PDX_H__
+#define __ASM_X86_PDX_H__
+
+#include <asm/alternative.h>
+#include <asm/asm_defns.h>
+#include <asm/cpufeature.h>
+
+extern unsigned long pfn_real_mask;
+
+static always_inline unsigned long pdx_to_pfn(unsigned long pdx)
+{
+    unsigned long pfn;
+
+#ifdef HAVE_GAS_QUOTED_EXPR_SYM
+#define SYMNAME(pfx...) "\"" #pfx "pdx2pfn_%[pfn]_%[pdx]\""
+    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
+                     LINKONCE_PROLOGUE(SYMNAME)
+                     "mov %[shift], %%ecx\n\t"
+                     "mov %[pdx], %[pfn]\n\t"
+                     "and %[bmask], %[pfn]\n\t"
+                     "shl %%cl, %[pdx]\n\t"
+                     "and %[tmask], %[pdx]\n\t"
+                     "or %[pdx], %[pfn]\n\t"
+                     "ret\n\t"
+                     LINKONCE_EPILOGUE(SYMNAME),
+                     "call " SYMNAME(), X86_FEATURE_ALWAYS,
+                     "pdep %[mask], %[pdx], %[pfn]", X86_FEATURE_BMI2,
+                     ASM_OUTPUT2([pfn] "=&r" (pfn), [pdx] "+r" (pdx)),
+                     [mask] "m" (pfn_real_mask),
+                     [shift] "m" (pfn_pdx_hole_shift),
+                     [bmask] "m" (pfn_pdx_bottom_mask),
+                     [tmask] "m" (pfn_top_mask)
+                     : "ecx");
+#undef SYMNAME
+#else
+    alternative_io_2("call pdx2pfn; " ASM_NOP4,
+                     "call pdx2pfn", X86_FEATURE_ALWAYS,
+                     /* pdep pfn_real_mask(%rip), %rdi, %rax */
+                     ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
+                     ".long pfn_real_mask - 4 - .",
+                     X86_FEATURE_BMI2,
+                     ASM_OUTPUT2("=a" (pfn), "+D" (pdx)), "m" (pfn_real_mask)
+                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return pfn;
+}
+
+static always_inline unsigned long pfn_to_pdx(unsigned long pfn)
+{
+    unsigned long pdx;
+
+#ifdef HAVE_GAS_QUOTED_EXPR_SYM
+#define SYMNAME(pfx...) "\"" #pfx "pfn2pdx_%[pdx]_%[pfn]\""
+    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
+                     LINKONCE_PROLOGUE(SYMNAME)
+                     "mov %[tmask], %[pdx]\n\t"
+                     "mov %[shift], %%ecx\n\t"
+                     "and %[pfn], %[pdx]\n\t"
+                     "and %[bmask], %[pfn]\n\t"
+                     "shr %%cl, %[pdx]\n\t"
+                     "or %[pfn], %[pdx]\n\t"
+                     "ret\n\t"
+                     LINKONCE_EPILOGUE(SYMNAME),
+                     "call " SYMNAME(), X86_FEATURE_ALWAYS,
+                     "pext %[mask], %[pfn], %[pdx]", X86_FEATURE_BMI2,
+                     ASM_OUTPUT2([pdx] "=&r" (pdx), [pfn] "+r" (pfn)),
+                     [mask] "m" (pfn_real_mask),
+                     [shift] "m" (pfn_pdx_hole_shift),
+                     [bmask] "m" (pfn_pdx_bottom_mask),
+                     [tmask] "m" (pfn_top_mask)
+                     : "ecx");
+#undef SYMNAME
+#else
+    alternative_io_2("call pfn2pdx; " ASM_NOP4,
+                     "call pfn2pdx", X86_FEATURE_ALWAYS,
+                     /* pext pfn_real_mask(%rip), %rdi, %rax */
+                     ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
+                     ".long pfn_real_mask - 4 - .",
+                     X86_FEATURE_BMI2,
+                     ASM_OUTPUT2("=a" (pdx), "+D" (pfn)), "m" (pfn_real_mask)
+                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
+#endif
+
+    return pdx;
+}
+
+#endif /* __ASM_X86_PDX_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--- a/xen/include/xen/pdx.h
+++ b/xen/include/xen/pdx.h
@@ -23,13 +23,13 @@ extern void set_pdx_range(unsigned long
 
 bool __mfn_valid(unsigned long mfn);
 
-static inline unsigned long pfn_to_pdx(unsigned long pfn)
+static inline unsigned long generic_pfn_to_pdx(unsigned long pfn)
 {
     return (pfn & pfn_pdx_bottom_mask) |
            ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
 }
 
-static inline unsigned long pdx_to_pfn(unsigned long pdx)
+static inline unsigned long generic_pdx_to_pfn(unsigned long pdx)
 {
     return (pdx & pfn_pdx_bottom_mask) |
            ((pdx << pfn_pdx_hole_shift) & pfn_top_mask);
@@ -37,6 +37,8 @@ static inline unsigned long pdx_to_pfn(u
 
 extern void pfn_pdx_hole_setup(unsigned long);
 
+#include <asm/pdx.h>
+
 #endif /* HAS_PDX */
 #endif /* __XEN_PDX_H__ */
 




* [PATCH 5/5] x86: use MOV for PFN/PDX conversion when possible
  2018-02-28 13:51 [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
                   ` (3 preceding siblings ...)
  2018-02-28 13:58 ` [PATCH 4/5] x86: use PDEP/PEXT for PFN/PDX " Jan Beulich
@ 2018-02-28 13:59 ` Jan Beulich
  2018-02-28 16:47 ` [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Andrew Cooper
  5 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2018-02-28 13:59 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

... and (of course) also maddr / direct-map-offset ones.

Most x86 systems don't actually require the use of PDX compression. Now
that we have patching for the conversion code in place anyway, extend it
to use simple MOV when possible. Introduce a new pseudo-CPU-feature to
key the patching off of.
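
As a rough model of the idea (using a function pointer chosen at init time in
place of the binary patching this series actually does), the sketch below
shows that with a zero hole shift the generic conversion degenerates to a
plain copy. The layout setup and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t sketch_bottom_mask, sketch_top_mask;
static unsigned int sketch_hole_shift;

/* Same arithmetic as generic_pfn_to_pdx(). */
static uint64_t sketch_pfn_to_pdx_generic(uint64_t pfn)
{
    return (pfn & sketch_bottom_mask) |
           ((pfn & sketch_top_mask) >> sketch_hole_shift);
}

/* What the patched-in MOV amounts to. */
static uint64_t sketch_pfn_to_pdx_ident(uint64_t pfn)
{
    return pfn;
}

/* Stand-in for alternatives patching: pick the variant once, at init. */
static uint64_t (*sketch_pfn_to_pdx)(uint64_t) = sketch_pfn_to_pdx_generic;

static void sketch_init(unsigned int bottom_shift, unsigned int hole_shift)
{
    uint64_t hole = ((1ULL << hole_shift) - 1) << bottom_shift;

    sketch_bottom_mask = (1ULL << bottom_shift) - 1;
    sketch_top_mask = ~(sketch_bottom_mask | hole);
    sketch_hole_shift = hole_shift;

    /* No hole means the mapping is 1:1, so the cheap variant is safe. */
    sketch_pfn_to_pdx = hole_shift ? sketch_pfn_to_pdx_generic
                                   : sketch_pfn_to_pdx_ident;
}

/* Without a hole, both variants return their input unchanged. */
static void sketch_check_ident(void)
{
    sketch_init(20, 0);
    for ( uint64_t pfn = 0; pfn < (1ULL << 36); pfn += 0x10000fffULL )
        assert(sketch_pfn_to_pdx_generic(pfn) == sketch_pfn_to_pdx(pfn));
}
```

Unlike this model, patching the call site removes the indirect call as well,
leaving just the register move inline.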

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This patch will only apply cleanly on top of "x86: NOP out XPTI
entry/exit code when it's not in use".

--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1410,6 +1410,9 @@ void __init noreturn __start_xen(unsigne
 
     numa_initmem_init(0, raw_max_page);
 
+    if ( !pfn_pdx_hole_shift )
+        setup_force_cpu_cap(X86_FEATURE_PFN_PDX_IDENT);
+
     if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
     {
         unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -31,3 +31,4 @@ XEN_CPUFEATURE(XEN_IBRS_CLEAR,  (FSCAPIN
 XEN_CPUFEATURE(RSB_NATIVE,      (FSCAPINTS+0)*32+18) /* RSB overwrite needed for native */
 XEN_CPUFEATURE(RSB_VMEXIT,      (FSCAPINTS+0)*32+19) /* RSB overwrite needed for vmexit */
 XEN_CPUFEATURE(NO_XPTI,         (FSCAPINTS+0)*32+20) /* XPTI mitigation not in use */
+XEN_CPUFEATURE(PFN_PDX_IDENT,   (FSCAPINTS+0)*32+21) /* PFN <-> PDX mapping is 1:1 */
--- a/xen/include/asm-x86/pdx.h
+++ b/xen/include/asm-x86/pdx.h
@@ -13,7 +13,7 @@ static always_inline unsigned long pdx_t
 
 #ifdef HAVE_GAS_QUOTED_EXPR_SYM
 #define SYMNAME(pfx...) "\"" #pfx "pdx2pfn_%[pfn]_%[pdx]\""
-    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
+    alternative_io_3("call " SYMNAME() "; " ASM_NOP4 "\t"
                      LINKONCE_PROLOGUE(SYMNAME)
                      "mov %[shift], %%ecx\n\t"
                      "mov %[pdx], %[pfn]\n\t"
@@ -25,6 +25,7 @@ static always_inline unsigned long pdx_t
                      LINKONCE_EPILOGUE(SYMNAME),
                      "call " SYMNAME(), X86_FEATURE_ALWAYS,
                      "pdep %[mask], %[pdx], %[pfn]", X86_FEATURE_BMI2,
+                     "mov %[pdx], %[pfn]", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2([pfn] "=&r" (pfn), [pdx] "+r" (pdx)),
                      [mask] "m" (pfn_real_mask),
                      [shift] "m" (pfn_pdx_hole_shift),
@@ -33,12 +34,13 @@ static always_inline unsigned long pdx_t
                      : "ecx");
 #undef SYMNAME
 #else
-    alternative_io_2("call pdx2pfn; " ASM_NOP4,
+    alternative_io_3("call pdx2pfn; " ASM_NOP4,
                      "call pdx2pfn", X86_FEATURE_ALWAYS,
                      /* pdep pfn_real_mask(%rip), %rdi, %rax */
                      ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
                      ".long pfn_real_mask - 4 - .",
                      X86_FEATURE_BMI2,
+                     "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2("=a" (pfn), "+D" (pdx)), "m" (pfn_real_mask)
                      : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif
@@ -52,7 +54,7 @@ static always_inline unsigned long pfn_t
 
 #ifdef HAVE_GAS_QUOTED_EXPR_SYM
 #define SYMNAME(pfx...) "\"" #pfx "pfn2pdx_%[pdx]_%[pfn]\""
-    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
+    alternative_io_3("call " SYMNAME() "; " ASM_NOP4 "\t"
                      LINKONCE_PROLOGUE(SYMNAME)
                      "mov %[tmask], %[pdx]\n\t"
                      "mov %[shift], %%ecx\n\t"
@@ -64,6 +66,7 @@ static always_inline unsigned long pfn_t
                      LINKONCE_EPILOGUE(SYMNAME),
                      "call " SYMNAME(), X86_FEATURE_ALWAYS,
                      "pext %[mask], %[pfn], %[pdx]", X86_FEATURE_BMI2,
+                     "mov %[pfn], %[pdx]", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2([pdx] "=&r" (pdx), [pfn] "+r" (pfn)),
                      [mask] "m" (pfn_real_mask),
                      [shift] "m" (pfn_pdx_hole_shift),
@@ -72,12 +75,13 @@ static always_inline unsigned long pfn_t
                      : "ecx");
 #undef SYMNAME
 #else
-    alternative_io_2("call pfn2pdx; " ASM_NOP4,
+    alternative_io_3("call pfn2pdx; " ASM_NOP4,
                      "call pfn2pdx", X86_FEATURE_ALWAYS,
                      /* pext pfn_real_mask(%rip), %rdi, %rax */
                      ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
                      ".long pfn_real_mask - 4 - .",
                      X86_FEATURE_BMI2,
+                     "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2("=a" (pdx), "+D" (pfn)), "m" (pfn_real_mask)
                      : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -75,7 +75,7 @@ static always_inline paddr_t __virt_to_m
 
 #ifdef HAVE_GAS_QUOTED_EXPR_SYM
 #define SYMNAME(pfx...) "\"" #pfx "do2ma_%[ma]_%[off]\""
-    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
+    alternative_io_3("call " SYMNAME() "; " ASM_NOP4 "\t"
                      LINKONCE_PROLOGUE(SYMNAME)
                      "mov %[shift], %%ecx\n\t"
                      "mov %[off], %[ma]\n\t"
@@ -87,6 +87,7 @@ static always_inline paddr_t __virt_to_m
                      LINKONCE_EPILOGUE(SYMNAME),
                      "call " SYMNAME(), X86_FEATURE_ALWAYS,
                      "pdep %[mask], %[off], %[ma]", X86_FEATURE_BMI2,
+                     "mov %[off], %[ma]", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2([ma] "=&r" (ma), [off] "+r" (va)),
                      [mask] "m" (ma_real_mask),
                      [shift] "m" (pfn_pdx_hole_shift),
@@ -95,12 +96,13 @@ static always_inline paddr_t __virt_to_m
                      : "ecx");
 #undef SYMNAME
 #else
-    alternative_io_2("call do2ma; " ASM_NOP4,
+    alternative_io_3("call do2ma; " ASM_NOP4,
                      "call do2ma", X86_FEATURE_ALWAYS,
                      /* pdep ma_real_mask(%rip), %rdi, %rax */
                      ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
                      ".long ma_real_mask - 4 - .",
                      X86_FEATURE_BMI2,
+                     "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2("=a" (ma), "+D" (va)), "m" (ma_real_mask)
                      : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif
@@ -116,7 +118,7 @@ static always_inline void *__maddr_to_vi
 
 #ifdef HAVE_GAS_QUOTED_EXPR_SYM
 #define SYMNAME(pfx...) "\"" #pfx "ma2do_%[off]_%[ma]\""
-    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
+    alternative_io_3("call " SYMNAME() "; " ASM_NOP4 "\t"
                      LINKONCE_PROLOGUE(SYMNAME)
                      "mov %[tmask], %[off]\n\t"
                      "mov %[shift], %%ecx\n\t"
@@ -128,6 +130,7 @@ static always_inline void *__maddr_to_vi
                      LINKONCE_EPILOGUE(SYMNAME),
                      "call " SYMNAME(), X86_FEATURE_ALWAYS,
                      "pext %[mask], %[ma], %[off]", X86_FEATURE_BMI2,
+                     "mov %[ma], %[off]", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2([off] "=&r" (off), [ma] "+r" (ma)),
                      [mask] "m" (ma_real_mask),
                      [shift] "m" (pfn_pdx_hole_shift),
@@ -136,12 +139,13 @@ static always_inline void *__maddr_to_vi
                      : "ecx");
 #undef SYMNAME
 #else
-    alternative_io_2("call ma2do; " ASM_NOP4,
+    alternative_io_3("call ma2do; " ASM_NOP4,
                      "call ma2do", X86_FEATURE_ALWAYS,
                      /* pext ma_real_mask(%rip), %rdi, %rax */
                      ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
                      ".long ma_real_mask - 4 - .",
                      X86_FEATURE_BMI2,
+                     "mov %%rdi, %%rax", X86_FEATURE_PFN_PDX_IDENT,
                      ASM_OUTPUT2("=a" (off), "+D" (ma)), "m" (ma_real_mask)
                      : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
 #endif

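How the three-way alternative is expected to resolve can be modelled in
plain C. This is only a sketch, not Xen's patching code: the names
model_feature, model_alt and model_select are hypothetical, and it assumes
that entries are processed in order with every matching entry overwriting
the site. That assumption follows from the entry order above, since the
X86_FEATURE_ALWAYS entry comes first and would otherwise hide the others.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the feature bits used at the patch site. */
enum model_feature { FEAT_ALWAYS, FEAT_BMI2, FEAT_PFN_PDX_IDENT, FEAT_COUNT };

struct model_alt {
    enum model_feature feature;   /* feature that enables this replacement */
    const char *insn;             /* replacement instruction (as text) */
};

/* Same order as in the alternative_io_3() invocation for ma2do. */
static const struct model_alt ma2do_alts[] = {
    { FEAT_ALWAYS,        "call ma2do" },
    { FEAT_BMI2,          "pext" },
    { FEAT_PFN_PDX_IDENT, "mov" },
};

/*
 * Walk the entries in order; every entry whose feature is present
 * overwrites the site, so the last matching entry wins.
 */
static const char *model_select(const struct model_alt *alts, size_t n,
                                const bool present[FEAT_COUNT],
                                const char *original)
{
    const char *chosen = original;

    for (size_t i = 0; i < n; i++)
        if (present[alts[i].feature])
            chosen = alts[i].insn;

    return chosen;
}
```

Under this model a system with BMI2 but with a real PFN/PDX hole ends up
with the PEXT form, while a system without a hole gets the MOV form
regardless of BMI2.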



* Re: [PATCH 4/5] x86: use PDEP/PEXT for PFN/PDX conversion when available
  2018-02-28 13:58 ` [PATCH 4/5] x86: use PDEP/PEXT for PFN/PDX " Jan Beulich
@ 2018-02-28 14:35   ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2018-02-28 14:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Julien Grall, Stefano Stabellini

I'm sorry for the bogus "Re:" in the original subject; not sure how that
happened.

Jan

>>> On 28.02.18 at 14:58, <JBeulich@suse.com> wrote:
> Both replace 6 instructions by a single one, further reducing code size,
> cache, and TLB footprint (in particular on systems supporting BMI2).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -394,6 +394,7 @@ void __init arch_init_memory(void)
>  const intpte_t pte_flags_mask = ~(PADDR_MASK & PAGE_MASK);
>  
>  paddr_t __read_mostly ma_real_mask = ~0UL;
> +unsigned long __read_mostly pfn_real_mask = ~0UL;
>  
>  #ifndef HAVE_GAS_QUOTED_EXPR_SYM
>  intpte_t put_pte_flags_v(unsigned int flags)
> @@ -413,6 +414,17 @@ unsigned long ma2do(paddr_t ma)
>      return (ma & ma_va_bottom_mask) |
>             ((ma & ma_top_mask) >> pfn_pdx_hole_shift);
>  }
> +
> +/* Conversion between PDX and PFN. */
> +unsigned long pdx2pfn(unsigned long pdx)
> +{
> +    return generic_pdx_to_pfn(pdx);
> +}
> +
> +unsigned long pfn2pdx(unsigned long pfn)
> +{
> +    return generic_pfn_to_pdx(pfn);
> +}
>  #endif
>  
>  int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -448,6 +448,7 @@ void __init srat_parse_regions(u64 addr)
>  	pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
>  
>  	ma_real_mask = ma_top_mask | ma_va_bottom_mask;
> +	pfn_real_mask = pfn_top_mask | pfn_pdx_bottom_mask;
>  }
>  
>  /* Use the information discovered above to actually set up the nodes. */
> --- /dev/null
> +++ b/xen/include/asm-arm/pdx.h
> @@ -0,0 +1,16 @@
> +#ifndef __ASM_ARM_PDX_H__
> +#define __ASM_ARM_PDX_H__
> +
> +#define pdx_to_pfn generic_pdx_to_pfn
> +#define pfn_to_pdx generic_pfn_to_pdx
> +
> +#endif /* __ASM_ARM_PDX_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> --- /dev/null
> +++ b/xen/include/asm-x86/pdx.h
> @@ -0,0 +1,97 @@
> +#ifndef __ASM_X86_PDX_H__
> +#define __ASM_X86_PDX_H__
> +
> +#include <asm/alternative.h>
> +#include <asm/asm_defns.h>
> +#include <asm/cpufeature.h>
> +
> +extern unsigned long pfn_real_mask;
> +
> +static always_inline unsigned long pdx_to_pfn(unsigned long pdx)
> +{
> +    unsigned long pfn;
> +
> +#ifdef HAVE_GAS_QUOTED_EXPR_SYM
> +#define SYMNAME(pfx...) "\"" #pfx "pdx2pfn_%[pfn]_%[pdx]\""
> +    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
> +                     LINKONCE_PROLOGUE(SYMNAME)
> +                     "mov %[shift], %%ecx\n\t"
> +                     "mov %[pdx], %[pfn]\n\t"
> +                     "and %[bmask], %[pfn]\n\t"
> +                     "shl %%cl, %[pdx]\n\t"
> +                     "and %[tmask], %[pdx]\n\t"
> +                     "or %[pdx], %[pfn]\n\t"
> +                     "ret\n\t"
> +                     LINKONCE_EPILOGUE(SYMNAME),
> +                     "call " SYMNAME(), X86_FEATURE_ALWAYS,
> +                     "pdep %[mask], %[pdx], %[pfn]", X86_FEATURE_BMI2,
> +                     ASM_OUTPUT2([pfn] "=&r" (pfn), [pdx] "+r" (pdx)),
> +                     [mask] "m" (pfn_real_mask),
> +                     [shift] "m" (pfn_pdx_hole_shift),
> +                     [bmask] "m" (pfn_pdx_bottom_mask),
> +                     [tmask] "m" (pfn_top_mask)
> +                     : "ecx");
> +#undef SYMNAME
> +#else
> +    alternative_io_2("call pdx2pfn; " ASM_NOP4,
> +                     "call pdx2pfn", X86_FEATURE_ALWAYS,
> +                     /* pdep pfn_real_mask(%rip), %rdi, %rax */
> +                     ".byte 0xc4, 0xe2, 0xc3, 0xf5, 0x05\n\t"
> +                     ".long pfn_real_mask - 4 - .",
> +                     X86_FEATURE_BMI2,
> +                     ASM_OUTPUT2("=a" (pfn), "+D" (pdx)), "m" (pfn_real_mask)
> +                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
> +#endif
> +
> +    return pfn;
> +}
> +
> +static always_inline unsigned long pfn_to_pdx(unsigned long pfn)
> +{
> +    unsigned long pdx;
> +
> +#ifdef HAVE_GAS_QUOTED_EXPR_SYM
> +#define SYMNAME(pfx...) "\"" #pfx "pfn2pdx_%[pdx]_%[pfn]\""
> +    alternative_io_2("call " SYMNAME() "; " ASM_NOP4 "\t"
> +                     LINKONCE_PROLOGUE(SYMNAME)
> +                     "mov %[tmask], %[pdx]\n\t"
> +                     "mov %[shift], %%ecx\n\t"
> +                     "and %[pfn], %[pdx]\n\t"
> +                     "and %[bmask], %[pfn]\n\t"
> +                     "shr %%cl, %[pdx]\n\t"
> +                     "or %[pfn], %[pdx]\n\t"
> +                     "ret\n\t"
> +                     LINKONCE_EPILOGUE(SYMNAME),
> +                     "call " SYMNAME(), X86_FEATURE_ALWAYS,
> +                     "pext %[mask], %[pfn], %[pdx]", X86_FEATURE_BMI2,
> +                     ASM_OUTPUT2([pdx] "=&r" (pdx), [pfn] "+r" (pfn)),
> +                     [mask] "m" (pfn_real_mask),
> +                     [shift] "m" (pfn_pdx_hole_shift),
> +                     [bmask] "m" (pfn_pdx_bottom_mask),
> +                     [tmask] "m" (pfn_top_mask)
> +                     : "ecx");
> +#undef SYMNAME
> +#else
> +    alternative_io_2("call pfn2pdx; " ASM_NOP4,
> +                     "call pfn2pdx", X86_FEATURE_ALWAYS,
> +                     /* pext pfn_real_mask(%rip), %rdi, %rax */
> +                     ".byte 0xc4, 0xe2, 0xc2, 0xf5, 0x05\n\t"
> +                     ".long pfn_real_mask - 4 - .",
> +                     X86_FEATURE_BMI2,
> +                     ASM_OUTPUT2("=a" (pdx), "+D" (pfn)), "m" (pfn_real_mask)
> +                     : "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11");
> +#endif
> +
> +    return pdx;
> +}
> +
> +#endif /* __ASM_X86_PDX_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> --- a/xen/include/xen/pdx.h
> +++ b/xen/include/xen/pdx.h
> @@ -23,13 +23,13 @@ extern void set_pdx_range(unsigned long
>  
>  bool __mfn_valid(unsigned long mfn);
>  
> -static inline unsigned long pfn_to_pdx(unsigned long pfn)
> +static inline unsigned long generic_pfn_to_pdx(unsigned long pfn)
>  {
>      return (pfn & pfn_pdx_bottom_mask) |
>             ((pfn & pfn_top_mask) >> pfn_pdx_hole_shift);
>  }
>  
> -static inline unsigned long pdx_to_pfn(unsigned long pdx)
> +static inline unsigned long generic_pdx_to_pfn(unsigned long pdx)
>  {
>      return (pdx & pfn_pdx_bottom_mask) |
>             ((pdx << pfn_pdx_hole_shift) & pfn_top_mask);
> @@ -37,6 +37,8 @@ static inline unsigned long pdx_to_pfn(u
>  
>  extern void pfn_pdx_hole_setup(unsigned long);
>  
> +#include <asm/pdx.h>
> +
>  #endif /* HAS_PDX */
>  #endif /* __XEN_PDX_H__ */
>  
> 
> 
> 

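For readers less familiar with BMI2, the equivalence the quoted patch
relies on can be sketched in portable C. The helpers soft_pext() and
soft_pdep() are hypothetical software models of the two instructions, and
the hole position (8 bits removed, starting at bit 20) is made up purely
for illustration. Only the shift/mask formulas mirror
generic_pfn_to_pdx() and generic_pdx_to_pfn() from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up layout: PFN bits 20..27 form the hole that PDX squeezes out. */
static const unsigned int hole_shift = 8;
static const uint64_t bottom_mask = (1ULL << 20) - 1;   /* bits below the hole */
static const uint64_t top_mask = ~((1ULL << 28) - 1);   /* bits above the hole */

/* Software model of PEXT: gather the bits of src selected by mask. */
static uint64_t soft_pext(uint64_t src, uint64_t mask)
{
    uint64_t res = 0;

    for (uint64_t bit = 1; mask; mask &= mask - 1, bit <<= 1)
        if (src & mask & -mask)      /* lowest remaining mask bit set in src? */
            res |= bit;

    return res;
}

/* Software model of PDEP: scatter the low bits of src to the mask positions. */
static uint64_t soft_pdep(uint64_t src, uint64_t mask)
{
    uint64_t res = 0;

    for (uint64_t bit = 1; mask; mask &= mask - 1, bit <<= 1)
        if (src & bit)
            res |= mask & -mask;     /* deposit at lowest remaining mask bit */

    return res;
}

/* Same formulas as generic_pfn_to_pdx() / generic_pdx_to_pfn(). */
static uint64_t model_pfn_to_pdx(uint64_t pfn)
{
    return (pfn & bottom_mask) | ((pfn & top_mask) >> hole_shift);
}

static uint64_t model_pdx_to_pfn(uint64_t pdx)
{
    return (pdx & bottom_mask) | ((pdx << hole_shift) & top_mask);
}

/* With real_mask = top | bottom, one PEXT/PDEP matches the shift/mask form. */
static void check_equivalence(uint64_t pfn)
{
    const uint64_t real_mask = top_mask | bottom_mask;
    uint64_t pdx = model_pfn_to_pdx(pfn);

    assert(soft_pext(pfn, real_mask) == pdx);
    assert(soft_pdep(pdx, real_mask) == model_pdx_to_pfn(pdx));
}
```

Calling check_equivalence() with arbitrary values exercises both
directions; the asserts hold even when bits inside the hole are set,
because both forms discard them.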



* Re: [PATCH 0/5] x86: improve PDX <-> PFN and alike translations
  2018-02-28 13:51 [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Jan Beulich
                   ` (4 preceding siblings ...)
  2018-02-28 13:59 ` [PATCH 5/5] x86: use MOV for PFN/PDX conversion when possible Jan Beulich
@ 2018-02-28 16:47 ` Andrew Cooper
  2018-03-01  7:22   ` Jan Beulich
                     ` (2 more replies)
  5 siblings, 3 replies; 11+ messages in thread
From: Andrew Cooper @ 2018-02-28 16:47 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 28/02/18 13:51, Jan Beulich wrote:
> 1: remove page.h and processor.h inclusion from asm_defns.h
> 2: use PDEP for PTE flags insertion when available
> 3: use PDEP/PEXT for maddr/direct-map-offset conversion when available
> 4: use PDEP/PEXT for PFN/PDX conversion when available
> 5: use MOV for PFN/PDX conversion when possible
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>

Ah - so this was the series you were on about which would have an
interesting time in combination with my nop autosizing.

Do you have performance numbers for these changes?  I can certainly see
the attraction of using BMI2 when available, but are the associated costs
on incompatible hardware worth it?  I'm thinking specifically of turning
all this inline bit manipulation into function calls.  (I genuinely
don't know the answer, and it might be entirely fine, but I'm concerned
that it may not be.)

What generation of binutils do you expect this all to work with?

As for the pte flags, there is a much simpler approach which I've
considered investigating in the past, and which I think warrants
discussing here.

By switching 'unsigned int flags' to 'unsigned long flags', we avoid any
need for packing in the first place.  Being 64bit only these days, all
other PTE calculations are already 64bit operations, and the masks are
probably already available in GPRs at the use-sites.  I.e. I think the
use of 64bit flags will make better code than even this proposal.
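To make the comparison concrete, here is a rough sketch of the two
representations. It assumes a layout where the PTE address field occupies
bits 12..51 and the packed 'unsigned int' form keeps the low 12 flag bits
in place while moving the high 12 down by 40; that layout and all model_*
names are assumptions for illustration, not code from this series.

```c
#include <stdint.h>

typedef uint64_t intpte_t;

/* Assumed layout: address in bits 12..51, flags in bits 0..11 and 52..63. */
#define MODEL_PADDR_MASK ((1ULL << 52) - 1)
#define MODEL_PAGE_MASK  (~((1ULL << 12) - 1))
static const intpte_t model_pte_flags_mask =
    ~(MODEL_PADDR_MASK & MODEL_PAGE_MASK);

/* Packed form: expand 24 packed flag bits to their PTE positions. */
static intpte_t model_put_pte_flags(unsigned int flags)
{
    return ((intpte_t)(flags & ~0xfffu) << 40) | (flags & 0xfffu);
}

/* Packed form: squeeze the PTE flag bits back into 24 bits. */
static unsigned int model_get_pte_flags(intpte_t pte)
{
    return ((unsigned int)(pte >> 40) & ~0xfffu) |
           ((unsigned int)pte & 0xfffu);
}

/* Unpacked 64-bit form: flags already sit in their final PTE positions. */
static intpte_t model_make_pte(uint64_t maddr, uint64_t flags64)
{
    return (maddr & ~model_pte_flags_mask) |
           (flags64 & model_pte_flags_mask);
}
```

With 64-bit flags the shift in model_put_pte_flags() disappears and only
the mask/or of model_make_pte() remains, which is the saving being
suggested.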

~Andrew


* Re: [PATCH 0/5] x86: improve PDX <-> PFN and alike translations
  2018-02-28 16:47 ` [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Andrew Cooper
@ 2018-03-01  7:22   ` Jan Beulich
  2018-03-05  8:37   ` Jan Beulich
  2018-03-06  8:45   ` Jan Beulich
  2 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2018-03-01  7:22 UTC (permalink / raw)
  To: andrew.cooper3; +Cc: xen-devel

>>> Andrew Cooper <andrew.cooper3@citrix.com> 02/28/18 6:26 PM >>>
>On 28/02/18 13:51, Jan Beulich wrote:
>> 1: remove page.h and processor.h inclusion from asm_defns.h
>> 2: use PDEP for PTE flags insertion when available
>> 3: use PDEP/PEXT for maddr/direct-map-offset conversion when available
>> 4: use PDEP/PEXT for PFN/PDX conversion when available
>> 5: use MOV for PFN/PDX conversion when possible
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
>Ah - so this was the series you were on about which would have an
>interesting time in combination with my nop autosizing.
>
>Do you have performance numbers for these changes?  I can certainly see
>the attraction of using BMI2 when available, but do the associated costs
>on incompatible hardware worth it?  I'm thinking specifically of turning
>all this inline bit manipulation into function calls?  (I genuinely
>don't know the answer, and it might be entirely fine, but I'm concerned
>about whether it may not be).

To be honest, performance on older hardware is of secondary concern to
me here; BMI2 isn't all that new anymore. The primary concern is
performance on recent hardware (which certainly is being improved) and
the much improved readability of generated code (which is particularly
relevant when one needs to investigate issues in one of the bigger
functions involving such translations).

>What generation of binutils do you expect this all to work with?

The respective change (d02603dc20) was done in August 2015.

>As for the pte flags, there is a much more simple approach which I've
>considered investigating in the past, and I think warrants discussing here.
>
>By switching 'unsigned int flags' to 'unsigned long flags', we avoid any
>need for packing in the first place.  Being 64bit only these days, all
>other PTE calculations are already 64bit operations, and the masks are
>probably already available in GPRs at the use-sites.  I.e. I think the
>use of 64bit flags will make better code than even this proposal.

If that doesn't result in overly many extra REX prefixes and/or full 64-bit
constant loads, perhaps. But that would affect just one of the five patches
here anyway.

Jan



* Re: [PATCH 0/5] x86: improve PDX <-> PFN and alike translations
  2018-02-28 16:47 ` [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Andrew Cooper
  2018-03-01  7:22   ` Jan Beulich
@ 2018-03-05  8:37   ` Jan Beulich
  2018-03-06  8:45   ` Jan Beulich
  2 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2018-03-05  8:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 28.02.18 at 17:47, <andrew.cooper3@citrix.com> wrote:
> On 28/02/18 13:51, Jan Beulich wrote:
>> 1: remove page.h and processor.h inclusion from asm_defns.h
>> 2: use PDEP for PTE flags insertion when available
>> 3: use PDEP/PEXT for maddr/direct-map-offset conversion when available
>> 4: use PDEP/PEXT for PFN/PDX conversion when available
>> 5: use MOV for PFN/PDX conversion when possible
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
> 
> Ah - so this was the series you were on about which would have an
> interesting time in combination with my nop autosizing.
> 
> Do you have performance numbers for these changes?  I can certainly see
> the attraction of using BMI2 when available, but do the associated costs
> on incompatible hardware worth it?  I'm thinking specifically of turning
> all this inline bit manipulation into function calls?  (I genuinely
> don't know the answer, and it might be entirely fine, but I'm concerned
> about whether it may not be).

Btw, before you voice any performance concerns for older
hardware, please take into consideration the last patch of the
series, which converts the CALL to MOV on virtually all
hardware (as mentioned on some older thread, I'm not sure the
hardware/firmware that this PDX/PFN conversion was written
for has ever made it to any customers).
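Why a plain MOV suffices there can be sketched as follows. The struct and
the predicate are hypothetical (this is not how the series sets
X86_FEATURE_PFN_PDX_IDENT), and the "flat" values assume the masks are
left at all-ones / zero when no hole has been set up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bundle of the three PFN/PDX compression parameters. */
struct pdx_layout {
    uint64_t bottom_mask;     /* bits kept in place */
    uint64_t top_mask;        /* bits shifted down over the hole */
    unsigned int hole_shift;  /* width of the hole in bits */
};

/* Same formula as generic_pfn_to_pdx(), parameterised for the sketch. */
static uint64_t layout_pfn_to_pdx(const struct pdx_layout *l, uint64_t pfn)
{
    return (pfn & l->bottom_mask) | ((pfn & l->top_mask) >> l->hole_shift);
}

/* Hypothetical predicate: with no hole the conversion changes nothing. */
static bool layout_is_identity(const struct pdx_layout *l)
{
    return l->hole_shift == 0 && (l->bottom_mask | l->top_mask) == ~0ULL;
}
```

When layout_is_identity() holds, layout_pfn_to_pdx() returns its argument
unchanged for every input, so neither the CALL nor PEXT/PDEP is needed.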

Jan



* Re: [PATCH 0/5] x86: improve PDX <-> PFN and alike translations
  2018-02-28 16:47 ` [PATCH 0/5] x86: improve PDX <-> PFN and alike translations Andrew Cooper
  2018-03-01  7:22   ` Jan Beulich
  2018-03-05  8:37   ` Jan Beulich
@ 2018-03-06  8:45   ` Jan Beulich
  2 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2018-03-06  8:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 28.02.18 at 17:47, <andrew.cooper3@citrix.com> wrote:
> On 28/02/18 13:51, Jan Beulich wrote:
>> 1: remove page.h and processor.h inclusion from asm_defns.h
>> 2: use PDEP for PTE flags insertion when available
>> 3: use PDEP/PEXT for maddr/direct-map-offset conversion when available
>> 4: use PDEP/PEXT for PFN/PDX conversion when available
>> 5: use MOV for PFN/PDX conversion when possible
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
> 
> Ah - so this was the series you were on about which would have an
> interesting time in combination with my nop autosizing.
> 
> Do you have performance numbers for these changes?  I can certainly see
> the attraction of using BMI2 when available, but do the associated costs
> on incompatible hardware worth it?  I'm thinking specifically of turning
> all this inline bit manipulation into function calls?  (I genuinely
> don't know the answer, and it might be entirely fine, but I'm concerned
> about whether it may not be).
> 
> What generation of binutils do you expect this all to work with?

So one question here is whether to make this independent of
binutils version, and instead make use of gcc's new V operand
modifier (which didn't exist yet back when I wrote this). Since
those indirect thunk patches are likely to be backported by
distros, I'd expect us to be able to use the more flexible
variant of the alternatives here in a wider set of cases if we
went that route.

Jan

