All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND v2 0/7] x86/alternatives: Support for automatic padding calculations
@ 2018-02-26 11:34 Andrew Cooper
  2018-02-26 11:34 ` [PATCH v2 1/7] x86/alt: Drop unused alternative infrastructure Andrew Cooper
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Andrew Cooper @ 2018-02-26 11:34 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

[ Resend properly numbered this time. ]

This is the end result of a lot of work I started during the Spectre/Meltdown
embargo window, and deferred because it was taking too long.  It finally
resolves the explict padding calculations for the SPEC_CTRL alternatives.

This series depends on Roger's "x86/clang: allow integrated assembler usage"
and has been tested with a mainline version of clang.

http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/alternatives-v2

Andrew Cooper (7):
  x86/alt: Drop unused alternative infrastructure
  x86/alt: Clean up struct alt_instr and its users
  x86/alt: Clean up the assembly used to generate alternatives
  x86/asm: Remove opencoded uses of altinstruction_entry
  x86/alt: Support for automatic padding calculations
  x86/alt: Drop explicit padding of origin sites
  x86/build: Use new .nop directive when available

 xen/arch/x86/Rules.mk                 |   8 ++
 xen/arch/x86/alternative.c            |  48 ++++++++---
 xen/arch/x86/x86_64/compat/entry.S    |  26 +++---
 xen/arch/x86/x86_64/entry.S           |  20 +----
 xen/include/asm-x86/alternative-asm.h | 113 +++++++++++++++++++-------
 xen/include/asm-x86/alternative.h     | 148 +++++++++++++++++++---------------
 xen/include/asm-x86/asm_defns.h       |  32 +++-----
 xen/include/asm-x86/nops.h            |   7 --
 xen/include/asm-x86/spec_ctrl_asm.h   |  19 ++---
 9 files changed, 244 insertions(+), 177 deletions(-)

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/7] x86/alt: Drop unused alternative infrastructure
  2018-02-26 11:34 [PATCH RESEND v2 0/7] x86/alternatives: Support for automatic padding calculations Andrew Cooper
@ 2018-02-26 11:34 ` Andrew Cooper
  2018-02-26 11:34 ` [PATCH v2 2/7] x86/alt: Clean up struct alt_instr and its users Andrew Cooper
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2018-02-26 11:34 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

ALTERNATIVE_3 is more complicated than ALTERNATIVE_2 when it comes to
calculating extra padding length, and we have no need for the complexity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v2:
 * Retain ASM_OUTPUT2()
---
 xen/include/asm-x86/alternative.h | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index ba537d6..325a29f 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -65,11 +65,6 @@ extern void alternative_instructions(void);
 	ALTERNATIVE(oldinstr, newinstr1, feature1)			  \
 	ALTERNATIVE_N(newinstr2, feature2, 2)
 
-#define ALTERNATIVE_3(oldinstr, newinstr1, feature1, newinstr2, feature2, \
-		      newinstr3, feature3)				  \
-	ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
-	ALTERNATIVE_N(newinstr3, feature3, 3)
-
 /*
  * Alternative instructions for different CPU types or capabilities.
  *
@@ -118,23 +113,6 @@ extern void alternative_instructions(void);
 				   newinstr2, feature2)			\
 		     : output : input)
 
-/*
- * This is similar to alternative_io. But it has three features and
- * respective instructions.
- *
- * If CPU has feature3, newinstr3 is used.
- * Otherwise, if CPU has feature2, newinstr2 is used.
- * Otherwise, if CPU has feature1, newinstr1 is used.
- * Otherwise, oldinstr is used.
- */
-#define alternative_io_3(oldinstr, newinstr1, feature1, newinstr2,	\
-			 feature2, newinstr3, feature3, output,		\
-			 input...)					\
-	asm volatile(ALTERNATIVE_3(oldinstr, newinstr1, feature1,	\
-				   newinstr2, feature2, newinstr3,	\
-				   feature3)				\
-		     : output : input)
-
 /* Use this macro(s) if you need more than one output parameter. */
 #define ASM_OUTPUT2(a...) a
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/7] x86/alt: Clean up struct alt_instr and its users
  2018-02-26 11:34 [PATCH RESEND v2 0/7] x86/alternatives: Support for automatic padding calculations Andrew Cooper
  2018-02-26 11:34 ` [PATCH v2 1/7] x86/alt: Drop unused alternative infrastructure Andrew Cooper
@ 2018-02-26 11:34 ` Andrew Cooper
  2018-02-26 11:35 ` [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives Andrew Cooper
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2018-02-26 11:34 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

 * Rename some fields for consistency and clarity, and use standard types.
 * Don't opencode the use of ALT_{ORIG,REPL}_PTR().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2:
 * Change more types to standard ones
---
 xen/arch/x86/alternative.c        | 24 ++++++++++++------------
 xen/include/asm-x86/alternative.h | 14 +++++++-------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 5c8b6f6..51ca53e 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -163,8 +163,6 @@ void init_or_livepatch apply_alternatives(const struct alt_instr *start,
                                           const struct alt_instr *end)
 {
     const struct alt_instr *a;
-    u8 *instr, *replacement;
-    u8 insnbuf[MAX_PATCH_LEN];
 
     printk(KERN_INFO "alt table %p -> %p\n", start, end);
 
@@ -179,23 +177,25 @@ void init_or_livepatch apply_alternatives(const struct alt_instr *start,
      */
     for ( a = start; a < end; a++ )
     {
-        instr = (u8 *)&a->instr_offset + a->instr_offset;
-        replacement = (u8 *)&a->repl_offset + a->repl_offset;
-        BUG_ON(a->replacementlen > a->instrlen);
-        BUG_ON(a->instrlen > sizeof(insnbuf));
+        uint8_t *orig = ALT_ORIG_PTR(a);
+        uint8_t *repl = ALT_REPL_PTR(a);
+        uint8_t buf[MAX_PATCH_LEN];
+
+        BUG_ON(a->repl_len > a->orig_len);
+        BUG_ON(a->orig_len > sizeof(buf));
         BUG_ON(a->cpuid >= NCAPINTS * 32);
+
         if ( !boot_cpu_has(a->cpuid) )
             continue;
 
-        memcpy(insnbuf, replacement, a->replacementlen);
+        memcpy(buf, repl, a->repl_len);
 
         /* 0xe8/0xe9 are relative branches; fix the offset. */
-        if ( a->replacementlen >= 5 && (*insnbuf & 0xfe) == 0xe8 )
-            *(s32 *)(insnbuf + 1) += replacement - instr;
+        if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
+            *(int32_t *)(buf + 1) += repl - orig;
 
-        add_nops(insnbuf + a->replacementlen,
-                 a->instrlen - a->replacementlen);
-        text_poke(instr, insnbuf, a->instrlen);
+        add_nops(buf + a->repl_len, a->orig_len - a->repl_len);
+        text_poke(orig, buf, a->orig_len);
     }
 }
 
diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index 325a29f..d9706fd 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -9,15 +9,15 @@
 #include <xen/types.h>
 
 struct alt_instr {
-    s32 instr_offset;       /* original instruction */
-    s32 repl_offset;        /* offset to replacement instruction */
-    u16 cpuid;              /* cpuid bit set for replacement */
-    u8  instrlen;           /* length of original instruction */
-    u8  replacementlen;     /* length of new instruction, <= instrlen */
+    int32_t  orig_offset;   /* original instruction */
+    int32_t  repl_offset;   /* offset to replacement instruction */
+    uint16_t cpuid;         /* cpuid bit set for replacement */
+    uint8_t  orig_len;      /* length of original instruction */
+    uint8_t  repl_len;      /* length of new instruction, <= instrlen */
 };
 
-#define __ALT_PTR(a,f)      ((u8 *)((void *)&(a)->f + (a)->f))
-#define ALT_ORIG_PTR(a)     __ALT_PTR(a, instr_offset)
+#define __ALT_PTR(a,f)      ((uint8_t *)((void *)&(a)->f + (a)->f))
+#define ALT_ORIG_PTR(a)     __ALT_PTR(a, orig_offset)
 #define ALT_REPL_PTR(a)     __ALT_PTR(a, repl_offset)
 
 extern void add_nops(void *insns, unsigned int len);
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives
  2018-02-26 11:34 [PATCH RESEND v2 0/7] x86/alternatives: Support for automatic padding calculations Andrew Cooper
  2018-02-26 11:34 ` [PATCH v2 1/7] x86/alt: Drop unused alternative infrastructure Andrew Cooper
  2018-02-26 11:34 ` [PATCH v2 2/7] x86/alt: Clean up struct alt_instr and its users Andrew Cooper
@ 2018-02-26 11:35 ` Andrew Cooper
  2018-02-26 14:09   ` Roger Pau Monné
                     ` (2 more replies)
  2018-02-26 11:35 ` [PATCH v2 4/7] x86/asm: Remove opencoded uses of altinstruction_entry Andrew Cooper
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 26+ messages in thread
From: Andrew Cooper @ 2018-02-26 11:35 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Jan Beulich

 * On the C side, switch to using local lables rather than hardcoded numbers.
 * Rename parameters and lables to be consistent with alt_instr names, and
   consistent between the the C and asm versions.
 * On the asm side, factor some expressions out into macros to aid clarity.
 * Consistently declare section attributes.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2:
 * repl_{s,e} => alt_repl_{s,e}
 * Use .LXEN%= prefix for the C lables
---
 xen/include/asm-x86/alternative-asm.h | 57 ++++++++++++++++--------------
 xen/include/asm-x86/alternative.h     | 65 +++++++++++++++++++----------------
 2 files changed, 67 insertions(+), 55 deletions(-)

diff --git a/xen/include/asm-x86/alternative-asm.h b/xen/include/asm-x86/alternative-asm.h
index 6640e85..150bd1a 100644
--- a/xen/include/asm-x86/alternative-asm.h
+++ b/xen/include/asm-x86/alternative-asm.h
@@ -9,60 +9,67 @@
  * enough information for the alternatives patching code to patch an
  * instruction. See apply_alternatives().
  */
-.macro altinstruction_entry orig alt feature orig_len alt_len
+.macro altinstruction_entry orig repl feature orig_len repl_len
     .long \orig - .
-    .long \alt - .
+    .long \repl - .
     .word \feature
     .byte \orig_len
-    .byte \alt_len
+    .byte \repl_len
 .endm
 
+#define orig_len               (.L\@_orig_e       -     .L\@_orig_s)
+#define repl_len(nr)           (.L\@_repl_e\()nr  -     .L\@_repl_s\()nr)
+#define decl_repl(insn, nr)     .L\@_repl_s\()nr: insn; .L\@_repl_e\()nr:
+
 .macro ALTERNATIVE oldinstr, newinstr, feature
-.Lold_start_\@:
+.L\@_orig_s:
     \oldinstr
-.Lold_end_\@:
+.L\@_orig_e:
 
     .pushsection .altinstructions, "a", @progbits
-    altinstruction_entry .Lold_start_\@, .Lnew_start_\@, \feature, \
-        (.Lold_end_\@ - .Lold_start_\@), (.Lnew_end_\@ - .Lnew_start_\@)
+    altinstruction_entry .L\@_orig_s, .L\@_repl_s1, \feature, \
+        orig_len, repl_len(1)
 
     .section .discard, "a", @progbits
     /* Assembler-time check that \newinstr isn't longer than \oldinstr. */
-    .byte 0xff + (.Lnew_end_\@ - .Lnew_start_\@) - (.Lold_end_\@ - .Lold_start_\@)
+    .byte 0xff + repl_len(1) - orig_len
 
     .section .altinstr_replacement, "ax", @progbits
-.Lnew_start_\@:
-    \newinstr
-.Lnew_end_\@:
+
+    decl_repl(\newinstr, 1)
+
     .popsection
 .endm
 
 .macro ALTERNATIVE_2 oldinstr, newinstr1, feature1, newinstr2, feature2
-.Lold_start_\@:
+.L\@_orig_s:
     \oldinstr
-.Lold_end_\@:
+.L\@_orig_e:
 
     .pushsection .altinstructions, "a", @progbits
-    altinstruction_entry .Lold_start_\@, .Lnew1_start_\@, \feature1, \
-        (.Lold_end_\@ - .Lold_start_\@), (.Lnew1_end_\@ - .Lnew1_start_\@)
-    altinstruction_entry .Lold_start_\@, .Lnew2_start_\@, \feature2, \
-        (.Lold_end_\@ - .Lold_start_\@), (.Lnew2_end_\@ - .Lnew2_start_\@)
+
+    altinstruction_entry .L\@_orig_s, .L\@_repl_s1, \feature1, \
+        orig_len, repl_len(1)
+    altinstruction_entry .L\@_orig_s, .L\@_repl_s2, \feature2, \
+        orig_len, repl_len(2)
 
     .section .discard, "a", @progbits
     /* Assembler-time check that \newinstr{1,2} aren't longer than \oldinstr. */
-    .byte 0xff + (.Lnew1_end_\@ - .Lnew1_start_\@) - (.Lold_end_\@ - .Lold_start_\@)
-    .byte 0xff + (.Lnew2_end_\@ - .Lnew2_start_\@) - (.Lold_end_\@ - .Lold_start_\@)
+    .byte 0xff + repl_len(1) - orig_len
+    .byte 0xff + repl_len(2) - orig_len
 
     .section .altinstr_replacement, "ax", @progbits
-.Lnew1_start_\@:
-    \newinstr1
-.Lnew1_end_\@:
-.Lnew2_start_\@:
-    \newinstr2
-.Lnew2_end_\@:
+
+    decl_repl(\newinstr1, 1)
+    decl_repl(\newinstr2, 2)
+
     .popsection
 .endm
 
+#undef decl_repl
+#undef repl_len
+#undef orig_len
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_ALTERNATIVE_ASM_H_ */
 
diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index d9706fd..bcad3ee 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -26,44 +26,49 @@ extern void apply_alternatives(const struct alt_instr *start,
                                const struct alt_instr *end);
 extern void alternative_instructions(void);
 
-#define OLDINSTR(oldinstr)      "661:\n\t" oldinstr "\n662:\n"
+#define OLDINSTR(oldinstr) ".LXEN%=_orig_s:\n\t" oldinstr "\n.LXEN%=_orig_e:\n"
 
-#define b_replacement(number)   "663"#number
-#define e_replacement(number)   "664"#number
+#define alt_orig_len       "(.LXEN%=_orig_e - .LXEN%=_orig_s)"
+#define alt_repl_s(num)    ".LXEN%=_repl_s"#num
+#define alt_repl_e(num)    ".LXEN%=_repl_e"#num
+#define alt_repl_len(num)  "(" alt_repl_e(num) " - " alt_repl_s(num) ")"
 
-#define alt_slen "662b-661b"
-#define alt_rlen(number) e_replacement(number)"f-"b_replacement(number)"f"
+#define ALTINSTR_ENTRY(feature, num)                                    \
+        " .long .LXEN%=_orig_s - .\n"             /* label           */ \
+        " .long " alt_repl_s(num)" - .\n"         /* new instruction */ \
+        " .word " __stringify(feature) "\n"       /* feature bit     */ \
+        " .byte " alt_orig_len "\n"               /* source len      */ \
+        " .byte " alt_repl_len(num) "\n"          /* replacement len */
 
-#define ALTINSTR_ENTRY(feature, number)                                       \
-        " .long 661b - .\n"                             /* label           */ \
-        " .long " b_replacement(number)"f - .\n"        /* new instruction */ \
-        " .word " __stringify(feature) "\n"             /* feature bit     */ \
-        " .byte " alt_slen "\n"                         /* source len      */ \
-        " .byte " alt_rlen(number) "\n"                 /* replacement len */
+#define DISCARD_ENTRY(num)                        /* repl <= orig */    \
+        " .byte 0xff + (" alt_repl_len(num) ") - (" alt_orig_len ")\n"
 
-#define DISCARD_ENTRY(number)                           /* rlen <= slen */    \
-        " .byte 0xff + (" alt_rlen(number) ") - (" alt_slen ")\n"
-
-#define ALTINSTR_REPLACEMENT(newinstr, feature, number) /* replacement */     \
-        b_replacement(number)":\n\t" newinstr "\n" e_replacement(number) ":\n\t"
-
-#define ALTERNATIVE_N(newinstr, feature, number)	\
-	".pushsection .altinstructions,\"a\"\n"		\
-	ALTINSTR_ENTRY(feature, number)			\
-	".section .discard,\"a\",@progbits\n"		\
-	DISCARD_ENTRY(number)				\
-	".section .altinstr_replacement, \"ax\"\n"	\
-	ALTINSTR_REPLACEMENT(newinstr, feature, number)	\
-	".popsection\n"
+#define ALTINSTR_REPLACEMENT(newinstr, num)       /* replacement */     \
+        alt_repl_s(num)":\n\t" newinstr "\n" alt_repl_e(num) ":\n\t"
 
 /* alternative assembly primitive: */
-#define ALTERNATIVE(oldinstr, newinstr, feature)			  \
-	OLDINSTR(oldinstr)						  \
-	ALTERNATIVE_N(newinstr, feature, 1)
+#define ALTERNATIVE(oldinstr, newinstr, feature)                        \
+        OLDINSTR(oldinstr)                                              \
+        ".pushsection .altinstructions, \"a\", @progbits\n"             \
+        ALTINSTR_ENTRY(feature, 1)                                      \
+        ".section .discard, \"a\", @progbits\n"                         \
+        DISCARD_ENTRY(1)                                                \
+        ".section .altinstr_replacement, \"ax\", @progbits\n"           \
+        ALTINSTR_REPLACEMENT(newinstr, 1)                               \
+        ".popsection\n"
 
 #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
-	ALTERNATIVE(oldinstr, newinstr1, feature1)			  \
-	ALTERNATIVE_N(newinstr2, feature2, 2)
+        OLDINSTR(oldinstr)                                              \
+        ".pushsection .altinstructions, \"a\", @progbits\n"             \
+        ALTINSTR_ENTRY(feature1, 1)                                     \
+        ALTINSTR_ENTRY(feature2, 2)                                     \
+        ".section .discard, \"a\", @progbits\n"                         \
+        DISCARD_ENTRY(1)                                                \
+        DISCARD_ENTRY(2)                                                \
+        ".section .altinstr_replacement, \"ax\", @progbits\n"           \
+        ALTINSTR_REPLACEMENT(newinstr1, 1)                              \
+        ALTINSTR_REPLACEMENT(newinstr2, 2)                              \
+        ".popsection\n"
 
 /*
  * Alternative instructions for different CPU types or capabilities.
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 4/7] x86/asm: Remove opencoded uses of altinstruction_entry
  2018-02-26 11:34 [PATCH RESEND v2 0/7] x86/alternatives: Support for automatic padding calculations Andrew Cooper
                   ` (2 preceding siblings ...)
  2018-02-26 11:35 ` [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives Andrew Cooper
@ 2018-02-26 11:35 ` Andrew Cooper
  2018-02-26 11:35 ` [PATCH v2 5/7] x86/alt: Support for automatic padding calculations Andrew Cooper
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2018-02-26 11:35 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

With future changes, altinstruction_entry is going to become more complicated
to use.  Furthermore, there are already ALTERNATIVE* macros which can be used
to avoid opencoding the creation of replacement information.

For ASM_STAC, ASM_CLAC and CR4_PV32_RESTORE, this means the removal of all
hardocded label numbers.  For the cr4_pv32 alternatives, this means hardcoding
the extra space required in the original patch site, but the hardcoding will
be removed by a later patch.

No change to any functionality, but the handling of nops inside the original
patch sites are a bit different.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/x86_64/compat/entry.S | 26 +++++++++-----------------
 xen/arch/x86/x86_64/entry.S        | 20 +++-----------------
 xen/include/asm-x86/asm_defns.h    | 32 +++++++++++---------------------
 3 files changed, 23 insertions(+), 55 deletions(-)

diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 458d810..8aba269 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -111,13 +111,10 @@ ENTRY(compat_restore_all_guest)
         ASSERT_INTERRUPTS_DISABLED
         mov   $~(X86_EFLAGS_IOPL|X86_EFLAGS_NT|X86_EFLAGS_VM),%r11d
         and   UREGS_eflags(%rsp),%r11d
-.Lcr4_orig:
-        .skip .Lcr4_alt_end - .Lcr4_alt, 0x90
-.Lcr4_orig_end:
-        .pushsection .altinstr_replacement, "ax"
-.Lcr4_alt:
+
+.macro alt_cr4_pv32
         testb $3,UREGS_cs(%rsp)
-        jpe   .Lcr4_alt_end
+        jpe   2f
         mov   CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp), %rax
         and   $~XEN_CR4_PV32_BITS, %rax
 1:
@@ -135,17 +132,12 @@ ENTRY(compat_restore_all_guest)
          */
         cmp   %rax, CPUINFO_cr4-CPUINFO_guest_cpu_user_regs(%rsp)
         jne   1b
-.Lcr4_alt_end:
-        .section .altinstructions, "a"
-        altinstruction_entry .Lcr4_orig, .Lcr4_orig, X86_FEATURE_ALWAYS, \
-                             (.Lcr4_orig_end - .Lcr4_orig), 0
-        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_XEN_SMEP, \
-                             (.Lcr4_orig_end - .Lcr4_orig), \
-                             (.Lcr4_alt_end - .Lcr4_alt)
-        altinstruction_entry .Lcr4_orig, .Lcr4_alt, X86_FEATURE_XEN_SMAP, \
-                             (.Lcr4_orig_end - .Lcr4_orig), \
-                             (.Lcr4_alt_end - .Lcr4_alt)
-        .popsection
+2:
+.endm
+	ALTERNATIVE_2 ".skip 45, 0x90", \
+            alt_cr4_pv32, X86_FEATURE_XEN_SMEP, \
+            alt_cr4_pv32, X86_FEATURE_XEN_SMAP
+
         or    $X86_EFLAGS_IF,%r11
         mov   %r11d,UREGS_eflags(%rsp)
 
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 941f06f..e939f20 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -564,23 +564,9 @@ handle_exception_saved:
         testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
         jz    exception_with_ints_disabled
 
-.Lcr4_pv32_orig:
-        jmp   .Lcr4_pv32_done
-        .skip (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt) - (. - .Lcr4_pv32_orig), 0xcc
-        .pushsection .altinstr_replacement, "ax"
-.Lcr4_pv32_alt:
-        mov   VCPU_domain(%rbx),%rax
-.Lcr4_pv32_alt_end:
-        .section .altinstructions, "a"
-        altinstruction_entry .Lcr4_pv32_orig, .Lcr4_pv32_alt, \
-                             X86_FEATURE_XEN_SMEP, \
-                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt), \
-                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt)
-        altinstruction_entry .Lcr4_pv32_orig, .Lcr4_pv32_alt, \
-                             X86_FEATURE_XEN_SMAP, \
-                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt), \
-                             (.Lcr4_pv32_alt_end - .Lcr4_pv32_alt)
-        .popsection
+        ALTERNATIVE_2 "jmp .Lcr4_pv32_done; .skip 2, 0x90", \
+            __stringify(mov VCPU_domain(%rbx), %rax), X86_FEATURE_XEN_SMEP, \
+            __stringify(mov VCPU_domain(%rbx), %rax), X86_FEATURE_XEN_SMAP
 
         testb $3,UREGS_cs(%rsp)
         jz    .Lcr4_pv32_done
diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index ebd2c88..a484265 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -195,18 +195,13 @@ void ret_from_intr(void);
 #define __ASM_STAC      .byte 0x0f,0x01,0xcb
 
 #ifdef __ASSEMBLY__
-#define ASM_AC(op)                                                     \
-        661: ASM_NOP3;                                                 \
-        .pushsection .altinstr_replacement, "ax";                      \
-        662: __ASM_##op;                                               \
-        .popsection;                                                   \
-        .pushsection .altinstructions, "a";                            \
-        altinstruction_entry 661b, 661b, X86_FEATURE_ALWAYS, 3, 0;     \
-        altinstruction_entry 661b, 662b, X86_FEATURE_XEN_SMAP, 3, 3;       \
-        .popsection
-
-#define ASM_STAC ASM_AC(STAC)
-#define ASM_CLAC ASM_AC(CLAC)
+#define ASM_STAC                                        \
+    ALTERNATIVE __stringify(ASM_NOP3),                  \
+        __stringify(__ASM_STAC), X86_FEATURE_XEN_SMAP
+
+#define ASM_CLAC                                        \
+    ALTERNATIVE __stringify(ASM_NOP3),                  \
+        __stringify(__ASM_CLAC), X86_FEATURE_XEN_SMAP
 
 .macro write_cr3 val:req, tmp1:req, tmp2:req
         mov   %cr4, %\tmp1
@@ -217,15 +212,10 @@ void ret_from_intr(void);
         mov   %\tmp2, %cr4
 .endm
 
-#define CR4_PV32_RESTORE                                           \
-        667: ASM_NOP5;                                             \
-        .pushsection .altinstr_replacement, "ax";                  \
-        668: call cr4_pv32_restore;                                \
-        .section .altinstructions, "a";                            \
-        altinstruction_entry 667b, 667b, X86_FEATURE_ALWAYS, 5, 0; \
-        altinstruction_entry 667b, 668b, X86_FEATURE_XEN_SMEP, 5, 5;   \
-        altinstruction_entry 667b, 668b, X86_FEATURE_XEN_SMAP, 5, 5;   \
-        .popsection
+#define CR4_PV32_RESTORE                                \
+    ALTERNATIVE_2 __stringify(ASM_NOP5),                \
+        "call cr4_pv32_restore", X86_FEATURE_XEN_SMEP,  \
+        "call cr4_pv32_restore", X86_FEATURE_XEN_SMAP
 
 #else
 static always_inline void clac(void)
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 5/7] x86/alt: Support for automatic padding calculations
  2018-02-26 11:34 [PATCH RESEND v2 0/7] x86/alternatives: Support for automatic padding calculations Andrew Cooper
                   ` (3 preceding siblings ...)
  2018-02-26 11:35 ` [PATCH v2 4/7] x86/asm: Remove opencoded uses of altinstruction_entry Andrew Cooper
@ 2018-02-26 11:35 ` Andrew Cooper
  2018-02-28 16:16   ` Jan Beulich
  2018-02-26 11:35 ` [PATCH v2 6/7] x86/alt: Drop explicit padding of origin sites Andrew Cooper
  2018-02-26 11:35 ` [PATCH v2 7/7] x86/build: Use new .nop directive when available Andrew Cooper
  6 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2018-02-26 11:35 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Jan Beulich

The correct amount of padding in an origin patch site can be calculated
automatically, based on the relative lengths of the replacements.

This requires a bit of trickery to calculate correctly, especially in the
ALTENRATIVE_2 case where a branchless max() calculation in needed.  The
calculation is further complicated because GAS's idea of true is -1 rather
than 1, which is why the extra negations are required.

Additionally, have apply_alternatives() attempt to optimise the padding nops.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

v2: Fix build with Clang.
---
 xen/arch/x86/Rules.mk                 |  4 +++
 xen/arch/x86/alternative.c            | 32 ++++++++++++++++---
 xen/include/asm-x86/alternative-asm.h | 60 +++++++++++++++++++++++++++++------
 xen/include/asm-x86/alternative.h     | 46 +++++++++++++++++++++------
 4 files changed, 120 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
index 9897dea..e169d67 100644
--- a/xen/arch/x86/Rules.mk
+++ b/xen/arch/x86/Rules.mk
@@ -24,6 +24,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
                      -U__OBJECT_LABEL__ -DHAVE_GAS_QUOTED_SYM \
                      '-D__OBJECT_LABEL__=$(subst $(BASEDIR)/,,$(CURDIR))/$$@')
 
+# GCC's idea of true is -1.  Clang's idea is 1
+$(call as-option-add,CFLAGS,CC,\
+    ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
+
 CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
 
 # Xen doesn't use SSE interally.  If the compiler supports it, also skip the
diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 51ca53e..e24db84 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -180,13 +180,37 @@ void init_or_livepatch apply_alternatives(const struct alt_instr *start,
         uint8_t *orig = ALT_ORIG_PTR(a);
         uint8_t *repl = ALT_REPL_PTR(a);
         uint8_t buf[MAX_PATCH_LEN];
+        unsigned int total_len = a->orig_len + a->pad_len;
 
-        BUG_ON(a->repl_len > a->orig_len);
-        BUG_ON(a->orig_len > sizeof(buf));
+        BUG_ON(a->repl_len > total_len);
+        BUG_ON(total_len > sizeof(buf));
         BUG_ON(a->cpuid >= NCAPINTS * 32);
 
+        /* No replacement to make, but try to optimise any padding. */
         if ( !boot_cpu_has(a->cpuid) )
+        {
+            unsigned int i;
+
+            if ( a->pad_len <= 1 )
+                continue;
+
+            /* Search the padding area for any byte which isn't a nop. */
+            for ( i = a->orig_len; i < total_len; ++i )
+                if ( orig[i] != 0x90 )
+                    break;
+
+            /*
+             * Only make any changes if all padding bytes are unoptimised
+             * nops.  With multiple alternatives over the same origin site, we
+             * may have already made a replacement, or optimised the nops.
+             */
+            if ( i != total_len )
+                continue;
+
+            add_nops(buf, a->pad_len);
+            text_poke(orig + a->orig_len, buf, a->pad_len);
             continue;
+        }
 
         memcpy(buf, repl, a->repl_len);
 
@@ -194,8 +218,8 @@ void init_or_livepatch apply_alternatives(const struct alt_instr *start,
         if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
             *(int32_t *)(buf + 1) += repl - orig;
 
-        add_nops(buf + a->repl_len, a->orig_len - a->repl_len);
-        text_poke(orig, buf, a->orig_len);
+        add_nops(buf + a->repl_len, total_len - a->repl_len);
+        text_poke(orig, buf, total_len);
     }
 }
 
diff --git a/xen/include/asm-x86/alternative-asm.h b/xen/include/asm-x86/alternative-asm.h
index 150bd1a..25f79fe 100644
--- a/xen/include/asm-x86/alternative-asm.h
+++ b/xen/include/asm-x86/alternative-asm.h
@@ -9,30 +9,55 @@
  * enough information for the alternatives patching code to patch an
  * instruction. See apply_alternatives().
  */
-.macro altinstruction_entry orig repl feature orig_len repl_len
+.macro altinstruction_entry orig repl feature orig_len repl_len pad_len
     .long \orig - .
     .long \repl - .
     .word \feature
     .byte \orig_len
     .byte \repl_len
+    .byte \pad_len
 .endm
 
 #define orig_len               (.L\@_orig_e       -     .L\@_orig_s)
+#define pad_len                (.L\@_orig_p       -     .L\@_orig_e)
+#define total_len              (.L\@_orig_p       -     .L\@_orig_s)
 #define repl_len(nr)           (.L\@_repl_e\()nr  -     .L\@_repl_s\()nr)
 #define decl_repl(insn, nr)     .L\@_repl_s\()nr: insn; .L\@_repl_e\()nr:
 
+/* GCC's idea of true is -1, while Clang's idea is 1. */
+#ifdef HAVE_AS_NEGATIVE_TRUE
+# define as_true(x) (-(x))
+#else
+# define as_true(x) (x)
+#endif
+
+#define as_max(a, b)           ((a) ^ (((a) ^ (b)) & -as_true((a) < (b))))
+
 .macro ALTERNATIVE oldinstr, newinstr, feature
 .L\@_orig_s:
     \oldinstr
 .L\@_orig_e:
+    /*
+     * Calculate the difference in size between the replacement and original
+     * instructions, to derive how much padding to introduce.
+     */
+    .L\@_diff = repl_len(1) - orig_len
+
+    .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
+.L\@_orig_p:
 
     .pushsection .altinstructions, "a", @progbits
     altinstruction_entry .L\@_orig_s, .L\@_repl_s1, \feature, \
-        orig_len, repl_len(1)
+        orig_len, repl_len(1), pad_len
 
     .section .discard, "a", @progbits
-    /* Assembler-time check that \newinstr isn't longer than \oldinstr. */
-    .byte 0xff + repl_len(1) - orig_len
+    /*
+     * Assembler-time checks:
+     *   - total_len <= 255
+     *   - \newinstr <= total_len
+     */
+    .byte total_len
+    .byte 0xff + repl_len(1) - total_len
 
     .section .altinstr_replacement, "ax", @progbits
 
@@ -45,18 +70,31 @@
 .L\@_orig_s:
     \oldinstr
 .L\@_orig_e:
+    /*
+     * Calculate the difference in size between the largest replacement and
+     * the original instructions, to derive how much padding to introduce.
+     */
+    .L\@_diff = as_max(repl_len(1), repl_len(2)) - orig_len
+
+     .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
+.L\@_orig_p:
 
     .pushsection .altinstructions, "a", @progbits
 
     altinstruction_entry .L\@_orig_s, .L\@_repl_s1, \feature1, \
-        orig_len, repl_len(1)
+        orig_len, repl_len(1), pad_len
     altinstruction_entry .L\@_orig_s, .L\@_repl_s2, \feature2, \
-        orig_len, repl_len(2)
+        orig_len, repl_len(2), pad_len
 
     .section .discard, "a", @progbits
-    /* Assembler-time check that \newinstr{1,2} aren't longer than \oldinstr. */
-    .byte 0xff + repl_len(1) - orig_len
-    .byte 0xff + repl_len(2) - orig_len
+    /*
+     * Assembler-time checks:
+     *   - total_len <= 255
+     *   - \newinstr* <= total_len
+     */
+    .byte total_len
+    .byte 0xff + repl_len(1) - total_len
+    .byte 0xff + repl_len(2) - total_len
 
     .section .altinstr_replacement, "ax", @progbits
 
@@ -66,8 +104,12 @@
     .popsection
 .endm
 
+#undef as_max
+#undef as_true
 #undef decl_repl
 #undef repl_len
+#undef total_len
+#undef pad_len
 #undef orig_len
 
 #endif /* __ASSEMBLY__ */
diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index bcad3ee..d53cea0 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -8,12 +8,13 @@
 #include <xen/stringify.h>
 #include <xen/types.h>
 
-struct alt_instr {
+struct __packed alt_instr {
     int32_t  orig_offset;   /* original instruction */
     int32_t  repl_offset;   /* offset to replacement instruction */
     uint16_t cpuid;         /* cpuid bit set for replacement */
     uint8_t  orig_len;      /* length of original instruction */
-    uint8_t  repl_len;      /* length of new instruction, <= instrlen */
+    uint8_t  repl_len;      /* length of new instruction */
+    uint8_t  pad_len;       /* length of build-time padding */
 };
 
 #define __ALT_PTR(a,f)      ((uint8_t *)((void *)&(a)->f + (a)->f))
@@ -26,43 +27,70 @@ extern void apply_alternatives(const struct alt_instr *start,
                                const struct alt_instr *end);
 extern void alternative_instructions(void);
 
-#define OLDINSTR(oldinstr) ".LXEN%=_orig_s:\n\t" oldinstr "\n.LXEN%=_orig_e:\n"
-
 #define alt_orig_len       "(.LXEN%=_orig_e - .LXEN%=_orig_s)"
+#define alt_pad_len        "(.LXEN%=_orig_p - .LXEN%=_orig_e)"
+#define alt_total_len      "(.LXEN%=_orig_p - .LXEN%=_orig_s)"
 #define alt_repl_s(num)    ".LXEN%=_repl_s"#num
 #define alt_repl_e(num)    ".LXEN%=_repl_e"#num
 #define alt_repl_len(num)  "(" alt_repl_e(num) " - " alt_repl_s(num) ")"
 
+/* GCC's idea of true is -1, while Clang's idea is 1. */
+#ifdef HAVE_AS_NEGATIVE_TRUE
+# define AS_TRUE "-"
+#else
+# define AS_TRUE ""
+#endif
+
+#define as_max(a, b) "(("a") ^ ((("a") ^ ("b")) & -("AS_TRUE"(("a") < ("b")))))"
+
+#define OLDINSTR_1(oldinstr, n1)                                 \
+    ".LXEN%=_orig_s:\n\t" oldinstr "\n .LXEN%=_orig_e:\n\t"      \
+    ".LXEN%=_diff = "alt_repl_len(n1)"-"alt_orig_len"\n\t"       \
+    ".skip "AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff, 0x90\n\t" \
+    ".LXEN%=_orig_p:\n\t"
+
+#define ALT_PADDING_LEN(n1, n2) \
+    as_max((alt_repl_len(n1), alt_repl_len(n2))"-"alt_orig_len
+
+#define OLDINSTR_2(oldinstr, n1, n2)                             \
+    ".LXEN%=_orig_s:\n\t" oldinstr "\n .LXEN%=_orig_e:\n\t"      \
+    ".LXEN%=_diff = "ALT_PADDING_LEN(n1, n2)"\n\t"               \
+    ".skip "AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff, 0x90\n\t" \
+    ".LXEN%=_orig_p:\n\t"
+
 #define ALTINSTR_ENTRY(feature, num)                                    \
         " .long .LXEN%=_orig_s - .\n"             /* label           */ \
         " .long " alt_repl_s(num)" - .\n"         /* new instruction */ \
         " .word " __stringify(feature) "\n"       /* feature bit     */ \
         " .byte " alt_orig_len "\n"               /* source len      */ \
-        " .byte " alt_repl_len(num) "\n"          /* replacement len */
+        " .byte " alt_repl_len(num) "\n"          /* replacement len */ \
+        " .byte " alt_pad_len "\n"                /* padding len     */
 
-#define DISCARD_ENTRY(num)                        /* repl <= orig */    \
-        " .byte 0xff + (" alt_repl_len(num) ") - (" alt_orig_len ")\n"
+#define DISCARD_ENTRY(num)                        /* repl <= total */   \
+        " .byte 0xff + (" alt_repl_len(num) ") - (" alt_total_len ")\n"
 
 #define ALTINSTR_REPLACEMENT(newinstr, num)       /* replacement */     \
         alt_repl_s(num)":\n\t" newinstr "\n" alt_repl_e(num) ":\n\t"
 
 /* alternative assembly primitive: */
 #define ALTERNATIVE(oldinstr, newinstr, feature)                        \
-        OLDINSTR(oldinstr)                                              \
+        OLDINSTR_1(oldinstr, 1)                                         \
         ".pushsection .altinstructions, \"a\", @progbits\n"             \
         ALTINSTR_ENTRY(feature, 1)                                      \
         ".section .discard, \"a\", @progbits\n"                         \
+        ".byte " alt_total_len "\n" /* total_len <= 255 */              \
         DISCARD_ENTRY(1)                                                \
         ".section .altinstr_replacement, \"ax\", @progbits\n"           \
         ALTINSTR_REPLACEMENT(newinstr, 1)                               \
         ".popsection\n"
 
 #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
-        OLDINSTR(oldinstr)                                              \
+        OLDINSTR_2(oldinstr, 1, 2)                                      \
         ".pushsection .altinstructions, \"a\", @progbits\n"             \
         ALTINSTR_ENTRY(feature1, 1)                                     \
         ALTINSTR_ENTRY(feature2, 2)                                     \
         ".section .discard, \"a\", @progbits\n"                         \
+        ".byte " alt_total_len "\n" /* total_len <= 255 */              \
         DISCARD_ENTRY(1)                                                \
         DISCARD_ENTRY(2)                                                \
         ".section .altinstr_replacement, \"ax\", @progbits\n"           \
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 6/7] x86/alt: Drop explicit padding of origin sites
  2018-02-26 11:34 [PATCH RESEND v2 0/7] x86/alternatives: Support for automatic padding calculations Andrew Cooper
                   ` (4 preceding siblings ...)
  2018-02-26 11:35 ` [PATCH v2 5/7] x86/alt: Support for automatic padding calculations Andrew Cooper
@ 2018-02-26 11:35 ` Andrew Cooper
  2018-02-26 11:35 ` [PATCH v2 7/7] x86/build: Use new .nop directive when available Andrew Cooper
  6 siblings, 0 replies; 26+ messages in thread
From: Andrew Cooper @ 2018-02-26 11:35 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper

Now that the alternatives infrastructure can calculate the required padding
automatically, there is no need to hard code it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/x86_64/compat/entry.S  |  2 +-
 xen/arch/x86/x86_64/entry.S         |  2 +-
 xen/include/asm-x86/nops.h          |  7 -------
 xen/include/asm-x86/spec_ctrl_asm.h | 19 ++++++++-----------
 4 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 8aba269..f650610 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -134,7 +134,7 @@ ENTRY(compat_restore_all_guest)
         jne   1b
 2:
 .endm
-	ALTERNATIVE_2 ".skip 45, 0x90", \
+	ALTERNATIVE_2 "", \
             alt_cr4_pv32, X86_FEATURE_XEN_SMEP, \
             alt_cr4_pv32, X86_FEATURE_XEN_SMAP
 
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index e939f20..cc94333 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -564,7 +564,7 @@ handle_exception_saved:
         testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
         jz    exception_with_ints_disabled
 
-        ALTERNATIVE_2 "jmp .Lcr4_pv32_done; .skip 2, 0x90", \
+        ALTERNATIVE_2 "jmp .Lcr4_pv32_done", \
             __stringify(mov VCPU_domain(%rbx), %rax), X86_FEATURE_XEN_SMEP, \
             __stringify(mov VCPU_domain(%rbx), %rax), X86_FEATURE_XEN_SMAP
 
diff --git a/xen/include/asm-x86/nops.h b/xen/include/asm-x86/nops.h
index 61319cc..1a46b97 100644
--- a/xen/include/asm-x86/nops.h
+++ b/xen/include/asm-x86/nops.h
@@ -65,13 +65,6 @@
 #define ASM_NOP8 _ASM_MK_NOP(P6_NOP8)
 #define ASM_NOP9 _ASM_MK_NOP(P6_NOP9)
 
-#define ASM_NOP17 ASM_NOP8; ASM_NOP7; ASM_NOP2
-#define ASM_NOP21 ASM_NOP8; ASM_NOP8; ASM_NOP5
-#define ASM_NOP24 ASM_NOP8; ASM_NOP8; ASM_NOP8
-#define ASM_NOP29 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP5
-#define ASM_NOP32 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8
-#define ASM_NOP40 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8
-
 #define ASM_NOP_MAX 9
 
 #endif /* __X86_ASM_NOPS_H__ */
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 1f2b6f3..1623fc0 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -216,9 +216,8 @@
 
 /* Use after a VMEXIT from an HVM guest. */
 #define SPEC_CTRL_ENTRY_FROM_VMEXIT                                     \
-    ALTERNATIVE __stringify(ASM_NOP40),                                 \
-        DO_OVERWRITE_RSB, X86_FEATURE_RSB_VMEXIT;                       \
-    ALTERNATIVE_2 __stringify(ASM_NOP32),                               \
+    ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_RSB_VMEXIT;           \
+    ALTERNATIVE_2 "",                                                   \
         __stringify(DO_SPEC_CTRL_ENTRY_FROM_VMEXIT                      \
                     ibrs_val=SPEC_CTRL_IBRS),                           \
         X86_FEATURE_XEN_IBRS_SET,                                       \
@@ -228,9 +227,8 @@
 
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
 #define SPEC_CTRL_ENTRY_FROM_PV                                         \
-    ALTERNATIVE __stringify(ASM_NOP40),                                 \
-        DO_OVERWRITE_RSB, X86_FEATURE_RSB_NATIVE;                       \
-    ALTERNATIVE_2 __stringify(ASM_NOP21),                               \
+    ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_RSB_NATIVE;           \
+    ALTERNATIVE_2 "",                                                   \
         __stringify(DO_SPEC_CTRL_ENTRY maybexen=0                       \
                     ibrs_val=SPEC_CTRL_IBRS),                           \
         X86_FEATURE_XEN_IBRS_SET,                                       \
@@ -239,9 +237,8 @@
 
 /* Use in interrupt/exception context.  May interrupt Xen or PV context. */
 #define SPEC_CTRL_ENTRY_FROM_INTR                                       \
-    ALTERNATIVE __stringify(ASM_NOP40),                                 \
-        DO_OVERWRITE_RSB, X86_FEATURE_RSB_NATIVE;                       \
-    ALTERNATIVE_2 __stringify(ASM_NOP29),                               \
+    ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_RSB_NATIVE;           \
+    ALTERNATIVE_2 "",                                                   \
         __stringify(DO_SPEC_CTRL_ENTRY maybexen=1                       \
                     ibrs_val=SPEC_CTRL_IBRS),                           \
         X86_FEATURE_XEN_IBRS_SET,                                       \
@@ -250,13 +247,13 @@
 
 /* Use when exiting to Xen context. */
 #define SPEC_CTRL_EXIT_TO_XEN                                           \
-    ALTERNATIVE_2 __stringify(ASM_NOP17),                               \
+    ALTERNATIVE_2 "",                                                   \
         DO_SPEC_CTRL_EXIT_TO_XEN, X86_FEATURE_XEN_IBRS_SET,             \
         DO_SPEC_CTRL_EXIT_TO_XEN, X86_FEATURE_XEN_IBRS_CLEAR
 
 /* Use when exiting to guest context. */
 #define SPEC_CTRL_EXIT_TO_GUEST                                         \
-    ALTERNATIVE_2 __stringify(ASM_NOP24),                               \
+    ALTERNATIVE_2 "",                                                   \
         DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_XEN_IBRS_SET,           \
         DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_XEN_IBRS_CLEAR
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-02-26 11:34 [PATCH RESEND v2 0/7] x86/alternatives: Support for automatic padding calculations Andrew Cooper
                   ` (5 preceding siblings ...)
  2018-02-26 11:35 ` [PATCH v2 6/7] x86/alt: Drop explicit padding of origin sites Andrew Cooper
@ 2018-02-26 11:35 ` Andrew Cooper
  2018-02-26 12:31   ` Roger Pau Monné
  2018-02-28 16:22   ` Jan Beulich
  6 siblings, 2 replies; 26+ messages in thread
From: Andrew Cooper @ 2018-02-26 11:35 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Jan Beulich

Newer versions of binutils are capable of emitting an exact number bytes worth
of optimised nops.  Use this in preference to .skip when available.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

RFC until support is actually committed to binutils mainline.
---
 xen/arch/x86/Rules.mk                 |  4 ++++
 xen/include/asm-x86/alternative-asm.h | 14 ++++++++++++--
 xen/include/asm-x86/alternative.h     | 13 ++++++++++---
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
index e169d67..bf5047f 100644
--- a/xen/arch/x86/Rules.mk
+++ b/xen/arch/x86/Rules.mk
@@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
 $(call as-option-add,CFLAGS,CC,\
     ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
 
+# Check to see whether the assmbler supports the .nop directive.
+$(call as-option-add,CFLAGS,CC,\
+    ".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
+
 CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
 
 # Xen doesn't use SSE interally.  If the compiler supports it, also skip the
diff --git a/xen/include/asm-x86/alternative-asm.h b/xen/include/asm-x86/alternative-asm.h
index 25f79fe..9e46bed 100644
--- a/xen/include/asm-x86/alternative-asm.h
+++ b/xen/include/asm-x86/alternative-asm.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
 #define _ASM_X86_ALTERNATIVE_ASM_H_
 
+#include <asm/nops.h>
+
 #ifdef __ASSEMBLY__
 
 /*
@@ -18,6 +20,14 @@
     .byte \pad_len
 .endm
 
+.macro mknops nr_bytes
+#ifdef HAVE_AS_NOP_DIRECTIVE
+    .nop \nr_bytes, ASM_NOP_MAX
+#else
+    .skip \nr_bytes, 0x90
+#endif
+.endm
+
 #define orig_len               (.L\@_orig_e       -     .L\@_orig_s)
 #define pad_len                (.L\@_orig_p       -     .L\@_orig_e)
 #define total_len              (.L\@_orig_p       -     .L\@_orig_s)
@@ -43,7 +53,7 @@
      */
     .L\@_diff = repl_len(1) - orig_len
 
-    .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
+     mknops (as_true(.L\@_diff > 0) * .L\@_diff)
 .L\@_orig_p:
 
     .pushsection .altinstructions, "a", @progbits
@@ -76,7 +86,7 @@
      */
     .L\@_diff = as_max(repl_len(1), repl_len(2)) - orig_len
 
-     .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
+     mknops (as_true(.L\@_diff > 0) * .L\@_diff)
 .L\@_orig_p:
 
     .pushsection .altinstructions, "a", @progbits
diff --git a/xen/include/asm-x86/alternative.h b/xen/include/asm-x86/alternative.h
index d53cea0..51538b6 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -2,7 +2,6 @@
 #define __X86_ALTERNATIVE_H__
 
 #include <asm/alternative-asm.h>
-#include <asm/nops.h>
 
 #ifndef __ASSEMBLY__
 #include <xen/stringify.h>
@@ -27,6 +26,14 @@ extern void apply_alternatives(const struct alt_instr *start,
                                const struct alt_instr *end);
 extern void alternative_instructions(void);
 
+asm ( ".macro mknops nr_bytes\n\t"
+#ifdef HAVE_AS_NOP_DIRECTIVE
+      ".nop \\nr_bytes, " __stringify(ASM_NOP_MAX) "\n\t"
+#else
+      ".skip \\nr_bytes, 0x90\n\t"
+#endif
+      ".endm\n\t" );
+
 #define alt_orig_len       "(.LXEN%=_orig_e - .LXEN%=_orig_s)"
 #define alt_pad_len        "(.LXEN%=_orig_p - .LXEN%=_orig_e)"
 #define alt_total_len      "(.LXEN%=_orig_p - .LXEN%=_orig_s)"
@@ -46,7 +53,7 @@ extern void alternative_instructions(void);
 #define OLDINSTR_1(oldinstr, n1)                                 \
     ".LXEN%=_orig_s:\n\t" oldinstr "\n .LXEN%=_orig_e:\n\t"      \
     ".LXEN%=_diff = "alt_repl_len(n1)"-"alt_orig_len"\n\t"       \
-    ".skip "AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff, 0x90\n\t" \
+    "mknops ("AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff)\n\t"    \
     ".LXEN%=_orig_p:\n\t"
 
 #define ALT_PADDING_LEN(n1, n2) \
@@ -55,7 +62,7 @@ extern void alternative_instructions(void);
 #define OLDINSTR_2(oldinstr, n1, n2)                             \
     ".LXEN%=_orig_s:\n\t" oldinstr "\n .LXEN%=_orig_e:\n\t"      \
     ".LXEN%=_diff = "ALT_PADDING_LEN(n1, n2)"\n\t"               \
-    ".skip "AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff, 0x90\n\t" \
+    "mknops ("AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff)\n\t"    \
     ".LXEN%=_orig_p:\n\t"
 
 #define ALTINSTR_ENTRY(feature, num)                                    \
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-02-26 11:35 ` [PATCH v2 7/7] x86/build: Use new .nop directive when available Andrew Cooper
@ 2018-02-26 12:31   ` Roger Pau Monné
  2018-02-26 13:08     ` Andrew Cooper
  2018-02-28 16:22   ` Jan Beulich
  1 sibling, 1 reply; 26+ messages in thread
From: Roger Pau Monné @ 2018-02-26 12:31 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

On Mon, Feb 26, 2018 at 11:35:04AM +0000, Andrew Cooper wrote:
> Newer versions of binutils are capable of emitting an exact number bytes worth
> of optimised nops.  Use this in preference to .skip when available.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> 
> RFC until support is actually committed to binutils mainline.

Since RFC has been dropped from the subject, is this now committed?

> ---
>  xen/arch/x86/Rules.mk                 |  4 ++++
>  xen/include/asm-x86/alternative-asm.h | 14 ++++++++++++--
>  xen/include/asm-x86/alternative.h     | 13 ++++++++++---
>  3 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
> index e169d67..bf5047f 100644
> --- a/xen/arch/x86/Rules.mk
> +++ b/xen/arch/x86/Rules.mk
> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>  $(call as-option-add,CFLAGS,CC,\
>      ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
>  
> +# Check to see whether the assmbler supports the .nop directive.
> +$(call as-option-add,CFLAGS,CC,\
> +    ".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
> +
>  CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
>  
>  # Xen doesn't use SSE interally.  If the compiler supports it, also skip the
> diff --git a/xen/include/asm-x86/alternative-asm.h b/xen/include/asm-x86/alternative-asm.h
> index 25f79fe..9e46bed 100644
> --- a/xen/include/asm-x86/alternative-asm.h
> +++ b/xen/include/asm-x86/alternative-asm.h
> @@ -1,6 +1,8 @@
>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>  
> +#include <asm/nops.h>
> +
>  #ifdef __ASSEMBLY__
>  
>  /*
> @@ -18,6 +20,14 @@
>      .byte \pad_len
>  .endm
>  
> +.macro mknops nr_bytes
> +#ifdef HAVE_AS_NOP_DIRECTIVE
> +    .nop \nr_bytes, ASM_NOP_MAX

I'm not able to find any online document about the .nop directive, and
I cannot really figure out the purpose of the second parameter.

Also, after this patch is applied it seems like the padding is not
going to be 0x90, because as will already emit optimized nops. Are
those nops more optimized than the ones added by the alternatives
framework?

I would expect the nops added at runtime would be more optimized than
the ones added at build time, because the runtime ones could take into
account the specific CPU model Xen is running on.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-02-26 12:31   ` Roger Pau Monné
@ 2018-02-26 13:08     ` Andrew Cooper
  2018-02-26 14:27       ` Roger Pau Monné
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2018-02-26 13:08 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Wei Liu, Jan Beulich, Xen-devel

On 26/02/18 12:31, Roger Pau Monné wrote:
> On Mon, Feb 26, 2018 at 11:35:04AM +0000, Andrew Cooper wrote:
>> Newer versions of binutils are capable of emitting an exact number bytes worth
>> of optimised nops.  Use this in preference to .skip when available.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wei.liu2@citrix.com>
>>
>> RFC until support is actually committed to binutils mainline.
> Since RFC has been dropped from the subject, is this now committed?

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=62a02d25b6e5d9f92c205260daa11355d0c62532

>
>> ---
>>  xen/arch/x86/Rules.mk                 |  4 ++++
>>  xen/include/asm-x86/alternative-asm.h | 14 ++++++++++++--
>>  xen/include/asm-x86/alternative.h     | 13 ++++++++++---
>>  3 files changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
>> index e169d67..bf5047f 100644
>> --- a/xen/arch/x86/Rules.mk
>> +++ b/xen/arch/x86/Rules.mk
>> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>>  $(call as-option-add,CFLAGS,CC,\
>>      ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
>>  
>> +# Check to see whether the assmbler supports the .nop directive.
>> +$(call as-option-add,CFLAGS,CC,\
>> +    ".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
>> +
>>  CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
>>  
>>  # Xen doesn't use SSE interally.  If the compiler supports it, also skip the
>> diff --git a/xen/include/asm-x86/alternative-asm.h b/xen/include/asm-x86/alternative-asm.h
>> index 25f79fe..9e46bed 100644
>> --- a/xen/include/asm-x86/alternative-asm.h
>> +++ b/xen/include/asm-x86/alternative-asm.h
>> @@ -1,6 +1,8 @@
>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>  
>> +#include <asm/nops.h>
>> +
>>  #ifdef __ASSEMBLY__
>>  
>>  /*
>> @@ -18,6 +20,14 @@
>>      .byte \pad_len
>>  .endm
>>  
>> +.macro mknops nr_bytes
>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>> +    .nop \nr_bytes, ASM_NOP_MAX
> I'm not able to find any online document about the .nop directive, and
> I cannot really figure out the purpose of the second parameter.

Its a bit woolly, named "control".  In practice, it is the maximum
length of an individual nop.  Beyond 11 bytes (iirc), most pipelines
take a decode stall.

>
> Also, after this patch is applied it seems like the padding is not
> going to be 0x90, because as will already emit optimized nops. Are
> those nops more optimized than the ones added by the alternatives
> framework?

They are the same nops (by and large), although arranged differently. 
For example, GAS fills backwards rather than forwards, which is as
recommended in the Intel ORM.

> I would expect the nops added at runtime would be more optimized than
> the ones added at build time, because the runtime ones could take into
> account the specific CPU model Xen is running on.

There are only a very few 64-bit capable CPUs which prefer K8 nops over
P6 nops, and they are all old.  The build-time nops are correct for the
overwhelming majority of hardware.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives
  2018-02-26 11:35 ` [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives Andrew Cooper
@ 2018-02-26 14:09   ` Roger Pau Monné
  2018-02-28 15:59   ` Jan Beulich
  2018-03-01 12:44   ` Wei Liu
  2 siblings, 0 replies; 26+ messages in thread
From: Roger Pau Monné @ 2018-02-26 14:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

On Mon, Feb 26, 2018 at 11:35:00AM +0000, Andrew Cooper wrote:
>  * On the C side, switch to using local lables rather than hardcoded numbers.
>  * Rename parameters and lables to be consistent with alt_instr names, and
>    consistent between the the C and asm versions.
>  * On the asm side, factor some expressions out into macros to aid clarity.
>  * Consistently declare section attributes.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-02-26 13:08     ` Andrew Cooper
@ 2018-02-26 14:27       ` Roger Pau Monné
  0 siblings, 0 replies; 26+ messages in thread
From: Roger Pau Monné @ 2018-02-26 14:27 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Jan Beulich, Xen-devel

On Mon, Feb 26, 2018 at 01:08:05PM +0000, Andrew Cooper wrote:
> On 26/02/18 12:31, Roger Pau Monné wrote:
> > On Mon, Feb 26, 2018 at 11:35:04AM +0000, Andrew Cooper wrote:
> >> Newer versions of binutils are capable of emitting an exact number bytes worth
> >> of optimised nops.  Use this in preference to .skip when available.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> CC: Roger Pau Monné <roger.pau@citrix.com>
> >> CC: Wei Liu <wei.liu2@citrix.com>
> >>
> >> RFC until support is actually committed to binutils mainline.
> > Since RFC has been dropped from the subject, is this now committed?
> 
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=62a02d25b6e5d9f92c205260daa11355d0c62532

Thanks for the reference.

> >
> >> ---
> >>  xen/arch/x86/Rules.mk                 |  4 ++++
> >>  xen/include/asm-x86/alternative-asm.h | 14 ++++++++++++--
> >>  xen/include/asm-x86/alternative.h     | 13 ++++++++++---
> >>  3 files changed, 26 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
> >> index e169d67..bf5047f 100644
> >> --- a/xen/arch/x86/Rules.mk
> >> +++ b/xen/arch/x86/Rules.mk
> >> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
> >>  $(call as-option-add,CFLAGS,CC,\
> >>      ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
> >>  
> >> +# Check to see whether the assmbler supports the .nop directive.
> >> +$(call as-option-add,CFLAGS,CC,\
> >> +    ".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
> >> +
> >>  CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
> >>  
> >>  # Xen doesn't use SSE interally.  If the compiler supports it, also skip the
> >> diff --git a/xen/include/asm-x86/alternative-asm.h b/xen/include/asm-x86/alternative-asm.h
> >> index 25f79fe..9e46bed 100644
> >> --- a/xen/include/asm-x86/alternative-asm.h
> >> +++ b/xen/include/asm-x86/alternative-asm.h
> >> @@ -1,6 +1,8 @@
> >>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
> >>  #define _ASM_X86_ALTERNATIVE_ASM_H_
> >>  
> >> +#include <asm/nops.h>
> >> +
> >>  #ifdef __ASSEMBLY__
> >>  
> >>  /*
> >> @@ -18,6 +20,14 @@
> >>      .byte \pad_len
> >>  .endm
> >>  
> >> +.macro mknops nr_bytes
> >> +#ifdef HAVE_AS_NOP_DIRECTIVE
> >> +    .nop \nr_bytes, ASM_NOP_MAX
> > I'm not able to find any online document about the .nop directive, and
> > I cannot really figure out the purpose of the second parameter.
> 
> Its a bit woolly, named "control".  In practice, it is the maximum
> length of an individual nop.  Beyond 11 bytes (iirc), most pipelines
> take a decode stall.

Oh, I've looked at the commit above, and since control is an optional
parameter, why not leave the assembler chose the default value? AFAICT
in our case this should be 11 because it's 64bit code.

> >
> > Also, after this patch is applied it seems like the padding is not
> > going to be 0x90, because as will already emit optimized nops. Are
> > those nops more optimized than the ones added by the alternatives
> > framework?
> 
> They are the same nops (by and large), although arranged differently. 
> For example, GAS fills backwards rather than forwards, which is as
> recommended in the Intel ORM.

So that 'larger' nop instructions are going to be placed at the end of
the region instead of the beginning of it?

> > I would expect the nops added at runtime would be more optimized than
> > the ones added at build time, because the runtime ones could take into
> > account the specific CPU model Xen is running on.
> 
> There are only a very few 64-bit capable CPUs which prefer K8 nops over
> P6 nops, and they are all old.  The build-time nops are correct for the
> overwhelming majority of hardware.

OK, I have no idea about this, but if you think filling with .nop is
not going to introduce a performance penalty over the run-time filling
then that's fine for me.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives
  2018-02-26 11:35 ` [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives Andrew Cooper
  2018-02-26 14:09   ` Roger Pau Monné
@ 2018-02-28 15:59   ` Jan Beulich
  2018-03-01 12:44   ` Wei Liu
  2 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2018-02-28 15:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

 >>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
> * On the C side, switch to using local lables rather than hardcoded numbers.
>  * Rename parameters and lables to be consistent with alt_instr names, and
>    consistent between the the C and asm versions.
>  * On the asm side, factor some expressions out into macros to aid clarity.
>  * Consistently declare section attributes.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/7] x86/alt: Support for automatic padding calculations
  2018-02-26 11:35 ` [PATCH v2 5/7] x86/alt: Support for automatic padding calculations Andrew Cooper
@ 2018-02-28 16:16   ` Jan Beulich
  2018-02-28 17:26     ` Andrew Cooper
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2018-02-28 16:16 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/Rules.mk
> +++ b/xen/arch/x86/Rules.mk
> @@ -24,6 +24,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>                       -U__OBJECT_LABEL__ -DHAVE_GAS_QUOTED_SYM \
>                       '-D__OBJECT_LABEL__=$(subst $(BASEDIR)/,,$(CURDIR))/$$@')
>  
> +# GCC's idea of true is -1.  Clang's idea is 1
> +$(call as-option-add,CFLAGS,CC,\
> +    ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)

With GCC replaced by gas, as indicated already be Roger
(also in alternative*.h then)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

However, I have a question to raise and a remark to make:

>  .macro ALTERNATIVE oldinstr, newinstr, feature
>  .L\@_orig_s:
>      \oldinstr
>  .L\@_orig_e:
> +    /*
> +     * Calculate the difference in size between the replacement and original
> +     * instructions, to derive how much padding to introduce.
> +     */
> +    .L\@_diff = repl_len(1) - orig_len
> +
> +    .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90

How certain are you that these forward references actually work on
at least all half way recent gas versions? IOW I wonder whether it
wouldn't be more safe to make these backward references (i.e.
emitting the replacement code first).

> @@ -45,18 +70,31 @@
>  .L\@_orig_s:
>      \oldinstr
>  .L\@_orig_e:
> +    /*
> +     * Calculate the difference in size between the largest replacement and
> +     * the original instructions, to derive how much padding to introduce.
> +     */
> +    .L\@_diff = as_max(repl_len(1), repl_len(2)) - orig_len
> +
> +     .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
> +.L\@_orig_p:
>  
>      .pushsection .altinstructions, "a", @progbits
>  
>      altinstruction_entry .L\@_orig_s, .L\@_repl_s1, \feature1, \
> -        orig_len, repl_len(1)
> +        orig_len, repl_len(1), pad_len
>      altinstruction_entry .L\@_orig_s, .L\@_repl_s2, \feature2, \
> -        orig_len, repl_len(2)
> +        orig_len, repl_len(2), pad_len
>  
>      .section .discard, "a", @progbits
> -    /* Assembler-time check that \newinstr{1,2} aren't longer than \oldinstr. */
> -    .byte 0xff + repl_len(1) - orig_len
> -    .byte 0xff + repl_len(2) - orig_len
> +    /*
> +     * Assembler-time checks:
> +     *   - total_len <= 255
> +     *   - \newinstr* <= total_len
> +     */
> +    .byte total_len
> +    .byte 0xff + repl_len(1) - total_len
> +    .byte 0xff + repl_len(2) - total_len

Use as_max() here and emit only a single byte? I don't think the
diagnostic will be any less helpful, as iirc it doesn't point out the
line inside the macro anyway, so the developer will be left guessing
anyway which of the two alternatives it was. Otoh with the
padding size now being calculated, I don't see much point in these
checks anymore anyway.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-02-26 11:35 ` [PATCH v2 7/7] x86/build: Use new .nop directive when available Andrew Cooper
  2018-02-26 12:31   ` Roger Pau Monné
@ 2018-02-28 16:22   ` Jan Beulich
  2018-02-28 17:45     ` Andrew Cooper
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2018-02-28 16:22 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
> Newer versions of binutils are capable of emitting an exact number bytes worth
> of optimised nops.  Use this in preference to .skip when available.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

In principle
Reviewed-by: Jan Beulich <jbeulich@suse.com>
However, ...

> RFC until support is actually committed to binutils mainline.

... upstream looks to have switched to .nops now, so the prereq
to the R-b is that you consistently switch over.

> --- a/xen/arch/x86/Rules.mk
> +++ b/xen/arch/x86/Rules.mk
> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>  $(call as-option-add,CFLAGS,CC,\
>      ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
>  
> +# Check to see whether the assmbler supports the .nop directive.
> +$(call as-option-add,CFLAGS,CC,\
> +    ".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)

Do you really need the (arbitrary) second argument here?

> --- a/xen/include/asm-x86/alternative-asm.h
> +++ b/xen/include/asm-x86/alternative-asm.h
> @@ -1,6 +1,8 @@
>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>  
> +#include <asm/nops.h>
> +
>  #ifdef __ASSEMBLY__
>  
>  /*
> @@ -18,6 +20,14 @@
>      .byte \pad_len
>  .endm
>  
> +.macro mknops nr_bytes
> +#ifdef HAVE_AS_NOP_DIRECTIVE
> +    .nop \nr_bytes, ASM_NOP_MAX

And do you really need to specify ASM_NOP_MAX here? What's
wrong with letting the assembler pick what it wants as the longest
NOP?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/7] x86/alt: Support for automatic padding calculations
  2018-02-28 16:16   ` Jan Beulich
@ 2018-02-28 17:26     ` Andrew Cooper
  2018-03-01  7:26       ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2018-02-28 17:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 28/02/18 16:16, Jan Beulich wrote:
>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/Rules.mk
>> +++ b/xen/arch/x86/Rules.mk
>> @@ -24,6 +24,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>>                       -U__OBJECT_LABEL__ -DHAVE_GAS_QUOTED_SYM \
>>                       '-D__OBJECT_LABEL__=$(subst $(BASEDIR)/,,$(CURDIR))/$$@')
>>  
>> +# GCC's idea of true is -1.  Clang's idea is 1
>> +$(call as-option-add,CFLAGS,CC,\
>> +    ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
> With GCC replaced by gas, as indicated already be Roger
> (also in alternative*.h then)
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> However, I have a question to raise and a remark to make:
>
>>  .macro ALTERNATIVE oldinstr, newinstr, feature
>>  .L\@_orig_s:
>>      \oldinstr
>>  .L\@_orig_e:
>> +    /*
>> +     * Calculate the difference in size between the replacement and original
>> +     * instructions, to derive how much padding to introduce.
>> +     */
>> +    .L\@_diff = repl_len(1) - orig_len
>> +
>> +    .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
> How certain are you that these forward references actually work on
> at least all half way recent gas versions? IOW I wonder whether it
> wouldn't be more safe to make these backward references (i.e.
> emitting the replacement code first).

This code is used unconditionally in Linux, so any binutils within their
minimum set work.

So overall, I'm going to err on the side of "yes" in answer to your
question.

>
>> @@ -45,18 +70,31 @@
>>  .L\@_orig_s:
>>      \oldinstr
>>  .L\@_orig_e:
>> +    /*
>> +     * Calculate the difference in size between the largest replacement and
>> +     * the original instructions, to derive how much padding to introduce.
>> +     */
>> +    .L\@_diff = as_max(repl_len(1), repl_len(2)) - orig_len
>> +
>> +     .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
>> +.L\@_orig_p:
>>  
>>      .pushsection .altinstructions, "a", @progbits
>>  
>>      altinstruction_entry .L\@_orig_s, .L\@_repl_s1, \feature1, \
>> -        orig_len, repl_len(1)
>> +        orig_len, repl_len(1), pad_len
>>      altinstruction_entry .L\@_orig_s, .L\@_repl_s2, \feature2, \
>> -        orig_len, repl_len(2)
>> +        orig_len, repl_len(2), pad_len
>>  
>>      .section .discard, "a", @progbits
>> -    /* Assembler-time check that \newinstr{1,2} aren't longer than \oldinstr. */
>> -    .byte 0xff + repl_len(1) - orig_len
>> -    .byte 0xff + repl_len(2) - orig_len
>> +    /*
>> +     * Assembler-time checks:
>> +     *   - total_len <= 255
>> +     *   - \newinstr* <= total_len
>> +     */
>> +    .byte total_len
>> +    .byte 0xff + repl_len(1) - total_len
>> +    .byte 0xff + repl_len(2) - total_len
> Use as_max() here and emit only a single byte? I don't think the
> diagnostic will be any less helpful, as iirc it doesn't point out the
> line inside the macro anyway, so the developer will be left guessing
> anyway which of the two alternatives it was. Otoh with the
> padding size now being calculated, I don't see much point in these
> checks anymore anyway.

The bytes are discarded, so the quantity doesn't matter.  I can use max,
and you are correct that the line reference only ends up pointing to the
.macro itself.

However, I would prefer to keep these.  They've spotted two bugs in the
binutils development effort of .nops, and I'd prefer to be safe than sorry.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-02-28 16:22   ` Jan Beulich
@ 2018-02-28 17:45     ` Andrew Cooper
  2018-03-01  7:28       ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2018-02-28 17:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 28/02/18 16:22, Jan Beulich wrote:
>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
>> Newer versions of binutils are capable of emitting an exact number bytes worth
>> of optimised nops.  Use this in preference to .skip when available.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> In principle
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> However, ...
>
>> RFC until support is actually committed to binutils mainline.
> ... upstream looks to have switched to .nops now, so the prereq
> to the R-b is that you consistently switch over.

Yes.  I need to pull, rebuild and retest.  Even if this patch gets
committed, it is still semi-RFC and up for changes until the next
binutils release.

That said, given the explicit probe, it is at least safe to have a stale
ABI in use, and we can change it later if needs be.

>
>> --- a/xen/arch/x86/Rules.mk
>> +++ b/xen/arch/x86/Rules.mk
>> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>>  $(call as-option-add,CFLAGS,CC,\
>>      ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
>>  
>> +# Check to see whether the assmbler supports the .nop directive.
>> +$(call as-option-add,CFLAGS,CC,\
>> +    ".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
> Do you really need the (arbitrary) second argument here?

I want the check to match the form we actually use.

>
>> --- a/xen/include/asm-x86/alternative-asm.h
>> +++ b/xen/include/asm-x86/alternative-asm.h
>> @@ -1,6 +1,8 @@
>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>  
>> +#include <asm/nops.h>
>> +
>>  #ifdef __ASSEMBLY__
>>  
>>  /*
>> @@ -18,6 +20,14 @@
>>      .byte \pad_len
>>  .endm
>>  
>> +.macro mknops nr_bytes
>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>> +    .nop \nr_bytes, ASM_NOP_MAX
> And do you really need to specify ASM_NOP_MAX here? What's
> wrong with letting the assembler pick what it wants as the longest
> NOP?

I don't want a toolchain change to cause us to go beyond 11 byte nops,
because of the associated decode stall on almost all hardware.  Using
ASM_NOP_MAX seemed like the easiest way to keep the end result
consistent, irrespective of toolchain support.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/7] x86/alt: Support for automatic padding calculations
  2018-02-28 17:26     ` Andrew Cooper
@ 2018-03-01  7:26       ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2018-03-01  7:26 UTC (permalink / raw)
  To: andrew.cooper3; +Cc: xen-devel, wei.liu2, roger.pau

>>> Andrew Cooper <andrew.cooper3@citrix.com> 02/28/18 7:02 PM >>>
>On 28/02/18 16:16, Jan Beulich wrote:
>>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
>>>  .macro ALTERNATIVE oldinstr, newinstr, feature
>>>  .L\@_orig_s:
>>>      \oldinstr
>>>  .L\@_orig_e:
>>> +    /*
>>> +     * Calculate the difference in size between the replacement and original
>>> +     * instructions, to derive how much padding to introduce.
>>> +     */
>>> +    .L\@_diff = repl_len(1) - orig_len
>>> +
>>> +    .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
>> How certain are you that these forward references actually work on
>> at least all half way recent gas versions? IOW I wonder whether it
>> wouldn't be more safe to make these backward references (i.e.
>> emitting the replacement code first).
>
>This code is used unconditionally in Linux, so any binutils within their
>minimum set work.
>
>So overall, I'm going to err on the side of "yes" in answer to your
>question.

Okay.

>>> @@ -45,18 +70,31 @@
>>>  .L\@_orig_s:
>>>      \oldinstr
>>>  .L\@_orig_e:
>>> +    /*
>>> +     * Calculate the difference in size between the largest replacement and
>>> +     * the original instructions, to derive how much padding to introduce.
>>> +     */
>>> +    .L\@_diff = as_max(repl_len(1), repl_len(2)) - orig_len
>>> +
>>> +     .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
>>> +.L\@_orig_p:
>>>  
>>>      .pushsection .altinstructions, "a", @progbits
>>>  
>>>      altinstruction_entry .L\@_orig_s, .L\@_repl_s1, \feature1, \
>>> -        orig_len, repl_len(1)
>>> +        orig_len, repl_len(1), pad_len
>>>      altinstruction_entry .L\@_orig_s, .L\@_repl_s2, \feature2, \
>>> -        orig_len, repl_len(2)
>>> +        orig_len, repl_len(2), pad_len
>>>  
>>>      .section .discard, "a", @progbits
>>> -    /* Assembler-time check that \newinstr{1,2} aren't longer than \oldinstr. */
>>> -    .byte 0xff + repl_len(1) - orig_len
>>> -    .byte 0xff + repl_len(2) - orig_len
>>> +    /*
>>> +     * Assembler-time checks:
>>> +     *   - total_len <= 255
>>> +     *   - \newinstr* <= total_len
>>> +     */
>>> +    .byte total_len
>>> +    .byte 0xff + repl_len(1) - total_len
>>> +    .byte 0xff + repl_len(2) - total_len
>> Use as_max() here and emit only a single byte? I don't think the
>> diagnostic will be any less helpful, as iirc it doesn't point out the
>> line inside the macro anyway, so the developer will be left guessing
>> anyway which of the two alternatives it was. Otoh with the
>> padding size now being calculated, I don't see much point in these
>> checks anymore anyway.
>
>The bytes are discarded, so the quantity doesn't matter.  I can use max,
>and you are correct that the line reference only ends up pointing to the
>.macro itself.
>
>However, I would prefer to keep these.  They've spotted two bugs in the
>binutils development effort of .nops, and I'd prefer to be safe than sorry.

Well, fine with me then, and in that case it'll perhaps be indeed better to
avoid the use of max() here.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-02-28 17:45     ` Andrew Cooper
@ 2018-03-01  7:28       ` Jan Beulich
  2018-03-01 10:36         ` Roger Pau Monné
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2018-03-01  7:28 UTC (permalink / raw)
  To: andrew.cooper3; +Cc: xen-devel, wei.liu2, roger.pau

>>> Andrew Cooper <andrew.cooper3@citrix.com> 02/28/18 7:20 PM >>>
>On 28/02/18 16:22, Jan Beulich wrote:
>>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/include/asm-x86/alternative-asm.h
>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>> @@ -1,6 +1,8 @@
>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>  
>>> +#include <asm/nops.h>
>>> +
>>>  #ifdef __ASSEMBLY__
>>>  
>>>  /*
>>> @@ -18,6 +20,14 @@
>>>      .byte \pad_len
>>>  .endm
>>>  
>>> +.macro mknops nr_bytes
>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>> +    .nop \nr_bytes, ASM_NOP_MAX
>> And do you really need to specify ASM_NOP_MAX here? What's
>> wrong with letting the assembler pick what it wants as the longest
>> NOP?
>
>I don't want a toolchain change to cause us to go beyond 11 byte nops,
>because of the associated decode stall on almost all hardware.  Using
>ASM_NOP_MAX seemed like the easiest way to keep the end result
>consistent, irrespective of toolchain support.

I don't understand - an earlier patch takes care of runtime replacing them
anyway. What stalls can then result?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-03-01  7:28       ` Jan Beulich
@ 2018-03-01 10:36         ` Roger Pau Monné
  2018-03-01 10:54           ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Roger Pau Monné @ 2018-03-01 10:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, wei.liu2, xen-devel

On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
> >>> Andrew Cooper <andrew.cooper3@citrix.com> 02/28/18 7:20 PM >>>
> >On 28/02/18 16:22, Jan Beulich wrote:
> >>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
> >>> --- a/xen/include/asm-x86/alternative-asm.h
> >>> +++ b/xen/include/asm-x86/alternative-asm.h
> >>> @@ -1,6 +1,8 @@
> >>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
> >>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
> >>>  
> >>> +#include <asm/nops.h>
> >>> +
> >>>  #ifdef __ASSEMBLY__
> >>>  
> >>>  /*
> >>> @@ -18,6 +20,14 @@
> >>>      .byte \pad_len
> >>>  .endm
> >>>  
> >>> +.macro mknops nr_bytes
> >>> +#ifdef HAVE_AS_NOP_DIRECTIVE
> >>> +    .nop \nr_bytes, ASM_NOP_MAX
> >> And do you really need to specify ASM_NOP_MAX here? What's
> >> wrong with letting the assembler pick what it wants as the longest
> >> NOP?
> >
> >I don't want a toolchain change to cause us to go beyond 11 byte nops,
> >because of the associated decode stall on almost all hardware.  Using
> >ASM_NOP_MAX seemed like the easiest way to keep the end result
> >consistent, irrespective of toolchain support.
> 
> I don't understand - an earlier patch takes care of runtime replacing them
> anyway. What stalls can then result?

The runtime replacement won't happen when using the .nops directive
AFAICT, because the original padding section will likely be filled
with opcodes different than 0x90, and thus the runtime nop
optimization won't be performed.

I also agree that using the default (not proving a second argument)
seems like a better solution. Why would the toolstack switch to
something that leads to worse performance? That would certainly be
considered a bug.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-03-01 10:36         ` Roger Pau Monné
@ 2018-03-01 10:54           ` Jan Beulich
  2018-03-01 16:58             ` Andrew Cooper
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2018-03-01 10:54 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monné; +Cc: wei.liu2, xen-devel

>>> On 01.03.18 at 11:36, <roger.pau@citrix.com> wrote:
> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>> >>> Andrew Cooper <andrew.cooper3@citrix.com> 02/28/18 7:20 PM >>>
>> >On 28/02/18 16:22, Jan Beulich wrote:
>> >>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
>> >>> --- a/xen/include/asm-x86/alternative-asm.h
>> >>> +++ b/xen/include/asm-x86/alternative-asm.h
>> >>> @@ -1,6 +1,8 @@
>> >>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>> >>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>> >>>  
>> >>> +#include <asm/nops.h>
>> >>> +
>> >>>  #ifdef __ASSEMBLY__
>> >>>  
>> >>>  /*
>> >>> @@ -18,6 +20,14 @@
>> >>>      .byte \pad_len
>> >>>  .endm
>> >>>  
>> >>> +.macro mknops nr_bytes
>> >>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>> >>> +    .nop \nr_bytes, ASM_NOP_MAX
>> >> And do you really need to specify ASM_NOP_MAX here? What's
>> >> wrong with letting the assembler pick what it wants as the longest
>> >> NOP?
>> >
>> >I don't want a toolchain change to cause us to go beyond 11 byte nops,
>> >because of the associated decode stall on almost all hardware.  Using
>> >ASM_NOP_MAX seemed like the easiest way to keep the end result
>> >consistent, irrespective of toolchain support.
>> 
>> I don't understand - an earlier patch takes care of runtime replacing them
>> anyway. What stalls can then result?
> 
> The runtime replacement won't happen when using the .nops directive
> AFAICT, because the original padding section will likely be filled
> with opcodes different than 0x90, and thus the runtime nop
> optimization won't be performed.

Oh, indeed. That puts under question the whole idea of using
.nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
to withdraw my ack.

> I also agree that using the default (not proving a second argument)
> seems like a better solution. Why would the toolstack switch to
> something that leads to worse performance? That would certainly be
> considered a bug.

Why? They may change it based on data available for newer /
older / whatever hardware. Any build-time choice is going to be
suboptimal somewhere, so I think we absolutely should not
bypass runtime replacing these NOPs, the more that now we
may have quite large sequences of them.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives
  2018-02-26 11:35 ` [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives Andrew Cooper
  2018-02-26 14:09   ` Roger Pau Monné
  2018-02-28 15:59   ` Jan Beulich
@ 2018-03-01 12:44   ` Wei Liu
  2 siblings, 0 replies; 26+ messages in thread
From: Wei Liu @ 2018-03-01 12:44 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, Jan Beulich, Xen-devel

On Mon, Feb 26, 2018 at 11:35:00AM +0000, Andrew Cooper wrote:
>  * On the C side, switch to using local lables rather than hardcoded numbers.
>  * Rename parameters and lables to be consistent with alt_instr names, and
>    consistent between the the C and asm versions.
>  * On the asm side, factor some expressions out into macros to aid clarity.
>  * Consistently declare section attributes.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-03-01 10:54           ` Jan Beulich
@ 2018-03-01 16:58             ` Andrew Cooper
  2018-03-02  7:10               ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2018-03-01 16:58 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné; +Cc: wei.liu2, xen-devel

On 01/03/18 10:54, Jan Beulich wrote:
>>>> On 01.03.18 at 11:36, <roger.pau@citrix.com> wrote:
>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>>>>> Andrew Cooper <andrew.cooper3@citrix.com> 02/28/18 7:20 PM >>>
>>>> On 28/02/18 16:22, Jan Beulich wrote:
>>>>>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
>>>>>> --- a/xen/include/asm-x86/alternative-asm.h
>>>>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>>>>> @@ -1,6 +1,8 @@
>>>>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>  
>>>>>> +#include <asm/nops.h>
>>>>>> +
>>>>>>  #ifdef __ASSEMBLY__
>>>>>>  
>>>>>>  /*
>>>>>> @@ -18,6 +20,14 @@
>>>>>>      .byte \pad_len
>>>>>>  .endm
>>>>>>  
>>>>>> +.macro mknops nr_bytes
>>>>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>>>>> +    .nop \nr_bytes, ASM_NOP_MAX
>>>>> And do you really need to specify ASM_NOP_MAX here? What's
>>>>> wrong with letting the assembler pick what it wants as the longest
>>>>> NOP?
>>>> I don't want a toolchain change to cause us to go beyond 11 byte nops,
>>>> because of the associated decode stall on almost all hardware.  Using
>>>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>>>> consistent, irrespective of toolchain support.
>>> I don't understand - an earlier patch takes care of runtime replacing them
>>> anyway. What stalls can then result?
>> The runtime replacement won't happen when using the .nops directive
>> AFAICT, because the original padding section will likely be filled
>> with opcodes different than 0x90, and thus the runtime nop
>> optimization won't be performed.
> Oh, indeed. That puts under question the whole idea of using
> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
> to withdraw my ack.
>
>> I also agree that using the default (not proving a second argument)
>> seems like a better solution. Why would the toolstack switch to
>> something that leads to worse performance? That would certainly be
>> considered a bug.
> Why? They may change it based on data available for newer /
> older / whatever hardware. Any build-time choice is going to be
> suboptimal somewhere, so I think we absolutely should not
> bypass runtime replacing these NOPs, the more that now we
> may have quite large sequences of them.

The pont of having the toolchain put out optimised nops is to avoid the
need for us to patch the site at all.  I.e. calling optimise_nops() on a
set of toolchain nops defeats the purpose in the overwhelming common
case of running on a system which prefers P6 nops.

The problem of working out when to optimise is that, when we come to
apply an individual alternative, we don't know if we've already patched
this site before.  Even the unoptimised algorithm has a corner case
which explodes, if there is a stream of 0x90's on the end of a
replacement e.g. in a imm or disp field.

Put simply, we cannot determine, by peeking at the patchsite, whether it
has been patched or not (other than keeping a full copy of the origin
site as a reference).  As soon as we chose to optimise the nops of the
origin site, we cannot determine anything at all.

Thinking out loud, we could perhaps have a section containing one byte
per origin site, which we use to track whether we've already optimised
the padding bytes, and whether the contents have been replaced.  This
would also add an extra long into struct altentry, but its all cold data
after boot.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-03-01 16:58             ` Andrew Cooper
@ 2018-03-02  7:10               ` Jan Beulich
  2018-03-02 19:34                 ` Andrew Cooper
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2018-03-02  7:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, wei.liu2, Roger Pau Monné

>>> On 01.03.18 at 17:58, <andrew.cooper3@citrix.com> wrote:
> On 01/03/18 10:54, Jan Beulich wrote:
>>>>> On 01.03.18 at 11:36, <roger.pau@citrix.com> wrote:
>>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>>>>>> Andrew Cooper <andrew.cooper3@citrix.com> 02/28/18 7:20 PM >>>
>>>>> On 28/02/18 16:22, Jan Beulich wrote:
>>>>>>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
>>>>>>> --- a/xen/include/asm-x86/alternative-asm.h
>>>>>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>>>>>> @@ -1,6 +1,8 @@
>>>>>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>  
>>>>>>> +#include <asm/nops.h>
>>>>>>> +
>>>>>>>  #ifdef __ASSEMBLY__
>>>>>>>  
>>>>>>>  /*
>>>>>>> @@ -18,6 +20,14 @@
>>>>>>>      .byte \pad_len
>>>>>>>  .endm
>>>>>>>  
>>>>>>> +.macro mknops nr_bytes
>>>>>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>>>>>> +    .nop \nr_bytes, ASM_NOP_MAX
>>>>>> And do you really need to specify ASM_NOP_MAX here? What's
>>>>>> wrong with letting the assembler pick what it wants as the longest
>>>>>> NOP?
>>>>> I don't want a toolchain change to cause us to go beyond 11 byte nops,
>>>>> because of the associated decode stall on almost all hardware.  Using
>>>>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>>>>> consistent, irrespective of toolchain support.
>>>> I don't understand - an earlier patch takes care of runtime replacing them
>>>> anyway. What stalls can then result?
>>> The runtime replacement won't happen when using the .nops directive
>>> AFAICT, because the original padding section will likely be filled
>>> with opcodes different than 0x90, and thus the runtime nop
>>> optimization won't be performed.
>> Oh, indeed. That puts under question the whole idea of using
>> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
>> to withdraw my ack.
>>
>>> I also agree that using the default (not proving a second argument)
>>> seems like a better solution. Why would the toolstack switch to
>>> something that leads to worse performance? That would certainly be
>>> considered a bug.
>> Why? They may change it based on data available for newer /
>> older / whatever hardware. Any build-time choice is going to be
>> suboptimal somewhere, so I think we absolutely should not
>> bypass runtime replacing these NOPs, the more that now we
>> may have quite large sequences of them.
> 
> The pont of having the toolchain put out optimised nops is to avoid the
> need for us to patch the site at all.  I.e. calling optimise_nops() on a
> set of toolchain nops defeats the purpose in the overwhelming common
> case of running on a system which prefers P6 nops.
> 
> The problem of working out when to optimise is that, when we come to
> apply an individual alternative, we don't know if we've already patched
> this site before.  Even the unoptimised algorithm has a corner case
> which explodes, if there is a stream of 0x90's on the end of a
> replacement e.g. in a imm or disp field.
> 
> Put simply, we cannot determine, by peeking at the patchsite, whether it
> has been patched or not (other than keeping a full copy of the origin
> site as a reference).  As soon as we chose to optimise the nops of the
> origin site, we cannot determine anything at all.
> 
> Thinking out loud, we could perhaps have a section containing one byte
> per origin site, which we use to track whether we've already optimised
> the padding bytes, and whether the contents have been replaced.  This
> would also add an extra long into struct altentry, but its all cold data
> after boot.

What about alternatively simply updating the struct alt_instr
instances to describe the code _after_ a patch that was applied?
That'll allow to always know how much padding there is.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-03-02  7:10               ` Jan Beulich
@ 2018-03-02 19:34                 ` Andrew Cooper
  2018-03-05  8:33                   ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2018-03-02 19:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, wei.liu2, Roger Pau Monné

On 02/03/18 07:10, Jan Beulich wrote:
>>>> On 01.03.18 at 17:58, <andrew.cooper3@citrix.com> wrote:
>> On 01/03/18 10:54, Jan Beulich wrote:
>>>>>> On 01.03.18 at 11:36, <roger.pau@citrix.com> wrote:
>>>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>>>>>>> Andrew Cooper <andrew.cooper3@citrix.com> 02/28/18 7:20 PM >>>
>>>>>> On 28/02/18 16:22, Jan Beulich wrote:
>>>>>>>>>> On 26.02.18 at 12:35, <andrew.cooper3@citrix.com> wrote:
>>>>>>>> --- a/xen/include/asm-x86/alternative-asm.h
>>>>>>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>>>>>>> @@ -1,6 +1,8 @@
>>>>>>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>>>>>>  
>>>>>>>> +#include <asm/nops.h>
>>>>>>>> +
>>>>>>>>  #ifdef __ASSEMBLY__
>>>>>>>>  
>>>>>>>>  /*
>>>>>>>> @@ -18,6 +20,14 @@
>>>>>>>>      .byte \pad_len
>>>>>>>>  .endm
>>>>>>>>  
>>>>>>>> +.macro mknops nr_bytes
>>>>>>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>>>>>>> +    .nop \nr_bytes, ASM_NOP_MAX
>>>>>>> And do you really need to specify ASM_NOP_MAX here? What's
>>>>>>> wrong with letting the assembler pick what it wants as the longest
>>>>>>> NOP?
>>>>>> I don't want a toolchain change to cause us to go beyond 11 byte nops,
>>>>>> because of the associated decode stall on almost all hardware.  Using
>>>>>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>>>>>> consistent, irrespective of toolchain support.
>>>>> I don't understand - an earlier patch takes care of runtime replacing them
>>>>> anyway. What stalls can then result?
>>>> The runtime replacement won't happen when using the .nops directive
>>>> AFAICT, because the original padding section will likely be filled
>>>> with opcodes different than 0x90, and thus the runtime nop
>>>> optimization won't be performed.
>>> Oh, indeed. That puts under question the whole idea of using
>>> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
>>> to withdraw my ack.
>>>
>>>> I also agree that using the default (not proving a second argument)
>>>> seems like a better solution. Why would the toolstack switch to
>>>> something that leads to worse performance? That would certainly be
>>>> considered a bug.
>>> Why? They may change it based on data available for newer /
>>> older / whatever hardware. Any build-time choice is going to be
>>> suboptimal somewhere, so I think we absolutely should not
>>> bypass runtime replacing these NOPs, the more that now we
>>> may have quite large sequences of them.
>> The pont of having the toolchain put out optimised nops is to avoid the
>> need for us to patch the site at all.  I.e. calling optimise_nops() on a
>> set of toolchain nops defeats the purpose in the overwhelming common
>> case of running on a system which prefers P6 nops.
>>
>> The problem of working out when to optimise is that, when we come to
>> apply an individual alternative, we don't know if we've already patched
>> this site before.  Even the unoptimised algorithm has a corner case
>> which explodes, if there is a stream of 0x90's on the end of a
>> replacement e.g. in a imm or disp field.
>>
>> Put simply, we cannot determine, by peeking at the patchsite, whether it
>> has been patched or not (other than keeping a full copy of the origin
>> site as a reference).  As soon as we chose to optimise the nops of the
>> origin site, we cannot determine anything at all.
>>
>> Thinking out loud, we could perhaps have a section containing one byte
>> per origin site, which we use to track whether we've already optimised
>> the padding bytes, and whether the contents have been replaced.  This
>> would also add an extra long into struct altentry, but its all cold data
>> after boot.
> What about alternatively simply updating the struct alt_instr
> instances to describe the code _after_ a patch that was applied?
> That'll allow to always know how much padding there is.

There are multiple alt_instr pointing to the same origin sites when
using ALTERNATIVE_2, meaning you keep all the safety problems with the
current setup.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 7/7] x86/build: Use new .nop directive when available
  2018-03-02 19:34                 ` Andrew Cooper
@ 2018-03-05  8:33                   ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2018-03-05  8:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, wei.liu2, Roger Pau Monné

>>> On 02.03.18 at 20:34, <andrew.cooper3@citrix.com> wrote:
> On 02/03/18 07:10, Jan Beulich wrote:
>>>>> On 01.03.18 at 17:58, <andrew.cooper3@citrix.com> wrote:
>>> The pont of having the toolchain put out optimised nops is to avoid the
>>> need for us to patch the site at all.  I.e. calling optimise_nops() on a
>>> set of toolchain nops defeats the purpose in the overwhelming common
>>> case of running on a system which prefers P6 nops.
>>>
>>> The problem of working out when to optimise is that, when we come to
>>> apply an individual alternative, we don't know if we've already patched
>>> this site before.  Even the unoptimised algorithm has a corner case
>>> which explodes, if there is a stream of 0x90's on the end of a
>>> replacement e.g. in a imm or disp field.
>>>
>>> Put simply, we cannot determine, by peeking at the patchsite, whether it
>>> has been patched or not (other than keeping a full copy of the origin
>>> site as a reference).  As soon as we chose to optimise the nops of the
>>> origin site, we cannot determine anything at all.
>>>
>>> Thinking out loud, we could perhaps have a section containing one byte
>>> per origin site, which we use to track whether we've already optimised
>>> the padding bytes, and whether the contents have been replaced.  This
>>> would also add an extra long into struct altentry, but its all cold data
>>> after boot.
>> What about alternatively simply updating the struct alt_instr
>> instances to describe the code _after_ a patch that was applied?
>> That'll allow to always know how much padding there is.
> 
> There are multiple alt_instr pointing to the same origin sites when
> using ALTERNATIVE_2, meaning you keep all the safety problems with the
> current setup.

Oh, right, it was rubbish what I said - we don't ever look a 2nd time
at any particular struct alt_instr instance. We'd have to settle on
multiple changes to the same patch site to always be adjacent, such
that apply_alternatives() could have local state forwarding (between
loop iterations) added.

For the approach you describe I can't see how you would safely
glue together multiple patches to the same origin site, unless we
would outright forbid any use of lower level infrastructure than
ALTERNATIVE_2(). But if we demand that, we could easily flag
the first of the alternatives with a "continued" indicator, used to
trigger state forwarding in apply_alternatives() (perhaps no
more state than "avoid the NOP optimization if the controlling
feature is unavailable"). No need for an extra array and pointer
then.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-03-05  8:33 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-26 11:34 [PATCH RESEND v2 0/7] x86/alternatives: Support for automatic padding calculations Andrew Cooper
2018-02-26 11:34 ` [PATCH v2 1/7] x86/alt: Drop unused alternative infrastructure Andrew Cooper
2018-02-26 11:34 ` [PATCH v2 2/7] x86/alt: Clean up struct alt_instr and its users Andrew Cooper
2018-02-26 11:35 ` [PATCH v2 3/7] x86/alt: Clean up the assembly used to generate alternatives Andrew Cooper
2018-02-26 14:09   ` Roger Pau Monné
2018-02-28 15:59   ` Jan Beulich
2018-03-01 12:44   ` Wei Liu
2018-02-26 11:35 ` [PATCH v2 4/7] x86/asm: Remove opencoded uses of altinstruction_entry Andrew Cooper
2018-02-26 11:35 ` [PATCH v2 5/7] x86/alt: Support for automatic padding calculations Andrew Cooper
2018-02-28 16:16   ` Jan Beulich
2018-02-28 17:26     ` Andrew Cooper
2018-03-01  7:26       ` Jan Beulich
2018-02-26 11:35 ` [PATCH v2 6/7] x86/alt: Drop explicit padding of origin sites Andrew Cooper
2018-02-26 11:35 ` [PATCH v2 7/7] x86/build: Use new .nop directive when available Andrew Cooper
2018-02-26 12:31   ` Roger Pau Monné
2018-02-26 13:08     ` Andrew Cooper
2018-02-26 14:27       ` Roger Pau Monné
2018-02-28 16:22   ` Jan Beulich
2018-02-28 17:45     ` Andrew Cooper
2018-03-01  7:28       ` Jan Beulich
2018-03-01 10:36         ` Roger Pau Monné
2018-03-01 10:54           ` Jan Beulich
2018-03-01 16:58             ` Andrew Cooper
2018-03-02  7:10               ` Jan Beulich
2018-03-02 19:34                 ` Andrew Cooper
2018-03-05  8:33                   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.