* [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs
@ 2015-02-03 18:16 Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 01/12] x86, copy_user: Remove FIX_ALIGNMENT define Borislav Petkov
                   ` (12 more replies)
  0 siblings, 13 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

  [ Changelog is in version-increasing order so that one can follow the
    evolution of the patch set in a more natural way (i.e., the latest
    version comes at the end). ]

v0:

this is something which hpa and I talked about recently: the ability for
the alternatives code to add padding to the original instruction in case
the replacement is longer, and also to be able to simply write "jmp" and
not care which exact JMP the compiler generates or whether the relative
offsets are correct.

So this is a stab at it. It seems to boot in kvm here, but it needs more
staring to make sure we're actually generating the proper code at all
times.

Thus the RFC tag, comments/suggestions are welcome.

v1:

This is the first version which passes testing on the AMD/Intel,
32/64-bit boxes I have here. For more info on what it does, you can boot
with "debug-alternative" to see verbose information about what gets
changed into what.

Patches 1 and 2 are cleanups.

Patch 3 adds the padding at build time and patch 4 simplifies using JMPs
in alternatives so that, as a user of the alternatives facilities, one
doesn't have to do crazy math with labels anymore.
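
To illustrate (a sketch lifted from the copy_page conversion in patch
6), a converted callsite then is simply:

    ALTERNATIVE "jmp copy_page_regs", "", X86_FEATURE_REP_GOOD

and the macro takes care of sizes, labels and the section bookkeeping.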

Patch 5 optimizes the single-byte NOPs we're adding at build time into
longer NOPs which should flow more easily through the CPU frontend.

Patches 6-12 then convert most of the alternative callsites to the
generic macros and kill the homegrown fun.

As always, constructive comments/suggestions are welcome.

Borislav Petkov (12):
  x86, copy_user: Remove FIX_ALIGNMENT define
  x86, alternatives: Cleanup DPRINTK macro
  x86, alternatives: Add instruction padding
  x86, alternatives: Make JMPs more robust
  x86, alternatives: Use optimized NOPs for padding
  x86, copy_page_64.S: Use generic ALTERNATIVE macro
  x86, copy_user_64.S: Convert to ALTERNATIVE_2
  x86, SMAP: Use ALTERNATIVE macro
  x86, alternative: Convert X86_INVD_BUG to generic macro
  x86, alternatives: Convert clear_page_64.S
  x86, alternative: Use alternative_2 in rdtsc_barrier
  x86, alternative: Cleanup prefetch primitives

 arch/x86/include/asm/alternative-asm.h |  40 ++++++++
 arch/x86/include/asm/alternative.h     |  58 +++++++-----
 arch/x86/include/asm/apic.h            |   2 +-
 arch/x86/include/asm/barrier.h         |   6 +-
 arch/x86/include/asm/cpufeature.h      |  23 ++---
 arch/x86/include/asm/processor.h       |  16 ++--
 arch/x86/include/asm/smap.h            |  30 ++----
 arch/x86/kernel/alternative.c          | 164 ++++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/amd.c              |   5 +
 arch/x86/kernel/entry_32.S             |  12 +--
 arch/x86/lib/clear_page_64.S           |  66 ++++++-------
 arch/x86/lib/copy_page_64.S            |  37 +++-----
 arch/x86/lib/copy_user_64.S            |  46 ++-------
 arch/x86/um/asm/barrier.h              |   4 +-
 14 files changed, 305 insertions(+), 204 deletions(-)

-- 
2.2.0.33.gc18b867


* [PATCH v1 01/12] x86, copy_user: Remove FIX_ALIGNMENT define
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 02/12] x86, alternatives: Cleanup DPRINTK macro Borislav Petkov
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

It is superfluous now so remove it. No object file change before and
after.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/lib/copy_user_64.S | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index dee945d55594..1530ec2c1b12 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -8,9 +8,6 @@
 
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
-
-#define FIX_ALIGNMENT 1
-
 #include <asm/current.h>
 #include <asm/asm-offsets.h>
 #include <asm/thread_info.h>
@@ -45,7 +42,6 @@
 	.endm
 
 	.macro ALIGN_DESTINATION
-#ifdef FIX_ALIGNMENT
 	/* check for bad alignment of destination */
 	movl %edi,%ecx
 	andl $7,%ecx
@@ -67,7 +63,6 @@
 
 	_ASM_EXTABLE(100b,103b)
 	_ASM_EXTABLE(101b,103b)
-#endif
 	.endm
 
 /* Standard copy_to_user with segment limit checking */
-- 
2.2.0.33.gc18b867


* [PATCH v1 02/12] x86, alternatives: Cleanup DPRINTK macro
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 01/12] x86, copy_user: Remove FIX_ALIGNMENT define Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 19:01   ` Joe Perches
  2015-02-03 18:16 ` [PATCH v1 03/12] x86, alternatives: Add instruction padding Borislav Petkov
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Make it pass __func__ implicitly. Also, dump info about each replacement
we're doing. Fix up comments and style while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/alternative.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 703130f469ec..1e86e85bcf58 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -52,10 +52,10 @@ static int __init setup_noreplace_paravirt(char *str)
 __setup("noreplace-paravirt", setup_noreplace_paravirt);
 #endif
 
-#define DPRINTK(fmt, ...)				\
-do {							\
-	if (debug_alternative)				\
-		printk(KERN_DEBUG fmt, ##__VA_ARGS__);	\
+#define DPRINTK(fmt, args...)						\
+do {									\
+	if (debug_alternative)						\
+		printk(KERN_DEBUG "%s: " fmt "\n", __func__, ##args);	\
 } while (0)
 
 /*
@@ -243,12 +243,13 @@ extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
 void *text_poke_early(void *addr, const void *opcode, size_t len);
 
-/* Replace instructions with better alternatives for this CPU type.
-   This runs before SMP is initialized to avoid SMP problems with
-   self modifying code. This implies that asymmetric systems where
-   APs have less capabilities than the boot processor are not handled.
-   Tough. Make sure you disable such features by hand. */
-
+/*
+ * Replace instructions with better alternatives for this CPU type. This runs
+ * before SMP is initialized to avoid SMP problems with self modifying code.
+ * This implies that asymmetric systems where APs have less capabilities than
+ * the boot processor are not handled. Tough. Make sure you disable such
+ * features by hand.
+ */
 void __init_or_module apply_alternatives(struct alt_instr *start,
 					 struct alt_instr *end)
 {
@@ -256,10 +257,10 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 	u8 *instr, *replacement;
 	u8 insnbuf[MAX_PATCH_LEN];
 
-	DPRINTK("%s: alt table %p -> %p\n", __func__, start, end);
+	DPRINTK("alt table %p -> %p", start, end);
 	/*
 	 * The scan order should be from start to end. A later scanned
-	 * alternative code can overwrite a previous scanned alternative code.
+	 * alternative code can overwrite previously scanned alternative code.
 	 * Some kernel functions (e.g. memcpy, memset, etc) use this order to
 	 * patch code.
 	 *
@@ -275,11 +276,19 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 		if (!boot_cpu_has(a->cpuid))
 			continue;
 
+		DPRINTK("feat: %d*32+%d, old: (%p, len: %d), repl: (%p, len: %d)",
+			a->cpuid >> 5,
+			a->cpuid & 0x1f,
+			instr, a->instrlen,
+			replacement, a->replacementlen);
+
 		memcpy(insnbuf, replacement, a->replacementlen);
 
 		/* 0xe8 is a relative jump; fix the offset. */
-		if (*insnbuf == 0xe8 && a->replacementlen == 5)
-		    *(s32 *)(insnbuf + 1) += replacement - instr;
+		if (*insnbuf == 0xe8 && a->replacementlen == 5) {
+			*(s32 *)(insnbuf + 1) += replacement - instr;
+			DPRINTK("Fix CALL offset: 0x%x", *(s32 *)(insnbuf + 1));
+		}
 
 		add_nops(insnbuf + a->replacementlen,
 			 a->instrlen - a->replacementlen);
@@ -371,8 +380,8 @@ void __init_or_module alternatives_smp_module_add(struct module *mod,
 	smp->locks_end	= locks_end;
 	smp->text	= text;
 	smp->text_end	= text_end;
-	DPRINTK("%s: locks %p -> %p, text %p -> %p, name %s\n",
-		__func__, smp->locks, smp->locks_end,
+	DPRINTK("locks %p -> %p, text %p -> %p, name %s\n",
+		smp->locks, smp->locks_end,
 		smp->text, smp->text_end, smp->name);
 
 	list_add_tail(&smp->next, &smp_alt_modules);
-- 
2.2.0.33.gc18b867


* [PATCH v1 03/12] x86, alternatives: Add instruction padding
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 01/12] x86, copy_user: Remove FIX_ALIGNMENT define Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 02/12] x86, alternatives: Cleanup DPRINTK macro Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 04/12] x86, alternatives: Make JMPs more robust Borislav Petkov
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Up until now we have always had to pay attention to make sure that the
length of the new instruction replacing the old one is less than or
equal to the length of the old instruction. If the new instruction were
longer, it would overwrite the beginning of the next instruction in the
kernel image at patching time and cause your pants to catch fire.

So instead of having to pay attention, teach the alternatives framework
to pad shorter old instructions with NOPs at build time - but only in
the case when

len(old instruction(s)) < len(new instruction(s))

and to add nothing in the >= case. (In that case we still do add_nops()
at patching time.)

This way the alternatives user doesn't have to care about instruction
sizes and can simply use the macros.

Add asm ALTERNATIVE* flavor macros too, while at it.
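
To illustrate the build-time padding with made-up sizes: if the old
instruction is a 2-byte short JMP and the replacement is a 5-byte near
JMP, the .skip expression in the asm macro,

    .skip -(((144f-143f)-(141b-140b)) > 0) * ((144f-143f)-(141b-140b)),0x90

evaluates to .skip 3,0x90 (gas comparison operators yield -1 for true,
hence the leading minus sign) and three 0x90 bytes are emitted right
after the old JMP. If the replacement is not longer, the expression
evaluates to 0 and nothing is added.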

Thanks to Michael Matz for the great help with toolchain questions.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/alternative-asm.h | 40 +++++++++++++++++++++++
 arch/x86/include/asm/alternative.h     | 58 ++++++++++++++++++++--------------
 arch/x86/include/asm/cpufeature.h      | 15 +++++----
 arch/x86/kernel/alternative.c          |  6 ++--
 4 files changed, 87 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index 372231c22a47..327d2c58cda5 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -26,6 +26,46 @@
 	.byte \alt_len
 .endm
 
+.macro ALTERNATIVE oldinstr, newinstr, feature
+140:
+	\oldinstr
+141:
+	.skip -(((144f-143f)-(141b-140b)) > 0) * ((144f-143f)-(141b-140b)),0x90
+142:
+
+	.pushsection .altinstructions,"a"
+	altinstruction_entry 140b,143f,\feature,142b-140b,144f-143f
+	.popsection
+
+	.pushsection .altinstr_replacement,"ax"
+143:
+	\newinstr
+144:
+	.popsection
+.endm
+
+.macro ALTERNATIVE_2 oldinstr, newinstr1, feature1, newinstr2, feature2
+140:
+	\oldinstr
+141:
+	.skip -(((144f-143f)-(141b-140b)) > 0) * ((144f-143f)-(141b-140b)),0x90
+	.skip -(((145f-144f)-(144f-143f)-(141b-140b)) > 0) * ((145f-144f)-(144f-143f)-(141b-140b)),0x90
+142:
+
+	.pushsection .altinstructions,"a"
+	altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f
+	altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f
+	.popsection
+
+	.pushsection .altinstr_replacement,"ax"
+143:
+	\newinstr1
+144:
+	\newinstr2
+145:
+	.popsection
+.endm
+
 #endif  /*  __ASSEMBLY__  */
 
 #endif /* _ASM_X86_ALTERNATIVE_ASM_H */
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 473bdbee378a..0c4e7617a375 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -76,50 +76,59 @@ static inline int alternatives_text_reserved(void *start, void *end)
 }
 #endif	/* CONFIG_SMP */
 
-#define OLDINSTR(oldinstr)	"661:\n\t" oldinstr "\n662:\n"
+#define b_replacement(num)	"664"#num
+#define e_replacement(num)	"665"#num
 
-#define b_replacement(number)	"663"#number
-#define e_replacement(number)	"664"#number
+#define alt_end_marker		"663"
+#define alt_slen		"662b-661b"
+#define alt_total_slen		alt_end_marker"b-661b"
+#define alt_rlen(num)		e_replacement(num)"f-"b_replacement(num)"f"
 
-#define alt_slen "662b-661b"
-#define alt_rlen(number) e_replacement(number)"f-"b_replacement(number)"f"
+#define __OLDINSTR(oldinstr, num)					\
+	"661:\n\t" oldinstr "\n662:\n"					\
+	".skip -(((" alt_rlen(num) ")-(" alt_slen ")) > 0) * "		\
+		"((" alt_rlen(num) ")-(" alt_slen ")),0x90\n"
 
-#define ALTINSTR_ENTRY(feature, number)					      \
+#define OLDINSTR(oldinstr, num)						\
+	__OLDINSTR(oldinstr, num)					\
+	alt_end_marker ":\n"
+
+/*
+ * Pad the second replacement alternative with additional NOPs if it is
+ * additionally longer than the first replacement alternative.
+ */
+#define OLDINSTR_2(oldinstr, num1, num2)					\
+	__OLDINSTR(oldinstr, num1)						\
+	".skip -(((" alt_rlen(num2) ")-(" alt_rlen(num1) ")-(662b-661b)) > 0) * " \
+		"((" alt_rlen(num2) ")-(" alt_rlen(num1) ")-(662b-661b)),0x90\n"  \
+	alt_end_marker ":\n"
+
+#define ALTINSTR_ENTRY(feature, num)					      \
 	" .long 661b - .\n"				/* label           */ \
-	" .long " b_replacement(number)"f - .\n"	/* new instruction */ \
+	" .long " b_replacement(num)"f - .\n"		/* new instruction */ \
 	" .word " __stringify(feature) "\n"		/* feature bit     */ \
-	" .byte " alt_slen "\n"				/* source len      */ \
-	" .byte " alt_rlen(number) "\n"			/* replacement len */
+	" .byte " alt_total_slen "\n"			/* source len      */ \
+	" .byte " alt_rlen(num) "\n"			/* replacement len */
 
-#define DISCARD_ENTRY(number)				/* rlen <= slen */    \
-	" .byte 0xff + (" alt_rlen(number) ") - (" alt_slen ")\n"
-
-#define ALTINSTR_REPLACEMENT(newinstr, feature, number)	/* replacement */     \
-	b_replacement(number)":\n\t" newinstr "\n" e_replacement(number) ":\n\t"
+#define ALTINSTR_REPLACEMENT(newinstr, feature, num)	/* replacement */     \
+	b_replacement(num)":\n\t" newinstr "\n" e_replacement(num) ":\n\t"
 
 /* alternative assembly primitive: */
 #define ALTERNATIVE(oldinstr, newinstr, feature)			\
-	OLDINSTR(oldinstr)						\
+	OLDINSTR(oldinstr, 1)						\
 	".pushsection .altinstructions,\"a\"\n"				\
 	ALTINSTR_ENTRY(feature, 1)					\
 	".popsection\n"							\
-	".pushsection .discard,\"aw\",@progbits\n"			\
-	DISCARD_ENTRY(1)						\
-	".popsection\n"							\
 	".pushsection .altinstr_replacement, \"ax\"\n"			\
 	ALTINSTR_REPLACEMENT(newinstr, feature, 1)			\
 	".popsection"
 
 #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
-	OLDINSTR(oldinstr)						\
+	OLDINSTR_2(oldinstr, 1, 2)					\
 	".pushsection .altinstructions,\"a\"\n"				\
 	ALTINSTR_ENTRY(feature1, 1)					\
 	ALTINSTR_ENTRY(feature2, 2)					\
 	".popsection\n"							\
-	".pushsection .discard,\"aw\",@progbits\n"			\
-	DISCARD_ENTRY(1)						\
-	DISCARD_ENTRY(2)						\
-	".popsection\n"							\
 	".pushsection .altinstr_replacement, \"ax\"\n"			\
 	ALTINSTR_REPLACEMENT(newinstr1, feature1, 1)			\
 	ALTINSTR_REPLACEMENT(newinstr2, feature2, 2)			\
@@ -146,6 +155,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
 #define alternative(oldinstr, newinstr, feature)			\
 	asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory")
 
+#define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
+	asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")
+
 /*
  * Alternative inline assembly with input.
  *
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index aede2c347bde..5d491248c78d 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -489,22 +489,25 @@ static __always_inline __pure bool _static_cpu_has_safe(u16 bit)
  */
 		asm_volatile_goto("1: .byte 0xe9\n .long %l[t_dynamic] - 2f\n"
 			 "2:\n"
+			 ".skip -(((5f-4f) - (2b-1b)) > 0) * "
+			         "((5f-4f) - (2b-1b)),0x90\n"
+			 "3:\n"
 			 ".section .altinstructions,\"a\"\n"
 			 " .long 1b - .\n"		/* src offset */
-			 " .long 3f - .\n"		/* repl offset */
+			 " .long 4f - .\n"		/* repl offset */
 			 " .word %P1\n"			/* always replace */
-			 " .byte 2b - 1b\n"		/* src len */
-			 " .byte 4f - 3f\n"		/* repl len */
+			 " .byte 3b - 1b\n"		/* src len */
+			 " .byte 5f - 4f\n"		/* repl len */
 			 ".previous\n"
 			 ".section .altinstr_replacement,\"ax\"\n"
-			 "3: .byte 0xe9\n .long %l[t_no] - 2b\n"
-			 "4:\n"
+			 "4: .byte 0xe9\n .long %l[t_no] - 2b\n"
+			 "5:\n"
 			 ".previous\n"
 			 ".section .altinstructions,\"a\"\n"
 			 " .long 1b - .\n"		/* src offset */
 			 " .long 0\n"			/* no replacement */
 			 " .word %P0\n"			/* feature bit */
-			 " .byte 2b - 1b\n"		/* src len */
+			 " .byte 3b - 1b\n"		/* src len */
 			 " .byte 0\n"			/* repl len */
 			 ".previous\n"
 			 : : "i" (bit), "i" (X86_FEATURE_ALWAYS)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 1e86e85bcf58..c99b0f13a90e 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -270,7 +270,6 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 	for (a = start; a < end; a++) {
 		instr = (u8 *)&a->instr_offset + a->instr_offset;
 		replacement = (u8 *)&a->repl_offset + a->repl_offset;
-		BUG_ON(a->replacementlen > a->instrlen);
 		BUG_ON(a->instrlen > sizeof(insnbuf));
 		BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
 		if (!boot_cpu_has(a->cpuid))
@@ -290,8 +289,9 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 			DPRINTK("Fix CALL offset: 0x%x", *(s32 *)(insnbuf + 1));
 		}
 
-		add_nops(insnbuf + a->replacementlen,
-			 a->instrlen - a->replacementlen);
+		if (a->instrlen > a->replacementlen)
+			add_nops(insnbuf + a->replacementlen,
+				 a->instrlen - a->replacementlen);
 
 		text_poke_early(instr, insnbuf, a->instrlen);
 	}
-- 
2.2.0.33.gc18b867


* [PATCH v1 04/12] x86, alternatives: Make JMPs more robust
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (2 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 03/12] x86, alternatives: Add instruction padding Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 05/12] x86, alternatives: Use optimized NOPs for padding Borislav Petkov
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Up until now we had to pay attention, when using relative JMPs in
alternatives, to how their relative offset gets computed so that the
jump target is still correct. Or, as is the case for near CALLs
(opcode e8), we still have to go and readjust the offset at patching
time.

What is more, the static_cpu_has_safe() facility had to forcefully
generate 5-byte JMPs since we couldn't rely on the compiler to generate
properly sized ones. Worse, it would sometimes generate a replacement
JMP which is longer than the original one, thus overwriting the
beginning of the next instruction at patching time.

So, in order to alleviate all that and make using JMPs more
straightforward, we pad the original instruction in an alternative block
with NOPs at build time, should the replacement(s) be longer. This way,
alternatives users don't need to pay special attention to keeping the
original and replacement instruction sizes in sync - the assembler
simply adds padding where needed and does nothing otherwise.

As a second aspect, we recompute JMPs at patching time so that we can
shorten 5-byte JMPs into two-byte ones where possible. If that is not
possible, we still have to recompute the offsets, as the replacement JMP
sits far away in the .altinstr_replacement section and copying it
verbatim would yield a wrong offset.

For example, on a locally generated kernel image

old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2
__switch_to:
 ffffffff810014bd:      eb 21                   jmp ffffffff810014e0
repl insn: size: 5
ffffffff81d0b23c:       e9 b1 62 2f ff          jmpq ffffffff810014f2

gets corrected to a 2-byte JMP:

apply_alternatives: feat: 3*32+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5)
alt_insn: e9 b1 62 2f ff
recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2
converted to: eb 33 90 90 90

and a 5-byte JMP:

old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2
__switch_to:
 ffffffff81001516:      eb 30                   jmp ffffffff81001548
repl insn: size: 5
 ffffffff81d0b241:      e9 10 63 2f ff          jmpq ffffffff81001556

gets shortened into a two-byte one:

apply_alternatives: feat: 3*32+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5)
alt_insn: e9 10 63 2f ff
recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2
converted to: eb 3e 90 90 90

... and so on.
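
To double-check the first example by hand: the target is at
0xffffffff810014f2 and the patch site at 0xffffffff810014bd, i.e. a
forward displacement of 0x35. Since the displacement of the resulting
2-byte JMP is taken relative to the end of that JMP, the encoded value
is 0x35 - 2 = 0x33 - exactly the "eb 33" above.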

This leads to a net win of around

40ish replacements * 3 bytes savings =~ 120 bytes of I$

on an AMD guest, which means some savings of precious instruction cache
bandwidth. The padding after the shortened 2-byte JMPs consists of
single-byte NOPs which smart microarchitectures can discard at decode
time, thus freeing up execution bandwidth.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/cpufeature.h |  10 +---
 arch/x86/kernel/alternative.c     | 103 ++++++++++++++++++++++++++++++++++++--
 arch/x86/lib/copy_user_64.S       |  11 ++--
 3 files changed, 105 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 5d491248c78d..bb0d14edc32e 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -481,13 +481,7 @@ static __always_inline __pure bool __static_cpu_has(u16 bit)
 static __always_inline __pure bool _static_cpu_has_safe(u16 bit)
 {
 #ifdef CC_HAVE_ASM_GOTO
-/*
- * We need to spell the jumps to the compiler because, depending on the offset,
- * the replacement jump can be bigger than the original jump, and this we cannot
- * have. Thus, we force the jump to the widest, 4-byte, signed relative
- * offset even though the last would often fit in less bytes.
- */
-		asm_volatile_goto("1: .byte 0xe9\n .long %l[t_dynamic] - 2f\n"
+		asm_volatile_goto("1: jmp %l[t_dynamic]\n"
 			 "2:\n"
 			 ".skip -(((5f-4f) - (2b-1b)) > 0) * "
 			         "((5f-4f) - (2b-1b)),0x90\n"
@@ -500,7 +494,7 @@ static __always_inline __pure bool _static_cpu_has_safe(u16 bit)
 			 " .byte 5f - 4f\n"		/* repl len */
 			 ".previous\n"
 			 ".section .altinstr_replacement,\"ax\"\n"
-			 "4: .byte 0xe9\n .long %l[t_no] - 2b\n"
+			 "4: jmp %l[t_no]\n"
 			 "5:\n"
 			 ".previous\n"
 			 ".section .altinstructions,\"a\"\n"
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c99b0f13a90e..715af37bf008 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -58,6 +58,21 @@ do {									\
 		printk(KERN_DEBUG "%s: " fmt "\n", __func__, ##args);	\
 } while (0)
 
+#define DUMP_BYTES(buf, len, fmt, args...)				\
+do {									\
+	if (unlikely(debug_alternative)) {				\
+		int j;							\
+									\
+		if (!(len))						\
+			break;						\
+									\
+		printk(KERN_DEBUG fmt, ##args);				\
+		for (j = 0; j < (len) - 1; j++)				\
+			printk(KERN_CONT "%02hhx ", buf[j]);		\
+		printk(KERN_CONT "%02hhx\n", buf[j]);			\
+	}								\
+} while (0)
+
 /*
  * Each GENERIC_NOPX is of X bytes, and defined as an array of bytes
  * that correspond to that nop. Getting from one nop to the next, we
@@ -244,6 +259,71 @@ extern s32 __smp_locks[], __smp_locks_end[];
 void *text_poke_early(void *addr, const void *opcode, size_t len);
 
 /*
+ * Are we looking at a near JMP with a 1 or 4-byte displacement.
+ */
+static inline bool is_jmp(const u8 opcode)
+{
+	return opcode == 0xeb || opcode == 0xe9;
+}
+
+static void __init_or_module
+recompute_jump(struct alt_instr *a, u8 *orig_insn, u8 *repl_insn, u8 *insnbuf)
+{
+	u8 *next_rip, *tgt_rip;
+	s32 n_dspl, o_dspl;
+	int repl_len;
+
+	if (a->replacementlen != 5)
+		return;
+
+	o_dspl = *(s32 *)(insnbuf + 1);
+
+	/* next_rip of the replacement JMP */
+	next_rip = repl_insn + a->replacementlen;
+	/* target rip of the replacement JMP */
+	tgt_rip  = next_rip + o_dspl;
+	n_dspl = tgt_rip - orig_insn;
+
+	DPRINTK("target RIP: %p, new_displ: 0x%x", tgt_rip, n_dspl);
+
+	if (tgt_rip - orig_insn >= 0) {
+		if (n_dspl - 2 <= 127)
+			goto two_byte_jmp;
+		else
+			goto five_byte_jmp;
+	/* negative offset */
+	} else {
+		if (((n_dspl - 2) & 0xff) == (n_dspl - 2))
+			goto two_byte_jmp;
+		else
+			goto five_byte_jmp;
+	}
+
+two_byte_jmp:
+	n_dspl -= 2;
+
+	insnbuf[0] = 0xeb;
+	insnbuf[1] = (s8)n_dspl;
+	add_nops(insnbuf + 2, 3);
+
+	repl_len = 2;
+	goto done;
+
+five_byte_jmp:
+	n_dspl -= 5;
+
+	insnbuf[0] = 0xe9;
+	*(s32 *)&insnbuf[1] = n_dspl;
+
+	repl_len = 5;
+
+done:
+
+	DPRINTK("final displ: 0x%08x, JMP 0x%lx",
+		n_dspl, (unsigned long)orig_insn + n_dspl + repl_len);
+}
+
+/*
  * Replace instructions with better alternatives for this CPU type. This runs
  * before SMP is initialized to avoid SMP problems with self modifying code.
  * This implies that asymmetric systems where APs have less capabilities than
@@ -268,6 +348,8 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 	 * order.
 	 */
 	for (a = start; a < end; a++) {
+		int insnbuf_sz = 0;
+
 		instr = (u8 *)&a->instr_offset + a->instr_offset;
 		replacement = (u8 *)&a->repl_offset + a->repl_offset;
 		BUG_ON(a->instrlen > sizeof(insnbuf));
@@ -281,24 +363,35 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 			instr, a->instrlen,
 			replacement, a->replacementlen);
 
+		DUMP_BYTES(instr, a->instrlen, "%p: old_insn: ", instr);
+		DUMP_BYTES(replacement, a->replacementlen, "%p: rpl_insn: ", replacement);
+
 		memcpy(insnbuf, replacement, a->replacementlen);
+		insnbuf_sz = a->replacementlen;
 
 		/* 0xe8 is a relative jump; fix the offset. */
 		if (*insnbuf == 0xe8 && a->replacementlen == 5) {
 			*(s32 *)(insnbuf + 1) += replacement - instr;
-			DPRINTK("Fix CALL offset: 0x%x", *(s32 *)(insnbuf + 1));
+			DPRINTK("Fix CALL offset: 0x%x, CALL 0x%lx",
+				*(s32 *)(insnbuf + 1),
+				(unsigned long)instr + *(s32 *)(insnbuf + 1) + 5);
 		}
 
-		if (a->instrlen > a->replacementlen)
+		if (a->replacementlen && is_jmp(replacement[0]))
+			recompute_jump(a, instr, replacement, insnbuf);
+
+		if (a->instrlen > a->replacementlen) {
 			add_nops(insnbuf + a->replacementlen,
 				 a->instrlen - a->replacementlen);
+			insnbuf_sz += a->instrlen - a->replacementlen;
+		}
+		DUMP_BYTES(insnbuf, insnbuf_sz, "%p: final_insn: ", instr);
 
-		text_poke_early(instr, insnbuf, a->instrlen);
+		text_poke_early(instr, insnbuf, insnbuf_sz);
 	}
 }
 
 #ifdef CONFIG_SMP
-
 static void alternatives_smp_lock(const s32 *start, const s32 *end,
 				  u8 *text, u8 *text_end)
 {
@@ -449,7 +542,7 @@ int alternatives_text_reserved(void *start, void *end)
 
 	return 0;
 }
-#endif
+#endif /* CONFIG_SMP */
 
 #ifdef CONFIG_PARAVIRT
 void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 1530ec2c1b12..4e6121ab5456 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -25,14 +25,13 @@
  */
 	.macro ALTERNATIVE_JUMP feature1,feature2,orig,alt1,alt2
 0:
-	.byte 0xe9	/* 32bit jump */
-	.long \orig-1f	/* by default jump to orig */
+	jmp \orig
 1:
 	.section .altinstr_replacement,"ax"
-2:	.byte 0xe9			/* near jump with 32bit immediate */
-	.long \alt1-1b /* offset */   /* or alternatively to alt1 */
-3:	.byte 0xe9			/* near jump with 32bit immediate */
-	.long \alt2-1b /* offset */   /* or alternatively to alt2 */
+2:
+	jmp \alt1
+3:
+	jmp \alt2
 	.previous
 
 	.section .altinstructions,"a"
-- 
2.2.0.33.gc18b867


* [PATCH v1 05/12] x86, alternatives: Use optimized NOPs for padding
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (3 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 04/12] x86, alternatives: Make JMPs more robust Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 19:36   ` Andy Lutomirski
  2015-02-03 18:16 ` [PATCH v1 06/12] x86, copy_page_64.S: Use generic ALTERNATIVE macro Borislav Petkov
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Alternatives now allow for an empty old instruction. In that case we pad
the space with NOPs at assembly time. However, there are optimal, longer
NOPs which should be used instead. Do that at patching time.
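
For instance (assuming the CPU selects the P6 NOP table - a made-up
example), a 5-byte pad of single-byte NOPs

    90 90 90 90 90

would get rewritten at patching time into one long NOP:

    0f 1f 44 00 00    nopl 0x0(%rax,%rax,1)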

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/alternative.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 715af37bf008..dd0cdb6b179c 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -323,6 +323,21 @@ done:
 		n_dspl, (unsigned long)orig_insn + n_dspl + repl_len);
 }
 
+static void __init_or_module optimize_nops(u8 *instr, u8 max_len)
+{
+	int i = 0;
+
+	while (instr[i] == 0x90 && i < max_len)
+		i++;
+
+	if (!i)
+		return;
+
+	add_nops(instr, i);
+
+	DUMP_BYTES(instr, i, "%p: optimized NOPs: ", instr);
+}
+
 /*
  * Replace instructions with better alternatives for this CPU type. This runs
  * before SMP is initialized to avoid SMP problems with self modifying code.
@@ -354,8 +369,11 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 		replacement = (u8 *)&a->repl_offset + a->repl_offset;
 		BUG_ON(a->instrlen > sizeof(insnbuf));
 		BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
-		if (!boot_cpu_has(a->cpuid))
+		if (!boot_cpu_has(a->cpuid)) {
+			if (instr[0] == 0x90)
+				optimize_nops(instr, a->instrlen);
 			continue;
+		}
 
 		DPRINTK("feat: %d*32+%d, old: (%p, len: %d), repl: (%p, len: %d)",
 			a->cpuid >> 5,
-- 
2.2.0.33.gc18b867


* [PATCH v1 06/12] x86, copy_page_64.S: Use generic ALTERNATIVE macro
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (4 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 05/12] x86, alternatives: Use optimized NOPs for padding Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 07/12] x86, copy_user_64.S: Convert to ALTERNATIVE_2 Borislav Petkov
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

... instead of the semi-version with the spelled out sections.

What is more, make the REP_GOOD version be the default copy_page()
version as the majority of the relevant x86 CPUs do set
X86_FEATURE_REP_GOOD. Thus, copy_page gets compiled to:

ffffffff8130af80 <copy_page>:
ffffffff8130af80:       e9 0b 00 00 00          jmpq   ffffffff8130af90 <copy_page_regs>
ffffffff8130af85:       b9 00 02 00 00          mov    $0x200,%ecx
ffffffff8130af8a:       f3 48 a5                rep movsq %ds:(%rsi),%es:(%rdi)
ffffffff8130af8d:       c3                      retq
ffffffff8130af8e:       66 90                   xchg   %ax,%ax

ffffffff8130af90 <copy_page_regs>:
...

and after the alternatives have run, the JMP to the old, unrolled
version gets NOPed out:

ffffffff8130af80 <copy_page>:
ffffffff8130af80:  66 66 90		xchg   %ax,%ax
ffffffff8130af83:  66 90		xchg   %ax,%ax
ffffffff8130af85:  b9 00 02 00 00	mov    $0x200,%ecx
ffffffff8130af8a:  f3 48 a5		rep movsq %ds:(%rsi),%es:(%rdi)
ffffffff8130af8d:  c3			retq

On modern uarches, those NOPs are cheaper than the unconditional JMP
that was there previously.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/lib/copy_page_64.S | 37 ++++++++++++-------------------------
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/arch/x86/lib/copy_page_64.S b/arch/x86/lib/copy_page_64.S
index 176cca67212b..8239dbcbf984 100644
--- a/arch/x86/lib/copy_page_64.S
+++ b/arch/x86/lib/copy_page_64.S
@@ -2,23 +2,26 @@
 
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
 #include <asm/alternative-asm.h>
 
+/*
+ * Some CPUs run faster using the string copy instructions (sane microcode).
+ * It is also a lot simpler. Use this when possible. But, don't use streaming
+ * copy unless the CPU indicates X86_FEATURE_REP_GOOD. Could vary the
+ * prefetch distance based on SMP/UP.
+ */
 	ALIGN
-copy_page_rep:
+ENTRY(copy_page)
 	CFI_STARTPROC
+	ALTERNATIVE "jmp copy_page_regs", "", X86_FEATURE_REP_GOOD
 	movl	$4096/8, %ecx
 	rep	movsq
 	ret
 	CFI_ENDPROC
-ENDPROC(copy_page_rep)
-
-/*
- *  Don't use streaming copy unless the CPU indicates X86_FEATURE_REP_GOOD.
- *  Could vary the prefetch distance based on SMP/UP.
-*/
+ENDPROC(copy_page)
 
-ENTRY(copy_page)
+ENTRY(copy_page_regs)
 	CFI_STARTPROC
 	subq	$2*8,	%rsp
 	CFI_ADJUST_CFA_OFFSET 2*8
@@ -90,21 +93,5 @@ ENTRY(copy_page)
 	addq	$2*8, %rsp
 	CFI_ADJUST_CFA_OFFSET -2*8
 	ret
-.Lcopy_page_end:
 	CFI_ENDPROC
-ENDPROC(copy_page)
-
-	/* Some CPUs run faster using the string copy instructions.
-	   It is also a lot simpler. Use this when possible */
-
-#include <asm/cpufeature.h>
-
-	.section .altinstr_replacement,"ax"
-1:	.byte 0xeb					/* jmp <disp8> */
-	.byte (copy_page_rep - copy_page) - (2f - 1b)	/* offset */
-2:
-	.previous
-	.section .altinstructions,"a"
-	altinstruction_entry copy_page, 1b, X86_FEATURE_REP_GOOD,	\
-		.Lcopy_page_end-copy_page, 2b-1b
-	.previous
+ENDPROC(copy_page_regs)
-- 
2.2.0.33.gc18b867


* [PATCH v1 07/12] x86, copy_user_64.S: Convert to ALTERNATIVE_2
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (5 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 06/12] x86, copy_page_64.S: Use generic ALTERNATIVE macro Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 08/12] x86, SMAP: Use ALTERNATIVE macro Borislav Petkov
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Use the asm macro and drop the locally grown version.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/lib/copy_user_64.S | 40 ++++++++++------------------------------
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index 4e6121ab5456..fa997dfaef24 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -16,30 +16,6 @@
 #include <asm/asm.h>
 #include <asm/smap.h>
 
-/*
- * By placing feature2 after feature1 in altinstructions section, we logically
- * implement:
- * If CPU has feature2, jmp to alt2 is used
- * else if CPU has feature1, jmp to alt1 is used
- * else jmp to orig is used.
- */
-	.macro ALTERNATIVE_JUMP feature1,feature2,orig,alt1,alt2
-0:
-	jmp \orig
-1:
-	.section .altinstr_replacement,"ax"
-2:
-	jmp \alt1
-3:
-	jmp \alt2
-	.previous
-
-	.section .altinstructions,"a"
-	altinstruction_entry 0b,2b,\feature1,5,5
-	altinstruction_entry 0b,3b,\feature2,5,5
-	.previous
-	.endm
-
 	.macro ALIGN_DESTINATION
 	/* check for bad alignment of destination */
 	movl %edi,%ecx
@@ -73,9 +49,11 @@ ENTRY(_copy_to_user)
 	jc bad_to_user
 	cmpq TI_addr_limit(%rax),%rcx
 	ja bad_to_user
-	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS,	\
-		copy_user_generic_unrolled,copy_user_generic_string,	\
-		copy_user_enhanced_fast_string
+	ALTERNATIVE_2 "jmp copy_user_generic_unrolled",		\
+		      "jmp copy_user_generic_string",		\
+		      X86_FEATURE_REP_GOOD,			\
+		      "jmp copy_user_enhanced_fast_string",	\
+		      X86_FEATURE_ERMS
 	CFI_ENDPROC
 ENDPROC(_copy_to_user)
 
@@ -88,9 +66,11 @@ ENTRY(_copy_from_user)
 	jc bad_from_user
 	cmpq TI_addr_limit(%rax),%rcx
 	ja bad_from_user
-	ALTERNATIVE_JUMP X86_FEATURE_REP_GOOD,X86_FEATURE_ERMS,	\
-		copy_user_generic_unrolled,copy_user_generic_string,	\
-		copy_user_enhanced_fast_string
+	ALTERNATIVE_2 "jmp copy_user_generic_unrolled",		\
+		      "jmp copy_user_generic_string",		\
+		      X86_FEATURE_REP_GOOD,			\
+		      "jmp copy_user_enhanced_fast_string",	\
+		      X86_FEATURE_ERMS
 	CFI_ENDPROC
 ENDPROC(_copy_from_user)
 
-- 
2.2.0.33.gc18b867


* [PATCH v1 08/12] x86, SMAP: Use ALTERNATIVE macro
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (6 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 07/12] x86, copy_user_64.S: Convert to ALTERNATIVE_2 Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 09/12] x86, alternative: Convert X86_INVD_BUG to generic macro Borislav Petkov
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

... and drop the unfolded version. No need for ASM_NOP3 anymore either,
as the alternatives do the proper padding at build time and insert the
proper NOPs at boot time.

There should be no apparent operational change from this patch.
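
To sketch the effect: ASM_CLAC now expands to

    ALTERNATIVE "", __stringify(__ASM_CLAC), X86_FEATURE_SMAP

i.e. an empty old instruction which the macro pads with three 0x90 bytes
at build time (CLAC is the 3-byte opcode 0f 01 ca). On SMAP-capable
parts those bytes get replaced with the real CLAC at boot; otherwise
they are merged into an optimal 3-byte NOP thanks to the NOP
optimization earlier in the series.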

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/smap.h | 30 +++++++++---------------------
 1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 8d3120f4e270..ba665ebd17bb 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -27,23 +27,11 @@
 
 #ifdef CONFIG_X86_SMAP
 
-#define ASM_CLAC							\
-	661: ASM_NOP3 ;							\
-	.pushsection .altinstr_replacement, "ax" ;			\
-	662: __ASM_CLAC ;						\
-	.popsection ;							\
-	.pushsection .altinstructions, "a" ;				\
-	altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3 ;	\
-	.popsection
-
-#define ASM_STAC							\
-	661: ASM_NOP3 ;							\
-	.pushsection .altinstr_replacement, "ax" ;			\
-	662: __ASM_STAC ;						\
-	.popsection ;							\
-	.pushsection .altinstructions, "a" ;				\
-	altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3 ;	\
-	.popsection
+#define ASM_CLAC \
+	ALTERNATIVE "", __stringify(__ASM_CLAC), X86_FEATURE_SMAP
+
+#define ASM_STAC \
+	ALTERNATIVE "", __stringify(__ASM_STAC), X86_FEATURE_SMAP
 
 #else /* CONFIG_X86_SMAP */
 
@@ -61,20 +49,20 @@
 static __always_inline void clac(void)
 {
 	/* Note: a barrier is implicit in alternative() */
-	alternative(ASM_NOP3, __stringify(__ASM_CLAC), X86_FEATURE_SMAP);
+	alternative("", __stringify(__ASM_CLAC), X86_FEATURE_SMAP);
 }
 
 static __always_inline void stac(void)
 {
 	/* Note: a barrier is implicit in alternative() */
-	alternative(ASM_NOP3, __stringify(__ASM_STAC), X86_FEATURE_SMAP);
+	alternative("", __stringify(__ASM_STAC), X86_FEATURE_SMAP);
 }
 
 /* These macros can be used in asm() statements */
 #define ASM_CLAC \
-	ALTERNATIVE(ASM_NOP3, __stringify(__ASM_CLAC), X86_FEATURE_SMAP)
+	ALTERNATIVE("", __stringify(__ASM_CLAC), X86_FEATURE_SMAP)
 #define ASM_STAC \
-	ALTERNATIVE(ASM_NOP3, __stringify(__ASM_STAC), X86_FEATURE_SMAP)
+	ALTERNATIVE("", __stringify(__ASM_STAC), X86_FEATURE_SMAP)
 
 #else /* CONFIG_X86_SMAP */
 
-- 
2.2.0.33.gc18b867


* [PATCH v1 09/12] x86, alternative: Convert X86_INVD_BUG to generic macro
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (7 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 08/12] x86, SMAP: Use ALTERNATIVE macro Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 10/12] x86, alternatives: Convert clear_page_64.S Borislav Petkov
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Booting a 486 kernel on an AMD guest says:

[    0.416564] apply_alternatives: feat: 0*32+25, old: (c160a475, len: 5), repl: (c19557d4, len: 5)
[    0.420009] c160a475: alt_insn: 68 10 35 00 c1
[    0.425106] c19557d4: rpl_insn: 68 80 39 00 c1

which looks ok:

old insn VA: 0xc160a475, CPU feat: X86_FEATURE_XMM, size: 5
simd_coprocessor_error:
         c160a475:      68 10 35 00 c1          push $0xc1003510 <do_general_protection>
repl insn: 0xc19557d4, size: 5
         c160a475:      68 80 39 00 c1          push $0xc1003980 <do_simd_coprocessor_error>

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/entry_32.S | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 000d4199b03e..b910577d1ff9 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -816,15 +816,9 @@ ENTRY(simd_coprocessor_error)
 	pushl_cfi $0
 #ifdef CONFIG_X86_INVD_BUG
 	/* AMD 486 bug: invd from userspace calls exception 19 instead of #GP */
-661:	pushl_cfi $do_general_protection
-662:
-.section .altinstructions,"a"
-	altinstruction_entry 661b, 663f, X86_FEATURE_XMM, 662b-661b, 664f-663f
-.previous
-.section .altinstr_replacement,"ax"
-663:	pushl $do_simd_coprocessor_error
-664:
-.previous
+	ALTERNATIVE "pushl_cfi $do_general_protection",	\
+		    "pushl $do_simd_coprocessor_error", \
+		    X86_FEATURE_XMM
 #else
 	pushl_cfi $do_simd_coprocessor_error
 #endif
-- 
2.2.0.33.gc18b867


* [PATCH v1 10/12] x86, alternatives: Convert clear_page_64.S
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (8 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 09/12] x86, alternative: Convert X86_INVD_BUG to generic macro Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 18:16 ` [PATCH v1 11/12] x86, alternative: Use alternative_2 in rdtsc_barrier Borislav Petkov
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Move clear_page() up so that we can get 2-byte forward JMPs when
patching:

[    2.388007] apply_alternatives: feat: 3*32+16, old: (ffffffff8130adb0, len: 5), repl: (ffffffff81d0b859, len: 5)
[    2.390992] ffffffff8130adb0: alt_insn: 90 90 90 90 90
[    2.393169] recompute_jump: new_displ: 0x0000003e
[    2.394628] ffffffff81d0b859: rpl_insn: eb 3e 66 66 90

even though the compiler generated 5-byte JMPs which we padded with 5
NOPs:

old insn VA: 0xffffffff8130adb0, CPU feat: X86_FEATURE_REP_GOOD, size: 5
clear_page:

ffffffff8130adb0 <clear_page>:
ffffffff8130adb0:       90                      nop
ffffffff8130adb1:       90                      nop
ffffffff8130adb2:       90                      nop
ffffffff8130adb3:       90                      nop
ffffffff8130adb4:       90                      nop
repl insn: 0xffffffff81d0b859, size: 5
 ffffffff81d0b859:      e9 92 f5 5f ff          jmpq ffffffff8130adf0

ffffffff8130adb0 <clear_page>:
ffffffff8130adb0:       e9 92 f5 5f ff          jmpq ffffffff8090a347

old insn VA: 0xffffffff8130adb0, CPU feat: X86_FEATURE_ERMS, size: 5
clear_page:

ffffffff8130adb0 <clear_page>:
ffffffff8130adb0:       90                      nop
ffffffff8130adb1:       90                      nop
ffffffff8130adb2:       90                      nop
ffffffff8130adb3:       90                      nop
ffffffff8130adb4:       90                      nop
repl insn: 0xffffffff81d0b85e, size: 5
 ffffffff81d0b85e:      e9 9d f5 5f ff          jmpq ffffffff8130ae00

ffffffff8130adb0 <clear_page>:
ffffffff8130adb0:       e9 9d f5 5f ff          jmpq ffffffff8090a352

Also, make the REP_GOOD version be the default as the majority of
machines set REP_GOOD. This way we get to save ourselves the JMP.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/lib/clear_page_64.S | 66 ++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 39 deletions(-)

diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index f2145cfa12a6..e67e579c93bd 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -1,31 +1,35 @@
 #include <linux/linkage.h>
 #include <asm/dwarf2.h>
+#include <asm/cpufeature.h>
 #include <asm/alternative-asm.h>
 
 /*
- * Zero a page. 	
- * rdi	page
- */			
-ENTRY(clear_page_c)
+ * Most CPUs support enhanced REP MOVSB/STOSB instructions. It is
+ * recommended to use this when possible and we do use them by default.
+ * If enhanced REP MOVSB/STOSB is not available, try to use fast string.
+ * Otherwise, use original.
+ */
+
+/*
+ * Zero a page.
+ * %rdi	- page
+ */
+ENTRY(clear_page)
 	CFI_STARTPROC
+
+	ALTERNATIVE_2 "jmp clear_page_orig", "", X86_FEATURE_REP_GOOD, \
+		      "jmp clear_page_c_e", X86_FEATURE_ERMS
+
 	movl $4096/8,%ecx
 	xorl %eax,%eax
 	rep stosq
 	ret
 	CFI_ENDPROC
-ENDPROC(clear_page_c)
+ENDPROC(clear_page)
 
-ENTRY(clear_page_c_e)
+ENTRY(clear_page_orig)
 	CFI_STARTPROC
-	movl $4096,%ecx
-	xorl %eax,%eax
-	rep stosb
-	ret
-	CFI_ENDPROC
-ENDPROC(clear_page_c_e)
 
-ENTRY(clear_page)
-	CFI_STARTPROC
 	xorl   %eax,%eax
 	movl   $4096/64,%ecx
 	.p2align 4
@@ -45,29 +49,13 @@ ENTRY(clear_page)
 	nop
 	ret
 	CFI_ENDPROC
-.Lclear_page_end:
-ENDPROC(clear_page)
-
-	/*
-	 * Some CPUs support enhanced REP MOVSB/STOSB instructions.
-	 * It is recommended to use this when possible.
-	 * If enhanced REP MOVSB/STOSB is not available, try to use fast string.
-	 * Otherwise, use original function.
-	 *
-	 */
+ENDPROC(clear_page_orig)
 
-#include <asm/cpufeature.h>
-
-	.section .altinstr_replacement,"ax"
-1:	.byte 0xeb					/* jmp <disp8> */
-	.byte (clear_page_c - clear_page) - (2f - 1b)	/* offset */
-2:	.byte 0xeb					/* jmp <disp8> */
-	.byte (clear_page_c_e - clear_page) - (3f - 2b)	/* offset */
-3:
-	.previous
-	.section .altinstructions,"a"
-	altinstruction_entry clear_page,1b,X86_FEATURE_REP_GOOD,\
-			     .Lclear_page_end-clear_page, 2b-1b
-	altinstruction_entry clear_page,2b,X86_FEATURE_ERMS,   \
-			     .Lclear_page_end-clear_page,3b-2b
-	.previous
+ENTRY(clear_page_c_e)
+	CFI_STARTPROC
+	movl $4096,%ecx
+	xorl %eax,%eax
+	rep stosb
+	ret
+	CFI_ENDPROC
+ENDPROC(clear_page_c_e)
-- 
2.2.0.33.gc18b867


* [PATCH v1 11/12] x86, alternative: Use alternative_2 in rdtsc_barrier
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (9 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 10/12] x86, alternatives: Convert clear_page_64.S Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-03 19:55   ` Andy Lutomirski
  2015-02-03 18:16 ` [PATCH v1 12/12] x86, alternative: Cleanup prefetch primitives Borislav Petkov
  2015-02-18 21:20 ` [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Ingo Molnar
  12 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML, Richard Weinberger

From: Borislav Petkov <bp@suse.de>

... now that we have it.

Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/barrier.h | 6 ++----
 arch/x86/um/asm/barrier.h      | 4 ++--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 2ab1eb33106e..959e45b81fe2 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -95,13 +95,11 @@ do {									\
  * Stop RDTSC speculation. This is needed when you need to use RDTSC
  * (or get_cycles or vread that possibly accesses the TSC) in a defined
  * code region.
- *
- * (Could use an alternative three way for this if there was one.)
  */
 static __always_inline void rdtsc_barrier(void)
 {
-	alternative(ASM_NOP3, "mfence", X86_FEATURE_MFENCE_RDTSC);
-	alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
+	alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
+			  "lfence", X86_FEATURE_LFENCE_RDTSC);
 }
 
 #endif /* _ASM_X86_BARRIER_H */
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index 2d7d9a1f5b53..8ffd2146fa6a 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -64,8 +64,8 @@
  */
 static inline void rdtsc_barrier(void)
 {
-	alternative(ASM_NOP3, "mfence", X86_FEATURE_MFENCE_RDTSC);
-	alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
+	alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
+			  "lfence", X86_FEATURE_LFENCE_RDTSC);
 }
 
 #endif
-- 
2.2.0.33.gc18b867


* [PATCH v1 12/12] x86, alternative: Cleanup prefetch primitives
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (10 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 11/12] x86, alternative: Use alternative_2 in rdtsc_barrier Borislav Petkov
@ 2015-02-03 18:16 ` Borislav Petkov
  2015-02-18 21:20 ` [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Ingo Molnar
  12 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 18:16 UTC (permalink / raw)
  To: X86 ML; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

This is based on a patch originally by hpa.

With the current improvements to the alternatives, we can simply use %P1
as a mem8 operand constraint and rely on the toolchain to generate the
proper instruction sizes. For example, on 32-bit, where we use an empty
old instruction, we get:

[    2.226656] apply_alternatives: feat: 6*32+8, old: (c104648b, len: 4), repl: (c195566c, len: 4)
[    2.228005] c104648b: alt_insn: 90 90 90 90
[    2.230534] c195566c: rpl_insn: 0f 0d 4b 5c

...

[    3.653760] apply_alternatives: feat: 6*32+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3)
[    3.656004] c18e09b4: alt_insn: 90 90 90
[    3.658101] c1955948: rpl_insn: 0f 0d 08

...

[    4.200010] apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7)
[    4.204004] c1190cf9: alt_insn: 90 90 90 90 90 90 90
[    4.206809] c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1

all with the proper padding done depending on the size of the
replacement instruction the compiler generates.
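
As a sanity check (not part of the original dumps), decoding the
replacement bytes above confirms that: 0f 0d 4b 5c is prefetchw
0x5c(%ebx) - 4 bytes with a disp8; 0f 0d 08 is prefetchw (%eax) - 3
bytes, register-indirect; and 0f 0d 0d a0 d4 85 c1 is prefetchw
0xc185d4a0 - 7 bytes with a disp32.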

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/apic.h      |  2 +-
 arch/x86/include/asm/processor.h | 16 +++++++---------
 arch/x86/kernel/cpu/amd.c        |  5 +++++
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 465b309af254..ddf77bd3d76c 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -91,7 +91,7 @@ static inline void native_apic_mem_write(u32 reg, u32 v)
 {
 	volatile u32 *addr = (volatile u32 *)(APIC_BASE + reg);
 
-	alternative_io("movl %0, %1", "xchgl %0, %1", X86_BUG_11AP,
+	alternative_io("movl %0, %P1", "xchgl %0, %P1", X86_BUG_11AP,
 		       ASM_OUTPUT2("=r" (v), "=m" (*addr)),
 		       ASM_OUTPUT2("0" (v), "m" (*addr)));
 }
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a092a0cce0b7..65be9a37165a 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -794,10 +794,10 @@ extern char			ignore_fpu_irq;
 #define ARCH_HAS_SPINLOCK_PREFETCH
 
 #ifdef CONFIG_X86_32
-# define BASE_PREFETCH		ASM_NOP4
+# define BASE_PREFETCH		""
 # define ARCH_HAS_PREFETCH
 #else
-# define BASE_PREFETCH		"prefetcht0 (%1)"
+# define BASE_PREFETCH		"prefetcht0 %P1"
 #endif
 
 /*
@@ -808,10 +808,9 @@ extern char			ignore_fpu_irq;
  */
 static inline void prefetch(const void *x)
 {
-	alternative_input(BASE_PREFETCH,
-			  "prefetchnta (%1)",
+	alternative_input(BASE_PREFETCH, "prefetchnta %P1",
 			  X86_FEATURE_XMM,
-			  "r" (x));
+			  "m" (*(const char *)x));
 }
 
 /*
@@ -821,10 +820,9 @@ static inline void prefetch(const void *x)
  */
 static inline void prefetchw(const void *x)
 {
-	alternative_input(BASE_PREFETCH,
-			  "prefetchw (%1)",
-			  X86_FEATURE_3DNOW,
-			  "r" (x));
+	alternative_input(BASE_PREFETCH, "prefetchw %P1",
+			  X86_FEATURE_3DNOWPREFETCH,
+			  "m" (*(const char *)x));
 }
 
 static inline void spin_lock_prefetch(const void *x)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 15c5df92f74e..05e627d1c898 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -711,6 +711,11 @@ static void init_amd(struct cpuinfo_x86 *c)
 		set_cpu_bug(c, X86_BUG_AMD_APIC_C1E);
 
 	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
+
+	/* 3DNow or LM implies PREFETCHW */
+	if (!cpu_has(c, X86_FEATURE_3DNOWPREFETCH))
+		if (cpu_has(c, X86_FEATURE_3DNOW) || cpu_has(c, X86_FEATURE_LM))
+			set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);
 }
 
 #ifdef CONFIG_X86_32
-- 
2.2.0.33.gc18b867


* Re: [PATCH v1 02/12] x86, alternatives: Cleanup DPRINTK macro
  2015-02-03 18:16 ` [PATCH v1 02/12] x86, alternatives: Cleanup DPRINTK macro Borislav Petkov
@ 2015-02-03 19:01   ` Joe Perches
  0 siblings, 0 replies; 21+ messages in thread
From: Joe Perches @ 2015-02-03 19:01 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: X86 ML, LKML

On Tue, 2015-02-03 at 19:16 +0100, Borislav Petkov wrote:
> Make it pass __func__ implicitly. Also, dump info about each replacing
> we're doing. Fixup comments and style while at it.

Perhaps s/DPRINTK/pr_debug/ and remove debug_alternative
while using the generic dynamic debug facilities is feasible.



* Re: [PATCH v1 05/12] x86, alternatives: Use optimized NOPs for padding
  2015-02-03 18:16 ` [PATCH v1 05/12] x86, alternatives: Use optimized NOPs for padding Borislav Petkov
@ 2015-02-03 19:36   ` Andy Lutomirski
  2015-02-03 19:54     ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2015-02-03 19:36 UTC (permalink / raw)
  To: Borislav Petkov, X86 ML; +Cc: LKML

On 02/03/2015 10:16 AM, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
>
> Alternatives allow now for empty old instruction. In this case we go
> and pad the space with NOPs at assembly time. However, there are the
> optimal, longer NOPs which should be used. Do that at patching time.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>   arch/x86/kernel/alternative.c | 20 +++++++++++++++++++-
>   1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 715af37bf008..dd0cdb6b179c 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -323,6 +323,21 @@ done:
>   		n_dspl, (unsigned long)orig_insn + n_dspl + repl_len);
>   }
>
> +static void __init_or_module optimize_nops(u8 *instr, u8 max_len)
> +{
> +	int i = 0;
> +
> +	while (i < max_len && instr[i] == 0x90)
> +		i++;
> +
> +	if (!i)
> +		return;
> +
> +	add_nops(instr, i);
> +
> +	DUMP_BYTES(instr, i, "%p: optimized NOPs: ", instr);
> +}
> +
>   /*
>    * Replace instructions with better alternatives for this CPU type. This runs
>    * before SMP is initialized to avoid SMP problems with self modifying code.
> @@ -354,8 +369,11 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
>   		replacement = (u8 *)&a->repl_offset + a->repl_offset;
>   		BUG_ON(a->instrlen > sizeof(insnbuf));
>   		BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
> -		if (!boot_cpu_has(a->cpuid))
> +		if (!boot_cpu_has(a->cpuid)) {
> +			if (instr[0] == 0x90)
> +				optimize_nops(instr, a->instrlen);
>   			continue;
> +		}

I'm a bit confused here.  Shouldn't NOPs after a non-NOP in the old 
instruction also be optimized?

--Andy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 05/12] x86, alternatives: Use optimized NOPs for padding
  2015-02-03 19:36   ` Andy Lutomirski
@ 2015-02-03 19:54     ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 19:54 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: X86 ML, LKML

On Tue, Feb 03, 2015 at 11:36:34AM -0800, Andy Lutomirski wrote:
> I'm a bit confused here.  Shouldn't NOPs after a non-NOP in the old
> instruction also be optimized?

Yes, they should! Good point.

Thanks, will fix.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread
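
A rough sketch of the reworked direction agreed on above, i.e. optimizing every run of single-byte NOPs inside the patch site, not just a run starting at offset 0 (illustrative only, not the final kernel code; add_nops() is the existing helper that fills a buffer with the longest suitable NOPs for this CPU):

  static void __init_or_module optimize_nops(u8 *instr, u8 len)
  {
  	int i = 0, nnops;

  	while (i < len) {
  		if (instr[i] != 0x90) {
  			i++;
  			continue;
  		}

  		/* Measure this run of 0x90 single-byte NOPs ... */
  		nnops = 1;
  		while (i + nnops < len && instr[i + nnops] == 0x90)
  			nnops++;

  		/* ... and rewrite it with optimal, longer NOPs. */
  		add_nops(instr + i, nnops);
  		i += nnops;
  	}
  }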

* Re: [PATCH v1 11/12] x86, alternative: Use alternative_2 in rdtsc_barrier
  2015-02-03 18:16 ` [PATCH v1 11/12] x86, alternative: Use alternative_2 in rdtsc_barrier Borislav Petkov
@ 2015-02-03 19:55   ` Andy Lutomirski
  2015-02-03 20:08     ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Lutomirski @ 2015-02-03 19:55 UTC (permalink / raw)
  To: Borislav Petkov, X86 ML; +Cc: LKML, Richard Weinberger

On 02/03/2015 10:16 AM, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
>
> ... now that we have it.

Hallelujah!

Acked-by: Andy Lutomirski <luto@amacapital.net>

>
> Cc: Richard Weinberger <richard@nod.at>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>   arch/x86/include/asm/barrier.h | 6 ++----
>   arch/x86/um/asm/barrier.h      | 4 ++--
>   2 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 2ab1eb33106e..959e45b81fe2 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -95,13 +95,11 @@ do {									\
>    * Stop RDTSC speculation. This is needed when you need to use RDTSC
>    * (or get_cycles or vread that possibly accesses the TSC) in a defined
>    * code region.
> - *
> - * (Could use an alternative three way for this if there was one.)
>    */
>   static __always_inline void rdtsc_barrier(void)
>   {
> -	alternative(ASM_NOP3, "mfence", X86_FEATURE_MFENCE_RDTSC);
> -	alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
> +	alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
> +			  "lfence", X86_FEATURE_LFENCE_RDTSC);
>   }
>
>   #endif /* _ASM_X86_BARRIER_H */
> diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
> index 2d7d9a1f5b53..8ffd2146fa6a 100644
> --- a/arch/x86/um/asm/barrier.h
> +++ b/arch/x86/um/asm/barrier.h
> @@ -64,8 +64,8 @@
>    */
>   static inline void rdtsc_barrier(void)
>   {
> -	alternative(ASM_NOP3, "mfence", X86_FEATURE_MFENCE_RDTSC);
> -	alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
> +	alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC,
> +			  "lfence", X86_FEATURE_LFENCE_RDTSC);
>   }
>
>   #endif
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 11/12] x86, alternative: Use alternative_2 in rdtsc_barrier
  2015-02-03 19:55   ` Andy Lutomirski
@ 2015-02-03 20:08     ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-03 20:08 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: X86 ML, LKML, Richard Weinberger

On Tue, Feb 03, 2015 at 11:55:11AM -0800, Andy Lutomirski wrote:
> On 02/03/2015 10:16 AM, Borislav Petkov wrote:
> >From: Borislav Petkov <bp@suse.de>
> >
> >... now that we have it.
> 
> Hallelujah!
> 
> Acked-by: Andy Lutomirski <luto@amacapital.net>

Haha, yeah.

Btw, the uml bits are going away because we realized we don't need the
barrier.h version there, so Richard is removing it. I'll keep your ACK
though :)

Thanks and thanks for going through the series.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs
  2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
                   ` (11 preceding siblings ...)
  2015-02-03 18:16 ` [PATCH v1 12/12] x86, alternative: Cleanup prefetch primitives Borislav Petkov
@ 2015-02-18 21:20 ` Ingo Molnar
  2015-02-18 21:23   ` Borislav Petkov
  2015-02-24 11:04   ` Borislav Petkov
  12 siblings, 2 replies; 21+ messages in thread
From: Ingo Molnar @ 2015-02-18 21:20 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: X86 ML, LKML


* Borislav Petkov <bp@alien8.de> wrote:

> From: Borislav Petkov <bp@suse.de>
> 
>   [ Changelog is in version-increasing number so that one can follow the
>     evolution of the patch set in a more natural way (i.e., latest version
>     comes at the end. ]
> 
> v0:
> 
> this is something which hpa and I talked about recently: 
> the ability for the alternatives code to add padding to 
> the original instruction in case the replacement is 
> longer and also to be able to simply write "jmp" and not 
> care about which JMP exactly the compiler generates and 
> whether the relative offsets are correct.
> 
> So this is a stab at it, it seems to boot in kvm here but 
> it needs more staring to make sure we're actually 
> generating the proper code at all times.

Ok, this looks really cool.

Do you have any stats about how many such (affected) patch 
sites we have in say the 64-bit defconfig kernel?

I.e. what's the scope of this optimization?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs
  2015-02-18 21:20 ` [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Ingo Molnar
@ 2015-02-18 21:23   ` Borislav Petkov
  2015-02-24 11:04   ` Borislav Petkov
  1 sibling, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-18 21:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86 ML, LKML

On Wed, Feb 18, 2015 at 10:20:13PM +0100, Ingo Molnar wrote:
> Ok, this looks really cool.

Thanks, I think so too :-)

> Do you have any stats about how many such (affected) patch 
> sites we have in say the 64-bit defconfig kernel?
> 
> I.e. what's the scope of this optimization?

I'm currently reworking the NOP-optimizing aspect. I'll send it out along
with stats when it is done, probably next week at the latest.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs
  2015-02-18 21:20 ` [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Ingo Molnar
  2015-02-18 21:23   ` Borislav Petkov
@ 2015-02-24 11:04   ` Borislav Petkov
  1 sibling, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-02-24 11:04 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86 ML, LKML

On Wed, Feb 18, 2015 at 10:20:13PM +0100, Ingo Molnar wrote:
> Do you have any stats about how many such (affected) patch 
> sites we have in say the 64-bit defconfig kernel?

It is a good thing I have a tool to dump the alternatives patch sites
and what gets replaced by what, because now I can create such stats just
like that! And hpa was questioning that tool's justification at the time
:-)

Anyway, here are the stats:

x86_64 defconfig:

Alternatives sites total:		2478
Total padding added (in Bytes):		6051

The padding is currently done for:

X86_FEATURE_ALWAYS
X86_FEATURE_ERMS
X86_FEATURE_LFENCE_RDTSC
X86_FEATURE_MFENCE_RDTSC
X86_FEATURE_SMAP

This is with the latest version of the patchset. Of course, on each
machine the alternatives sites actually being patched are a subset of
the total number.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread
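
As a quick sanity check on those numbers (an interpretation, not something stated in the thread): 6051 / 2478 is roughly 2.4 bytes of padding per site on average, which is plausible given the feature list above, since STAC/CLAC (SMAP) and MFENCE/LFENCE are 3-byte instructions replacing an empty original instruction, while many of the X86_FEATURE_ALWAYS and ERMS sites should need little or no extra padding.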

Thread overview: 21+ messages

2015-02-03 18:16 [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 01/12] x86, copy_user: Remove FIX_ALIGNMENT define Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 02/12] x86, alternatives: Cleanup DPRINTK macro Borislav Petkov
2015-02-03 19:01   ` Joe Perches
2015-02-03 18:16 ` [PATCH v1 03/12] x86, alternatives: Add instruction padding Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 04/12] x86, alternatives: Make JMPs more robust Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 05/12] x86, alternatives: Use optimized NOPs for padding Borislav Petkov
2015-02-03 19:36   ` Andy Lutomirski
2015-02-03 19:54     ` Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 06/12] x86, copy_page_64.S: Use generic ALTERNATIVE macro Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 07/12] x86, copy_user_64.S: Convert to ALTERNATIVE_2 Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 08/12] x86, SMAP: Use ALTERNATIVE macro Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 09/12] x86, alternative: Convert X86_INVD_BUG to generic macro Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 10/12] x86, alternatives: Convert clear_page_64.S Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 11/12] x86, alternative: Use alternative_2 in rdtsc_barrier Borislav Petkov
2015-02-03 19:55   ` Andy Lutomirski
2015-02-03 20:08     ` Borislav Petkov
2015-02-03 18:16 ` [PATCH v1 12/12] x86, alternative: Cleanup prefetch primitives Borislav Petkov
2015-02-18 21:20 ` [PATCH v1 00/12] x86, alternatives: Instruction padding and more robust JMPs Ingo Molnar
2015-02-18 21:23   ` Borislav Petkov
2015-02-24 11:04   ` Borislav Petkov
