linux-toolchains.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] x86: Remove ideal_nops[]
@ 2021-03-12 11:32 Peter Zijlstra
  2021-03-12 11:32 ` [PATCH 1/2] x86: Remove dynamic NOP selection Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 51+ messages in thread
From: Peter Zijlstra @ 2021-03-12 11:32 UTC (permalink / raw)
  To: x86, rostedt, hpa, torvalds
  Cc: linux-kernel, linux-toolchains, peterz, jpoimboe,
	alexei.starovoitov, mhiramat

Hi!

A while ago Steve complained about x86 being weird for having different NOPs [1]

Having cursed the same thing before, I figured it was time to look at the NOP
situation.

32bit simply isn't a performance target anymore, so all we need is a set of
NOPs that works on all.

x86_64 has two main NOP variants, NOPL and prefix NOP. NOPL was introduced by
P6 and is architecturally mandated for x86_64. However, some uarchs made the
choice to limit NOPL decoding to a single port, which obviously limits NOPL
throughput. Other uarchs have (severe) decoding penalties for excessive (>~3)
prefixes, hobbling prefix NOP throughput.

But the thing is, all the modern uarchs can handle both without issue; that is
AMD K10 (2007) and later and Intel Ivy Bridge (2012) and later. The only
exception is Atom, which has the prefix penalty.

Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
simply irrelevant today, remove variable NOPs and use NOPL.

This gives us deterministic NOPs and restores sanity.



[1] https://lkml.kernel.org/r/20210302105827.3403656c@gandalf.local.home


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/2] x86: Remove dynamic NOP selection
  2021-03-12 11:32 [PATCH 0/2] x86: Remove ideal_nops[] Peter Zijlstra
@ 2021-03-12 11:32 ` Peter Zijlstra
  2021-03-12 12:09   ` Peter Zijlstra
  2024-01-20  6:58   ` Thorsten Glaser
  2021-03-12 11:32 ` [PATCH 2/2] objtool,x86: Use asm/nops.h Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2021-03-12 11:32 UTC (permalink / raw)
  To: x86, rostedt, hpa, torvalds
  Cc: linux-kernel, linux-toolchains, peterz, jpoimboe,
	alexei.starovoitov, mhiramat

This ensures that a NOP is a NOP and not a random other instruction
that is also a NOP. It allows simplification of dynamic code patching
that wants to verify existing code before writing new instructions
(ftrace, jump_label, static_call, etc..).

Differentiating on NOPs is not a feature.

This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs
(CARELESS). 32bit is not a performance target.

Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is
fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL
being required for x86_64, all x86_64 CPUs can use NOPL. So stop
caring about NOPs, simplify things and get on with life.

[ The problem seems to be that some uarchs can only decode NOPL on a
single front-end port while others have severe decode penalties for
excessive prefixes. All modern uarchs can handle both, except Atom,
which has prefix penalties. ]

[ Also, much doubt you can actually measure any of this on normal
workloads. ]

After this FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI and FEATURE_K7 is unused.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> # bpf
---
 arch/x86/include/asm/jump_label.h    |   12 --
 arch/x86/include/asm/nops.h          |  180 ++++++++++---------------------
 arch/x86/include/asm/special_insns.h |    4 
 arch/x86/kernel/alternative.c        |  198 +++--------------------------------
 arch/x86/kernel/ftrace.c             |    4 
 arch/x86/kernel/jump_label.c         |   32 +----
 arch/x86/kernel/kprobes/core.c       |    2 
 arch/x86/kernel/setup.c              |    1 
 arch/x86/kernel/static_call.c        |    4 
 arch/x86/net/bpf_jit_comp.c          |    8 -
 10 files changed, 98 insertions(+), 347 deletions(-)

--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -6,12 +6,6 @@
 
 #define JUMP_LABEL_NOP_SIZE 5
 
-#ifdef CONFIG_X86_64
-# define STATIC_KEY_INIT_NOP P6_NOP5_ATOMIC
-#else
-# define STATIC_KEY_INIT_NOP GENERIC_NOP5_ATOMIC
-#endif
-
 #include <asm/asm.h>
 #include <asm/nops.h>
 
@@ -23,7 +17,7 @@
 static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
 {
 	asm_volatile_goto("1:"
-		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
+		".byte " __stringify(BYTES_NOP5) "\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
 		".long 1b - ., %l[l_yes] - . \n\t"
@@ -63,7 +57,7 @@ static __always_inline bool arch_static_
 	.long		\target - .Lstatic_jump_after_\@
 .Lstatic_jump_after_\@:
 	.else
-	.byte		STATIC_KEY_INIT_NOP
+	.byte		BYTES_NOP5
 	.endif
 	.pushsection __jump_table, "aw"
 	_ASM_ALIGN
@@ -75,7 +69,7 @@ static __always_inline bool arch_static_
 .macro STATIC_JUMP_IF_FALSE target, key, def
 .Lstatic_jump_\@:
 	.if \def
-	.byte		STATIC_KEY_INIT_NOP
+	.byte		BYTES_NOP5
 	.else
 	/* Equivalent to "jmp.d32 \target" */
 	.byte		0xe9
--- a/arch/x86/include/asm/nops.h
+++ b/arch/x86/include/asm/nops.h
@@ -4,89 +4,58 @@
 
 /*
  * Define nops for use with alternative() and for tracing.
+ */
+
+#ifndef CONFIG_64BIT
+
+/*
+ * Generic 32bit nops from GAS:
+ *
+ * 1: nop
+ * 2: movl %esi,%esi
+ * 3: leal 0x0(%esi),%esi
+ * 4: leal 0x0(%esi,%eiz,1),%esi
+ * 5: leal %ds:0x0(%esi,%eiz,1),%esi
+ * 6: leal 0x0(%esi),%esi
+ * 7: leal 0x0(%esi,%eiz,1),%esi
+ * 8: leal %ds:0x0(%esi,%eiz,1),%esi
  *
- * *_NOP5_ATOMIC must be a single instruction.
+ * Except 5 and 8, which are DS prefixed 4 and 7 resp, where GAS would emit 2
+ * nop instructions.
  */
+#define BYTES_NOP1	0x90
+#define BYTES_NOP2	0x89,0xf6
+#define BYTES_NOP3	0x8d,0x76,0x00
+#define BYTES_NOP4	0x8d,0x74,0x26,0x00
+#define BYTES_NOP5	0x3e,BYTES_NOP4
+#define BYTES_NOP6	0x8d,0xb6,0x00,0x00,0x00,0x00
+#define BYTES_NOP7	0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
+#define BYTES_NOP8	0x3e,BYTES_NOP7
+
+#else
 
-#define NOP_DS_PREFIX 0x3e
+/*
+ * Generic 64bit nops from GAS:
+ *
+ * 1: nop
+ * 2: osp nop
+ * 3: nopl (%eax)
+ * 4: nopl 0x00(%eax)
+ * 5: nopl 0x00(%eax,%eax,1)
+ * 6: osp nopl 0x00(%eax,%eax,1)
+ * 7: nopl 0x00000000(%eax)
+ * 8: nopl 0x00000000(%eax,%eax,1)
+ */
+#define BYTES_NOP1	0x90
+#define BYTES_NOP2	0x66,BYTES_NOP1
+#define BYTES_NOP3	0x0f,0x1f,0x00
+#define BYTES_NOP4	0x0f,0x1f,0x40,0x00
+#define BYTES_NOP5	0x0f,0x1f,0x44,0x00,0x00
+#define BYTES_NOP6	0x66,BYTES_NOP5
+#define BYTES_NOP7	0x0f,0x1f,0x80,0x00,0x00,0x00,0x00
+#define BYTES_NOP8	0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
 
-/* generic versions from gas
-   1: nop
-   the following instructions are NOT nops in 64-bit mode,
-   for 64-bit mode use K8 or P6 nops instead
-   2: movl %esi,%esi
-   3: leal 0x00(%esi),%esi
-   4: leal 0x00(,%esi,1),%esi
-   6: leal 0x00000000(%esi),%esi
-   7: leal 0x00000000(,%esi,1),%esi
-*/
-#define GENERIC_NOP1 0x90
-#define GENERIC_NOP2 0x89,0xf6
-#define GENERIC_NOP3 0x8d,0x76,0x00
-#define GENERIC_NOP4 0x8d,0x74,0x26,0x00
-#define GENERIC_NOP5 GENERIC_NOP1,GENERIC_NOP4
-#define GENERIC_NOP6 0x8d,0xb6,0x00,0x00,0x00,0x00
-#define GENERIC_NOP7 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
-#define GENERIC_NOP8 GENERIC_NOP1,GENERIC_NOP7
-#define GENERIC_NOP5_ATOMIC NOP_DS_PREFIX,GENERIC_NOP4
-
-/* Opteron 64bit nops
-   1: nop
-   2: osp nop
-   3: osp osp nop
-   4: osp osp osp nop
-*/
-#define K8_NOP1 GENERIC_NOP1
-#define K8_NOP2	0x66,K8_NOP1
-#define K8_NOP3	0x66,K8_NOP2
-#define K8_NOP4	0x66,K8_NOP3
-#define K8_NOP5	K8_NOP3,K8_NOP2
-#define K8_NOP6	K8_NOP3,K8_NOP3
-#define K8_NOP7	K8_NOP4,K8_NOP3
-#define K8_NOP8	K8_NOP4,K8_NOP4
-#define K8_NOP5_ATOMIC 0x66,K8_NOP4
-
-/* K7 nops
-   uses eax dependencies (arbitrary choice)
-   1: nop
-   2: movl %eax,%eax
-   3: leal (,%eax,1),%eax
-   4: leal 0x00(,%eax,1),%eax
-   6: leal 0x00000000(%eax),%eax
-   7: leal 0x00000000(,%eax,1),%eax
-*/
-#define K7_NOP1	GENERIC_NOP1
-#define K7_NOP2	0x8b,0xc0
-#define K7_NOP3	0x8d,0x04,0x20
-#define K7_NOP4	0x8d,0x44,0x20,0x00
-#define K7_NOP5	K7_NOP4,K7_NOP1
-#define K7_NOP6	0x8d,0x80,0,0,0,0
-#define K7_NOP7	0x8D,0x04,0x05,0,0,0,0
-#define K7_NOP8	K7_NOP7,K7_NOP1
-#define K7_NOP5_ATOMIC NOP_DS_PREFIX,K7_NOP4
-
-/* P6 nops
-   uses eax dependencies (Intel-recommended choice)
-   1: nop
-   2: osp nop
-   3: nopl (%eax)
-   4: nopl 0x00(%eax)
-   5: nopl 0x00(%eax,%eax,1)
-   6: osp nopl 0x00(%eax,%eax,1)
-   7: nopl 0x00000000(%eax)
-   8: nopl 0x00000000(%eax,%eax,1)
-   Note: All the above are assumed to be a single instruction.
-	There is kernel code that depends on this.
-*/
-#define P6_NOP1	GENERIC_NOP1
-#define P6_NOP2	0x66,0x90
-#define P6_NOP3	0x0f,0x1f,0x00
-#define P6_NOP4	0x0f,0x1f,0x40,0
-#define P6_NOP5	0x0f,0x1f,0x44,0x00,0
-#define P6_NOP6	0x66,0x0f,0x1f,0x44,0x00,0
-#define P6_NOP7	0x0f,0x1f,0x80,0,0,0,0
-#define P6_NOP8	0x0f,0x1f,0x84,0x00,0,0,0,0
-#define P6_NOP5_ATOMIC P6_NOP5
+#endif /* CONFIG_64BIT */
 
 #ifdef __ASSEMBLY__
 #define _ASM_MK_NOP(x) .byte x
@@ -94,54 +63,19 @@
 #define _ASM_MK_NOP(x) ".byte " __stringify(x) "\n"
 #endif
 
-#if defined(CONFIG_MK7)
-#define ASM_NOP1 _ASM_MK_NOP(K7_NOP1)
-#define ASM_NOP2 _ASM_MK_NOP(K7_NOP2)
-#define ASM_NOP3 _ASM_MK_NOP(K7_NOP3)
-#define ASM_NOP4 _ASM_MK_NOP(K7_NOP4)
-#define ASM_NOP5 _ASM_MK_NOP(K7_NOP5)
-#define ASM_NOP6 _ASM_MK_NOP(K7_NOP6)
-#define ASM_NOP7 _ASM_MK_NOP(K7_NOP7)
-#define ASM_NOP8 _ASM_MK_NOP(K7_NOP8)
-#define ASM_NOP5_ATOMIC _ASM_MK_NOP(K7_NOP5_ATOMIC)
-#elif defined(CONFIG_X86_P6_NOP)
-#define ASM_NOP1 _ASM_MK_NOP(P6_NOP1)
-#define ASM_NOP2 _ASM_MK_NOP(P6_NOP2)
-#define ASM_NOP3 _ASM_MK_NOP(P6_NOP3)
-#define ASM_NOP4 _ASM_MK_NOP(P6_NOP4)
-#define ASM_NOP5 _ASM_MK_NOP(P6_NOP5)
-#define ASM_NOP6 _ASM_MK_NOP(P6_NOP6)
-#define ASM_NOP7 _ASM_MK_NOP(P6_NOP7)
-#define ASM_NOP8 _ASM_MK_NOP(P6_NOP8)
-#define ASM_NOP5_ATOMIC _ASM_MK_NOP(P6_NOP5_ATOMIC)
-#elif defined(CONFIG_X86_64)
-#define ASM_NOP1 _ASM_MK_NOP(K8_NOP1)
-#define ASM_NOP2 _ASM_MK_NOP(K8_NOP2)
-#define ASM_NOP3 _ASM_MK_NOP(K8_NOP3)
-#define ASM_NOP4 _ASM_MK_NOP(K8_NOP4)
-#define ASM_NOP5 _ASM_MK_NOP(K8_NOP5)
-#define ASM_NOP6 _ASM_MK_NOP(K8_NOP6)
-#define ASM_NOP7 _ASM_MK_NOP(K8_NOP7)
-#define ASM_NOP8 _ASM_MK_NOP(K8_NOP8)
-#define ASM_NOP5_ATOMIC _ASM_MK_NOP(K8_NOP5_ATOMIC)
-#else
-#define ASM_NOP1 _ASM_MK_NOP(GENERIC_NOP1)
-#define ASM_NOP2 _ASM_MK_NOP(GENERIC_NOP2)
-#define ASM_NOP3 _ASM_MK_NOP(GENERIC_NOP3)
-#define ASM_NOP4 _ASM_MK_NOP(GENERIC_NOP4)
-#define ASM_NOP5 _ASM_MK_NOP(GENERIC_NOP5)
-#define ASM_NOP6 _ASM_MK_NOP(GENERIC_NOP6)
-#define ASM_NOP7 _ASM_MK_NOP(GENERIC_NOP7)
-#define ASM_NOP8 _ASM_MK_NOP(GENERIC_NOP8)
-#define ASM_NOP5_ATOMIC _ASM_MK_NOP(GENERIC_NOP5_ATOMIC)
-#endif
+#define ASM_NOP1 _ASM_MK_NOP(BYTES_NOP1)
+#define ASM_NOP2 _ASM_MK_NOP(BYTES_NOP2)
+#define ASM_NOP3 _ASM_MK_NOP(BYTES_NOP3)
+#define ASM_NOP4 _ASM_MK_NOP(BYTES_NOP4)
+#define ASM_NOP5 _ASM_MK_NOP(BYTES_NOP5)
+#define ASM_NOP6 _ASM_MK_NOP(BYTES_NOP6)
+#define ASM_NOP7 _ASM_MK_NOP(BYTES_NOP7)
+#define ASM_NOP8 _ASM_MK_NOP(BYTES_NOP8)
 
 #define ASM_NOP_MAX 8
-#define NOP_ATOMIC5 (ASM_NOP_MAX+1)	/* Entry for the 5-byte atomic NOP */
 
 #ifndef __ASSEMBLY__
-extern const unsigned char * const *ideal_nops;
-extern void arch_init_ideal_nops(void);
+extern const unsigned char * const x86_nops[];
 #endif
 
 #endif /* _ASM_X86_NOPS_H */
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -214,7 +214,7 @@ static inline void clflush(volatile void
 
 static inline void clflushopt(volatile void *__p)
 {
-	alternative_io(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
+	alternative_io(".byte 0x3e; clflush %P0",
 		       ".byte 0x66; clflush %P0",
 		       X86_FEATURE_CLFLUSHOPT,
 		       "+m" (*(volatile char __force *)__p));
@@ -225,7 +225,7 @@ static inline void clwb(volatile void *_
 	volatile struct { char x[64]; } *p = __p;
 
 	asm volatile(ALTERNATIVE_2(
-		".byte " __stringify(NOP_DS_PREFIX) "; clflush (%[pax])",
+		".byte 0x3e; clflush (%[pax])",
 		".byte 0x66; clflush (%[pax])", /* clflushopt (%%rax) */
 		X86_FEATURE_CLFLUSHOPT,
 		".byte 0x66, 0x0f, 0xae, 0x30",  /* clwb (%%rax) */
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -74,186 +74,30 @@ do {									\
 	}								\
 } while (0)
 
-/*
- * Each GENERIC_NOPX is of X bytes, and defined as an array of bytes
- * that correspond to that nop. Getting from one nop to the next, we
- * add to the array the offset that is equal to the sum of all sizes of
- * nops preceding the one we are after.
- *
- * Note: The GENERIC_NOP5_ATOMIC is at the end, as it breaks the
- * nice symmetry of sizes of the previous nops.
- */
-#if defined(GENERIC_NOP1) && !defined(CONFIG_X86_64)
-static const unsigned char intelnops[] =
+const unsigned char x86nops[] =
 {
-	GENERIC_NOP1,
-	GENERIC_NOP2,
-	GENERIC_NOP3,
-	GENERIC_NOP4,
-	GENERIC_NOP5,
-	GENERIC_NOP6,
-	GENERIC_NOP7,
-	GENERIC_NOP8,
-	GENERIC_NOP5_ATOMIC
-};
-static const unsigned char * const intel_nops[ASM_NOP_MAX+2] =
-{
-	NULL,
-	intelnops,
-	intelnops + 1,
-	intelnops + 1 + 2,
-	intelnops + 1 + 2 + 3,
-	intelnops + 1 + 2 + 3 + 4,
-	intelnops + 1 + 2 + 3 + 4 + 5,
-	intelnops + 1 + 2 + 3 + 4 + 5 + 6,
-	intelnops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
-	intelnops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8,
+	BYTES_NOP1,
+	BYTES_NOP2,
+	BYTES_NOP3,
+	BYTES_NOP4,
+	BYTES_NOP5,
+	BYTES_NOP6,
+	BYTES_NOP7,
+	BYTES_NOP8,
 };
-#endif
 
-#ifdef K8_NOP1
-static const unsigned char k8nops[] =
-{
-	K8_NOP1,
-	K8_NOP2,
-	K8_NOP3,
-	K8_NOP4,
-	K8_NOP5,
-	K8_NOP6,
-	K8_NOP7,
-	K8_NOP8,
-	K8_NOP5_ATOMIC
-};
-static const unsigned char * const k8_nops[ASM_NOP_MAX+2] =
-{
-	NULL,
-	k8nops,
-	k8nops + 1,
-	k8nops + 1 + 2,
-	k8nops + 1 + 2 + 3,
-	k8nops + 1 + 2 + 3 + 4,
-	k8nops + 1 + 2 + 3 + 4 + 5,
-	k8nops + 1 + 2 + 3 + 4 + 5 + 6,
-	k8nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
-	k8nops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8,
-};
-#endif
-
-#if defined(K7_NOP1) && !defined(CONFIG_X86_64)
-static const unsigned char k7nops[] =
-{
-	K7_NOP1,
-	K7_NOP2,
-	K7_NOP3,
-	K7_NOP4,
-	K7_NOP5,
-	K7_NOP6,
-	K7_NOP7,
-	K7_NOP8,
-	K7_NOP5_ATOMIC
-};
-static const unsigned char * const k7_nops[ASM_NOP_MAX+2] =
-{
-	NULL,
-	k7nops,
-	k7nops + 1,
-	k7nops + 1 + 2,
-	k7nops + 1 + 2 + 3,
-	k7nops + 1 + 2 + 3 + 4,
-	k7nops + 1 + 2 + 3 + 4 + 5,
-	k7nops + 1 + 2 + 3 + 4 + 5 + 6,
-	k7nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
-	k7nops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8,
-};
-#endif
-
-#ifdef P6_NOP1
-static const unsigned char p6nops[] =
-{
-	P6_NOP1,
-	P6_NOP2,
-	P6_NOP3,
-	P6_NOP4,
-	P6_NOP5,
-	P6_NOP6,
-	P6_NOP7,
-	P6_NOP8,
-	P6_NOP5_ATOMIC
-};
-static const unsigned char * const p6_nops[ASM_NOP_MAX+2] =
+const unsigned char * const x86_nops[ASM_NOP_MAX+1] =
 {
 	NULL,
-	p6nops,
-	p6nops + 1,
-	p6nops + 1 + 2,
-	p6nops + 1 + 2 + 3,
-	p6nops + 1 + 2 + 3 + 4,
-	p6nops + 1 + 2 + 3 + 4 + 5,
-	p6nops + 1 + 2 + 3 + 4 + 5 + 6,
-	p6nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
-	p6nops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8,
+	x86nops,
+	x86nops + 1,
+	x86nops + 1 + 2,
+	x86nops + 1 + 2 + 3,
+	x86nops + 1 + 2 + 3 + 4,
+	x86nops + 1 + 2 + 3 + 4 + 5,
+	x86nops + 1 + 2 + 3 + 4 + 5 + 6,
+	x86nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
 };
-#endif
-
-/* Initialize these to a safe default */
-#ifdef CONFIG_X86_64
-const unsigned char * const *ideal_nops = p6_nops;
-#else
-const unsigned char * const *ideal_nops = intel_nops;
-#endif
-
-void __init arch_init_ideal_nops(void)
-{
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_INTEL:
-		/*
-		 * Due to a decoder implementation quirk, some
-		 * specific Intel CPUs actually perform better with
-		 * the "k8_nops" than with the SDM-recommended NOPs.
-		 */
-		if (boot_cpu_data.x86 == 6 &&
-		    boot_cpu_data.x86_model >= 0x0f &&
-		    boot_cpu_data.x86_model != 0x1c &&
-		    boot_cpu_data.x86_model != 0x26 &&
-		    boot_cpu_data.x86_model != 0x27 &&
-		    boot_cpu_data.x86_model < 0x30) {
-			ideal_nops = k8_nops;
-		} else if (boot_cpu_has(X86_FEATURE_NOPL)) {
-			   ideal_nops = p6_nops;
-		} else {
-#ifdef CONFIG_X86_64
-			ideal_nops = k8_nops;
-#else
-			ideal_nops = intel_nops;
-#endif
-		}
-		break;
-
-	case X86_VENDOR_HYGON:
-		ideal_nops = p6_nops;
-		return;
-
-	case X86_VENDOR_AMD:
-		if (boot_cpu_data.x86 > 0xf) {
-			ideal_nops = p6_nops;
-			return;
-		}
-
-		fallthrough;
-
-	default:
-#ifdef CONFIG_X86_64
-		ideal_nops = k8_nops;
-#else
-		if (boot_cpu_has(X86_FEATURE_K8))
-			ideal_nops = k8_nops;
-		else if (boot_cpu_has(X86_FEATURE_K7))
-			ideal_nops = k7_nops;
-		else
-			ideal_nops = intel_nops;
-#endif
-	}
-}
 
 /* Use this to add nops to a buffer, then text_poke the whole buffer. */
 static void __init_or_module add_nops(void *insns, unsigned int len)
@@ -262,7 +106,7 @@ static void __init_or_module add_nops(vo
 		unsigned int noplen = len;
 		if (noplen > ASM_NOP_MAX)
 			noplen = ASM_NOP_MAX;
-		memcpy(insns, ideal_nops[noplen], noplen);
+		memcpy(insns, x86_nops[noplen], noplen);
 		insns += noplen;
 		len -= noplen;
 	}
@@ -1302,13 +1146,13 @@ static void text_poke_loc_init(struct te
 	default: /* assume NOP */
 		switch (len) {
 		case 2: /* NOP2 -- emulate as JMP8+0 */
-			BUG_ON(memcmp(emulate, ideal_nops[len], len));
+			BUG_ON(memcmp(emulate, x86_nops[len], len));
 			tp->opcode = JMP8_INSN_OPCODE;
 			tp->rel32 = 0;
 			break;
 
 		case 5: /* NOP5 -- emulate as JMP32+0 */
-			BUG_ON(memcmp(emulate, ideal_nops[NOP_ATOMIC5], len));
+			BUG_ON(memcmp(emulate, x86_nops[len], len));
 			tp->opcode = JMP32_INSN_OPCODE;
 			tp->rel32 = 0;
 			break;
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -66,7 +66,7 @@ int ftrace_arch_code_modify_post_process
 
 static const char *ftrace_nop_replace(void)
 {
-	return ideal_nops[NOP_ATOMIC5];
+	return x86_nops[5];
 }
 
 static const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
@@ -377,7 +377,7 @@ create_trampoline(struct ftrace_ops *ops
 		ip = trampoline + (jmp_offset - start_offset);
 		if (WARN_ON(*(char *)ip != 0x75))
 			goto fail;
-		ret = copy_from_kernel_nofault(ip, ideal_nops[2], 2);
+		ret = copy_from_kernel_nofault(ip, x86_nops[2], 2);
 		if (ret < 0)
 			goto fail;
 	}
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -28,10 +28,8 @@ static void bug_at(const void *ip, int l
 }
 
 static const void *
-__jump_label_set_jump_code(struct jump_entry *entry, enum jump_label_type type, int init)
+__jump_label_set_jump_code(struct jump_entry *entry, enum jump_label_type type)
 {
-	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
-	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 	const void *expect, *code;
 	const void *addr, *dest;
 	int line;
@@ -41,10 +39,8 @@ __jump_label_set_jump_code(struct jump_e
 
 	code = text_gen_insn(JMP32_INSN_OPCODE, addr, dest);
 
-	if (init) {
-		expect = default_nop; line = __LINE__;
-	} else if (type == JUMP_LABEL_JMP) {
-		expect = ideal_nop; line = __LINE__;
+	if (type == JUMP_LABEL_JMP) {
+		expect = x86_nops[5]; line = __LINE__;
 	} else {
 		expect = code; line = __LINE__;
 	}
@@ -53,7 +49,7 @@ __jump_label_set_jump_code(struct jump_e
 		bug_at(addr, line);
 
 	if (type == JUMP_LABEL_NOP)
-		code = ideal_nop;
+		code = x86_nops[5];
 
 	return code;
 }
@@ -62,7 +58,7 @@ static inline void __jump_label_transfor
 					  enum jump_label_type type,
 					  int init)
 {
-	const void *opcode = __jump_label_set_jump_code(entry, type, init);
+	const void *opcode = __jump_label_set_jump_code(entry, type);
 
 	/*
 	 * As long as only a single processor is running and the code is still
@@ -113,7 +109,7 @@ bool arch_jump_label_transform_queue(str
 	}
 
 	mutex_lock(&text_mutex);
-	opcode = __jump_label_set_jump_code(entry, type, 0);
+	opcode = __jump_label_set_jump_code(entry, type);
 	text_poke_queue((void *)jump_entry_code(entry),
 			opcode, JUMP_LABEL_NOP_SIZE, NULL);
 	mutex_unlock(&text_mutex);
@@ -136,22 +132,6 @@ static enum {
 __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry,
 				      enum jump_label_type type)
 {
-	/*
-	 * This function is called at boot up and when modules are
-	 * first loaded. Check if the default nop, the one that is
-	 * inserted at compile time, is the ideal nop. If it is, then
-	 * we do not need to update the nop, and we can leave it as is.
-	 * If it is not, then we need to update the nop to the ideal nop.
-	 */
-	if (jlstate == JL_STATE_START) {
-		const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
-		const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
-
-		if (memcmp(ideal_nop, default_nop, 5) != 0)
-			jlstate = JL_STATE_UPDATE;
-		else
-			jlstate = JL_STATE_NO_UPDATE;
-	}
 	if (jlstate == JL_STATE_UPDATE)
 		jump_label_transform(entry, type, 1);
 }
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -229,7 +229,7 @@ __recover_probed_insn(kprobe_opcode_t *b
 		return 0UL;
 
 	if (faddr)
-		memcpy(buf, ideal_nops[NOP_ATOMIC5], 5);
+		memcpy(buf, x86_nops[5], 5);
 	else
 		buf[0] = kp->opcode;
 	return (unsigned long)buf;
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -822,7 +822,6 @@ void __init setup_arch(char **cmdline_p)
 
 	idt_setup_early_traps();
 	early_cpu_init();
-	arch_init_ideal_nops();
 	jump_label_init();
 	static_call_init();
 	early_ioremap_init();
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -34,7 +34,7 @@ static void __ref __static_call_transfor
 		break;
 
 	case NOP:
-		code = ideal_nops[NOP_ATOMIC5];
+		code = x86_nops[5];
 		break;
 
 	case JMP:
@@ -66,7 +66,7 @@ static void __static_call_validate(void
 			return;
 	} else {
 		if (opcode == CALL_INSN_OPCODE ||
-		    !memcmp(insn, ideal_nops[NOP_ATOMIC5], 5) ||
+		    !memcmp(insn, x86_nops[5], 5) ||
 		    !memcmp(insn, xor5rax, 5))
 			return;
 	}
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -282,7 +282,7 @@ static void emit_prologue(u8 **pprog, u3
 	/* BPF trampoline can be made to work without these nops,
 	 * but let's waste 5 bytes for now and optimize later
 	 */
-	memcpy(prog, ideal_nops[NOP_ATOMIC5], cnt);
+	memcpy(prog, x86_nops[5], cnt);
 	prog += cnt;
 	if (!ebpf_from_cbpf) {
 		if (tail_call_reachable && !is_subprog)
@@ -330,7 +330,7 @@ static int __bpf_arch_text_poke(void *ip
 				void *old_addr, void *new_addr,
 				const bool text_live)
 {
-	const u8 *nop_insn = ideal_nops[NOP_ATOMIC5];
+	const u8 *nop_insn = x86_nops[5];
 	u8 old_insn[X86_PATCH_SIZE];
 	u8 new_insn[X86_PATCH_SIZE];
 	u8 *prog;
@@ -560,7 +560,7 @@ static void emit_bpf_tail_call_direct(st
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
 
-	memcpy(prog, ideal_nops[NOP_ATOMIC5], X86_PATCH_SIZE);
+	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
 	prog += X86_PATCH_SIZE;
 	/* out: */
 
@@ -881,7 +881,7 @@ static int emit_nops(u8 **pprog, int len
 			noplen = ASM_NOP_MAX;
 
 		for (i = 0; i < noplen; i++)
-			EMIT1(ideal_nops[noplen][i]);
+			EMIT1(x86_nops[noplen][i]);
 		len -= noplen;
 	}
 



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 2/2] objtool,x86: Use asm/nops.h
  2021-03-12 11:32 [PATCH 0/2] x86: Remove ideal_nops[] Peter Zijlstra
  2021-03-12 11:32 ` [PATCH 1/2] x86: Remove dynamic NOP selection Peter Zijlstra
@ 2021-03-12 11:32 ` Peter Zijlstra
  2021-03-12 14:29 ` [PATCH 0/2] x86: Remove ideal_nops[] Sedat Dilek
  2021-03-12 20:59 ` Borislav Petkov
  3 siblings, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2021-03-12 11:32 UTC (permalink / raw)
  To: x86, rostedt, hpa, torvalds
  Cc: linux-kernel, linux-toolchains, peterz, jpoimboe,
	alexei.starovoitov, mhiramat

Since the kernel will rely on a single canonical set of NOPs, make
sure objtool uses the exact same ones.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/arch/x86/include/asm/nops.h |   81 ++++++++++++++++++++++++++++++++++++++
 tools/objtool/arch/x86/decode.c   |   13 +++---
 tools/objtool/sync-check.sh       |    1 
 3 files changed, 90 insertions(+), 5 deletions(-)

--- /dev/null
+++ b/tools/arch/x86/include/asm/nops.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_NOPS_H
+#define _ASM_X86_NOPS_H
+
+/*
+ * Define nops for use with alternative() and for tracing.
+ */
+
+#ifndef CONFIG_64BIT
+
+/*
+ * Generic 32bit nops from GAS:
+ *
+ * 1: nop
+ * 2: movl %esi,%esi
+ * 3: leal 0x0(%esi),%esi
+ * 4: leal 0x0(%esi,%eiz,1),%esi
+ * 5: leal %ds:0x0(%esi,%eiz,1),%esi
+ * 6: leal 0x0(%esi),%esi
+ * 7: leal 0x0(%esi,%eiz,1),%esi
+ * 8: leal %ds:0x0(%esi,%eiz,1),%esi
+ *
+ * Except 5 and 8, which are DS prefixed 4 and 7 resp, where GAS would emit 2
+ * nop instructions.
+ */
+#define BYTES_NOP1	0x90
+#define BYTES_NOP2	0x89,0xf6
+#define BYTES_NOP3	0x8d,0x76,0x00
+#define BYTES_NOP4	0x8d,0x74,0x26,0x00
+#define BYTES_NOP5	0x3e,BYTES_NOP4
+#define BYTES_NOP6	0x8d,0xb6,0x00,0x00,0x00,0x00
+#define BYTES_NOP7	0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
+#define BYTES_NOP8	0x3e,BYTES_NOP7
+
+#else
+
+/*
+ * Generic 64bit nops from GAS:
+ *
+ * 1: nop
+ * 2: osp nop
+ * 3: nopl (%eax)
+ * 4: nopl 0x00(%eax)
+ * 5: nopl 0x00(%eax,%eax,1)
+ * 6: osp nopl 0x00(%eax,%eax,1)
+ * 7: nopl 0x00000000(%eax)
+ * 8: nopl 0x00000000(%eax,%eax,1)
+ */
+#define BYTES_NOP1	0x90
+#define BYTES_NOP2	0x66,BYTES_NOP1
+#define BYTES_NOP3	0x0f,0x1f,0x00
+#define BYTES_NOP4	0x0f,0x1f,0x40,0x00
+#define BYTES_NOP5	0x0f,0x1f,0x44,0x00,0x00
+#define BYTES_NOP6	0x66,BYTES_NOP5
+#define BYTES_NOP7	0x0f,0x1f,0x80,0x00,0x00,0x00,0x00
+#define BYTES_NOP8	0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
+
+#endif /* CONFIG_64BIT */
+
+#ifdef __ASSEMBLY__
+#define _ASM_MK_NOP(x) .byte x
+#else
+#define _ASM_MK_NOP(x) ".byte " __stringify(x) "\n"
+#endif
+
+#define ASM_NOP1 _ASM_MK_NOP(BYTES_NOP1)
+#define ASM_NOP2 _ASM_MK_NOP(BYTES_NOP2)
+#define ASM_NOP3 _ASM_MK_NOP(BYTES_NOP3)
+#define ASM_NOP4 _ASM_MK_NOP(BYTES_NOP4)
+#define ASM_NOP5 _ASM_MK_NOP(BYTES_NOP5)
+#define ASM_NOP6 _ASM_MK_NOP(BYTES_NOP6)
+#define ASM_NOP7 _ASM_MK_NOP(BYTES_NOP7)
+#define ASM_NOP8 _ASM_MK_NOP(BYTES_NOP8)
+
+#define ASM_NOP_MAX 8
+
+#ifndef __ASSEMBLY__
+extern const unsigned char * const x86_nops[];
+#endif
+
+#endif /* _ASM_X86_NOPS_H */
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -11,6 +11,9 @@
 #include "../../../arch/x86/lib/inat.c"
 #include "../../../arch/x86/lib/insn.c"
 
+#define CONFIG_64BIT 1
+#include <asm/nops.h>
+
 #include <asm/orc_types.h>
 #include <objtool/check.h>
 #include <objtool/elf.h>
@@ -640,11 +643,11 @@ void arch_initial_func_cfi_state(struct
 const char *arch_nop_insn(int len)
 {
 	static const char nops[5][5] = {
-		/* 1 */ { 0x90 },
-		/* 2 */ { 0x66, 0x90 },
-		/* 3 */ { 0x0f, 0x1f, 0x00 },
-		/* 4 */ { 0x0f, 0x1f, 0x40, 0x00 },
-		/* 5 */ { 0x0f, 0x1f, 0x44, 0x00, 0x00 },
+		{ BYTES_NOP1 },
+		{ BYTES_NOP2 },
+		{ BYTES_NOP3 },
+		{ BYTES_NOP4 },
+		{ BYTES_NOP5 },
 	};
 
 	if (len < 1 || len > 5) {
--- a/tools/objtool/sync-check.sh
+++ b/tools/objtool/sync-check.sh
@@ -10,6 +10,7 @@ FILES="include/linux/objtool.h"
 
 if [ "$SRCARCH" = "x86" ]; then
 FILES="$FILES
+arch/x86/include/asm/nops.h
 arch/x86/include/asm/inat_types.h
 arch/x86/include/asm/orc_types.h
 arch/x86/include/asm/emulate_prefix.h



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2021-03-12 11:32 ` [PATCH 1/2] x86: Remove dynamic NOP selection Peter Zijlstra
@ 2021-03-12 12:09   ` Peter Zijlstra
  2021-03-12 20:36     ` Linus Torvalds
  2024-01-20  6:58   ` Thorsten Glaser
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2021-03-12 12:09 UTC (permalink / raw)
  To: x86, rostedt, hpa, torvalds
  Cc: linux-kernel, linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Fri, Mar 12, 2021 at 12:32:54PM +0100, Peter Zijlstra wrote:
> +#ifndef CONFIG_64BIT
> +
> +/*
> + * Generic 32bit nops from GAS:
> + *
> + * 1: nop
> + * 2: movl %esi,%esi
> + * 3: leal 0x0(%esi),%esi
> + * 4: leal 0x0(%esi,%eiz,1),%esi
> + * 5: leal %ds:0x0(%esi,%eiz,1),%esi
> + * 6: leal 0x0(%esi),%esi
> + * 7: leal 0x0(%esi,%eiz,1),%esi
> + * 8: leal %ds:0x0(%esi,%eiz,1),%esi
>   *
> + * Except 5 and 8, which are DS prefixed 4 and 7 resp, where GAS would emit 2
> + * nop instructions.
>   */
> +#define BYTES_NOP1	0x90
> +#define BYTES_NOP2	0x89,0xf6
> +#define BYTES_NOP3	0x8d,0x76,0x00
> +#define BYTES_NOP4	0x8d,0x74,0x26,0x00
> +#define BYTES_NOP5	0x3e,BYTES_NOP4
> +#define BYTES_NOP6	0x8d,0xb6,0x00,0x00,0x00,0x00
> +#define BYTES_NOP7	0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
> +#define BYTES_NOP8	0x3e,BYTES_NOP7
> +
> +#else
>  
> +/*
> + * Generic 64bit nops from GAS:
> + *
> + * 1: nop
> + * 2: osp nop
> + * 3: nopl (%eax)
> + * 4: nopl 0x00(%eax)
> + * 5: nopl 0x00(%eax,%eax,1)
> + * 6: osp nopl 0x00(%eax,%eax,1)
> + * 7: nopl 0x00000000(%eax)
> + * 8: nopl 0x00000000(%eax,%eax,1)
> + */
> +#define BYTES_NOP1	0x90
> +#define BYTES_NOP2	0x66,BYTES_NOP1
> +#define BYTES_NOP3	0x0f,0x1f,0x00
> +#define BYTES_NOP4	0x0f,0x1f,0x40,0x00
> +#define BYTES_NOP5	0x0f,0x1f,0x44,0x00,0x00
> +#define BYTES_NOP6	0x66,BYTES_NOP5
> +#define BYTES_NOP7	0x0f,0x1f,0x80,0x00,0x00,0x00,0x00
> +#define BYTES_NOP8	0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
>  
> +#endif /* CONFIG_64BIT */

Note that this also made all NOPs single instructions and removed the
special atomic nop.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-12 11:32 [PATCH 0/2] x86: Remove ideal_nops[] Peter Zijlstra
  2021-03-12 11:32 ` [PATCH 1/2] x86: Remove dynamic NOP selection Peter Zijlstra
  2021-03-12 11:32 ` [PATCH 2/2] objtool,x86: Use asm/nops.h Peter Zijlstra
@ 2021-03-12 14:29 ` Sedat Dilek
  2021-03-12 14:47   ` Borislav Petkov
  2021-03-12 20:59 ` Borislav Petkov
  3 siblings, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-12 14:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, rostedt, hpa, torvalds, linux-kernel, linux-toolchains,
	jpoimboe, alexei.starovoitov, mhiramat

On Fri, Mar 12, 2021 at 1:00 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Hi!
>
> A while ago Steve complained about x86 being weird for having different NOPs [1]
>
> Having cursed the same thing before, I figured it was time to look at the NOP
> situation.
>
> 32bit simply isn't a performance target anymore, so all we need is a set of
> NOPs that works on all.
>
> x86_64 has two main NOP variants, NOPL and prefix NOP. NOPL was introduced by
> P6 and is architecturally mandated for x86_64. However, some uarchs made the
> choice to limit NOPL decoding to a single port, which obviously limits NOPL
> throughput. Other uarchs have (severe) decoding penalties for excessive (>~3)
> prefixes, hobbling prefix NOP throughput.
>
> But the thing is, all the modern uarchs can handle both without issue; that is
> AMD K10 (2007) and later and Intel Ivy Bridge (2012) and later. The only
> exception is Atom, which has the prefix penalty.
>
> Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
> simply irrelevant today, remove variable NOPs and use NOPL.
>

Hi Peter,

I am an Intel SandyBridge power user and want the ultimate performance
on my hardware.

What does this change exactly mean to/for me?

I got this laptop as the last gift for my birthday in 2012 from my mother.
She died the same year.
So, this is a bit sentimental hardware for me.

It's amazing what this laptop all was involved in.
10+ years of LLVM/Clang for Linux-kernel and Linux graphics stack.
Worked in a Ubuntu/precise 12.04 LTS WUBI (installation) environment -
5 years (full LTS period) long!
How many Linux-kernel bugs got reported and/or fixed...
Debian/stretch...Debian/bullseye with no fresh installation. Rolling release.

I remember my decision in March 2012 not to choose that Asus notebook
with the first hardware-revision of IvyBridge and bought
conservatively a SandyBridge Gen. 2 Samsung notebook.

It's a pity to see no or restricted/limited Vulkan support.

If you are not concerned - life goes on for you.

It's like being white colored not understanding what "Black Lives
Matter" really means.
If people use or talk about white/black listings then allow/deny lists.
Or being a female software developer having a 10-15% less salary
because you are not male - in the same department!
This week we had our 100th anniversary of International Women's Day.
I am not black - I am male - I am not concerned - Live goes on?

Again, this machine is able to do fast Linux-kernel builds with an
adapted Debian Linux v5.10 kernel-config.
If you do NOT use Debian's LLVM/Clang - means build a selfmade
stage1-only LLVM toolchain (saves ~1 hour of build-time) - or a
ThinLTO+PGO optimized LLVM toolchain (saves again ~1 hour of
build-time).
Latest Linus Git plus With Clang-CFI took me today approx. 04:20
[hh:mm] with a selfmade stage1-only LLVM toolchain version 12.0.0-rc3.
Again, this is amazing.

What I wanna try to say is:
This is old hardware but you can - if you are a smart enough -
optimize your builds.

On the other hand I can understand dropping support for XXX whatever hardware...
Where is the limit(ation):
Support 10 years or 7 years old hardware?

Sorry, I am a bit concerned that this is the beginning - or a backdoor
? - to drop (optimized) Intel SandyBridge support.

So, what do I need to do - to have "ultimate performance" back for
SandyBridge with your patchset :-)?

Yes, you are right: Life goes on.

Regards,
- Sedat -


> This gives us deterministic NOPs and restores sanity.
>
>
>
> [1] https://lkml.kernel.org/r/20210302105827.3403656c@gandalf.local.home
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-12 14:29 ` [PATCH 0/2] x86: Remove ideal_nops[] Sedat Dilek
@ 2021-03-12 14:47   ` Borislav Petkov
  2021-03-12 17:26     ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Borislav Petkov @ 2021-03-12 14:47 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Fri, Mar 12, 2021 at 03:29:48PM +0100, Sedat Dilek wrote:
> What does this change exactly mean to/for me?

Probably nothing.

I would be very surprised if it would be at all noticeable for you -
it's not like the kernel is executing long streams of NOPs in fast
paths.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-12 14:47   ` Borislav Petkov
@ 2021-03-12 17:26     ` Steven Rostedt
  2021-03-12 17:35       ` Sedat Dilek
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2021-03-12 17:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sedat Dilek, Peter Zijlstra, x86, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Fri, 12 Mar 2021 15:47:26 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Fri, Mar 12, 2021 at 03:29:48PM +0100, Sedat Dilek wrote:
> > What does this change exactly mean to/for me?  
> 
> Probably nothing.
> 
> I would be very surprised if it would be at all noticeable for you -
> it's not like the kernel is executing long streams of NOPs in fast
> paths.
> 

With ftrace enabled, every function starts with a NOP. But that said, the
simple answer is for Sedat to apply the patches on his box and do some
performance testing. It doesn't matter if you are white, black, male,
female, or anything in between. As my daughter's swim coach said; it's the
numbers that matter here. Run a bunch of benchmarks on your box on the
latest kernel, apply Peter's patches, and then run the benchmarks again on
the latest kernel with Peter's patches and then report the difference. If
it's negligible then there's nothing to worry about.

-- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-12 17:26     ` Steven Rostedt
@ 2021-03-12 17:35       ` Sedat Dilek
  2021-03-12 17:46         ` Borislav Petkov
  2021-03-12 17:47         ` Steven Rostedt
  0 siblings, 2 replies; 51+ messages in thread
From: Sedat Dilek @ 2021-03-12 17:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Borislav Petkov, Peter Zijlstra, x86, hpa, torvalds,
	linux-kernel, linux-toolchains, jpoimboe, alexei.starovoitov,
	mhiramat

On Fri, Mar 12, 2021 at 6:26 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 12 Mar 2021 15:47:26 +0100
> Borislav Petkov <bp@alien8.de> wrote:
>
> > On Fri, Mar 12, 2021 at 03:29:48PM +0100, Sedat Dilek wrote:
> > > What does this change exactly mean to/for me?
> >
> > Probably nothing.
> >
> > I would be very surprised if it would be at all noticeable for you -
> > it's not like the kernel is executing long streams of NOPs in fast
> > paths.
> >
>
> With ftrace enabled, every function starts with a NOP. But that said, the
> simple answer is for Sedat to apply the patches on his box and do some
> performance testing. It doesn't matter if you are white, black, male,
> female, or anything in between. As my daughter's swim coach said; it's the
> numbers that matter here. Run a bunch of benchmarks on your box on the
> latest kernel, apply Peter's patches, and then run the benchmarks again on
> the latest kernel with Peter's patches and then report the difference. If
> it's negligible then there's nothing to worry about.
>

Hey Steve, you degraded me to a number :-).

I dunno which Git tree this patchset applies to, but I check if I can
apply the patchset to my current local Git.
Then build a kernel in the same build-environment.
Lemme see.

To say with Linus's words:
"Numbers talk - bullshit walks."

- Sedat -

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-12 17:35       ` Sedat Dilek
@ 2021-03-12 17:46         ` Borislav Petkov
  2021-03-12 17:47         ` Steven Rostedt
  1 sibling, 0 replies; 51+ messages in thread
From: Borislav Petkov @ 2021-03-12 17:46 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Steven Rostedt, Peter Zijlstra, x86, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Fri, Mar 12, 2021 at 06:35:45PM +0100, Sedat Dilek wrote:
> Hey Steve, you degraded me to a number :-).

How did he degrade you to a number?! Actually, he went the length to
patiently explain what you could do.

> I dunno which Git tree this patchset applies to, but I check if I can

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

master branch.

> apply the patchset to my current local Git.
> Then build a kernel in the same build-environment.

Yes, what Steve said. You can run some benchmarks and compare
before/after numbers.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-12 17:35       ` Sedat Dilek
  2021-03-12 17:46         ` Borislav Petkov
@ 2021-03-12 17:47         ` Steven Rostedt
  2021-03-12 18:13           ` Sedat Dilek
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2021-03-12 17:47 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Borislav Petkov, Peter Zijlstra, x86, hpa, torvalds,
	linux-kernel, linux-toolchains, jpoimboe, alexei.starovoitov,
	mhiramat

On Fri, 12 Mar 2021 18:35:45 +0100
Sedat Dilek <sedat.dilek@gmail.com> wrote:


> Hey Steve, you degraded me to a number :-).

It's the internet, everyone is a number.

> 
> I dunno which Git tree this patchset applies to, but I check if I can
> apply the patchset to my current local Git.

Try Linus's latest.

> Then build a kernel in the same build-environment.
> Lemme see.
> 
> To say with Linus's words:
> "Numbers talk - bullshit walks."

Exactly.

-- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-12 17:47         ` Steven Rostedt
@ 2021-03-12 18:13           ` Sedat Dilek
  2021-03-12 19:03             ` Sedat Dilek
  0 siblings, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-12 18:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Borislav Petkov, Peter Zijlstra, x86, hpa, torvalds,
	linux-kernel, linux-toolchains, jpoimboe, alexei.starovoitov,
	mhiramat

[-- Attachment #1: Type: text/plain, Size: 730 bytes --]

On Fri, Mar 12, 2021 at 6:47 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 12 Mar 2021 18:35:45 +0100
> Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
>
> > Hey Steve, you degraded me to a number :-).
>
> It's the internet, everyone is a number.
>
> >
> > I dunno which Git tree this patchset applies to, but I check if I can
> > apply the patchset to my current local Git.
>
> Try Linus's latest.
>

$ git describe origin/HEAD
v5.12-rc2-338-gf78d76e72a46

I adapted 1/2 in arch/x86/include/asm/jump_label.h to fit ^^^, see attachment.

- Sedat -




> > Then build a kernel in the same build-environment.
> > Lemme see.
> >
> > To say with Linus's words:
> > "Numbers talk - bullshit walks."
>
> Exactly.
>
> -- Steve

[-- Attachment #2: 20210312_peterz_x86_remove_ideal_nops-dileks-v2.mbx --]
[-- Type: application/octet-stream, Size: 26033 bytes --]

From MAILER-DAEMON Fri Mar 12 17:46:39 2021
Message-ID: <20210312115749.065275711@infradead.org>
Date: Fri, 12 Mar 2021 12:32:54 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: x86@kernel.org, rostedt@goodmis.org, hpa@zytor.com, torvalds@linuxfoundation.org
Cc: linux-kernel@vger.kernel.org, linux-toolchains@vger.kernel.org, peterz@infradead.org, jpoimboe@redhat.com, alexei.starovoitov@gmail.com, mhiramat@kernel.org
Subject: [PATCH 1/2] x86: Remove dynamic NOP selection
References: <20210312113253.305040674@infradead.org>
List-ID: <linux-toolchains.vger.kernel.org>
X-Mailing-List: linux-toolchains@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

This ensures that a NOP is a NOP and not a random other instruction
that is also a NOP. It allows simplification of dynamic code patching
that wants to verify existing code before writing new instructions
(ftrace, jump_label, static_call, etc..).

Differentiating on NOPs is not a feature.

This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs
(CARELESS). 32bit is not a performance target.

Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is
fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL
being required for x86_64, all x86_64 CPUs can use NOPL. So stop
caring about NOPs, simplify things and get on with life.

[ The problem seems to be that some uarchs can only decode NOPL on a
single front-end port while others have severe decode penalties for
excessive prefixes. All modern uarchs can handle both, except Atom,
which has prefix penalties. ]

[ Also, much doubt you can actually measure any of this on normal
workloads. ]

After this FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI and FEATURE_K7 is unused.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> # bpf
---
Changes v1->v2 (dileks):
- Adapt to fit Linux v5.12-rc2-338-gf78d76e72a46

 arch/x86/include/asm/jump_label.h    |   12 --
 arch/x86/include/asm/nops.h          |  180 ++++++++++---------------------
 arch/x86/include/asm/special_insns.h |    4 
 arch/x86/kernel/alternative.c        |  198 +++--------------------------------
 arch/x86/kernel/ftrace.c             |    4 
 arch/x86/kernel/jump_label.c         |   32 +----
 arch/x86/kernel/kprobes/core.c       |    2 
 arch/x86/kernel/setup.c              |    1 
 arch/x86/kernel/static_call.c        |    4 
 arch/x86/net/bpf_jit_comp.c          |    8 -
 10 files changed, 98 insertions(+), 347 deletions(-)

--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -6,12 +6,6 @@
 
 #define JUMP_LABEL_NOP_SIZE 5
 
-#ifdef CONFIG_X86_64
-# define STATIC_KEY_INIT_NOP P6_NOP5_ATOMIC
-#else
-# define STATIC_KEY_INIT_NOP GENERIC_NOP5_ATOMIC
-#endif
-
 #include <asm/asm.h>
 #include <asm/nops.h>
 
@@ -23,7 +17,7 @@
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
 	asm_volatile_goto("1:"
-		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
+		".byte " __stringify(BYTES_NOP5) "\n\t"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
 		".long 1b - ., %l[l_yes] - . \n\t"
@@ -63,7 +57,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 	.long		\target - .Lstatic_jump_after_\@
 .Lstatic_jump_after_\@:
 	.else
-	.byte		STATIC_KEY_INIT_NOP
+	.byte		BYTES_NOP5
 	.endif
 	.pushsection __jump_table, "aw"
 	_ASM_ALIGN
@@ -75,7 +69,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 .macro STATIC_JUMP_IF_FALSE target, key, def
 .Lstatic_jump_\@:
 	.if \def
-	.byte		STATIC_KEY_INIT_NOP
+	.byte		BYTES_NOP5
 	.else
 	/* Equivalent to "jmp.d32 \target" */
 	.byte		0xe9
--- a/arch/x86/include/asm/nops.h
+++ b/arch/x86/include/asm/nops.h
@@ -4,89 +4,58 @@
 
 /*
  * Define nops for use with alternative() and for tracing.
+ */
+
+#ifndef CONFIG_64BIT
+
+/*
+ * Generic 32bit nops from GAS:
+ *
+ * 1: nop
+ * 2: movl %esi,%esi
+ * 3: leal 0x0(%esi),%esi
+ * 4: leal 0x0(%esi,%eiz,1),%esi
+ * 5: leal %ds:0x0(%esi,%eiz,1),%esi
+ * 6: leal 0x0(%esi),%esi
+ * 7: leal 0x0(%esi,%eiz,1),%esi
+ * 8: leal %ds:0x0(%esi,%eiz,1),%esi
  *
- * *_NOP5_ATOMIC must be a single instruction.
+ * Except 5 and 8, which are DS prefixed 4 and 7 resp, where GAS would emit 2
+ * nop instructions.
  */
+#define BYTES_NOP1	0x90
+#define BYTES_NOP2	0x89,0xf6
+#define BYTES_NOP3	0x8d,0x76,0x00
+#define BYTES_NOP4	0x8d,0x74,0x26,0x00
+#define BYTES_NOP5	0x3e,BYTES_NOP4
+#define BYTES_NOP6	0x8d,0xb6,0x00,0x00,0x00,0x00
+#define BYTES_NOP7	0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
+#define BYTES_NOP8	0x3e,BYTES_NOP7
+
+#else
 
-#define NOP_DS_PREFIX 0x3e
+/*
+ * Generic 64bit nops from GAS:
+ *
+ * 1: nop
+ * 2: osp nop
+ * 3: nopl (%eax)
+ * 4: nopl 0x00(%eax)
+ * 5: nopl 0x00(%eax,%eax,1)
+ * 6: osp nopl 0x00(%eax,%eax,1)
+ * 7: nopl 0x00000000(%eax)
+ * 8: nopl 0x00000000(%eax,%eax,1)
+ */
+#define BYTES_NOP1	0x90
+#define BYTES_NOP2	0x66,BYTES_NOP1
+#define BYTES_NOP3	0x0f,0x1f,0x00
+#define BYTES_NOP4	0x0f,0x1f,0x40,0x00
+#define BYTES_NOP5	0x0f,0x1f,0x44,0x00,0x00
+#define BYTES_NOP6	0x66,BYTES_NOP5
+#define BYTES_NOP7	0x0f,0x1f,0x80,0x00,0x00,0x00,0x00
+#define BYTES_NOP8	0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
 
-/* generic versions from gas
-   1: nop
-   the following instructions are NOT nops in 64-bit mode,
-   for 64-bit mode use K8 or P6 nops instead
-   2: movl %esi,%esi
-   3: leal 0x00(%esi),%esi
-   4: leal 0x00(,%esi,1),%esi
-   6: leal 0x00000000(%esi),%esi
-   7: leal 0x00000000(,%esi,1),%esi
-*/
-#define GENERIC_NOP1 0x90
-#define GENERIC_NOP2 0x89,0xf6
-#define GENERIC_NOP3 0x8d,0x76,0x00
-#define GENERIC_NOP4 0x8d,0x74,0x26,0x00
-#define GENERIC_NOP5 GENERIC_NOP1,GENERIC_NOP4
-#define GENERIC_NOP6 0x8d,0xb6,0x00,0x00,0x00,0x00
-#define GENERIC_NOP7 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
-#define GENERIC_NOP8 GENERIC_NOP1,GENERIC_NOP7
-#define GENERIC_NOP5_ATOMIC NOP_DS_PREFIX,GENERIC_NOP4
-
-/* Opteron 64bit nops
-   1: nop
-   2: osp nop
-   3: osp osp nop
-   4: osp osp osp nop
-*/
-#define K8_NOP1 GENERIC_NOP1
-#define K8_NOP2	0x66,K8_NOP1
-#define K8_NOP3	0x66,K8_NOP2
-#define K8_NOP4	0x66,K8_NOP3
-#define K8_NOP5	K8_NOP3,K8_NOP2
-#define K8_NOP6	K8_NOP3,K8_NOP3
-#define K8_NOP7	K8_NOP4,K8_NOP3
-#define K8_NOP8	K8_NOP4,K8_NOP4
-#define K8_NOP5_ATOMIC 0x66,K8_NOP4
-
-/* K7 nops
-   uses eax dependencies (arbitrary choice)
-   1: nop
-   2: movl %eax,%eax
-   3: leal (,%eax,1),%eax
-   4: leal 0x00(,%eax,1),%eax
-   6: leal 0x00000000(%eax),%eax
-   7: leal 0x00000000(,%eax,1),%eax
-*/
-#define K7_NOP1	GENERIC_NOP1
-#define K7_NOP2	0x8b,0xc0
-#define K7_NOP3	0x8d,0x04,0x20
-#define K7_NOP4	0x8d,0x44,0x20,0x00
-#define K7_NOP5	K7_NOP4,K7_NOP1
-#define K7_NOP6	0x8d,0x80,0,0,0,0
-#define K7_NOP7	0x8D,0x04,0x05,0,0,0,0
-#define K7_NOP8	K7_NOP7,K7_NOP1
-#define K7_NOP5_ATOMIC NOP_DS_PREFIX,K7_NOP4
-
-/* P6 nops
-   uses eax dependencies (Intel-recommended choice)
-   1: nop
-   2: osp nop
-   3: nopl (%eax)
-   4: nopl 0x00(%eax)
-   5: nopl 0x00(%eax,%eax,1)
-   6: osp nopl 0x00(%eax,%eax,1)
-   7: nopl 0x00000000(%eax)
-   8: nopl 0x00000000(%eax,%eax,1)
-   Note: All the above are assumed to be a single instruction.
-	There is kernel code that depends on this.
-*/
-#define P6_NOP1	GENERIC_NOP1
-#define P6_NOP2	0x66,0x90
-#define P6_NOP3	0x0f,0x1f,0x00
-#define P6_NOP4	0x0f,0x1f,0x40,0
-#define P6_NOP5	0x0f,0x1f,0x44,0x00,0
-#define P6_NOP6	0x66,0x0f,0x1f,0x44,0x00,0
-#define P6_NOP7	0x0f,0x1f,0x80,0,0,0,0
-#define P6_NOP8	0x0f,0x1f,0x84,0x00,0,0,0,0
-#define P6_NOP5_ATOMIC P6_NOP5
+#endif /* CONFIG_64BIT */
 
 #ifdef __ASSEMBLY__
 #define _ASM_MK_NOP(x) .byte x
@@ -94,54 +63,19 @@
 #define _ASM_MK_NOP(x) ".byte " __stringify(x) "\n"
 #endif
 
-#if defined(CONFIG_MK7)
-#define ASM_NOP1 _ASM_MK_NOP(K7_NOP1)
-#define ASM_NOP2 _ASM_MK_NOP(K7_NOP2)
-#define ASM_NOP3 _ASM_MK_NOP(K7_NOP3)
-#define ASM_NOP4 _ASM_MK_NOP(K7_NOP4)
-#define ASM_NOP5 _ASM_MK_NOP(K7_NOP5)
-#define ASM_NOP6 _ASM_MK_NOP(K7_NOP6)
-#define ASM_NOP7 _ASM_MK_NOP(K7_NOP7)
-#define ASM_NOP8 _ASM_MK_NOP(K7_NOP8)
-#define ASM_NOP5_ATOMIC _ASM_MK_NOP(K7_NOP5_ATOMIC)
-#elif defined(CONFIG_X86_P6_NOP)
-#define ASM_NOP1 _ASM_MK_NOP(P6_NOP1)
-#define ASM_NOP2 _ASM_MK_NOP(P6_NOP2)
-#define ASM_NOP3 _ASM_MK_NOP(P6_NOP3)
-#define ASM_NOP4 _ASM_MK_NOP(P6_NOP4)
-#define ASM_NOP5 _ASM_MK_NOP(P6_NOP5)
-#define ASM_NOP6 _ASM_MK_NOP(P6_NOP6)
-#define ASM_NOP7 _ASM_MK_NOP(P6_NOP7)
-#define ASM_NOP8 _ASM_MK_NOP(P6_NOP8)
-#define ASM_NOP5_ATOMIC _ASM_MK_NOP(P6_NOP5_ATOMIC)
-#elif defined(CONFIG_X86_64)
-#define ASM_NOP1 _ASM_MK_NOP(K8_NOP1)
-#define ASM_NOP2 _ASM_MK_NOP(K8_NOP2)
-#define ASM_NOP3 _ASM_MK_NOP(K8_NOP3)
-#define ASM_NOP4 _ASM_MK_NOP(K8_NOP4)
-#define ASM_NOP5 _ASM_MK_NOP(K8_NOP5)
-#define ASM_NOP6 _ASM_MK_NOP(K8_NOP6)
-#define ASM_NOP7 _ASM_MK_NOP(K8_NOP7)
-#define ASM_NOP8 _ASM_MK_NOP(K8_NOP8)
-#define ASM_NOP5_ATOMIC _ASM_MK_NOP(K8_NOP5_ATOMIC)
-#else
-#define ASM_NOP1 _ASM_MK_NOP(GENERIC_NOP1)
-#define ASM_NOP2 _ASM_MK_NOP(GENERIC_NOP2)
-#define ASM_NOP3 _ASM_MK_NOP(GENERIC_NOP3)
-#define ASM_NOP4 _ASM_MK_NOP(GENERIC_NOP4)
-#define ASM_NOP5 _ASM_MK_NOP(GENERIC_NOP5)
-#define ASM_NOP6 _ASM_MK_NOP(GENERIC_NOP6)
-#define ASM_NOP7 _ASM_MK_NOP(GENERIC_NOP7)
-#define ASM_NOP8 _ASM_MK_NOP(GENERIC_NOP8)
-#define ASM_NOP5_ATOMIC _ASM_MK_NOP(GENERIC_NOP5_ATOMIC)
-#endif
+#define ASM_NOP1 _ASM_MK_NOP(BYTES_NOP1)
+#define ASM_NOP2 _ASM_MK_NOP(BYTES_NOP2)
+#define ASM_NOP3 _ASM_MK_NOP(BYTES_NOP3)
+#define ASM_NOP4 _ASM_MK_NOP(BYTES_NOP4)
+#define ASM_NOP5 _ASM_MK_NOP(BYTES_NOP5)
+#define ASM_NOP6 _ASM_MK_NOP(BYTES_NOP6)
+#define ASM_NOP7 _ASM_MK_NOP(BYTES_NOP7)
+#define ASM_NOP8 _ASM_MK_NOP(BYTES_NOP8)
 
 #define ASM_NOP_MAX 8
-#define NOP_ATOMIC5 (ASM_NOP_MAX+1)	/* Entry for the 5-byte atomic NOP */
 
 #ifndef __ASSEMBLY__
-extern const unsigned char * const *ideal_nops;
-extern void arch_init_ideal_nops(void);
+extern const unsigned char * const x86_nops[];
 #endif
 
 #endif /* _ASM_X86_NOPS_H */
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -214,7 +214,7 @@ static inline void clflush(volatile void
 
 static inline void clflushopt(volatile void *__p)
 {
-	alternative_io(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
+	alternative_io(".byte 0x3e; clflush %P0",
 		       ".byte 0x66; clflush %P0",
 		       X86_FEATURE_CLFLUSHOPT,
 		       "+m" (*(volatile char __force *)__p));
@@ -225,7 +225,7 @@ static inline void clwb(volatile void *_
 	volatile struct { char x[64]; } *p = __p;
 
 	asm volatile(ALTERNATIVE_2(
-		".byte " __stringify(NOP_DS_PREFIX) "; clflush (%[pax])",
+		".byte 0x3e; clflush (%[pax])",
 		".byte 0x66; clflush (%[pax])", /* clflushopt (%%rax) */
 		X86_FEATURE_CLFLUSHOPT,
 		".byte 0x66, 0x0f, 0xae, 0x30",  /* clwb (%%rax) */
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -74,186 +74,30 @@ do {									\
 	}								\
 } while (0)
 
-/*
- * Each GENERIC_NOPX is of X bytes, and defined as an array of bytes
- * that correspond to that nop. Getting from one nop to the next, we
- * add to the array the offset that is equal to the sum of all sizes of
- * nops preceding the one we are after.
- *
- * Note: The GENERIC_NOP5_ATOMIC is at the end, as it breaks the
- * nice symmetry of sizes of the previous nops.
- */
-#if defined(GENERIC_NOP1) && !defined(CONFIG_X86_64)
-static const unsigned char intelnops[] =
+const unsigned char x86nops[] =
 {
-	GENERIC_NOP1,
-	GENERIC_NOP2,
-	GENERIC_NOP3,
-	GENERIC_NOP4,
-	GENERIC_NOP5,
-	GENERIC_NOP6,
-	GENERIC_NOP7,
-	GENERIC_NOP8,
-	GENERIC_NOP5_ATOMIC
-};
-static const unsigned char * const intel_nops[ASM_NOP_MAX+2] =
-{
-	NULL,
-	intelnops,
-	intelnops + 1,
-	intelnops + 1 + 2,
-	intelnops + 1 + 2 + 3,
-	intelnops + 1 + 2 + 3 + 4,
-	intelnops + 1 + 2 + 3 + 4 + 5,
-	intelnops + 1 + 2 + 3 + 4 + 5 + 6,
-	intelnops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
-	intelnops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8,
+	BYTES_NOP1,
+	BYTES_NOP2,
+	BYTES_NOP3,
+	BYTES_NOP4,
+	BYTES_NOP5,
+	BYTES_NOP6,
+	BYTES_NOP7,
+	BYTES_NOP8,
 };
-#endif
 
-#ifdef K8_NOP1
-static const unsigned char k8nops[] =
-{
-	K8_NOP1,
-	K8_NOP2,
-	K8_NOP3,
-	K8_NOP4,
-	K8_NOP5,
-	K8_NOP6,
-	K8_NOP7,
-	K8_NOP8,
-	K8_NOP5_ATOMIC
-};
-static const unsigned char * const k8_nops[ASM_NOP_MAX+2] =
-{
-	NULL,
-	k8nops,
-	k8nops + 1,
-	k8nops + 1 + 2,
-	k8nops + 1 + 2 + 3,
-	k8nops + 1 + 2 + 3 + 4,
-	k8nops + 1 + 2 + 3 + 4 + 5,
-	k8nops + 1 + 2 + 3 + 4 + 5 + 6,
-	k8nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
-	k8nops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8,
-};
-#endif
-
-#if defined(K7_NOP1) && !defined(CONFIG_X86_64)
-static const unsigned char k7nops[] =
-{
-	K7_NOP1,
-	K7_NOP2,
-	K7_NOP3,
-	K7_NOP4,
-	K7_NOP5,
-	K7_NOP6,
-	K7_NOP7,
-	K7_NOP8,
-	K7_NOP5_ATOMIC
-};
-static const unsigned char * const k7_nops[ASM_NOP_MAX+2] =
-{
-	NULL,
-	k7nops,
-	k7nops + 1,
-	k7nops + 1 + 2,
-	k7nops + 1 + 2 + 3,
-	k7nops + 1 + 2 + 3 + 4,
-	k7nops + 1 + 2 + 3 + 4 + 5,
-	k7nops + 1 + 2 + 3 + 4 + 5 + 6,
-	k7nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
-	k7nops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8,
-};
-#endif
-
-#ifdef P6_NOP1
-static const unsigned char p6nops[] =
-{
-	P6_NOP1,
-	P6_NOP2,
-	P6_NOP3,
-	P6_NOP4,
-	P6_NOP5,
-	P6_NOP6,
-	P6_NOP7,
-	P6_NOP8,
-	P6_NOP5_ATOMIC
-};
-static const unsigned char * const p6_nops[ASM_NOP_MAX+2] =
+const unsigned char * const x86_nops[ASM_NOP_MAX+1] =
 {
 	NULL,
-	p6nops,
-	p6nops + 1,
-	p6nops + 1 + 2,
-	p6nops + 1 + 2 + 3,
-	p6nops + 1 + 2 + 3 + 4,
-	p6nops + 1 + 2 + 3 + 4 + 5,
-	p6nops + 1 + 2 + 3 + 4 + 5 + 6,
-	p6nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
-	p6nops + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8,
+	x86nops,
+	x86nops + 1,
+	x86nops + 1 + 2,
+	x86nops + 1 + 2 + 3,
+	x86nops + 1 + 2 + 3 + 4,
+	x86nops + 1 + 2 + 3 + 4 + 5,
+	x86nops + 1 + 2 + 3 + 4 + 5 + 6,
+	x86nops + 1 + 2 + 3 + 4 + 5 + 6 + 7,
 };
-#endif
-
-/* Initialize these to a safe default */
-#ifdef CONFIG_X86_64
-const unsigned char * const *ideal_nops = p6_nops;
-#else
-const unsigned char * const *ideal_nops = intel_nops;
-#endif
-
-void __init arch_init_ideal_nops(void)
-{
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_INTEL:
-		/*
-		 * Due to a decoder implementation quirk, some
-		 * specific Intel CPUs actually perform better with
-		 * the "k8_nops" than with the SDM-recommended NOPs.
-		 */
-		if (boot_cpu_data.x86 == 6 &&
-		    boot_cpu_data.x86_model >= 0x0f &&
-		    boot_cpu_data.x86_model != 0x1c &&
-		    boot_cpu_data.x86_model != 0x26 &&
-		    boot_cpu_data.x86_model != 0x27 &&
-		    boot_cpu_data.x86_model < 0x30) {
-			ideal_nops = k8_nops;
-		} else if (boot_cpu_has(X86_FEATURE_NOPL)) {
-			   ideal_nops = p6_nops;
-		} else {
-#ifdef CONFIG_X86_64
-			ideal_nops = k8_nops;
-#else
-			ideal_nops = intel_nops;
-#endif
-		}
-		break;
-
-	case X86_VENDOR_HYGON:
-		ideal_nops = p6_nops;
-		return;
-
-	case X86_VENDOR_AMD:
-		if (boot_cpu_data.x86 > 0xf) {
-			ideal_nops = p6_nops;
-			return;
-		}
-
-		fallthrough;
-
-	default:
-#ifdef CONFIG_X86_64
-		ideal_nops = k8_nops;
-#else
-		if (boot_cpu_has(X86_FEATURE_K8))
-			ideal_nops = k8_nops;
-		else if (boot_cpu_has(X86_FEATURE_K7))
-			ideal_nops = k7_nops;
-		else
-			ideal_nops = intel_nops;
-#endif
-	}
-}
 
 /* Use this to add nops to a buffer, then text_poke the whole buffer. */
 static void __init_or_module add_nops(void *insns, unsigned int len)
@@ -262,7 +106,7 @@ static void __init_or_module add_nops(vo
 		unsigned int noplen = len;
 		if (noplen > ASM_NOP_MAX)
 			noplen = ASM_NOP_MAX;
-		memcpy(insns, ideal_nops[noplen], noplen);
+		memcpy(insns, x86_nops[noplen], noplen);
 		insns += noplen;
 		len -= noplen;
 	}
@@ -1302,13 +1146,13 @@ static void text_poke_loc_init(struct te
 	default: /* assume NOP */
 		switch (len) {
 		case 2: /* NOP2 -- emulate as JMP8+0 */
-			BUG_ON(memcmp(emulate, ideal_nops[len], len));
+			BUG_ON(memcmp(emulate, x86_nops[len], len));
 			tp->opcode = JMP8_INSN_OPCODE;
 			tp->rel32 = 0;
 			break;
 
 		case 5: /* NOP5 -- emulate as JMP32+0 */
-			BUG_ON(memcmp(emulate, ideal_nops[NOP_ATOMIC5], len));
+			BUG_ON(memcmp(emulate, x86_nops[len], len));
 			tp->opcode = JMP32_INSN_OPCODE;
 			tp->rel32 = 0;
 			break;
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -66,7 +66,7 @@ int ftrace_arch_code_modify_post_process
 
 static const char *ftrace_nop_replace(void)
 {
-	return ideal_nops[NOP_ATOMIC5];
+	return x86_nops[5];
 }
 
 static const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
@@ -377,7 +377,7 @@ create_trampoline(struct ftrace_ops *ops
 		ip = trampoline + (jmp_offset - start_offset);
 		if (WARN_ON(*(char *)ip != 0x75))
 			goto fail;
-		ret = copy_from_kernel_nofault(ip, ideal_nops[2], 2);
+		ret = copy_from_kernel_nofault(ip, x86_nops[2], 2);
 		if (ret < 0)
 			goto fail;
 	}
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -28,10 +28,8 @@ static void bug_at(const void *ip, int l
 }
 
 static const void *
-__jump_label_set_jump_code(struct jump_entry *entry, enum jump_label_type type, int init)
+__jump_label_set_jump_code(struct jump_entry *entry, enum jump_label_type type)
 {
-	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
-	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 	const void *expect, *code;
 	const void *addr, *dest;
 	int line;
@@ -41,10 +39,8 @@ __jump_label_set_jump_code(struct jump_e
 
 	code = text_gen_insn(JMP32_INSN_OPCODE, addr, dest);
 
-	if (init) {
-		expect = default_nop; line = __LINE__;
-	} else if (type == JUMP_LABEL_JMP) {
-		expect = ideal_nop; line = __LINE__;
+	if (type == JUMP_LABEL_JMP) {
+		expect = x86_nops[5]; line = __LINE__;
 	} else {
 		expect = code; line = __LINE__;
 	}
@@ -53,7 +49,7 @@ __jump_label_set_jump_code(struct jump_e
 		bug_at(addr, line);
 
 	if (type == JUMP_LABEL_NOP)
-		code = ideal_nop;
+		code = x86_nops[5];
 
 	return code;
 }
@@ -62,7 +58,7 @@ static inline void __jump_label_transfor
 					  enum jump_label_type type,
 					  int init)
 {
-	const void *opcode = __jump_label_set_jump_code(entry, type, init);
+	const void *opcode = __jump_label_set_jump_code(entry, type);
 
 	/*
 	 * As long as only a single processor is running and the code is still
@@ -113,7 +109,7 @@ bool arch_jump_label_transform_queue(str
 	}
 
 	mutex_lock(&text_mutex);
-	opcode = __jump_label_set_jump_code(entry, type, 0);
+	opcode = __jump_label_set_jump_code(entry, type);
 	text_poke_queue((void *)jump_entry_code(entry),
 			opcode, JUMP_LABEL_NOP_SIZE, NULL);
 	mutex_unlock(&text_mutex);
@@ -136,22 +132,6 @@ static enum {
 __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry,
 				      enum jump_label_type type)
 {
-	/*
-	 * This function is called at boot up and when modules are
-	 * first loaded. Check if the default nop, the one that is
-	 * inserted at compile time, is the ideal nop. If it is, then
-	 * we do not need to update the nop, and we can leave it as is.
-	 * If it is not, then we need to update the nop to the ideal nop.
-	 */
-	if (jlstate == JL_STATE_START) {
-		const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
-		const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
-
-		if (memcmp(ideal_nop, default_nop, 5) != 0)
-			jlstate = JL_STATE_UPDATE;
-		else
-			jlstate = JL_STATE_NO_UPDATE;
-	}
 	if (jlstate == JL_STATE_UPDATE)
 		jump_label_transform(entry, type, 1);
 }
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -229,7 +229,7 @@ __recover_probed_insn(kprobe_opcode_t *b
 		return 0UL;
 
 	if (faddr)
-		memcpy(buf, ideal_nops[NOP_ATOMIC5], 5);
+		memcpy(buf, x86_nops[5], 5);
 	else
 		buf[0] = kp->opcode;
 	return (unsigned long)buf;
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -822,7 +822,6 @@ void __init setup_arch(char **cmdline_p)
 
 	idt_setup_early_traps();
 	early_cpu_init();
-	arch_init_ideal_nops();
 	jump_label_init();
 	static_call_init();
 	early_ioremap_init();
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -34,7 +34,7 @@ static void __ref __static_call_transfor
 		break;
 
 	case NOP:
-		code = ideal_nops[NOP_ATOMIC5];
+		code = x86_nops[5];
 		break;
 
 	case JMP:
@@ -66,7 +66,7 @@ static void __static_call_validate(void
 			return;
 	} else {
 		if (opcode == CALL_INSN_OPCODE ||
-		    !memcmp(insn, ideal_nops[NOP_ATOMIC5], 5) ||
+		    !memcmp(insn, x86_nops[5], 5) ||
 		    !memcmp(insn, xor5rax, 5))
 			return;
 	}
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -282,7 +282,7 @@ static void emit_prologue(u8 **pprog, u3
 	/* BPF trampoline can be made to work without these nops,
 	 * but let's waste 5 bytes for now and optimize later
 	 */
-	memcpy(prog, ideal_nops[NOP_ATOMIC5], cnt);
+	memcpy(prog, x86_nops[5], cnt);
 	prog += cnt;
 	if (!ebpf_from_cbpf) {
 		if (tail_call_reachable && !is_subprog)
@@ -330,7 +330,7 @@ static int __bpf_arch_text_poke(void *ip
 				void *old_addr, void *new_addr,
 				const bool text_live)
 {
-	const u8 *nop_insn = ideal_nops[NOP_ATOMIC5];
+	const u8 *nop_insn = x86_nops[5];
 	u8 old_insn[X86_PATCH_SIZE];
 	u8 new_insn[X86_PATCH_SIZE];
 	u8 *prog;
@@ -560,7 +560,7 @@ static void emit_bpf_tail_call_direct(st
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
 
-	memcpy(prog, ideal_nops[NOP_ATOMIC5], X86_PATCH_SIZE);
+	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
 	prog += X86_PATCH_SIZE;
 	/* out: */
 
@@ -881,7 +881,7 @@ static int emit_nops(u8 **pprog, int len
 			noplen = ASM_NOP_MAX;
 
 		for (i = 0; i < noplen; i++)
-			EMIT1(ideal_nops[noplen][i]);
+			EMIT1(x86_nops[noplen][i]);
 		len -= noplen;
 	}
 

From MAILER-DAEMON Fri Mar 12 17:46:39 2021
Message-ID: <20210312115749.136357911@infradead.org>
Date: Fri, 12 Mar 2021 12:32:55 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: x86@kernel.org, rostedt@goodmis.org, hpa@zytor.com, torvalds@linuxfoundation.org
Cc: linux-kernel@vger.kernel.org, linux-toolchains@vger.kernel.org, peterz@infradead.org, jpoimboe@redhat.com, alexei.starovoitov@gmail.com, mhiramat@kernel.org
Subject: [PATCH 2/2] objtool,x86: Use asm/nops.h
References: <20210312113253.305040674@infradead.org>
List-ID: <linux-toolchains.vger.kernel.org>
X-Mailing-List: linux-toolchains@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Since the kernel will rely on a single canonical set of NOPs, make
sure objtool uses the exact same ones.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 tools/arch/x86/include/asm/nops.h |   81 ++++++++++++++++++++++++++++++++++++++
 tools/objtool/arch/x86/decode.c   |   13 +++---
 tools/objtool/sync-check.sh       |    1 
 3 files changed, 90 insertions(+), 5 deletions(-)

--- /dev/null
+++ b/tools/arch/x86/include/asm/nops.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_NOPS_H
+#define _ASM_X86_NOPS_H
+
+/*
+ * Define nops for use with alternative() and for tracing.
+ */
+
+#ifndef CONFIG_64BIT
+
+/*
+ * Generic 32bit nops from GAS:
+ *
+ * 1: nop
+ * 2: movl %esi,%esi
+ * 3: leal 0x0(%esi),%esi
+ * 4: leal 0x0(%esi,%eiz,1),%esi
+ * 5: leal %ds:0x0(%esi,%eiz,1),%esi
+ * 6: leal 0x0(%esi),%esi
+ * 7: leal 0x0(%esi,%eiz,1),%esi
+ * 8: leal %ds:0x0(%esi,%eiz,1),%esi
+ *
+ * Except 5 and 8, which are DS prefixed 4 and 7 resp, where GAS would emit 2
+ * nop instructions.
+ */
+#define BYTES_NOP1	0x90
+#define BYTES_NOP2	0x89,0xf6
+#define BYTES_NOP3	0x8d,0x76,0x00
+#define BYTES_NOP4	0x8d,0x74,0x26,0x00
+#define BYTES_NOP5	0x3e,BYTES_NOP4
+#define BYTES_NOP6	0x8d,0xb6,0x00,0x00,0x00,0x00
+#define BYTES_NOP7	0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
+#define BYTES_NOP8	0x3e,BYTES_NOP7
+
+#else
+
+/*
+ * Generic 64bit nops from GAS:
+ *
+ * 1: nop
+ * 2: osp nop
+ * 3: nopl (%eax)
+ * 4: nopl 0x00(%eax)
+ * 5: nopl 0x00(%eax,%eax,1)
+ * 6: osp nopl 0x00(%eax,%eax,1)
+ * 7: nopl 0x00000000(%eax)
+ * 8: nopl 0x00000000(%eax,%eax,1)
+ */
+#define BYTES_NOP1	0x90
+#define BYTES_NOP2	0x66,BYTES_NOP1
+#define BYTES_NOP3	0x0f,0x1f,0x00
+#define BYTES_NOP4	0x0f,0x1f,0x40,0x00
+#define BYTES_NOP5	0x0f,0x1f,0x44,0x00,0x00
+#define BYTES_NOP6	0x66,BYTES_NOP5
+#define BYTES_NOP7	0x0f,0x1f,0x80,0x00,0x00,0x00,0x00
+#define BYTES_NOP8	0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00
+
+#endif /* CONFIG_64BIT */
+
+#ifdef __ASSEMBLY__
+#define _ASM_MK_NOP(x) .byte x
+#else
+#define _ASM_MK_NOP(x) ".byte " __stringify(x) "\n"
+#endif
+
+#define ASM_NOP1 _ASM_MK_NOP(BYTES_NOP1)
+#define ASM_NOP2 _ASM_MK_NOP(BYTES_NOP2)
+#define ASM_NOP3 _ASM_MK_NOP(BYTES_NOP3)
+#define ASM_NOP4 _ASM_MK_NOP(BYTES_NOP4)
+#define ASM_NOP5 _ASM_MK_NOP(BYTES_NOP5)
+#define ASM_NOP6 _ASM_MK_NOP(BYTES_NOP6)
+#define ASM_NOP7 _ASM_MK_NOP(BYTES_NOP7)
+#define ASM_NOP8 _ASM_MK_NOP(BYTES_NOP8)
+
+#define ASM_NOP_MAX 8
+
+#ifndef __ASSEMBLY__
+extern const unsigned char * const x86_nops[];
+#endif
+
+#endif /* _ASM_X86_NOPS_H */
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -11,6 +11,9 @@
 #include "../../../arch/x86/lib/inat.c"
 #include "../../../arch/x86/lib/insn.c"
 
+#define CONFIG_64BIT 1
+#include <asm/nops.h>
+
 #include <asm/orc_types.h>
 #include <objtool/check.h>
 #include <objtool/elf.h>
@@ -640,11 +643,11 @@ void arch_initial_func_cfi_state(struct
 const char *arch_nop_insn(int len)
 {
 	static const char nops[5][5] = {
-		/* 1 */ { 0x90 },
-		/* 2 */ { 0x66, 0x90 },
-		/* 3 */ { 0x0f, 0x1f, 0x00 },
-		/* 4 */ { 0x0f, 0x1f, 0x40, 0x00 },
-		/* 5 */ { 0x0f, 0x1f, 0x44, 0x00, 0x00 },
+		{ BYTES_NOP1 },
+		{ BYTES_NOP2 },
+		{ BYTES_NOP3 },
+		{ BYTES_NOP4 },
+		{ BYTES_NOP5 },
 	};
 
 	if (len < 1 || len > 5) {
--- a/tools/objtool/sync-check.sh
+++ b/tools/objtool/sync-check.sh
@@ -10,6 +10,7 @@ FILES="include/linux/objtool.h"
 
 if [ "$SRCARCH" = "x86" ]; then
 FILES="$FILES
+arch/x86/include/asm/nops.h
 arch/x86/include/asm/inat_types.h
 arch/x86/include/asm/orc_types.h
 arch/x86/include/asm/emulate_prefix.h


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-12 18:13           ` Sedat Dilek
@ 2021-03-12 19:03             ` Sedat Dilek
  0 siblings, 0 replies; 51+ messages in thread
From: Sedat Dilek @ 2021-03-12 19:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Borislav Petkov, Peter Zijlstra, x86, hpa, torvalds,
	linux-kernel, linux-toolchains, jpoimboe, alexei.starovoitov,
	mhiramat

On Fri, Mar 12, 2021 at 7:13 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> On Fri, Mar 12, 2021 at 6:47 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Fri, 12 Mar 2021 18:35:45 +0100
> > Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> >
> > > Hey Steve, you degraded me to a number :-).
> >
> > It's the internet, everyone is a number.
> >
> > >
> > > I dunno which Git tree this patchset applies to, but I check if I can
> > > apply the patchset to my current local Git.
> >
> > Try Linus's latest.
> >
>
> $ git describe origin/HEAD
> v5.12-rc2-338-gf78d76e72a46
>
> I adapted 1/2 in arch/x86/include/asm/jump_label.h to fit ^^^, see attachment.
>

Forget this.

With latest Linus Git you need to apply "x86/jump_label: Mark
arguments as const to satisfy asm constraints" from tip Git.

- Sedat -

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=864b435514b286c0be2a38a02f487aa28d990ef8

>
> > > Then build a kernel in the same build-environment.
> > > Lemme see.
> > >
> > > To say with Linus's words:
> > > "Numbers talk - bullshit walks."
> >
> > Exactly.
> >
> > -- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2021-03-12 12:09   ` Peter Zijlstra
@ 2021-03-12 20:36     ` Linus Torvalds
  0 siblings, 0 replies; 51+ messages in thread
From: Linus Torvalds @ 2021-03-12 20:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: the arch/x86 maintainers, Steven Rostedt, Peter Anvin,
	Linux Kernel Mailing List, linux-toolchains, Josh Poimboeuf,
	Alexei Starovoitov, Masami Hiramatsu

On Fri, Mar 12, 2021 at 4:09 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Note that this also made all NOPs single instructions and removed the
> special atomic nop.

Ack. Good riddance.

             Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-12 11:32 [PATCH 0/2] x86: Remove ideal_nops[] Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-03-12 14:29 ` [PATCH 0/2] x86: Remove ideal_nops[] Sedat Dilek
@ 2021-03-12 20:59 ` Borislav Petkov
       [not found]   ` <CA+icZUWSCS6vAQOXoG6nsW+Dbnogivzf+rmegCTMjz5hjE5cKQ@mail.gmail.com>
  3 siblings, 1 reply; 51+ messages in thread
From: Borislav Petkov @ 2021-03-12 20:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, rostedt, hpa, torvalds, linux-kernel, linux-toolchains,
	jpoimboe, alexei.starovoitov, mhiramat

On Fri, Mar 12, 2021 at 12:32:53PM +0100, Peter Zijlstra wrote:
> Since ultimate performance of a 10 year old chip (Intel Sandy Bridge, 2011) is
> simply irrelevant today, remove variable NOPs and use NOPL.

Just ran them on my SNB box:

cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz
stepping        : 7

with the usual perf stat kernel build workload with
CONFIG_DYNAMIC_FTRACE and CONFIG_FUNCTION_TRACER where each function has
a NOP at its beginning when ftrace is disabled (thx Steve).

./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 bzImage

before: tip-master

 Performance counter stats for 'make -s -j9 bzImage' (5 runs):

      3,213,728.10 msec task-clock                #    7.307 CPUs utilized            ( +-  0.01% )
           339,270      context-switches          #    0.106 K/sec                    ( +-  0.09% )
            31,472      cpu-migrations            #    0.010 K/sec                    ( +-  0.64% )
        62,070,684      page-faults               #    0.019 M/sec                    ( +-  0.01% )
11,498,198,009,323      cycles                    #    3.578 GHz                      ( +-  0.01% )  (83.33%)
 8,235,957,366,696      stalled-cycles-frontend   #   71.63% frontend cycles idle     ( +-  0.01% )  (83.33%)
 5,976,456,688,814      stalled-cycles-backend    #   51.98% backend cycles idle      ( +-  0.02% )  (66.67%)
 7,553,156,344,376      instructions              #    0.66  insn per cycle         
                                                  #    1.09  stalled cycles per insn  ( +-  0.00% )  (83.33%)
 1,635,468,917,524      branches                  #  508.901 M/sec                    ( +-  0.00% )  (83.34%)
    51,888,292,932      branch-misses             #    3.17% of all branches          ( +-  0.02% )  (83.33%)

           439.809 +- 0.156 seconds time elapsed  ( +-  0.04% )


after: tip-master-nops

 Performance counter stats for 'make -s -j9 bzImage' (5 runs):

      3,217,113.67 msec task-clock                #    7.307 CPUs utilized            ( +-  0.03% )
           339,425      context-switches          #    0.106 K/sec                    ( +-  0.20% )
            31,724      cpu-migrations            #    0.010 K/sec                    ( +-  0.54% )
        62,027,130      page-faults               #    0.019 M/sec                    ( +-  0.01% )
11,508,779,965,901      cycles                    #    3.577 GHz                      ( +-  0.03% )  (83.34%)
 8,241,212,210,440      stalled-cycles-frontend   #   71.61% frontend cycles idle     ( +-  0.04% )  (83.33%)
 5,982,615,533,177      stalled-cycles-backend    #   51.98% backend cycles idle      ( +-  0.06% )  (66.66%)
 7,546,407,430,314      instructions              #    0.66  insn per cycle         
                                                  #    1.09  stalled cycles per insn  ( +-  0.00% )  (83.33%)
 1,634,187,006,479      branches                  #  507.967 M/sec                    ( +-  0.00% )  (83.33%)
    51,941,580,371      branch-misses             #    3.18% of all branches          ( +-  0.01% )  (83.33%)

           440.266 +- 0.195 seconds time elapsed  ( +-  0.04% )


So here's numbers talk, bullshit walks. And with those numbers no
bullshit can remain lingering around anyway.

Cheers!

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
       [not found]   ` <CA+icZUWSCS6vAQOXoG6nsW+Dbnogivzf+rmegCTMjz5hjE5cKQ@mail.gmail.com>
@ 2021-03-13  8:49     ` Borislav Petkov
  2021-03-13 11:23       ` Borislav Petkov
  2021-03-13 12:10       ` Sedat Dilek
  0 siblings, 2 replies; 51+ messages in thread
From: Borislav Petkov @ 2021-03-13  8:49 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, Mar 13, 2021 at 06:26:15AM +0100, Sedat Dilek wrote:
> x86/jump_label: Mark arguments as const to satisfy asm constraints

Where do I find this patch?

> x86: Remove dynamic NOP selection
> objtool,x86: Use asm/nops.h
> 
> My benchmark was to build a Linux-kernel with LLVM/Clang v12.0.0-rc3
> on Debian/testing AMD64.
> 
> Patchset applied for a first build:
> 
>  Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
> PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-7-amd64-clang12-cfi
> KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza

There's a reason I have -s for silent in the build - printing output
during the build creates a *lot* of variance. And you have excessive
printing with V=1 and KBUILD_VERBOSE=1.

Also, you need to repeat those workloads a couple of times - one is not
enough. That's why I have --repeat 5 in there.

Also, you need --pre=/root/bin/pre-build-kernel.sh where that script is:

---
#!/bin/bash
echo $0

make -s clean
echo 3 > /proc/sys/vm/drop_caches
---

so that you can avoid pagecache influence.

Lemme rerun here with clang.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-13  8:49     ` Borislav Petkov
@ 2021-03-13 11:23       ` Borislav Petkov
  2021-03-13 12:10       ` Sedat Dilek
  1 sibling, 0 replies; 51+ messages in thread
From: Borislav Petkov @ 2021-03-13 11:23 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, Mar 13, 2021 at 09:49:23AM +0100, Borislav Petkov wrote:
> Lemme rerun here with clang.

clang11 is almost twice as slow as gcc but difference is still
negligible: ~0.6 seconds.

./tools/perf/perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 LLVM=1 LLVM_IAS=1 bzImage

before:

 Performance counter stats for 'make -s -j9 LLVM=1 LLVM_IAS=1 bzImage' (5 runs):

      5,576,081.48 msec task-clock                #    7.664 CPUs utilized            ( +-  0.03% )
           496,841      context-switches          #    0.089 K/sec                    ( +-  0.11% )
            30,245      cpu-migrations            #    0.005 K/sec                    ( +-  0.53% )
        49,702,714      page-faults               #    0.009 M/sec                    ( +-  0.00% )
19,954,704,926,347      cycles                    #    3.579 GHz                      ( +-  0.02% )  (83.33%)
15,920,125,996,460      stalled-cycles-frontend   #   79.78% frontend cycles idle     ( +-  0.03% )  (83.33%)
13,177,812,137,935      stalled-cycles-backend    #   66.04% backend cycles idle      ( +-  0.04% )  (66.67%)
 8,778,060,061,848      instructions              #    0.44  insn per cycle
                                                  #    1.81  stalled cycles per insn  ( +-  0.00% )  (83.33%)
 1,852,121,066,032      branches                  #  332.155 M/sec                    ( +-  0.00% )  (83.33%)
    84,048,262,434      branch-misses             #    4.54% of all branches          ( +-  0.02% )  (83.33%)

           727.572 +- 0.305 seconds time elapsed  ( +-  0.04% )

after:

 Performance counter stats for 'make -s -j9 LLVM=1 LLVM_IAS=1 bzImage' (5 runs):

      5,581,654.38 msec task-clock                #    7.665 CPUs utilized            ( +-  0.01% )
           496,274      context-switches          #    0.089 K/sec                    ( +-  0.12% )
            30,645      cpu-migrations            #    0.005 K/sec                    ( +-  0.54% )
        49,711,551      page-faults               #    0.009 M/sec                    ( +-  0.01% )
19,968,933,753,686      cycles                    #    3.578 GHz                      ( +-  0.01% )  (83.33%)
15,925,776,797,854      stalled-cycles-frontend   #   79.75% frontend cycles idle     ( +-  0.01% )  (83.33%)
13,182,158,323,446      stalled-cycles-backend    #   66.01% backend cycles idle      ( +-  0.01% )  (66.67%)
 8,778,619,885,119      instructions              #    0.44  insn per cycle
                                                  #    1.81  stalled cycles per insn  ( +-  0.00% )  (83.33%)
 1,852,096,100,464      branches                  #  331.818 M/sec                    ( +-  0.01% )  (83.33%)
    84,264,257,355      branch-misses             #    4.55% of all branches          ( +-  0.03% )  (83.33%)

          728.2400 +- 0.0613 seconds time elapsed  ( +-  0.01% )

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-13  8:49     ` Borislav Petkov
  2021-03-13 11:23       ` Borislav Petkov
@ 2021-03-13 12:10       ` Sedat Dilek
  2021-03-13 12:15         ` Borislav Petkov
  1 sibling, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-13 12:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, Mar 13, 2021 at 9:51 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Sat, Mar 13, 2021 at 06:26:15AM +0100, Sedat Dilek wrote:
> > x86/jump_label: Mark arguments as const to satisfy asm constraints
>
> Where do I find this patch?
>

Here we go:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=864b435514b286c0be2a38a02f487aa28d990ef8

> > x86: Remove dynamic NOP selection
> > objtool,x86: Use asm/nops.h
> >
> > My benchmark was to build a Linux-kernel with LLVM/Clang v12.0.0-rc3
> > on Debian/testing AMD64.
> >
> > Patchset applied for a first build:
> >
> >  Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
> > PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-7-amd64-clang12-cfi
> > KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
>
> There's a reason I have -s for silent in the build - printing output
> during the build creates a *lot* of variance. And you have excessive
> printing with V=1 and KBUILD_VERBOSE=1.
>
> Also, you need to repeat those workloads a couple of times - one is not
> enough. That's why I have --repeat 5 in there.
>
> Also, you need --pre=/root/bin/pre-build-kernel.sh where that script is:
>
> ---
> #!/bin/bash
> echo $0
>
> make -s clean
> echo 3 > /proc/sys/vm/drop_caches
> ---
>
> so that you can avoid pagecache influence.
>

OK, I see.

- Sedat -

> Lemme rerun here with clang.
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-13 12:10       ` Sedat Dilek
@ 2021-03-13 12:15         ` Borislav Petkov
  2021-03-13 12:38           ` Sedat Dilek
  0 siblings, 1 reply; 51+ messages in thread
From: Borislav Petkov @ 2021-03-13 12:15 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, Mar 13, 2021 at 01:10:29PM +0100, Sedat Dilek wrote:
> Here we go:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=864b435514b286c0be2a38a02f487aa28d990ef8

That's why I told earlier you to use tip/master - that patch is already
in it and all you would've needed to do is to apply the two nop patches.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-13 12:15         ` Borislav Petkov
@ 2021-03-13 12:38           ` Sedat Dilek
  2021-03-13 12:49             ` Borislav Petkov
  0 siblings, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-13 12:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, Mar 13, 2021 at 1:15 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Sat, Mar 13, 2021 at 01:10:29PM +0100, Sedat Dilek wrote:
> > Here we go:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/patch/?id=864b435514b286c0be2a38a02f487aa28d990ef8
>
> That's why I told earlier you to use tip/master - that patch is already
> in it and all you would've needed to do is to apply the two nop patches.
>

Thanks for all your testings and suggestions.

For me it was easier to apply these 3 patches on top of my custom
patchset to see what impact Peter's patchset.

AFAICS you did a 5 times x86-64 defconfig with dropped pagecache and `make -j9`?
I run my "normal" workflow(s) (and build-script) for easier comparison
on my side.

Big thank-you for testing with LLVM/Clang v11.x - twice as slow as with GCC :-(.
A selfmade ThinLTO+PGO optimized LLVM tooolchain v11.x/v12-rcX/v13-git
is here as fast as Debian's GCC-v10.2.1 to build a Linux-kernel -
approx. 03:30 [hh:mm] - full adapted Debian v5.10.y kernel-config.
Does your distribution offer LLVM/Clang v12.0.0-rc3 (released this
week) binaries?

- Sedat -

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-13 12:38           ` Sedat Dilek
@ 2021-03-13 12:49             ` Borislav Petkov
  2021-03-13 12:58               ` Sedat Dilek
  0 siblings, 1 reply; 51+ messages in thread
From: Borislav Petkov @ 2021-03-13 12:49 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, Mar 13, 2021 at 01:38:22PM +0100, Sedat Dilek wrote:
> AFAICS you did a 5 times x86-64 defconfig with dropped pagecache and `make -j9`?

The tailored .config for that particular test box.

> Does your distribution offer LLVM/Clang v12.0.0-rc3 (released this
> week) binaries?

The partition on that box I used is debian testing, so no:

$ apt search llvm-1* 2>/dev/null | grep llvm-1
libllvm-11-ocaml-dev/testing,testing 1:11.0.1-2 amd64
llvm-10/now 1:10.0.1-8+b1 amd64 [installed,local]
llvm-10-dev/now 1:10.0.1-8+b1 amd64 [installed,local]
llvm-10-runtime/now 1:10.0.1-8+b1 amd64 [installed,local]
llvm-10-tools/now 1:10.0.1-8+b1 amd64 [installed,local]
llvm-11/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
llvm-11-dev/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
llvm-11-doc/testing,testing 1:11.0.1-2 all
llvm-11-examples/testing,testing 1:11.0.1-2 all
llvm-11-runtime/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
llvm-11-tools/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-13 12:49             ` Borislav Petkov
@ 2021-03-13 12:58               ` Sedat Dilek
  2021-03-13 13:29                 ` Borislav Petkov
  0 siblings, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-13 12:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, Mar 13, 2021 at 1:49 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Sat, Mar 13, 2021 at 01:38:22PM +0100, Sedat Dilek wrote:
> > AFAICS you did a 5 times x86-64 defconfig with dropped pagecache and `make -j9`?
>
> The tailored .config for that particular test box.
>
> > Does your distribution offer LLVM/Clang v12.0.0-rc3 (released this
> > week) binaries?
>
> The partition on that box I used is debian testing, so no:
>
> $ apt search llvm-1* 2>/dev/null | grep llvm-1
> libllvm-11-ocaml-dev/testing,testing 1:11.0.1-2 amd64
> llvm-10/now 1:10.0.1-8+b1 amd64 [installed,local]
> llvm-10-dev/now 1:10.0.1-8+b1 amd64 [installed,local]
> llvm-10-runtime/now 1:10.0.1-8+b1 amd64 [installed,local]
> llvm-10-tools/now 1:10.0.1-8+b1 amd64 [installed,local]
> llvm-11/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
> llvm-11-dev/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
> llvm-11-doc/testing,testing 1:11.0.1-2 all
> llvm-11-examples/testing,testing 1:11.0.1-2 all
> llvm-11-runtime/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
> llvm-11-tools/testing,testing,now 1:11.0.1-2 amd64 [installed,automatic]
>

You can add Debian/experimental APT sources.list ...

[ /etc/apt/sources.list.d/debian-experimental.list  ]
deb http://ftp.debian.org/debian experimental main contrib non-free
deb https://deb.debian.org/debian experimental main non-free contrib

[ /etc/apt/preferences.d/99_debian-experimental.pref ]
Package: *
Pin: release o=Debian,a=experimental
Pin-Priority: 99

This gives LLVM/Clang v12 packages an APT prio of 99 - meaning no
auto-upgrade installations will be done.
You have full control by doing it manually.

Renew informations from APT repositories:

root# apt-get update

What clang-12 version is/are available?

root# apt-cache policy clang-12

Simulate an install (note: --no-install-recommends option):

root# apt-get install llvm-12 clang-12 lld-12 llvm-12-tools
--no-install-recommends -t experimental -s

option -s: simulate

Really do an installation of LLVM/Clang v12 stuff:

root# apt-get install llvm-12 clang-12 lld-12 llvm-12-tools
--no-install-recommends -t experimental -y

option -y: yes

If you like to test.

Of course you can use packages from <apt-llvm.org> repositories.
I can give you APT sources.list plus pref files if you desire.

Have more fun.

- Sedat -

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-13 12:58               ` Sedat Dilek
@ 2021-03-13 13:29                 ` Borislav Petkov
  2021-03-13 13:47                   ` Sedat Dilek
  0 siblings, 1 reply; 51+ messages in thread
From: Borislav Petkov @ 2021-03-13 13:29 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, Mar 13, 2021 at 01:58:56PM +0100, Sedat Dilek wrote:
> You can add Debian/experimental APT sources.list ...

I could but I don't expect clang12 to behave any differently here.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-13 13:29                 ` Borislav Petkov
@ 2021-03-13 13:47                   ` Sedat Dilek
  2021-03-15 17:04                     ` Sedat Dilek
  0 siblings, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-13 13:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, Mar 13, 2021 at 2:29 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Sat, Mar 13, 2021 at 01:58:56PM +0100, Sedat Dilek wrote:
> > You can add Debian/experimental APT sources.list ...
>
> I could but I don't expect clang12 to behave any differently here.
>

Agreed in things of build-time.
There were some improvements and optimizations to LLVM/Clang but twice
as slow is really hard compared with GCC.

I was thinking more in the direction of "compatibility" of tip tree
with recent LLVM/Clang other than what is officially supported via
Kbuild-system.

Let me look if I will do a selfmade ThinLTO+PGO optimized LLVM
toolchain v12.0.0-rc3 this weekend.

- Sedat -

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-13 13:47                   ` Sedat Dilek
@ 2021-03-15 17:04                     ` Sedat Dilek
  2021-03-15 17:15                       ` Borislav Petkov
  2021-03-15 18:10                       ` Peter Zijlstra
  0 siblings, 2 replies; 51+ messages in thread
From: Sedat Dilek @ 2021-03-15 17:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

[-- Attachment #1: Type: text/plain, Size: 4618 bytes --]

On Sat, Mar 13, 2021 at 2:47 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
[ ... ]
> Let me look if I will do a selfmade ThinLTO+PGO optimized LLVM
> toolchain v12.0.0-rc3 this weekend.
>

I did it.

Here some fresh numbers:

[ Selfmade LLVM toolchain v12.0.0-rc3 "stage1-only" ]
[ Host-Kernel: 5.12.0-rc2-8-amd64-clang12-cfi includes Peter's NOPS patchset ]

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-9-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
KBUILD_BUILD_USER=sedat.dilek@gmail.com
KBUILD_BUILD_TIMESTAMP=2021-03-13 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc2-9~bullseye+dileks1':

      55936351.95 msec task-clock                #    3.580 CPUs
utilized
          8291848      context-switches          #    0.148 K/sec
           269686      cpu-migrations            #    0.005 K/sec
        288389721      page-faults               #    0.005 M/sec
  108344049253836      cycles                    #    1.937 GHz
   83228135285263      stalled-cycles-frontend   #   76.82% frontend
cycles idle
   65616255370809      stalled-cycles-backend    #   60.56% backend
cycles idle
   59590373937199      instructions              #    0.55  insn per
cycle
                                                 #    1.40  stalled
cycles per insn
   10906265495505      branches                  #  194.976 M/sec
     488578274434      branch-misses             #    4.48% of all
branches

  15622.926203302 seconds time elapsed

  53453.974928000 seconds user
   2526.773533000 seconds sys


[ Selfmade LLVM toolchain v12.0.0-rc3 "thinlto_pgo_optimized" ]
[ Host-Kernel: Debian's 5.10.19-1 kernel ]

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-10-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
KBUILD_BUILD_USER=sedat.dilek@gmail.com
KBUILD_BUILD_TIMESTAMP=2021-03-14 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc2-10~bullseye+dileks1':

      40223080.69 msec task-clock                #    3.434 CPUs
utilized
          7438923      context-switches          #    0.185 K/sec
           245636      cpu-migrations            #    0.006 K/sec
        288073015      page-faults               #    0.007 M/sec
   77325441657129      cycles                    #    1.922 GHz
   55357463522675      stalled-cycles-frontend   #   71.59% frontend
cycles idle
   38978871249074      stalled-cycles-backend    #   50.41% backend
cycles idle
   55178265045056      instructions              #    0.71  insn per
cycle
                                                 #    1.00  stalled
cycles per insn
    9749166033571      branches                  #  242.377 M/sec
     431303563167      branch-misses             #    4.42% of all
branches

  11714.751645982 seconds time elapsed

  37951.117840000 seconds user
   2313.807151000 seconds sys


[ Selfmade LLVM toolchain v12.0.0-rc3 "thinlto_pgo_optimized" ]
[ Host-Kernel: 5.12.0-rc2-10-amd64-clang12-cfi includes Peter's NOPS patchset ]

Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-1-amd64-clang12-cfi
KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza
KBUILD_BUILD_USER=sedat.dilek@gmail.com
KBUILD_BUILD_TIMESTAMP=2021-03-15 bindeb-pkg
KDEB_PKGVERSION=5.12.0~rc3-1~bullseye+dileks1':

      40632207.25 msec task-clock                #    3.406 CPUs
utilized
          8216832      context-switches          #    0.202 K/sec
           277610      cpu-migrations            #    0.007 K/sec
        281331052      page-faults               #    0.007 M/sec
   77031538570411      cycles                    #    1.896 GHz
              (83.33%)
   55247905369487      stalled-cycles-frontend   #   71.72% frontend
cycles idle     (83.33%)
   39046795510242      stalled-cycles-backend    #   50.69% backend
cycles idle      (66.67%)
   54592585444704      instructions              #    0.71  insn per
cycle
                                                 #    1.01  stalled
cycles per insn  (83.33%)
    9641589406714      branches                  #  237.289 M/sec
              (83.33%)
     435317273069      branch-misses             #    4.51% of all
branches          (83.33%)

  11928.047003788 seconds time elapsed

  38187.685111000 seconds user
   2502.075987000 seconds sys

As said in an earlier email:
A ThinLTO+PGO optimized LLVM-toolchain saves here approx. 60mins of build-time.

Depending on the host-kernel including Peter's NOPS patchset: 3mins
longer build-time.
Brewing time of one single Turkish Tea bag.

Attached are the 3 build-time log-files.

- Sedat -

[-- Attachment #2: build-time_5.12.0-rc2-9-amd64-clang12-cfi.txt --]
[-- Type: text/plain, Size: 1344 bytes --]

 Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-9-amd64-clang12-cfi KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com KBUILD_BUILD_TIMESTAMP=2021-03-13 bindeb-pkg KDEB_PKGVERSION=5.12.0~rc2-9~bullseye+dileks1':

       55936351.95 msec task-clock                #    3.580 CPUs utilized          
           8291848      context-switches          #    0.148 K/sec                  
            269686      cpu-migrations            #    0.005 K/sec                  
         288389721      page-faults               #    0.005 M/sec                  
   108344049253836      cycles                    #    1.937 GHz                    
    83228135285263      stalled-cycles-frontend   #   76.82% frontend cycles idle   
    65616255370809      stalled-cycles-backend    #   60.56% backend cycles idle    
    59590373937199      instructions              #    0.55  insn per cycle         
                                                  #    1.40  stalled cycles per insn
    10906265495505      branches                  #  194.976 M/sec                  
      488578274434      branch-misses             #    4.48% of all branches        

   15622.926203302 seconds time elapsed

   53453.974928000 seconds user
    2526.773533000 seconds sys



[-- Attachment #3: build-time_5.12.0-rc2-10-amd64-clang12-cfi.txt --]
[-- Type: text/plain, Size: 1346 bytes --]

 Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-10-amd64-clang12-cfi KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com KBUILD_BUILD_TIMESTAMP=2021-03-14 bindeb-pkg KDEB_PKGVERSION=5.12.0~rc2-10~bullseye+dileks1':

       40223080.69 msec task-clock                #    3.434 CPUs utilized          
           7438923      context-switches          #    0.185 K/sec                  
            245636      cpu-migrations            #    0.006 K/sec                  
         288073015      page-faults               #    0.007 M/sec                  
    77325441657129      cycles                    #    1.922 GHz                    
    55357463522675      stalled-cycles-frontend   #   71.59% frontend cycles idle   
    38978871249074      stalled-cycles-backend    #   50.41% backend cycles idle    
    55178265045056      instructions              #    0.71  insn per cycle         
                                                  #    1.00  stalled cycles per insn
     9749166033571      branches                  #  242.377 M/sec                  
      431303563167      branch-misses             #    4.42% of all branches        

   11714.751645982 seconds time elapsed

   37951.117840000 seconds user
    2313.807151000 seconds sys



[-- Attachment #4: build-time_5.12.0-rc3-1-amd64-clang12-cfi.txt --]
[-- Type: text/plain, Size: 1404 bytes --]

 Performance counter stats for 'make V=1 -j4 LLVM=1 LLVM_IAS=1 PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-1-amd64-clang12-cfi KBUILD_VERBOSE=1 KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@gmail.com KBUILD_BUILD_TIMESTAMP=2021-03-15 bindeb-pkg KDEB_PKGVERSION=5.12.0~rc3-1~bullseye+dileks1':

       40632207.25 msec task-clock                #    3.406 CPUs utilized          
           8216832      context-switches          #    0.202 K/sec                  
            277610      cpu-migrations            #    0.007 K/sec                  
         281331052      page-faults               #    0.007 M/sec                  
    77031538570411      cycles                    #    1.896 GHz                      (83.33%)
    55247905369487      stalled-cycles-frontend   #   71.72% frontend cycles idle     (83.33%)
    39046795510242      stalled-cycles-backend    #   50.69% backend cycles idle      (66.67%)
    54592585444704      instructions              #    0.71  insn per cycle         
                                                  #    1.01  stalled cycles per insn  (83.33%)
     9641589406714      branches                  #  237.289 M/sec                    (83.33%)
      435317273069      branch-misses             #    4.51% of all branches          (83.33%)

   11928.047003788 seconds time elapsed

   38187.685111000 seconds user
    2502.075987000 seconds sys



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-15 17:04                     ` Sedat Dilek
@ 2021-03-15 17:15                       ` Borislav Petkov
  2021-03-15 17:19                         ` Sedat Dilek
  2021-03-15 18:10                       ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: Borislav Petkov @ 2021-03-15 17:15 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:
> Here some fresh numbers:

Lemme paste my previous reply which still holds true here:

"There's a reason I have -s for silent in the build - printing output
during the build creates a *lot* of variance. And you have excessive
printing with V=1 and KBUILD_VERBOSE=1.

Also, you need to repeat those workloads a couple of times - one is not
enough. That's why I have --repeat 5 in there.

Also, you need --pre=/root/bin/pre-build-kernel.sh where that script is:

---
#!/bin/bash
echo $0

make -s clean
echo 3 > /proc/sys/vm/drop_caches
---

so that you can avoid pagecache influence."

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-15 17:15                       ` Borislav Petkov
@ 2021-03-15 17:19                         ` Sedat Dilek
  2021-03-15 17:23                           ` Borislav Petkov
  0 siblings, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-15 17:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Mon, Mar 15, 2021 at 6:15 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:
> > Here some fresh numbers:
>
> Lemme paste my previous reply which still holds true here:
>
> "There's a reason I have -s for silent in the build - printing output
> during the build creates a *lot* of variance. And you have excessive
> printing with V=1 and KBUILD_VERBOSE=1.
>

I have this for diagnostic reasons.
Yes, I can drop V=1 and KBUILD_VERBOSE=1.
This is a good idea for a fast build.

> Also, you need to repeat those workloads a couple of times - one is not
> enough. That's why I have --repeat 5 in there.
>
> Also, you need --pre=/root/bin/pre-build-kernel.sh where that script is:
>
> ---
> #!/bin/bash
> echo $0
>
> make -s clean
> echo 3 > /proc/sys/vm/drop_caches
> ---
>
> so that you can avoid pagecache influence."
>

With my next build I try to apply this.

- Sedat -

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-15 17:19                         ` Sedat Dilek
@ 2021-03-15 17:23                           ` Borislav Petkov
  0 siblings, 0 replies; 51+ messages in thread
From: Borislav Petkov @ 2021-03-15 17:23 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Peter Zijlstra, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Mon, Mar 15, 2021 at 06:19:34PM +0100, Sedat Dilek wrote:
> With my next build I try to apply this.

Your perf tool command should look something like this:

perf stat --repeat 5 --sync --pre=/root/bin/pre-build-kernel.sh -- make -s -j9 LLVM=1 LLVM_IAS=1 bzImage

Also, needless to say, your box needs to not run anything else during
the measurement.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-15 17:04                     ` Sedat Dilek
  2021-03-15 17:15                       ` Borislav Petkov
@ 2021-03-15 18:10                       ` Peter Zijlstra
  2021-03-15 18:23                         ` Sedat Dilek
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2021-03-15 18:10 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Borislav Petkov, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:

> make V=1 -j4 LLVM=1 LLVM_IAS=1

So for giggles I checked, neither GCC nor LLVM seem to emit prefix NOPs
when building with -march=sandybridge, they always use MOPL.

Furthermore, the kernel explicitly sets: -falign-jumps=1
-falign-loops=1, which, when not specified, default to 16 or so.

This means that your userspace is *littered* with NOPL, even when you
build your entire distro from source with -march=sandybridge.
(arch/gentoo FTW I suppose).

(The only good new is that recent LLVM has a pass to use alternative
instruction encoding in order to grow a basic block in size in order to
minimize the amount of NOP it needs to emit at the end in order to
satisfy the jump/loop alignment.)

So if you *really* deeply care about NOP performance on your SNB, start
by teaching LLVM about prefix NOPs and rebuild your complete userspace.
At that point, you can do some trivial patches to the kernel to make it
use -march=sandybridge and prefix NOPs too.

Until that time, the vast majority of NOPs your CPU will execute will be
NOPL.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-15 18:10                       ` Peter Zijlstra
@ 2021-03-15 18:23                         ` Sedat Dilek
  2021-03-15 22:14                           ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-15 18:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Mon, Mar 15, 2021 at 7:10 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:
>
> > make V=1 -j4 LLVM=1 LLVM_IAS=1
>
> So for giggles I checked, neither GCC nor LLVM seem to emit prefix NOPs
> when building with -march=sandybridge, they always use MOPL.
>
> Furthermore, the kernel explicitly sets: -falign-jumps=1
> -falign-loops=1, which, when not specified, default to 16 or so.
>
> This means that your userspace is *littered* with NOPL, even when you
> build your entire distro from source with -march=sandybridge.
> (arch/gentoo FTW I suppose).
>

That reminds me of the Git repo of the wireguard maintainer.

"x86: enable additional cpu optimizations for gcc v9.1+"

You mean something like that ^^?

- Sedat -

[1] https://git.zx2c4.com/laptop-kernel/commit/?id=116badbe0a18bc36ba90acb8b80cff41f9ab0686

> (The only good new is that recent LLVM has a pass to use alternative
> instruction encoding in order to grow a basic block in size in order to
> minimize the amount of NOP it needs to emit at the end in order to
> satisfy the jump/loop alignment.)
>
> So if you *really* deeply care about NOP performance on your SNB, start
> by teaching LLVM about prefix NOPs and rebuild your complete userspace.
> At that point, you can do some trivial patches to the kernel to make it
> use -march=sandybridge and prefix NOPs too.
>
> Until that time, the vast majority of NOPs your CPU will execute will be
> NOPL.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-15 18:23                         ` Sedat Dilek
@ 2021-03-15 22:14                           ` Peter Zijlstra
  2021-03-16  5:56                             ` Sedat Dilek
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2021-03-15 22:14 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Borislav Petkov, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Mon, Mar 15, 2021 at 07:23:29PM +0100, Sedat Dilek wrote:

> You mean something like that ^^?
> 
> - Sedat -
> 
> [1] https://git.zx2c4.com/laptop-kernel/commit/?id=116badbe0a18bc36ba90acb8b80cff41f9ab0686

*shudder*, I was more thinking you'd simply add it to you CFLAGS when
building. I don't see any point in having that in Kconfig.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-15 22:14                           ` Peter Zijlstra
@ 2021-03-16  5:56                             ` Sedat Dilek
  2021-03-27 12:08                               ` Sedat Dilek
  0 siblings, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-16  5:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Mon, Mar 15, 2021 at 11:14 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 15, 2021 at 07:23:29PM +0100, Sedat Dilek wrote:
>
> > You mean something like that ^^?
> >
> > - Sedat -
> >
> > [1] https://git.zx2c4.com/laptop-kernel/commit/?id=116badbe0a18bc36ba90acb8b80cff41f9ab0686
>
> *shudder*, I was more thinking you'd simply add it to you CFLAGS when
> building. I don't see any point in having that in Kconfig.

Simply adding the CFLAGS to arch/x86/Makefile.

If I forgot to mention:

   Tested-by: Sedat Dilek <sedat.dilek@gmail.com>. # LLVM/Clang v12.0.0-rc3

- Sedat -

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-16  5:56                             ` Sedat Dilek
@ 2021-03-27 12:08                               ` Sedat Dilek
  2021-03-27 20:02                                 ` Linus Torvalds
  0 siblings, 1 reply; 51+ messages in thread
From: Sedat Dilek @ 2021-03-27 12:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, x86, rostedt, hpa, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

Out of curiosity I tried in my build-environment and my testing-rules
to have comparable numbers...

..without passing "V=1" and "KBUILD_VERBOSE=1" as make-options:

NOTE: Identical linux-config plus LLVM/Clang v12.0.0-rc3.

debian-5.10.19 as host-kernel:
11655.755564957 seconds time elapsed

dileks-5.12-rc3 plus x86-nops as host-kernel:
11941.439350080 seconds time elapsed

I compared the build-times only:
Approx. 04:45 [mm:ss] in the worst case.
( Brewing time of a strong Turkish tea-bag ~5mins. )

I will keep both make-options to see what's going on in my builds.

- Sedat -

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-27 12:08                               ` Sedat Dilek
@ 2021-03-27 20:02                                 ` Linus Torvalds
  2021-03-30 12:31                                   ` Sedat Dilek
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2021-03-27 20:02 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Peter Zijlstra, Borislav Petkov, the arch/x86 maintainers,
	Steven Rostedt, Peter Anvin, Linux Kernel Mailing List,
	linux-toolchains, Josh Poimboeuf, Alexei Starovoitov,
	Masami Hiramatsu

On Sat, Mar 27, 2021 at 5:08 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> debian-5.10.19 as host-kernel:
> 11655.755564957 seconds time elapsed
>
> dileks-5.12-rc3 plus x86-nops as host-kernel:
> 11941.439350080 seconds time elapsed

That's 2.5% - a huge difference. Particularly since kernel build times
shouldn't even be that kernel-intensive.

I think there's something else going on than the nops. Same config?
There are likely many other differences between 5.10.19 and 5.12-rc3.

So can you check just plain 5.12-rc3 and then 5.12-rc3 plus x86-nops,
with otherwise identical configuration?

               Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/2] x86: Remove ideal_nops[]
  2021-03-27 20:02                                 ` Linus Torvalds
@ 2021-03-30 12:31                                   ` Sedat Dilek
  0 siblings, 0 replies; 51+ messages in thread
From: Sedat Dilek @ 2021-03-30 12:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Borislav Petkov, the arch/x86 maintainers,
	Steven Rostedt, Peter Anvin, Linux Kernel Mailing List,
	linux-toolchains, Josh Poimboeuf, Alexei Starovoitov,
	Masami Hiramatsu

On Sat, Mar 27, 2021 at 9:02 PM Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> On Sat, Mar 27, 2021 at 5:08 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > debian-5.10.19 as host-kernel:
> > 11655.755564957 seconds time elapsed
> >
> > dileks-5.12-rc3 plus x86-nops as host-kernel:
> > 11941.439350080 seconds time elapsed
>
> That's 2.5% - a huge difference. Particularly since kernel build times
> shouldn't even be that kernel-intensive.
>
> I think there's something else going on than the nops. Same config?
> There are likely many other differences between 5.10.19 and 5.12-rc3.
>
> So can you check just plain 5.12-rc3 and then 5.12-rc3 plus x86-nops,
> with otherwise identical configuration?
>

Hi Linus,

I re-checked my linux-config and custom patchset.

I had "kbuild: add CONFIG_VMLINUX_MAP expert option" in my queue and
build with CONFIG_VMLINUX_MAP=y.
This option generated here an approx. 30MiB big vmlinux.map file.
Cannot say how long this is taking in seconds but that can explain the
the time-diff.

[ The above option is helpful to analyze a recent Linux-kernel build
with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.
Always, I was able to build but not boot on bare metal with
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.
With a LLVM toolchain, of course. ]

( In the meantime Debian has a 5.20.26 kernel released - so if you
want I can re-test with Linux v5.12-rc5. )

Regards,
- Sedat -

[1] https://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git/commit/?h=kbuild&id=babd8cd96d333cb83c9b8abf4f01ab1f161d6ec4

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2021-03-12 11:32 ` [PATCH 1/2] x86: Remove dynamic NOP selection Peter Zijlstra
  2021-03-12 12:09   ` Peter Zijlstra
@ 2024-01-20  6:58   ` Thorsten Glaser
  2024-01-20  8:22     ` H. Peter Anvin
  1 sibling, 1 reply; 51+ messages in thread
From: Thorsten Glaser @ 2024-01-20  6:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, rostedt, hpa, torvalds, linux-kernel, linux-toolchains,
	jpoimboe, alexei.starovoitov, mhiramat

Peter Zijlstra dixit:

>-/* generic versions from gas
[…]
>-   3: leal 0x00(%esi),%esi
>-   4: leal 0x00(,%esi,1),%esi
>-   6: leal 0x00000000(%esi),%esi
>-   7: leal 0x00000000(,%esi,1),%esi

vs.

>+ * Generic 32bit nops from GAS:
[…]
>+ * 3: leal 0x0(%esi),%esi
>+ * 4: leal 0x0(%esi,%eiz,1),%esi
>+ * 5: leal %ds:0x0(%esi,%eiz,1),%esi
>+ * 6: leal 0x0(%esi),%esi
>+ * 7: leal 0x0(%esi,%eiz,1),%esi
>+ * 8: leal %ds:0x0(%esi,%eiz,1),%esi

I think there’s some mistake introduced. The BYTES_* are
identical for e.g. #7, but %eiz must be wrong, it’s not
a register. Indeed, gas (on Debian bullseye) does not
assemble that either.

(Awful AT&T syntax aside…)

bye,
//mirabilos
-- 
„Cool, /usr/share/doc/mksh/examples/uhr.gz ist ja ein Grund,
mksh auf jedem System zu installieren.“
	-- XTaran auf der OpenRheinRuhr, ganz begeistert
(EN: “[…]uhr.gz is a reason to install mksh on every system.”)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-20  6:58   ` Thorsten Glaser
@ 2024-01-20  8:22     ` H. Peter Anvin
  2024-01-20 16:53       ` Thorsten Glaser
  2024-01-20 17:00       ` Linus Torvalds
  0 siblings, 2 replies; 51+ messages in thread
From: H. Peter Anvin @ 2024-01-20  8:22 UTC (permalink / raw)
  To: Thorsten Glaser, Peter Zijlstra
  Cc: x86, rostedt, torvalds, linux-kernel, linux-toolchains, jpoimboe,
	alexei.starovoitov, mhiramat

On January 19, 2024 10:58:56 PM PST, Thorsten Glaser <tg@debian.org> wrote:
>Peter Zijlstra dixit:
>
>>-/* generic versions from gas
>[…]
>>-   3: leal 0x00(%esi),%esi
>>-   4: leal 0x00(,%esi,1),%esi
>>-   6: leal 0x00000000(%esi),%esi
>>-   7: leal 0x00000000(,%esi,1),%esi
>
>vs.
>
>>+ * Generic 32bit nops from GAS:
>[…]
>>+ * 3: leal 0x0(%esi),%esi
>>+ * 4: leal 0x0(%esi,%eiz,1),%esi
>>+ * 5: leal %ds:0x0(%esi,%eiz,1),%esi
>>+ * 6: leal 0x0(%esi),%esi
>>+ * 7: leal 0x0(%esi,%eiz,1),%esi
>>+ * 8: leal %ds:0x0(%esi,%eiz,1),%esi
>
>I think there’s some mistake introduced. The BYTES_* are
>identical for e.g. #7, but %eiz must be wrong, it’s not
>a register. Indeed, gas (on Debian bullseye) does not
>assemble that either.
>
>(Awful AT&T syntax aside…)
>
>bye,
>//mirabilos

%eiz was something that binutils used to put in when disassembling certain redundant encodings with SIB at some point.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-20  8:22     ` H. Peter Anvin
@ 2024-01-20 16:53       ` Thorsten Glaser
  2024-01-21 23:21         ` H. Peter Anvin
  2024-01-20 17:00       ` Linus Torvalds
  1 sibling, 1 reply; 51+ messages in thread
From: Thorsten Glaser @ 2024-01-20 16:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, x86, rostedt, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

H. Peter Anvin dixit:

>%eiz was something that binutils used to put in when disassembling
>certain redundant encodings with SIB at some point.

Ah, fair enough. Maybe this could be added as one more line in
the comment or something.

Thanks,
//mirabilos
-- 
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-20  8:22     ` H. Peter Anvin
  2024-01-20 16:53       ` Thorsten Glaser
@ 2024-01-20 17:00       ` Linus Torvalds
  2024-01-20 17:19         ` Thorsten Glaser
  2024-01-21 22:36         ` [PATCH 1/2] x86: Remove dynamic NOP selection David Laight
  1 sibling, 2 replies; 51+ messages in thread
From: Linus Torvalds @ 2024-01-20 17:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thorsten Glaser, Peter Zijlstra, x86, rostedt, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sat, 20 Jan 2024 at 00:28, H. Peter Anvin <hpa@zytor.com> wrote:
>
> %eiz was something that binutils used to put in when disassembling certain redundant encodings with SIB at some point.

Yeah, it's purely (bad) syntactic sugar for "no register". Somebody
decided that the fact that so many RISC architectures have a "zero
register" means that they should make x86 look like it has a "zero
register" too.

I assume it regularized some very silly decoding issue, but it was horrible.

It's not the worst thing I've ever seen - in objdump output, and it's
easy to just remove with a sed script or a simple search-and-replace
in the editor.  Unlike some of the other "design" choices of objdump.

On that note, does anybody have a better disassembler than objdump? Or
even a script around it to make it more useful? I do use "objdump
--disassemble" a fair amount, and I hate how bad it is.

My pet peeve is the crazy relocation handling (or lack there-of). IOW,
if I do something like

    objdump --disassemble \
        --no-show-raw-insn
        --no-addresses \
        kernel/exit.o

I get output like this:

        call   <delayed_put_task_struct+0x1a>

whis is garbage: it's not calling delayed_put_task_struct+0x1a at all,
that's just "the offset bytes are all zero because the data is in the
relocation".

And if I add "-r" to get relocation info, I get

        call   <delayed_put_task_struct+0x1a>
                        R_X86_64_PLT32  rethook_flush_task-0x4

which shows the raw relocation data, but with truly mind-bogglingly
horrendous syntax.

Is there some sane tool that just does the sane thing and shows this as

        call   rethook_flush_task

which is what the thing actually means?

And no, the llvm-objdump thing isn't any better. It isn't compatible
with the GNU binutils objdump, but it does the same insanely bad
decoding.

            Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-20 17:00       ` Linus Torvalds
@ 2024-01-20 17:19         ` Thorsten Glaser
  2024-01-20 18:21           ` disassemblers (was Re: [PATCH 1/2] x86: Remove dynamic NOP selection) Thorsten Glaser
  2024-01-21 22:36         ` [PATCH 1/2] x86: Remove dynamic NOP selection David Laight
  1 sibling, 1 reply; 51+ messages in thread
From: Thorsten Glaser @ 2024-01-20 17:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Peter Zijlstra, x86, rostedt, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

Linus Torvalds dixit:

>On that note, does anybody have a better disassembler than objdump? Or
>even a script around it to make it more useful? I do use "objdump
>--disassemble" a fair amount, and I hate how bad it is.

Other than -Mintel (and -Mintel,i8086 for boot code) to make the
syntax 90–95% less awful, I’m sorry I’ve also been looking.

>My pet peeve is the crazy relocation handling (or lack there-of). IOW,

Yes! This!

I’ve been putting markers into the file and then disassembling
the final linked thing instead of just the object file I need
because of this.

>Is there some sane tool that just does the sane thing and shows this as

The only other disassemblers I know don’t know about ELF objects
at all, I’m sorry to say.

I didn’t know about objdump -r, but that’s truly awful to read.
Given a wide enough screen, an intermediate | sed 's/^\t/&&&&/'
at least moves the relocation info more to the right to interrupt
the reading flow less.

Thanks,
//mirabilos
-- 
FWIW, I'm quite impressed with mksh interactively. I thought it was much
*much* more bare bones. But it turns out it beats the living hell out of
ksh93 in that respect. I'd even consider it for my daily use if I hadn't
wasted half my life on my zsh setup. :-) -- Frank Terbeck in #!/bin/mksh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* disassemblers (was Re: [PATCH 1/2] x86: Remove dynamic NOP selection)
  2024-01-20 17:19         ` Thorsten Glaser
@ 2024-01-20 18:21           ` Thorsten Glaser
  0 siblings, 0 replies; 51+ messages in thread
From: Thorsten Glaser @ 2024-01-20 18:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Peter Zijlstra, x86, rostedt, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

Dixi quod…

>>Is there some sane tool that just does the sane thing and shows this as
>
>The only other disassemblers I know don’t know about ELF objects
>at all, I’m sorry to say.

I have searched through my bookmarks and found “Agner Fog’s objconv”
https://www.agner.org/optimize/#objconv which I had not yet tried as
it comes with a .exe but apparently, the included GPL source builds
on GNU/Linux (and BSD and MacOSX) as well.

Usage is: ./objconv -fgasm filename.o

This will write filename.asm ⚠ into the same directory as the .o file,
surprisingly.

It works for i386 and amd64 but not x32 (aka amd64ilp32) which is
mis-disassembled as if it were i386. Sample output fragment:

tsv_header:
        sub     rsp, 8                                  # 00E3 _ 48: 83. EC, 08
        lea     rdi, [.LC7+rip]                         # 00E7 _ 48: 8D. 3D, 00000000(rel)
        call    puts@PLT                                # 00EE _ E8, 00000000(PLT r)
        add     rsp, 8                                  # 00F3 _ 48: 83. C4, 08
        ret                                             # 00F7 _ C3

Bit irritating is it uses decimal numbers…

        sub     rsp, 232                                # 0102 _ 48: 81. EC, 000000E8

… and the way the input is separated with colon, period and comma,
but it’s legible enough.

Credits to Peter Cordes for the discovery.

bye,
//mirabilos
-- 
When he found out that the m68k port was in a pretty bad shape, he did
not, like many before him, shrug and move on; instead, he took it upon
himself to start compiling things, just so he could compile his shell.
How's that for dedication. -- Wouter, about my Debian/m68k revival

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-20 17:00       ` Linus Torvalds
  2024-01-20 17:19         ` Thorsten Glaser
@ 2024-01-21 22:36         ` David Laight
  2024-01-21 23:10           ` H. Peter Anvin
  1 sibling, 1 reply; 51+ messages in thread
From: David Laight @ 2024-01-21 22:36 UTC (permalink / raw)
  To: 'Linus Torvalds', H. Peter Anvin
  Cc: Thorsten Glaser, Peter Zijlstra, x86, rostedt, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

From: Linus Torvalds
> Sent: 20 January 2024 17:00
...
> And if I add "-r" to get relocation info, I get
> 
>         call   <delayed_put_task_struct+0x1a>
>                         R_X86_64_PLT32  rethook_flush_task-0x4
> 
> which shows the raw relocation data, but with truly mind-bogglingly
> horrendous syntax.
> 
> Is there some sane tool that just does the sane thing and shows this as
> 
>         call   rethook_flush_task
> 
> which is what the thing actually means?

While you are re-writing a disassembler, remember to print the
contents of string when you get a reference into .rodata.str :-)

How many times have you had to dig out a printf format string in
order to locate the source associated with some object code?
It is so much easier if the disassembler does it for you.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-21 22:36         ` [PATCH 1/2] x86: Remove dynamic NOP selection David Laight
@ 2024-01-21 23:10           ` H. Peter Anvin
  0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2024-01-21 23:10 UTC (permalink / raw)
  To: David Laight, 'Linus Torvalds'
  Cc: Thorsten Glaser, Peter Zijlstra, x86, rostedt, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On January 21, 2024 2:36:32 PM PST, David Laight <David.Laight@ACULAB.COM> wrote:
>From: Linus Torvalds
>> Sent: 20 January 2024 17:00
>...
>> And if I add "-r" to get relocation info, I get
>> 
>>         call   <delayed_put_task_struct+0x1a>
>>                         R_X86_64_PLT32  rethook_flush_task-0x4
>> 
>> which shows the raw relocation data, but with truly mind-bogglingly
>> horrendous syntax.
>> 
>> Is there some sane tool that just does the sane thing and shows this as
>> 
>>         call   rethook_flush_task
>> 
>> which is what the thing actually means?
>
>While you are re-writing a disassembler, remember to print the
>contents of string when you get a reference into .rodata.str :-)
>
>How many times have you had to dig out a printf format string in
>order to locate the source associated with some object code?
>It is so much easier if the disassembler does it for you.
>
>	David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
>Registration No: 1397386 (Wales)
>

Probably don't even need to rewrite the disassembler. Postprocessing is probably sufficient.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-20 16:53       ` Thorsten Glaser
@ 2024-01-21 23:21         ` H. Peter Anvin
  2024-01-21 23:58           ` Thorsten Glaser
  0 siblings, 1 reply; 51+ messages in thread
From: H. Peter Anvin @ 2024-01-21 23:21 UTC (permalink / raw)
  To: Thorsten Glaser
  Cc: Peter Zijlstra, x86, rostedt, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On 1/20/24 08:53, Thorsten Glaser wrote:
> H. Peter Anvin dixit:
> 
>> %eiz was something that binutils used to put in when disassembling
>> certain redundant encodings with SIB at some point.
> 
> Ah, fair enough. Maybe this could be added as one more line in
> the comment or something.
> 

I think "proper" gas syntax would be 0(%esi,,1), although that doesn't 
assemble either (I don't believe there is a way to get gas to actually 
generate this sequence.)

But yes, with all even remotely recent CPUs all actually handling nopl 
properly, this isn't relevant anymore.

	-hpa


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-21 23:21         ` H. Peter Anvin
@ 2024-01-21 23:58           ` Thorsten Glaser
  2024-01-22  0:15             ` H. Peter Anvin
  0 siblings, 1 reply; 51+ messages in thread
From: Thorsten Glaser @ 2024-01-21 23:58 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Peter Zijlstra, x86, rostedt, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

H. Peter Anvin dixit:

> But yes, with all even remotely recent CPUs all actually handling nopl
> properly, this isn't relevant anymore.

This was, incidentally, triggered by looking into a problem report that
something did *not* work on a Geode LX system.

People don’t just run Linux on “recent CPUs” (though I at least got me
an Atom and a Core2Duo for it and run BSD on my Pentium-M and VIA C7
systems).

bye,
//mirabilos
-- 
„Cool, /usr/share/doc/mksh/examples/uhr.gz ist ja ein Grund,
mksh auf jedem System zu installieren.“
	-- XTaran auf der OpenRheinRuhr, ganz begeistert
(EN: “[…]uhr.gz is a reason to install mksh on every system.”)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-21 23:58           ` Thorsten Glaser
@ 2024-01-22  0:15             ` H. Peter Anvin
  2024-01-22  0:56               ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: H. Peter Anvin @ 2024-01-22  0:15 UTC (permalink / raw)
  To: Thorsten Glaser
  Cc: Peter Zijlstra, x86, rostedt, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On January 21, 2024 3:58:11 PM PST, Thorsten Glaser <tg@debian.org> wrote:
>H. Peter Anvin dixit:
>
>> But yes, with all even remotely recent CPUs all actually handling nopl
>> properly, this isn't relevant anymore.
>
>This was, incidentally, triggered by looking into a problem report that
>something did *not* work on a Geode LX system.
>
>People don’t just run Linux on “recent CPUs” (though I at least got me
>an Atom and a Core2Duo for it and run BSD on my Pentium-M and VIA C7
>systems).
>
>bye,
>//mirabilos

Yes, but it is a matter of where we optimize for performance as opposed to correctness.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-22  0:15             ` H. Peter Anvin
@ 2024-01-22  0:56               ` Steven Rostedt
  2024-01-22  1:17                 ` Thorsten Glaser
  2024-01-22  2:15                 ` H. Peter Anvin
  0 siblings, 2 replies; 51+ messages in thread
From: Steven Rostedt @ 2024-01-22  0:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thorsten Glaser, Peter Zijlstra, x86, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sun, 21 Jan 2024 16:15:57 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On January 21, 2024 3:58:11 PM PST, Thorsten Glaser <tg@debian.org> wrote:
> >H. Peter Anvin dixit:
> >  
> >> But yes, with all even remotely recent CPUs all actually handling nopl
> >> properly, this isn't relevant anymore.  
> >
> >This was, incidentally, triggered by looking into a problem report that
> >something did *not* work on a Geode LX system.

What problem happened?

> >
> >People don’t just run Linux on “recent CPUs” (though I at least got me
> >an Atom and a Core2Duo for it and run BSD on my Pentium-M and VIA C7
> >systems).
> >
> >bye,
> >//mirabilos  
> 
> Yes, but it is a matter of where we optimize for performance as opposed to correctness.

There is no such thing as "optimize for correctness", it is either
correct or it is not. Correctness should always come before performance
(at least that is what Thomas has pounded into me ;-)

If a kernel use to work on a machine but a newer version no longer
works, I call that a regression.

-- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-22  0:56               ` Steven Rostedt
@ 2024-01-22  1:17                 ` Thorsten Glaser
  2024-01-22  2:04                   ` H. Peter Anvin
  2024-01-22  2:15                 ` H. Peter Anvin
  1 sibling, 1 reply; 51+ messages in thread
From: Thorsten Glaser @ 2024-01-22  1:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Peter Zijlstra, x86, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

Steven Rostedt dixit:

>> >This was, incidentally, triggered by looking into a problem report that
>> >something did *not* work on a Geode LX system.
>
>What problem happened?

It turned out to be a compiler issue (GCC thinks i686 means PPro,
not 686-class CPUs, and -fcf-protection causes long NOPs, which
not all 686-class CPUs support, to be inserted). This turned out
to break a large part of Debian stable on OLPCs and other systems,
and the kernel’s changes in nopl handling were tabled as arguments.

https://www.jookia.org/wiki/Nopl has a longer writeup on the nopl
history.

bye,
//mirabilos
-- 
<igli> exceptions: a truly awful implementation of quite a nice idea.
<igli> just about the worst way you could do something like that, afaic.
<igli> it's like anti-design.  <mirabilos> that too… may I quote you on that?
<igli> sure, tho i doubt anyone will listen ;)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-22  1:17                 ` Thorsten Glaser
@ 2024-01-22  2:04                   ` H. Peter Anvin
  0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2024-01-22  2:04 UTC (permalink / raw)
  To: Thorsten Glaser, Steven Rostedt
  Cc: Peter Zijlstra, x86, torvalds, linux-kernel, linux-toolchains,
	jpoimboe, alexei.starovoitov, mhiramat

On January 21, 2024 5:17:36 PM PST, Thorsten Glaser <tg@debian.org> wrote:
>Steven Rostedt dixit:
>
>>> >This was, incidentally, triggered by looking into a problem report that
>>> >something did *not* work on a Geode LX system.
>>
>>What problem happened?
>
>It turned out to be a compiler issue (GCC thinks i686 means PPro,
>not 686-class CPUs, and -fcf-protection causes long NOPs, which
>not all 686-class CPUs support, to be inserted). This turned out
>to break a large part of Debian stable on OLPCs and other systems,
>and the kernel’s changes in nopl handling were tabled as arguments.
>
>https://www.jookia.org/wiki/Nopl has a longer writeup on the nopl
>history.
>
>bye,
>//mirabilos

i686 *is* Pentium Pro...

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-22  0:56               ` Steven Rostedt
  2024-01-22  1:17                 ` Thorsten Glaser
@ 2024-01-22  2:15                 ` H. Peter Anvin
  2024-01-22  2:22                   ` Steven Rostedt
  1 sibling, 1 reply; 51+ messages in thread
From: H. Peter Anvin @ 2024-01-22  2:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thorsten Glaser, Peter Zijlstra, x86, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On 1/21/24 16:56, Steven Rostedt wrote:
>>
>> Yes, but it is a matter of where we optimize for performance as opposed to correctness.
> 
> There is no such thing as "optimize for correctness", it is either
> correct or it is not. Correctness should always come before performance
> (at least that is what Thomas has pounded into me ;-)
> 
> If a kernel use to work on a machine but a newer version no longer
> works, I call that a regression.
> 

There absolutely is such a thing as "optimize for correctness." It means 
to keep the code clean, easily testable, and with a minimal number of 
distinct code paths so that regressions and *especially* uncaught 
regressions get caught quickly.

	-hpa


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-22  2:15                 ` H. Peter Anvin
@ 2024-01-22  2:22                   ` Steven Rostedt
  2024-01-22  2:31                     ` H. Peter Anvin
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2024-01-22  2:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thorsten Glaser, Peter Zijlstra, x86, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On Sun, 21 Jan 2024 18:15:39 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 1/21/24 16:56, Steven Rostedt wrote:
> >>
> >> Yes, but it is a matter of where we optimize for performance as opposed to correctness.  
> > 
> > There is no such thing as "optimize for correctness", it is either
> > correct or it is not. Correctness should always come before performance
> > (at least that is what Thomas has pounded into me ;-)
> > 
> > If a kernel use to work on a machine but a newer version no longer
> > works, I call that a regression.
> >   
> 
> There absolutely is such a thing as "optimize for correctness." It means 
> to keep the code clean, easily testable, and with a minimal number of 
> distinct code paths so that regressions and *especially* uncaught 
> regressions get caught quickly.

I call that maintainability, not correctness. It is either correct and
works, or is incorrect and does not work.

You can change code to be more maintainable and still make it incorrect.

-- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 1/2] x86: Remove dynamic NOP selection
  2024-01-22  2:22                   ` Steven Rostedt
@ 2024-01-22  2:31                     ` H. Peter Anvin
  0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2024-01-22  2:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thorsten Glaser, Peter Zijlstra, x86, torvalds, linux-kernel,
	linux-toolchains, jpoimboe, alexei.starovoitov, mhiramat

On January 21, 2024 6:22:36 PM PST, Steven Rostedt <rostedt@goodmis.org> wrote:
>On Sun, 21 Jan 2024 18:15:39 -0800
>"H. Peter Anvin" <hpa@zytor.com> wrote:
>
>> On 1/21/24 16:56, Steven Rostedt wrote:
>> >>
>> >> Yes, but it is a matter of where we optimize for performance as opposed to correctness.  
>> > 
>> > There is no such thing as "optimize for correctness", it is either
>> > correct or it is not. Correctness should always come before performance
>> > (at least that is what Thomas has pounded into me ;-)
>> > 
>> > If a kernel use to work on a machine but a newer version no longer
>> > works, I call that a regression.
>> >   
>> 
>> There absolutely is such a thing as "optimize for correctness." It means 
>> to keep the code clean, easily testable, and with a minimal number of 
>> distinct code paths so that regressions and *especially* uncaught 
>> regressions get caught quickly.
>
>I call that maintainability, not correctness. It is either correct and
>works, or is incorrect and does not work.
>
>You can change code to be more maintainable and still make it incorrect.
>
>-- Steve

Yes, of course. That's called failure :)

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2024-01-22  2:32 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-12 11:32 [PATCH 0/2] x86: Remove ideal_nops[] Peter Zijlstra
2021-03-12 11:32 ` [PATCH 1/2] x86: Remove dynamic NOP selection Peter Zijlstra
2021-03-12 12:09   ` Peter Zijlstra
2021-03-12 20:36     ` Linus Torvalds
2024-01-20  6:58   ` Thorsten Glaser
2024-01-20  8:22     ` H. Peter Anvin
2024-01-20 16:53       ` Thorsten Glaser
2024-01-21 23:21         ` H. Peter Anvin
2024-01-21 23:58           ` Thorsten Glaser
2024-01-22  0:15             ` H. Peter Anvin
2024-01-22  0:56               ` Steven Rostedt
2024-01-22  1:17                 ` Thorsten Glaser
2024-01-22  2:04                   ` H. Peter Anvin
2024-01-22  2:15                 ` H. Peter Anvin
2024-01-22  2:22                   ` Steven Rostedt
2024-01-22  2:31                     ` H. Peter Anvin
2024-01-20 17:00       ` Linus Torvalds
2024-01-20 17:19         ` Thorsten Glaser
2024-01-20 18:21           ` disassemblers (was Re: [PATCH 1/2] x86: Remove dynamic NOP selection) Thorsten Glaser
2024-01-21 22:36         ` [PATCH 1/2] x86: Remove dynamic NOP selection David Laight
2024-01-21 23:10           ` H. Peter Anvin
2021-03-12 11:32 ` [PATCH 2/2] objtool,x86: Use asm/nops.h Peter Zijlstra
2021-03-12 14:29 ` [PATCH 0/2] x86: Remove ideal_nops[] Sedat Dilek
2021-03-12 14:47   ` Borislav Petkov
2021-03-12 17:26     ` Steven Rostedt
2021-03-12 17:35       ` Sedat Dilek
2021-03-12 17:46         ` Borislav Petkov
2021-03-12 17:47         ` Steven Rostedt
2021-03-12 18:13           ` Sedat Dilek
2021-03-12 19:03             ` Sedat Dilek
2021-03-12 20:59 ` Borislav Petkov
     [not found]   ` <CA+icZUWSCS6vAQOXoG6nsW+Dbnogivzf+rmegCTMjz5hjE5cKQ@mail.gmail.com>
2021-03-13  8:49     ` Borislav Petkov
2021-03-13 11:23       ` Borislav Petkov
2021-03-13 12:10       ` Sedat Dilek
2021-03-13 12:15         ` Borislav Petkov
2021-03-13 12:38           ` Sedat Dilek
2021-03-13 12:49             ` Borislav Petkov
2021-03-13 12:58               ` Sedat Dilek
2021-03-13 13:29                 ` Borislav Petkov
2021-03-13 13:47                   ` Sedat Dilek
2021-03-15 17:04                     ` Sedat Dilek
2021-03-15 17:15                       ` Borislav Petkov
2021-03-15 17:19                         ` Sedat Dilek
2021-03-15 17:23                           ` Borislav Petkov
2021-03-15 18:10                       ` Peter Zijlstra
2021-03-15 18:23                         ` Sedat Dilek
2021-03-15 22:14                           ` Peter Zijlstra
2021-03-16  5:56                             ` Sedat Dilek
2021-03-27 12:08                               ` Sedat Dilek
2021-03-27 20:02                                 ` Linus Torvalds
2021-03-30 12:31                                   ` Sedat Dilek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).