* [PATCH v2 0/2] get rid of GCC __attribute__((optimize)) for BPF
@ 2020-10-28 17:15 Ard Biesheuvel
  2020-10-28 17:15 ` [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE Ard Biesheuvel
  2020-10-28 17:15 ` [PATCH v2 2/2] bpf: move interpreter into separate source file Ard Biesheuvel
  0 siblings, 2 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2020-10-28 17:15 UTC
  To: linux-kernel
  Cc: netdev, bpf, arnd, Ard Biesheuvel, Nick Desaulniers,
	Arvind Sankar, Randy Dunlap, Josh Poimboeuf, Thomas Gleixner,
	Alexei Starovoitov, Daniel Borkmann, Peter Zijlstra,
	Geert Uytterhoeven, Kees Cook

This is a follow-up to [0]:
  [PATCH] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE

Changes since v1:
- only use -fno-gcse on x86 when CONFIG_BPF_JIT_ALWAYS_ON is not set and
  CONFIG_CC_IS_GCC=y (but ignore CONFIG_RETPOLINE since we want to avoid
  GCSE in all cases)
- to avoid potential impact of disabling GCSE on other code, put the
  interpreter in a separate file (patch #2)

Note that patch #1 is intended for backporting, as function-scope GCC
optimization attributes are unreliable: they can cause other optimization
options given on the command line to be dropped for the function.

I don't have a strong opinion on whether the interpreter code should be
split off or not, but it looks like it can be done fairly painlessly,
so it is probably a good idea to do it anyway.

[0] https://lore.kernel.org/bpf/20201027205723.12514-1-ardb@kernel.org/

Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kees Cook <keescook@chromium.org>

Ard Biesheuvel (2):
  bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  bpf: move interpreter into separate source file

 include/linux/compiler-gcc.h   |   2 -
 include/linux/compiler_types.h |   4 -
 include/linux/filter.h         |   1 +
 kernel/bpf/Makefile            |   7 +-
 kernel/bpf/core.c              | 567 ------------------
 kernel/bpf/interp.c            | 601 ++++++++++++++++++++
 6 files changed, 607 insertions(+), 575 deletions(-)
 create mode 100644 kernel/bpf/interp.c

-- 
2.17.1
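
For readers unfamiliar with the construct this series removes, here is a
minimal user-space sketch of the pattern, assuming a GCC or Clang toolchain.
The macro name mirrors __no_fgcse from the patch; the function sum_to() is a
hypothetical stand-in and does not come from the kernel.

```c
/*
 * Function-scope variant, i.e. the pattern patch 1/2 removes.
 * Only GCC proper gets the attribute; every other compiler sees an
 * empty macro, which is what the fallback in compiler_types.h did.
 */
#if defined(__GNUC__) && !defined(__clang__)
# define __no_fgcse __attribute__((optimize("-fno-gcse")))
#else
# define __no_fgcse
#endif

/* Hypothetical stand-in for a hot function; not kernel code. */
static unsigned long __no_fgcse sum_to(unsigned long n)
{
	unsigned long i, sum = 0;

	for (i = 1; i <= n; i++)
		sum += i;
	return sum;
}
```

The file-scope alternative adopted by the series needs no source annotation
at all: the flag is passed on the compiler command line for the one
translation unit (for example "gcc -O2 -fno-gcse -c file.c"), so the other
options given for that file stay in effect.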



* [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 17:15 [PATCH v2 0/2] get rid of GCC __attribute__((optimize)) for BPF Ard Biesheuvel
@ 2020-10-28 17:15 ` Ard Biesheuvel
  2020-10-28 21:39   ` Alexei Starovoitov
                     ` (2 more replies)
  2020-10-28 17:15 ` [PATCH v2 2/2] bpf: move interpreter into separate source file Ard Biesheuvel
  1 sibling, 3 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2020-10-28 17:15 UTC
  To: linux-kernel
  Cc: netdev, bpf, arnd, Ard Biesheuvel, Nick Desaulniers,
	Arvind Sankar, Randy Dunlap, Josh Poimboeuf, Thomas Gleixner,
	Alexei Starovoitov, Daniel Borkmann, Peter Zijlstra,
	Geert Uytterhoeven, Kees Cook

Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
function-scope __attribute__((optimize("-fno-gcse"))), to disable a
GCC-specific optimization that was causing trouble on x86 builds, and
was not expected to have any positive effect in the first place.

However, as the GCC manual documents, __attribute__((optimize))
is not for production use, and causes all other optimization
options to be forgotten for the function in question. This can
cause all kinds of trouble, but in one particular reported case,
it causes -fno-asynchronous-unwind-tables to be disregarded,
resulting in .eh_frame info being emitted for the function.

This patch reverts commit 3193c0836 and instead disables the -fgcse
optimization for the entire source file, but only when building for
x86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
original commit states that CONFIG_RETPOLINE=n triggers the issue,
whereas CONFIG_RETPOLINE=y performs better without the optimization,
so it is kept disabled in both cases.

Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/linux/compiler-gcc.h   | 2 --
 include/linux/compiler_types.h | 4 ----
 kernel/bpf/Makefile            | 6 +++++-
 kernel/bpf/core.c              | 2 +-
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index d1e3c6896b71..5deb37024574 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -175,5 +175,3 @@
 #else
 #define __diag_GCC_8(s)
 #endif
-
-#define __no_fgcse __attribute__((optimize("-fno-gcse")))
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 6e390d58a9f8..ac3fa37a84f9 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -247,10 +247,6 @@ struct ftrace_likely_data {
 #define asm_inline asm
 #endif
 
-#ifndef __no_fgcse
-# define __no_fgcse
-#endif
-
 /* Are two types/vars the same type (ignoring qualifiers)? */
 #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
 
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index bdc8cd1b6767..c1b9f71ee6aa 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1,6 +1,10 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y := core.o
-CFLAGS_core.o += $(call cc-disable-warning, override-init)
+ifneq ($(CONFIG_BPF_JIT_ALWAYS_ON),y)
+# ___bpf_prog_run() needs GCSE disabled on x86; see 3193c0836f203 for details
+cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
+endif
+CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9268d77898b7..55454d2278b1 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1369,7 +1369,7 @@ u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
  *
  * Decode and execute eBPF instructions.
  */
-static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
+static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
 {
 #define BPF_INSN_2_LBL(x, y)    [BPF_##x | BPF_##y] = &&x##_##y
 #define BPF_INSN_3_LBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = &&x##_##y##_##z
-- 
2.17.1
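
As background on why ___bpf_prog_run() is sensitive to GCSE: the function
dispatches through a table of label addresses (computed gotos), and the GCC
manual notes that code using computed gotos may run faster with -fno-gcse.
The sketch below shows the same dispatch shape in user space. The opcodes,
struct toy_insn and toy_run() are hypothetical and far simpler than the
kernel interpreter; only the jump-table idiom is taken from the patch.

```c
#include <stdint.h>

/* Hypothetical toy opcodes; these are not BPF opcodes. */
enum { OP_ADD = 0, OP_MUL = 1, OP_EXIT = 2 };

struct toy_insn {
	uint8_t code;	/* opcode, used directly as the jump table index */
	int32_t imm;	/* immediate operand */
};

/*
 * Run a toy program against one accumulator, dispatching through a
 * table of label addresses ("labels as values", a GNU C extension
 * accepted by GCC and Clang).
 */
static int64_t toy_run(const struct toy_insn *insn, int64_t acc)
{
	static const void * const jumptable[256] = {
		/* GNU range designator: every slot defaults to the error path */
		[0 ... 255] = &&default_label,
		/* Now overwrite non-defaults (may warn with -Woverride-init) */
		[OP_ADD]  = &&do_add,
		[OP_MUL]  = &&do_mul,
		[OP_EXIT] = &&do_exit,
	};

#define CONT do { insn++; goto select_insn; } while (0)

select_insn:
	goto *jumptable[insn->code];

do_add:
	acc += insn->imm;
	CONT;
do_mul:
	acc *= insn->imm;
	CONT;
do_exit:
	return acc;
default_label:
	/* Unknown opcode: the kernel hits BUG_ON() here; the sketch returns -1. */
	return -1;
#undef CONT
}
```

Calling toy_run() on the program { OP_ADD 2, OP_MUL 21, OP_EXIT } with an
initial accumulator of 0 walks the table three times and returns 42. The
default-then-override table initializer is what the override-init warning
suppression in the Makefile hunk is for.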



* [PATCH v2 2/2] bpf: move interpreter into separate source file
  2020-10-28 17:15 [PATCH v2 0/2] get rid of GCC __attribute__((optimize)) for BPF Ard Biesheuvel
  2020-10-28 17:15 ` [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE Ard Biesheuvel
@ 2020-10-28 17:15 ` Ard Biesheuvel
  1 sibling, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2020-10-28 17:15 UTC
  To: linux-kernel
  Cc: netdev, bpf, arnd, Ard Biesheuvel, Nick Desaulniers,
	Arvind Sankar, Randy Dunlap, Josh Poimboeuf, Thomas Gleixner,
	Alexei Starovoitov, Daniel Borkmann, Peter Zijlstra,
	Geert Uytterhoeven, Kees Cook

To reduce the impact of disabling certain compiler optimizations that
are only needed for the interpreter, move it into its own source file,
and apply the compiler command line override only to this file.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/linux/filter.h |   1 +
 kernel/bpf/Makefile    |   7 +-
 kernel/bpf/core.c      | 567 ------------------
 kernel/bpf/interp.c    | 601 ++++++++++++++++++++
 4 files changed, 605 insertions(+), 571 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 72d62cbc1578..5e027cddcbea 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -841,6 +841,7 @@ static inline int sk_filter(struct sock *sk, struct sk_buff *skb)
 }
 
 struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err);
+void bpf_prog_select_func(struct bpf_prog *fp);
 void bpf_prog_free(struct bpf_prog *fp);
 
 bool bpf_opcode_in_insntable(u8 code);
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index c1b9f71ee6aa..a1573be0d94b 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1,10 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-y := core.o
-ifneq ($(CONFIG_BPF_JIT_ALWAYS_ON),y)
+obj-y := core.o interp.o
+
 # ___bpf_prog_run() needs GCSE disabled on x86; see 3193c0836f203 for details
 cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
-endif
-CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
+CFLAGS_interp.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 55454d2278b1..81d874b85240 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -34,28 +34,6 @@
 #include <linux/log2.h>
 #include <asm/unaligned.h>
 
-/* Registers */
-#define BPF_R0	regs[BPF_REG_0]
-#define BPF_R1	regs[BPF_REG_1]
-#define BPF_R2	regs[BPF_REG_2]
-#define BPF_R3	regs[BPF_REG_3]
-#define BPF_R4	regs[BPF_REG_4]
-#define BPF_R5	regs[BPF_REG_5]
-#define BPF_R6	regs[BPF_REG_6]
-#define BPF_R7	regs[BPF_REG_7]
-#define BPF_R8	regs[BPF_REG_8]
-#define BPF_R9	regs[BPF_REG_9]
-#define BPF_R10	regs[BPF_REG_10]
-
-/* Named registers */
-#define DST	regs[insn->dst_reg]
-#define SRC	regs[insn->src_reg]
-#define FP	regs[BPF_REG_FP]
-#define AX	regs[BPF_REG_AX]
-#define ARG1	regs[BPF_REG_ARG1]
-#define CTX	regs[BPF_REG_CTX]
-#define IMM	insn->imm
-
 /* No hurry in this branch
  *
  * Exported for the bpf jit load helper.
@@ -1196,540 +1174,6 @@ noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 }
 EXPORT_SYMBOL_GPL(__bpf_call_base);
 
-/* All UAPI available opcodes. */
-#define BPF_INSN_MAP(INSN_2, INSN_3)		\
-	/* 32 bit ALU operations. */		\
-	/*   Register based. */			\
-	INSN_3(ALU, ADD,  X),			\
-	INSN_3(ALU, SUB,  X),			\
-	INSN_3(ALU, AND,  X),			\
-	INSN_3(ALU, OR,   X),			\
-	INSN_3(ALU, LSH,  X),			\
-	INSN_3(ALU, RSH,  X),			\
-	INSN_3(ALU, XOR,  X),			\
-	INSN_3(ALU, MUL,  X),			\
-	INSN_3(ALU, MOV,  X),			\
-	INSN_3(ALU, ARSH, X),			\
-	INSN_3(ALU, DIV,  X),			\
-	INSN_3(ALU, MOD,  X),			\
-	INSN_2(ALU, NEG),			\
-	INSN_3(ALU, END, TO_BE),		\
-	INSN_3(ALU, END, TO_LE),		\
-	/*   Immediate based. */		\
-	INSN_3(ALU, ADD,  K),			\
-	INSN_3(ALU, SUB,  K),			\
-	INSN_3(ALU, AND,  K),			\
-	INSN_3(ALU, OR,   K),			\
-	INSN_3(ALU, LSH,  K),			\
-	INSN_3(ALU, RSH,  K),			\
-	INSN_3(ALU, XOR,  K),			\
-	INSN_3(ALU, MUL,  K),			\
-	INSN_3(ALU, MOV,  K),			\
-	INSN_3(ALU, ARSH, K),			\
-	INSN_3(ALU, DIV,  K),			\
-	INSN_3(ALU, MOD,  K),			\
-	/* 64 bit ALU operations. */		\
-	/*   Register based. */			\
-	INSN_3(ALU64, ADD,  X),			\
-	INSN_3(ALU64, SUB,  X),			\
-	INSN_3(ALU64, AND,  X),			\
-	INSN_3(ALU64, OR,   X),			\
-	INSN_3(ALU64, LSH,  X),			\
-	INSN_3(ALU64, RSH,  X),			\
-	INSN_3(ALU64, XOR,  X),			\
-	INSN_3(ALU64, MUL,  X),			\
-	INSN_3(ALU64, MOV,  X),			\
-	INSN_3(ALU64, ARSH, X),			\
-	INSN_3(ALU64, DIV,  X),			\
-	INSN_3(ALU64, MOD,  X),			\
-	INSN_2(ALU64, NEG),			\
-	/*   Immediate based. */		\
-	INSN_3(ALU64, ADD,  K),			\
-	INSN_3(ALU64, SUB,  K),			\
-	INSN_3(ALU64, AND,  K),			\
-	INSN_3(ALU64, OR,   K),			\
-	INSN_3(ALU64, LSH,  K),			\
-	INSN_3(ALU64, RSH,  K),			\
-	INSN_3(ALU64, XOR,  K),			\
-	INSN_3(ALU64, MUL,  K),			\
-	INSN_3(ALU64, MOV,  K),			\
-	INSN_3(ALU64, ARSH, K),			\
-	INSN_3(ALU64, DIV,  K),			\
-	INSN_3(ALU64, MOD,  K),			\
-	/* Call instruction. */			\
-	INSN_2(JMP, CALL),			\
-	/* Exit instruction. */			\
-	INSN_2(JMP, EXIT),			\
-	/* 32-bit Jump instructions. */		\
-	/*   Register based. */			\
-	INSN_3(JMP32, JEQ,  X),			\
-	INSN_3(JMP32, JNE,  X),			\
-	INSN_3(JMP32, JGT,  X),			\
-	INSN_3(JMP32, JLT,  X),			\
-	INSN_3(JMP32, JGE,  X),			\
-	INSN_3(JMP32, JLE,  X),			\
-	INSN_3(JMP32, JSGT, X),			\
-	INSN_3(JMP32, JSLT, X),			\
-	INSN_3(JMP32, JSGE, X),			\
-	INSN_3(JMP32, JSLE, X),			\
-	INSN_3(JMP32, JSET, X),			\
-	/*   Immediate based. */		\
-	INSN_3(JMP32, JEQ,  K),			\
-	INSN_3(JMP32, JNE,  K),			\
-	INSN_3(JMP32, JGT,  K),			\
-	INSN_3(JMP32, JLT,  K),			\
-	INSN_3(JMP32, JGE,  K),			\
-	INSN_3(JMP32, JLE,  K),			\
-	INSN_3(JMP32, JSGT, K),			\
-	INSN_3(JMP32, JSLT, K),			\
-	INSN_3(JMP32, JSGE, K),			\
-	INSN_3(JMP32, JSLE, K),			\
-	INSN_3(JMP32, JSET, K),			\
-	/* Jump instructions. */		\
-	/*   Register based. */			\
-	INSN_3(JMP, JEQ,  X),			\
-	INSN_3(JMP, JNE,  X),			\
-	INSN_3(JMP, JGT,  X),			\
-	INSN_3(JMP, JLT,  X),			\
-	INSN_3(JMP, JGE,  X),			\
-	INSN_3(JMP, JLE,  X),			\
-	INSN_3(JMP, JSGT, X),			\
-	INSN_3(JMP, JSLT, X),			\
-	INSN_3(JMP, JSGE, X),			\
-	INSN_3(JMP, JSLE, X),			\
-	INSN_3(JMP, JSET, X),			\
-	/*   Immediate based. */		\
-	INSN_3(JMP, JEQ,  K),			\
-	INSN_3(JMP, JNE,  K),			\
-	INSN_3(JMP, JGT,  K),			\
-	INSN_3(JMP, JLT,  K),			\
-	INSN_3(JMP, JGE,  K),			\
-	INSN_3(JMP, JLE,  K),			\
-	INSN_3(JMP, JSGT, K),			\
-	INSN_3(JMP, JSLT, K),			\
-	INSN_3(JMP, JSGE, K),			\
-	INSN_3(JMP, JSLE, K),			\
-	INSN_3(JMP, JSET, K),			\
-	INSN_2(JMP, JA),			\
-	/* Store instructions. */		\
-	/*   Register based. */			\
-	INSN_3(STX, MEM,  B),			\
-	INSN_3(STX, MEM,  H),			\
-	INSN_3(STX, MEM,  W),			\
-	INSN_3(STX, MEM,  DW),			\
-	INSN_3(STX, XADD, W),			\
-	INSN_3(STX, XADD, DW),			\
-	/*   Immediate based. */		\
-	INSN_3(ST, MEM, B),			\
-	INSN_3(ST, MEM, H),			\
-	INSN_3(ST, MEM, W),			\
-	INSN_3(ST, MEM, DW),			\
-	/* Load instructions. */		\
-	/*   Register based. */			\
-	INSN_3(LDX, MEM, B),			\
-	INSN_3(LDX, MEM, H),			\
-	INSN_3(LDX, MEM, W),			\
-	INSN_3(LDX, MEM, DW),			\
-	/*   Immediate based. */		\
-	INSN_3(LD, IMM, DW)
-
-bool bpf_opcode_in_insntable(u8 code)
-{
-#define BPF_INSN_2_TBL(x, y)    [BPF_##x | BPF_##y] = true
-#define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
-	static const bool public_insntable[256] = {
-		[0 ... 255] = false,
-		/* Now overwrite non-defaults ... */
-		BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
-		/* UAPI exposed, but rewritten opcodes. cBPF carry-over. */
-		[BPF_LD | BPF_ABS | BPF_B] = true,
-		[BPF_LD | BPF_ABS | BPF_H] = true,
-		[BPF_LD | BPF_ABS | BPF_W] = true,
-		[BPF_LD | BPF_IND | BPF_B] = true,
-		[BPF_LD | BPF_IND | BPF_H] = true,
-		[BPF_LD | BPF_IND | BPF_W] = true,
-	};
-#undef BPF_INSN_3_TBL
-#undef BPF_INSN_2_TBL
-	return public_insntable[code];
-}
-
-#ifndef CONFIG_BPF_JIT_ALWAYS_ON
-u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
-{
-	memset(dst, 0, size);
-	return -EFAULT;
-}
-
-/**
- *	__bpf_prog_run - run eBPF program on a given context
- *	@regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers
- *	@insn: is the array of eBPF instructions
- *	@stack: is the eBPF storage stack
- *
- * Decode and execute eBPF instructions.
- */
-static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
-{
-#define BPF_INSN_2_LBL(x, y)    [BPF_##x | BPF_##y] = &&x##_##y
-#define BPF_INSN_3_LBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = &&x##_##y##_##z
-	static const void * const jumptable[256] __annotate_jump_table = {
-		[0 ... 255] = &&default_label,
-		/* Now overwrite non-defaults ... */
-		BPF_INSN_MAP(BPF_INSN_2_LBL, BPF_INSN_3_LBL),
-		/* Non-UAPI available opcodes. */
-		[BPF_JMP | BPF_CALL_ARGS] = &&JMP_CALL_ARGS,
-		[BPF_JMP | BPF_TAIL_CALL] = &&JMP_TAIL_CALL,
-		[BPF_LDX | BPF_PROBE_MEM | BPF_B] = &&LDX_PROBE_MEM_B,
-		[BPF_LDX | BPF_PROBE_MEM | BPF_H] = &&LDX_PROBE_MEM_H,
-		[BPF_LDX | BPF_PROBE_MEM | BPF_W] = &&LDX_PROBE_MEM_W,
-		[BPF_LDX | BPF_PROBE_MEM | BPF_DW] = &&LDX_PROBE_MEM_DW,
-	};
-#undef BPF_INSN_3_LBL
-#undef BPF_INSN_2_LBL
-	u32 tail_call_cnt = 0;
-
-#define CONT	 ({ insn++; goto select_insn; })
-#define CONT_JMP ({ insn++; goto select_insn; })
-
-select_insn:
-	goto *jumptable[insn->code];
-
-	/* ALU */
-#define ALU(OPCODE, OP)			\
-	ALU64_##OPCODE##_X:		\
-		DST = DST OP SRC;	\
-		CONT;			\
-	ALU_##OPCODE##_X:		\
-		DST = (u32) DST OP (u32) SRC;	\
-		CONT;			\
-	ALU64_##OPCODE##_K:		\
-		DST = DST OP IMM;		\
-		CONT;			\
-	ALU_##OPCODE##_K:		\
-		DST = (u32) DST OP (u32) IMM;	\
-		CONT;
-
-	ALU(ADD,  +)
-	ALU(SUB,  -)
-	ALU(AND,  &)
-	ALU(OR,   |)
-	ALU(LSH, <<)
-	ALU(RSH, >>)
-	ALU(XOR,  ^)
-	ALU(MUL,  *)
-#undef ALU
-	ALU_NEG:
-		DST = (u32) -DST;
-		CONT;
-	ALU64_NEG:
-		DST = -DST;
-		CONT;
-	ALU_MOV_X:
-		DST = (u32) SRC;
-		CONT;
-	ALU_MOV_K:
-		DST = (u32) IMM;
-		CONT;
-	ALU64_MOV_X:
-		DST = SRC;
-		CONT;
-	ALU64_MOV_K:
-		DST = IMM;
-		CONT;
-	LD_IMM_DW:
-		DST = (u64) (u32) insn[0].imm | ((u64) (u32) insn[1].imm) << 32;
-		insn++;
-		CONT;
-	ALU_ARSH_X:
-		DST = (u64) (u32) (((s32) DST) >> SRC);
-		CONT;
-	ALU_ARSH_K:
-		DST = (u64) (u32) (((s32) DST) >> IMM);
-		CONT;
-	ALU64_ARSH_X:
-		(*(s64 *) &DST) >>= SRC;
-		CONT;
-	ALU64_ARSH_K:
-		(*(s64 *) &DST) >>= IMM;
-		CONT;
-	ALU64_MOD_X:
-		div64_u64_rem(DST, SRC, &AX);
-		DST = AX;
-		CONT;
-	ALU_MOD_X:
-		AX = (u32) DST;
-		DST = do_div(AX, (u32) SRC);
-		CONT;
-	ALU64_MOD_K:
-		div64_u64_rem(DST, IMM, &AX);
-		DST = AX;
-		CONT;
-	ALU_MOD_K:
-		AX = (u32) DST;
-		DST = do_div(AX, (u32) IMM);
-		CONT;
-	ALU64_DIV_X:
-		DST = div64_u64(DST, SRC);
-		CONT;
-	ALU_DIV_X:
-		AX = (u32) DST;
-		do_div(AX, (u32) SRC);
-		DST = (u32) AX;
-		CONT;
-	ALU64_DIV_K:
-		DST = div64_u64(DST, IMM);
-		CONT;
-	ALU_DIV_K:
-		AX = (u32) DST;
-		do_div(AX, (u32) IMM);
-		DST = (u32) AX;
-		CONT;
-	ALU_END_TO_BE:
-		switch (IMM) {
-		case 16:
-			DST = (__force u16) cpu_to_be16(DST);
-			break;
-		case 32:
-			DST = (__force u32) cpu_to_be32(DST);
-			break;
-		case 64:
-			DST = (__force u64) cpu_to_be64(DST);
-			break;
-		}
-		CONT;
-	ALU_END_TO_LE:
-		switch (IMM) {
-		case 16:
-			DST = (__force u16) cpu_to_le16(DST);
-			break;
-		case 32:
-			DST = (__force u32) cpu_to_le32(DST);
-			break;
-		case 64:
-			DST = (__force u64) cpu_to_le64(DST);
-			break;
-		}
-		CONT;
-
-	/* CALL */
-	JMP_CALL:
-		/* Function call scratches BPF_R1-BPF_R5 registers,
-		 * preserves BPF_R6-BPF_R9, and stores return value
-		 * into BPF_R0.
-		 */
-		BPF_R0 = (__bpf_call_base + insn->imm)(BPF_R1, BPF_R2, BPF_R3,
-						       BPF_R4, BPF_R5);
-		CONT;
-
-	JMP_CALL_ARGS:
-		BPF_R0 = (__bpf_call_base_args + insn->imm)(BPF_R1, BPF_R2,
-							    BPF_R3, BPF_R4,
-							    BPF_R5,
-							    insn + insn->off + 1);
-		CONT;
-
-	JMP_TAIL_CALL: {
-		struct bpf_map *map = (struct bpf_map *) (unsigned long) BPF_R2;
-		struct bpf_array *array = container_of(map, struct bpf_array, map);
-		struct bpf_prog *prog;
-		u32 index = BPF_R3;
-
-		if (unlikely(index >= array->map.max_entries))
-			goto out;
-		if (unlikely(tail_call_cnt > MAX_TAIL_CALL_CNT))
-			goto out;
-
-		tail_call_cnt++;
-
-		prog = READ_ONCE(array->ptrs[index]);
-		if (!prog)
-			goto out;
-
-		/* ARG1 at this point is guaranteed to point to CTX from
-		 * the verifier side due to the fact that the tail call is
-		 * handled like a helper, that is, bpf_tail_call_proto,
-		 * where arg1_type is ARG_PTR_TO_CTX.
-		 */
-		insn = prog->insnsi;
-		goto select_insn;
-out:
-		CONT;
-	}
-	JMP_JA:
-		insn += insn->off;
-		CONT;
-	JMP_EXIT:
-		return BPF_R0;
-	/* JMP */
-#define COND_JMP(SIGN, OPCODE, CMP_OP)				\
-	JMP_##OPCODE##_X:					\
-		if ((SIGN##64) DST CMP_OP (SIGN##64) SRC) {	\
-			insn += insn->off;			\
-			CONT_JMP;				\
-		}						\
-		CONT;						\
-	JMP32_##OPCODE##_X:					\
-		if ((SIGN##32) DST CMP_OP (SIGN##32) SRC) {	\
-			insn += insn->off;			\
-			CONT_JMP;				\
-		}						\
-		CONT;						\
-	JMP_##OPCODE##_K:					\
-		if ((SIGN##64) DST CMP_OP (SIGN##64) IMM) {	\
-			insn += insn->off;			\
-			CONT_JMP;				\
-		}						\
-		CONT;						\
-	JMP32_##OPCODE##_K:					\
-		if ((SIGN##32) DST CMP_OP (SIGN##32) IMM) {	\
-			insn += insn->off;			\
-			CONT_JMP;				\
-		}						\
-		CONT;
-	COND_JMP(u, JEQ, ==)
-	COND_JMP(u, JNE, !=)
-	COND_JMP(u, JGT, >)
-	COND_JMP(u, JLT, <)
-	COND_JMP(u, JGE, >=)
-	COND_JMP(u, JLE, <=)
-	COND_JMP(u, JSET, &)
-	COND_JMP(s, JSGT, >)
-	COND_JMP(s, JSLT, <)
-	COND_JMP(s, JSGE, >=)
-	COND_JMP(s, JSLE, <=)
-#undef COND_JMP
-	/* STX and ST and LDX*/
-#define LDST(SIZEOP, SIZE)						\
-	STX_MEM_##SIZEOP:						\
-		*(SIZE *)(unsigned long) (DST + insn->off) = SRC;	\
-		CONT;							\
-	ST_MEM_##SIZEOP:						\
-		*(SIZE *)(unsigned long) (DST + insn->off) = IMM;	\
-		CONT;							\
-	LDX_MEM_##SIZEOP:						\
-		DST = *(SIZE *)(unsigned long) (SRC + insn->off);	\
-		CONT;
-
-	LDST(B,   u8)
-	LDST(H,  u16)
-	LDST(W,  u32)
-	LDST(DW, u64)
-#undef LDST
-#define LDX_PROBE(SIZEOP, SIZE)							\
-	LDX_PROBE_MEM_##SIZEOP:							\
-		bpf_probe_read_kernel(&DST, SIZE, (const void *)(long) (SRC + insn->off));	\
-		CONT;
-	LDX_PROBE(B,  1)
-	LDX_PROBE(H,  2)
-	LDX_PROBE(W,  4)
-	LDX_PROBE(DW, 8)
-#undef LDX_PROBE
-
-	STX_XADD_W: /* lock xadd *(u32 *)(dst_reg + off16) += src_reg */
-		atomic_add((u32) SRC, (atomic_t *)(unsigned long)
-			   (DST + insn->off));
-		CONT;
-	STX_XADD_DW: /* lock xadd *(u64 *)(dst_reg + off16) += src_reg */
-		atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
-			     (DST + insn->off));
-		CONT;
-
-	default_label:
-		/* If we ever reach this, we have a bug somewhere. Die hard here
-		 * instead of just returning 0; we could be somewhere in a subprog,
-		 * so execution could continue otherwise which we do /not/ want.
-		 *
-		 * Note, verifier whitelists all opcodes in bpf_opcode_in_insntable().
-		 */
-		pr_warn("BPF interpreter: unknown opcode %02x\n", insn->code);
-		BUG_ON(1);
-		return 0;
-}
-
-#define PROG_NAME(stack_size) __bpf_prog_run##stack_size
-#define DEFINE_BPF_PROG_RUN(stack_size) \
-static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct bpf_insn *insn) \
-{ \
-	u64 stack[stack_size / sizeof(u64)]; \
-	u64 regs[MAX_BPF_EXT_REG]; \
-\
-	FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
-	ARG1 = (u64) (unsigned long) ctx; \
-	return ___bpf_prog_run(regs, insn, stack); \
-}
-
-#define PROG_NAME_ARGS(stack_size) __bpf_prog_run_args##stack_size
-#define DEFINE_BPF_PROG_RUN_ARGS(stack_size) \
-static u64 PROG_NAME_ARGS(stack_size)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5, \
-				      const struct bpf_insn *insn) \
-{ \
-	u64 stack[stack_size / sizeof(u64)]; \
-	u64 regs[MAX_BPF_EXT_REG]; \
-\
-	FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
-	BPF_R1 = r1; \
-	BPF_R2 = r2; \
-	BPF_R3 = r3; \
-	BPF_R4 = r4; \
-	BPF_R5 = r5; \
-	return ___bpf_prog_run(regs, insn, stack); \
-}
-
-#define EVAL1(FN, X) FN(X)
-#define EVAL2(FN, X, Y...) FN(X) EVAL1(FN, Y)
-#define EVAL3(FN, X, Y...) FN(X) EVAL2(FN, Y)
-#define EVAL4(FN, X, Y...) FN(X) EVAL3(FN, Y)
-#define EVAL5(FN, X, Y...) FN(X) EVAL4(FN, Y)
-#define EVAL6(FN, X, Y...) FN(X) EVAL5(FN, Y)
-
-EVAL6(DEFINE_BPF_PROG_RUN, 32, 64, 96, 128, 160, 192);
-EVAL6(DEFINE_BPF_PROG_RUN, 224, 256, 288, 320, 352, 384);
-EVAL4(DEFINE_BPF_PROG_RUN, 416, 448, 480, 512);
-
-EVAL6(DEFINE_BPF_PROG_RUN_ARGS, 32, 64, 96, 128, 160, 192);
-EVAL6(DEFINE_BPF_PROG_RUN_ARGS, 224, 256, 288, 320, 352, 384);
-EVAL4(DEFINE_BPF_PROG_RUN_ARGS, 416, 448, 480, 512);
-
-#define PROG_NAME_LIST(stack_size) PROG_NAME(stack_size),
-
-static unsigned int (*interpreters[])(const void *ctx,
-				      const struct bpf_insn *insn) = {
-EVAL6(PROG_NAME_LIST, 32, 64, 96, 128, 160, 192)
-EVAL6(PROG_NAME_LIST, 224, 256, 288, 320, 352, 384)
-EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
-};
-#undef PROG_NAME_LIST
-#define PROG_NAME_LIST(stack_size) PROG_NAME_ARGS(stack_size),
-static u64 (*interpreters_args[])(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5,
-				  const struct bpf_insn *insn) = {
-EVAL6(PROG_NAME_LIST, 32, 64, 96, 128, 160, 192)
-EVAL6(PROG_NAME_LIST, 224, 256, 288, 320, 352, 384)
-EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
-};
-#undef PROG_NAME_LIST
-
-void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
-{
-	stack_depth = max_t(u32, stack_depth, 1);
-	insn->off = (s16) insn->imm;
-	insn->imm = interpreters_args[(round_up(stack_depth, 32) / 32) - 1] -
-		__bpf_call_base_args;
-	insn->code = BPF_JMP | BPF_CALL_ARGS;
-}
-
-#else
-static unsigned int __bpf_prog_ret0_warn(const void *ctx,
-					 const struct bpf_insn *insn)
-{
-	/* If this handler ever gets executed, then BPF_JIT_ALWAYS_ON
-	 * is not working properly, so warn about it!
-	 */
-	WARN_ON_ONCE(1);
-	return 0;
-}
-#endif
-
 bool bpf_prog_array_compatible(struct bpf_array *array,
 			       const struct bpf_prog *fp)
 {
@@ -1774,17 +1218,6 @@ static int bpf_check_tail_call(const struct bpf_prog *fp)
 	return ret;
 }
 
-static void bpf_prog_select_func(struct bpf_prog *fp)
-{
-#ifndef CONFIG_BPF_JIT_ALWAYS_ON
-	u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1);
-
-	fp->bpf_func = interpreters[(round_up(stack_depth, 32) / 32) - 1];
-#else
-	fp->bpf_func = __bpf_prog_ret0_warn;
-#endif
-}
-
 /**
  *	bpf_prog_select_runtime - select exec runtime for BPF program
  *	@fp: bpf_prog populated with internal BPF program
diff --git a/kernel/bpf/interp.c b/kernel/bpf/interp.c
new file mode 100644
index 000000000000..793ab5b2d62b
--- /dev/null
+++ b/kernel/bpf/interp.c
@@ -0,0 +1,601 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux Socket Filter - Kernel level socket filtering
+ *
+ * Based on the design of the Berkeley Packet Filter. The new
+ * internal format has been designed by PLUMgrid:
+ *
+ *	Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ *
+ * Authors:
+ *
+ *	Jay Schulist <jschlst@samba.org>
+ *	Alexei Starovoitov <ast@plumgrid.com>
+ *	Daniel Borkmann <dborkman@redhat.com>
+ *
+ * Andi Kleen - Fix a few bad bugs and races.
+ * Kris Katterjohn - Added many additional checks in bpf_check_classic()
+ */
+
+#include <uapi/linux/btf.h>
+#include <linux/filter.h>
+#include <linux/skbuff.h>
+#include <linux/vmalloc.h>
+#include <linux/random.h>
+#include <linux/moduleloader.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/objtool.h>
+#include <linux/rbtree_latch.h>
+#include <linux/kallsyms.h>
+#include <linux/rcupdate.h>
+#include <linux/perf_event.h>
+#include <linux/extable.h>
+#include <linux/log2.h>
+#include <asm/unaligned.h>
+
+/* Registers */
+#define BPF_R0	regs[BPF_REG_0]
+#define BPF_R1	regs[BPF_REG_1]
+#define BPF_R2	regs[BPF_REG_2]
+#define BPF_R3	regs[BPF_REG_3]
+#define BPF_R4	regs[BPF_REG_4]
+#define BPF_R5	regs[BPF_REG_5]
+#define BPF_R6	regs[BPF_REG_6]
+#define BPF_R7	regs[BPF_REG_7]
+#define BPF_R8	regs[BPF_REG_8]
+#define BPF_R9	regs[BPF_REG_9]
+#define BPF_R10	regs[BPF_REG_10]
+
+/* Named registers */
+#define DST	regs[insn->dst_reg]
+#define SRC	regs[insn->src_reg]
+#define FP	regs[BPF_REG_FP]
+#define AX	regs[BPF_REG_AX]
+#define ARG1	regs[BPF_REG_ARG1]
+#define CTX	regs[BPF_REG_CTX]
+#define IMM	insn->imm
+
+/* All UAPI available opcodes. */
+#define BPF_INSN_MAP(INSN_2, INSN_3)		\
+	/* 32 bit ALU operations. */		\
+	/*   Register based. */			\
+	INSN_3(ALU, ADD,  X),			\
+	INSN_3(ALU, SUB,  X),			\
+	INSN_3(ALU, AND,  X),			\
+	INSN_3(ALU, OR,   X),			\
+	INSN_3(ALU, LSH,  X),			\
+	INSN_3(ALU, RSH,  X),			\
+	INSN_3(ALU, XOR,  X),			\
+	INSN_3(ALU, MUL,  X),			\
+	INSN_3(ALU, MOV,  X),			\
+	INSN_3(ALU, ARSH, X),			\
+	INSN_3(ALU, DIV,  X),			\
+	INSN_3(ALU, MOD,  X),			\
+	INSN_2(ALU, NEG),			\
+	INSN_3(ALU, END, TO_BE),		\
+	INSN_3(ALU, END, TO_LE),		\
+	/*   Immediate based. */		\
+	INSN_3(ALU, ADD,  K),			\
+	INSN_3(ALU, SUB,  K),			\
+	INSN_3(ALU, AND,  K),			\
+	INSN_3(ALU, OR,   K),			\
+	INSN_3(ALU, LSH,  K),			\
+	INSN_3(ALU, RSH,  K),			\
+	INSN_3(ALU, XOR,  K),			\
+	INSN_3(ALU, MUL,  K),			\
+	INSN_3(ALU, MOV,  K),			\
+	INSN_3(ALU, ARSH, K),			\
+	INSN_3(ALU, DIV,  K),			\
+	INSN_3(ALU, MOD,  K),			\
+	/* 64 bit ALU operations. */		\
+	/*   Register based. */			\
+	INSN_3(ALU64, ADD,  X),			\
+	INSN_3(ALU64, SUB,  X),			\
+	INSN_3(ALU64, AND,  X),			\
+	INSN_3(ALU64, OR,   X),			\
+	INSN_3(ALU64, LSH,  X),			\
+	INSN_3(ALU64, RSH,  X),			\
+	INSN_3(ALU64, XOR,  X),			\
+	INSN_3(ALU64, MUL,  X),			\
+	INSN_3(ALU64, MOV,  X),			\
+	INSN_3(ALU64, ARSH, X),			\
+	INSN_3(ALU64, DIV,  X),			\
+	INSN_3(ALU64, MOD,  X),			\
+	INSN_2(ALU64, NEG),			\
+	/*   Immediate based. */		\
+	INSN_3(ALU64, ADD,  K),			\
+	INSN_3(ALU64, SUB,  K),			\
+	INSN_3(ALU64, AND,  K),			\
+	INSN_3(ALU64, OR,   K),			\
+	INSN_3(ALU64, LSH,  K),			\
+	INSN_3(ALU64, RSH,  K),			\
+	INSN_3(ALU64, XOR,  K),			\
+	INSN_3(ALU64, MUL,  K),			\
+	INSN_3(ALU64, MOV,  K),			\
+	INSN_3(ALU64, ARSH, K),			\
+	INSN_3(ALU64, DIV,  K),			\
+	INSN_3(ALU64, MOD,  K),			\
+	/* Call instruction. */			\
+	INSN_2(JMP, CALL),			\
+	/* Exit instruction. */			\
+	INSN_2(JMP, EXIT),			\
+	/* 32-bit Jump instructions. */		\
+	/*   Register based. */			\
+	INSN_3(JMP32, JEQ,  X),			\
+	INSN_3(JMP32, JNE,  X),			\
+	INSN_3(JMP32, JGT,  X),			\
+	INSN_3(JMP32, JLT,  X),			\
+	INSN_3(JMP32, JGE,  X),			\
+	INSN_3(JMP32, JLE,  X),			\
+	INSN_3(JMP32, JSGT, X),			\
+	INSN_3(JMP32, JSLT, X),			\
+	INSN_3(JMP32, JSGE, X),			\
+	INSN_3(JMP32, JSLE, X),			\
+	INSN_3(JMP32, JSET, X),			\
+	/*   Immediate based. */		\
+	INSN_3(JMP32, JEQ,  K),			\
+	INSN_3(JMP32, JNE,  K),			\
+	INSN_3(JMP32, JGT,  K),			\
+	INSN_3(JMP32, JLT,  K),			\
+	INSN_3(JMP32, JGE,  K),			\
+	INSN_3(JMP32, JLE,  K),			\
+	INSN_3(JMP32, JSGT, K),			\
+	INSN_3(JMP32, JSLT, K),			\
+	INSN_3(JMP32, JSGE, K),			\
+	INSN_3(JMP32, JSLE, K),			\
+	INSN_3(JMP32, JSET, K),			\
+	/* Jump instructions. */		\
+	/*   Register based. */			\
+	INSN_3(JMP, JEQ,  X),			\
+	INSN_3(JMP, JNE,  X),			\
+	INSN_3(JMP, JGT,  X),			\
+	INSN_3(JMP, JLT,  X),			\
+	INSN_3(JMP, JGE,  X),			\
+	INSN_3(JMP, JLE,  X),			\
+	INSN_3(JMP, JSGT, X),			\
+	INSN_3(JMP, JSLT, X),			\
+	INSN_3(JMP, JSGE, X),			\
+	INSN_3(JMP, JSLE, X),			\
+	INSN_3(JMP, JSET, X),			\
+	/*   Immediate based. */		\
+	INSN_3(JMP, JEQ,  K),			\
+	INSN_3(JMP, JNE,  K),			\
+	INSN_3(JMP, JGT,  K),			\
+	INSN_3(JMP, JLT,  K),			\
+	INSN_3(JMP, JGE,  K),			\
+	INSN_3(JMP, JLE,  K),			\
+	INSN_3(JMP, JSGT, K),			\
+	INSN_3(JMP, JSLT, K),			\
+	INSN_3(JMP, JSGE, K),			\
+	INSN_3(JMP, JSLE, K),			\
+	INSN_3(JMP, JSET, K),			\
+	INSN_2(JMP, JA),			\
+	/* Store instructions. */		\
+	/*   Register based. */			\
+	INSN_3(STX, MEM,  B),			\
+	INSN_3(STX, MEM,  H),			\
+	INSN_3(STX, MEM,  W),			\
+	INSN_3(STX, MEM,  DW),			\
+	INSN_3(STX, XADD, W),			\
+	INSN_3(STX, XADD, DW),			\
+	/*   Immediate based. */		\
+	INSN_3(ST, MEM, B),			\
+	INSN_3(ST, MEM, H),			\
+	INSN_3(ST, MEM, W),			\
+	INSN_3(ST, MEM, DW),			\
+	/* Load instructions. */		\
+	/*   Register based. */			\
+	INSN_3(LDX, MEM, B),			\
+	INSN_3(LDX, MEM, H),			\
+	INSN_3(LDX, MEM, W),			\
+	INSN_3(LDX, MEM, DW),			\
+	/*   Immediate based. */		\
+	INSN_3(LD, IMM, DW)
+
+bool bpf_opcode_in_insntable(u8 code)
+{
+#define BPF_INSN_2_TBL(x, y)    [BPF_##x | BPF_##y] = true
+#define BPF_INSN_3_TBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = true
+	static const bool public_insntable[256] = {
+		[0 ... 255] = false,
+		/* Now overwrite non-defaults ... */
+		BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
+		/* UAPI exposed, but rewritten opcodes. cBPF carry-over. */
+		[BPF_LD | BPF_ABS | BPF_B] = true,
+		[BPF_LD | BPF_ABS | BPF_H] = true,
+		[BPF_LD | BPF_ABS | BPF_W] = true,
+		[BPF_LD | BPF_IND | BPF_B] = true,
+		[BPF_LD | BPF_IND | BPF_H] = true,
+		[BPF_LD | BPF_IND | BPF_W] = true,
+	};
+#undef BPF_INSN_3_TBL
+#undef BPF_INSN_2_TBL
+	return public_insntable[code];
+}
+
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
+u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
+{
+	memset(dst, 0, size);
+	return -EFAULT;
+}
+
+/**
+ *	___bpf_prog_run - run eBPF program on a given context
+ *	@regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers
+ *	@insn: is the array of eBPF instructions
+ *	@stack: is the eBPF storage stack
+ *
+ * Decode and execute eBPF instructions.
+ */
+static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
+{
+#define BPF_INSN_2_LBL(x, y)    [BPF_##x | BPF_##y] = &&x##_##y
+#define BPF_INSN_3_LBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = &&x##_##y##_##z
+	static const void * const jumptable[256] __annotate_jump_table = {
+		[0 ... 255] = &&default_label,
+		/* Now overwrite non-defaults ... */
+		BPF_INSN_MAP(BPF_INSN_2_LBL, BPF_INSN_3_LBL),
+		/* Non-UAPI available opcodes. */
+		[BPF_JMP | BPF_CALL_ARGS] = &&JMP_CALL_ARGS,
+		[BPF_JMP | BPF_TAIL_CALL] = &&JMP_TAIL_CALL,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_B] = &&LDX_PROBE_MEM_B,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_H] = &&LDX_PROBE_MEM_H,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_W] = &&LDX_PROBE_MEM_W,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_DW] = &&LDX_PROBE_MEM_DW,
+	};
+#undef BPF_INSN_3_LBL
+#undef BPF_INSN_2_LBL
+	u32 tail_call_cnt = 0;
+
+#define CONT	 ({ insn++; goto select_insn; })
+#define CONT_JMP ({ insn++; goto select_insn; })
+
+select_insn:
+	goto *jumptable[insn->code];
+
+	/* ALU */
+#define ALU(OPCODE, OP)			\
+	ALU64_##OPCODE##_X:		\
+		DST = DST OP SRC;	\
+		CONT;			\
+	ALU_##OPCODE##_X:		\
+		DST = (u32) DST OP (u32) SRC;	\
+		CONT;			\
+	ALU64_##OPCODE##_K:		\
+		DST = DST OP IMM;		\
+		CONT;			\
+	ALU_##OPCODE##_K:		\
+		DST = (u32) DST OP (u32) IMM;	\
+		CONT;
+
+	ALU(ADD,  +)
+	ALU(SUB,  -)
+	ALU(AND,  &)
+	ALU(OR,   |)
+	ALU(LSH, <<)
+	ALU(RSH, >>)
+	ALU(XOR,  ^)
+	ALU(MUL,  *)
+#undef ALU
+	ALU_NEG:
+		DST = (u32) -DST;
+		CONT;
+	ALU64_NEG:
+		DST = -DST;
+		CONT;
+	ALU_MOV_X:
+		DST = (u32) SRC;
+		CONT;
+	ALU_MOV_K:
+		DST = (u32) IMM;
+		CONT;
+	ALU64_MOV_X:
+		DST = SRC;
+		CONT;
+	ALU64_MOV_K:
+		DST = IMM;
+		CONT;
+	LD_IMM_DW:
+		DST = (u64) (u32) insn[0].imm | ((u64) (u32) insn[1].imm) << 32;
+		insn++;
+		CONT;
+	ALU_ARSH_X:
+		DST = (u64) (u32) (((s32) DST) >> SRC);
+		CONT;
+	ALU_ARSH_K:
+		DST = (u64) (u32) (((s32) DST) >> IMM);
+		CONT;
+	ALU64_ARSH_X:
+		(*(s64 *) &DST) >>= SRC;
+		CONT;
+	ALU64_ARSH_K:
+		(*(s64 *) &DST) >>= IMM;
+		CONT;
+	ALU64_MOD_X:
+		div64_u64_rem(DST, SRC, &AX);
+		DST = AX;
+		CONT;
+	ALU_MOD_X:
+		AX = (u32) DST;
+		DST = do_div(AX, (u32) SRC);
+		CONT;
+	ALU64_MOD_K:
+		div64_u64_rem(DST, IMM, &AX);
+		DST = AX;
+		CONT;
+	ALU_MOD_K:
+		AX = (u32) DST;
+		DST = do_div(AX, (u32) IMM);
+		CONT;
+	ALU64_DIV_X:
+		DST = div64_u64(DST, SRC);
+		CONT;
+	ALU_DIV_X:
+		AX = (u32) DST;
+		do_div(AX, (u32) SRC);
+		DST = (u32) AX;
+		CONT;
+	ALU64_DIV_K:
+		DST = div64_u64(DST, IMM);
+		CONT;
+	ALU_DIV_K:
+		AX = (u32) DST;
+		do_div(AX, (u32) IMM);
+		DST = (u32) AX;
+		CONT;
+	ALU_END_TO_BE:
+		switch (IMM) {
+		case 16:
+			DST = (__force u16) cpu_to_be16(DST);
+			break;
+		case 32:
+			DST = (__force u32) cpu_to_be32(DST);
+			break;
+		case 64:
+			DST = (__force u64) cpu_to_be64(DST);
+			break;
+		}
+		CONT;
+	ALU_END_TO_LE:
+		switch (IMM) {
+		case 16:
+			DST = (__force u16) cpu_to_le16(DST);
+			break;
+		case 32:
+			DST = (__force u32) cpu_to_le32(DST);
+			break;
+		case 64:
+			DST = (__force u64) cpu_to_le64(DST);
+			break;
+		}
+		CONT;
+
+	/* CALL */
+	JMP_CALL:
+		/* Function call scratches BPF_R1-BPF_R5 registers,
+		 * preserves BPF_R6-BPF_R9, and stores return value
+		 * into BPF_R0.
+		 */
+		BPF_R0 = (__bpf_call_base + insn->imm)(BPF_R1, BPF_R2, BPF_R3,
+						       BPF_R4, BPF_R5);
+		CONT;
+
+	JMP_CALL_ARGS:
+		BPF_R0 = (__bpf_call_base_args + insn->imm)(BPF_R1, BPF_R2,
+							    BPF_R3, BPF_R4,
+							    BPF_R5,
+							    insn + insn->off + 1);
+		CONT;
+
+	JMP_TAIL_CALL: {
+		struct bpf_map *map = (struct bpf_map *) (unsigned long) BPF_R2;
+		struct bpf_array *array = container_of(map, struct bpf_array, map);
+		struct bpf_prog *prog;
+		u32 index = BPF_R3;
+
+		if (unlikely(index >= array->map.max_entries))
+			goto out;
+		if (unlikely(tail_call_cnt > MAX_TAIL_CALL_CNT))
+			goto out;
+
+		tail_call_cnt++;
+
+		prog = READ_ONCE(array->ptrs[index]);
+		if (!prog)
+			goto out;
+
+		/* ARG1 at this point is guaranteed to point to CTX from
+		 * the verifier side due to the fact that the tail call is
+		 * handled like a helper, that is, bpf_tail_call_proto,
+		 * where arg1_type is ARG_PTR_TO_CTX.
+		 */
+		insn = prog->insnsi;
+		goto select_insn;
+out:
+		CONT;
+	}
+	JMP_JA:
+		insn += insn->off;
+		CONT;
+	JMP_EXIT:
+		return BPF_R0;
+	/* JMP */
+#define COND_JMP(SIGN, OPCODE, CMP_OP)				\
+	JMP_##OPCODE##_X:					\
+		if ((SIGN##64) DST CMP_OP (SIGN##64) SRC) {	\
+			insn += insn->off;			\
+			CONT_JMP;				\
+		}						\
+		CONT;						\
+	JMP32_##OPCODE##_X:					\
+		if ((SIGN##32) DST CMP_OP (SIGN##32) SRC) {	\
+			insn += insn->off;			\
+			CONT_JMP;				\
+		}						\
+		CONT;						\
+	JMP_##OPCODE##_K:					\
+		if ((SIGN##64) DST CMP_OP (SIGN##64) IMM) {	\
+			insn += insn->off;			\
+			CONT_JMP;				\
+		}						\
+		CONT;						\
+	JMP32_##OPCODE##_K:					\
+		if ((SIGN##32) DST CMP_OP (SIGN##32) IMM) {	\
+			insn += insn->off;			\
+			CONT_JMP;				\
+		}						\
+		CONT;
+	COND_JMP(u, JEQ, ==)
+	COND_JMP(u, JNE, !=)
+	COND_JMP(u, JGT, >)
+	COND_JMP(u, JLT, <)
+	COND_JMP(u, JGE, >=)
+	COND_JMP(u, JLE, <=)
+	COND_JMP(u, JSET, &)
+	COND_JMP(s, JSGT, >)
+	COND_JMP(s, JSLT, <)
+	COND_JMP(s, JSGE, >=)
+	COND_JMP(s, JSLE, <=)
+#undef COND_JMP
+	/* STX and ST and LDX*/
+#define LDST(SIZEOP, SIZE)						\
+	STX_MEM_##SIZEOP:						\
+		*(SIZE *)(unsigned long) (DST + insn->off) = SRC;	\
+		CONT;							\
+	ST_MEM_##SIZEOP:						\
+		*(SIZE *)(unsigned long) (DST + insn->off) = IMM;	\
+		CONT;							\
+	LDX_MEM_##SIZEOP:						\
+		DST = *(SIZE *)(unsigned long) (SRC + insn->off);	\
+		CONT;
+
+	LDST(B,   u8)
+	LDST(H,  u16)
+	LDST(W,  u32)
+	LDST(DW, u64)
+#undef LDST
+#define LDX_PROBE(SIZEOP, SIZE)							\
+	LDX_PROBE_MEM_##SIZEOP:							\
+		bpf_probe_read_kernel(&DST, SIZE, (const void *)(long) (SRC + insn->off));	\
+		CONT;
+	LDX_PROBE(B,  1)
+	LDX_PROBE(H,  2)
+	LDX_PROBE(W,  4)
+	LDX_PROBE(DW, 8)
+#undef LDX_PROBE
+
+	STX_XADD_W: /* lock xadd *(u32 *)(dst_reg + off16) += src_reg */
+		atomic_add((u32) SRC, (atomic_t *)(unsigned long)
+			   (DST + insn->off));
+		CONT;
+	STX_XADD_DW: /* lock xadd *(u64 *)(dst_reg + off16) += src_reg */
+		atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
+			     (DST + insn->off));
+		CONT;
+
+	default_label:
+		/* If we ever reach this, we have a bug somewhere. Die hard here
+		 * instead of just returning 0; we could be somewhere in a subprog,
+		 * so execution could continue otherwise which we do /not/ want.
+		 *
+		 * Note, verifier whitelists all opcodes in bpf_opcode_in_insntable().
+		 */
+		pr_warn("BPF interpreter: unknown opcode %02x\n", insn->code);
+		BUG_ON(1);
+		return 0;
+}
+
+#define PROG_NAME(stack_size) __bpf_prog_run##stack_size
+#define DEFINE_BPF_PROG_RUN(stack_size) \
+static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct bpf_insn *insn) \
+{ \
+	u64 stack[stack_size / sizeof(u64)]; \
+	u64 regs[MAX_BPF_EXT_REG]; \
+\
+	FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
+	ARG1 = (u64) (unsigned long) ctx; \
+	return ___bpf_prog_run(regs, insn, stack); \
+}
+
+#define PROG_NAME_ARGS(stack_size) __bpf_prog_run_args##stack_size
+#define DEFINE_BPF_PROG_RUN_ARGS(stack_size) \
+static u64 PROG_NAME_ARGS(stack_size)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5, \
+				      const struct bpf_insn *insn) \
+{ \
+	u64 stack[stack_size / sizeof(u64)]; \
+	u64 regs[MAX_BPF_EXT_REG]; \
+\
+	FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
+	BPF_R1 = r1; \
+	BPF_R2 = r2; \
+	BPF_R3 = r3; \
+	BPF_R4 = r4; \
+	BPF_R5 = r5; \
+	return ___bpf_prog_run(regs, insn, stack); \
+}
+
+#define EVAL1(FN, X) FN(X)
+#define EVAL2(FN, X, Y...) FN(X) EVAL1(FN, Y)
+#define EVAL3(FN, X, Y...) FN(X) EVAL2(FN, Y)
+#define EVAL4(FN, X, Y...) FN(X) EVAL3(FN, Y)
+#define EVAL5(FN, X, Y...) FN(X) EVAL4(FN, Y)
+#define EVAL6(FN, X, Y...) FN(X) EVAL5(FN, Y)
+
+EVAL6(DEFINE_BPF_PROG_RUN, 32, 64, 96, 128, 160, 192);
+EVAL6(DEFINE_BPF_PROG_RUN, 224, 256, 288, 320, 352, 384);
+EVAL4(DEFINE_BPF_PROG_RUN, 416, 448, 480, 512);
+
+EVAL6(DEFINE_BPF_PROG_RUN_ARGS, 32, 64, 96, 128, 160, 192);
+EVAL6(DEFINE_BPF_PROG_RUN_ARGS, 224, 256, 288, 320, 352, 384);
+EVAL4(DEFINE_BPF_PROG_RUN_ARGS, 416, 448, 480, 512);
+
+#define PROG_NAME_LIST(stack_size) PROG_NAME(stack_size),
+
+static unsigned int (*interpreters[])(const void *ctx,
+				      const struct bpf_insn *insn) = {
+EVAL6(PROG_NAME_LIST, 32, 64, 96, 128, 160, 192)
+EVAL6(PROG_NAME_LIST, 224, 256, 288, 320, 352, 384)
+EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
+};
+#undef PROG_NAME_LIST
+#define PROG_NAME_LIST(stack_size) PROG_NAME_ARGS(stack_size),
+static u64 (*interpreters_args[])(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5,
+				  const struct bpf_insn *insn) = {
+EVAL6(PROG_NAME_LIST, 32, 64, 96, 128, 160, 192)
+EVAL6(PROG_NAME_LIST, 224, 256, 288, 320, 352, 384)
+EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
+};
+#undef PROG_NAME_LIST
+
+void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
+{
+	stack_depth = max_t(u32, stack_depth, 1);
+	insn->off = (s16) insn->imm;
+	insn->imm = interpreters_args[(round_up(stack_depth, 32) / 32) - 1] -
+		__bpf_call_base_args;
+	insn->code = BPF_JMP | BPF_CALL_ARGS;
+}
+#else
+static unsigned int __bpf_prog_ret0_warn(const void *ctx,
+					 const struct bpf_insn *insn)
+{
+	/* If this handler ever gets executed, then BPF_JIT_ALWAYS_ON
+	 * is not working properly, so warn about it!
+	 */
+	WARN_ON_ONCE(1);
+	return 0;
+}
+#endif
+
+void bpf_prog_select_func(struct bpf_prog *fp)
+{
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
+	u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1);
+
+	fp->bpf_func = interpreters[(round_up(stack_depth, 32) / 32) - 1];
+#else
+	fp->bpf_func = __bpf_prog_ret0_warn;
+#endif
+}
-- 
2.17.1
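
For readers following the move: the dispatch scheme in ___bpf_prog_run()
(a 256-entry table of label addresses, pre-filled with a default and then
overridden per opcode) can be sketched in isolation. This is a minimal
illustration, not the kernel code: the three opcodes, struct toy_insn and
toy_run() are made-up names, and the sketch returns -1 on an unknown opcode
where the interpreter calls BUG_ON(). It assumes a compiler with the GNU C
"labels as values" and statement-expression extensions (GCC or Clang).

```c
#include <stdint.h>

/* Hypothetical three-opcode bytecode, for illustration only. */
enum { OP_ADD = 1, OP_MUL = 2, OP_EXIT = 3 };

struct toy_insn {
	uint8_t code;	/* opcode, used directly as the jump table index */
	int32_t imm;	/* immediate operand */
};

/*
 * Run a toy program on an accumulator that starts at 'acc'.
 * Returns the accumulator on OP_EXIT, or -1 for an unknown opcode.
 */
static int64_t toy_run(const struct toy_insn *insn, int64_t acc)
{
	static const void * const jumptable[256] = {
		[0 ... 255] = &&default_label,
		/* Now overwrite non-defaults, as the patch does. */
		[OP_ADD]  = &&do_add,
		[OP_MUL]  = &&do_mul,
		[OP_EXIT] = &&do_exit,
	};

#define CONT ({ insn++; goto select_insn; })

select_insn:
	/* One indirect jump per instruction, indexed by the opcode byte. */
	goto *jumptable[insn->code];

do_add:
	acc += insn->imm;
	CONT;
do_mul:
	acc *= insn->imm;
	CONT;
do_exit:
	return acc;
default_label:
	return -1;
#undef CONT
}
```

Calling toy_run() on the program { ADD 2, MUL 5, EXIT } with an initial
accumulator of 1 walks the table three times. The many indirect jumps sharing
one table are what make the generated code sensitive to optimizations such as
GCSE.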

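The index expression shared by bpf_prog_select_func() and
bpf_patch_call_args(), (round_up(stack_depth, 32) / 32) - 1, maps a stack
depth of 1..512 bytes onto the 16 generated interpreters. A small sketch of
that arithmetic, written without the kernel's max_t() and round_up() macros;
toy_interp_index() is a hypothetical name used only here:

```c
#include <stdint.h>

/* Hypothetical helper mirroring the index expression in the patch. */
static uint32_t toy_interp_index(uint32_t stack_depth)
{
	/* Depth 0 is treated as 1, as max_t(u32, stack_depth, 1) does. */
	if (stack_depth < 1)
		stack_depth = 1;
	/* Round up to the next multiple of 32, then convert to a 0-based slot. */
	return (stack_depth + 31) / 32 - 1;
}
```

Depths 1 through 32 select slot 0 (the 32-byte interpreter) and depth 512
selects slot 15, the last entry of the interpreters[] array.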


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 17:15 ` [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE Ard Biesheuvel
@ 2020-10-28 21:39   ` Alexei Starovoitov
  2020-10-28 22:15     ` Ard Biesheuvel
  2020-10-29  8:25   ` Geert Uytterhoeven
  2020-10-30  0:34   ` Nick Desaulniers
  2 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2020-10-28 21:39 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-kernel, netdev, bpf, arnd, Nick Desaulniers, Arvind Sankar,
	Randy Dunlap, Josh Poimboeuf, Thomas Gleixner,
	Alexei Starovoitov, Daniel Borkmann, Peter Zijlstra,
	Geert Uytterhoeven, Kees Cook

On Wed, Oct 28, 2020 at 06:15:05PM +0100, Ard Biesheuvel wrote:
> Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> function scope __attribute__((optimize("-fno-gcse"))), to disable a
> GCC specific optimization that was causing trouble on x86 builds, and
> was not expected to have any positive effect in the first place.
> 
> However, as the GCC manual documents, __attribute__((optimize))
> is not for production use, and results in all other optimization
> options to be forgotten for the function in question. This can
> cause all kinds of trouble, but in one particular reported case,
> it causes -fno-asynchronous-unwind-tables to be disregarded,
> resulting in .eh_frame info to be emitted for the function.
> 
> This reverts commit 3193c0836, and instead, it disables the -fgcse
> optimization for the entire source file, but only when building for
> X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> original commit states that CONFIG_RETPOLINE=n triggers the issue,
> whereas CONFIG_RETPOLINE=y performs better without the optimization,
> so it is kept disabled in both cases.
> 
> Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  include/linux/compiler-gcc.h   | 2 --
>  include/linux/compiler_types.h | 4 ----
>  kernel/bpf/Makefile            | 6 +++++-
>  kernel/bpf/core.c              | 2 +-
>  4 files changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d1e3c6896b71..5deb37024574 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -175,5 +175,3 @@
>  #else
>  #define __diag_GCC_8(s)
>  #endif
> -
> -#define __no_fgcse __attribute__((optimize("-fno-gcse")))

See my reply in the other thread.
I prefer
-#define __no_fgcse __attribute__((optimize("-fno-gcse")))
+#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))

Potentially with -fno-asynchronous-unwind-tables.

__attribute__((optimize(""))) is not as broken as you're claiming it to be.
It has quirky gcc internal logic, but it's still widely used
in many software projects.
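
For reference, a function-scope optimize attribute restricted to GCC could
look roughly like the sketch below. It is an illustration under assumptions,
not the kernel's definition: toy_no_gcse and toy_sum() are hypothetical names,
and the extra __clang__ check is there because Clang also defines __GNUC__.

```c
/*
 * Hypothetical macro; the thread's macro is __no_fgcse. Defined only for
 * real GCC and left empty elsewhere so annotated code still builds.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define toy_no_gcse __attribute__((optimize("-fno-gcse")))
#else
#define toy_no_gcse
#endif

/*
 * Requests -fno-gcse for this one function under GCC. What else the
 * attribute changes for the function is the point disputed in this thread.
 */
static toy_no_gcse unsigned int toy_sum(const unsigned int *v, unsigned int n)
{
	unsigned int i, sum = 0;

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum;
}
```

The function behaves the same with or without the attribute; only the code
GCC generates for it may differ.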


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 21:39   ` Alexei Starovoitov
@ 2020-10-28 22:15     ` Ard Biesheuvel
  2020-10-28 22:59       ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Ard Biesheuvel @ 2020-10-28 22:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Nick Desaulniers, Arvind Sankar, Randy Dunlap,
	Josh Poimboeuf, Thomas Gleixner, Alexei Starovoitov,
	Daniel Borkmann, Peter Zijlstra, Geert Uytterhoeven, Kees Cook

On Wed, 28 Oct 2020 at 22:39, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Oct 28, 2020 at 06:15:05PM +0100, Ard Biesheuvel wrote:
> > Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> > ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> > function scope __attribute__((optimize("-fno-gcse"))), to disable a
> > GCC specific optimization that was causing trouble on x86 builds, and
> > was not expected to have any positive effect in the first place.
> >
> > However, as the GCC manual documents, __attribute__((optimize))
> > is not for production use, and results in all other optimization
> > options to be forgotten for the function in question. This can
> > cause all kinds of trouble, but in one particular reported case,
> > it causes -fno-asynchronous-unwind-tables to be disregarded,
> > resulting in .eh_frame info to be emitted for the function.
> >
> > This reverts commit 3193c0836, and instead, it disables the -fgcse
> > optimization for the entire source file, but only when building for
> > X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> > original commit states that CONFIG_RETPOLINE=n triggers the issue,
> > whereas CONFIG_RETPOLINE=y performs better without the optimization,
> > so it is kept disabled in both cases.
> >
> > Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> > Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  include/linux/compiler-gcc.h   | 2 --
> >  include/linux/compiler_types.h | 4 ----
> >  kernel/bpf/Makefile            | 6 +++++-
> >  kernel/bpf/core.c              | 2 +-
> >  4 files changed, 6 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > index d1e3c6896b71..5deb37024574 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -175,5 +175,3 @@
> >  #else
> >  #define __diag_GCC_8(s)
> >  #endif
> > -
> > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
>
> See my reply in the other thread.
> I prefer
> -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> +#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
>
> Potentially with -fno-asynchronous-unwind-tables.
>

So how would that work? arm64 has the following:

KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables

ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
KBUILD_CFLAGS += -ffixed-x18
endif

and it adds -fpatchable-function-entry=2 for compilers that support
it, but only when CONFIG_FTRACE is enabled.

Also, as Nick pointed out, -fno-gcse does not work on Clang.

Every architecture will have a different set of requirements here. And
there is no way of knowing which -f options are disregarded when you
use the function attribute.

So how on earth are you going to #define __no_fgcse correctly for
every configuration imaginable?

> __attribute__((optimize(""))) is not as broken as you're claiming it to be.
> It has quirky gcc internal logic, but it's still widely used
> in many software projects.

So it's fine because it is only a little bit broken? I'm sorry, but
that makes no sense whatsoever.

If you insist on sticking with this broken construct, can you please
make it GCC/x86-only at least?


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 22:15     ` Ard Biesheuvel
@ 2020-10-28 22:59       ` Alexei Starovoitov
  2020-10-28 23:10         ` Ard Biesheuvel
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2020-10-28 22:59 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Nick Desaulniers, Arvind Sankar, Randy Dunlap,
	Josh Poimboeuf, Thomas Gleixner, Alexei Starovoitov,
	Daniel Borkmann, Peter Zijlstra, Geert Uytterhoeven, Kees Cook

On Wed, Oct 28, 2020 at 11:15:04PM +0100, Ard Biesheuvel wrote:
> On Wed, 28 Oct 2020 at 22:39, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Oct 28, 2020 at 06:15:05PM +0100, Ard Biesheuvel wrote:
> > > Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> > > ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> > > function scope __attribute__((optimize("-fno-gcse"))), to disable a
> > > GCC specific optimization that was causing trouble on x86 builds, and
> > > was not expected to have any positive effect in the first place.
> > >
> > > However, as the GCC manual documents, __attribute__((optimize))
> > > is not for production use, and results in all other optimization
> > > options to be forgotten for the function in question. This can
> > > cause all kinds of trouble, but in one particular reported case,
> > > it causes -fno-asynchronous-unwind-tables to be disregarded,
> > > resulting in .eh_frame info to be emitted for the function.
> > >
> > > This reverts commit 3193c0836, and instead, it disables the -fgcse
> > > optimization for the entire source file, but only when building for
> > > X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> > > original commit states that CONFIG_RETPOLINE=n triggers the issue,
> > > whereas CONFIG_RETPOLINE=y performs better without the optimization,
> > > so it is kept disabled in both cases.
> > >
> > > Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> > > Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  include/linux/compiler-gcc.h   | 2 --
> > >  include/linux/compiler_types.h | 4 ----
> > >  kernel/bpf/Makefile            | 6 +++++-
> > >  kernel/bpf/core.c              | 2 +-
> > >  4 files changed, 6 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > > index d1e3c6896b71..5deb37024574 100644
> > > --- a/include/linux/compiler-gcc.h
> > > +++ b/include/linux/compiler-gcc.h
> > > @@ -175,5 +175,3 @@
> > >  #else
> > >  #define __diag_GCC_8(s)
> > >  #endif
> > > -
> > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> >
> > See my reply in the other thread.
> > I prefer
> > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > +#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> >
> > Potentially with -fno-asynchronous-unwind-tables.
> >
> 
> So how would that work? arm64 has the following:
> 
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> 
> ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
> KBUILD_CFLAGS += -ffixed-x18
> endif
> 
> and it adds -fpatchable-function-entry=2 for compilers that support
> it, but only when CONFIG_FTRACE is enabled.

I think you're assuming that GCC drops all flags when it sees __attribute__((optimize)).
That's not the case.

> Also, as Nick pointed out, -fno-gcse does not work on Clang.

yes and what's the point?
#define __no_fgcse is GCC only. clang doesn't need this workaround.

> Every architecture will have a different set of requirements here. And
> there is no way of knowing which -f options are disregarded when you
> use the function attribute.
> 
> So how on earth are you going to #define __no_fgcse correctly for
> every configuration imaginable?
> 
> > __attribute__((optimize(""))) is not as broken as you're claiming it to be.
> > It has quirky gcc internal logic, but it's still widely used
> > in many software projects.
> 
> So it's fine because it is only a little bit broken? I'm sorry, but
> that makes no sense whatsoever.
> 
> If you insist on sticking with this broken construct, can you please
> make it GCC/x86-only at least?

I'm totally fine with making
#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
to be gcc+x86 only.
I'd like to get rid of it, but objtool is not smart enough to understand
generated asm without it.


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 22:59       ` Alexei Starovoitov
@ 2020-10-28 23:10         ` Ard Biesheuvel
  2020-10-28 23:20           ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Ard Biesheuvel @ 2020-10-28 23:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Nick Desaulniers, Arvind Sankar, Randy Dunlap,
	Josh Poimboeuf, Thomas Gleixner, Alexei Starovoitov,
	Daniel Borkmann, Peter Zijlstra, Geert Uytterhoeven, Kees Cook

On Wed, 28 Oct 2020 at 23:59, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Oct 28, 2020 at 11:15:04PM +0100, Ard Biesheuvel wrote:
> > On Wed, 28 Oct 2020 at 22:39, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Oct 28, 2020 at 06:15:05PM +0100, Ard Biesheuvel wrote:
> > > > Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> > > > ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> > > > function scope __attribute__((optimize("-fno-gcse"))), to disable a
> > > > GCC specific optimization that was causing trouble on x86 builds, and
> > > > was not expected to have any positive effect in the first place.
> > > >
> > > > However, as the GCC manual documents, __attribute__((optimize))
> > > > is not for production use, and results in all other optimization
> > > > options to be forgotten for the function in question. This can
> > > > cause all kinds of trouble, but in one particular reported case,
> > > > it causes -fno-asynchronous-unwind-tables to be disregarded,
> > > > resulting in .eh_frame info to be emitted for the function.
> > > >
> > > > This reverts commit 3193c0836, and instead, it disables the -fgcse
> > > > optimization for the entire source file, but only when building for
> > > > X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> > > > original commit states that CONFIG_RETPOLINE=n triggers the issue,
> > > > whereas CONFIG_RETPOLINE=y performs better without the optimization,
> > > > so it is kept disabled in both cases.
> > > >
> > > > Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> > > > Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > ---
> > > >  include/linux/compiler-gcc.h   | 2 --
> > > >  include/linux/compiler_types.h | 4 ----
> > > >  kernel/bpf/Makefile            | 6 +++++-
> > > >  kernel/bpf/core.c              | 2 +-
> > > >  4 files changed, 6 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > > > index d1e3c6896b71..5deb37024574 100644
> > > > --- a/include/linux/compiler-gcc.h
> > > > +++ b/include/linux/compiler-gcc.h
> > > > @@ -175,5 +175,3 @@
> > > >  #else
> > > >  #define __diag_GCC_8(s)
> > > >  #endif
> > > > -
> > > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > >
> > > See my reply in the other thread.
> > > I prefer
> > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > > +#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> > >
> > > Potentially with -fno-asynchronous-unwind-tables.
> > >
> >
> > So how would that work? arm64 has the following:
> >
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> >
> > ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
> > KBUILD_CFLAGS += -ffixed-x18
> > endif
> >
> > and it adds -fpatchable-function-entry=2 for compilers that support
> > it, but only when CONFIG_FTRACE is enabled.
>
> I think you're assuming that GCC drops all flags when it sees __attribute__((optimize)).
> That's not the case.
>

So which flags does it drop, and which doesn't it drop? Is that
documented somewhere? Is that the same for all versions of GCC?

> > Also, as Nick pointed out, -fno-gcse does not work on Clang.
>
> yes and what's the point?
> #define __no_fgcse is GCC only. clang doesn't need this workaround.
>

Ah ok, that's at least something.

> > Every architecture will have a different set of requirements here. And
> > there is no way of knowing which -f options are disregarded when you
> > use the function attribute.
> >
> > So how on earth are you going to #define __no_fgcse correctly for
> > every configuration imaginable?
> >
> > > __attribute__((optimize(""))) is not as broken as you're claiming it to be.
> > > It has quirky gcc internal logic, but it's still widely used
> > > in many software projects.
> >
> > So it's fine because it is only a little bit broken? I'm sorry, but
> > that makes no sense whatsoever.
> >
> > If you insist on sticking with this broken construct, can you please
> > make it GCC/x86-only at least?
>
> I'm totally fine with making
> #define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> to be gcc+x86 only.
> I'd like to get rid of it, but objtool is not smart enough to understand
> generated asm without it.

I'll defer to the x86 folks to make the final call here, but I would
be perfectly happy doing

index d1e3c6896b71..68ddb91fbcc6 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -176,4 +176,6 @@
 #define __diag_GCC_8(s)
 #endif

+#ifdef CONFIG_X86
 #define __no_fgcse __attribute__((optimize("-fno-gcse")))
+#endif

and end the conversation here, because I honestly cannot wrap my head
around the fact that you are willing to work around an x86 specific
objtool shortcoming by arbitrarily disabling some GCC optimization for
all architectures, using a construct that may or may not affect other
compiler settings in unpredictable ways, where the compiler is being
used to compile a BPF language runtime for executing BPF programs
inside the kernel.

What on earth could go wrong?


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 23:10         ` Ard Biesheuvel
@ 2020-10-28 23:20           ` Alexei Starovoitov
  2020-10-29  2:57             ` Arvind Sankar
  2020-10-30  0:28             ` Nick Desaulniers
  0 siblings, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2020-10-28 23:20 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Nick Desaulniers, Arvind Sankar, Randy Dunlap,
	Josh Poimboeuf, Thomas Gleixner, Alexei Starovoitov,
	Daniel Borkmann, Peter Zijlstra, Geert Uytterhoeven, Kees Cook

On Thu, Oct 29, 2020 at 12:10:52AM +0100, Ard Biesheuvel wrote:
> On Wed, 28 Oct 2020 at 23:59, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Oct 28, 2020 at 11:15:04PM +0100, Ard Biesheuvel wrote:
> > > On Wed, 28 Oct 2020 at 22:39, Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Wed, Oct 28, 2020 at 06:15:05PM +0100, Ard Biesheuvel wrote:
> > > > > Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> > > > > ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> > > > > function scope __attribute__((optimize("-fno-gcse"))), to disable a
> > > > > GCC specific optimization that was causing trouble on x86 builds, and
> > > > > was not expected to have any positive effect in the first place.
> > > > >
> > > > > However, as the GCC manual documents, __attribute__((optimize))
> > > > > is not for production use, and results in all other optimization
> > > > > options to be forgotten for the function in question. This can
> > > > > cause all kinds of trouble, but in one particular reported case,
> > > > > it causes -fno-asynchronous-unwind-tables to be disregarded,
> > > > > resulting in .eh_frame info to be emitted for the function.
> > > > >
> > > > > This reverts commit 3193c0836, and instead, it disables the -fgcse
> > > > > optimization for the entire source file, but only when building for
> > > > > X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> > > > > original commit states that CONFIG_RETPOLINE=n triggers the issue,
> > > > > whereas CONFIG_RETPOLINE=y performs better without the optimization,
> > > > > so it is kept disabled in both cases.
> > > > >
> > > > > Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> > > > > Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
> > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > ---
> > > > >  include/linux/compiler-gcc.h   | 2 --
> > > > >  include/linux/compiler_types.h | 4 ----
> > > > >  kernel/bpf/Makefile            | 6 +++++-
> > > > >  kernel/bpf/core.c              | 2 +-
> > > > >  4 files changed, 6 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > > > > index d1e3c6896b71..5deb37024574 100644
> > > > > --- a/include/linux/compiler-gcc.h
> > > > > +++ b/include/linux/compiler-gcc.h
> > > > > @@ -175,5 +175,3 @@
> > > > >  #else
> > > > >  #define __diag_GCC_8(s)
> > > > >  #endif
> > > > > -
> > > > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > > >
> > > > See my reply in the other thread.
> > > > I prefer
> > > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > > > +#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> > > >
> > > > Potentially with -fno-asynchronous-unwind-tables.
> > > >
> > >
> > > So how would that work? arm64 has the following:
> > >
> > > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > >
> > > ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
> > > KBUILD_CFLAGS += -ffixed-x18
> > > endif
> > >
> > > and it adds -fpatchable-function-entry=2 for compilers that support
> > > it, but only when CONFIG_FTRACE is enabled.
> >
> > I think you're assuming that GCC drops all flags when it sees __attribute__((optimize)).
> > That's not the case.
> >
> 
> So which flags does it drop, and which doesn't it drop? Is that
> documented somewhere? Is that the same for all versions of GCC?
> 
> > > Also, as Nick pointed out, -fno-gcse does not work on Clang.
> >
> > yes and what's the point?
> > #define __no_fgcse is GCC only. clang doesn't need this workaround.
> >
> 
> Ah ok, that's at least something.
> 
> > > Every architecture will have a different set of requirements here. And
> > > there is no way of knowing which -f options are disregarded when you
> > > use the function attribute.
> > >
> > > So how on earth are you going to #define __no_fgcse correctly for
> > > every configuration imaginable?
> > >
> > > > __attribute__((optimize(""))) is not as broken as you're claiming it to be.
> > > > It has quirky gcc internal logic, but it's still widely used
> > > > in many software projects.
> > >
> > > So it's fine because it is only a little bit broken? I'm sorry, but
> > > that makes no sense whatsoever.
> > >
> > > If you insist on sticking with this broken construct, can you please
> > > make it GCC/x86-only at least?
> >
> > I'm totally fine with making
> > #define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> > to be gcc+x86 only.
> > I'd like to get rid of it, but objtool is not smart enough to understand
> > generated asm without it.
> 
> I'll defer to the x86 folks to make the final call here, but I would
> be perfectly happy doing
> 
> index d1e3c6896b71..68ddb91fbcc6 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -176,4 +176,6 @@
>  #define __diag_GCC_8(s)
>  #endif
> 
> +#ifdef CONFIG_X86
>  #define __no_fgcse __attribute__((optimize("-fno-gcse")))
> +#endif

If you're going to submit this patch could you please add
,-fno-omit-frame-pointer
to the above as well?
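
To make the request concrete, here is a rough sketch of what a GCC+x86-only
macro with an empty fallback could look like outside the kernel. It is an
illustration under assumptions: compiler-predefined macros stand in for
CONFIG_X86 and for the kernel's per-compiler headers, and no_fgcse_sketch
and sum_to() are hypothetical names. Which other flags the attribute
preserves is exactly what is disputed in this thread, so this shows the
syntax only.

```c
/* Stand-in for the kernel's per-compiler headers: GCC (not Clang) on x86. */
#if defined(__GNUC__) && !defined(__clang__) && \
	(defined(__x86_64__) || defined(__i386__))
#define no_fgcse_sketch \
	__attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
#else
/* Everywhere else the macro expands to nothing. */
#define no_fgcse_sketch
#endif

/* Hypothetical function carrying the annotation; sums 1..n. */
static int no_fgcse_sketch sum_to(int n)
{
	int total = 0;

	for (int i = 1; i <= n; i++)
		total += i;
	return total;
}
```

On any other compiler or architecture the function is built as if the
annotation were absent.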

> and end the conversation here, because I honestly cannot wrap my head
> around the fact that you are willing to work around an x86 specific
> objtool shortcoming by arbitrarily disabling some GCC optimization for
> all architectures, using a construct that may or may not affect other
> compiler settings in unpredictable ways, where the compiler is being
> used to compile a BPF language runtime for executing BPF programs
> inside the kernel.
> 
> What on earth could go wrong?

Frankly I'm more worried that -Os will generate incorrect code.
All compilers have bugs. Kernel has bugs. What can go wrong?


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 23:20           ` Alexei Starovoitov
@ 2020-10-29  2:57             ` Arvind Sankar
  2020-10-29 20:31               ` Segher Boessenkool
  2020-10-30  0:28             ` Nick Desaulniers
  1 sibling, 1 reply; 16+ messages in thread
From: Arvind Sankar @ 2020-10-29  2:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ard Biesheuvel, Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Nick Desaulniers, Arvind Sankar, Randy Dunlap,
	Josh Poimboeuf, Thomas Gleixner, Alexei Starovoitov,
	Daniel Borkmann, Peter Zijlstra, Geert Uytterhoeven, Kees Cook,
	linux-toolchains

On Wed, Oct 28, 2020 at 04:20:01PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 29, 2020 at 12:10:52AM +0100, Ard Biesheuvel wrote:
> > On Wed, 28 Oct 2020 at 23:59, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Oct 28, 2020 at 11:15:04PM +0100, Ard Biesheuvel wrote:
> > > > On Wed, 28 Oct 2020 at 22:39, Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Wed, Oct 28, 2020 at 06:15:05PM +0100, Ard Biesheuvel wrote:
> > > > > > Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> > > > > > ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> > > > > > function scope __attribute__((optimize("-fno-gcse"))), to disable a
> > > > > > GCC specific optimization that was causing trouble on x86 builds, and
> > > > > > was not expected to have any positive effect in the first place.
> > > > > >
> > > > > > However, as the GCC manual documents, __attribute__((optimize))
> > > > > > is not for production use, and results in all other optimization
> > > > > > options to be forgotten for the function in question. This can
> > > > > > cause all kinds of trouble, but in one particular reported case,
> > > > > > it causes -fno-asynchronous-unwind-tables to be disregarded,
> > > > > > resulting in .eh_frame info to be emitted for the function.
> > > > > >
> > > > > > This reverts commit 3193c0836, and instead, it disables the -fgcse
> > > > > > optimization for the entire source file, but only when building for
> > > > > > X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> > > > > > original commit states that CONFIG_RETPOLINE=n triggers the issue,
> > > > > > whereas CONFIG_RETPOLINE=y performs better without the optimization,
> > > > > > so it is kept disabled in both cases.
> > > > > >
> > > > > > Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> > > > > > Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
> > > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > > > ---
> > > > > >  include/linux/compiler-gcc.h   | 2 --
> > > > > >  include/linux/compiler_types.h | 4 ----
> > > > > >  kernel/bpf/Makefile            | 6 +++++-
> > > > > >  kernel/bpf/core.c              | 2 +-
> > > > > >  4 files changed, 6 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > > > > > index d1e3c6896b71..5deb37024574 100644
> > > > > > --- a/include/linux/compiler-gcc.h
> > > > > > +++ b/include/linux/compiler-gcc.h
> > > > > > @@ -175,5 +175,3 @@
> > > > > >  #else
> > > > > >  #define __diag_GCC_8(s)
> > > > > >  #endif
> > > > > > -
> > > > > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > > > >
> > > > > See my reply in the other thread.
> > > > > I prefer
> > > > > -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > > > > +#define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> > > > >
> > > > > Potentially with -fno-asynchronous-unwind-tables.
> > > > >
> > > >
> > > > So how would that work? arm64 has the following:
> > > >
> > > > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > > >
> > > > ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
> > > > KBUILD_CFLAGS += -ffixed-x18
> > > > endif
> > > >
> > > > and it adds -fpatchable-function-entry=2 for compilers that support
> > > > it, but only when CONFIG_FTRACE is enabled.
> > >
> > > I think you're assuming that GCC drops all flags when it sees __attribute__((optimize)).
> > > That's not the case.
> > >
> > 
> > So which flags does it drop, and which doesn't it drop? Is that
> > documented somewhere? Is that the same for all versions of GCC?
> > 
> > > > Also, as Nick pointed out, -fno-gcse does not work on Clang.
> > >
> > > yes and what's the point?
> > > #define __no_fgcse is GCC only. clang doesn't need this workaround.
> > >
> > 
> > Ah ok, that's at least something.
> > 
> > > > Every architecture will have a different set of requirements here. And
> > > > there is no way of knowing which -f options are disregarded when you
> > > > use the function attribute.
> > > >
> > > > So how on earth are you going to #define __no_fgcse correctly for
> > > > every configuration imaginable?
> > > >
> > > > > __attribute__((optimize(""))) is not as broken as you're claiming it to be.
> > > > > It has quirky gcc internal logic, but it's still widely used
> > > > > in many software projects.
> > > >
> > > > So it's fine because it is only a little bit broken? I'm sorry, but
> > > > that makes no sense whatsoever.
> > > >
> > > > If you insist on sticking with this broken construct, can you please
> > > > make it GCC/x86-only at least?
> > >
> > > I'm totally fine with making
> > > #define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> > > to be gcc+x86 only.
> > > I'd like to get rid of it, but objtool is not smart enough to understand
> > > generated asm without it.
> > 
> > I'll defer to the x86 folks to make the final call here, but I would
> > be perfectly happy doing
> > 
> > index d1e3c6896b71..68ddb91fbcc6 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -176,4 +176,6 @@
> >  #define __diag_GCC_8(s)
> >  #endif
> > 
> > +#ifdef CONFIG_X86
> >  #define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > +#endif
> 
> If you're going to submit this patch could you please add
> ,-fno-omit-frame-pointer
> to the above as well?
> 
> > and end the conversation here, because I honestly cannot wrap my head
> > around the fact that you are willing to work around an x86 specific
> > objtool shortcoming by arbitrarily disabling some GCC optimization for
> > all architectures, using a construct that may or may not affect other
> > compiler settings in unpredictable ways, where the compiler is being
> > used to compile a BPF language runtime for executing BPF programs
> > inside the kernel.
> > 
> > What on earth could go wrong?
> 
> Frankly I'm more worried that -Os will generate incorrect code.
> All compilers have bugs. Kernel has bugs. What can go wrong?

+linux-toolchains. GCC updated the documentation in 7.x to discourage
people from using the optimize attribute.

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=893100c3fa9b3049ce84dcc0c1a839ddc7a21387


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 17:15 ` [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE Ard Biesheuvel
  2020-10-28 21:39   ` Alexei Starovoitov
@ 2020-10-29  8:25   ` Geert Uytterhoeven
  2020-10-30  0:34   ` Nick Desaulniers
  2 siblings, 0 replies; 16+ messages in thread
From: Geert Uytterhoeven @ 2020-10-29  8:25 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linux Kernel Mailing List, netdev, bpf, Arnd Bergmann,
	Nick Desaulniers, Arvind Sankar, Randy Dunlap, Josh Poimboeuf,
	Thomas Gleixner, Alexei Starovoitov, Daniel Borkmann,
	Peter Zijlstra, Kees Cook

On Wed, Oct 28, 2020 at 6:15 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> function scope __attribute__((optimize("-fno-gcse"))), to disable a
> GCC specific optimization that was causing trouble on x86 builds, and
> was not expected to have any positive effect in the first place.
>
> However, as the GCC manual documents, __attribute__((optimize))
> is not for production use, and results in all other optimization
> options to be forgotten for the function in question. This can
> cause all kinds of trouble, but in one particular reported case,
> it causes -fno-asynchronous-unwind-tables to be disregarded,
> resulting in .eh_frame info to be emitted for the function.
>
> This reverts commit 3193c0836, and instead, it disables the -fgcse
> optimization for the entire source file, but only when building for
> X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> original commit states that CONFIG_RETPOLINE=n triggers the issue,
> whereas CONFIG_RETPOLINE=y performs better without the optimization,
> so it is kept disabled in both cases.
>
> Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

(probably you missed my tag on v1 due to kernel.org hiccups)

Thanks, this gets rid of the following warning, which you may
want to quote in the patch description:

    aarch64-linux-gnu-ld: warning: orphan section `.eh_frame' from `kernel/bpf/core.o' being placed in section `.eh_frame'

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-29  2:57             ` Arvind Sankar
@ 2020-10-29 20:31               ` Segher Boessenkool
  2020-10-29 22:13                 ` Ard Biesheuvel
  0 siblings, 1 reply; 16+ messages in thread
From: Segher Boessenkool @ 2020-10-29 20:31 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Alexei Starovoitov, Ard Biesheuvel, Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Nick Desaulniers, Randy Dunlap, Josh Poimboeuf,
	Thomas Gleixner, Alexei Starovoitov, Daniel Borkmann,
	Peter Zijlstra, Geert Uytterhoeven, Kees Cook, linux-toolchains

On Wed, Oct 28, 2020 at 10:57:45PM -0400, Arvind Sankar wrote:
> On Wed, Oct 28, 2020 at 04:20:01PM -0700, Alexei Starovoitov wrote:
> > All compilers have bugs. Kernel has bugs. What can go wrong?

Heh.

> +linux-toolchains. GCC updated the documentation in 7.x to discourage
> people from using the optimize attribute.
> 
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=893100c3fa9b3049ce84dcc0c1a839ddc7a21387

https://patchwork.ozlabs.org/project/gcc/patch/20151213081911.GA320@x4/
has all the discussion around that GCC patch.


Segher


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-29 20:31               ` Segher Boessenkool
@ 2020-10-29 22:13                 ` Ard Biesheuvel
  0 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2020-10-29 22:13 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Arvind Sankar, Alexei Starovoitov, Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Nick Desaulniers, Randy Dunlap, Josh Poimboeuf,
	Thomas Gleixner, Alexei Starovoitov, Daniel Borkmann,
	Peter Zijlstra, Geert Uytterhoeven, Kees Cook, linux-toolchains

On Thu, 29 Oct 2020 at 21:35, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Wed, Oct 28, 2020 at 10:57:45PM -0400, Arvind Sankar wrote:
> > On Wed, Oct 28, 2020 at 04:20:01PM -0700, Alexei Starovoitov wrote:
> > > All compilers have bugs. Kernel has bugs. What can go wrong?
>
> Heh.
>
> > +linux-toolchains. GCC updated the documentation in 7.x to discourage
> > people from using the optimize attribute.
> >
> > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=893100c3fa9b3049ce84dcc0c1a839ddc7a21387
>
> https://patchwork.ozlabs.org/project/gcc/patch/20151213081911.GA320@x4/
> has all the discussion around that GCC patch.
>

For everyone's convenience, let me reproduce here how the GCC
developers describe this attribute on their wiki [0]:

"""
Currently (2015), this attribute is known to have several critical
bugs (PR37565, PR63401, PR60580, PR50782). Using it may produce not
effect at all or lead to wrong-code.

Quoting one GCC maintainer: "I consider the optimize attribute code
seriously broken and unmaintained (but sometimes useful for debugging
- and only that)." source

Unfortunately, the people who added it are either not working on GCC
anymore or not interested in fixing it. Do not try to guess how it is
supposed to work by trial-and-error. There is not a list of options
that are safe to use or known to be broken. Bug reports about the
optimize attribute being broken will probably be closed as WONTFIX
(PR59262), thus it is not worth to open new ones. If it works for you
for a given version of GCC, it doesn't mean it will work on a
different machine or a different version.

The only realistic choices are to not use it, to use it and accept its
brokenness (current or future one, since it is unmaintained), or join
GCC and fix it (perhaps motivating other people along the way to join
your effort).
"""

[0] https://gcc.gnu.org/wiki/FAQ#optimize_attribute_broken


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 23:20           ` Alexei Starovoitov
  2020-10-29  2:57             ` Arvind Sankar
@ 2020-10-30  0:28             ` Nick Desaulniers
  2020-10-30  3:22               ` Alexei Starovoitov
  1 sibling, 1 reply; 16+ messages in thread
From: Nick Desaulniers @ 2020-10-30  0:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ard Biesheuvel, Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Arvind Sankar, Randy Dunlap, Josh Poimboeuf,
	Thomas Gleixner, Alexei Starovoitov, Daniel Borkmann,
	Peter Zijlstra, Geert Uytterhoeven, Kees Cook

On Wed, Oct 28, 2020 at 4:20 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Oct 29, 2020 at 12:10:52AM +0100, Ard Biesheuvel wrote:
> > On Wed, 28 Oct 2020 at 23:59, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > I'm totally fine with making
> > > #define __no_fgcse __attribute__((optimize("-fno-gcse,-fno-omit-frame-pointer")))
> > > to be gcc+x86 only.
> > > I'd like to get rid of it, but objtool is not smart enough to understand
> > > generated asm without it.
> >
> > I'll defer to the x86 folks to make the final call here, but I would
> > be perfectly happy doing
> >
> > index d1e3c6896b71..68ddb91fbcc6 100644
> > --- a/include/linux/compiler-gcc.h
> > +++ b/include/linux/compiler-gcc.h
> > @@ -176,4 +176,6 @@
> >  #define __diag_GCC_8(s)
> >  #endif
> >
> > +#ifdef CONFIG_X86
> >  #define __no_fgcse __attribute__((optimize("-fno-gcse")))
> > +#endif
>
> If you're going to submit this patch could you please add
> ,-fno-omit-frame-pointer
> to the above as well?

You'll be playing whack-a-mole with other -f flags that should have
been used, which changes even based on the config.  The -fsanitize=
flags come to mind with the sanitizers.

defconfig shows:
$ make LLVM=1 -j71 kernel/bpf/core.o V=1 2>&1 | grep "\-f"
the following -f flags set:

-fno-strict-aliasing
-fno-common
-fshort-wchar
-fno-PIE
-fno-asynchronous-unwind-tables
-fno-delete-null-pointer-checks
-fomit-frame-pointer
-fmacro-prefix-map=./=
-fstack-protector-strong

We already know that -fno-asynchronous-unwind-tables get dropped,
hence this patch.  And we know -fomit-frame-pointer or
-fno-omit-frame-pointer I guess gets dropped, hence your ask.  We
might not know the full extent which other flags get dropped with the
optimize attribute, but I'd argue that my list above can all result in
pretty bad bugs when accidentally omitted (ok, maybe not -fshort-wchar
or -fmacro-prefix-map, idk what those do) or when mixed with code that
has different values those flags control.  Searching GCC's bug tracker
for `__attribute__((optimize` turns up plenty of reports to make me
think this attribute maybe doesn't work the way folks suspect or
intend: https://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=__attribute__%28%28optimize&list_id=283390.

There's plenty of folks arguing against the use of the optimize
attribute in favor of the command line flag.  I urge you to please
reconsider the request.

> Frankly I'm more worried that -Os will generate incorrect code.

If you have observed bugs as a result of setting
CONFIG_CC_OPTIMIZE_FOR_SIZE, we would love to help you get to the
bottom of them and help you debug them.  But we should also remain
vigilant against rejecting progress on the status quo for known issues
over hypothetical issues without proper regard for evidence.
Correctness is the chief concern of a compiler; that it generates
incorrect code unless default-on optimizations are explicitly disabled
would be concerning, if that was in fact the case.  Such a bug report
would be invaluable to this code base, and likely others.  I trust
you've seen bugs here, but I would like to help verify this claim.

> All compilers have bugs. Kernel has bugs. What can go wrong?

This is more terrifyingly precise and infinitely wise than you may
have initially intended.  That my phone and laptop don't catch fire
simultaneously now is nothing short of miraculous.  I'm still holding
my breath.

--
Thanks,
~Nick Desaulniers


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-28 17:15 ` [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE Ard Biesheuvel
  2020-10-28 21:39   ` Alexei Starovoitov
  2020-10-29  8:25   ` Geert Uytterhoeven
@ 2020-10-30  0:34   ` Nick Desaulniers
  2 siblings, 0 replies; 16+ messages in thread
From: Nick Desaulniers @ 2020-10-30  0:34 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: LKML, Network Development, bpf, Arnd Bergmann, Arvind Sankar,
	Randy Dunlap, Josh Poimboeuf, Thomas Gleixner,
	Alexei Starovoitov, Daniel Borkmann, Peter Zijlstra,
	Geert Uytterhoeven, Kees Cook

On Wed, Oct 28, 2020 at 10:15 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
> ___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
> function scope __attribute__((optimize("-fno-gcse"))), to disable a
> GCC specific optimization that was causing trouble on x86 builds, and
> was not expected to have any positive effect in the first place.
>
> However, as the GCC manual documents, __attribute__((optimize))
> is not for production use, and results in all other optimization
> options to be forgotten for the function in question. This can
> cause all kinds of trouble, but in one particular reported case,
> it causes -fno-asynchronous-unwind-tables to be disregarded,
> resulting in .eh_frame info to be emitted for the function.
>
> This reverts commit 3193c0836, and instead, it disables the -fgcse
> optimization for the entire source file, but only when building for
> X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
> original commit states that CONFIG_RETPOLINE=n triggers the issue,
> whereas CONFIG_RETPOLINE=y performs better without the optimization,
> so it is kept disabled in both cases.
>
> Fixes: 3193c0836 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  include/linux/compiler-gcc.h   | 2 --
>  include/linux/compiler_types.h | 4 ----
>  kernel/bpf/Makefile            | 6 +++++-
>  kernel/bpf/core.c              | 2 +-
>  4 files changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index d1e3c6896b71..5deb37024574 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -175,5 +175,3 @@
>  #else
>  #define __diag_GCC_8(s)
>  #endif
> -
> -#define __no_fgcse __attribute__((optimize("-fno-gcse")))
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index 6e390d58a9f8..ac3fa37a84f9 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -247,10 +247,6 @@ struct ftrace_likely_data {
>  #define asm_inline asm
>  #endif
>
> -#ifndef __no_fgcse
> -# define __no_fgcse
> -#endif
> -
>  /* Are two types/vars the same type (ignoring qualifiers)? */
>  #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
>
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index bdc8cd1b6767..c1b9f71ee6aa 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -1,6 +1,10 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-y := core.o
> -CFLAGS_core.o += $(call cc-disable-warning, override-init)
> +ifneq ($(CONFIG_BPF_JIT_ALWAYS_ON),y)
> +# ___bpf_prog_run() needs GCSE disabled on x86; see 3193c0836f203 for details
> +cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
> +endif
> +CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)

Writing multiple conditions in a conditional block in GNU make is
painful, hence the double `y` trick.  I feel like either 3 nested
conditionals (one for CONFIG_BPF_JIT_ALWAYS_ON, CONFIG_X86, and
CONFIG_CC_IS_GCC) would have been clearer, or using three `y`, rather
than mixing and matching `if`s with multiple `y`s, but regardless of
what color I think we should paint the bikeshed:

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

This also doesn't resolve all issues here, but is a step in the right
direction, IMO.

>
>  obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
>  obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 9268d77898b7..55454d2278b1 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -1369,7 +1369,7 @@ u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
>   *
>   * Decode and execute eBPF instructions.
>   */
> -static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
> +static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
>  {
>  #define BPF_INSN_2_LBL(x, y)    [BPF_##x | BPF_##y] = &&x##_##y
>  #define BPF_INSN_3_LBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = &&x##_##y##_##z
> --
> 2.17.1
>


-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-30  0:28             ` Nick Desaulniers
@ 2020-10-30  3:22               ` Alexei Starovoitov
  2020-10-30  7:51                 ` Ard Biesheuvel
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2020-10-30  3:22 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Ard Biesheuvel, Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Arvind Sankar, Randy Dunlap, Josh Poimboeuf,
	Thomas Gleixner, Alexei Starovoitov, Daniel Borkmann,
	Peter Zijlstra, Geert Uytterhoeven, Kees Cook

On Thu, Oct 29, 2020 at 05:28:11PM -0700, Nick Desaulniers wrote:
> 
> We already know that -fno-asynchronous-unwind-tables get dropped,
> hence this patch.  

On arm64 only. Not on x86

> And we know -fomit-frame-pointer or
> -fno-omit-frame-pointer I guess gets dropped, hence your ask.  

yep. that one is bugged.

> We might not know the full extent which other flags get dropped with the
> optimize attribute, but I'd argue that my list above can all result in
> pretty bad bugs when accidentally omitted (ok, maybe not -fshort-wchar
> or -fmacro-prefix-map, idk what those do) or when mixed with code that

true.
A few months back I checked that the strict-aliasing and no-common flags
from your list are not dropped by this attr in gcc [6789].
I also checked that no-red-zone and model=kernel are preserved as well.

> has different values those flags control.  Searching GCC's bug tracker
> for `__attribute__((optimize` turns up plenty of reports to make me
> think this attribute maybe doesn't work the way folks suspect or
> intend: https://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=__attribute__%28%28optimize&list_id=283390.

There is a risk.
Is it a footgun? Sure.
Yet the gcc testsuite is using __attribute__((optimize)).
And some of these tests were added _after_ the official gcc doc said that this
attribute is broken.
imo it's like a 'beware of the dog' sign.

> There's plenty of folks arguing against the use of the optimize
> attribute in favor of the command line flag.  I urge you to please
> reconsider the request.

ok. Applied this first patch to bpf tree and will get it to Linus soon.
Second patch that is splitting interpreter out because of this mess
is dropped. The effect of gcse on performance is questionable.
iirc some interpreters used to do -fno-gcse to gain performance.
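
For context on why interpreters come up: the GCC manual's entry for -fgcse
notes that programs using computed gotos may run faster with -fno-gcse.
___bpf_prog_run() dispatches through a table of label addresses (the
BPF_INSN_*_LBL macros in the diff). Below is a minimal sketch of that
dispatch style, not kernel code; struct toy_insn, the OP_* opcodes and
toy_run() are hypothetical names made up for illustration, and the
labels-as-values syntax assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical toy opcodes; these are not BPF opcodes. */
enum { OP_ADD, OP_MUL, OP_EXIT };

struct toy_insn {
	uint8_t op;
	int64_t imm;
};

/* Run a toy program on an accumulator via a table of label addresses. */
static int64_t toy_run(const struct toy_insn *insn, int64_t acc)
{
	/* &&label (labels as values) is a GNU C extension. */
	static const void * const jumptable[] = {
		[OP_ADD]  = &&do_add,
		[OP_MUL]  = &&do_mul,
		[OP_EXIT] = &&do_exit,
	};

	/* Dispatch on the first instruction. */
	goto *jumptable[insn->op];

do_add:
	acc += insn->imm;
	insn++;
	goto *jumptable[insn->op];	/* each handler dispatches the next insn */
do_mul:
	acc *= insn->imm;
	insn++;
	goto *jumptable[insn->op];
do_exit:
	return acc;
}
```

Running the program { OP_ADD 2, OP_MUL 5, OP_EXIT } with an initial
accumulator of 1 returns 15. Passing -fno-gcse on the command line for such
a file, as the applied patch does for core.o, covers the whole translation
unit rather than a single function.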


* Re: [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE
  2020-10-30  3:22               ` Alexei Starovoitov
@ 2020-10-30  7:51                 ` Ard Biesheuvel
  0 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2020-10-30  7:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Nick Desaulniers, Linux Kernel Mailing List,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT),
	Arnd Bergmann, Arvind Sankar, Randy Dunlap, Josh Poimboeuf,
	Thomas Gleixner, Alexei Starovoitov, Daniel Borkmann,
	Peter Zijlstra, Geert Uytterhoeven, Kees Cook

On Fri, 30 Oct 2020 at 04:22, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Oct 29, 2020 at 05:28:11PM -0700, Nick Desaulniers wrote:
> >
> > We already know that -fno-asynchronous-unwind-tables get dropped,
> > hence this patch.
>
> On arm64 only. Not on x86
>
> > And we know -fomit-frame-pointer or
> > -fno-omit-frame-pointer I guess gets dropped, hence your ask.
>
> yep. that one is bugged.
>
> > We might not know the full extent which other flags get dropped with the
> > optimize attribute, but I'd argue that my list above can all result in
> > pretty bad bugs when accidentally omitted (ok, maybe not -fshort-wchar
> > or -fmacro-prefix-map, idk what those do) or when mixed with code that
>
> true.
> A few months back I checked that the strict-aliasing and no-common flags
> from your list are not dropped by this attr in gcc [6789].
> I also checked that no-red-zone and model=kernel are preserved as well.
>
> > has different values those flags control.  Searching GCC's bug tracker
> > for `__attribute__((optimize` turns up plenty of reports to make me
> > think this attribute maybe doesn't work the way folks suspect or
> > intend: https://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=__attribute__%28%28optimize&list_id=283390.
>
> There is a risk.
> Is it a footgun? Sure.
> Yet the gcc testsuite is using __attribute__((optimize)).
> And some of these tests were added _after_ the official gcc doc said that this
> attribute is broken.
> imo it's like a 'beware of the dog' sign.
>
> > There's plenty of folks arguing against the use of the optimize
> > attribute in favor of the command line flag.  I urge you to please
> > reconsider the request.
>
> ok. Applied this first patch to bpf tree and will get it to Linus soon.
> Second patch that is splitting interpreter out because of this mess
> is dropped. The effect of gcse on performance is questionable.
> iirc some interpreters used to do -fno-gcse to gain performance.

That is absolutely fine. I only included the second patch to address
Daniel's concern that -fno-gcse could affect unrelated code living in
the same source file as ___bpf_prog_run(), but if you don't care about
that, neither will I.


Thread overview: 16+ messages
2020-10-28 17:15 [PATCH v2 0/2] get rid of GCC __attribute__((optimize)) for BPF Ard Biesheuvel
2020-10-28 17:15 ` [PATCH v2 1/2] bpf: don't rely on GCC __attribute__((optimize)) to disable GCSE Ard Biesheuvel
2020-10-28 21:39   ` Alexei Starovoitov
2020-10-28 22:15     ` Ard Biesheuvel
2020-10-28 22:59       ` Alexei Starovoitov
2020-10-28 23:10         ` Ard Biesheuvel
2020-10-28 23:20           ` Alexei Starovoitov
2020-10-29  2:57             ` Arvind Sankar
2020-10-29 20:31               ` Segher Boessenkool
2020-10-29 22:13                 ` Ard Biesheuvel
2020-10-30  0:28             ` Nick Desaulniers
2020-10-30  3:22               ` Alexei Starovoitov
2020-10-30  7:51                 ` Ard Biesheuvel
2020-10-29  8:25   ` Geert Uytterhoeven
2020-10-30  0:34   ` Nick Desaulniers
2020-10-28 17:15 ` [PATCH v2 2/2] bpf: move interpreter into separate source file Ard Biesheuvel
