* [PATCH bpf-next v5 0/6] bpf trampoline for arm64
@ 2022-05-18 13:16 ` Xu Kuohai
  0 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-18 13:16 UTC (permalink / raw)
  To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
  Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Mark Rutland, Pasha Tatashin, Ard Biesheuvel, Daniel Kiss,
	Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
	Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi

Add bpf trampoline support for arm64. Most of the logic is the same as
x86.
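
For context, this is the kind of program the new trampoline makes
attachable on arm64 (fentry/fexit/fmod_ret/struct_ops). A minimal sketch,
assuming a libbpf-style build with vmlinux.h and bpf_tracing.h; it is not
part of this series:

/* fentry_example.bpf.c: attach to do_unlinkat() via a bpf trampoline */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("fentry/do_unlinkat")
int BPF_PROG(on_do_unlinkat, int dfd, struct filename *name)
{
	bpf_printk("do_unlinkat(%d) called", dfd);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";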

Tested on Raspberry Pi 4B and QEMU with KASLR disabled (to avoid long jumps);
results:
 #9  /1     bpf_cookie/kprobe:OK
 #9  /2     bpf_cookie/multi_kprobe_link_api:FAIL
 #9  /3     bpf_cookie/multi_kprobe_attach_api:FAIL
 #9  /4     bpf_cookie/uprobe:OK
 #9  /5     bpf_cookie/tracepoint:OK
 #9  /6     bpf_cookie/perf_event:OK
 #9  /7     bpf_cookie/trampoline:OK
 #9  /8     bpf_cookie/lsm:OK
 #9         bpf_cookie:FAIL
 #18 /1     bpf_tcp_ca/dctcp:OK
 #18 /2     bpf_tcp_ca/cubic:OK
 #18 /3     bpf_tcp_ca/invalid_license:OK
 #18 /4     bpf_tcp_ca/dctcp_fallback:OK
 #18 /5     bpf_tcp_ca/rel_setsockopt:OK
 #18        bpf_tcp_ca:OK
 #51 /1     dummy_st_ops/dummy_st_ops_attach:OK
 #51 /2     dummy_st_ops/dummy_init_ret_value:OK
 #51 /3     dummy_st_ops/dummy_init_ptr_arg:OK
 #51 /4     dummy_st_ops/dummy_multiple_args:OK
 #51        dummy_st_ops:OK
 #55        fentry_fexit:OK
 #56        fentry_test:OK
 #57 /1     fexit_bpf2bpf/target_no_callees:OK
 #57 /2     fexit_bpf2bpf/target_yes_callees:OK
 #57 /3     fexit_bpf2bpf/func_replace:OK
 #57 /4     fexit_bpf2bpf/func_replace_verify:OK
 #57 /5     fexit_bpf2bpf/func_sockmap_update:OK
 #57 /6     fexit_bpf2bpf/func_replace_return_code:OK
 #57 /7     fexit_bpf2bpf/func_map_prog_compatibility:OK
 #57 /8     fexit_bpf2bpf/func_replace_multi:OK
 #57 /9     fexit_bpf2bpf/fmod_ret_freplace:OK
 #57        fexit_bpf2bpf:OK
 #58        fexit_sleep:OK
 #59        fexit_stress:OK
 #60        fexit_test:OK
 #67        get_func_args_test:OK
 #68        get_func_ip_test:OK
 #104       modify_return:OK
 #237       xdp_bpf2bpf:OK

bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
failed due to the lack of multi_kprobe support on arm64.

v5:
- As Alexei suggested, remove is_valid_bpf_tramp_flags()

v4: https://lore.kernel.org/bpf/20220517071838.3366093-1-xukuohai@huawei.com/
- Run the test cases on Raspberry Pi 4B
- Rebase and add cookie to trampoline
- As Steve suggested, move trace_direct_tramp() back to entry-ftrace.S to
  avoid messing up generic code with architecture specific code
- As Jakub suggested, merge patches 4 and 5 of v3 so that complete
  functionality is provided in one patch
- As Mark suggested, add a comment for the use of aarch64_insn_patch_text_nosync()
- Do not generate trampoline for long jump to avoid triggering ftrace_bug
- Round stack size to multiples of 16B to avoid SPAlignmentFault
- Use callee saved register x20 to reduce the use of mov_i64
- Add missing BTI J instructions
- Trivial spelling and code style fixes

v3: https://lore.kernel.org/bpf/20220424154028.1698685-1-xukuohai@huawei.com/
- Append test results for bpf_tcp_ca, dummy_st_ops, fexit_bpf2bpf,
  xdp_bpf2bpf
- Add support for poking bpf progs
- Fix the return value of arch_prepare_bpf_trampoline() to be the total
  number of bytes instead of the number of instructions
- Do not check whether CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled in
  arch_prepare_bpf_trampoline, since the trampoline may be hooked to a bpf
  prog
- Restrict bpf_arch_text_poke() to poke bpf text only, as kernel functions
  are poked by ftrace
- Rewrite trace_direct_tramp() in inline assembly in trace_selftest.c
  to avoid messing entry-ftrace.S
- Guard arch_ftrace_set_direct_caller() with
  CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS to avoid a compile error
  when this option is disabled
- Some trivial code style fixes

v2: https://lore.kernel.org/bpf/20220414162220.1985095-1-xukuohai@huawei.com/
- Add Song's ACK
- Change the multi-line comment in is_valid_bpf_tramp_flags() into net
  style (patch 3)
- Fix a deadloop issue in ftrace selftest (patch 2)
- Replace pt_regs->x0 with pt_regs->orig_x0 in patch 1 commit message 
- Replace "bpf trampoline" with "custom trampoline" in patch 1, as
  ftrace direct calls are not used only by the bpf trampoline.

v1: https://lore.kernel.org/bpf/20220413054959.1053668-1-xukuohai@huawei.com/

Xu Kuohai (6):
  arm64: ftrace: Add ftrace direct call support
  ftrace: Fix deadloop caused by direct call in ftrace selftest
  bpf: Remove is_valid_bpf_tramp_flags()
  bpf, arm64: Implement bpf_arch_text_poke() for arm64
  bpf, arm64: bpf trampoline for arm64
  selftests/bpf: Fix trivial typo in fentry_fexit.c

 arch/arm64/Kconfig                            |   2 +
 arch/arm64/include/asm/ftrace.h               |  22 +
 arch/arm64/kernel/asm-offsets.c               |   1 +
 arch/arm64/kernel/entry-ftrace.S              |  28 +-
 arch/arm64/net/bpf_jit.h                      |   1 +
 arch/arm64/net/bpf_jit_comp.c                 | 523 +++++++++++++++++-
 arch/x86/net/bpf_jit_comp.c                   |  20 -
 kernel/bpf/bpf_struct_ops.c                   |   3 +
 kernel/bpf/trampoline.c                       |   3 +
 kernel/trace/trace_selftest.c                 |   2 +
 .../selftests/bpf/prog_tests/fentry_fexit.c   |   4 +-
 11 files changed, 570 insertions(+), 39 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-05-18 13:16 ` Xu Kuohai
@ 2022-05-18 13:16   ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-18 13:16 UTC (permalink / raw)
  To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
  Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Mark Rutland, Pasha Tatashin, Ard Biesheuvel, Daniel Kiss,
	Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
	Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi

Add ftrace direct call support for arm64.

1. When there is only a custom trampoline, replace the fentry nop with a
   jump instruction that branches directly to the custom trampoline.

2. When the ftrace trampoline and a custom trampoline coexist, jump from
   fentry to the ftrace trampoline first, then jump to the custom
   trampoline when the ftrace trampoline exits. The currently unused
   register pt_regs->orig_x0 is used as the intermediary for jumping from
   the ftrace trampoline to the custom trampoline (sketched below).
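
A minimal sketch (not part of this patch; example_direct_handler is a
hypothetical name) of how a direct-call ftrace handler is expected to use
the new hook, assuming CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is enabled:

#include <linux/ftrace.h>

static void example_direct_handler(unsigned long ip, unsigned long parent_ip,
				   struct ftrace_ops *ops,
				   struct ftrace_regs *fregs)
{
	struct pt_regs *regs = ftrace_get_regs(fregs);
	unsigned long tramp = ftrace_find_rec_direct(ip);

	/* Record the custom trampoline address in orig_x0 so that the
	 * ftrace trampoline's exit path (patched below) returns there
	 * instead of returning to the traced function.
	 */
	if (regs && tramp)
		arch_ftrace_set_direct_caller(regs, tramp);
}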

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
 arch/arm64/Kconfig               |  2 ++
 arch/arm64/include/asm/ftrace.h  | 12 ++++++++++++
 arch/arm64/kernel/asm-offsets.c  |  1 +
 arch/arm64/kernel/entry-ftrace.S | 18 +++++++++++++++---
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 57c4c995965f..81cc330daafc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -177,6 +177,8 @@ config ARM64
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS \
 		if $(cc-option,-fpatchable-function-entry=2)
+	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS \
+		if DYNAMIC_FTRACE_WITH_REGS
 	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
 		if DYNAMIC_FTRACE_WITH_REGS
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 1494cfa8639b..14a35a5df0a1 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -78,6 +78,18 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 	return addr;
 }
 
+#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+static inline void arch_ftrace_set_direct_caller(struct pt_regs *regs,
+						 unsigned long addr)
+{
+	/*
+	 * Place custom trampoline address in regs->orig_x0 to let ftrace
+	 * trampoline jump to it.
+	 */
+	regs->orig_x0 = addr;
+}
+#endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
 struct dyn_ftrace;
 int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 1197e7679882..b1ed0bf01c59 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -80,6 +80,7 @@ int main(void)
   DEFINE(S_SDEI_TTBR1,		offsetof(struct pt_regs, sdei_ttbr1));
   DEFINE(S_PMR_SAVE,		offsetof(struct pt_regs, pmr_save));
   DEFINE(S_STACKFRAME,		offsetof(struct pt_regs, stackframe));
+  DEFINE(S_ORIG_X0,		offsetof(struct pt_regs, orig_x0));
   DEFINE(PT_REGS_SIZE,		sizeof(struct pt_regs));
   BLANK();
 #ifdef CONFIG_COMPAT
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index e535480a4069..dfe62c55e3a2 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -60,6 +60,9 @@
 	str	x29, [sp, #S_FP]
 	.endif
 
+	/* Set orig_x0 to zero  */
+	str     xzr, [sp, #S_ORIG_X0]
+
 	/* Save the callsite's SP and LR */
 	add	x10, sp, #(PT_REGS_SIZE + 16)
 	stp	x9, x10, [sp, #S_LR]
@@ -119,12 +122,21 @@ ftrace_common_return:
 	/* Restore the callsite's FP, LR, PC */
 	ldr	x29, [sp, #S_FP]
 	ldr	x30, [sp, #S_LR]
-	ldr	x9, [sp, #S_PC]
-
+	ldr	x10, [sp, #S_PC]
+
+	ldr	x11, [sp, #S_ORIG_X0]
+	cbz	x11, 1f
+	/* Set x9 to parent ip before jump to custom trampoline */
+	mov	x9,  x30
+	/* Set lr to self ip */
+	ldr	x30, [sp, #S_PC]
+	/* Set x10 (used for return address) to custom trampoline */
+	mov	x10, x11
+1:
 	/* Restore the callsite's SP */
 	add	sp, sp, #PT_REGS_SIZE + 16
 
-	ret	x9
+	ret	x10
 SYM_CODE_END(ftrace_common)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH bpf-next v5 2/6] ftrace: Fix deadloop caused by direct call in ftrace selftest
  2022-05-18 13:16 ` Xu Kuohai
@ 2022-05-18 13:16   ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-18 13:16 UTC (permalink / raw)
  To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
  Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Mark Rutland, Pasha Tatashin, Ard Biesheuvel, Daniel Kiss,
	Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
	Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi

After direct calls are enabled for arm64, the ftrace selftest enters an
infinite loop:

<trace_selftest_dynamic_test_func>:
00  bti     c
01  mov     x9, x30                            <trace_direct_tramp>:
02  bl      <trace_direct_tramp>    ---------->     ret
                                                     |
                                         lr/x30 is 03, return to 03
                                                     |
03  mov     w0, #0x0   <-----------------------------|
     |                                               |
     |                   dead loop!                  |
     |                                               |
04  ret   ---- lr/x30 is still 03, go back to 03 ----|

The reason is that when the direct caller trace_direct_tramp() returns
to the patched function trace_selftest_dynamic_test_func(), lr still
holds the address just after the instrumented instruction in the patched
function, so when the patched function returns, it jumps back to itself.

To fix this, lr must be restored before trace_direct_tramp() exits, so a
dedicated trace_direct_tramp() is written for arm64.

Reported-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
---
 arch/arm64/include/asm/ftrace.h  | 10 ++++++++++
 arch/arm64/kernel/entry-ftrace.S | 10 ++++++++++
 kernel/trace/trace_selftest.c    |  2 ++
 3 files changed, 22 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 14a35a5df0a1..6f6b184e72fb 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -126,6 +126,16 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
 	 */
 	return !strcmp(sym + 8, name);
 }
+
+#ifdef CONFIG_FTRACE_SELFTEST
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+
+#define trace_direct_tramp trace_direct_tramp
+extern void trace_direct_tramp(void);
+
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+#endif /* CONFIG_FTRACE_SELFTEST */
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* __ASM_FTRACE_H */
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index dfe62c55e3a2..a47e87d4d3dd 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -357,3 +357,13 @@ SYM_CODE_START(return_to_handler)
 	ret
 SYM_CODE_END(return_to_handler)
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#ifdef CONFIG_FTRACE_SELFTEST
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+SYM_FUNC_START(trace_direct_tramp)
+	mov x10, x30
+	mov x30, x9
+	ret x10
+SYM_FUNC_END(trace_direct_tramp)
+#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
+#endif /* CONFIG_FTRACE_SELFTEST */
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index abcadbe933bb..e7ccd0d10c39 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -785,8 +785,10 @@ static struct fgraph_ops fgraph_ops __initdata  = {
 };
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
+#ifndef trace_direct_tramp
 noinline __noclone static void trace_direct_tramp(void) { }
 #endif
+#endif
 
 /*
  * Pretty much the same than for the function tracer from which the selftest
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH bpf-next v5 3/6] bpf: Remove is_valid_bpf_tramp_flags()
  2022-05-18 13:16 ` Xu Kuohai
@ 2022-05-18 13:16   ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-18 13:16 UTC (permalink / raw)
  To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
  Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Mark Rutland, Pasha Tatashin, Ard Biesheuvel, Daniel Kiss,
	Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
	Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi

BPF_TRAMP_F_XXX flags are not used by user code and are almost constant
at compile time, so runtime validation is a bit overkill. Remove
is_valid_bpf_tramp_flags() and add some usage comments instead.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
 arch/x86/net/bpf_jit_comp.c | 20 --------------------
 kernel/bpf/bpf_struct_ops.c |  3 +++
 kernel/bpf/trampoline.c     |  3 +++
 3 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a2b6d197c226..7698ef3b4821 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1922,23 +1922,6 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
 	return 0;
 }
 
-static bool is_valid_bpf_tramp_flags(unsigned int flags)
-{
-	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
-	    (flags & BPF_TRAMP_F_SKIP_FRAME))
-		return false;
-
-	/*
-	 * BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
-	 * and it must be used alone.
-	 */
-	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
-	    (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
-		return false;
-
-	return true;
-}
-
 /* Example:
  * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
  * its 'struct btf_func_model' will be nr_args=2
@@ -2017,9 +2000,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 	if (nr_args > 6)
 		return -ENOTSUPP;
 
-	if (!is_valid_bpf_tramp_flags(flags))
-		return -EINVAL;
-
 	/* Generated trampoline stack layout:
 	 *
 	 * RBP + 8         [ return address  ]
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index d9a3c9207240..0572cc5aeb28 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -341,6 +341,9 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
 
 	tlinks[BPF_TRAMP_FENTRY].links[0] = link;
 	tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
+	/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
+	 * and it must be used alone.
+	 */
 	flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0;
 	return arch_prepare_bpf_trampoline(NULL, image, image_end,
 					   model, flags, tlinks, NULL);
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 93c7675f0c9e..bd3f2e673874 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -358,6 +358,9 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
 
 	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
 	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links)
+		/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
+		 * should not be set together.
+		 */
 		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
 
 	if (ip_arg)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH bpf-next v5 4/6] bpf, arm64: Implement bpf_arch_text_poke() for arm64
  2022-05-18 13:16 ` Xu Kuohai
@ 2022-05-18 13:16   ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-18 13:16 UTC (permalink / raw)
  To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
  Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Mark Rutland, Pasha Tatashin, Ard Biesheuvel, Daniel Kiss,
	Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
	Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi

Implement bpf_arch_text_poke() for arm64, so bpf trampoline code can use
it to replace a nop with a jump, or replace a jump with a nop.
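
A usage sketch (illustrative only; the helper names are hypothetical) of
how callers such as the bpf trampoline code are expected to drive this
interface. Per the patch, the poke fails with -EFAULT if the instruction
currently at ip does not match the expected old state:

#include <linux/bpf.h>

static int example_attach(void *bpf_prog_entry, void *tramp)
{
	/* nop -> bl <tramp>: nothing was attached before */
	return bpf_arch_text_poke(bpf_prog_entry, BPF_MOD_CALL, NULL, tramp);
}

static int example_detach(void *bpf_prog_entry, void *tramp)
{
	/* bl <tramp> -> nop: detach the previously attached trampoline */
	return bpf_arch_text_poke(bpf_prog_entry, BPF_MOD_CALL, tramp, NULL);
}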

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 arch/arm64/net/bpf_jit.h      |   1 +
 arch/arm64/net/bpf_jit_comp.c | 107 +++++++++++++++++++++++++++++++---
 2 files changed, 99 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index 194c95ccc1cf..1c4b0075a3e2 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -270,6 +270,7 @@
 #define A64_BTI_C  A64_HINT(AARCH64_INSN_HINT_BTIC)
 #define A64_BTI_J  A64_HINT(AARCH64_INSN_HINT_BTIJ)
 #define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC)
+#define A64_NOP    A64_HINT(AARCH64_INSN_HINT_NOP)
 
 /* DMB */
 #define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8ab4035dea27..5ce6ed5f42a1 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -9,6 +9,7 @@
 
 #include <linux/bitfield.h>
 #include <linux/bpf.h>
+#include <linux/memory.h>
 #include <linux/filter.h>
 #include <linux/printk.h>
 #include <linux/slab.h>
@@ -18,6 +19,7 @@
 #include <asm/cacheflush.h>
 #include <asm/debug-monitors.h>
 #include <asm/insn.h>
+#include <asm/patching.h>
 #include <asm/set_memory.h>
 
 #include "bpf_jit.h"
@@ -235,13 +237,13 @@ static bool is_lsi_offset(int offset, int scale)
 	return true;
 }
 
+#define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0)
+#define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
+
 /* Tail call offset to jump into */
-#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \
-	IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
-#define PROLOGUE_OFFSET 9
-#else
-#define PROLOGUE_OFFSET 8
-#endif
+#define PROLOGUE_OFFSET	(BTI_INSNS + 2 + PAC_INSNS + 8)
+/* Offset of nop instruction in bpf prog entry to be poked */
+#define POKE_OFFSET	(BTI_INSNS + 1)
 
 static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 {
@@ -279,12 +281,15 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 	 *
 	 */
 
+	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
+		emit(A64_BTI_C, ctx);
+
+	emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
+	emit(A64_NOP, ctx);
+
 	/* Sign lr */
 	if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL))
 		emit(A64_PACIASP, ctx);
-	/* BTI landing pad */
-	else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
-		emit(A64_BTI_C, ctx);
 
 	/* Save FP and LR registers to stay align with ARM64 AAPCS */
 	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
@@ -1529,3 +1534,87 @@ void bpf_jit_free_exec(void *addr)
 {
 	return vfree(addr);
 }
+
+static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
+			     void *addr, u32 *insn)
+{
+	if (!addr)
+		*insn = aarch64_insn_gen_nop();
+	else
+		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
+						    (unsigned long)addr,
+						    type);
+
+	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
+}
+
+int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
+		       void *old_addr, void *new_addr)
+{
+	int ret;
+	u32 old_insn;
+	u32 new_insn;
+	u32 replaced;
+	unsigned long offset = ~0UL;
+	enum aarch64_insn_branch_type branch_type;
+	char namebuf[KSYM_NAME_LEN];
+
+	if (!__bpf_address_lookup((unsigned long)ip, NULL, &offset, namebuf))
+		/* Only poking bpf text is supported. Since kernel function
+		 * entry is set up by ftrace, we rely on ftrace to poke kernel
+		 * functions.
+		 */
+		return -EINVAL;
+
+	/* bpf entry */
+	if (offset == 0UL)
+		/* skip to the nop instruction in bpf prog entry:
+		 * bti c	// if BTI enabled
+		 * mov x9, x30
+		 * nop
+		 */
+		ip = ip + POKE_OFFSET * AARCH64_INSN_SIZE;
+
+	if (poke_type == BPF_MOD_CALL)
+		branch_type = AARCH64_INSN_BRANCH_LINK;
+	else
+		branch_type = AARCH64_INSN_BRANCH_NOLINK;
+
+	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
+		return -EFAULT;
+
+	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
+		return -EFAULT;
+
+	mutex_lock(&text_mutex);
+	if (aarch64_insn_read(ip, &replaced)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	if (replaced != old_insn) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	/* We call aarch64_insn_patch_text_nosync() to replace instruction
+	 * atomically, so no other CPUs will fetch a half-new and half-old
+	 * instruction. But there is chance that another CPU fetches the old
+	 * instruction after bpf_arch_text_poke() finishes, that is, different
+	 * CPUs may execute different versions of instructions at the same
+	 * time before the icache is synchronized by hardware.
+	 *
+	 * 1. when a new trampoline is attached, it is not an issue for
+	 *    different CPUs to jump to different trampolines temporarily.
+	 *
+	 * 2. when an old trampoline is freed, we should wait for all other
+	 *    CPUs to exit the trampoline and make sure the trampoline is no
+	 *    longer reachable, since bpf_tramp_image_put() function already
+	 *    uses percpu_ref and rcu task to do the sync, no need to call the
+	 *    sync interface here.
+	 */
+	ret = aarch64_insn_patch_text_nosync(ip, new_insn);
+out:
+	mutex_unlock(&text_mutex);
+	return ret;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH bpf-next v5 5/6] bpf, arm64: bpf trampoline for arm64
  2022-05-18 13:16 ` Xu Kuohai
@ 2022-05-18 13:16   ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-18 13:16 UTC (permalink / raw)
  To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
  Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Mark Rutland, Pasha Tatashin, Ard Biesheuvel, Daniel Kiss,
	Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
	Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi

Add bpf trampoline support for arm64. Most of the logic is the same as
x86.

Tested on Raspberry Pi 4B and QEMU with KASLR disabled (to avoid long jumps);
results:
 #9  /1     bpf_cookie/kprobe:OK
 #9  /2     bpf_cookie/multi_kprobe_link_api:FAIL
 #9  /3     bpf_cookie/multi_kprobe_attach_api:FAIL
 #9  /4     bpf_cookie/uprobe:OK
 #9  /5     bpf_cookie/tracepoint:OK
 #9  /6     bpf_cookie/perf_event:OK
 #9  /7     bpf_cookie/trampoline:OK
 #9  /8     bpf_cookie/lsm:OK
 #9         bpf_cookie:FAIL
 #18 /1     bpf_tcp_ca/dctcp:OK
 #18 /2     bpf_tcp_ca/cubic:OK
 #18 /3     bpf_tcp_ca/invalid_license:OK
 #18 /4     bpf_tcp_ca/dctcp_fallback:OK
 #18 /5     bpf_tcp_ca/rel_setsockopt:OK
 #18        bpf_tcp_ca:OK
 #51 /1     dummy_st_ops/dummy_st_ops_attach:OK
 #51 /2     dummy_st_ops/dummy_init_ret_value:OK
 #51 /3     dummy_st_ops/dummy_init_ptr_arg:OK
 #51 /4     dummy_st_ops/dummy_multiple_args:OK
 #51        dummy_st_ops:OK
 #55        fentry_fexit:OK
 #56        fentry_test:OK
 #57 /1     fexit_bpf2bpf/target_no_callees:OK
 #57 /2     fexit_bpf2bpf/target_yes_callees:OK
 #57 /3     fexit_bpf2bpf/func_replace:OK
 #57 /4     fexit_bpf2bpf/func_replace_verify:OK
 #57 /5     fexit_bpf2bpf/func_sockmap_update:OK
 #57 /6     fexit_bpf2bpf/func_replace_return_code:OK
 #57 /7     fexit_bpf2bpf/func_map_prog_compatibility:OK
 #57 /8     fexit_bpf2bpf/func_replace_multi:OK
 #57 /9     fexit_bpf2bpf/fmod_ret_freplace:OK
 #57        fexit_bpf2bpf:OK
 #58        fexit_sleep:OK
 #59        fexit_stress:OK
 #60        fexit_test:OK
 #67        get_func_args_test:OK
 #68        get_func_ip_test:OK
 #104       modify_return:OK
 #237       xdp_bpf2bpf:OK

bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
failed due to the lack of multi_kprobe support on arm64.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
 arch/arm64/net/bpf_jit_comp.c | 420 +++++++++++++++++++++++++++++++++-
 1 file changed, 413 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 5ce6ed5f42a1..59cb96dc4335 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -142,6 +142,12 @@ static inline void emit_a64_mov_i64(const int reg, const u64 val,
 	}
 }
 
+static inline void emit_bti(u32 insn, struct jit_ctx *ctx)
+{
+	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
+		emit(insn, ctx);
+}
+
 /*
  * Kernel addresses in the vmalloc space use at most 48 bits, and the
  * remaining bits are guaranteed to be 0x1. So we can compose the address
@@ -161,6 +167,14 @@ static inline void emit_addr_mov_i64(const int reg, const u64 val,
 	}
 }
 
+static inline void emit_call(u64 target, struct jit_ctx *ctx)
+{
+	u8 tmp = bpf2a64[TMP_REG_1];
+
+	emit_addr_mov_i64(tmp, target, ctx);
+	emit(A64_BLR(tmp), ctx);
+}
+
 static inline int bpf2a64_offset(int bpf_insn, int off,
 				 const struct jit_ctx *ctx)
 {
@@ -281,8 +295,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 	 *
 	 */
 
-	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
-		emit(A64_BTI_C, ctx);
+	emit_bti(A64_BTI_C, ctx);
 
 	emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
 	emit(A64_NOP, ctx);
@@ -316,8 +329,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
 		}
 
 		/* BTI landing pad for the tail call, done with a BR */
-		if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
-			emit(A64_BTI_J, ctx);
+		emit_bti(A64_BTI_J, ctx);
 	}
 
 	emit(A64_SUB_I(1, fpb, fp, ctx->fpb_offset), ctx);
@@ -995,8 +1007,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 					    &func_addr, &func_addr_fixed);
 		if (ret < 0)
 			return ret;
-		emit_addr_mov_i64(tmp, func_addr, ctx);
-		emit(A64_BLR(tmp), ctx);
+		emit_call(func_addr, ctx);
 		emit(A64_MOV(1, r0, A64_R(0)), ctx);
 		break;
 	}
@@ -1340,6 +1351,13 @@ static int validate_code(struct jit_ctx *ctx)
 		if (a64_insn == AARCH64_BREAK_FAULT)
 			return -1;
 	}
+	return 0;
+}
+
+static int validate_ctx(struct jit_ctx *ctx)
+{
+	if (validate_code(ctx))
+		return -1;
 
 	if (WARN_ON_ONCE(ctx->exentry_idx != ctx->prog->aux->num_exentries))
 		return -1;
@@ -1464,7 +1482,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	build_epilogue(&ctx);
 
 	/* 3. Extra pass to validate JITed code. */
-	if (validate_code(&ctx)) {
+	if (validate_ctx(&ctx)) {
 		bpf_jit_binary_free(header);
 		prog = orig_prog;
 		goto out_off;
@@ -1535,6 +1553,394 @@ void bpf_jit_free_exec(void *addr)
 	return vfree(addr);
 }
 
+static void invoke_bpf_prog(struct jit_ctx *ctx, struct bpf_tramp_link *l,
+			    int args_off, int retval_off, int run_ctx_off,
+			    bool save_ret)
+{
+	u32 *branch;
+	u64 enter_prog;
+	u64 exit_prog;
+	u8 tmp = bpf2a64[TMP_REG_1];
+	u8 r0 = bpf2a64[BPF_REG_0];
+	struct bpf_prog *p = l->link.prog;
+	int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
+
+	if (p->aux->sleepable) {
+		enter_prog = (u64)__bpf_prog_enter_sleepable;
+		exit_prog = (u64)__bpf_prog_exit_sleepable;
+	} else {
+		enter_prog = (u64)__bpf_prog_enter;
+		exit_prog = (u64)__bpf_prog_exit;
+	}
+
+	if (l->cookie == 0) {
+		/* if cookie is zero, one instruction is enough to store it */
+		emit(A64_STR64I(A64_ZR, A64_SP, run_ctx_off + cookie_off), ctx);
+	} else {
+		emit_a64_mov_i64(tmp, l->cookie, ctx);
+		emit(A64_STR64I(tmp, A64_SP, run_ctx_off + cookie_off), ctx);
+	}
+
+	/* save p to callee saved register x19 to avoid loading p with mov_i64
+	 * each time.
+	 */
+	emit_addr_mov_i64(A64_R(19), (const u64)p, ctx);
+
+	/* arg1: prog */
+	emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+	/* arg2: &run_ctx */
+	emit(A64_ADD_I(1, A64_R(1), A64_SP, run_ctx_off), ctx);
+
+	emit_call(enter_prog, ctx);
+
+	/* if (__bpf_prog_enter(prog) == 0)
+	 *         goto skip_exec_of_prog;
+	 */
+	branch = ctx->image + ctx->idx;
+	emit(A64_NOP, ctx);
+
+	/* save return value to callee saved register x20 */
+	emit(A64_MOV(1, A64_R(20), r0), ctx);
+
+	emit(A64_ADD_I(1, A64_R(0), A64_SP, args_off), ctx);
+	if (!p->jited)
+		emit_addr_mov_i64(A64_R(1), (const u64)p->insnsi, ctx);
+
+	emit_call((const u64)p->bpf_func, ctx);
+
+	/* store return value */
+	if (save_ret)
+		emit(A64_STR64I(r0, A64_SP, retval_off), ctx);
+
+	if (ctx->image) {
+		int offset = &ctx->image[ctx->idx] - branch;
+		*branch = A64_CBZ(1, A64_R(0), offset);
+	}
+	emit_bti(A64_BTI_J, ctx);
+
+	/* arg1: prog */
+	emit(A64_MOV(1, A64_R(0), A64_R(19)), ctx);
+	/* arg2: start time */
+	emit(A64_MOV(1, A64_R(1), A64_R(20)), ctx);
+	/* arg3: &run_ctx */
+	emit(A64_ADD_I(1, A64_R(2), A64_SP, run_ctx_off), ctx);
+
+	emit_call(exit_prog, ctx);
+}
+
+static void invoke_bpf_mod_ret(struct jit_ctx *ctx, struct bpf_tramp_links *tl,
+			       int args_off, int retval_off, int run_ctx_off,
+			       u32 **branches)
+{
+	int i;
+
+	/* The first fmod_ret program will receive a garbage return value.
+	 * Set this to 0 to avoid confusing the program.
+	 */
+	emit(A64_STR64I(A64_ZR, A64_SP, retval_off), ctx);
+	for (i = 0; i < tl->nr_links; i++) {
+		invoke_bpf_prog(ctx, tl->links[i], args_off, retval_off,
+				run_ctx_off, true);
+		/* if (*(u64 *)(sp + retval_off) !=  0)
+		 *	goto do_fexit;
+		 */
+		emit(A64_LDR64I(A64_R(10), A64_SP, retval_off), ctx);
+		/* Save the location of branch, and generate a nop.
+		 * This nop will be replaced with a cbnz later.
+		 */
+		branches[i] = ctx->image + ctx->idx;
+		emit(A64_NOP, ctx);
+	}
+}
+
+static void save_args(struct jit_ctx *ctx, int args_off, int nargs)
+{
+	int i;
+
+	for (i = 0; i < nargs; i++) {
+		emit(A64_STR64I(i, A64_SP, args_off), ctx);
+		args_off += 8;
+	}
+}
+
+static void restore_args(struct jit_ctx *ctx, int args_off, int nargs)
+{
+	int i;
+
+	for (i = 0; i < nargs; i++) {
+		emit(A64_LDR64I(i, A64_SP, args_off), ctx);
+		args_off += 8;
+	}
+}
+
+/* Based on the x86 implementation of arch_prepare_bpf_trampoline().
+ *
+ * bpf prog and function entry before bpf trampoline hooked:
+ *   mov x9, x30
+ *   nop
+ *
+ * bpf prog and function entry after bpf trampoline hooked:
+ *   mov x9, x30
+ *   bl  <bpf_trampoline>
+ *
+ */
+static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
+			      struct bpf_tramp_links *tlinks, void *orig_call,
+			      int nargs, u32 flags)
+{
+	int i;
+	int stack_size;
+	int retaddr_off;
+	int regs_off;
+	int retval_off;
+	int args_off;
+	int nargs_off;
+	int ip_off;
+	int run_ctx_off;
+	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
+	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+	bool save_ret;
+	u32 **branches = NULL;
+
+	/* trampoline stack layout:
+	 *                  [ parent ip         ]
+	 *                  [ FP                ]
+	 * SP + retaddr_off [ self ip           ]
+	 *                  [ FP                ]
+	 *
+	 *                  [ padding           ] align SP to multiples of 16
+	 *
+	 *                  [ x20               ] callee saved reg x20
+	 * SP + regs_off    [ x19               ] callee saved reg x19
+	 *
+	 * SP + retval_off  [ return value      ] BPF_TRAMP_F_CALL_ORIG or
+	 *                                        BPF_TRAMP_F_RET_FENTRY_RET
+	 *
+	 *                  [ argN              ]
+	 *                  [ ...               ]
+	 * SP + args_off    [ arg1              ]
+	 *
+	 * SP + nargs_off   [ args count        ]
+	 *
+	 * SP + ip_off      [ traced function   ] BPF_TRAMP_F_IP_ARG flag
+	 *
+	 * SP + run_ctx_off [ bpf_tramp_run_ctx ]
+	 */
+
+	stack_size = 0;
+	run_ctx_off = stack_size;
+	/* room for bpf_tramp_run_ctx */
+	stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8);
+
+	ip_off = stack_size;
+	/* room for IP address argument */
+	if (flags & BPF_TRAMP_F_IP_ARG)
+		stack_size += 8;
+
+	nargs_off = stack_size;
+	/* room for args count */
+	stack_size += 8;
+
+	args_off = stack_size;
+	/* room for args */
+	stack_size += nargs * 8;
+
+	/* room for return value */
+	retval_off = stack_size;
+	save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
+	if (save_ret)
+		stack_size += 8;
+
+	/* room for callee saved registers, currently x19 and x20 are used */
+	regs_off = stack_size;
+	stack_size += 16;
+
+	/* round up to multiples of 16 to avoid SPAlignmentFault */
+	stack_size = round_up(stack_size, 16);
+
+	/* the return address is located above FP */
+	retaddr_off = stack_size + 8;
+
+	emit_bti(A64_BTI_C, ctx);
+
+	/* frame for parent function */
+	emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
+	emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+	/* frame for patched function */
+	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
+	emit(A64_MOV(1, A64_FP, A64_SP), ctx);
+
+	/* allocate stack space */
+	emit(A64_SUB_I(1, A64_SP, A64_SP, stack_size), ctx);
+
+	if (flags & BPF_TRAMP_F_IP_ARG) {
+		/* save ip address of the traced function */
+		emit_addr_mov_i64(A64_R(10), (const u64)orig_call, ctx);
+		emit(A64_STR64I(A64_R(10), A64_SP, ip_off), ctx);
+	}
+
+	/* save args count */
+	emit(A64_MOVZ(1, A64_R(10), nargs, 0), ctx);
+	emit(A64_STR64I(A64_R(10), A64_SP, nargs_off), ctx);
+
+	/* save args */
+	save_args(ctx, args_off, nargs);
+
+	/* save callee saved registers */
+	emit(A64_STR64I(A64_R(19), A64_SP, regs_off), ctx);
+	emit(A64_STR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+		emit_call((const u64)__bpf_tramp_enter, ctx);
+	}
+
+	for (i = 0; i < fentry->nr_links; i++)
+		invoke_bpf_prog(ctx, fentry->links[i], args_off,
+				retval_off, run_ctx_off,
+				flags & BPF_TRAMP_F_RET_FENTRY_RET);
+
+	if (fmod_ret->nr_links) {
+		branches = kcalloc(fmod_ret->nr_links, sizeof(u32 *),
+				   GFP_KERNEL);
+		if (!branches)
+			return -ENOMEM;
+
+		invoke_bpf_mod_ret(ctx, fmod_ret, args_off, retval_off,
+				   run_ctx_off, branches);
+	}
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		restore_args(ctx, args_off, nargs);
+		/* call original func */
+		emit(A64_LDR64I(A64_R(10), A64_SP, retaddr_off), ctx);
+		emit(A64_BLR(A64_R(10)), ctx);
+		/* store return value */
+		emit(A64_STR64I(A64_R(0), A64_SP, retval_off), ctx);
+		/* reserve a nop for bpf_tramp_image_put */
+		im->ip_after_call = ctx->image + ctx->idx;
+		emit(A64_NOP, ctx);
+	}
+
+	/* update the branches saved in invoke_bpf_mod_ret with cbnz */
+	for (i = 0; i < fmod_ret->nr_links && ctx->image != NULL; i++) {
+		int offset = &ctx->image[ctx->idx] - branches[i];
+		*branches[i] = A64_CBNZ(1, A64_R(10), offset);
+	}
+
+	if (fmod_ret->nr_links)
+		emit_bti(A64_BTI_J, ctx);
+
+	for (i = 0; i < fexit->nr_links; i++)
+		invoke_bpf_prog(ctx, fexit->links[i], args_off, retval_off,
+				run_ctx_off, false);
+
+	if (flags & BPF_TRAMP_F_RESTORE_REGS)
+		restore_args(ctx, args_off, nargs);
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		im->ip_epilogue = ctx->image + ctx->idx;
+		emit_bti(A64_BTI_J, ctx);
+		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
+		emit_call((const u64)__bpf_tramp_exit, ctx);
+	}
+
+	/* restore callee saved register x19 and x20 */
+	emit(A64_LDR64I(A64_R(19), A64_SP, regs_off), ctx);
+	emit(A64_LDR64I(A64_R(20), A64_SP, regs_off + 8), ctx);
+
+	if (save_ret)
+		emit(A64_LDR64I(A64_R(0), A64_SP, retval_off), ctx);
+
+	/* reset SP  */
+	emit(A64_MOV(1, A64_SP, A64_FP), ctx);
+
+	/* pop frames  */
+	emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
+	emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
+
+	if (flags & BPF_TRAMP_F_SKIP_FRAME) {
+		/* skip patched function, return to parent */
+		emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+		emit(A64_RET(A64_R(9)), ctx);
+	} else {
+		/* return to patched function */
+		emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
+		emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
+		emit(A64_RET(A64_R(10)), ctx);
+	}
+
+	if (ctx->image)
+		bpf_flush_icache(ctx->image, ctx->image + ctx->idx);
+
+	kfree(branches);
+
+	return ctx->idx;
+}
+
+static inline bool is_long_jump(void *ip, void *target)
+{
+	long offset;
+
+	/* when ip == NULL, the trampoline address is used by bpf_struct_ops
+	 * as a normal kernel function pointer, which can always be jumped to,
+	 * so don't treat it as a long jump.
+	 */
+	if (ip == NULL)
+		return false;
+
+	offset = (long)target - (long)ip;
+	return offset < -SZ_128M || offset >= SZ_128M;
+}
+
+int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
+				void *image_end, const struct btf_func_model *m,
+				u32 flags, struct bpf_tramp_links *tlinks,
+				void *orig_call)
+{
+	int ret;
+	int nargs = m->nr_args;
+	int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE;
+	struct jit_ctx ctx = {
+		.image = NULL,
+		.idx = 0
+	};
+
+	/* Return error since the long jump is not implemented. Otherwise,
+	 * ftrace_bug will be triggered when the fentry is patched by ftrace,
+	 * making ftrace no longer work properly.
+	 */
+	if (is_long_jump(orig_call, image))
+		return -ENOTSUPP;
+
+	/* the first 8 arguments are passed by registers */
+	if (nargs > 8)
+		return -ENOTSUPP;
+
+	ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+	if (ret < 0)
+		return ret;
+
+	if (ret > max_insns)
+		return -EFBIG;
+
+	ctx.image = image;
+	ctx.idx = 0;
+
+	jit_fill_hole(image, (unsigned int)(image_end - image));
+	ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nargs, flags);
+
+	if (ret > 0 && validate_code(&ctx) < 0)
+		ret = -EINVAL;
+
+	if (ret > 0)
+		ret *= AARCH64_INSN_SIZE;
+
+	return ret;
+}
+
 static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
 			     void *addr, u32 *insn)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH bpf-next v5 6/6] selftests/bpf: Fix trivial typo in fentry_fexit.c
  2022-05-18 13:16 ` Xu Kuohai
@ 2022-05-18 13:16   ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-18 13:16 UTC (permalink / raw)
  To: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest
  Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Mark Rutland, Pasha Tatashin, Ard Biesheuvel, Daniel Kiss,
	Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
	Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi

The word "ipv6" in the assertion messages should be "fentry_fexit".

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
 tools/testing/selftests/bpf/prog_tests/fentry_fexit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c b/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c
index 130f5b82d2e6..e3c139bde46e 100644
--- a/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c
+++ b/tools/testing/selftests/bpf/prog_tests/fentry_fexit.c
@@ -28,8 +28,8 @@ void test_fentry_fexit(void)
 
 	prog_fd = fexit_skel->progs.test1.prog_fd;
 	err = bpf_prog_test_run_opts(prog_fd, &topts);
-	ASSERT_OK(err, "ipv6 test_run");
-	ASSERT_OK(topts.retval, "ipv6 test retval");
+	ASSERT_OK(err, "fentry_fexit test_run");
+	ASSERT_OK(topts.retval, "fentry_fexit test retval");
 
 	fentry_res = (__u64 *)fentry_skel->bss;
 	fexit_res = (__u64 *)fexit_skel->bss;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 5/6] bpf, arm64: bpf trampoline for arm64
  2022-05-18 13:16   ` Xu Kuohai
@ 2022-05-20 21:18     ` Alexei Starovoitov
  -1 siblings, 0 replies; 60+ messages in thread
From: Alexei Starovoitov @ 2022-05-20 21:18 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, LKML, Network Development,
	open list:KERNEL SELFTEST FRAMEWORK, Catalin Marinas,
	Will Deacon, Steven Rostedt, Ingo Molnar, Daniel Borkmann,
	Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Shuah Khan, Jakub Kicinski,
	Jesper Dangaard Brouer, Mark Rutland, Pasha Tatashin,
	Ard Biesheuvel, Daniel Kiss, Steven Price, Sudeep Holla,
	Marc Zyngier, Peter Collingbourne, Mark Brown, Delyan Kratunov,
	Kumar Kartikeya Dwivedi

On Wed, May 18, 2022 at 6:54 AM Xu Kuohai <xukuohai@huawei.com> wrote:
>
> Add bpf trampoline support for arm64. Most of the logic is the same as
> x86.
>
> Tested on raspberry pi 4b and qemu with KASLR disabled (avoid long jump),
> result:
>  #9  /1     bpf_cookie/kprobe:OK
>  #9  /2     bpf_cookie/multi_kprobe_link_api:FAIL
>  #9  /3     bpf_cookie/multi_kprobe_attach_api:FAIL
>  #9  /4     bpf_cookie/uprobe:OK
>  #9  /5     bpf_cookie/tracepoint:OK
>  #9  /6     bpf_cookie/perf_event:OK
>  #9  /7     bpf_cookie/trampoline:OK
>  #9  /8     bpf_cookie/lsm:OK
>  #9         bpf_cookie:FAIL
>  #18 /1     bpf_tcp_ca/dctcp:OK
>  #18 /2     bpf_tcp_ca/cubic:OK
>  #18 /3     bpf_tcp_ca/invalid_license:OK
>  #18 /4     bpf_tcp_ca/dctcp_fallback:OK
>  #18 /5     bpf_tcp_ca/rel_setsockopt:OK
>  #18        bpf_tcp_ca:OK
>  #51 /1     dummy_st_ops/dummy_st_ops_attach:OK
>  #51 /2     dummy_st_ops/dummy_init_ret_value:OK
>  #51 /3     dummy_st_ops/dummy_init_ptr_arg:OK
>  #51 /4     dummy_st_ops/dummy_multiple_args:OK
>  #51        dummy_st_ops:OK
>  #55        fentry_fexit:OK
>  #56        fentry_test:OK
>  #57 /1     fexit_bpf2bpf/target_no_callees:OK
>  #57 /2     fexit_bpf2bpf/target_yes_callees:OK
>  #57 /3     fexit_bpf2bpf/func_replace:OK
>  #57 /4     fexit_bpf2bpf/func_replace_verify:OK
>  #57 /5     fexit_bpf2bpf/func_sockmap_update:OK
>  #57 /6     fexit_bpf2bpf/func_replace_return_code:OK
>  #57 /7     fexit_bpf2bpf/func_map_prog_compatibility:OK
>  #57 /8     fexit_bpf2bpf/func_replace_multi:OK
>  #57 /9     fexit_bpf2bpf/fmod_ret_freplace:OK
>  #57        fexit_bpf2bpf:OK
>  #58        fexit_sleep:OK
>  #59        fexit_stress:OK
>  #60        fexit_test:OK
>  #67        get_func_args_test:OK
>  #68        get_func_ip_test:OK
>  #104       modify_return:OK
>  #237       xdp_bpf2bpf:OK
>
> bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
> failed due to lack of multi_kprobe on arm64.
>
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> Acked-by: Song Liu <songliubraving@fb.com>

Catalin, Will, Mark,

could you please ack this patch to confirm that you don't mind us
taking this set through bpf-next?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 5/6] bpf, arm64: bpf trampoline for arm64
  2022-05-18 13:16   ` Xu Kuohai
@ 2022-05-23  1:36     ` KP Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: KP Singh @ 2022-05-23  1:36 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
	Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On Wed, May 18, 2022 at 3:54 PM Xu Kuohai <xukuohai@huawei.com> wrote:
>
> Add bpf trampoline support for arm64. Most of the logic is the same as
> x86.
>
> Tested on raspberry pi 4b and qemu with KASLR disabled (avoid long jump),
> result:
>  #9  /1     bpf_cookie/kprobe:OK
>  #9  /2     bpf_cookie/multi_kprobe_link_api:FAIL
>  #9  /3     bpf_cookie/multi_kprobe_attach_api:FAIL
>  #9  /4     bpf_cookie/uprobe:OK
>  #9  /5     bpf_cookie/tracepoint:OK
>  #9  /6     bpf_cookie/perf_event:OK
>  #9  /7     bpf_cookie/trampoline:OK
>  #9  /8     bpf_cookie/lsm:OK
>  #9         bpf_cookie:FAIL
>  #18 /1     bpf_tcp_ca/dctcp:OK
>  #18 /2     bpf_tcp_ca/cubic:OK
>  #18 /3     bpf_tcp_ca/invalid_license:OK
>  #18 /4     bpf_tcp_ca/dctcp_fallback:OK
>  #18 /5     bpf_tcp_ca/rel_setsockopt:OK
>  #18        bpf_tcp_ca:OK
>  #51 /1     dummy_st_ops/dummy_st_ops_attach:OK
>  #51 /2     dummy_st_ops/dummy_init_ret_value:OK
>  #51 /3     dummy_st_ops/dummy_init_ptr_arg:OK
>  #51 /4     dummy_st_ops/dummy_multiple_args:OK
>  #51        dummy_st_ops:OK
>  #55        fentry_fexit:OK
>  #56        fentry_test:OK
>  #57 /1     fexit_bpf2bpf/target_no_callees:OK
>  #57 /2     fexit_bpf2bpf/target_yes_callees:OK
>  #57 /3     fexit_bpf2bpf/func_replace:OK
>  #57 /4     fexit_bpf2bpf/func_replace_verify:OK
>  #57 /5     fexit_bpf2bpf/func_sockmap_update:OK
>  #57 /6     fexit_bpf2bpf/func_replace_return_code:OK
>  #57 /7     fexit_bpf2bpf/func_map_prog_compatibility:OK
>  #57 /8     fexit_bpf2bpf/func_replace_multi:OK
>  #57 /9     fexit_bpf2bpf/fmod_ret_freplace:OK
>  #57        fexit_bpf2bpf:OK
>  #58        fexit_sleep:OK
>  #59        fexit_stress:OK
>  #60        fexit_test:OK
>  #67        get_func_args_test:OK
>  #68        get_func_ip_test:OK
>  #104       modify_return:OK
>  #237       xdp_bpf2bpf:OK
>
> bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
> failed due to lack of multi_kprobe on arm64.
>
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> Acked-by: Song Liu <songliubraving@fb.com>

Acked-by: KP Singh <kpsingh@kernel.org>

Thanks! This is exciting.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-05-18 13:16   ` Xu Kuohai
@ 2022-05-23  1:39     ` KP Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: KP Singh @ 2022-05-23  1:39 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
	Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On Wed, May 18, 2022 at 3:53 PM Xu Kuohai <xukuohai@huawei.com> wrote:
>
> Add ftrace direct support for arm64.
>
> 1. When there is custom trampoline only, replace the fentry nop to a
>    jump instruction that jumps directly to the custom trampoline.
>
> 2. When ftrace trampoline and custom trampoline coexist, jump from
>    fentry to ftrace trampoline first, then jump to custom trampoline
>    when ftrace trampoline exits. The current unused register
>    pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
>    trampoline to custom trampoline.
>
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> Acked-by: Song Liu <songliubraving@fb.com>

Acked-by: KP Singh <kpsingh@kernel.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 4/6] bpf, arm64: Implement bpf_arch_text_poke() for arm64
  2022-05-18 13:16   ` Xu Kuohai
@ 2022-05-23  1:41     ` KP Singh
  -1 siblings, 0 replies; 60+ messages in thread
From: KP Singh @ 2022-05-23  1:41 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
	Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On Wed, May 18, 2022 at 3:54 PM Xu Kuohai <xukuohai@huawei.com> wrote:
>
> Implement bpf_arch_text_poke() for arm64, so bpf trampoline code can use
> it to replace nop with jump, or replace jump with nop.
>
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> Acked-by: Song Liu <songliubraving@fb.com>
> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>

Reviewed-by: KP Singh <kpsingh@kernel.org>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 5/6] bpf, arm64: bpf trampoline for arm64
  2022-05-20 21:18     ` Alexei Starovoitov
@ 2022-05-23 16:09       ` Mark Rutland
  -1 siblings, 0 replies; 60+ messages in thread
From: Mark Rutland @ 2022-05-23 16:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Xu Kuohai, bpf, linux-arm-kernel, LKML, Network Development,
	open list:KERNEL SELFTEST FRAMEWORK, Catalin Marinas,
	Will Deacon, Steven Rostedt, Ingo Molnar, Daniel Borkmann,
	Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Shuah Khan, Jakub Kicinski,
	Jesper Dangaard Brouer, Pasha Tatashin, Ard Biesheuvel,
	Daniel Kiss, Steven Price, Sudeep Holla, Marc Zyngier,
	Peter Collingbourne, Mark Brown, Delyan Kratunov,
	Kumar Kartikeya Dwivedi

On Fri, May 20, 2022 at 02:18:20PM -0700, Alexei Starovoitov wrote:
> On Wed, May 18, 2022 at 6:54 AM Xu Kuohai <xukuohai@huawei.com> wrote:
> >
> > Add bpf trampoline support for arm64. Most of the logic is the same as
> > x86.
> >
> > Tested on raspberry pi 4b and qemu with KASLR disabled (avoid long jump),
> > result:
> >  #9  /1     bpf_cookie/kprobe:OK
> >  #9  /2     bpf_cookie/multi_kprobe_link_api:FAIL
> >  #9  /3     bpf_cookie/multi_kprobe_attach_api:FAIL
> >  #9  /4     bpf_cookie/uprobe:OK
> >  #9  /5     bpf_cookie/tracepoint:OK
> >  #9  /6     bpf_cookie/perf_event:OK
> >  #9  /7     bpf_cookie/trampoline:OK
> >  #9  /8     bpf_cookie/lsm:OK
> >  #9         bpf_cookie:FAIL
> >  #18 /1     bpf_tcp_ca/dctcp:OK
> >  #18 /2     bpf_tcp_ca/cubic:OK
> >  #18 /3     bpf_tcp_ca/invalid_license:OK
> >  #18 /4     bpf_tcp_ca/dctcp_fallback:OK
> >  #18 /5     bpf_tcp_ca/rel_setsockopt:OK
> >  #18        bpf_tcp_ca:OK
> >  #51 /1     dummy_st_ops/dummy_st_ops_attach:OK
> >  #51 /2     dummy_st_ops/dummy_init_ret_value:OK
> >  #51 /3     dummy_st_ops/dummy_init_ptr_arg:OK
> >  #51 /4     dummy_st_ops/dummy_multiple_args:OK
> >  #51        dummy_st_ops:OK
> >  #55        fentry_fexit:OK
> >  #56        fentry_test:OK
> >  #57 /1     fexit_bpf2bpf/target_no_callees:OK
> >  #57 /2     fexit_bpf2bpf/target_yes_callees:OK
> >  #57 /3     fexit_bpf2bpf/func_replace:OK
> >  #57 /4     fexit_bpf2bpf/func_replace_verify:OK
> >  #57 /5     fexit_bpf2bpf/func_sockmap_update:OK
> >  #57 /6     fexit_bpf2bpf/func_replace_return_code:OK
> >  #57 /7     fexit_bpf2bpf/func_map_prog_compatibility:OK
> >  #57 /8     fexit_bpf2bpf/func_replace_multi:OK
> >  #57 /9     fexit_bpf2bpf/fmod_ret_freplace:OK
> >  #57        fexit_bpf2bpf:OK
> >  #58        fexit_sleep:OK
> >  #59        fexit_stress:OK
> >  #60        fexit_test:OK
> >  #67        get_func_args_test:OK
> >  #68        get_func_ip_test:OK
> >  #104       modify_return:OK
> >  #237       xdp_bpf2bpf:OK
> >
> > bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
> > failed due to lack of multi_kprobe on arm64.
> >
> > Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> > Acked-by: Song Liu <songliubraving@fb.com>
> 
> Catalin, Will, Mark,
> 
> could you please ack this patch that you don't mind us
> taking this set through bpf-next ?

This is on my queue of things to review alongside some other ftrace and kprobes
patches; I'll try to get that out of the way this week.

From a quick glance I'm not too keen on the change to the ftrace trampoline:
to get rid of some existing unsoundness I'd really wanted to move that entirely
away from using regs (and had a sketch for how to handle different op
handlers). I'd discussed that with Steven and Masami in another thread:

  https://lore.kernel.org/linux-arm-kernel/YnJUTuOIX9YoJq23@FVFF77S0Q05N/

I'll see if it's possible to make this all work together. It's not entirely
clear to me how the FTRACE_DIRECT bits are supposed to play with dynamically
allocated trampolines, and we might need to take a step back and reconsider.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-05-18 13:16   ` Xu Kuohai
@ 2022-05-25 13:38     ` Mark Rutland
  -1 siblings, 0 replies; 60+ messages in thread
From: Mark Rutland @ 2022-05-25 13:38 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo,
	cj.chengjian, huawei.libin, xiexiuqi, liwei391

On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
> Add ftrace direct support for arm64.
> 
> 1. When there is custom trampoline only, replace the fentry nop to a
>    jump instruction that jumps directly to the custom trampoline.
> 
> 2. When ftrace trampoline and custom trampoline coexist, jump from
>    fentry to ftrace trampoline first, then jump to custom trampoline
>    when ftrace trampoline exits. The current unused register
>    pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
>    trampoline to custom trampoline.

For those of us not all that familiar with BPF, can you explain *why* you want
this? The above explains what the patch implements, but not why that's useful.

e.g. is this just to avoid the overhead of the ops list processing in the
regular ftrace code, or is the custom trampoline there to allow you to do
something special?

There is another patch series on the list from some of your colleagues which
uses dynamic trampolines to try to avoid that ops list overhead, and it's not
clear to me whether these are trying to solve the largely same problem or
something different. That other thread is at:

  https://lore.kernel.org/linux-arm-kernel/20220316100132.244849-1-bobo.shaobowang@huawei.com/

... and I've added the relevant parties to CC here, since there doesn't seem to
be any overlap in the CC lists of the two threads.

In that other thread I've suggested a general approach we could follow at:
  
  https://lore.kernel.org/linux-arm-kernel/YmGF%2FOpIhAF8YeVq@lakrids/

As noted in that thread, I have a few concerns which equally apply here:

* Due to the limited range of BL instructions, it's not always possible to
  patch an ftrace call-site to branch to an arbitrary trampoline. The way this
  works for ftrace today relies upon knowing the set of trampolines at
  compile-time, and allocating module PLTs for those, and that approach cannot
  work reliably for dynamically allocated trampolines (see the range-check
  sketch after this list).

  I'd strongly prefer to avoid custom trampolines unless they're strictly
  necessary for functional reasons, so that we can have this work reliably and
  consistently.

* If this is mostly about avoiding the ops list processing overhead, I believe
  we can implement some custom ops support more generally in ftrace which would
  still use a common trampoline but could directly call into those custom ops.
  I would strongly prefer this over custom trampolines.

* I'm looking to minimize the set of regs ftrace saves, and never save a full
  pt_regs, since today we (incompletely) fill that with bogus values and cannot
  acquire some state reliably (e.g. PSTATE). I'd like to avoid usage of pt_regs
  unless necessary, and I don't want to add additional reliance upon that
  structure.
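
As a rough sketch of the branch-range constraint in the first point above,
assuming only the architectural +/-128 MiB reach of AArch64 B/BL immediates
(the helper name is illustrative, not an existing kernel function):

#include <stdbool.h>
#include <stdint.h>

#define BL_RANGE	(128UL * 1024 * 1024)	/* B/BL immediates reach +/-128 MiB */

/* Can a B/BL placed at @pc reach @target directly, without a PLT? */
static bool bl_in_range(uint64_t pc, uint64_t target)
{
	int64_t offset = (int64_t)(target - pc);

	return offset >= -(int64_t)BL_RANGE && offset < (int64_t)BL_RANGE;
}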

> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> Acked-by: Song Liu <songliubraving@fb.com>
> ---
>  arch/arm64/Kconfig               |  2 ++
>  arch/arm64/include/asm/ftrace.h  | 12 ++++++++++++
>  arch/arm64/kernel/asm-offsets.c  |  1 +
>  arch/arm64/kernel/entry-ftrace.S | 18 +++++++++++++++---
>  4 files changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 57c4c995965f..81cc330daafc 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -177,6 +177,8 @@ config ARM64
>  	select HAVE_DYNAMIC_FTRACE
>  	select HAVE_DYNAMIC_FTRACE_WITH_REGS \
>  		if $(cc-option,-fpatchable-function-entry=2)
> +	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS \
> +		if DYNAMIC_FTRACE_WITH_REGS
>  	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
>  		if DYNAMIC_FTRACE_WITH_REGS
>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index 1494cfa8639b..14a35a5df0a1 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -78,6 +78,18 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
>  	return addr;
>  }
>  
> +#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> +static inline void arch_ftrace_set_direct_caller(struct pt_regs *regs,
> +						 unsigned long addr)
> +{
> +	/*
> +	 * Place custom trampoline address in regs->orig_x0 to let ftrace
> +	 * trampoline jump to it.
> +	 */
> +	regs->orig_x0 = addr;
> +}
> +#endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */

Please, let's not abuse pt_regs::orig_x0 for this. That's at best unnecessarily
confusing, and if we really need a field to place a value like this it implies
we should add an ftrace-specific structure to hold the ftrace-specific context
information.
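
A minimal sketch of such a structure, with an illustrative name and field
(not an existing kernel API):

struct ftrace_tramp_ctx {
	/* non-zero: custom trampoline for the ftrace trampoline to branch to on exit */
	unsigned long direct_call;
};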

Thanks,
Mark.

> +
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>  struct dyn_ftrace;
>  int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 1197e7679882..b1ed0bf01c59 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -80,6 +80,7 @@ int main(void)
>    DEFINE(S_SDEI_TTBR1,		offsetof(struct pt_regs, sdei_ttbr1));
>    DEFINE(S_PMR_SAVE,		offsetof(struct pt_regs, pmr_save));
>    DEFINE(S_STACKFRAME,		offsetof(struct pt_regs, stackframe));
> +  DEFINE(S_ORIG_X0,		offsetof(struct pt_regs, orig_x0));
>    DEFINE(PT_REGS_SIZE,		sizeof(struct pt_regs));
>    BLANK();
>  #ifdef CONFIG_COMPAT
> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> index e535480a4069..dfe62c55e3a2 100644
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -60,6 +60,9 @@
>  	str	x29, [sp, #S_FP]
>  	.endif
>  
> +	/* Set orig_x0 to zero  */
> +	str     xzr, [sp, #S_ORIG_X0]
> +
>  	/* Save the callsite's SP and LR */
>  	add	x10, sp, #(PT_REGS_SIZE + 16)
>  	stp	x9, x10, [sp, #S_LR]
> @@ -119,12 +122,21 @@ ftrace_common_return:
>  	/* Restore the callsite's FP, LR, PC */
>  	ldr	x29, [sp, #S_FP]
>  	ldr	x30, [sp, #S_LR]
> -	ldr	x9, [sp, #S_PC]
> -
> +	ldr	x10, [sp, #S_PC]
> +
> +	ldr	x11, [sp, #S_ORIG_X0]
> +	cbz	x11, 1f
> +	/* Set x9 to parent ip before jump to custom trampoline */
> +	mov	x9,  x30
> +	/* Set lr to self ip */
> +	ldr	x30, [sp, #S_PC]
> +	/* Set x10 (used for return address) to custom trampoline */
> +	mov	x10, x11
> +1:
>  	/* Restore the callsite's SP */
>  	add	sp, sp, #PT_REGS_SIZE + 16
>  
> -	ret	x9
> +	ret	x10
>  SYM_CODE_END(ftrace_common)
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 2/6] ftrace: Fix deadloop caused by direct call in ftrace selftest
  2022-05-18 13:16   ` Xu Kuohai
@ 2022-05-25 13:43     ` Mark Rutland
  -1 siblings, 0 replies; 60+ messages in thread
From: Mark Rutland @ 2022-05-25 13:43 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On Wed, May 18, 2022 at 09:16:34AM -0400, Xu Kuohai wrote:
> After direct call is enabled for arm64, ftrace selftest enters a
> dead loop:

IIUC this means that patch 1 alone is broken, and presumably this patch should
have been part of it?

> <trace_selftest_dynamic_test_func>:
> 00  bti     c
> 01  mov     x9, x30                            <trace_direct_tramp>:
> 02  bl      <trace_direct_tramp>    ---------->     ret
>                                                      |
>                                          lr/x30 is 03, return to 03
>                                                      |
> 03  mov     w0, #0x0   <-----------------------------|
>      |                                               |
>      |                   dead loop!                  |
>      |                                               |
> 04  ret   ---- lr/x30 is still 03, go back to 03 ----|
> 
> The reason is that when the direct caller trace_direct_tramp() returns
> to the patched function trace_selftest_dynamic_test_func(), lr is still
> the address after the instrumented instruction in the patched function,
> so when the patched function exits, it returns to itself!
> 
> To fix this issue, we need to restore lr before trace_direct_tramp()
> exits, so rewrite a dedicated trace_direct_tramp() for arm64.

As mentioned on patch 1 I'd prefer we solved this through indirection, which
would avoid the need for this and would make things more robust generally by
keeping the unusual calling convention private to the patch-site and regular
trampoline.

Thanks,
Mark.

> Reported-by: Li Huafei <lihuafei1@huawei.com>
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> ---
>  arch/arm64/include/asm/ftrace.h  | 10 ++++++++++
>  arch/arm64/kernel/entry-ftrace.S | 10 ++++++++++
>  kernel/trace/trace_selftest.c    |  2 ++
>  3 files changed, 22 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
> index 14a35a5df0a1..6f6b184e72fb 100644
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -126,6 +126,16 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
>  	 */
>  	return !strcmp(sym + 8, name);
>  }
> +
> +#ifdef CONFIG_FTRACE_SELFTEST
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> +
> +#define trace_direct_tramp trace_direct_tramp
> +extern void trace_direct_tramp(void);
> +
> +#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
> +#endif /* CONFIG_FTRACE_SELFTEST */
> +
>  #endif /* ifndef __ASSEMBLY__ */
>  
>  #endif /* __ASM_FTRACE_H */
> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> index dfe62c55e3a2..a47e87d4d3dd 100644
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -357,3 +357,13 @@ SYM_CODE_START(return_to_handler)
>  	ret
>  SYM_CODE_END(return_to_handler)
>  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
> +#ifdef CONFIG_FTRACE_SELFTEST
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> +SYM_FUNC_START(trace_direct_tramp)
> +	mov x10, x30
> +	mov x30, x9
> +	ret x10
> +SYM_FUNC_END(trace_direct_tramp)
> +#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
> +#endif /* CONFIG_FTRACE_SELFTEST */
> diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
> index abcadbe933bb..e7ccd0d10c39 100644
> --- a/kernel/trace/trace_selftest.c
> +++ b/kernel/trace/trace_selftest.c
> @@ -785,8 +785,10 @@ static struct fgraph_ops fgraph_ops __initdata  = {
>  };
>  
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> +#ifndef trace_direct_tramp
>  noinline __noclone static void trace_direct_tramp(void) { }
>  #endif
> +#endif
>  
>  /*
>   * Pretty much the same than for the function tracer from which the selftest
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 3/6] bpf: Remove is_valid_bpf_tramp_flags()
  2022-05-18 13:16   ` Xu Kuohai
@ 2022-05-25 13:45     ` Mark Rutland
  -1 siblings, 0 replies; 60+ messages in thread
From: Mark Rutland @ 2022-05-25 13:45 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On Wed, May 18, 2022 at 09:16:35AM -0400, Xu Kuohai wrote:
> BPF_TRAMP_F_XXX flags are not used by user code and are almost constant
> at compile time, so run time validation is a bit overkill. Remove
> is_valid_bpf_tramp_flags() and add some usage comments.
> 
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> Acked-by: Song Liu <songliubraving@fb.com>

Am I right in thinking this is independent of the arm64-specific bits, and
could be taken on its own now?

Mark.

> ---
>  arch/x86/net/bpf_jit_comp.c | 20 --------------------
>  kernel/bpf/bpf_struct_ops.c |  3 +++
>  kernel/bpf/trampoline.c     |  3 +++
>  3 files changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index a2b6d197c226..7698ef3b4821 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1922,23 +1922,6 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
>  	return 0;
>  }
>  
> -static bool is_valid_bpf_tramp_flags(unsigned int flags)
> -{
> -	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
> -	    (flags & BPF_TRAMP_F_SKIP_FRAME))
> -		return false;
> -
> -	/*
> -	 * BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
> -	 * and it must be used alone.
> -	 */
> -	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
> -	    (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
> -		return false;
> -
> -	return true;
> -}
> -
>  /* Example:
>   * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
>   * its 'struct btf_func_model' will be nr_args=2
> @@ -2017,9 +2000,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>  	if (nr_args > 6)
>  		return -ENOTSUPP;
>  
> -	if (!is_valid_bpf_tramp_flags(flags))
> -		return -EINVAL;
> -
>  	/* Generated trampoline stack layout:
>  	 *
>  	 * RBP + 8         [ return address  ]
> diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
> index d9a3c9207240..0572cc5aeb28 100644
> --- a/kernel/bpf/bpf_struct_ops.c
> +++ b/kernel/bpf/bpf_struct_ops.c
> @@ -341,6 +341,9 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
>  
>  	tlinks[BPF_TRAMP_FENTRY].links[0] = link;
>  	tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
> +	/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
> +	 * and it must be used alone.
> +	 */
>  	flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0;
>  	return arch_prepare_bpf_trampoline(NULL, image, image_end,
>  					   model, flags, tlinks, NULL);
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index 93c7675f0c9e..bd3f2e673874 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -358,6 +358,9 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
>  
>  	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
>  	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links)
> +		/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
> +		 * should not be set together.
> +		 */
>  		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
>  
>  	if (ip_arg)
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 4/6] bpf, arm64: Impelment bpf_arch_text_poke() for arm64
  2022-05-18 13:16   ` Xu Kuohai
@ 2022-05-25 14:10     ` Mark Rutland
  -1 siblings, 0 replies; 60+ messages in thread
From: Mark Rutland @ 2022-05-25 14:10 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On Wed, May 18, 2022 at 09:16:36AM -0400, Xu Kuohai wrote:
> Impelment bpf_arch_text_poke() for arm64, so bpf trampoline code can use
> it to replace nop with jump, or replace jump with nop.
> 
> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> Acked-by: Song Liu <songliubraving@fb.com>
> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  arch/arm64/net/bpf_jit.h      |   1 +
>  arch/arm64/net/bpf_jit_comp.c | 107 +++++++++++++++++++++++++++++++---
>  2 files changed, 99 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
> index 194c95ccc1cf..1c4b0075a3e2 100644
> --- a/arch/arm64/net/bpf_jit.h
> +++ b/arch/arm64/net/bpf_jit.h
> @@ -270,6 +270,7 @@
>  #define A64_BTI_C  A64_HINT(AARCH64_INSN_HINT_BTIC)
>  #define A64_BTI_J  A64_HINT(AARCH64_INSN_HINT_BTIJ)
>  #define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC)
> +#define A64_NOP    A64_HINT(AARCH64_INSN_HINT_NOP)
>  
>  /* DMB */
>  #define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 8ab4035dea27..5ce6ed5f42a1 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -9,6 +9,7 @@
>  
>  #include <linux/bitfield.h>
>  #include <linux/bpf.h>
> +#include <linux/memory.h>
>  #include <linux/filter.h>
>  #include <linux/printk.h>
>  #include <linux/slab.h>
> @@ -18,6 +19,7 @@
>  #include <asm/cacheflush.h>
>  #include <asm/debug-monitors.h>
>  #include <asm/insn.h>
> +#include <asm/patching.h>
>  #include <asm/set_memory.h>
>  
>  #include "bpf_jit.h"
> @@ -235,13 +237,13 @@ static bool is_lsi_offset(int offset, int scale)
>  	return true;
>  }
>  
> +#define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0)
> +#define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
> +
>  /* Tail call offset to jump into */
> -#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \
> -	IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
> -#define PROLOGUE_OFFSET 9
> -#else
> -#define PROLOGUE_OFFSET 8
> -#endif
> +#define PROLOGUE_OFFSET	(BTI_INSNS + 2 + PAC_INSNS + 8)
> +/* Offset of nop instruction in bpf prog entry to be poked */
> +#define POKE_OFFSET	(BTI_INSNS + 1)
>  
>  static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
>  {
> @@ -279,12 +281,15 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
>  	 *
>  	 */
>  
> +	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
> +		emit(A64_BTI_C, ctx);
> +
> +	emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
> +	emit(A64_NOP, ctx);

I take it the idea is to make this the same as the regular ftrace patch-site
sequence, so that this can call the same trampoline(s)?

If so, we need some commentary to that effect, and we need some comments in the
ftrace code explaining that this needs to be kept in-sync.

> +
>  	/* Sign lr */
>  	if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL))
>  		emit(A64_PACIASP, ctx);
> -	/* BTI landing pad */
> -	else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
> -		emit(A64_BTI_C, ctx);
>  
>  	/* Save FP and LR registers to stay align with ARM64 AAPCS */
>  	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
> @@ -1529,3 +1534,87 @@ void bpf_jit_free_exec(void *addr)
>  {
>  	return vfree(addr);
>  }
> +
> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
> +			     void *addr, u32 *insn)
> +{
> +	if (!addr)
> +		*insn = aarch64_insn_gen_nop();
> +	else
> +		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
> +						    (unsigned long)addr,
> +						    type);
> +
> +	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
> +}
> +
> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
> +		       void *old_addr, void *new_addr)
> +{
> +	int ret;
> +	u32 old_insn;
> +	u32 new_insn;
> +	u32 replaced;
> +	unsigned long offset = ~0UL;
> +	enum aarch64_insn_branch_type branch_type;
> +	char namebuf[KSYM_NAME_LEN];
> +
> +	if (!__bpf_address_lookup((unsigned long)ip, NULL, &offset, namebuf))
> +		/* Only poking bpf text is supported. Since kernel function
> +		 * entry is set up by ftrace, we reply on ftrace to poke kernel
> +		 * functions.
> +		 */
> +		return -EINVAL;
> +
> +	/* bpf entry */
> +	if (offset == 0UL)
> +		/* skip to the nop instruction in bpf prog entry:
> +		 * bti c	// if BTI enabled
> +		 * mov x9, x30
> +		 * nop
> +		 */
> +		ip = ip + POKE_OFFSET * AARCH64_INSN_SIZE;

When is offset non-zero? Is this ever called to patch other instructions, and
could this ever be used to try to patch the BTI specifically?

I strongly suspect we need a higher-level API to say "poke the patchable
callsite in the prologue", rather than assuming that offset 0 always means
that, or it'll be *very* easy for this to go wrong.
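
A minimal sketch of what such a helper could look like, reusing POKE_OFFSET
from the patch above; the function name is illustrative:

/* Resolve the patchable NOP in a BPF prog's prologue explicitly,
 * rather than inferring it from offset == 0.
 */
static void *bpf_prog_poke_site(void *prog_entry)
{
	/* [bti c;] mov x9, x30; nop  -- the NOP is the patch site */
	return (u8 *)prog_entry + POKE_OFFSET * AARCH64_INSN_SIZE;
}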

> +
> +	if (poke_type == BPF_MOD_CALL)
> +		branch_type = AARCH64_INSN_BRANCH_LINK;
> +	else
> +		branch_type = AARCH64_INSN_BRANCH_NOLINK;

When is poke_type *not* BPF_MOD_CALL?

I assume that means BPF also uses this for non-ftrace reasons?

> +	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
> +		return -EFAULT;
> +
> +	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
> +		return -EFAULT;
> +
> +	mutex_lock(&text_mutex);
> +	if (aarch64_insn_read(ip, &replaced)) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	if (replaced != old_insn) {
> +		ret = -EFAULT;
> +		goto out;
> +	}
> +
> +	/* We call aarch64_insn_patch_text_nosync() to replace instruction
> +	 * atomically, so no other CPUs will fetch a half-new and half-old
> +	 * instruction. But there is chance that another CPU fetches the old
> +	 * instruction after bpf_arch_text_poke() finishes, that is, different
> +	 * CPUs may execute different versions of instructions at the same
> +	 * time before the icache is synchronized by hardware.
> +	 *
> +	 * 1. when a new trampoline is attached, it is not an issue for
> +	 *    different CPUs to jump to different trampolines temporarily.
> +	 *
> +	 * 2. when an old trampoline is freed, we should wait for all other
> +	 *    CPUs to exit the trampoline and make sure the trampoline is no
> +	 *    longer reachable, since bpf_tramp_image_put() function already
> +	 *    uses percpu_ref and rcu task to do the sync, no need to call the
> +	 *    sync interface here.
> +	 */

How is RCU used for that? It's not clear to me how that works for PREEMPT_RCU
(which is the usual configuration for arm64), since we can easily be in a
preemptible context, outside of an RCU read side critical section, yet call
into a trampoline.

I know that for livepatching we need to use stacktracing to ensure we've
finished using code we'd like to free, and I can't immediately see how you can
avoid that here. I'm suspicious that there's still a race where threads can
enter the trampoline and it can be subsequently freed.

For ftrace today we get away with entering the existing trampolines when not
intended because those are statically allocated, and the race is caught when
acquiring the ops inside the ftrace core code. This case is different because
the CPU can fetch the instruction and execute that at any time, without any RCU
involvement.

Can you give more details on how the scheme described above works? How
*exactly* do you ensure that threads which have entered the trampoline (and
may have been immediately preempted by an interrupt) have returned? Which RCU
mechanism are you using?

If you can point me at where this is implemented I'm happy to take a look.
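
For reference, a rough sketch of the deferred-free pattern the quoted comment
alludes to, assuming a hypothetical tramp_image structure (this is not the
actual bpf_tramp_image_put() implementation):

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

struct tramp_image {
	void		*text;		/* executable trampoline page */
	struct rcu_head	rcu;
};

static void tramp_image_free(struct rcu_head *rcu)
{
	struct tramp_image *im = container_of(rcu, struct tramp_image, rcu);

	vfree(im->text);
	kfree(im);
}

static void tramp_image_put(struct tramp_image *im)
{
	/*
	 * The caller has already patched every call-site away from im->text.
	 * call_rcu_tasks() defers the free until every task has passed through
	 * a voluntary context switch, so no task can still be running in (or
	 * preempted inside) the old trampoline when it is released.
	 */
	call_rcu_tasks(&im->rcu, tramp_image_free);
}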

Thanks,
Mark.

> +	ret = aarch64_insn_patch_text_nosync(ip, new_insn);
> +out:
> +	mutex_unlock(&text_mutex);
> +	return ret;
> +}
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-05-25 13:38     ` Mark Rutland
@ 2022-05-26  9:45       ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-26  9:45 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo,
	cj.chengjian, huawei.libin, xiexiuqi, liwei391

On 5/25/2022 9:38 PM, Mark Rutland wrote:
> On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
>> Add ftrace direct support for arm64.
>>
>> 1. When there is custom trampoline only, replace the fentry nop to a
>>    jump instruction that jumps directly to the custom trampoline.
>>
>> 2. When ftrace trampoline and custom trampoline coexist, jump from
>>    fentry to ftrace trampoline first, then jump to custom trampoline
>>    when ftrace trampoline exits. The current unused register
>>    pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
>>    trampoline to custom trampoline.
> 
> For those of us not all that familiar with BPF, can you explain *why* you want
> this? The above explains what the patch implements, but not why that's useful.
> 
> e.g. is this just to avoid the overhead of the ops list processing in the
> regular ftrace code, or is the custom trampoline there to allow you to do
> something special?

IIUC, ftrace direct call was designed to *remove* the unnecessary
overhead of saving regs completely [1][2].

[1]
https://lore.kernel.org/all/20191022175052.frjzlnjjfwwfov64@ast-mbp.dhcp.thefacebook.com/
[2] https://lore.kernel.org/all/20191108212834.594904349@goodmis.org/

This patch itself is just a variant of [3].

[3] https://lore.kernel.org/all/20191108213450.891579507@goodmis.org/

> 
> There is another patch series on the list from some of your colleagues which
> uses dynamic trampolines to try to avoid that ops list overhead, and it's not
> clear to me whether these are trying to solve the largely same problem or
> something different. That other thread is at:
> 
>   https://lore.kernel.org/linux-arm-kernel/20220316100132.244849-1-bobo.shaobowang@huawei.com/
> 
> ... and I've added the relevant parties to CC here, since there doesn't seem to
> be any overlap in the CC lists of the two threads.

We're not working to solve the same problem. The trampoline introduced
in this series lets us monitor a kernel function or another bpf prog
with bpf, and also lets us use a bpf prog like a normal kernel
function pointer.

> 
> In that other thread I've suggested a general approach we could follow at:
>   
>   https://lore.kernel.org/linux-arm-kernel/YmGF%2FOpIhAF8YeVq@lakrids/
>

Is it possible for a kernel function to need a long jump to the common
trampoline when the kernel image is huge?

> As noted in that thread, I have a few concerns which equally apply here:
> 
> * Due to the limited range of BL instructions, it's not always possible to
>   patch an ftrace call-site to branch to an arbitrary trampoline. The way this
>   works for ftrace today relies upon knowingthe set of trampolines at
>   compile-time, and allocating module PLTs for those, and that approach cannot
>   work reliably for dynanically allocated trampolines.

Currently patch 5 returns -ENOTSUPP when a long jump is detected, so no
bpf trampoline is constructed for an out-of-range patch-site:

if (is_long_jump(orig_call, image))
	return -ENOTSUPP;

> 
>   I'd strongly prefer to avoid custom tramplines unless they're strictly
>   necessary for functional reasons, so that we can have this work reliably and
>   consistently.

bpf trampoline is needed by bpf itself, not to replace ftrace trampolines.

> * If this is mostly about avoiding the ops list processing overhead, I believe
>   we can implement some custom ops support more generally in ftrace which would
>   still use a common trampoline but could directly call into those custom ops.
>   I would strongly prefer this over custom trampolines.
> 
> * I'm looking to minimize the set of regs ftrace saves, and never save a full
>   pt_regs, since today we (incompletely) fill that with bogus values and cannot
>   acquire some state reliably (e.g. PSTATE). I'd like to avoid usage of pt_regs
>   unless necessary, and I don't want to add additional reliance upon that
>   structure.

Even if such a common trampoline is used, a bpf trampoline is still
necessary, since we need to construct custom instructions to implement
bpf functions, for example, to implement a kernel function pointer with
a bpf prog.

> 
>> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
>> Acked-by: Song Liu <songliubraving@fb.com>
>> ---
>>  arch/arm64/Kconfig               |  2 ++
>>  arch/arm64/include/asm/ftrace.h  | 12 ++++++++++++
>>  arch/arm64/kernel/asm-offsets.c  |  1 +
>>  arch/arm64/kernel/entry-ftrace.S | 18 +++++++++++++++---
>>  4 files changed, 30 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 57c4c995965f..81cc330daafc 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -177,6 +177,8 @@ config ARM64
>>  	select HAVE_DYNAMIC_FTRACE
>>  	select HAVE_DYNAMIC_FTRACE_WITH_REGS \
>>  		if $(cc-option,-fpatchable-function-entry=2)
>> +	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS \
>> +		if DYNAMIC_FTRACE_WITH_REGS
>>  	select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
>>  		if DYNAMIC_FTRACE_WITH_REGS
>>  	select HAVE_EFFICIENT_UNALIGNED_ACCESS
>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>> index 1494cfa8639b..14a35a5df0a1 100644
>> --- a/arch/arm64/include/asm/ftrace.h
>> +++ b/arch/arm64/include/asm/ftrace.h
>> @@ -78,6 +78,18 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
>>  	return addr;
>>  }
>>  
>> +#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>> +static inline void arch_ftrace_set_direct_caller(struct pt_regs *regs,
>> +						 unsigned long addr)
>> +{
>> +	/*
>> +	 * Place custom trampoline address in regs->orig_x0 to let ftrace
>> +	 * trampoline jump to it.
>> +	 */
>> +	regs->orig_x0 = addr;
>> +}
>> +#endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
> 
> Please, let's not abuse pt_regs::orig_x0 for this. That's at best unnecessarily
> confusing, and if we really need a field to place a value like this it implies
> we should add an ftrace-specific structure to hold the ftrace-specific context
> information.
> 

Sorry for the confusion, this follows the x86 approach:

https://lore.kernel.org/all/20191108213450.891579507@goodmis.org/

> Thanks,
> Mark.
> 
>> +
>>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>>  struct dyn_ftrace;
>>  int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
>> index 1197e7679882..b1ed0bf01c59 100644
>> --- a/arch/arm64/kernel/asm-offsets.c
>> +++ b/arch/arm64/kernel/asm-offsets.c
>> @@ -80,6 +80,7 @@ int main(void)
>>    DEFINE(S_SDEI_TTBR1,		offsetof(struct pt_regs, sdei_ttbr1));
>>    DEFINE(S_PMR_SAVE,		offsetof(struct pt_regs, pmr_save));
>>    DEFINE(S_STACKFRAME,		offsetof(struct pt_regs, stackframe));
>> +  DEFINE(S_ORIG_X0,		offsetof(struct pt_regs, orig_x0));
>>    DEFINE(PT_REGS_SIZE,		sizeof(struct pt_regs));
>>    BLANK();
>>  #ifdef CONFIG_COMPAT
>> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
>> index e535480a4069..dfe62c55e3a2 100644
>> --- a/arch/arm64/kernel/entry-ftrace.S
>> +++ b/arch/arm64/kernel/entry-ftrace.S
>> @@ -60,6 +60,9 @@
>>  	str	x29, [sp, #S_FP]
>>  	.endif
>>  
>> +	/* Set orig_x0 to zero  */
>> +	str     xzr, [sp, #S_ORIG_X0]
>> +
>>  	/* Save the callsite's SP and LR */
>>  	add	x10, sp, #(PT_REGS_SIZE + 16)
>>  	stp	x9, x10, [sp, #S_LR]
>> @@ -119,12 +122,21 @@ ftrace_common_return:
>>  	/* Restore the callsite's FP, LR, PC */
>>  	ldr	x29, [sp, #S_FP]
>>  	ldr	x30, [sp, #S_LR]
>> -	ldr	x9, [sp, #S_PC]
>> -
>> +	ldr	x10, [sp, #S_PC]
>> +
>> +	ldr	x11, [sp, #S_ORIG_X0]
>> +	cbz	x11, 1f
>> +	/* Set x9 to parent ip before jump to custom trampoline */
>> +	mov	x9,  x30
>> +	/* Set lr to self ip */
>> +	ldr	x30, [sp, #S_PC]
>> +	/* Set x10 (used for return address) to custom trampoline */
>> +	mov	x10, x11
>> +1:
>>  	/* Restore the callsite's SP */
>>  	add	sp, sp, #PT_REGS_SIZE + 16
>>  
>> -	ret	x9
>> +	ret	x10
>>  SYM_CODE_END(ftrace_common)
>>  
>>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> -- 
>> 2.30.2
>>
> .


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 2/6] ftrace: Fix deadloop caused by direct call in ftrace selftest
  2022-05-25 13:43     ` Mark Rutland
@ 2022-05-26  9:45       ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-26  9:45 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On 5/25/2022 9:43 PM, Mark Rutland wrote:
> On Wed, May 18, 2022 at 09:16:34AM -0400, Xu Kuohai wrote:
>> After direct call is enabled for arm64, ftrace selftest enters a
>> dead loop:
> 
> IIUC this means that patch 1 alone is broken, and presumably this patch should
> have been part of it?

No, patch 1 is not broken. This patch fixes a bug in the selftest
trampoline, not a bug in patch 1.

> 
>> <trace_selftest_dynamic_test_func>:
>> 00  bti     c
>> 01  mov     x9, x30                            <trace_direct_tramp>:
>> 02  bl      <trace_direct_tramp>    ---------->     ret
>>                                                      |
>>                                          lr/x30 is 03, return to 03
>>                                                      |
>> 03  mov     w0, #0x0   <-----------------------------|
>>      |                                               |
>>      |                   dead loop!                  |
>>      |                                               |
>> 04  ret   ---- lr/x30 is still 03, go back to 03 ----|
>>
>> The reason is that when the direct caller trace_direct_tramp() returns
>> to the patched function trace_selftest_dynamic_test_func(), lr is still
>> the address after the instrumented instruction in the patched function,
>> so when the patched function exits, it returns to itself!
>>
>> To fix this issue, we need to restore lr before trace_direct_tramp()
>> exits, so rewrite a dedicated trace_direct_tramp() for arm64.
> 
> As mentioned on patch 1 I'd prefer we solved this through indirection, which
> would avoid the need for this and would make things more robust generally by
> keeping the unusual calling convention private to the patch-site and regular
> trampoline.
>

IIUC, we still need to restore x30 before returning from the trampoline
even with indirection, so this bug is still there.

> Thanks,
> Mark.
> 
>> Reported-by: Li Huafei <lihuafei1@huawei.com>
>> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
>> ---
>>  arch/arm64/include/asm/ftrace.h  | 10 ++++++++++
>>  arch/arm64/kernel/entry-ftrace.S | 10 ++++++++++
>>  kernel/trace/trace_selftest.c    |  2 ++
>>  3 files changed, 22 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
>> index 14a35a5df0a1..6f6b184e72fb 100644
>> --- a/arch/arm64/include/asm/ftrace.h
>> +++ b/arch/arm64/include/asm/ftrace.h
>> @@ -126,6 +126,16 @@ static inline bool arch_syscall_match_sym_name(const char *sym,
>>  	 */
>>  	return !strcmp(sym + 8, name);
>>  }
>> +
>> +#ifdef CONFIG_FTRACE_SELFTEST
>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>> +
>> +#define trace_direct_tramp trace_direct_tramp
>> +extern void trace_direct_tramp(void);
>> +
>> +#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
>> +#endif /* CONFIG_FTRACE_SELFTEST */
>> +
>>  #endif /* ifndef __ASSEMBLY__ */
>>  
>>  #endif /* __ASM_FTRACE_H */
>> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
>> index dfe62c55e3a2..a47e87d4d3dd 100644
>> --- a/arch/arm64/kernel/entry-ftrace.S
>> +++ b/arch/arm64/kernel/entry-ftrace.S
>> @@ -357,3 +357,13 @@ SYM_CODE_START(return_to_handler)
>>  	ret
>>  SYM_CODE_END(return_to_handler)
>>  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>> +
>> +#ifdef CONFIG_FTRACE_SELFTEST
>> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>> +SYM_FUNC_START(trace_direct_tramp)
>> +	mov x10, x30
>> +	mov x30, x9
>> +	ret x10
>> +SYM_FUNC_END(trace_direct_tramp)
>> +#endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
>> +#endif /* CONFIG_FTRACE_SELFTEST */
>> diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
>> index abcadbe933bb..e7ccd0d10c39 100644
>> --- a/kernel/trace/trace_selftest.c
>> +++ b/kernel/trace/trace_selftest.c
>> @@ -785,8 +785,10 @@ static struct fgraph_ops fgraph_ops __initdata  = {
>>  };
>>  
>>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>> +#ifndef trace_direct_tramp
>>  noinline __noclone static void trace_direct_tramp(void) { }
>>  #endif
>> +#endif
>>  
>>  /*
>>   * Pretty much the same than for the function tracer from which the selftest
>> -- 
>> 2.30.2
>>
> .


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 3/6] bpf: Remove is_valid_bpf_tramp_flags()
  2022-05-25 13:45     ` Mark Rutland
@ 2022-05-26  9:45       ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-26  9:45 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On 5/25/2022 9:45 PM, Mark Rutland wrote:
> On Wed, May 18, 2022 at 09:16:35AM -0400, Xu Kuohai wrote:
>> BPF_TRAMP_F_XXX flags are not used by user code and are almost constant
>> at compile time, so run time validation is a bit overkill. Remove
>> is_valid_bpf_tramp_flags() and add some usage comments.
>>
>> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
>> Acked-by: Song Liu <songliubraving@fb.com>
> 
> Am I right in thinking this is independent of the arm64-specific bits, and
> could be taken on its own now?
> 

Currently is_valid_bpf_tramp_flags() is defined in x86 and called before
the bpf trampoline is constructed. The check logic is not specific to the
architecture code, so arm64 would also need to call this function. But
as Alexei pointed out, the check is not required, so it's better to
remove it before adding bpf trampoline support to arm64.

> Mark.
> 
>> ---
>>  arch/x86/net/bpf_jit_comp.c | 20 --------------------
>>  kernel/bpf/bpf_struct_ops.c |  3 +++
>>  kernel/bpf/trampoline.c     |  3 +++
>>  3 files changed, 6 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index a2b6d197c226..7698ef3b4821 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -1922,23 +1922,6 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
>>  	return 0;
>>  }
>>  
>> -static bool is_valid_bpf_tramp_flags(unsigned int flags)
>> -{
>> -	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
>> -	    (flags & BPF_TRAMP_F_SKIP_FRAME))
>> -		return false;
>> -
>> -	/*
>> -	 * BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
>> -	 * and it must be used alone.
>> -	 */
>> -	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
>> -	    (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
>> -		return false;
>> -
>> -	return true;
>> -}
>> -
>>  /* Example:
>>   * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
>>   * its 'struct btf_func_model' will be nr_args=2
>> @@ -2017,9 +2000,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>>  	if (nr_args > 6)
>>  		return -ENOTSUPP;
>>  
>> -	if (!is_valid_bpf_tramp_flags(flags))
>> -		return -EINVAL;
>> -
>>  	/* Generated trampoline stack layout:
>>  	 *
>>  	 * RBP + 8         [ return address  ]
>> diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
>> index d9a3c9207240..0572cc5aeb28 100644
>> --- a/kernel/bpf/bpf_struct_ops.c
>> +++ b/kernel/bpf/bpf_struct_ops.c
>> @@ -341,6 +341,9 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
>>  
>>  	tlinks[BPF_TRAMP_FENTRY].links[0] = link;
>>  	tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
>> +	/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
>> +	 * and it must be used alone.
>> +	 */
>>  	flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0;
>>  	return arch_prepare_bpf_trampoline(NULL, image, image_end,
>>  					   model, flags, tlinks, NULL);
>> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
>> index 93c7675f0c9e..bd3f2e673874 100644
>> --- a/kernel/bpf/trampoline.c
>> +++ b/kernel/bpf/trampoline.c
>> @@ -358,6 +358,9 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
>>  
>>  	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
>>  	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links)
>> +		/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
>> +		 * should not be set together.
>> +		 */
>>  		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
>>  
>>  	if (ip_arg)
>> -- 
>> 2.30.2
>>
> .


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 4/6] bpf, arm64: Impelment bpf_arch_text_poke() for arm64
  2022-05-25 14:10     ` Mark Rutland
@ 2022-05-26  9:45       ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-26  9:45 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On 5/25/2022 10:10 PM, Mark Rutland wrote:
> On Wed, May 18, 2022 at 09:16:36AM -0400, Xu Kuohai wrote:
>> Implement bpf_arch_text_poke() for arm64, so bpf trampoline code can use
>> it to replace nop with jump, or replace jump with nop.
>>
>> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
>> Acked-by: Song Liu <songliubraving@fb.com>
>> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  arch/arm64/net/bpf_jit.h      |   1 +
>>  arch/arm64/net/bpf_jit_comp.c | 107 +++++++++++++++++++++++++++++++---
>>  2 files changed, 99 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
>> index 194c95ccc1cf..1c4b0075a3e2 100644
>> --- a/arch/arm64/net/bpf_jit.h
>> +++ b/arch/arm64/net/bpf_jit.h
>> @@ -270,6 +270,7 @@
>>  #define A64_BTI_C  A64_HINT(AARCH64_INSN_HINT_BTIC)
>>  #define A64_BTI_J  A64_HINT(AARCH64_INSN_HINT_BTIJ)
>>  #define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC)
>> +#define A64_NOP    A64_HINT(AARCH64_INSN_HINT_NOP)
>>  
>>  /* DMB */
>>  #define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>> index 8ab4035dea27..5ce6ed5f42a1 100644
>> --- a/arch/arm64/net/bpf_jit_comp.c
>> +++ b/arch/arm64/net/bpf_jit_comp.c
>> @@ -9,6 +9,7 @@
>>  
>>  #include <linux/bitfield.h>
>>  #include <linux/bpf.h>
>> +#include <linux/memory.h>
>>  #include <linux/filter.h>
>>  #include <linux/printk.h>
>>  #include <linux/slab.h>
>> @@ -18,6 +19,7 @@
>>  #include <asm/cacheflush.h>
>>  #include <asm/debug-monitors.h>
>>  #include <asm/insn.h>
>> +#include <asm/patching.h>
>>  #include <asm/set_memory.h>
>>  
>>  #include "bpf_jit.h"
>> @@ -235,13 +237,13 @@ static bool is_lsi_offset(int offset, int scale)
>>  	return true;
>>  }
>>  
>> +#define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0)
>> +#define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
>> +
>>  /* Tail call offset to jump into */
>> -#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \
>> -	IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
>> -#define PROLOGUE_OFFSET 9
>> -#else
>> -#define PROLOGUE_OFFSET 8
>> -#endif
>> +#define PROLOGUE_OFFSET	(BTI_INSNS + 2 + PAC_INSNS + 8)
>> +/* Offset of nop instruction in bpf prog entry to be poked */
>> +#define POKE_OFFSET	(BTI_INSNS + 1)
>>  
>>  static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
>>  {
>> @@ -279,12 +281,15 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
>>  	 *
>>  	 */
>>  
>> +	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
>> +		emit(A64_BTI_C, ctx);
>> +
>> +	emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
>> +	emit(A64_NOP, ctx);
> 
> I take it the idea is to make this the same as the regular ftrace patch-site
> sequence, so that this can call the same trampoline(s) ?
> 

Yes, so that a bpf trampoline can be attached to a bpf prog.

> If so, we need some commentary to that effect, and we need some comments in the
> ftrace code explaining that this needs to be kept in-sync.
> 

This is patched by bpf_arch_text_poke(), not ftrace.
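
For context, the generic attach path does roughly the following
(paraphrased from register_fentry() in kernel/bpf/trampoline.c, not code
from this series): patch sites inside bpf progs are poked directly,
while kernel functions go through ftrace's direct call API.

/* Paraphrased sketch of the generic attach path. */
static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
{
	void *ip = tr->func.addr;

	if (tr->func.ftrace_managed)
		return register_ftrace_direct((unsigned long)ip,
					      (unsigned long)new_addr);

	return bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
}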

>> +
>>  	/* Sign lr */
>>  	if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL))
>>  		emit(A64_PACIASP, ctx);
>> -	/* BTI landing pad */
>> -	else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
>> -		emit(A64_BTI_C, ctx);
>>  
>>  	/* Save FP and LR registers to stay align with ARM64 AAPCS */
>>  	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
>> @@ -1529,3 +1534,87 @@ void bpf_jit_free_exec(void *addr)
>>  {
>>  	return vfree(addr);
>>  }
>> +
>> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
>> +			     void *addr, u32 *insn)
>> +{
>> +	if (!addr)
>> +		*insn = aarch64_insn_gen_nop();
>> +	else
>> +		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
>> +						    (unsigned long)addr,
>> +						    type);
>> +
>> +	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
>> +}
>> +
>> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
>> +		       void *old_addr, void *new_addr)
>> +{
>> +	int ret;
>> +	u32 old_insn;
>> +	u32 new_insn;
>> +	u32 replaced;
>> +	unsigned long offset = ~0UL;
>> +	enum aarch64_insn_branch_type branch_type;
>> +	char namebuf[KSYM_NAME_LEN];
>> +
>> +	if (!__bpf_address_lookup((unsigned long)ip, NULL, &offset, namebuf))
>> +		/* Only poking bpf text is supported. Since kernel function
>> +		 * entry is set up by ftrace, we rely on ftrace to poke kernel
>> +		 * functions.
>> +		 */
>> +		return -EINVAL;
>> +
>> +	/* bpf entry */
>> +	if (offset == 0UL)
>> +		/* skip to the nop instruction in bpf prog entry:
>> +		 * bti c	// if BTI enabled
>> +		 * mov x9, x30
>> +		 * nop
>> +		 */
>> +		ip = ip + POKE_OFFSET * AARCH64_INSN_SIZE;
> 
> When is offset non-zero? is this ever called to patch other instructions, and
> could this ever be used to try to patch the BTI specifically?
> 

bpf_arch_text_poke() is also called to patch other instructions, for
example, bpf_tramp_image_put() calls this to skip calling fexit bpf progs:

int err = bpf_arch_text_poke(im->ip_after_call, BPF_MOD_JUMP,
                             NULL, im->ip_epilogue);


Before this is called, a bpf trampoline looks like:

	[...]
ip_after_call:
	nop // to be patched
	bl <fexit prog>
ip_epilogue:
	bti j
	[...]

After:
	[...]
ip_after_call:
	b <ip_epilogue> // patched
	bl <fexit prog>
ip_epilogue:
	bti j
	[...]


> I strongly suspect we need a higher-level API to say "poke the patchable
> callsite in the prologue", rather than assuming that offset 0 always means
> that, or it'll be *very* easy for this to go wrong.
> 

Ah, bpf_arch_text_poke() only patches bpf progs, and the patch-site in
the bpf prog prologue is constructed for bpf_arch_text_poke(), so we
always know the patch-site offset. There's no compiler-generated
instruction here, so it does not seem to be a problem.

>> +
>> +	if (poke_type == BPF_MOD_CALL)
>> +		branch_type = AARCH64_INSN_BRANCH_LINK;
>> +	else
>> +		branch_type = AARCH64_INSN_BRANCH_NOLINK;
> 
> When is poke_type *not* BPF_MOD_CALL?

The bpf_tramp_image_put() example above uses BPF_MOD_JUMP.

> I assume that means BPF also uses this for non-ftrace reasons?
>

This function is NOT used for ftrace patch-site. It's only used to patch
bpf image.

>> +	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
>> +		return -EFAULT;
>> +
>> +	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
>> +		return -EFAULT;
>> +
>> +	mutex_lock(&text_mutex);
>> +	if (aarch64_insn_read(ip, &replaced)) {
>> +		ret = -EFAULT;
>> +		goto out;
>> +	}
>> +
>> +	if (replaced != old_insn) {
>> +		ret = -EFAULT;
>> +		goto out;
>> +	}
>> +
>> +	/* We call aarch64_insn_patch_text_nosync() to replace instruction
>> +	 * atomically, so no other CPUs will fetch a half-new and half-old
>> +	 * instruction. But there is chance that another CPU fetches the old
>> +	 * instruction after bpf_arch_text_poke() finishes, that is, different
>> +	 * CPUs may execute different versions of instructions at the same
>> +	 * time before the icache is synchronized by hardware.
>> +	 *
>> +	 * 1. when a new trampoline is attached, it is not an issue for
>> +	 *    different CPUs to jump to different trampolines temporarily.
>> +	 *
>> +	 * 2. when an old trampoline is freed, we should wait for all other
>> +	 *    CPUs to exit the trampoline and make sure the trampoline is no
>> +	 *    longer reachable, since bpf_tramp_image_put() function already
>> +	 *    uses percpu_ref and rcu task to do the sync, no need to call the
>> +	 *    sync interface here.
>> +	 */
> 
> How is RCU used for that? It's not clear to me how that works for PREEMPT_RCU
> (which is the usual configuration for arm64), since we can easily be in a
> preemptible context, outside of an RCU read side critical section, yet call
> into a trampoline.
> 
> I know that for livepatching we need to use stacktracing to ensure we've
> finished using code we'd like to free, and I can't immediately see how you can
> avoid that here. I'm suspicious that there's still a race where threads can
> enter the trampoline and it can be subsequently freed.
> 
> For ftrace today we get away with entering the existing trampolines when not
> intended because those are statically allocated, and the race is caught when
> acquiring the ops inside the ftrace core code. This case is different because
> the CPU can fetch the instruction and execute that at any time, without any RCU
> involvement.
> 
> Can you give more details on how the scheme described above works? How
> *exactly*` do you ensure that threads which have entered the trampoline (and
> may have been immediately preempted by an interrupt) have returned? Which RCU
> mechanism are you using?
> 
> If you can point me at where this is implemented I'm happy to take a look.
>
IIUC, a task rcu critical section ends at a voluntary context switch.
Since no voluntary context switch occurs in the prologue, once all task
rcu critical sections have ended we can ensure no one is still running
in the prologue [1].

For the bpf trampoline, the scenario is similar, except that it may
sleep: a reference count is increased when entering the trampoline and
decreased when exiting it, so we can wait for the reference count to
drop to zero to make sure no one is left in the sleepable region [2].

[1] https://lore.kernel.org/all/20140804192017.GA18337@linux.vnet.ibm.com/
[2]
https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com/
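
For reference, the generic code referenced in [2] does roughly the
following around each trampoline invocation (paraphrased from
kernel/bpf/trampoline.c as of that series, not something added here):

/* Called from the generated trampoline when fexit/sleepable progs are
 * attached; bpf_tramp_image_put() kills the percpu_ref and the image is
 * only freed after the count drops to zero (plus an RCU-tasks wait for
 * the non-sleepable part).
 */
void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr)
{
	percpu_ref_get(&tr->pcref);
}

void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr)
{
	percpu_ref_put(&tr->pcref);
}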

> Thanks,
> Mark.
> 
>> +	ret = aarch64_insn_patch_text_nosync(ip, new_insn);
>> +out:
>> +	mutex_unlock(&text_mutex);
>> +	return ret;
>> +}
>> -- 
>> 2.30.2
>>
> .


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-05-26  9:45       ` Xu Kuohai
@ 2022-05-26 10:06         ` Mark Rutland
  -1 siblings, 0 replies; 60+ messages in thread
From: Mark Rutland @ 2022-05-26 10:06 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo,
	cj.chengjian, huawei.libin, xiexiuqi, liwei391

On Thu, May 26, 2022 at 05:45:03PM +0800, Xu Kuohai wrote:
> On 5/25/2022 9:38 PM, Mark Rutland wrote:
> > On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
> >> Add ftrace direct support for arm64.
> >>
> >> 1. When there is custom trampoline only, replace the fentry nop to a
> >>    jump instruction that jumps directly to the custom trampoline.
> >>
> >> 2. When ftrace trampoline and custom trampoline coexist, jump from
> >>    fentry to ftrace trampoline first, then jump to custom trampoline
> >>    when ftrace trampoline exits. The current unused register
> >>    pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
> >>    trampoline to custom trampoline.
> > 
> > For those of us not all that familiar with BPF, can you explain *why* you want
> > this? The above explains what the patch implements, but not why that's useful.
> > 
> > e.g. is this just to avoid the overhead of the ops list processing in the
> > regular ftrace code, or is the custom trampoline there to allow you to do
> > something special?
> 
> IIUC, ftrace direct call was designed to *remove* the unnecessary
> overhead of saving regs completely [1][2].

Ok. My plan is to get rid of most of the register saving generally, so I think
that aspect can be solved without direct calls.

> [1]
> https://lore.kernel.org/all/20191022175052.frjzlnjjfwwfov64@ast-mbp.dhcp.thefacebook.com/
> [2] https://lore.kernel.org/all/20191108212834.594904349@goodmis.org/
> 
> This patch itself is just a variant of [3].
> 
> [3] https://lore.kernel.org/all/20191108213450.891579507@goodmis.org/
> 
> > 
> > There is another patch series on the list from some of your colleagues which
> > uses dynamic trampolines to try to avoid that ops list overhead, and it's not
> > clear to me whether these are trying to solve the largely same problem or
> > something different. That other thread is at:
> > 
> >   https://lore.kernel.org/linux-arm-kernel/20220316100132.244849-1-bobo.shaobowang@huawei.com/
> > 
> > ... and I've added the relevant parties to CC here, since there doesn't seem to
> > be any overlap in the CC lists of the two threads.
> 
> We're not working to solve the same problem. The trampoline introduced
> in this series helps us to monitor kernel function or another bpf prog
> with bpf, and also helps us to use bpf prog like a normal kernel
> function pointer.

Ok, but why is it necessary to have a special trampoline?

Is that *just* to avoid overhead, or do you need to do something special that
the regular trampoline won't do?

> > 
> > In that other thread I've suggested a general approach we could follow at:
> >   
> >   https://lore.kernel.org/linux-arm-kernel/YmGF%2FOpIhAF8YeVq@lakrids/
> >
> 
> Is it possible for a kernel function to take a long jump to common
> trampoline when we get a huge kernel image?

It is possible, but only where the kernel Image itself is massive and the .text
section exceeds 128MiB, at which point other things break anyway. Practically
speaking, this doesn't happen for production kernels, or reasonable test
kernels.

I've been meaning to add some logic to detect this at boot time and disable
ftrace (or at build time), since live patching would also be broken in that
case.
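
Purely as an illustration of the kind of check I mean (this is not
existing code, and the helper name is made up), something along these
lines at boot time, using _stext/_etext from <asm/sections.h>:

/* Refuse to enable ftrace if .text is so large that a BL from an
 * arbitrary patch site might not reach the trampolines within the
 * +/-128MiB branch range.
 */
static bool __init ftrace_text_in_branch_range(void)
{
	unsigned long size = (unsigned long)_etext - (unsigned long)_stext;

	return size < SZ_128M;
}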

> > As noted in that thread, I have a few concerns which equally apply here:
> > 
> > * Due to the limited range of BL instructions, it's not always possible to
> >   patch an ftrace call-site to branch to an arbitrary trampoline. The way this
> >   works for ftrace today relies upon knowing the set of trampolines at
> >   compile-time, and allocating module PLTs for those, and that approach cannot
> >   work reliably for dynamically allocated trampolines.
> 
> Currently patch 5 returns -ENOTSUPP when a long jump is detected, so no
> bpf trampoline is constructed for an out-of-range patch-site:
> 
> if (is_long_jump(orig_call, image))
> 	return -ENOTSUPP;

Sure, my point is that in practice that means that (from the user's PoV) this
may randomly fail to work, and I'd like something that we can ensure works
consistently.

> >   I'd strongly prefer to avoid custom trampolines unless they're strictly
> >   necessary for functional reasons, so that we can have this work reliably and
> >   consistently.
> 
> bpf trampoline is needed by bpf itself, not to replace ftrace trampolines.

As above, can you please let me know *why* specifically it is needed? Why can't
we invoke the BPF code through the usual ops mechanism?

Is that to avoid overhead, or are there other functional reasons you need a
special trampoline?

> > * If this is mostly about avoiding the ops list processing overhead, I believe
> >   we can implement some custom ops support more generally in ftrace which would
> >   still use a common trampoline but could directly call into those custom ops.
> >   I would strongly prefer this over custom trampolines.
> > 
> > * I'm looking to minimize the set of regs ftrace saves, and never save a full
> >   pt_regs, since today we (incompletely) fill that with bogus values and cannot
> >   acquire some state reliably (e.g. PSTATE). I'd like to avoid usage of pt_regs
> >   unless necessary, and I don't want to add additional reliance upon that
> >   structure.
> 
> Even if such a common trampoline is used, bpf trampoline is still
> necessary since we need to construct custom instructions to implement
> bpf functions, for example, to implement a kernel function pointer with a
> bpf prog.

Sorry, but I'm struggling to understand this. What specifically do you need to
do that means this can't use the same calling convention as the regular ops
function pointers?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 3/6] bpf: Remove is_valid_bpf_tramp_flags()
  2022-05-26  9:45       ` Xu Kuohai
@ 2022-05-26 10:12         ` Mark Rutland
  -1 siblings, 0 replies; 60+ messages in thread
From: Mark Rutland @ 2022-05-26 10:12 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On Thu, May 26, 2022 at 05:45:25PM +0800, Xu Kuohai wrote:
> On 5/25/2022 9:45 PM, Mark Rutland wrote:
> > On Wed, May 18, 2022 at 09:16:35AM -0400, Xu Kuohai wrote:
> >> BPF_TRAM_F_XXX flags are not used by user code and are almost constant
> >> at compile time, so run time validation is a bit overkill. Remove
> >> is_valid_bpf_tramp_flags() and add some usage comments.
> >>
> >> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> >> Acked-by: Song Liu <songliubraving@fb.com>
> > 
> > Am I right in thinking this is independent of the arm64-specific bits, and
> > could be taken on its own now?
> > 
> 
> Currently is_valid_bpf_tramp_flags() is defined in x86 and called before
> bpf trampoline is constructed. The check logic is irrelevant to the
> architecture code. So we also need to call this function on arm64. But
> as Alexei pointed out, the check is not required, so it's better to
> remove it before adding bpf trampoline to arm64.
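
(For context: without this change, every architecture implementing
arch_prepare_bpf_trampoline() would have to open-code the same boilerplate
that the x86 hunk below deletes, i.e. something like

	if (!is_valid_bpf_tramp_flags(flags))
		return -EINVAL;

so dropping the check once, centrally, avoids duplicating it on arm64.)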

Cool. So this patch could be merged now, even if the rest of the series needs
more work?

Thanks,
Mark.

> >> ---
> >>  arch/x86/net/bpf_jit_comp.c | 20 --------------------
> >>  kernel/bpf/bpf_struct_ops.c |  3 +++
> >>  kernel/bpf/trampoline.c     |  3 +++
> >>  3 files changed, 6 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> >> index a2b6d197c226..7698ef3b4821 100644
> >> --- a/arch/x86/net/bpf_jit_comp.c
> >> +++ b/arch/x86/net/bpf_jit_comp.c
> >> @@ -1922,23 +1922,6 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
> >>  	return 0;
> >>  }
> >>  
> >> -static bool is_valid_bpf_tramp_flags(unsigned int flags)
> >> -{
> >> -	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
> >> -	    (flags & BPF_TRAMP_F_SKIP_FRAME))
> >> -		return false;
> >> -
> >> -	/*
> >> -	 * BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
> >> -	 * and it must be used alone.
> >> -	 */
> >> -	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
> >> -	    (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
> >> -		return false;
> >> -
> >> -	return true;
> >> -}
> >> -
> >>  /* Example:
> >>   * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
> >>   * its 'struct btf_func_model' will be nr_args=2
> >> @@ -2017,9 +2000,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> >>  	if (nr_args > 6)
> >>  		return -ENOTSUPP;
> >>  
> >> -	if (!is_valid_bpf_tramp_flags(flags))
> >> -		return -EINVAL;
> >> -
> >>  	/* Generated trampoline stack layout:
> >>  	 *
> >>  	 * RBP + 8         [ return address  ]
> >> diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
> >> index d9a3c9207240..0572cc5aeb28 100644
> >> --- a/kernel/bpf/bpf_struct_ops.c
> >> +++ b/kernel/bpf/bpf_struct_ops.c
> >> @@ -341,6 +341,9 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
> >>  
> >>  	tlinks[BPF_TRAMP_FENTRY].links[0] = link;
> >>  	tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
> >> +	/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
> >> +	 * and it must be used alone.
> >> +	 */
> >>  	flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0;
> >>  	return arch_prepare_bpf_trampoline(NULL, image, image_end,
> >>  					   model, flags, tlinks, NULL);
> >> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> >> index 93c7675f0c9e..bd3f2e673874 100644
> >> --- a/kernel/bpf/trampoline.c
> >> +++ b/kernel/bpf/trampoline.c
> >> @@ -358,6 +358,9 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
> >>  
> >>  	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
> >>  	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links)
> >> +		/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
> >> +		 * should not be set together.
> >> +		 */
> >>  		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
> >>  
> >>  	if (ip_arg)
> >> -- 
> >> 2.30.2
> >>
> > .
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 4/6] bpf, arm64: Impelment bpf_arch_text_poke() for arm64
  2022-05-26  9:45       ` Xu Kuohai
@ 2022-05-26 10:34         ` Mark Rutland
  -1 siblings, 0 replies; 60+ messages in thread
From: Mark Rutland @ 2022-05-26 10:34 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On Thu, May 26, 2022 at 05:45:30PM +0800, Xu Kuohai wrote:
> On 5/25/2022 10:10 PM, Mark Rutland wrote:
> > On Wed, May 18, 2022 at 09:16:36AM -0400, Xu Kuohai wrote:
> >> Impelment bpf_arch_text_poke() for arm64, so bpf trampoline code can use
> >> it to replace nop with jump, or replace jump with nop.
> >>
> >> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
> >> Acked-by: Song Liu <songliubraving@fb.com>
> >> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> ---
> >>  arch/arm64/net/bpf_jit.h      |   1 +
> >>  arch/arm64/net/bpf_jit_comp.c | 107 +++++++++++++++++++++++++++++++---
> >>  2 files changed, 99 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
> >> index 194c95ccc1cf..1c4b0075a3e2 100644
> >> --- a/arch/arm64/net/bpf_jit.h
> >> +++ b/arch/arm64/net/bpf_jit.h
> >> @@ -270,6 +270,7 @@
> >>  #define A64_BTI_C  A64_HINT(AARCH64_INSN_HINT_BTIC)
> >>  #define A64_BTI_J  A64_HINT(AARCH64_INSN_HINT_BTIJ)
> >>  #define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC)
> >> +#define A64_NOP    A64_HINT(AARCH64_INSN_HINT_NOP)
> >>  
> >>  /* DMB */
> >>  #define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
> >> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> >> index 8ab4035dea27..5ce6ed5f42a1 100644
> >> --- a/arch/arm64/net/bpf_jit_comp.c
> >> +++ b/arch/arm64/net/bpf_jit_comp.c
> >> @@ -9,6 +9,7 @@
> >>  
> >>  #include <linux/bitfield.h>
> >>  #include <linux/bpf.h>
> >> +#include <linux/memory.h>
> >>  #include <linux/filter.h>
> >>  #include <linux/printk.h>
> >>  #include <linux/slab.h>
> >> @@ -18,6 +19,7 @@
> >>  #include <asm/cacheflush.h>
> >>  #include <asm/debug-monitors.h>
> >>  #include <asm/insn.h>
> >> +#include <asm/patching.h>
> >>  #include <asm/set_memory.h>
> >>  
> >>  #include "bpf_jit.h"
> >> @@ -235,13 +237,13 @@ static bool is_lsi_offset(int offset, int scale)
> >>  	return true;
> >>  }
> >>  
> >> +#define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0)
> >> +#define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
> >> +
> >>  /* Tail call offset to jump into */
> >> -#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \
> >> -	IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
> >> -#define PROLOGUE_OFFSET 9
> >> -#else
> >> -#define PROLOGUE_OFFSET 8
> >> -#endif
> >> +#define PROLOGUE_OFFSET	(BTI_INSNS + 2 + PAC_INSNS + 8)
> >> +/* Offset of nop instruction in bpf prog entry to be poked */
> >> +#define POKE_OFFSET	(BTI_INSNS + 1)
> >>  
> >>  static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
> >>  {
> >> @@ -279,12 +281,15 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
> >>  	 *
> >>  	 */
> >>  
> >> +	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
> >> +		emit(A64_BTI_C, ctx);
> >> +
> >> +	emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
> >> +	emit(A64_NOP, ctx);
> > 
> > I take it the idea is to make this the same as the regular ftrace patch-site
> > sequence, so that this can call the same trampoline(s) ?
> > 
> 
> Yes, we can attach a bpf trampoline to bpf prog.

Just to check, is the BPF trampoline *only* attached to BPF programs, or could
that be attached to a regular ftrace patch-site?

I had assumed that the point of adding direct calls support was so that this
could be called from regular ftrace patch sites, but your replies below on how
the trampoline is protected imply that's not the case.

> > If so, we need some commentary to that effect, and we need some comments in the
> > ftrace code explaining that this needs to be kept in-sync.
> > 
> 
> This is patched by bpf_arch_text_poke(), not ftrace.

I understood that, but if the idea is that the instruction sequence must match,
then we need to make that clear. Otherwise changes to one side may break the
other unexpectedly.

> >> +
> >>  	/* Sign lr */
> >>  	if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL))
> >>  		emit(A64_PACIASP, ctx);
> >> -	/* BTI landing pad */
> >> -	else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
> >> -		emit(A64_BTI_C, ctx);
> >>  
> >>  	/* Save FP and LR registers to stay align with ARM64 AAPCS */
> >>  	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
> >> @@ -1529,3 +1534,87 @@ void bpf_jit_free_exec(void *addr)
> >>  {
> >>  	return vfree(addr);
> >>  }
> >> +
> >> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
> >> +			     void *addr, u32 *insn)
> >> +{
> >> +	if (!addr)
> >> +		*insn = aarch64_insn_gen_nop();
> >> +	else
> >> +		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
> >> +						    (unsigned long)addr,
> >> +						    type);
> >> +
> >> +	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
> >> +}
> >> +
> >> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
> >> +		       void *old_addr, void *new_addr)
> >> +{
> >> +	int ret;
> >> +	u32 old_insn;
> >> +	u32 new_insn;
> >> +	u32 replaced;
> >> +	unsigned long offset = ~0UL;
> >> +	enum aarch64_insn_branch_type branch_type;
> >> +	char namebuf[KSYM_NAME_LEN];
> >> +
> >> +	if (!__bpf_address_lookup((unsigned long)ip, NULL, &offset, namebuf))
> >> +		/* Only poking bpf text is supported. Since kernel function
> >> +		 * entry is set up by ftrace, we reply on ftrace to poke kernel
> >> +		 * functions.
> >> +		 */
> >> +		return -EINVAL;
> >> +
> >> +	/* bpf entry */
> >> +	if (offset == 0UL)
> >> +		/* skip to the nop instruction in bpf prog entry:
> >> +		 * bti c	// if BTI enabled
> >> +		 * mov x9, x30
> >> +		 * nop
> >> +		 */
> >> +		ip = ip + POKE_OFFSET * AARCH64_INSN_SIZE;
> > 
> > When is offset non-zero? is this ever called to patch other instructions, and
> > could this ever be used to try to patch the BTI specifically?
> 
> bpf_arch_text_poke() is also called to patch other instructions, for
> example, bpf_tramp_image_put() calls this to skip calling fexit bpf progs:
> 
> int err = bpf_arch_text_poke(im->ip_after_call, BPF_MOD_JUMP,
>                              NULL, im->ip_epilogue);
> 
> 
> Before this is called, a bpf trampoline looks like:
> 
> 	[...]
> ip_after_call:
> 	nop // to be patched
> 	bl <fexit prog>
> ip_epilogue:
> 	bti j
> 	[...]
> 
> After:
> 	[...]
> ip_after_call:
> 	b <ip_epilogue> // patched
> 	bl <fexit prog>
> ip_epilogue:
> 	bti j
> 	[...]
> 
> 
> > I strongly suspect we need a higher-level API to say "poke the patchable
> > callsite in the prologue", rather than assuming that offset 0 always means
> > that, or it'll be *very* easy for this to go wrong.
> 
> Ah, bpf_arch_text_poke() only patches bpf progs, and the patch-site in
> bpf prog prologue is constructed for bpf_arch_text_poke(), so we always
> know the patch-site offset. There's no compiler generated instruction
> here, so it seems to be not a problem.

I understand all of that. My point is that if anyone ever wants to use
bpf_arch_text_poke() to poke the *actual* first instruction, this will go
wrong.

I see that x86 does the same thing to skip ENDBR, so if this is just a fragile
API, and people are expected to not do that, then fine. It's just confusing.
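
To spell the convention out (this just restates the diff above, assuming
CONFIG_ARM64_BTI_KERNEL, i.e. BTI_INSNS == 1 and POKE_OFFSET == 2):

	/*
	 *   prog + 0x0:  bti c
	 *   prog + 0x4:  mov x9, x30
	 *   prog + 0x8:  nop    <- the instruction actually patched
	 *
	 * so for "offset == 0" the address poked is not ip itself but:
	 */
	ip = ip + POKE_OFFSET * AARCH64_INSN_SIZE;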

> >> +
> >> +	if (poke_type == BPF_MOD_CALL)
> >> +		branch_type = AARCH64_INSN_BRANCH_LINK;
> >> +	else
> >> +		branch_type = AARCH64_INSN_BRANCH_NOLINK;
> > 
> > When is poke_type *not* BPF_MOD_CALL?
> 
> The bpf_tramp_image_put() example above uses BPF_MOD_JUMP.
> 
> > I assume that means BPF also uses this for non-ftrace reasons?
> >
> 
> This function is NOT used for ftrace patch-site. It's only used to patch
> bpf image.

I understand that this isn't ftrace; I meant for the patch site we introduced
in the BPF program that is compatible with the patch-site sequence that ftrace
uses.

> >> +	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
> >> +		return -EFAULT;
> >> +
> >> +	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
> >> +		return -EFAULT;
> >> +
> >> +	mutex_lock(&text_mutex);
> >> +	if (aarch64_insn_read(ip, &replaced)) {
> >> +		ret = -EFAULT;
> >> +		goto out;
> >> +	}
> >> +
> >> +	if (replaced != old_insn) {
> >> +		ret = -EFAULT;
> >> +		goto out;
> >> +	}
> >> +
> >> +	/* We call aarch64_insn_patch_text_nosync() to replace instruction
> >> +	 * atomically, so no other CPUs will fetch a half-new and half-old
> >> +	 * instruction. But there is chance that another CPU fetches the old
> >> +	 * instruction after bpf_arch_text_poke() finishes, that is, different
> >> +	 * CPUs may execute different versions of instructions at the same
> >> +	 * time before the icache is synchronized by hardware.
> >> +	 *
> >> +	 * 1. when a new trampoline is attached, it is not an issue for
> >> +	 *    different CPUs to jump to different trampolines temporarily.
> >> +	 *
> >> +	 * 2. when an old trampoline is freed, we should wait for all other
> >> +	 *    CPUs to exit the trampoline and make sure the trampoline is no
> >> +	 *    longer reachable, since bpf_tramp_image_put() function already
> >> +	 *    uses percpu_ref and rcu task to do the sync, no need to call the
> >> +	 *    sync interface here.
> >> +	 */
> > 
> > How is RCU used for that? It's not clear to me how that works for PREEMPT_RCU
> > (which is the usual configuration for arm64), since we can easily be in a
> > preemptible context, outside of an RCU read side critical section, yet call
> > into a trampoline.
> > 
> > I know that for livepatching we need to use stacktracing to ensure we've
> > finished using code we'd like to free, and I can't immediately see how you can
> > avoid that here. I'm suspicious that there's still a race where threads can
> > enter the trampoline and it can be subsequently freed.
> > 
> > For ftrace today we get away with entering the existing trampolines when not
> > intended because those are statically allocated, and the race is caught when
> > acquiring the ops inside the ftrace core code. This case is different because
> > the CPU can fetch the instruction and execute that at any time, without any RCU
> > involvement.
> > 
> > Can you give more details on how the scheme described above works? How
> > *exactly* do you ensure that threads which have entered the trampoline (and
> > may have been immediately preempted by an interrupt) have returned? Which RCU
> > mechanism are you using?
> > 
> > If you can point me at where this is implemented I'm happy to take a look.
> >
> IIUC, task rcu's critical section ends at a voluntary context switch;
> since no voluntary context switch occurs in a prologue, when a task
> rcu's critical section ends, we can ensure no one is running in the
> prologue [1].
> 
> For bpf trampoline, the scenario is similar, except that it may sleep,
> so a reference count is increased when entering the trampoline and
> decreased when exiting it; we can then wait for the reference
> count to become zero to make sure there is no one in the sleepable
> region [2].
> 
> [1] https://lore.kernel.org/all/20140804192017.GA18337@linux.vnet.ibm.com/
> [2]
> https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com/
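
The scheme described above amounts to roughly the following (sketch only;
the names are approximate -- see [2] for the real code):

	#include <linux/bpf.h>
	#include <linux/percpu-refcount.h>

	static void tramp_enter(struct bpf_tramp_image *im)
	{
		/* taken before the sleepable part of the trampoline runs */
		percpu_ref_get(&im->pcref);
	}

	static void tramp_exit(struct bpf_tramp_image *im)
	{
		/* dropped on the way out; teardown waits for a task-RCU
		 * grace period (covering the non-sleepable prologue) and
		 * for this ref to drain before freeing the image
		 */
		percpu_ref_put(&im->pcref);
	}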

Ok. Am I correct in understanding that this means the BPF trampoline is only
ever attached to BPF programs and those programs handle this around calling the
BPF trampoline?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 3/6] bpf: Remove is_valid_bpf_tramp_flags()
  2022-05-26 10:12         ` Mark Rutland
@ 2022-05-26 14:46           ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-26 14:46 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On 5/26/2022 6:12 PM, Mark Rutland wrote:
> On Thu, May 26, 2022 at 05:45:25PM +0800, Xu Kuohai wrote:
>> On 5/25/2022 9:45 PM, Mark Rutland wrote:
>>> On Wed, May 18, 2022 at 09:16:35AM -0400, Xu Kuohai wrote:
>>>> BPF_TRAM_F_XXX flags are not used by user code and are almost constant
>>>> at compile time, so run time validation is a bit overkill. Remove
>>>> is_valid_bpf_tramp_flags() and add some usage comments.
>>>>
>>>> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
>>>> Acked-by: Song Liu <songliubraving@fb.com>
>>>
>>> Am I right in thinking this is independent of the arm64-specific bits, and
>>> could be taken on its own now?
>>>
>>
>> Currently is_valid_bpf_tramp_flags() is defined in x86 and called before
>> bpf trampoline is constructed. The check logic is irrelevant to the
>> architecture code. So we also need to call this function on arm64. But
>> as Alexei pointed out, the check is not required, so it's better to
>> remove it before adding bpf trampoline to arm64.
> 
> Cool. So this patch could be merged now, even if the rest of the series needs
> more work?
> 

Agree with you, thanks.

> Thanks,
> Mark.
> 
>>>> ---
>>>>  arch/x86/net/bpf_jit_comp.c | 20 --------------------
>>>>  kernel/bpf/bpf_struct_ops.c |  3 +++
>>>>  kernel/bpf/trampoline.c     |  3 +++
>>>>  3 files changed, 6 insertions(+), 20 deletions(-)
>>>>
>>>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>>>> index a2b6d197c226..7698ef3b4821 100644
>>>> --- a/arch/x86/net/bpf_jit_comp.c
>>>> +++ b/arch/x86/net/bpf_jit_comp.c
>>>> @@ -1922,23 +1922,6 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
>>>>  	return 0;
>>>>  }
>>>>  
>>>> -static bool is_valid_bpf_tramp_flags(unsigned int flags)
>>>> -{
>>>> -	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
>>>> -	    (flags & BPF_TRAMP_F_SKIP_FRAME))
>>>> -		return false;
>>>> -
>>>> -	/*
>>>> -	 * BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
>>>> -	 * and it must be used alone.
>>>> -	 */
>>>> -	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
>>>> -	    (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
>>>> -		return false;
>>>> -
>>>> -	return true;
>>>> -}
>>>> -
>>>>  /* Example:
>>>>   * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
>>>>   * its 'struct btf_func_model' will be nr_args=2
>>>> @@ -2017,9 +2000,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
>>>>  	if (nr_args > 6)
>>>>  		return -ENOTSUPP;
>>>>  
>>>> -	if (!is_valid_bpf_tramp_flags(flags))
>>>> -		return -EINVAL;
>>>> -
>>>>  	/* Generated trampoline stack layout:
>>>>  	 *
>>>>  	 * RBP + 8         [ return address  ]
>>>> diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
>>>> index d9a3c9207240..0572cc5aeb28 100644
>>>> --- a/kernel/bpf/bpf_struct_ops.c
>>>> +++ b/kernel/bpf/bpf_struct_ops.c
>>>> @@ -341,6 +341,9 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
>>>>  
>>>>  	tlinks[BPF_TRAMP_FENTRY].links[0] = link;
>>>>  	tlinks[BPF_TRAMP_FENTRY].nr_links = 1;
>>>> +	/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
>>>> +	 * and it must be used alone.
>>>> +	 */
>>>>  	flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0;
>>>>  	return arch_prepare_bpf_trampoline(NULL, image, image_end,
>>>>  					   model, flags, tlinks, NULL);
>>>> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
>>>> index 93c7675f0c9e..bd3f2e673874 100644
>>>> --- a/kernel/bpf/trampoline.c
>>>> +++ b/kernel/bpf/trampoline.c
>>>> @@ -358,6 +358,9 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
>>>>  
>>>>  	if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
>>>>  	    tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links)
>>>> +		/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
>>>> +		 * should not be set together.
>>>> +		 */
>>>>  		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
>>>>  
>>>>  	if (ip_arg)
>>>> -- 
>>>> 2.30.2
>>>>
>>> .
>>
> .


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 4/6] bpf, arm64: Impelment bpf_arch_text_poke() for arm64
  2022-05-26 10:34         ` Mark Rutland
@ 2022-05-26 14:47           ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-26 14:47 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi

On 5/26/2022 6:34 PM, Mark Rutland wrote:
> On Thu, May 26, 2022 at 05:45:30PM +0800, Xu Kuohai wrote:
>> On 5/25/2022 10:10 PM, Mark Rutland wrote:
>>> On Wed, May 18, 2022 at 09:16:36AM -0400, Xu Kuohai wrote:
>>>> Impelment bpf_arch_text_poke() for arm64, so bpf trampoline code can use
>>>> it to replace nop with jump, or replace jump with nop.
>>>>
>>>> Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
>>>> Acked-by: Song Liu <songliubraving@fb.com>
>>>> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
>>>> ---
>>>>  arch/arm64/net/bpf_jit.h      |   1 +
>>>>  arch/arm64/net/bpf_jit_comp.c | 107 +++++++++++++++++++++++++++++++---
>>>>  2 files changed, 99 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
>>>> index 194c95ccc1cf..1c4b0075a3e2 100644
>>>> --- a/arch/arm64/net/bpf_jit.h
>>>> +++ b/arch/arm64/net/bpf_jit.h
>>>> @@ -270,6 +270,7 @@
>>>>  #define A64_BTI_C  A64_HINT(AARCH64_INSN_HINT_BTIC)
>>>>  #define A64_BTI_J  A64_HINT(AARCH64_INSN_HINT_BTIJ)
>>>>  #define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC)
>>>> +#define A64_NOP    A64_HINT(AARCH64_INSN_HINT_NOP)
>>>>  
>>>>  /* DMB */
>>>>  #define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
>>>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>>>> index 8ab4035dea27..5ce6ed5f42a1 100644
>>>> --- a/arch/arm64/net/bpf_jit_comp.c
>>>> +++ b/arch/arm64/net/bpf_jit_comp.c
>>>> @@ -9,6 +9,7 @@
>>>>  
>>>>  #include <linux/bitfield.h>
>>>>  #include <linux/bpf.h>
>>>> +#include <linux/memory.h>
>>>>  #include <linux/filter.h>
>>>>  #include <linux/printk.h>
>>>>  #include <linux/slab.h>
>>>> @@ -18,6 +19,7 @@
>>>>  #include <asm/cacheflush.h>
>>>>  #include <asm/debug-monitors.h>
>>>>  #include <asm/insn.h>
>>>> +#include <asm/patching.h>
>>>>  #include <asm/set_memory.h>
>>>>  
>>>>  #include "bpf_jit.h"
>>>> @@ -235,13 +237,13 @@ static bool is_lsi_offset(int offset, int scale)
>>>>  	return true;
>>>>  }
>>>>  
>>>> +#define BTI_INSNS (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) ? 1 : 0)
>>>> +#define PAC_INSNS (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL) ? 1 : 0)
>>>> +
>>>>  /* Tail call offset to jump into */
>>>> -#if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) || \
>>>> -	IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL)
>>>> -#define PROLOGUE_OFFSET 9
>>>> -#else
>>>> -#define PROLOGUE_OFFSET 8
>>>> -#endif
>>>> +#define PROLOGUE_OFFSET	(BTI_INSNS + 2 + PAC_INSNS + 8)
>>>> +/* Offset of nop instruction in bpf prog entry to be poked */
>>>> +#define POKE_OFFSET	(BTI_INSNS + 1)
>>>>  
>>>>  static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
>>>>  {
>>>> @@ -279,12 +281,15 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf)
>>>>  	 *
>>>>  	 */
>>>>  
>>>> +	if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
>>>> +		emit(A64_BTI_C, ctx);
>>>> +
>>>> +	emit(A64_MOV(1, A64_R(9), A64_LR), ctx);
>>>> +	emit(A64_NOP, ctx);
>>>
>>> I take it the idea is to make this the same as the regular ftrace patch-site
>>> sequence, so that this can call the same trampoline(s) ?
>>>
>>
>> Yes, we can attach a bpf trampoline to bpf prog.
> 
> Just to check, is the BPF trampoline *only* attached to BPF programs, or could
> that be attached to a regular ftrace patch-site?
> 
> I had assumed that the point of adding direct calls support was so that this
> could be called from regular ftrace patch sites, but your replies below on how
> the trampoline is protected imply that's not the case.
> 

Sorry for the confusion. A bpf trampoline can be attached to a regular
ftrace patch-site. In this scenario the patch-site is patched by ftrace.
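
For example, roughly (a sketch of the two cases only, not the exact call
chain; the ftrace side goes through the direct-call API, the bpf side
through bpf_arch_text_poke()):

	#include <linux/bpf.h>
	#include <linux/ftrace.h>

	static int attach_tramp(void *ip, void *tramp, bool ip_is_bpf_prog)
	{
		if (ip_is_bpf_prog)
			/* nop in the bpf prog prologue, patched directly */
			return bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, tramp);

		/* kernel function: the patch-site is owned by ftrace */
		return register_ftrace_direct((unsigned long)ip,
					      (unsigned long)tramp);
	}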

>>> If so, we need some commentary to that effect, and we need some comments in the
>>> ftrace code explaining that this needs to be kept in-sync.
>>>
>>
>> This is patched by bpf_arch_text_poke(), not ftrace.
> 
> I understood that, but if the idea is that the instruction sequence must match,
> then we need to make that clear. Otherwise changes to one side may break the
> other unexpectedly.
> 

Yes, it's better that these two match. But I don't think it's a must,
since the bpf prologue is patched by bpf_arch_text_poke(), while a
regular kernel function prologue is patched by ftrace.
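
(As a rough sketch of the bpf_arch_text_poke() side; this is illustrative
only, and 'prog_entry' and 'tr_image' are assumed names for the JITed prog's
entry point and the generated trampoline image:)

#include <linux/bpf.h>

/* Attach: turn the reserved NOP in the JITed prologue into a branch to the
 * trampoline. bpf_arch_text_poke() applies the POKE_OFFSET adjustment, so
 * the caller passes the prog entry address itself.
 */
static int poke_bpf_prologue(void *prog_entry, void *tr_image)
{
	return bpf_arch_text_poke(prog_entry, BPF_MOD_CALL, NULL, tr_image);
}

/* Detach: expect the branch to tr_image at the patch-site and write the
 * NOP back.
 */
static int unpoke_bpf_prologue(void *prog_entry, void *tr_image)
{
	return bpf_arch_text_poke(prog_entry, BPF_MOD_CALL, tr_image, NULL);
}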

>>>> +
>>>>  	/* Sign lr */
>>>>  	if (IS_ENABLED(CONFIG_ARM64_PTR_AUTH_KERNEL))
>>>>  		emit(A64_PACIASP, ctx);
>>>> -	/* BTI landing pad */
>>>> -	else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL))
>>>> -		emit(A64_BTI_C, ctx);
>>>>  
>>>>  	/* Save FP and LR registers to stay align with ARM64 AAPCS */
>>>>  	emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
>>>> @@ -1529,3 +1534,87 @@ void bpf_jit_free_exec(void *addr)
>>>>  {
>>>>  	return vfree(addr);
>>>>  }
>>>> +
>>>> +static int gen_branch_or_nop(enum aarch64_insn_branch_type type, void *ip,
>>>> +			     void *addr, u32 *insn)
>>>> +{
>>>> +	if (!addr)
>>>> +		*insn = aarch64_insn_gen_nop();
>>>> +	else
>>>> +		*insn = aarch64_insn_gen_branch_imm((unsigned long)ip,
>>>> +						    (unsigned long)addr,
>>>> +						    type);
>>>> +
>>>> +	return *insn != AARCH64_BREAK_FAULT ? 0 : -EFAULT;
>>>> +}
>>>> +
>>>> +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
>>>> +		       void *old_addr, void *new_addr)
>>>> +{
>>>> +	int ret;
>>>> +	u32 old_insn;
>>>> +	u32 new_insn;
>>>> +	u32 replaced;
>>>> +	unsigned long offset = ~0UL;
>>>> +	enum aarch64_insn_branch_type branch_type;
>>>> +	char namebuf[KSYM_NAME_LEN];
>>>> +
>>>> +	if (!__bpf_address_lookup((unsigned long)ip, NULL, &offset, namebuf))
>>>> +		/* Only poking bpf text is supported. Since kernel function
>>>> +		 * entry is set up by ftrace, we rely on ftrace to poke kernel
>>>> +		 * functions.
>>>> +		 */
>>>> +		return -EINVAL;
>>>> +
>>>> +	/* bpf entry */
>>>> +	if (offset == 0UL)
>>>> +		/* skip to the nop instruction in bpf prog entry:
>>>> +		 * bti c	// if BTI enabled
>>>> +		 * mov x9, x30
>>>> +		 * nop
>>>> +		 */
>>>> +		ip = ip + POKE_OFFSET * AARCH64_INSN_SIZE;
>>>
>>> When is offset non-zero? is this ever called to patch other instructions, and
>>> could this ever be used to try to patch the BTI specifically?
>>
>> bpf_arch_text_poke() is also called to patch other instructions, for
>> example, bpf_tramp_image_put() calls this to skip calling fexit bpf progs:
>>
>> int err = bpf_arch_text_poke(im->ip_after_call, BPF_MOD_JUMP,
>>                              NULL, im->ip_epilogue);
>>
>>
>> Before this is called, a bpf trampoline looks like:
>>
>> 	[...]
>> ip_after_call:
>> 	nop // to be patched
>> 	bl <fexit prog>
>> ip_epilogue:
>> 	bti j
>> 	[...]
>>
>> After:
>> 	[...]
>> ip_after_call:
>> 	b <ip_epilogue> // patched
>> 	bl <fexit prog>
>> ip_epilogue:
>> 	bti j
>> 	[...]
>>
>>
>>> I strongly suspect we need a higher-level API to say "poke the patchable
>>> callsite in the prologue", rather than assuming that offset 0 always means
>>> that, or it'll be *very* easy for this to go wrong.
>>
>> Ah, bpf_arch_text_poke() only patches bpf progs, and the patch-site in
>> bpf prog prologue is constructed for bpf_arch_text_poke(), so we always
>> know the patch-site offset. There's no compiler generated instruction
>> here, so it seems to be not a problem.
> 
> I understand all of that. My point is that if anyone ever wants to use
> bpf_arch_text_poke() to poke the *actual* first instruction, this will go
> wrong.
> 
> I see that x86 does the same thing to skip ENDBR, so if this is just a fragile
> API, and people are expected to not do that, then fine. It's just confusing.
> 
>>>> +
>>>> +	if (poke_type == BPF_MOD_CALL)
>>>> +		branch_type = AARCH64_INSN_BRANCH_LINK;
>>>> +	else
>>>> +		branch_type = AARCH64_INSN_BRANCH_NOLINK;
>>>
>>> When is poke_type *not* BPF_MOD_CALL?
>>
>> The bpf_tramp_image_put() example above uses BPF_MOD_JUMP.
>>
>>> I assume that means BPF also uses this for non-ftrace reasons?
>>>
>>
>> This function is NOT used for ftrace patch-site. It's only used to patch
>> bpf image.
> 
> I understand that this isn't ftrace; I meant for the patch site we introduced
> in the BPF program that is compatible with the patch-site sequence that ftrace
> uses.
> 
>>>> +	if (gen_branch_or_nop(branch_type, ip, old_addr, &old_insn) < 0)
>>>> +		return -EFAULT;
>>>> +
>>>> +	if (gen_branch_or_nop(branch_type, ip, new_addr, &new_insn) < 0)
>>>> +		return -EFAULT;
>>>> +
>>>> +	mutex_lock(&text_mutex);
>>>> +	if (aarch64_insn_read(ip, &replaced)) {
>>>> +		ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	if (replaced != old_insn) {
>>>> +		ret = -EFAULT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/* We call aarch64_insn_patch_text_nosync() to replace instruction
>>>> +	 * atomically, so no other CPUs will fetch a half-new and half-old
>>>> +	 * instruction. But there is a chance that another CPU fetches the old
>>>> +	 * instruction after bpf_arch_text_poke() finishes, that is, different
>>>> +	 * CPUs may execute different versions of instructions at the same
>>>> +	 * time before the icache is synchronized by hardware.
>>>> +	 *
>>>> +	 * 1. when a new trampoline is attached, it is not an issue for
>>>> +	 *    different CPUs to jump to different trampolines temporarily.
>>>> +	 *
>>>> +	 * 2. when an old trampoline is freed, we should wait for all other
>>>> +	 *    CPUs to exit the trampoline and make sure the trampoline is no
>>>> +	 *    longer reachable. Since the bpf_tramp_image_put() function already
>>>> +	 *    uses percpu_ref and RCU tasks to do the sync, there is no need to
>>>> +	 *    call the sync interface here.
>>>> +	 */
>>>
>>> How is RCU used for that? It's not clear to me how that works for PREEMPT_RCU
>>> (which is the usual configuration for arm64), since we can easily be in a
>>> preemptible context, outside of an RCU read side critical section, yet call
>>> into a trampoline.
>>>
>>> I know that for livepatching we need to use stacktracing to ensure we've
>>> finished using code we'd like to free, and I can't immediately see how you can
>>> avoid that here. I'm suspicious that there's still a race where threads can
>>> enter the trampoline and it can be subsequently freed.
>>>
>>> For ftrace today we get away with entering the existing trampolines when not
>>> intended because those are statically allocated, and the race is caught when
>>> acquiring the ops inside the ftrace core code. This case is different because
>>> the CPU can fetch the instruction and execute that at any time, without any RCU
>>> involvement.
>>>
>>> Can you give more details on how the scheme described above works? How
>>> *exactly* do you ensure that threads which have entered the trampoline (and
>>> may have been immediately preempted by an interrupt) have returned? Which RCU
>>> mechanism are you using?
>>>
>>> If you can point me at where this is implemented I'm happy to take a look.
>>>
>> IIUC, a task RCU critical section ends at a voluntary context switch.
>> Since no voluntary context switch occurs in a prologue, when a task
>> RCU critical section ends, we can ensure no one is running in the
>> prologue [1].
>>
>> For the bpf trampoline, the scenario is similar, except that it may
>> sleep, so a reference count is increased when entering the trampoline
>> and decreased when exiting it; we can then wait for the reference
>> count to become zero to make sure no one is left in the sleepable
>> region [2].
>>
>> [1] https://lore.kernel.org/all/20140804192017.GA18337@linux.vnet.ibm.com/
>> [2]
>> https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com/
> 
> Ok. Am I correct in understanding that this means the BPF trampoline is only
> ever attached to BPF programs and those programs handle this around calling the
> BPF trampoline?
> 

A bpf trampoline could also be attached to a regular kernel function.
That scenario is handled by the same mechanism.
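
(Roughly, the scheme referenced above works like the sketch below. This is
based on my reading of the generic trampoline code in kernel/bpf/trampoline.c
rather than anything added by this patch, and the field name is from memory:)

#include <linux/bpf.h>
#include <linux/percpu-refcount.h>

/* Every task that enters the trampoline pins the image; teardown kills the
 * ref, waits for an RCU-tasks grace period (so nobody is still in the
 * non-sleepable part) and for the ref to drop to zero (so nobody is in the
 * sleepable part), and only then frees the image.
 */
static void tramp_enter_sketch(struct bpf_tramp_image *im)
{
	percpu_ref_get(&im->pcref);
}

static void tramp_exit_sketch(struct bpf_tramp_image *im)
{
	percpu_ref_put(&im->pcref);
}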

> Thanks,
> Mark.
> .


^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-05-26 10:06         ` Mark Rutland
@ 2022-05-26 14:48           ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-05-26 14:48 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo,
	cj.chengjian, huawei.libin, xiexiuqi, liwei391

On 5/26/2022 6:06 PM, Mark Rutland wrote:
> On Thu, May 26, 2022 at 05:45:03PM +0800, Xu Kuohai wrote:
>> On 5/25/2022 9:38 PM, Mark Rutland wrote:
>>> On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
>>>> Add ftrace direct support for arm64.
>>>>
>>>> 1. When there is custom trampoline only, replace the fentry nop to a
>>>>    jump instruction that jumps directly to the custom trampoline.
>>>>
>>>> 2. When ftrace trampoline and custom trampoline coexist, jump from
>>>>    fentry to ftrace trampoline first, then jump to custom trampoline
>>>>    when ftrace trampoline exits. The current unused register
>>>>    pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
>>>>    trampoline to custom trampoline.
>>>
>>> For those of us not all that familiar with BPF, can you explain *why* you want
>>> this? The above explains what the patch implements, but not why that's useful.
>>>
>>> e.g. is this just to avoid the overhead of the ops list processing in the
>>> regular ftrace code, or is the custom trampoline there to allow you to do
>>> something special?
>>
>> IIUC, ftrace direct call was designed to *remove* the unnecessary
>> overhead of saving regs completely [1][2].
> 
> Ok. My plan is to get rid of most of the register saving generally, so I think
> that aspect can be solved without direct calls.
Looking forward to your new solution.

> 
>> [1]
>> https://lore.kernel.org/all/20191022175052.frjzlnjjfwwfov64@ast-mbp.dhcp.thefacebook.com/
>> [2] https://lore.kernel.org/all/20191108212834.594904349@goodmis.org/
>>
>> This patch itself is just a variant of [3].
>>
>> [3] https://lore.kernel.org/all/20191108213450.891579507@goodmis.org/
>>
>>>
>>> There is another patch series on the list from some of your colleagues which
>>> uses dynamic trampolines to try to avoid that ops list overhead, and it's not
>>> clear to me whether these are trying to solve the largely same problem or
>>> something different. That other thread is at:
>>>
>>>   https://lore.kernel.org/linux-arm-kernel/20220316100132.244849-1-bobo.shaobowang@huawei.com/
>>>
>>> ... and I've added the relevant parties to CC here, since there doesn't seem to
>>> be any overlap in the CC lists of the two threads.
>>
>> We're not working to solve the same problem. The trampoline introduced
>> in this series helps us to monitor kernel function or another bpf prog
>> with bpf, and also helps us to use bpf prog like a normal kernel
>> function pointer.
> 
> Ok, but why is it necessary to have a special trampoline?
> 
> Is that *just* to avoid overhead, or do you need to do something special that
> the regular trampoline won't do?
> 

Sorry for not explaining the problem. The main bpf prog accepts only a
single argument 'ctx' in r1, so to allow kernel code to call a bpf prog
transparently, we need a trampoline to convert the native calling
convention into the BPF calling convention [1].

[1] https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org/

For example,

SEC("struct_ops/dctcp_state")
void BPF_PROG(dctcp_state, struct sock *sk, __u8 new_state)
{
    // do something
}

The above bpf prog will be compiled to something like this:

dctcp_state:
    r2 = *(u64 *)(r1 + 8)  // new_state
    r1 = *(u64 *)(r1 + 0)  // sk
    ...

It accepts only one argument 'ctx' in r1, and loads the actual arguments
'sk' and 'new_state' from r1 + 0 and r1 + 8, respectively. So before
calling this prog, we need to construct 'ctx' and store its address in r1.
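
In other words, the generated trampoline effectively does what the C sketch
below does. This is conceptual only: the real thing is JITed machine code,
and 'run_dctcp_state_prog' is a made-up name.

#include <linux/filter.h>
#include <net/sock.h>

static void run_dctcp_state_prog(const struct bpf_prog *prog,
				 struct sock *sk, __u8 new_state)
{
	u64 ctx[2];

	ctx[0] = (u64)(unsigned long)sk;	/* prog loads sk from ctx + 0 */
	ctx[1] = new_state;			/* and new_state from ctx + 8 */

	bpf_prog_run(prog, ctx);		/* prog receives &ctx[0] in r1 */
}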

>>>
>>> In that other thread I've suggested a general approach we could follow at:
>>>   
>>>   https://lore.kernel.org/linux-arm-kernel/YmGF%2FOpIhAF8YeVq@lakrids/
>>>
>>
>> Is it possible for a kernel function to take a long jump to common
>> trampoline when we get a huge kernel image?
> 
> It is possible, but only where the kernel Image itself is massive and the .text
> section exceeds 128MiB, at which point other things break anyway. Practically
> speaking, this doesn't happen for production kernels, or reasonable test
> kernels.
> 

So even for normal kernel functions, we need some way to construct and
destruct long jumps atomically and safely.
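
(For context, the 128MiB limit we keep referring to comes from the BL
encoding, which uses a signed 26-bit word offset. A standalone sketch of the
kind of range check patch 5 applies, with an assumed helper name:)

#include <stdbool.h>
#include <stdint.h>

#define BL_RANGE	(128 * 1024 * 1024)	/* +/-128MiB reach of arm64 BL */

static bool is_long_jump_sketch(uintptr_t patch_site, uintptr_t target)
{
	intptr_t offset = (intptr_t)(target - patch_site);

	return offset < -(intptr_t)BL_RANGE || offset >= (intptr_t)BL_RANGE;
}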

> I've been meaning to add some logic to detect this at boot time and disable
> ftrace (or at build time), since live patching would also be broken in that
> case.
>>>> As noted in that thread, I have a few concerns which equally apply here:
>>>
>>> * Due to the limited range of BL instructions, it's not always possible to
>>>   patch an ftrace call-site to branch to an arbitrary trampoline. The way this
>>>   works for ftrace today relies upon knowing the set of trampolines at
>>>   compile-time, and allocating module PLTs for those, and that approach cannot
>>>   work reliably for dynamically allocated trampolines.
>>
>> Currently patch 5 returns -ENOTSUPP when long jump is detected, so no
>> bpf trampoline is constructed for out of range patch-site:
>>
>> if (is_long_jump(orig_call, image))
>> 	return -ENOTSUPP;
> 
> Sure, my point is that in practice that means that (from the user's PoV) this
> may randomly fail to work, and I'd like something that we can ensure works
> consistently.
> 

OK, should I suspend this work until you finish refactoring ftrace?

>>>   I'd strongly prefer to avoid custom trampolines unless they're strictly
>>>   necessary for functional reasons, so that we can have this work reliably and
>>>   consistently.
>>
>> bpf trampoline is needed by bpf itself, not to replace ftrace trampolines.
> 
> As above, can you please let me know *why* specifically it is needed? Why can't
> we invoke the BPF code through the usual ops mechanism?
> 
> Is that to avoid overhead, or are there other functional reasons you need a
> special trampoline?
> 
>>> * If this is mostly about avoiding the ops list processing overhead, I believe
>>>   we can implement some custom ops support more generally in ftrace which would
>>>   still use a common trampoline but could directly call into those custom ops.
>>>   I would strongly prefer this over custom trampolines.
>>>
>>> * I'm looking to minimize the set of regs ftrace saves, and never save a full
>>>   pt_regs, since today we (incompletely) fill that with bogus values and cannot
>>>   acquire some state reliably (e.g. PSTATE). I'd like to avoid usage of pt_regs
>>>   unless necessary, and I don't want to add additional reliance upon that
>>>   structure.
>>
>> Even if such a common trampoline is used, bpf trampoline is still
>> necessary since we need to construct custom instructions to implement
>> bpf functions, for example, to implement kernel function pointer with a
>> bpf prog.
> 
> Sorry, but I'm struggling to understand this. What specifically do you need to
> do that means this can't use the same calling convention as the regular ops
> function pointers?
> 
> Thanks,
> Mark.
> .



^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-05-26 14:48           ` Xu Kuohai
@ 2022-06-06 16:35             ` Mark Rutland
  -1 siblings, 0 replies; 60+ messages in thread
From: Mark Rutland @ 2022-06-06 16:35 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo,
	cj.chengjian, huawei.libin, xiexiuqi, liwei391

On Thu, May 26, 2022 at 10:48:05PM +0800, Xu Kuohai wrote:
> On 5/26/2022 6:06 PM, Mark Rutland wrote:
> > On Thu, May 26, 2022 at 05:45:03PM +0800, Xu Kuohai wrote:
> >> On 5/25/2022 9:38 PM, Mark Rutland wrote:
> >>> On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
> >>>> Add ftrace direct support for arm64.
> >>>>
> >>>> 1. When there is custom trampoline only, replace the fentry nop to a
> >>>>    jump instruction that jumps directly to the custom trampoline.
> >>>>
> >>>> 2. When ftrace trampoline and custom trampoline coexist, jump from
> >>>>    fentry to ftrace trampoline first, then jump to custom trampoline
> >>>>    when ftrace trampoline exits. The current unused register
> >>>>    pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
> >>>>    trampoline to custom trampoline.
> >>>
> >>> For those of us not all that familiar with BPF, can you explain *why* you want
> >>> this? The above explains what the patch implements, but not why that's useful.
> >>>
> >>> e.g. is this just to avoid the overhead of the ops list processing in the
> >>> regular ftrace code, or is the custom trampoline there to allow you to do
> >>> something special?
> >>
> >> IIUC, ftrace direct call was designed to *remove* the unnecessary
> >> overhead of saving regs completely [1][2].
> > 
> > Ok. My plan is to get rid of most of the register saving generally, so I think
> > that aspect can be solved without direct calls.
> Looking forward to your new solution.

For the register saving rework, I have a WIP branch on my kernel.org repo:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/minimal-regs
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/ftrace/minimal-regs

I'm working on that at the moment along with a per-callsite ops implementation
that would avoid most of the need for custom trampolines (and work with branch
range limitations):

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/per-callsite-ops
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/ftrace/per-callsite-ops

> >> [1]
> >> https://lore.kernel.org/all/20191022175052.frjzlnjjfwwfov64@ast-mbp.dhcp.thefacebook.com/
> >> [2] https://lore.kernel.org/all/20191108212834.594904349@goodmis.org/
> >>
> >> This patch itself is just a variant of [3].
> >>
> >> [3] https://lore.kernel.org/all/20191108213450.891579507@goodmis.org/
> >>
> >>>
> >>> There is another patch series on the list from some of your colleagues which
> >>> uses dynamic trampolines to try to avoid that ops list overhead, and it's not
> >>> clear to me whether these are trying to solve the largely same problem or
> >>> something different. That other thread is at:
> >>>
> >>>   https://lore.kernel.org/linux-arm-kernel/20220316100132.244849-1-bobo.shaobowang@huawei.com/
> >>>
> >>> ... and I've added the relevant parties to CC here, since there doesn't seem to
> >>> be any overlap in the CC lists of the two threads.
> >>
> >> We're not working to solve the same problem. The trampoline introduced
> >> in this series helps us to monitor kernel function or another bpf prog
> >> with bpf, and also helps us to use bpf prog like a normal kernel
> >> function pointer.
> > 
> > Ok, but why is it necessary to have a special trampoline?
> > 
> > Is that *just* to avoid overhead, or do you need to do something special that
> > the regular trampoline won't do?
> > 
> 
> Sorry for not explaining the problem. The main bpf prog accepts only a
> single argument 'ctx' in r1, so to allow kernel code to call bpf prog
> transparently, we need a trampoline to convert native calling convention
> into BPF calling convention [1].
> 
> [1] https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org/

Thanks for the pointer; I'll go page that in.

> For example,
> 
> SEC("struct_ops/dctcp_state")
> void BPF_PROG(dctcp_state, struct sock *sk, __u8 new_state)
> {
>     // do something
> }
> 
> The above bpf prog will be compiled to something like this:
> 
> dctcp_state:
>     r2 = *(u64 *)(r1 + 8)  // new_state
>     r1 = *(u64 *)(r1 + 0)  // sk
>     ...
> 
> It accepts only one argument 'ctx' in r1, and loads the actual arguments
> 'sk' and 'new_state' from r1 + 0 and r1 + 8, respectively. So before
> calling this prog, we need to construct 'ctx' and store its address to r1.
> 
> >>>
> >>> In that other thread I've suggested a general approach we could follow at:
> >>>   
> >>>   https://lore.kernel.org/linux-arm-kernel/YmGF%2FOpIhAF8YeVq@lakrids/
> >>
> >> Is it possible for a kernel function to take a long jump to common
> >> trampoline when we get a huge kernel image?
> > 
> > It is possible, but only where the kernel Image itself is massive and the .text
> > section exceeds 128MiB, at which point other things break anyway. Practically
> > speaking, this doesn't happen for production kernels, or reasonable test
> > kernels.
> 
> So even for normal kernel functions, we need some way to construct and
> destruct long jumps atomically and safely.

My point was that case is unrealistic for production kernels, and utterly
broken anyway (and as below I intend to make ftrace detect this and mark itself
as broken).

FWIW, an allmodconfig kernel built with GCC 12.1.0 has a ~30MB .text segment,
so for realistic kernels we have plenty of headroom for normal functions to
reach the in-kernel trampoline.

> > I've been meaning to add some logic to detect this at boot time and disable
> > ftrace (or at build time), since live patching would also be broken in that
> > case.
> >>>> As noted in that thread, I have a few concerns which equally apply here:
> >>>
> >>> * Due to the limited range of BL instructions, it's not always possible to
> >>>   patch an ftrace call-site to branch to an arbitrary trampoline. The way this
> >>>   works for ftrace today relies upon knowing the set of trampolines at
> >>>   compile-time, and allocating module PLTs for those, and that approach cannot
> >>>   work reliably for dynamically allocated trampolines.
> >>
> >> Currently patch 5 returns -ENOTSUPP when long jump is detected, so no
> >> bpf trampoline is constructed for out of range patch-site:
> >>
> >> if (is_long_jump(orig_call, image))
> >> 	return -ENOTSUPP;
> > 
> > Sure, my point is that in practice that means that (from the user's PoV) this
> > may randomly fail to work, and I'd like something that we can ensure works
> > consistently.
> > 
> 
> OK, should I suspend this work until you finish refactoring ftrace?

Yes; I'd appreciate it if we could hold on this for a bit.

I think with some ground work we can avoid most of the painful edge cases and
might be able to avoid the need for custom trampolines.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 60+ messages in thread


* Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-06-06 16:35             ` Mark Rutland
@ 2022-06-09  4:27               ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-06-09  4:27 UTC (permalink / raw)
  To: Mark Rutland
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo,
	cj.chengjian, huawei.libin, xiexiuqi, liwei391

On 6/7/2022 12:35 AM, Mark Rutland wrote:
> On Thu, May 26, 2022 at 10:48:05PM +0800, Xu Kuohai wrote:
>> On 5/26/2022 6:06 PM, Mark Rutland wrote:
>>> On Thu, May 26, 2022 at 05:45:03PM +0800, Xu Kuohai wrote:
>>>> On 5/25/2022 9:38 PM, Mark Rutland wrote:
>>>>> On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
>>>>>> Add ftrace direct support for arm64.
>>>>>>
>>>>>> 1. When there is custom trampoline only, replace the fentry nop to a
>>>>>>    jump instruction that jumps directly to the custom trampoline.
>>>>>>
>>>>>> 2. When ftrace trampoline and custom trampoline coexist, jump from
>>>>>>    fentry to ftrace trampoline first, then jump to custom trampoline
>>>>>>    when ftrace trampoline exits. The current unused register
>>>>>>    pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
>>>>>>    trampoline to custom trampoline.
>>>>>
>>>>> For those of us not all that familiar with BPF, can you explain *why* you want
>>>>> this? The above explains what the patch implements, but not why that's useful.
>>>>>
>>>>> e.g. is this just to avoid the overhead of the ops list processing in the
>>>>> regular ftrace code, or is the custom trampoline there to allow you to do
>>>>> something special?
>>>>
>>>> IIUC, ftrace direct call was designed to *remove* the unnecessary
>>>> overhead of saving regs completely [1][2].
>>>
>>> Ok. My plan is to get rid of most of the register saving generally, so I think
>>> that aspect can be solved without direct calls.
>> Looking forward to your new solution.
> 
> For the register saving rework, I have a WIP branch on my kernel.org repo:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/minimal-regs
>   git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/ftrace/minimal-regs
> 
> I'm working on that at the moment along with a per-callsite ops implementation
> that would avoid most of the need for custom trampolines (and work with branch
> range limitations):
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/ftrace/per-callsite-ops
>   git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/ftrace/per-callsite-ops
> 
>>>> [1]
>>>> https://lore.kernel.org/all/20191022175052.frjzlnjjfwwfov64@ast-mbp.dhcp.thefacebook.com/
>>>> [2] https://lore.kernel.org/all/20191108212834.594904349@goodmis.org/
>>>>
>>>> This patch itself is just a variant of [3].
>>>>
>>>> [3] https://lore.kernel.org/all/20191108213450.891579507@goodmis.org/
>>>>
>>>>>
>>>>> There is another patch series on the list from some of your colleagues which
>>>>> uses dynamic trampolines to try to avoid that ops list overhead, and it's not
>>>>> clear to me whether these are trying to solve the largely same problem or
>>>>> something different. That other thread is at:
>>>>>
>>>>>   https://lore.kernel.org/linux-arm-kernel/20220316100132.244849-1-bobo.shaobowang@huawei.com/
>>>>>
>>>>> ... and I've added the relevant parties to CC here, since there doesn't seem to
>>>>> be any overlap in the CC lists of the two threads.
>>>>
>>>> We're not working to solve the same problem. The trampoline introduced
>>>> in this series helps us to monitor kernel function or another bpf prog
>>>> with bpf, and also helps us to use bpf prog like a normal kernel
>>>> function pointer.
>>>
>>> Ok, but why is it necessary to have a special trampoline?
>>>
>>> Is that *just* to avoid overhead, or do you need to do something special that
>>> the regular trampoline won't do?
>>>
>>
>> Sorry for not explaining the problem. The main bpf prog accepts only a
>> single argument 'ctx' in r1, so to allow kernel code to call bpf prog
>> transparently, we need a trampoline to convert native calling convention
>> into BPF calling convention [1].
>>
>> [1] https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org/
> 
> Thanks for the pointer; I'll go page that in.
> 
>> For example,
>>
>> SEC("struct_ops/dctcp_state")
>> void BPF_PROG(dctcp_state, struct sock *sk, __u8 new_state)
>> {
>>     // do something
>> }
>>
>> The above bpf prog will be compiled to something like this:
>>
>> dctcp_state:
>>     r2 = *(u64 *)(r1 + 8)  // new_state
>>     r1 = *(u64 *)(r1 + 0)  // sk
>>     ...
>>
>> It accepts only one argument 'ctx' in r1, and loads the actual arguments
>> 'sk' and 'new_state' from r1 + 0 and r1 + 8, respectively. So before
>> calling this prog, we need to construct 'ctx' and store its address in r1.
>>
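
A rough C sketch of what that conversion amounts to for the example above
(added for illustration; this is not code from the series, the names
example_trampoline and dctcp_state_prog are made up, and the real trampoline
is machine code emitted by the BPF JIT at attach time):

  struct sock;
  typedef unsigned long long u64;

  /* the JITed BPF program: takes a single pointer to the ctx array */
  void dctcp_state_prog(u64 *ctx);

  /* entered with the traced function's native calling convention */
  void example_trampoline(struct sock *sk, unsigned char new_state)
  {
          u64 ctx[2];

          ctx[0] = (unsigned long)sk;     /* read by the prog from r1 + 0 */
          ctx[1] = new_state;             /* read by the prog from r1 + 8 */

          dctcp_state_prog(ctx);          /* address of ctx passed as the single arg (BPF r1) */
  }
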
>>>>>
>>>>> In that other thread I've suggested a general approach we could follow at:
>>>>>   
>>>>>   https://lore.kernel.org/linux-arm-kernel/YmGF%2FOpIhAF8YeVq@lakrids/
>>>>
>>>> Is it possible for a kernel function to take a long jump to the common
>>>> trampoline when the kernel image is huge?
>>>
>>> It is possible, but only where the kernel Image itself is massive and the .text
>>> section exceeds 128MiB, at which point other things break anyway. Practically
>>> speaking, this doesn't happen for production kernels, or reasonable test
>>> kernels.
>>
>> So even for normal kernel functions, we need some way to construct and
>> destruct long jumps atomically and safely.
> 
> My point was that case is unrealistic for production kernels, and utterly
> broken anyway (and as below I intend to make ftrace detect this and mark itself
> as broken).
> 
> FWIW, an allmodconfig kernel built with GCC 12.1.0 has a ~30MB .text segment,
> so for realistic kernels we have plenty of headroom for normal functions to
> reach the in-kernel trampoline.
> 
>>> I've been meaning to add some logic to detect this at boot time and disable
>>> ftrace (or at build time), since live patching would also be broken in that
>>> case.
>>>>>> As noted in that thread, I have a few concerns which equally apply here:
>>>>>
>>>>> * Due to the limited range of BL instructions, it's not always possible to
>>>>>   patch an ftrace call-site to branch to an arbitrary trampoline. The way this
>>>>>   works for ftrace today relies upon knowing the set of trampolines at
>>>>>   compile-time, and allocating module PLTs for those, and that approach cannot
>>>>>   work reliably for dynamically allocated trampolines.
>>>>
>>>> Currently patch 5 returns -ENOTSUPP when long jump is detected, so no
>>>> bpf trampoline is constructed for out of range patch-site:
>>>>
>>>> if (is_long_jump(orig_call, image))
>>>> 	return -ENOTSUPP;
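
For reference, the constraint behind that check is simply the +/-128 MiB reach
of an arm64 BL instruction. A hedged sketch of such a range check (not the
actual is_long_jump() from patch 5, just the idea) looks like:

  /* true if a BL emitted at 'pc' cannot reach 'target' (beyond +/-128 MiB) */
  static inline int bl_out_of_range(unsigned long pc, unsigned long target)
  {
          long offset = (long)target - (long)pc;

          return offset < -(128L << 20) || offset >= (128L << 20);
  }
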
>>>
>>> Sure, my point is that in practice that means that (from the user's PoV) this
>>> may randomly fail to work, and I'd like something that we can ensure works
>>> consistently.
>>>
>>
>> OK, should I suspend this work until you finish refactoring ftrace?
> 
> Yes; I'd appreciate if we could hold on this for a bit.
> 
> I think with some ground work we can avoid most of the painful edge cases and
> might be able to avoid the need for custom trampolines.
> 

I've read your WIP code, but unfortunately I didn't find any mechanism to
replace the bpf trampoline in your code, sorry.

It looks like the bpf trampoline and ftrace work can be done at the same
time. I think for now we can just attach the bpf trampoline to bpf progs.
Once your ftrace work is done, we can add support for attaching the bpf
trampoline to regular kernel functions. Is this OK?

> Thanks,
> Mark.
> .

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-06-09  4:27               ` Xu Kuohai
@ 2022-08-09 17:03                 ` Florent Revest
  -1 siblings, 0 replies; 60+ messages in thread
From: Florent Revest @ 2022-08-09 17:03 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: Mark Rutland, bpf, linux-arm-kernel, linux-kernel, netdev,
	linux-kselftest, Catalin Marinas, Will Deacon, Steven Rostedt,
	Ingo Molnar, Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo,
	cj.chengjian, huawei.libin, xiexiuqi, liwei391

On Thu, Jun 9, 2022 at 6:27 AM Xu Kuohai <xukuohai@huawei.com> wrote:
> On 6/7/2022 12:35 AM, Mark Rutland wrote:
> > On Thu, May 26, 2022 at 10:48:05PM +0800, Xu Kuohai wrote:
> >> On 5/26/2022 6:06 PM, Mark Rutland wrote:
> >>> On Thu, May 26, 2022 at 05:45:03PM +0800, Xu Kuohai wrote:
> >>>> On 5/25/2022 9:38 PM, Mark Rutland wrote:
> >>>>> On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
> >>>>>> As noted in that thread, I have a few concerns which equally apply here:
> >>>>>
> >>>>> * Due to the limited range of BL instructions, it's not always possible to
> >>>>>   patch an ftrace call-site to branch to an arbitrary trampoline. The way this
> >>>>>   works for ftrace today relies upon knowing the set of trampolines at
> >>>>>   compile-time, and allocating module PLTs for those, and that approach cannot
> >>>>>   work reliably for dynamically allocated trampolines.
> >>>>
> >>>> Currently patch 5 returns -ENOTSUPP when long jump is detected, so no
> >>>> bpf trampoline is constructed for out of range patch-site:
> >>>>
> >>>> if (is_long_jump(orig_call, image))
> >>>>    return -ENOTSUPP;
> >>>
> >>> Sure, my point is that in practice that means that (from the user's PoV) this
> >>> may randomly fail to work, and I'd like something that we can ensure works
> >>> consistently.
> >>>
> >>
> >> OK, should I suspend this work until you finish refactoring ftrace?
> >
> > Yes; I'd appreciate if we could hold on this for a bit.
> >
> > I think with some ground work we can avoid most of the painful edge cases and
> > might be able to avoid the need for custom trampolines.
> >
>
> I've read your WIP code, but unfortunately I didn't find any mechanism to
> replace the bpf trampoline in your code, sorry.
>
> It looks like the bpf trampoline and ftrace work can be done at the same
> time. I think for now we can just attach the bpf trampoline to bpf progs.
> Once your ftrace work is done, we can add support for attaching the bpf
> trampoline to regular kernel functions. Is this OK?

Hey Mark and Xu! :)

I'm interested in this feature too and would be happy to help.

I've been trying to understand what you both have in mind to figure out a way
forward; please correct me if I got anything wrong! :)


It looks like, currently, there are three places where an indirection to BPF is
technically possible. Chronologically these are:

- the function's patchsite (currently there are 2 nops, this could become 4
  nops with Mark's series on per call-site ops)

- the ftrace ops (currently called by iterating over a global list but could be
  called more directly with Mark's series on per-call-site ops or by
  dynamically generated branches with Wang's series on dynamic trampolines)

- a ftrace trampoline tail call (currently, this is after restoring a full
  pt_regs but this could become an args only restoration with Mark's series on
  DYNAMIC_FTRACE_WITH_ARGS)


If we first consider the situation when only a BPF program is attached to a
kernel function:
- Using the patchsite for indirection (proposed by Xu, same as on x86)
   Pros:
   - We have BPF trampolines anyway because they are required for orthogonal
     features such as calling BPF programs as functions, so jumping into that
     existing JITed code is straightforward
   - This has the minimum overhead (eg: these trampolines only save the actual
     number of args used by the function in ctx and avoid indirect calls)
   Cons:
   - If the BPF trampoline is JITed outside BL's limits, attachment can
     randomly fail

- Using a ftrace op for indirection (proposed by Mark)
  Pros:
  - BPF doesn't need to care about BL's range, ftrace_caller will be in range
  Cons:
  - The ftrace trampoline would first save all args in an ftrace_regs only for
    the BPF op to then re-save them in a BPF ctx array (as per BPF calling
    convention) so we'd effectively have to do the work of saving args twice
  - BPF currently uses DYNAMIC_FTRACE_WITH_DIRECT_CALLS APIs. Either arm64
    should implement DIRECT_CALLS with... an indirect call :) (that is, the
    arch_ftrace_set_direct_caller op would turn back its ftrace_regs into
    arguments for the BPF trampoline) or BPF would need to use a different
    ftrace API just on arm64 (to define new ops, which, unless they were
    dynamically JITed, wouldn't be as performant as the existing BPF
    trampolines)

- Using a ftrace trampoline tail call for indirection (not discussed yet iiuc)
  Pros:
  - BPF also doesn't need to care about BL's range
  - This also leverages the existing BPF trampolines
  Cons:
  - This also does the work of saving/restoring arguments twice
  - DYNAMIC_FTRACE_WITH_DIRECT_CALLS depends on DYNAMIC_FTRACE_WITH_REGS now
    although in practice the registers kept by DYNAMIC_FTRACE_WITH_ARGS
    should be enough to call BPF trampolines

If we consider the situation when both ftrace ops and BPF programs are attached
to a kernel function:
- Using the patchsite for indirection can't solve this

- Using a ftrace op for indirection (proposed by Mark) or using a ftrace
  trampoline tail call as an indirection (proposed by Xu, same as on x86) have
  the same pros & cons as in the BPF only situation except that this time we
  pay the cost of registers saving twice for good reasons (we need args in both
  ftrace_regs and the BPF ctx array formats anyway)


Unless I'm missing something, it sounds like the following approach would work:
- Always patch patchsites with calls to ftrace trampolines (within BL ranges)
- Always go through ops and have arch_ftrace_set_direct_caller set
  ftrace_regs->direct_call (instead of pt_regs->orig_x0 in this patch)
- If ftrace_regs->direct_call != 0 at the end of the ftrace trampoline, tail
  call it
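
A rough C sketch of that flow (hedged: the struct layout and the names below
are invented for illustration, and the tail call at the end of the real ftrace
trampoline would be assembly rather than C):

  /* hypothetical, simplified view of the register-save area */
  struct ftrace_regs_sketch {
          unsigned long args[8];          /* x0-x7 saved by the trampoline */
          unsigned long direct_call;      /* 0 when no direct caller is set */
  };

  /* what a BPF ftrace op would do: ask the ftrace trampoline to hand
   * control to the JITed BPF trampoline on its way out */
  static void set_direct_caller(struct ftrace_regs_sketch *fregs,
                                unsigned long bpf_trampoline)
  {
          fregs->direct_call = bpf_trampoline;
  }

  /* conceptual end of the ftrace trampoline: restore the args, then
   * either return to the traced function or tail call the direct caller */
  static void trampoline_exit(struct ftrace_regs_sketch *fregs)
  {
          void (*direct)(void) = (void (*)(void))fregs->direct_call;

          if (direct)
                  direct();       /* a BR in assembly, i.e. a tail call */
  }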

Once Mark's series on DYNAMIC_FTRACE_WITH_ARGS is merged, we would need to have
DYNAMIC_FTRACE_WITH_DIRECT_CALLS
  depend on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
BPF trampolines (the only users of this API now) only care about args to the
attachment point anyway, so I think this would work transparently?

Once Mark's series on per-callsite ops is merged, the second step (going
through ops) would be significantly faster in the situation where only one
program is used, therefore one arch_ftrace_set_direct_caller op.

Once Wang's series on dynamic trampolines is merged, the second step (going
through ops) would also be significantly faster in the case when multiple ops
are attached.


What are your thoughts? If this sounds somewhat sane, I'm happy to help out
with the implementation as well :)

Thanks!
Florent

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
  2022-08-09 17:03                 ` Florent Revest
@ 2022-08-10  8:10                   ` Xu Kuohai
  -1 siblings, 0 replies; 60+ messages in thread
From: Xu Kuohai @ 2022-08-10  8:10 UTC (permalink / raw)
  To: Florent Revest
  Cc: Mark Rutland, bpf, linux-arm-kernel, linux-kernel, netdev,
	linux-kselftest, Catalin Marinas, Will Deacon, Steven Rostedt,
	Ingo Molnar, Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, David S . Miller, Hideaki YOSHIFUJI,
	David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
	hpa, Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo,
	cj.chengjian, huawei.libin, xiexiuqi, liwei391

On 8/10/2022 1:03 AM, Florent Revest wrote:
> On Thu, Jun 9, 2022 at 6:27 AM Xu Kuohai <xukuohai@huawei.com> wrote:
>> On 6/7/2022 12:35 AM, Mark Rutland wrote:
>>> On Thu, May 26, 2022 at 10:48:05PM +0800, Xu Kuohai wrote:
>>>> On 5/26/2022 6:06 PM, Mark Rutland wrote:
>>>>> On Thu, May 26, 2022 at 05:45:03PM +0800, Xu Kuohai wrote:
>>>>>> On 5/25/2022 9:38 PM, Mark Rutland wrote:
>>>>>>> On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
>>>>>>>> As noted in that thread, I have a few concerns which equally apply here:
>>>>>>>
>>>>>>> * Due to the limited range of BL instructions, it's not always possible to
>>>>>>>    patch an ftrace call-site to branch to an arbitrary trampoline. The way this
>>>>>>>    works for ftrace today relies upon knowingthe set of trampolines at
>>>>>>>    compile-time, and allocating module PLTs for those, and that approach cannot
>>>>>>>    work reliably for dynanically allocated trampolines.
>>>>>>
>>>>>> Currently patch 5 returns -ENOTSUPP when long jump is detected, so no
>>>>>> bpf trampoline is constructed for out of range patch-site:
>>>>>>
>>>>>> if (is_long_jump(orig_call, image))
>>>>>>     return -ENOTSUPP;
>>>>>
>>>>> Sure, my point is that in practice that means that (from the user's PoV) this
>>>>> may randomly fail to work, and I'd like something that we can ensure works
>>>>> consistently.
>>>>>
>>>>
>>>> OK, should I suspend this work until you finish refactoring ftrace?
>>>
>>> Yes; I'd appreciate if we could hold on this for a bit.
>>>
>>> I think with some ground work we can avoid most of the painful edge cases and
>>> might be able to avoid the need for custom trampolines.
>>>
>>
>> I've read your WIP code, but unfortunately I didn't find any mechanism to
>> replace the bpf trampoline in your code, sorry.
>>
>> It looks like the bpf trampoline and ftrace work can be done at the same
>> time. I think for now we can just attach the bpf trampoline to bpf progs.
>> Once your ftrace work is done, we can add support for attaching the bpf
>> trampoline to regular kernel functions. Is this OK?
> 
> Hey Mark and Xu! :)
> 
> I'm interested in this feature too and would be happy to help.
> 
> I've been trying to understand what you both have in mind to figure out a way
> forward, please correct me if I got anything wrong! :)
> 
> 
> It looks like, currently, there are three places where an indirection to BPF is
> technically possible. Chronologically these are:
> 
> - the function's patchsite (currently there are 2 nops, this could become 4
>    nops with Mark's series on per call-site ops)
> 
> - the ftrace ops (currently called by iterating over a global list but could be
>    called more directly with Mark's series on per-call-site ops or by
>    dynamically generated branches with Wang's series on dynamic trampolines)
> 
> - a ftrace trampoline tail call (currently, this is after restoring a full
>    pt_regs but this could become an args only restoration with Mark's series on
>    DYNAMIC_FTRACE_WITH_ARGS)
> 
> 
> If we first consider the situation when only a BPF program is attached to a
> kernel function:
> - Using the patchsite for indirection (proposed by Xu, same as on x86)
>     Pros:
>     - We have BPF trampolines anyway because they are required for orthogonal
>       features such as calling BPF programs as functions, so jumping into that
>       existing JITed code is straightforward
>     - This has the minimum overhead (eg: these trampolines only save the actual
>       number of args used by the function in ctx and avoid indirect calls)
>     Cons:
>     - If the BPF trampoline is JITed outside BL's limits, attachment can
>       randomly fail
> 
> - Using a ftrace op for indirection (proposed by Mark)
>    Pros:
>    - BPF doesn't need to care about BL's range, ftrace_caller will be in range
>    Cons:
>    - The ftrace trampoline would first save all args in an ftrace_regs only for
>      the BPF op to then re-save them in a BPF ctx array (as per BPF calling
>      convention) so we'd effectively have to do the work of saving args twice
>    - BPF currently uses DYNAMIC_FTRACE_WITH_DIRECT_CALLS APIs. Either arm64
>      should implement DIRECT_CALLS with... an indirect call :) (that is, the
>      arch_ftrace_set_direct_caller op would turn back its ftrace_regs into
>      arguments for the BPF trampoline) or BPF would need to use a different
>      ftrace API just on arm64 (to define new ops, which, unless they were
>      dynamically JITed, wouldn't be as performant as the existing BPF
>      trampolines)
> 
> - Using a ftrace trampoline tail call for indirection (not discussed yet iiuc)
>    Pros:
>    - BPF also doesn't need to care about BL's range
>    - This also leverages the existing BPF trampolines
>    Cons:
>    - This also does the work of saving/restoring arguments twice
>    - DYNAMIC_FTRACE_WITH_DIRECT_CALLS depends on DYNAMIC_FTRACE_WITH_REGS now
>      although in practice the registers kept by DYNAMIC_FTRACE_WITH_ARGS
>      should be enough to call BPF trampolines
> 
> If we consider the situation when both ftrace ops and BPF programs are attached
> to a kernel function:
> - Using the patchsite for indirection can't solve this
> 
> - Using a ftrace op for indirection (proposed by Mark) or using a ftrace
>    trampoline tail call as an indirection (proposed by Xu, same as on x86) have
>    the same pros & cons as in the BPF only situation except that this time we
>    pay the cost of registers saving twice for good reasons (we need args in both
>    ftrace_regs and the BPF ctx array formats anyway)
> 
> 
> Unless I'm missing something, it sounds like the following approach would work:
> - Always patch patchsites with calls to ftrace trampolines (within BL ranges)
> - Always go through ops and have arch_ftrace_set_direct_caller set
>    ftrace_regs->direct_call (instead of pt_regs->orig_x0 in this patch)
> - If ftrace_regs->direct_call != 0 at the end of the ftrace trampoline, tail
>    call it
> 
> Once Mark's series on DYNAMIC_FTRACE_WITH_ARGS is merged, we would need to have
> DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>    depend on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
> BPF trampolines (the only users of this API now) only care about args to the
> attachment point anyway, so I think this would work transparently?
> 
> Once Mark's series on per-callsite ops is merged, the second step (going
> through ops) would be significantly faster in the situation where only one
> program is used, therefore one arch_ftrace_set_direct_caller op.
> 
> Once Wang's series on dynamic trampolines is merged, the second step (going
> through ops) would also be significantly faster in the case when multiple ops
> are attached.
> 
> 
> What are your thoughts? If this sounds somewhat sane, I'm happy to help out
> with the implementation as well :)
> 

Hi Florent,

I'm struggling with how to attach a bpf trampoline to regular kernel functions. I
think your suggestion is fine. Thanks for the help!

> Thanks!
> Florent
> .


^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2022-08-10  8:12 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-18 13:16 [PATCH bpf-next v5 0/6] bpf trampoline for arm64 Xu Kuohai
2022-05-18 13:16 ` Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support Xu Kuohai
2022-05-18 13:16   ` Xu Kuohai
2022-05-23  1:39   ` KP Singh
2022-05-23  1:39     ` KP Singh
2022-05-25 13:38   ` Mark Rutland
2022-05-25 13:38     ` Mark Rutland
2022-05-26  9:45     ` Xu Kuohai
2022-05-26  9:45       ` Xu Kuohai
2022-05-26 10:06       ` Mark Rutland
2022-05-26 10:06         ` Mark Rutland
2022-05-26 14:48         ` Xu Kuohai
2022-05-26 14:48           ` Xu Kuohai
2022-06-06 16:35           ` Mark Rutland
2022-06-06 16:35             ` Mark Rutland
2022-06-09  4:27             ` Xu Kuohai
2022-06-09  4:27               ` Xu Kuohai
2022-08-09 17:03               ` Florent Revest
2022-08-09 17:03                 ` Florent Revest
2022-08-10  8:10                 ` Xu Kuohai
2022-08-10  8:10                   ` Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 2/6] ftrace: Fix deadloop caused by direct call in ftrace selftest Xu Kuohai
2022-05-18 13:16   ` Xu Kuohai
2022-05-25 13:43   ` Mark Rutland
2022-05-25 13:43     ` Mark Rutland
2022-05-26  9:45     ` Xu Kuohai
2022-05-26  9:45       ` Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 3/6] bpf: Remove is_valid_bpf_tramp_flags() Xu Kuohai
2022-05-18 13:16   ` Xu Kuohai
2022-05-25 13:45   ` Mark Rutland
2022-05-25 13:45     ` Mark Rutland
2022-05-26  9:45     ` Xu Kuohai
2022-05-26  9:45       ` Xu Kuohai
2022-05-26 10:12       ` Mark Rutland
2022-05-26 10:12         ` Mark Rutland
2022-05-26 14:46         ` Xu Kuohai
2022-05-26 14:46           ` Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 4/6] bpf, arm64: Impelment bpf_arch_text_poke() for arm64 Xu Kuohai
2022-05-18 13:16   ` Xu Kuohai
2022-05-23  1:41   ` KP Singh
2022-05-23  1:41     ` KP Singh
2022-05-25 14:10   ` Mark Rutland
2022-05-25 14:10     ` Mark Rutland
2022-05-26  9:45     ` Xu Kuohai
2022-05-26  9:45       ` Xu Kuohai
2022-05-26 10:34       ` Mark Rutland
2022-05-26 10:34         ` Mark Rutland
2022-05-26 14:47         ` Xu Kuohai
2022-05-26 14:47           ` Xu Kuohai
2022-05-18 13:16 ` [PATCH bpf-next v5 5/6] bpf, arm64: bpf trampoline " Xu Kuohai
2022-05-18 13:16   ` Xu Kuohai
2022-05-20 21:18   ` Alexei Starovoitov
2022-05-20 21:18     ` Alexei Starovoitov
2022-05-23 16:09     ` Mark Rutland
2022-05-23 16:09       ` Mark Rutland
2022-05-23  1:36   ` KP Singh
2022-05-23  1:36     ` KP Singh
2022-05-18 13:16 ` [PATCH bpf-next v5 6/6] selftests/bpf: Fix trivial typo in fentry_fexit.c Xu Kuohai
2022-05-18 13:16   ` Xu Kuohai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.