* [RFC PATCH bpf-next 0/3] bpf, x64: Fix tailcall hierarchy
@ 2023-10-05 14:58 Leon Hwang
  2023-10-05 14:58 ` [RFC PATCH bpf-next 1/3] " Leon Hwang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Leon Hwang @ 2023-10-05 14:58 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, maciej.fijalkowski, jakub, iii, hengqi.chen,
	hffilwlqm

This patchset fixes a tailcall hierarchy issue.

The issue was confirmed in the discussion of "bpf, x64: Fix tailcall infinite
loop"[0].

For details of the fix, please read the first patch.

Currently, this issue is only resolved on x86. The equivalent issues on arm64,
s390x and loongarch still need to be resolved, which is why the CI pipeline
fails for this series.

[0] https://lore.kernel.org/bpf/6203dd01-789d-f02c-5293-def4c1b18aef@gmail.com/

Leon Hwang (3):
  bpf, x64: Fix tailcall hierarchy
  bpf, x64: Load tail_call_cnt pointer
  selftests/bpf: Add testcases for tailcall hierarchy fixing

 arch/x86/net/bpf_jit_comp.c                   | 136 ++++---
 .../selftests/bpf/prog_tests/tailcalls.c      | 384 ++++++++++++++++++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy1.c   |  34 ++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy2.c   |  55 +++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy3.c   |  46 +++
 5 files changed, 603 insertions(+), 52 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy1.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy2.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy3.c


base-commit: 2e7e9faf9a5d46788bf7a4d07c6c1caf57367d23
-- 
2.41.0



* [RFC PATCH bpf-next 1/3] bpf, x64: Fix tailcall hierarchy
  2023-10-05 14:58 [RFC PATCH bpf-next 0/3] bpf, x64: Fix tailcall hierarchy Leon Hwang
@ 2023-10-05 14:58 ` Leon Hwang
  2023-10-05 18:05   ` Stanislav Fomichev
  2023-10-05 14:58 ` [RFC PATCH bpf-next 2/3] bpf, x64: Load tail_call_cnt pointer Leon Hwang
  2023-10-05 14:58 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add testcases for tailcall hierarchy fixing Leon Hwang
  2 siblings, 1 reply; 8+ messages in thread
From: Leon Hwang @ 2023-10-05 14:58 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, maciej.fijalkowski, jakub, iii, hengqi.chen,
	hffilwlqm

Since commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), tailcalls on x64 have worked better than before.

Since commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tailcalls can be used in BPF subprograms on x64.

But consider the following case:

1. More than one subprogram is called in a bpf program.
2. The tailcalls in those subprograms call the bpf program itself.

Because tail_call_cnt is not back-propagated, a tailcall hierarchy comes up,
and the MAX_TAIL_CALL_CNT limit does not work for this case.
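
A minimal sketch of such a hierarchy, mirroring the selftest added later in
this series (userspace is assumed to set jmp_table[0] to the entry prog):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

static __noinline
int subprog_tail(struct __sk_buff *skb)
{
	/* Tail-call back to the entry prog through jmp_table[0]. */
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("tc")
int entry(struct __sk_buff *skb)
{
	/* Two call sites, each of which may tail-call back to entry. */
	subprog_tail(skb);
	subprog_tail(skb);
	return 1;
}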

In a tail call context, tail_call_cnt is propagated between BPF subprograms
via the stack and the rax register. Propagating a tail_call_cnt pointer by
the same stack-and-rax mechanism instead makes tail_call_cnt behave like a
global variable, so that the MAX_TAIL_CALL_CNT limit also works for tailcall
hierarchy cases.

Before jumping to another bpf prog, tail_call_cnt is loaded through the
pointer and compared with MAX_TAIL_CALL_CNT; if the limit has not been
reached, tail_call_cnt is incremented through the pointer.
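
In C-like pseudocode, the emitted check changes roughly as follows (a sketch
only; tail_call_cnt_ptr is the name used here for the pointer carried in rax):

	/* before: each frame works on its own copy of the counter */
	if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
		goto out;

	/* after: the counter lives in the entry prog's frame and is
	 * reached through the propagated pointer
	 */
	if ((*tail_call_cnt_ptr)++ >= MAX_TAIL_CALL_CNT)
		goto out;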

But where is tail_call_cnt stored?

It is stored on the stack of the topmost bpf prog of the hierarchy, like:

 |  STACK  |
 +---------+ RBP
 |         |
 |         |
 |         |
 | tcc_ptr |
 |   tcc   |
 |   rbx   |
 +---------+ RSP

Why not back-propagate tail_call_cnt?

Because back-propagating it is fragile. It does not work well with the
following case:

int prog1();
int prog2();

prog1 is the tail caller, and prog2 is the tail callee. If tail_call_cnt is
back-propagated in the epilogue of prog2, can prog2 still run standalone at
the same time? The answer is no: a register would be clobbered by the
back-propagation, which would crash the kernel.

Can tail_call_cnt be stored somewhere other than the stack of the bpf prog?

I have not found a better place to store it. Storing it in ctx or on the
stack of the bpf prog's caller does not work.

Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 120 +++++++++++++++++++++++-------------
 1 file changed, 76 insertions(+), 44 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8c10d9abc2394..8ad6368353c2b 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -256,7 +256,7 @@ struct jit_context {
 /* Number of bytes emit_patch() needs to generate instructions */
 #define X86_PATCH_SIZE		5
 /* Number of bytes that will be skipped on tailcall */
-#define X86_TAIL_CALL_OFFSET	(11 + ENDBR_INSN_SIZE)
+#define X86_TAIL_CALL_OFFSET	(24 + ENDBR_INSN_SIZE)
 
 static void push_r12(u8 **pprog)
 {
@@ -304,6 +304,25 @@ static void pop_callee_regs(u8 **pprog, bool *callee_regs_used)
 	*pprog = prog;
 }
 
+static void emit_nops(u8 **pprog, int len)
+{
+	u8 *prog = *pprog;
+	int i, noplen;
+
+	while (len > 0) {
+		noplen = len;
+
+		if (noplen > ASM_NOP_MAX)
+			noplen = ASM_NOP_MAX;
+
+		for (i = 0; i < noplen; i++)
+			EMIT1(x86_nops[noplen][i]);
+		len -= noplen;
+	}
+
+	*pprog = prog;
+}
+
 /*
  * Emit x86-64 prologue code for BPF program.
  * bpf_tail_call helper will skip the first X86_TAIL_CALL_OFFSET bytes
@@ -313,24 +332,15 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 			  bool tail_call_reachable, bool is_subprog,
 			  bool is_exception_cb)
 {
+	int tcc_ptr_off = round_up(stack_depth, 8) + 8;
+	int tcc_off = tcc_ptr_off + 8;
 	u8 *prog = *pprog;
 
 	/* BPF trampoline can be made to work without these nops,
 	 * but let's waste 5 bytes for now and optimize later
 	 */
 	EMIT_ENDBR();
-	memcpy(prog, x86_nops[5], X86_PATCH_SIZE);
-	prog += X86_PATCH_SIZE;
-	if (!ebpf_from_cbpf) {
-		if (tail_call_reachable && !is_subprog)
-			/* When it's the entry of the whole tailcall context,
-			 * zeroing rax means initialising tail_call_cnt.
-			 */
-			EMIT2(0x31, 0xC0); /* xor eax, eax */
-		else
-			/* Keep the same instruction layout. */
-			EMIT2(0x66, 0x90); /* nop2 */
-	}
+	emit_nops(&prog, X86_PATCH_SIZE);
 	/* Exception callback receives FP as third parameter */
 	if (is_exception_cb) {
 		EMIT3(0x48, 0x89, 0xF4); /* mov rsp, rsi */
@@ -347,15 +357,52 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
 		EMIT1(0x55);             /* push rbp */
 		EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
 	}
+	if (!ebpf_from_cbpf) {
+		if (tail_call_reachable && !is_subprog) {
+			/* Make rax as ptr that points to tail_call_cnt. */
+			EMIT3(0x48, 0x89, 0xE8);          /* mov rax, rbp */
+			EMIT2_off32(0x48, 0x2D, tcc_off); /* sub rax, tcc_off */
+			/* When it's the entry of the whole tail call context,
+			 * storing 0 means initialising tail_call_cnt.
+			 */
+			EMIT2_off32(0xC7, 0x00, 0);       /* mov dword ptr [rax], 0 */
+		} else {
+			/* Keep the same instruction layout. */
+			emit_nops(&prog, 3);
+			emit_nops(&prog, 6);
+			emit_nops(&prog, 6);
+		}
+	}
 
 	/* X86_TAIL_CALL_OFFSET is here */
 	EMIT_ENDBR();
 
+	if (tail_call_reachable) {
+		/* Here, rax is tail_call_cnt_ptr. */
+		if (!is_subprog) {
+			/* Because pushing tail_call_cnt_ptr may cover tail_call_cnt,
+			 * it's required to store tail_call_cnt before storing
+			 * tail_call_cnt_ptr.
+			 */
+			EMIT1(0x50);                       /* push rax */
+			EMIT2(0x8B, 0x00);                 /* mov eax, dword ptr [rax] */
+			EMIT2_off32(0x89, 0x85, -tcc_off); /* mov dword ptr [rbp - tcc_off], eax */
+			EMIT1(0x58);                       /* pop rax */
+			/* mov qword ptr [rbp - tcc_ptr_off], rax */
+			EMIT3_off32(0x48, 0x89, 0x85, -tcc_ptr_off);
+		} else {
+			/* As for subprog, tail_call_cnt is meaningless. Storing
+			 * tail_call_cnt_ptr is enough.
+			 */
+			/* mov qword ptr [rbp - tcc_ptr_off], rax */
+			EMIT3_off32(0x48, 0x89, 0x85, -tcc_ptr_off);
+		}
+		/* Reserve 16 bytes for tail_call_cnt_ptr and tail_call_cnt. */
+		stack_depth += 16;
+	}
 	/* sub rsp, rounded_stack_depth */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
-	if (tail_call_reachable)
-		EMIT1(0x50);         /* push rax */
 	*pprog = prog;
 }
 
@@ -510,7 +557,7 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
 					u32 stack_depth, u8 *ip,
 					struct jit_context *ctx)
 {
-	int tcc_off = -4 - round_up(stack_depth, 8);
+	int tcc_ptr_off = -8 - round_up(stack_depth, 8);
 	u8 *prog = *pprog, *start = *pprog;
 	int offset;
 
@@ -535,13 +582,12 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
 	 * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
 	 *	goto out;
 	 */
-	EMIT2_off32(0x8B, 0x85, tcc_off);         /* mov eax, dword ptr [rbp - tcc_off] */
-	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);     /* cmp eax, MAX_TAIL_CALL_CNT */
+	EMIT3_off32(0x48, 0x8B, 0x85, tcc_ptr_off); /* mov rax, qword ptr [rbp - tcc_ptr_off] */
+	EMIT3(0x83, 0x38, MAX_TAIL_CALL_CNT);     /* cmp dword ptr [rax], MAX_TAIL_CALL_CNT */
 
 	offset = ctx->tail_call_indirect_label - (prog + 2 - start);
 	EMIT2(X86_JAE, offset);                   /* jae out */
-	EMIT3(0x83, 0xC0, 0x01);                  /* add eax, 1 */
-	EMIT2_off32(0x89, 0x85, tcc_off);         /* mov dword ptr [rbp - tcc_off], eax */
+	EMIT3(0x83, 0x00, 0x01);                  /* add dword ptr [rax], 1 */
 
 	/* prog = array->ptrs[index]; */
 	EMIT4_off32(0x48, 0x8B, 0x8C, 0xD6,       /* mov rcx, [rsi + rdx * 8 + offsetof(...)] */
@@ -563,6 +609,9 @@ static void emit_bpf_tail_call_indirect(struct bpf_prog *bpf_prog,
 		pop_callee_regs(&prog, callee_regs_used);
 	}
 
+	/* pop tail_call_cnt */
+	EMIT1(0x58);                              /* pop rax */
+	/* pop tail_call_cnt_ptr */
 	EMIT1(0x58);                              /* pop rax */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4,     /* add rsp, sd */
@@ -591,7 +640,7 @@ static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
 				      bool *callee_regs_used, u32 stack_depth,
 				      struct jit_context *ctx)
 {
-	int tcc_off = -4 - round_up(stack_depth, 8);
+	int tcc_ptr_off = -8 - round_up(stack_depth, 8);
 	u8 *prog = *pprog, *start = *pprog;
 	int offset;
 
@@ -599,13 +648,12 @@ static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
 	 * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
 	 *	goto out;
 	 */
-	EMIT2_off32(0x8B, 0x85, tcc_off);             /* mov eax, dword ptr [rbp - tcc_off] */
-	EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT);         /* cmp eax, MAX_TAIL_CALL_CNT */
+	EMIT3_off32(0x48, 0x8B, 0x85, tcc_ptr_off);   /* mov rax, qword ptr [rbp - tcc_ptr_off] */
+	EMIT3(0x83, 0x38, MAX_TAIL_CALL_CNT);         /* cmp dword ptr [rax], MAX_TAIL_CALL_CNT */
 
 	offset = ctx->tail_call_direct_label - (prog + 2 - start);
 	EMIT2(X86_JAE, offset);                       /* jae out */
-	EMIT3(0x83, 0xC0, 0x01);                      /* add eax, 1 */
-	EMIT2_off32(0x89, 0x85, tcc_off);             /* mov dword ptr [rbp - tcc_off], eax */
+	EMIT3(0x83, 0x00, 0x01);                      /* add dword ptr [rax], 1 */
 
 	poke->tailcall_bypass = ip + (prog - start);
 	poke->adj_off = X86_TAIL_CALL_OFFSET;
@@ -622,6 +670,9 @@ static void emit_bpf_tail_call_direct(struct bpf_prog *bpf_prog,
 		pop_callee_regs(&prog, callee_regs_used);
 	}
 
+	/* pop tail_call_cnt */
+	EMIT1(0x58);                                  /* pop rax */
+	/* pop tail_call_cnt_ptr */
 	EMIT1(0x58);                                  /* pop rax */
 	if (stack_depth)
 		EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
@@ -989,25 +1040,6 @@ static void detect_reg_usage(struct bpf_insn *insn, int insn_cnt,
 	}
 }
 
-static void emit_nops(u8 **pprog, int len)
-{
-	u8 *prog = *pprog;
-	int i, noplen;
-
-	while (len > 0) {
-		noplen = len;
-
-		if (noplen > ASM_NOP_MAX)
-			noplen = ASM_NOP_MAX;
-
-		for (i = 0; i < noplen; i++)
-			EMIT1(x86_nops[noplen][i]);
-		len -= noplen;
-	}
-
-	*pprog = prog;
-}
-
 /* emit the 3-byte VEX prefix
  *
  * r: same as rex.r, extra bit for ModRM reg field
-- 
2.41.0



* [RFC PATCH bpf-next 2/3] bpf, x64: Load tail_call_cnt pointer
  2023-10-05 14:58 [RFC PATCH bpf-next 0/3] bpf, x64: Fix tailcall hierarchy Leon Hwang
  2023-10-05 14:58 ` [RFC PATCH bpf-next 1/3] " Leon Hwang
@ 2023-10-05 14:58 ` Leon Hwang
  2023-10-05 14:58 ` [RFC PATCH bpf-next 3/3] selftests/bpf: Add testcases for tailcall hierarchy fixing Leon Hwang
  2 siblings, 0 replies; 8+ messages in thread
From: Leon Hwang @ 2023-10-05 14:58 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, maciej.fijalkowski, jakub, iii, hengqi.chen,
	hffilwlqm

Rename RESTORE_TAIL_CALL_CNT() to LOAD_TAIL_CALL_CNT_PTR().

With the previous commit, rax is used to propagate a tail_call_cnt pointer
instead of tail_call_cnt itself, so the name RESTORE_TAIL_CALL_CNT() is
misleading.

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8ad6368353c2b..7a8397a60ea1e 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1102,7 +1102,7 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
 /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
-#define RESTORE_TAIL_CALL_CNT(stack)				\
+#define LOAD_TAIL_CALL_CNT_PTR(stack)				\
 	EMIT3_off32(0x48, 0x8B, 0x85, -round_up(stack, 8) - 8)
 
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
@@ -1722,7 +1722,7 @@ st:			if (is_imm8(insn->off))
 
 			func = (u8 *) __bpf_call_base + imm32;
 			if (tail_call_reachable) {
-				RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth);
+				LOAD_TAIL_CALL_CNT_PTR(bpf_prog->aux->stack_depth);
 				if (!imm32)
 					return -EINVAL;
 				offs = 7 + x86_call_depth_emit_accounting(&prog, func);
@@ -2624,10 +2624,10 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 		save_args(m, &prog, arg_stack_off, true);
 
 		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
-			/* Before calling the original function, restore the
-			 * tail_call_cnt from stack to rax.
+			/* Before calling the original function, load the
+			 * tail_call_cnt_ptr to rax.
 			 */
-			RESTORE_TAIL_CALL_CNT(stack_size);
+			LOAD_TAIL_CALL_CNT_PTR(stack_size);
 
 		if (flags & BPF_TRAMP_F_ORIG_STACK) {
 			emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, 8);
@@ -2683,10 +2683,10 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 			goto cleanup;
 		}
 	} else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
-		/* Before running the original function, restore the
-		 * tail_call_cnt from stack to rax.
+		/* Before running the original function, load the
+		 * tail_call_cnt_ptr to rax.
 		 */
-		RESTORE_TAIL_CALL_CNT(stack_size);
+		LOAD_TAIL_CALL_CNT_PTR(stack_size);
 
 	/* restore return value of orig_call or fentry prog back into RAX */
 	if (save_ret)
-- 
2.41.0



* [RFC PATCH bpf-next 3/3] selftests/bpf: Add testcases for tailcall hierarchy fixing
  2023-10-05 14:58 [RFC PATCH bpf-next 0/3] bpf, x64: Fix tailcall hierarchy Leon Hwang
  2023-10-05 14:58 ` [RFC PATCH bpf-next 1/3] " Leon Hwang
  2023-10-05 14:58 ` [RFC PATCH bpf-next 2/3] bpf, x64: Load tail_call_cnt pointer Leon Hwang
@ 2023-10-05 14:58 ` Leon Hwang
  2 siblings, 0 replies; 8+ messages in thread
From: Leon Hwang @ 2023-10-05 14:58 UTC (permalink / raw)
  To: bpf
  Cc: ast, daniel, andrii, maciej.fijalkowski, jakub, iii, hengqi.chen,
	hffilwlqm

Add some test cases to confirm that the tailcall hierarchy issue has been
fixed. With the shared counter, the entry prog of hierarchy1 is expected to
run 34 times in total: one initial run plus MAX_TAIL_CALL_CNT (33) tail calls
back into it, which is what the new tests assert.

Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
---
 .../selftests/bpf/prog_tests/tailcalls.c      | 384 ++++++++++++++++++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy1.c   |  34 ++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy2.c   |  55 +++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy3.c   |  46 +++
 4 files changed, 519 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy1.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy2.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy3.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tailcalls.c b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
index fc6b2954e8f50..3133fe63808ac 100644
--- a/tools/testing/selftests/bpf/prog_tests/tailcalls.c
+++ b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
@@ -1105,6 +1105,378 @@ static void test_tailcall_bpf2bpf_fentry_entry(void)
 	bpf_object__close(tgt_obj);
 }
 
+
+static void test_tailcall_hierarchy_count(const char *which, bool test_fentry,
+					  bool test_fexit)
+{
+	int err, map_fd, prog_fd, main_data_fd, fentry_data_fd, fexit_data_fd, i, val;
+	struct bpf_object *obj = NULL, *fentry_obj = NULL, *fexit_obj = NULL;
+	struct bpf_link *fentry_link = NULL, *fexit_link = NULL;
+	struct bpf_map *prog_array, *data_map;
+	struct bpf_program *prog;
+	char buff[128] = {};
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = buff,
+		.data_size_in = sizeof(buff),
+		.repeat = 1,
+	);
+
+	err = bpf_prog_test_load(which, BPF_PROG_TYPE_SCHED_CLS, &obj,
+				 &prog_fd);
+	if (!ASSERT_OK(err, "load obj"))
+		return;
+
+	prog = bpf_object__find_program_by_name(obj, "entry");
+	if (!ASSERT_OK_PTR(prog, "find entry prog"))
+		goto out;
+
+	prog_fd = bpf_program__fd(prog);
+	if (!ASSERT_GE(prog_fd, 0, "prog_fd"))
+		goto out;
+
+	prog_array = bpf_object__find_map_by_name(obj, "jmp_table");
+	if (!ASSERT_OK_PTR(prog_array, "find jmp_table"))
+		goto out;
+
+	map_fd = bpf_map__fd(prog_array);
+	if (!ASSERT_GE(map_fd, 0, "map_fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	if (test_fentry) {
+		fentry_obj = bpf_object__open_file("tailcall_bpf2bpf_fentry.bpf.o",
+						   NULL);
+		if (!ASSERT_OK_PTR(fentry_obj, "open fentry_obj file"))
+			goto out;
+
+		prog = bpf_object__find_program_by_name(fentry_obj, "fentry");
+		if (!ASSERT_OK_PTR(prog, "find fentry prog"))
+			goto out;
+
+		err = bpf_program__set_attach_target(prog, prog_fd,
+						     "subprog_tail");
+		if (!ASSERT_OK(err, "set_attach_target subprog_tail"))
+			goto out;
+
+		err = bpf_object__load(fentry_obj);
+		if (!ASSERT_OK(err, "load fentry_obj"))
+			goto out;
+
+		fentry_link = bpf_program__attach_trace(prog);
+		if (!ASSERT_OK_PTR(fentry_link, "attach_trace"))
+			goto out;
+	}
+
+	if (test_fexit) {
+		fexit_obj = bpf_object__open_file("tailcall_bpf2bpf_fexit.bpf.o",
+						  NULL);
+		if (!ASSERT_OK_PTR(fexit_obj, "open fexit_obj file"))
+			goto out;
+
+		prog = bpf_object__find_program_by_name(fexit_obj, "fexit");
+		if (!ASSERT_OK_PTR(prog, "find fexit prog"))
+			goto out;
+
+		err = bpf_program__set_attach_target(prog, prog_fd,
+						     "subprog_tail");
+		if (!ASSERT_OK(err, "set_attach_target subprog_tail"))
+			goto out;
+
+		err = bpf_object__load(fexit_obj);
+		if (!ASSERT_OK(err, "load fexit_obj"))
+			goto out;
+
+		fexit_link = bpf_program__attach_trace(prog);
+		if (!ASSERT_OK_PTR(fexit_link, "attach_trace"))
+			goto out;
+	}
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "tailcall");
+	ASSERT_EQ(topts.retval, 1, "tailcall retval");
+
+	data_map = bpf_object__find_map_by_name(obj, ".bss");
+	if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+			  "find data_map"))
+		goto out;
+
+	main_data_fd = bpf_map__fd(data_map);
+	if (!ASSERT_GE(main_data_fd, 0, "main_data_fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_lookup_elem(main_data_fd, &i, &val);
+	ASSERT_OK(err, "tailcall count");
+	ASSERT_EQ(val, 34, "tailcall count");
+
+	if (test_fentry) {
+		data_map = bpf_object__find_map_by_name(fentry_obj, ".bss");
+		if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+				  "find tailcall_bpf2bpf_fentry.bss map"))
+			goto out;
+
+		fentry_data_fd = bpf_map__fd(data_map);
+		if (!ASSERT_GE(fentry_data_fd, 0,
+				  "find tailcall_bpf2bpf_fentry.bss map fd"))
+			goto out;
+
+		i = 0;
+		err = bpf_map_lookup_elem(fentry_data_fd, &i, &val);
+		ASSERT_OK(err, "fentry count");
+		ASSERT_EQ(val, 68, "fentry count");
+	}
+
+	if (test_fexit) {
+		data_map = bpf_object__find_map_by_name(fexit_obj, ".bss");
+		if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+				  "find tailcall_bpf2bpf_fexit.bss map"))
+			goto out;
+
+		fexit_data_fd = bpf_map__fd(data_map);
+		if (!ASSERT_GE(fexit_data_fd, 0,
+				  "find tailcall_bpf2bpf_fexit.bss map fd"))
+			goto out;
+
+		i = 0;
+		err = bpf_map_lookup_elem(fexit_data_fd, &i, &val);
+		ASSERT_OK(err, "fexit count");
+		ASSERT_EQ(val, 68, "fexit count");
+	}
+
+	i = 0;
+	err = bpf_map_delete_elem(map_fd, &i);
+	if (!ASSERT_OK(err, "delete_elem from jmp_table"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "tailcall");
+	ASSERT_EQ(topts.retval, 1, "tailcall retval");
+
+	i = 0;
+	err = bpf_map_lookup_elem(main_data_fd, &i, &val);
+	ASSERT_OK(err, "tailcall count");
+	ASSERT_EQ(val, 35, "tailcall count");
+
+	if (test_fentry) {
+		i = 0;
+		err = bpf_map_lookup_elem(fentry_data_fd, &i, &val);
+		ASSERT_OK(err, "fentry count");
+		ASSERT_EQ(val, 70, "fentry count");
+	}
+
+	if (test_fexit) {
+		i = 0;
+		err = bpf_map_lookup_elem(fexit_data_fd, &i, &val);
+		ASSERT_OK(err, "fexit count");
+		ASSERT_EQ(val, 70, "fexit count");
+	}
+
+out:
+	bpf_link__destroy(fentry_link);
+	bpf_link__destroy(fexit_link);
+	bpf_object__close(fentry_obj);
+	bpf_object__close(fexit_obj);
+	bpf_object__close(obj);
+}
+
+static void test_tailcall_bpf2bpf_hierarchy_1(void)
+{
+	test_tailcall_hierarchy_count("tailcall_bpf2bpf_hierarchy1.bpf.o",
+				      false, false);
+}
+
+static void test_tailcall_bpf2bpf_hierarchy_fentry(void)
+{
+	test_tailcall_hierarchy_count("tailcall_bpf2bpf_hierarchy1.bpf.o",
+				      true, false);
+}
+
+static void test_tailcall_bpf2bpf_hierarchy_fexit(void)
+{
+	test_tailcall_hierarchy_count("tailcall_bpf2bpf_hierarchy1.bpf.o",
+				      false, true);
+}
+
+static void test_tailcall_bpf2bpf_hierarchy_fentry_fexit(void)
+{
+	test_tailcall_hierarchy_count("tailcall_bpf2bpf_hierarchy1.bpf.o",
+				      true, true);
+}
+
+static void test_tailcall_bpf2bpf_hierarchy_2(void)
+{
+	int err, map_fd, prog_fd, data_fd, main_fd, i, val[2];
+	struct bpf_map *prog_array, *data_map;
+	struct bpf_object *obj = NULL;
+	struct bpf_program *prog;
+	char buff[128] = {};
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = buff,
+		.data_size_in = sizeof(buff),
+		.repeat = 1,
+	);
+
+	err = bpf_prog_test_load("tailcall_bpf2bpf_hierarchy2.bpf.o",
+				 BPF_PROG_TYPE_SCHED_CLS,
+				 &obj, &prog_fd);
+	if (!ASSERT_OK(err, "load obj"))
+		return;
+
+	prog = bpf_object__find_program_by_name(obj, "entry");
+	if (!ASSERT_OK_PTR(prog, "find entry prog"))
+		goto out;
+
+	main_fd = bpf_program__fd(prog);
+	if (!ASSERT_GE(main_fd, 0, "main_fd"))
+		goto out;
+
+	prog_array = bpf_object__find_map_by_name(obj, "jmp_table");
+	if (!ASSERT_OK_PTR(prog_array, "find jmp_table map"))
+		goto out;
+
+	map_fd = bpf_map__fd(prog_array);
+	if (!ASSERT_GE(map_fd, 0, "find jmp_table map fd"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(obj, "classifier_0");
+	if (!ASSERT_OK_PTR(prog, "find classifier_0 prog"))
+		goto out;
+
+	prog_fd = bpf_program__fd(prog);
+	if (!ASSERT_GE(prog_fd, 0, "find classifier_0 prog fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(obj, "classifier_1");
+	if (!ASSERT_OK_PTR(prog, "find classifier_1 prog"))
+		goto out;
+
+	prog_fd = bpf_program__fd(prog);
+	if (!ASSERT_GE(prog_fd, 0, "find classifier_1 prog fd"))
+		goto out;
+
+	i = 1;
+	err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(main_fd, &topts);
+	ASSERT_OK(err, "tailcall");
+	ASSERT_EQ(topts.retval, 1, "tailcall retval");
+
+	data_map = bpf_object__find_map_by_name(obj, ".bss");
+	if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+			  "find .bss map"))
+		goto out;
+
+	data_fd = bpf_map__fd(data_map);
+	if (!ASSERT_GE(data_fd, 0, "find .bss map fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_lookup_elem(data_fd, &i, &val);
+	ASSERT_OK(err, "tailcall counts");
+	ASSERT_EQ(val[0], 33, "tailcall count0");
+	ASSERT_EQ(val[1], 0, "tailcall count1");
+
+out:
+	bpf_object__close(obj);
+}
+
+static void test_tailcall_bpf2bpf_hierarchy_3(void)
+{
+	int err, map_fd, prog_fd, data_fd, main_fd, i, val;
+	struct bpf_map *prog_array, *data_map;
+	struct bpf_object *obj = NULL;
+	struct bpf_program *prog;
+	char buff[128] = {};
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = buff,
+		.data_size_in = sizeof(buff),
+		.repeat = 1,
+	);
+
+	err = bpf_prog_test_load("tailcall_bpf2bpf_hierarchy3.bpf.o",
+				 BPF_PROG_TYPE_SCHED_CLS,
+				 &obj, &prog_fd);
+	if (!ASSERT_OK(err, "load obj"))
+		return;
+
+	prog = bpf_object__find_program_by_name(obj, "entry");
+	if (!ASSERT_OK_PTR(prog, "find entry prog"))
+		goto out;
+
+	main_fd = bpf_program__fd(prog);
+	if (!ASSERT_GE(main_fd, 0, "main_fd"))
+		goto out;
+
+	prog_array = bpf_object__find_map_by_name(obj, "jmp_table0");
+	if (!ASSERT_OK_PTR(prog_array, "find jmp_table0 map"))
+		goto out;
+
+	map_fd = bpf_map__fd(prog_array);
+	if (!ASSERT_GE(map_fd, 0, "find jmp_table0 map fd"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(obj, "classifier_0");
+	if (!ASSERT_OK_PTR(prog, "find classifier_0 prog"))
+		goto out;
+
+	prog_fd = bpf_program__fd(prog);
+	if (!ASSERT_GE(prog_fd, 0, "find classifier_0 prog fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table0"))
+		goto out;
+
+	prog_array = bpf_object__find_map_by_name(obj, "jmp_table1");
+	if (!ASSERT_OK_PTR(prog_array, "find jmp_table1 map"))
+		goto out;
+
+	map_fd = bpf_map__fd(prog_array);
+	if (!ASSERT_GE(map_fd, 0, "find jmp_table1 map fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_update_elem(map_fd, &i, &prog_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table1"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(main_fd, &topts);
+	ASSERT_OK(err, "tailcall");
+	ASSERT_EQ(topts.retval, 1, "tailcall retval");
+
+	data_map = bpf_object__find_map_by_name(obj, ".bss");
+	if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+			  "find .bss map"))
+		goto out;
+
+	data_fd = bpf_map__fd(data_map);
+	if (!ASSERT_GE(data_fd, 0, "find .bss map fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_lookup_elem(data_fd, &i, &val);
+	ASSERT_OK(err, "tailcall count");
+	ASSERT_EQ(val, 33, "tailcall count");
+
+out:
+	bpf_object__close(obj);
+}
+
 void test_tailcalls(void)
 {
 	if (test__start_subtest("tailcall_1"))
@@ -1139,4 +1511,16 @@ void test_tailcalls(void)
 		test_tailcall_bpf2bpf_fentry_fexit();
 	if (test__start_subtest("tailcall_bpf2bpf_fentry_entry"))
 		test_tailcall_bpf2bpf_fentry_entry();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_1"))
+		test_tailcall_bpf2bpf_hierarchy_1();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_fentry"))
+		test_tailcall_bpf2bpf_hierarchy_fentry();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_fexit"))
+		test_tailcall_bpf2bpf_hierarchy_fexit();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_fentry_fexit"))
+		test_tailcall_bpf2bpf_hierarchy_fentry_fexit();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_2"))
+		test_tailcall_bpf2bpf_hierarchy_2();
+	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_3"))
+		test_tailcall_bpf2bpf_hierarchy_3();
 }
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy1.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy1.c
new file mode 100644
index 0000000000000..0bfbb7c9637b7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy1.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_legacy.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+int count = 0;
+
+static __noinline
+int subprog_tail(struct __sk_buff *skb)
+{
+	bpf_tail_call_static(skb, &jmp_table, 0);
+	return 0;
+}
+
+SEC("tc")
+int entry(struct __sk_buff *skb)
+{
+	volatile int ret = 1;
+
+	count++;
+	subprog_tail(skb);
+	subprog_tail(skb);
+
+	return ret;
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy2.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy2.c
new file mode 100644
index 0000000000000..b84541546082e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy2.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_legacy.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 2);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+int count0 = 0;
+int count1 = 0;
+
+static __noinline
+int subprog_tail0(struct __sk_buff *skb)
+{
+	bpf_tail_call_static(skb, &jmp_table, 0);
+	return 0;
+}
+
+SEC("tc")
+int classifier_0(struct __sk_buff *skb)
+{
+	count0++;
+	subprog_tail0(skb);
+	return 0;
+}
+
+static __noinline
+int subprog_tail1(struct __sk_buff *skb)
+{
+	bpf_tail_call_static(skb, &jmp_table, 1);
+	return 0;
+}
+
+SEC("tc")
+int classifier_1(struct __sk_buff *skb)
+{
+	count1++;
+	subprog_tail1(skb);
+	return 0;
+}
+
+SEC("tc")
+int entry(struct __sk_buff *skb)
+{
+	subprog_tail0(skb);
+	subprog_tail1(skb);
+
+	return 1;
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy3.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy3.c
new file mode 100644
index 0000000000000..6398a1d277fc7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy3.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_legacy.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table0 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table1 SEC(".maps");
+
+int count = 0;
+
+static __noinline
+int subprog_tail(struct __sk_buff *skb, void *jmp_table)
+{
+	bpf_tail_call_static(skb, jmp_table, 0);
+	return 0;
+}
+
+SEC("tc")
+int classifier_0(struct __sk_buff *skb)
+{
+	count++;
+	subprog_tail(skb, &jmp_table0);
+	subprog_tail(skb, &jmp_table1);
+	return 1;
+}
+
+SEC("tc")
+int entry(struct __sk_buff *skb)
+{
+	bpf_tail_call_static(skb, &jmp_table0, 0);
+
+	return 0;
+}
+
+char __license[] SEC("license") = "GPL";
-- 
2.41.0



* Re: [RFC PATCH bpf-next 1/3] bpf, x64: Fix tailcall hierarchy
  2023-10-05 14:58 ` [RFC PATCH bpf-next 1/3] " Leon Hwang
@ 2023-10-05 18:05   ` Stanislav Fomichev
  2023-10-06  1:43     ` Leon Hwang
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Fomichev @ 2023-10-05 18:05 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, daniel, andrii, maciej.fijalkowski, jakub, iii, hengqi.chen

On 10/05, Leon Hwang wrote:
> [...]

> +static void emit_nops(u8 **pprog, int len)
> +{
> +	u8 *prog = *pprog;
> +	int i, noplen;
> +
> +	while (len > 0) {
> +		noplen = len;
> +
> +		if (noplen > ASM_NOP_MAX)
> +			noplen = ASM_NOP_MAX;
> +
> +		for (i = 0; i < noplen; i++)
> +			EMIT1(x86_nops[noplen][i]);
> +		len -= noplen;
> +	}
> +
> +	*pprog = prog;
> +}

From high level - makes sense to me.
I'll leave a thorough review to the people who understand more :-)
I see Maciej commenting on your original "Fix tailcall infinite loop"
series.

One suggestion I have is: the changes to 'memcpy(prog, x86_nops[5],
X86_PATCH_SIZE);' and this emit_nops move here don't seem like
they actually belong to this patch. Maybe do them separately?


* Re: [RFC PATCH bpf-next 1/3] bpf, x64: Fix tailcall hierarchy
  2023-10-05 18:05   ` Stanislav Fomichev
@ 2023-10-06  1:43     ` Leon Hwang
  2023-10-06 16:44       ` Stanislav Fomichev
  0 siblings, 1 reply; 8+ messages in thread
From: Leon Hwang @ 2023-10-06  1:43 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, maciej.fijalkowski, jakub, iii, hengqi.chen



On 6/10/23 02:05, Stanislav Fomichev wrote:
> On 10/05, Leon Hwang wrote:
>> [...]
> 
> From high level - makes sense to me.
> I'll leave a thorough review to the people who understand more :-)
> I see Maciej commenting on your original "Fix tailcall infinite loop"
> series.

Welcome for your review.

> 
> One suggestion I have is: the changes to 'memcpy(prog, x86_nops[5],
> X86_PATCH_SIZE);' and this emit_nops move here don't seem like
> they actually belong to this patch. Maybe do them separately?

Moving emit_nops here is for them:

+			/* Keep the same instruction layout. */
+			emit_nops(&prog, 3);
+			emit_nops(&prog, 6);
+			emit_nops(&prog, 6);

and do the changes to 'memcpy(prog, x86_nops[5], X86_PATCH_SIZE);' BTW.

Thanks,
Leon


* Re: [RFC PATCH bpf-next 1/3] bpf, x64: Fix tailcall hierarchy
  2023-10-06  1:43     ` Leon Hwang
@ 2023-10-06 16:44       ` Stanislav Fomichev
  2023-10-07  5:50         ` Leon Hwang
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Fomichev @ 2023-10-06 16:44 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, ast, daniel, andrii, maciej.fijalkowski, jakub, iii, hengqi.chen

On Thu, Oct 5, 2023 at 6:43 PM Leon Hwang <hffilwlqm@gmail.com> wrote:
>
>
>
> On 6/10/23 02:05, Stanislav Fomichev wrote:
> > On 10/05, Leon Hwang wrote:
> >> [...]
> >
> > From high level - makes sense to me.
> > I'll leave a thorough review to the people who understand more :-)
> > I see Maciej commenting on your original "Fix tailcall infinite loop"
> > series.
>
> Welcome for your review.
>
> >
> > One suggestion I have is: the changes to 'memcpy(prog, x86_nops[5],
> > X86_PATCH_SIZE);' and this emit_nops move here don't seem like
> > they actually belong to this patch. Maybe do them separately?
>
> Moving emit_nops here is for them:
>
> +                       /* Keep the same instruction layout. */
> +                       emit_nops(&prog, 3);
> +                       emit_nops(&prog, 6);
> +                       emit_nops(&prog, 6);
>
> and do the changes to 'memcpy(prog, x86_nops[5], X86_PATCH_SIZE);' BTW.

Right, I'm saying that you can do the move + replace memcpy in a
separate (first) patch to make the patch with the actual changes a bit
smaller.
But that's not strictly required, up to you.


* Re: [RFC PATCH bpf-next 1/3] bpf, x64: Fix tailcall hierarchy
  2023-10-06 16:44       ` Stanislav Fomichev
@ 2023-10-07  5:50         ` Leon Hwang
  0 siblings, 0 replies; 8+ messages in thread
From: Leon Hwang @ 2023-10-07  5:50 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, ast, daniel, andrii, maciej.fijalkowski, jakub, iii, hengqi.chen



On 2023/10/7 00:44, Stanislav Fomichev wrote:
> On Thu, Oct 5, 2023 at 6:43 PM Leon Hwang <hffilwlqm@gmail.com> wrote:
>>
>>
>>
>> On 6/10/23 02:05, Stanislav Fomichev wrote:
>>> On 10/05, Leon Hwang wrote:
>>>> [...]
>>>
>>> From high level - makes sense to me.
>>> I'll leave a thorough review to the people who understand more :-)
>>> I see Maciej commenting on your original "Fix tailcall infinite loop"
>>> series.
>>
>> Welcome for your review.
>>
>>>
>>> One suggestion I have is: the changes to 'memcpy(prog, x86_nops[5],
>>> X86_PATCH_SIZE);' and this emit_nops move here don't seem like
>>> they actually belong to this patch. Maybe do them separately?
>>
>> Moving emit_nops here is for them:
>>
>> +                       /* Keep the same instruction layout. */
>> +                       emit_nops(&prog, 3);
>> +                       emit_nops(&prog, 6);
>> +                       emit_nops(&prog, 6);
>>
>> and do the changes to 'memcpy(prog, x86_nops[5], X86_PATCH_SIZE);' BTW.
> 
> Right, I'm saying that you can do the move + replace memcpy in a
> separate (first) patch to make the patch with the actual changes a bit
> smaller.
> But that's not strictly required, up to you.

LGTM

Thanks,
Leon

