All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/4] Mixing bpf2bpf and tailcalls for RV64
@ 2023-09-19  3:57 ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

In the current RV64 JIT, if we just don't initialize the TCC in subprog,
the TCC can be propagated from the parent process to the subprocess, but
the TCC of the parent process cannot be restored when the subprocess
exits. Since the RV64 TCC is initialized before saving the callee saved
registers into the stack, we cannot use the callee saved register to
pass the TCC, otherwise the original value of the callee saved register
will be destroyed. So we implemented mixing bpf2bpf and tailcalls
similar to x86_64, i.e. using a non-callee saved register to transfer
the TCC between functions, and saving that register to the stack to
protect the TCC value. At the same time, we also consider the scenario
of mixing trampoline.

In addition, some code cleans are also attached to this patchset.

Tests test_bpf.ko and test_verifier have passed, as well as the relative
testcases of test_progs*.

Pu Lehui (4):
  riscv, bpf: Remove redundant ctx->offset initialization
  riscv, bpf: Using kvcalloc to allocate cache buffer
  riscv, bpf: Add RV_TAILCALL_OFFSET macro to format tailcall offset
  riscv, bpf: Mixing bpf2bpf and tailcalls

 arch/riscv/net/bpf_jit.h        |   1 +
 arch/riscv/net/bpf_jit_comp64.c | 104 ++++++++++++++------------------
 arch/riscv/net/bpf_jit_core.c   |   9 +--
 3 files changed, 48 insertions(+), 66 deletions(-)

-- 
2.25.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH bpf-next 0/4] Mixing bpf2bpf and tailcalls for RV64
@ 2023-09-19  3:57 ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

In the current RV64 JIT, if we just don't initialize the TCC in subprog,
the TCC can be propagated from the parent process to the subprocess, but
the TCC of the parent process cannot be restored when the subprocess
exits. Since the RV64 TCC is initialized before saving the callee saved
registers into the stack, we cannot use the callee saved register to
pass the TCC, otherwise the original value of the callee saved register
will be destroyed. So we implemented mixing bpf2bpf and tailcalls
similar to x86_64, i.e. using a non-callee saved register to transfer
the TCC between functions, and saving that register to the stack to
protect the TCC value. At the same time, we also consider the scenario
of mixing trampoline.

In addition, some code cleans are also attached to this patchset.

Tests test_bpf.ko and test_verifier have passed, as well as the relative
testcases of test_progs*.

Pu Lehui (4):
  riscv, bpf: Remove redundant ctx->offset initialization
  riscv, bpf: Using kvcalloc to allocate cache buffer
  riscv, bpf: Add RV_TAILCALL_OFFSET macro to format tailcall offset
  riscv, bpf: Mixing bpf2bpf and tailcalls

 arch/riscv/net/bpf_jit.h        |   1 +
 arch/riscv/net/bpf_jit_comp64.c | 104 ++++++++++++++------------------
 arch/riscv/net/bpf_jit_core.c   |   9 +--
 3 files changed, 48 insertions(+), 66 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH bpf-next 1/4] riscv, bpf: Remove redundant ctx->offset initialization
  2023-09-19  3:57 ` Pu Lehui
@ 2023-09-19  3:57   ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

There is no gain in initializing ctx->offset and prev_insns, and
ctx->offset is already zero-initialized. Let's remove this code.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit_core.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 7a26a3e1c..dd00103a2 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -89,11 +89,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		goto out_offset;
 	}
 
-	for (i = 0; i < prog->len; i++) {
-		prev_ninsns += 32;
-		ctx->offset[i] = prev_ninsns;
-	}
-
 	for (i = 0; i < NR_JIT_ITERATIONS; i++) {
 		pass++;
 		ctx->ninsns = 0;
-- 
2.25.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next 1/4] riscv, bpf: Remove redundant ctx->offset initialization
@ 2023-09-19  3:57   ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

There is no gain in initializing ctx->offset and prev_insns, and
ctx->offset is already zero-initialized. Let's remove this code.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit_core.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index 7a26a3e1c..dd00103a2 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -89,11 +89,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		goto out_offset;
 	}
 
-	for (i = 0; i < prog->len; i++) {
-		prev_ninsns += 32;
-		ctx->offset[i] = prev_ninsns;
-	}
-
 	for (i = 0; i < NR_JIT_ITERATIONS; i++) {
 		pass++;
 		ctx->ninsns = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next 2/4] riscv, bpf: Using kvcalloc to allocate cache buffer
  2023-09-19  3:57 ` Pu Lehui
@ 2023-09-19  3:57   ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

It is unnecessary to allocate continuous physical memory for cache
buffer, and when ebpf program is too large, it may cause memory
allocation failure.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit_comp64.c | 4 ++--
 arch/riscv/net/bpf_jit_core.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 8423f4ddf..57535cfe1 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -906,7 +906,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	}
 
 	if (fmod_ret->nr_links) {
-		branches_off = kcalloc(fmod_ret->nr_links, sizeof(int), GFP_KERNEL);
+		branches_off = kvcalloc(fmod_ret->nr_links, sizeof(int), GFP_KERNEL);
 		if (!branches_off)
 			return -ENOMEM;
 
@@ -993,7 +993,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 
 	ret = ctx->ninsns;
 out:
-	kfree(branches_off);
+	kvfree(branches_off);
 	return ret;
 }
 
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index dd00103a2..d9c82537d 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -78,7 +78,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	}
 
 	ctx->prog = prog;
-	ctx->offset = kcalloc(prog->len, sizeof(int), GFP_KERNEL);
+	ctx->offset = kvcalloc(prog->len, sizeof(int), GFP_KERNEL);
 	if (!ctx->offset) {
 		prog = orig_prog;
 		goto out_offset;
@@ -170,7 +170,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			ctx->offset[i] = ninsns_rvoff(ctx->offset[i]);
 		bpf_prog_fill_jited_linfo(prog, ctx->offset);
 out_offset:
-		kfree(ctx->offset);
+		kvfree(ctx->offset);
 		kfree(jit_data);
 		prog->aux->jit_data = NULL;
 	}
-- 
2.25.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next 2/4] riscv, bpf: Using kvcalloc to allocate cache buffer
@ 2023-09-19  3:57   ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

It is unnecessary to allocate continuous physical memory for cache
buffer, and when ebpf program is too large, it may cause memory
allocation failure.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit_comp64.c | 4 ++--
 arch/riscv/net/bpf_jit_core.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 8423f4ddf..57535cfe1 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -906,7 +906,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	}
 
 	if (fmod_ret->nr_links) {
-		branches_off = kcalloc(fmod_ret->nr_links, sizeof(int), GFP_KERNEL);
+		branches_off = kvcalloc(fmod_ret->nr_links, sizeof(int), GFP_KERNEL);
 		if (!branches_off)
 			return -ENOMEM;
 
@@ -993,7 +993,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 
 	ret = ctx->ninsns;
 out:
-	kfree(branches_off);
+	kvfree(branches_off);
 	return ret;
 }
 
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index dd00103a2..d9c82537d 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -78,7 +78,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	}
 
 	ctx->prog = prog;
-	ctx->offset = kcalloc(prog->len, sizeof(int), GFP_KERNEL);
+	ctx->offset = kvcalloc(prog->len, sizeof(int), GFP_KERNEL);
 	if (!ctx->offset) {
 		prog = orig_prog;
 		goto out_offset;
@@ -170,7 +170,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			ctx->offset[i] = ninsns_rvoff(ctx->offset[i]);
 		bpf_prog_fill_jited_linfo(prog, ctx->offset);
 out_offset:
-		kfree(ctx->offset);
+		kvfree(ctx->offset);
 		kfree(jit_data);
 		prog->aux->jit_data = NULL;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next 3/4] riscv, bpf: Add RV_TAILCALL_OFFSET macro to format tailcall offset
  2023-09-19  3:57 ` Pu Lehui
@ 2023-09-19  3:57   ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

Add RV_TAILCALL_OFFSET macro to format tailcall offset, and correct the
relevant comments.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit_comp64.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 57535cfe1..f2ded1151 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -13,7 +13,9 @@
 #include <asm/patch.h>
 #include "bpf_jit.h"
 
-#define RV_FENTRY_NINSNS 2
+#define RV_FENTRY_NINSNS	2
+/* fentry and TCC init insns will be skipped on tailcall */
+#define RV_TAILCALL_OFFSET	((RV_FENTRY_NINSNS + 1) * 4)
 
 #define RV_REG_TCC RV_REG_A6
 #define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
@@ -243,8 +245,7 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
 	if (!is_tail_call)
 		emit_mv(RV_REG_A0, RV_REG_A5, ctx);
 	emit_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA,
-		  is_tail_call ? (RV_FENTRY_NINSNS + 1) * 4 : 0, /* skip reserved nops and TCC init */
-		  ctx);
+		  is_tail_call ? RV_TAILCALL_OFFSET : 0, ctx);
 }
 
 static void emit_bcc(u8 cond, u8 rd, u8 rs, int rvoff,
@@ -371,7 +372,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JEQ, RV_REG_T2, RV_REG_ZERO, off, ctx);
 
-	/* goto *(prog->bpf_func + 4); */
+	/* goto *(prog->bpf_func + RV_TAILCALL_OFFSET); */
 	off = offsetof(struct bpf_prog, bpf_func);
 	if (is_12b_check(off, insn))
 		return -1;
-- 
2.25.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next 3/4] riscv, bpf: Add RV_TAILCALL_OFFSET macro to format tailcall offset
@ 2023-09-19  3:57   ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

Add RV_TAILCALL_OFFSET macro to format tailcall offset, and correct the
relevant comments.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit_comp64.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 57535cfe1..f2ded1151 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -13,7 +13,9 @@
 #include <asm/patch.h>
 #include "bpf_jit.h"
 
-#define RV_FENTRY_NINSNS 2
+#define RV_FENTRY_NINSNS	2
+/* fentry and TCC init insns will be skipped on tailcall */
+#define RV_TAILCALL_OFFSET	((RV_FENTRY_NINSNS + 1) * 4)
 
 #define RV_REG_TCC RV_REG_A6
 #define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
@@ -243,8 +245,7 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
 	if (!is_tail_call)
 		emit_mv(RV_REG_A0, RV_REG_A5, ctx);
 	emit_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA,
-		  is_tail_call ? (RV_FENTRY_NINSNS + 1) * 4 : 0, /* skip reserved nops and TCC init */
-		  ctx);
+		  is_tail_call ? RV_TAILCALL_OFFSET : 0, ctx);
 }
 
 static void emit_bcc(u8 cond, u8 rd, u8 rs, int rvoff,
@@ -371,7 +372,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JEQ, RV_REG_T2, RV_REG_ZERO, off, ctx);
 
-	/* goto *(prog->bpf_func + 4); */
+	/* goto *(prog->bpf_func + RV_TAILCALL_OFFSET); */
 	off = offsetof(struct bpf_prog, bpf_func);
 	if (is_12b_check(off, insn))
 		return -1;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-19  3:57 ` Pu Lehui
@ 2023-09-19  3:57   ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

In the current RV64 JIT, if we just don't initialize the TCC in subprog,
the TCC can be propagated from the parent process to the subprocess, but
the TCC of the parent process cannot be restored when the subprocess
exits. Since the RV64 TCC is initialized before saving the callee saved
registers into the stack, we cannot use the callee saved register to
pass the TCC, otherwise the original value of the callee saved register
will be destroyed. So we implemented mixing bpf2bpf and tailcalls
similar to x86_64, i.e. using a non-callee saved register to transfer
the TCC between functions, and saving that register to the stack to
protect the TCC value. At the same time, we also consider the scenario
of mixing trampoline.

Tests test_bpf.ko and test_verifier have passed, as well as the relative
testcases of test_progs*.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit.h        |  1 +
 arch/riscv/net/bpf_jit_comp64.c | 91 ++++++++++++++-------------------
 2 files changed, 39 insertions(+), 53 deletions(-)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index d21c6c92a..ca518846c 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -75,6 +75,7 @@ struct rv_jit_context {
 	int nexentries;
 	unsigned long flags;
 	int stack_size;
+	int tcc_offset;
 };
 
 /* Convert from ninsns to bytes. */
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index f2ded1151..f37be4911 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -13,13 +13,11 @@
 #include <asm/patch.h>
 #include "bpf_jit.h"
 
+#define RV_REG_TCC		RV_REG_A6
 #define RV_FENTRY_NINSNS	2
 /* fentry and TCC init insns will be skipped on tailcall */
 #define RV_TAILCALL_OFFSET	((RV_FENTRY_NINSNS + 1) * 4)
 
-#define RV_REG_TCC RV_REG_A6
-#define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
-
 static const int regmap[] = {
 	[BPF_REG_0] =	RV_REG_A5,
 	[BPF_REG_1] =	RV_REG_A0,
@@ -51,14 +49,12 @@ static const int pt_regmap[] = {
 };
 
 enum {
-	RV_CTX_F_SEEN_TAIL_CALL =	0,
 	RV_CTX_F_SEEN_CALL =		RV_REG_RA,
 	RV_CTX_F_SEEN_S1 =		RV_REG_S1,
 	RV_CTX_F_SEEN_S2 =		RV_REG_S2,
 	RV_CTX_F_SEEN_S3 =		RV_REG_S3,
 	RV_CTX_F_SEEN_S4 =		RV_REG_S4,
 	RV_CTX_F_SEEN_S5 =		RV_REG_S5,
-	RV_CTX_F_SEEN_S6 =		RV_REG_S6,
 };
 
 static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
@@ -71,7 +67,6 @@ static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
 	case RV_CTX_F_SEEN_S3:
 	case RV_CTX_F_SEEN_S4:
 	case RV_CTX_F_SEEN_S5:
-	case RV_CTX_F_SEEN_S6:
 		__set_bit(reg, &ctx->flags);
 	}
 	return reg;
@@ -86,7 +81,6 @@ static bool seen_reg(int reg, struct rv_jit_context *ctx)
 	case RV_CTX_F_SEEN_S3:
 	case RV_CTX_F_SEEN_S4:
 	case RV_CTX_F_SEEN_S5:
-	case RV_CTX_F_SEEN_S6:
 		return test_bit(reg, &ctx->flags);
 	}
 	return false;
@@ -102,32 +96,6 @@ static void mark_call(struct rv_jit_context *ctx)
 	__set_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
 }
 
-static bool seen_call(struct rv_jit_context *ctx)
-{
-	return test_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
-}
-
-static void mark_tail_call(struct rv_jit_context *ctx)
-{
-	__set_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
-}
-
-static bool seen_tail_call(struct rv_jit_context *ctx)
-{
-	return test_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
-}
-
-static u8 rv_tail_call_reg(struct rv_jit_context *ctx)
-{
-	mark_tail_call(ctx);
-
-	if (seen_call(ctx)) {
-		__set_bit(RV_CTX_F_SEEN_S6, &ctx->flags);
-		return RV_REG_S6;
-	}
-	return RV_REG_A6;
-}
-
 static bool is_32b_int(s64 val)
 {
 	return -(1L << 31) <= val && val < (1L << 31);
@@ -235,10 +203,7 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
 		emit_ld(RV_REG_S5, store_offset, RV_REG_SP, ctx);
 		store_offset -= 8;
 	}
-	if (seen_reg(RV_REG_S6, ctx)) {
-		emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
-		store_offset -= 8;
-	}
+	emit_ld(RV_REG_TCC, store_offset, RV_REG_SP, ctx);
 
 	emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
 	/* Set return value. */
@@ -332,7 +297,6 @@ static void emit_zext_32(u8 reg, struct rv_jit_context *ctx)
 static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 {
 	int tc_ninsn, off, start_insn = ctx->ninsns;
-	u8 tcc = rv_tail_call_reg(ctx);
 
 	/* a0: &ctx
 	 * a1: &array
@@ -355,9 +319,11 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	/* if (--TCC < 0)
 	 *     goto out;
 	 */
-	emit_addi(RV_REG_TCC, tcc, -1, ctx);
+	emit_ld(RV_REG_TCC, ctx->tcc_offset, RV_REG_SP, ctx);
+	emit_addi(RV_REG_TCC, RV_REG_TCC, -1, ctx);
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
+	emit_sd(RV_REG_SP, ctx->tcc_offset, RV_REG_TCC, ctx);
 
 	/* prog = array->ptrs[index];
 	 * if (!prog)
@@ -763,7 +729,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	int i, ret, offset;
 	int *branches_off = NULL;
 	int stack_size = 0, nregs = m->nr_args;
-	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off;
+	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off, tcc_off;
 	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
 	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
 	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
@@ -807,6 +773,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	 *
 	 * FP - sreg_off    [ callee saved reg	]
 	 *
+	 * FP - tcc_off     [ tail call count	] BPF_TRAMP_F_TAIL_CALL_CTX
+	 *
 	 *		    [ pads              ] pads for 16 bytes alignment
 	 */
 
@@ -848,6 +816,11 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	stack_size += 8;
 	sreg_off = stack_size;
 
+	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) {
+		stack_size += 8;
+		tcc_off = stack_size;
+	}
+
 	stack_size = round_up(stack_size, 16);
 
 	if (func_addr) {
@@ -874,6 +847,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 		emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx);
 	}
 
+	/* store tail call count */
+	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		emit_sd(RV_REG_FP, -tcc_off, RV_REG_TCC, ctx);
+
 	/* callee saved register S1 to pass start time */
 	emit_sd(RV_REG_FP, -sreg_off, RV_REG_S1, ctx);
 
@@ -927,6 +904,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		restore_args(nregs, args_off, ctx);
+		/* restore TCC to RV_REG_TCC before calling the original function */
+		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+			emit_ld(RV_REG_TCC, -tcc_off, RV_REG_FP, ctx);
 		ret = emit_call((const u64)orig_call, true, ctx);
 		if (ret)
 			goto out;
@@ -967,6 +947,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 
 	emit_ld(RV_REG_S1, -sreg_off, RV_REG_FP, ctx);
 
+	/* restore TCC to RV_REG_TCC before calling the original function */
+	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		emit_ld(RV_REG_TCC, -tcc_off, RV_REG_FP, ctx);
+
 	if (func_addr) {
 		/* trampoline called from function entry */
 		emit_ld(RV_REG_T0, stack_size - 8, RV_REG_SP, ctx);
@@ -1476,6 +1460,9 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		if (ret < 0)
 			return ret;
 
+		/* restore TCC from stack to RV_REG_TCC */
+		emit_ld(RV_REG_TCC, ctx->tcc_offset, RV_REG_SP, ctx);
+
 		ret = emit_call(addr, fixed_addr, ctx);
 		if (ret)
 			return ret;
@@ -1735,6 +1722,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 {
 	int i, stack_adjust = 0, store_offset, bpf_stack_adjust;
+	bool is_main = ctx->prog->aux->func_idx == 0;
 
 	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
 	if (bpf_stack_adjust)
@@ -1753,8 +1741,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 		stack_adjust += 8;
 	if (seen_reg(RV_REG_S5, ctx))
 		stack_adjust += 8;
-	if (seen_reg(RV_REG_S6, ctx))
-		stack_adjust += 8;
+	stack_adjust += 8; /* RV_REG_TCC */
 
 	stack_adjust = round_up(stack_adjust, 16);
 	stack_adjust += bpf_stack_adjust;
@@ -1769,7 +1756,8 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 	 * (TCC) register. This instruction is skipped for tail calls.
 	 * Force using a 4-byte (non-compressed) instruction.
 	 */
-	emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
+	if (is_main)
+		emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
 
 	emit_addi(RV_REG_SP, RV_REG_SP, -stack_adjust, ctx);
 
@@ -1799,22 +1787,14 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 		emit_sd(RV_REG_SP, store_offset, RV_REG_S5, ctx);
 		store_offset -= 8;
 	}
-	if (seen_reg(RV_REG_S6, ctx)) {
-		emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
-		store_offset -= 8;
-	}
+	emit_sd(RV_REG_SP, store_offset, RV_REG_TCC, ctx);
+	ctx->tcc_offset = store_offset;
 
 	emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);
 
 	if (bpf_stack_adjust)
 		emit_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust, ctx);
 
-	/* Program contains calls and tail calls, so RV_REG_TCC need
-	 * to be saved across calls.
-	 */
-	if (seen_tail_call(ctx) && seen_call(ctx))
-		emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);
-
 	ctx->stack_size = stack_adjust;
 }
 
@@ -1827,3 +1807,8 @@ bool bpf_jit_supports_kfunc_call(void)
 {
 	return true;
 }
+
+bool bpf_jit_supports_subprog_tailcalls(void)
+{
+	return true;
+}
-- 
2.25.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2023-09-19  3:57   ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19  3:57 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

In the current RV64 JIT, if we just don't initialize the TCC in subprog,
the TCC can be propagated from the parent process to the subprocess, but
the TCC of the parent process cannot be restored when the subprocess
exits. Since the RV64 TCC is initialized before saving the callee saved
registers into the stack, we cannot use the callee saved register to
pass the TCC, otherwise the original value of the callee saved register
will be destroyed. So we implemented mixing bpf2bpf and tailcalls
similar to x86_64, i.e. using a non-callee saved register to transfer
the TCC between functions, and saving that register to the stack to
protect the TCC value. At the same time, we also consider the scenario
of mixing trampoline.

Tests test_bpf.ko and test_verifier have passed, as well as the relative
testcases of test_progs*.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit.h        |  1 +
 arch/riscv/net/bpf_jit_comp64.c | 91 ++++++++++++++-------------------
 2 files changed, 39 insertions(+), 53 deletions(-)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index d21c6c92a..ca518846c 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -75,6 +75,7 @@ struct rv_jit_context {
 	int nexentries;
 	unsigned long flags;
 	int stack_size;
+	int tcc_offset;
 };
 
 /* Convert from ninsns to bytes. */
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index f2ded1151..f37be4911 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -13,13 +13,11 @@
 #include <asm/patch.h>
 #include "bpf_jit.h"
 
+#define RV_REG_TCC		RV_REG_A6
 #define RV_FENTRY_NINSNS	2
 /* fentry and TCC init insns will be skipped on tailcall */
 #define RV_TAILCALL_OFFSET	((RV_FENTRY_NINSNS + 1) * 4)
 
-#define RV_REG_TCC RV_REG_A6
-#define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
-
 static const int regmap[] = {
 	[BPF_REG_0] =	RV_REG_A5,
 	[BPF_REG_1] =	RV_REG_A0,
@@ -51,14 +49,12 @@ static const int pt_regmap[] = {
 };
 
 enum {
-	RV_CTX_F_SEEN_TAIL_CALL =	0,
 	RV_CTX_F_SEEN_CALL =		RV_REG_RA,
 	RV_CTX_F_SEEN_S1 =		RV_REG_S1,
 	RV_CTX_F_SEEN_S2 =		RV_REG_S2,
 	RV_CTX_F_SEEN_S3 =		RV_REG_S3,
 	RV_CTX_F_SEEN_S4 =		RV_REG_S4,
 	RV_CTX_F_SEEN_S5 =		RV_REG_S5,
-	RV_CTX_F_SEEN_S6 =		RV_REG_S6,
 };
 
 static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
@@ -71,7 +67,6 @@ static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
 	case RV_CTX_F_SEEN_S3:
 	case RV_CTX_F_SEEN_S4:
 	case RV_CTX_F_SEEN_S5:
-	case RV_CTX_F_SEEN_S6:
 		__set_bit(reg, &ctx->flags);
 	}
 	return reg;
@@ -86,7 +81,6 @@ static bool seen_reg(int reg, struct rv_jit_context *ctx)
 	case RV_CTX_F_SEEN_S3:
 	case RV_CTX_F_SEEN_S4:
 	case RV_CTX_F_SEEN_S5:
-	case RV_CTX_F_SEEN_S6:
 		return test_bit(reg, &ctx->flags);
 	}
 	return false;
@@ -102,32 +96,6 @@ static void mark_call(struct rv_jit_context *ctx)
 	__set_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
 }
 
-static bool seen_call(struct rv_jit_context *ctx)
-{
-	return test_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
-}
-
-static void mark_tail_call(struct rv_jit_context *ctx)
-{
-	__set_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
-}
-
-static bool seen_tail_call(struct rv_jit_context *ctx)
-{
-	return test_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
-}
-
-static u8 rv_tail_call_reg(struct rv_jit_context *ctx)
-{
-	mark_tail_call(ctx);
-
-	if (seen_call(ctx)) {
-		__set_bit(RV_CTX_F_SEEN_S6, &ctx->flags);
-		return RV_REG_S6;
-	}
-	return RV_REG_A6;
-}
-
 static bool is_32b_int(s64 val)
 {
 	return -(1L << 31) <= val && val < (1L << 31);
@@ -235,10 +203,7 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
 		emit_ld(RV_REG_S5, store_offset, RV_REG_SP, ctx);
 		store_offset -= 8;
 	}
-	if (seen_reg(RV_REG_S6, ctx)) {
-		emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
-		store_offset -= 8;
-	}
+	emit_ld(RV_REG_TCC, store_offset, RV_REG_SP, ctx);
 
 	emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
 	/* Set return value. */
@@ -332,7 +297,6 @@ static void emit_zext_32(u8 reg, struct rv_jit_context *ctx)
 static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 {
 	int tc_ninsn, off, start_insn = ctx->ninsns;
-	u8 tcc = rv_tail_call_reg(ctx);
 
 	/* a0: &ctx
 	 * a1: &array
@@ -355,9 +319,11 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	/* if (--TCC < 0)
 	 *     goto out;
 	 */
-	emit_addi(RV_REG_TCC, tcc, -1, ctx);
+	emit_ld(RV_REG_TCC, ctx->tcc_offset, RV_REG_SP, ctx);
+	emit_addi(RV_REG_TCC, RV_REG_TCC, -1, ctx);
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
+	emit_sd(RV_REG_SP, ctx->tcc_offset, RV_REG_TCC, ctx);
 
 	/* prog = array->ptrs[index];
 	 * if (!prog)
@@ -763,7 +729,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	int i, ret, offset;
 	int *branches_off = NULL;
 	int stack_size = 0, nregs = m->nr_args;
-	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off;
+	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off, tcc_off;
 	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
 	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
 	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
@@ -807,6 +773,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	 *
 	 * FP - sreg_off    [ callee saved reg	]
 	 *
+	 * FP - tcc_off     [ tail call count	] BPF_TRAMP_F_TAIL_CALL_CTX
+	 *
 	 *		    [ pads              ] pads for 16 bytes alignment
 	 */
 
@@ -848,6 +816,11 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 	stack_size += 8;
 	sreg_off = stack_size;
 
+	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) {
+		stack_size += 8;
+		tcc_off = stack_size;
+	}
+
 	stack_size = round_up(stack_size, 16);
 
 	if (func_addr) {
@@ -874,6 +847,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 		emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx);
 	}
 
+	/* store tail call count */
+	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		emit_sd(RV_REG_FP, -tcc_off, RV_REG_TCC, ctx);
+
 	/* callee saved register S1 to pass start time */
 	emit_sd(RV_REG_FP, -sreg_off, RV_REG_S1, ctx);
 
@@ -927,6 +904,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 
 	if (flags & BPF_TRAMP_F_CALL_ORIG) {
 		restore_args(nregs, args_off, ctx);
+		/* restore TCC to RV_REG_TCC before calling the original function */
+		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+			emit_ld(RV_REG_TCC, -tcc_off, RV_REG_FP, ctx);
 		ret = emit_call((const u64)orig_call, true, ctx);
 		if (ret)
 			goto out;
@@ -967,6 +947,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 
 	emit_ld(RV_REG_S1, -sreg_off, RV_REG_FP, ctx);
 
+	/* restore TCC to RV_REG_TCC before calling the original function */
+	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
+		emit_ld(RV_REG_TCC, -tcc_off, RV_REG_FP, ctx);
+
 	if (func_addr) {
 		/* trampoline called from function entry */
 		emit_ld(RV_REG_T0, stack_size - 8, RV_REG_SP, ctx);
@@ -1476,6 +1460,9 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		if (ret < 0)
 			return ret;
 
+		/* restore TCC from stack to RV_REG_TCC */
+		emit_ld(RV_REG_TCC, ctx->tcc_offset, RV_REG_SP, ctx);
+
 		ret = emit_call(addr, fixed_addr, ctx);
 		if (ret)
 			return ret;
@@ -1735,6 +1722,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 {
 	int i, stack_adjust = 0, store_offset, bpf_stack_adjust;
+	bool is_main = ctx->prog->aux->func_idx == 0;
 
 	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
 	if (bpf_stack_adjust)
@@ -1753,8 +1741,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 		stack_adjust += 8;
 	if (seen_reg(RV_REG_S5, ctx))
 		stack_adjust += 8;
-	if (seen_reg(RV_REG_S6, ctx))
-		stack_adjust += 8;
+	stack_adjust += 8; /* RV_REG_TCC */
 
 	stack_adjust = round_up(stack_adjust, 16);
 	stack_adjust += bpf_stack_adjust;
@@ -1769,7 +1756,8 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 	 * (TCC) register. This instruction is skipped for tail calls.
 	 * Force using a 4-byte (non-compressed) instruction.
 	 */
-	emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
+	if (is_main)
+		emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
 
 	emit_addi(RV_REG_SP, RV_REG_SP, -stack_adjust, ctx);
 
@@ -1799,22 +1787,14 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
 		emit_sd(RV_REG_SP, store_offset, RV_REG_S5, ctx);
 		store_offset -= 8;
 	}
-	if (seen_reg(RV_REG_S6, ctx)) {
-		emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
-		store_offset -= 8;
-	}
+	emit_sd(RV_REG_SP, store_offset, RV_REG_TCC, ctx);
+	ctx->tcc_offset = store_offset;
 
 	emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);
 
 	if (bpf_stack_adjust)
 		emit_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust, ctx);
 
-	/* Program contains calls and tail calls, so RV_REG_TCC need
-	 * to be saved across calls.
-	 */
-	if (seen_tail_call(ctx) && seen_call(ctx))
-		emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);
-
 	ctx->stack_size = stack_adjust;
 }
 
@@ -1827,3 +1807,8 @@ bool bpf_jit_supports_kfunc_call(void)
 {
 	return true;
 }
+
+bool bpf_jit_supports_subprog_tailcalls(void)
+{
+	return true;
+}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-19  3:57   ` Pu Lehui
@ 2023-09-19 10:04     ` Conor Dooley
  -1 siblings, 0 replies; 42+ messages in thread
From: Conor Dooley @ 2023-09-19 10:04 UTC (permalink / raw)
  To: Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui


[-- Attachment #1.1: Type: text/plain, Size: 1169 bytes --]

On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
> 
> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> the TCC can be propagated from the parent process to the subprocess, but
> the TCC of the parent process cannot be restored when the subprocess
> exits. Since the RV64 TCC is initialized before saving the callee saved
> registers into the stack, we cannot use the callee saved register to
> pass the TCC, otherwise the original value of the callee saved register
> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> similar to x86_64, i.e. using a non-callee saved register to transfer
> the TCC between functions, and saving that register to the stack to
> protect the TCC value. At the same time, we also consider the scenario
> of mixing trampoline.
> 
> Tests test_bpf.ko and test_verifier have passed, as well as the relative
> testcases of test_progs*.
> 
> Signed-off-by: Pu Lehui <pulehui@huawei.com>

Breaks the build:
../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'

Thanks,
Conor.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2023-09-19 10:04     ` Conor Dooley
  0 siblings, 0 replies; 42+ messages in thread
From: Conor Dooley @ 2023-09-19 10:04 UTC (permalink / raw)
  To: Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui

[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]

On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
> 
> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> the TCC can be propagated from the parent process to the subprocess, but
> the TCC of the parent process cannot be restored when the subprocess
> exits. Since the RV64 TCC is initialized before saving the callee saved
> registers into the stack, we cannot use the callee saved register to
> pass the TCC, otherwise the original value of the callee saved register
> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> similar to x86_64, i.e. using a non-callee saved register to transfer
> the TCC between functions, and saving that register to the stack to
> protect the TCC value. At the same time, we also consider the scenario
> of mixing trampoline.
> 
> Tests test_bpf.ko and test_verifier have passed, as well as the relative
> testcases of test_progs*.
> 
> Signed-off-by: Pu Lehui <pulehui@huawei.com>

Breaks the build:
../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'

Thanks,
Conor.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-19 10:04     ` Conor Dooley
@ 2023-09-19 10:54       ` Conor Dooley
  -1 siblings, 0 replies; 42+ messages in thread
From: Conor Dooley @ 2023-09-19 10:54 UTC (permalink / raw)
  To: Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui


[-- Attachment #1.1: Type: text/plain, Size: 1347 bytes --]

On Tue, Sep 19, 2023 at 11:04:53AM +0100, Conor Dooley wrote:
> On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
> > From: Pu Lehui <pulehui@huawei.com>
> > 
> > In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> > the TCC can be propagated from the parent process to the subprocess, but
> > the TCC of the parent process cannot be restored when the subprocess
> > exits. Since the RV64 TCC is initialized before saving the callee saved
> > registers into the stack, we cannot use the callee saved register to
> > pass the TCC, otherwise the original value of the callee saved register
> > will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> > similar to x86_64, i.e. using a non-callee saved register to transfer
> > the TCC between functions, and saving that register to the stack to
> > protect the TCC value. At the same time, we also consider the scenario
> > of mixing trampoline.
> > 
> > Tests test_bpf.ko and test_verifier have passed, as well as the relative
> > testcases of test_progs*.
> > 
> > Signed-off-by: Pu Lehui <pulehui@huawei.com>
> 
> Breaks the build:
> ../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'

Probably should have specified, happens with allmodconfig, but not a
defconfig build.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2023-09-19 10:54       ` Conor Dooley
  0 siblings, 0 replies; 42+ messages in thread
From: Conor Dooley @ 2023-09-19 10:54 UTC (permalink / raw)
  To: Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui

[-- Attachment #1: Type: text/plain, Size: 1347 bytes --]

On Tue, Sep 19, 2023 at 11:04:53AM +0100, Conor Dooley wrote:
> On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
> > From: Pu Lehui <pulehui@huawei.com>
> > 
> > In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> > the TCC can be propagated from the parent process to the subprocess, but
> > the TCC of the parent process cannot be restored when the subprocess
> > exits. Since the RV64 TCC is initialized before saving the callee saved
> > registers into the stack, we cannot use the callee saved register to
> > pass the TCC, otherwise the original value of the callee saved register
> > will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> > similar to x86_64, i.e. using a non-callee saved register to transfer
> > the TCC between functions, and saving that register to the stack to
> > protect the TCC value. At the same time, we also consider the scenario
> > of mixing trampoline.
> > 
> > Tests test_bpf.ko and test_verifier have passed, as well as the relative
> > testcases of test_progs*.
> > 
> > Signed-off-by: Pu Lehui <pulehui@huawei.com>
> 
> Breaks the build:
> ../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'

Probably should have specified, happens with allmodconfig, but not a
defconfig build.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-19 10:04     ` Conor Dooley
@ 2023-09-19 11:23       ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19 11:23 UTC (permalink / raw)
  To: Conor Dooley, Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson



On 2023/9/19 18:04, Conor Dooley wrote:
> On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>> the TCC can be propagated from the parent process to the subprocess, but
>> the TCC of the parent process cannot be restored when the subprocess
>> exits. Since the RV64 TCC is initialized before saving the callee saved
>> registers into the stack, we cannot use the callee saved register to
>> pass the TCC, otherwise the original value of the callee saved register
>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>> similar to x86_64, i.e. using a non-callee saved register to transfer
>> the TCC between functions, and saving that register to the stack to
>> protect the TCC value. At the same time, we also consider the scenario
>> of mixing trampoline.
>>
>> Tests test_bpf.ko and test_verifier have passed, as well as the relative
>> testcases of test_progs*.
>>
>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> 
> Breaks the build:
> ../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'
> 

Hi Conor,

BPF_TRAMP_F_TAIL_CALL_CTX rely on commit [0], and it has been merged 
into bpf-next tree.

Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=2b5dcb31a19a2e0acd869b12c9db9b2d696ef544 
[0]

> Thanks,
> Conor.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2023-09-19 11:23       ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19 11:23 UTC (permalink / raw)
  To: Conor Dooley, Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson



On 2023/9/19 18:04, Conor Dooley wrote:
> On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>> the TCC can be propagated from the parent process to the subprocess, but
>> the TCC of the parent process cannot be restored when the subprocess
>> exits. Since the RV64 TCC is initialized before saving the callee saved
>> registers into the stack, we cannot use the callee saved register to
>> pass the TCC, otherwise the original value of the callee saved register
>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>> similar to x86_64, i.e. using a non-callee saved register to transfer
>> the TCC between functions, and saving that register to the stack to
>> protect the TCC value. At the same time, we also consider the scenario
>> of mixing trampoline.
>>
>> Tests test_bpf.ko and test_verifier have passed, as well as the relative
>> testcases of test_progs*.
>>
>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> 
> Breaks the build:
> ../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'
> 

Hi Conor,

BPF_TRAMP_F_TAIL_CALL_CTX rely on commit [0], and it has been merged 
into bpf-next tree.

Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=2b5dcb31a19a2e0acd869b12c9db9b2d696ef544 
[0]

> Thanks,
> Conor.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-19 11:23       ` Pu Lehui
@ 2023-09-19 11:50         ` Conor Dooley
  -1 siblings, 0 replies; 42+ messages in thread
From: Conor Dooley @ 2023-09-19 11:50 UTC (permalink / raw)
  To: Pu Lehui
  Cc: Pu Lehui, bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson


[-- Attachment #1.1: Type: text/plain, Size: 1742 bytes --]

On Tue, Sep 19, 2023 at 07:23:07PM +0800, Pu Lehui wrote:
> 
> 
> On 2023/9/19 18:04, Conor Dooley wrote:
> > On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
> > > From: Pu Lehui <pulehui@huawei.com>
> > > 
> > > In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> > > the TCC can be propagated from the parent process to the subprocess, but
> > > the TCC of the parent process cannot be restored when the subprocess
> > > exits. Since the RV64 TCC is initialized before saving the callee saved
> > > registers into the stack, we cannot use the callee saved register to
> > > pass the TCC, otherwise the original value of the callee saved register
> > > will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> > > similar to x86_64, i.e. using a non-callee saved register to transfer
> > > the TCC between functions, and saving that register to the stack to
> > > protect the TCC value. At the same time, we also consider the scenario
> > > of mixing trampoline.
> > > 
> > > Tests test_bpf.ko and test_verifier have passed, as well as the relative
> > > testcases of test_progs*.
> > > 
> > > Signed-off-by: Pu Lehui <pulehui@huawei.com>
> > 
> > Breaks the build:
> > ../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'
> > 
> 
> Hi Conor,
> 
> BPF_TRAMP_F_TAIL_CALL_CTX rely on commit [0], and it has been merged into
> bpf-next tree.

I see. I did check the cover to see if there was anything relevant
there, like a link or base commit, but since there were neither I opted
to pass on the warning from the patchwork automation we have :)

Thanks & sorry for the noise on this one.

Thanks,
Conor.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2023-09-19 11:50         ` Conor Dooley
  0 siblings, 0 replies; 42+ messages in thread
From: Conor Dooley @ 2023-09-19 11:50 UTC (permalink / raw)
  To: Pu Lehui
  Cc: Pu Lehui, bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson

[-- Attachment #1: Type: text/plain, Size: 1742 bytes --]

On Tue, Sep 19, 2023 at 07:23:07PM +0800, Pu Lehui wrote:
> 
> 
> On 2023/9/19 18:04, Conor Dooley wrote:
> > On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
> > > From: Pu Lehui <pulehui@huawei.com>
> > > 
> > > In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> > > the TCC can be propagated from the parent process to the subprocess, but
> > > the TCC of the parent process cannot be restored when the subprocess
> > > exits. Since the RV64 TCC is initialized before saving the callee saved
> > > registers into the stack, we cannot use the callee saved register to
> > > pass the TCC, otherwise the original value of the callee saved register
> > > will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> > > similar to x86_64, i.e. using a non-callee saved register to transfer
> > > the TCC between functions, and saving that register to the stack to
> > > protect the TCC value. At the same time, we also consider the scenario
> > > of mixing trampoline.
> > > 
> > > Tests test_bpf.ko and test_verifier have passed, as well as the relative
> > > testcases of test_progs*.
> > > 
> > > Signed-off-by: Pu Lehui <pulehui@huawei.com>
> > 
> > Breaks the build:
> > ../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'
> > 
> 
> Hi Conor,
> 
> BPF_TRAMP_F_TAIL_CALL_CTX rely on commit [0], and it has been merged into
> bpf-next tree.

I see. I did check the cover to see if there was anything relevant
there, like a link or base commit, but since there were neither I opted
to pass on the warning from the patchwork automation we have :)

Thanks & sorry for the noise on this one.

Thanks,
Conor.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-19 11:50         ` Conor Dooley
@ 2023-09-19 12:01           ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19 12:01 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Pu Lehui, bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson



On 2023/9/19 19:50, Conor Dooley wrote:
> On Tue, Sep 19, 2023 at 07:23:07PM +0800, Pu Lehui wrote:
>>
>>
>> On 2023/9/19 18:04, Conor Dooley wrote:
>>> On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>
>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>> the TCC of the parent process cannot be restored when the subprocess
>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>> registers into the stack, we cannot use the callee saved register to
>>>> pass the TCC, otherwise the original value of the callee saved register
>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>> the TCC between functions, and saving that register to the stack to
>>>> protect the TCC value. At the same time, we also consider the scenario
>>>> of mixing trampoline.
>>>>
>>>> Tests test_bpf.ko and test_verifier have passed, as well as the relative
>>>> testcases of test_progs*.
>>>>
>>>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>>>
>>> Breaks the build:
>>> ../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'
>>>
>>
>> Hi Conor,
>>
>> BPF_TRAMP_F_TAIL_CALL_CTX rely on commit [0], and it has been merged into
>> bpf-next tree.
> 
> I see. I did check the cover to see if there was anything relevant
> there, like a link or base commit, but since there were neither I opted
> to pass on the warning from the patchwork automation we have :) >

Thanks, maybe it should be better to attach it to the cover.

> Thanks & sorry for the noise on this one.
> 
> Thanks,
> Conor.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2023-09-19 12:01           ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-19 12:01 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Pu Lehui, bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson



On 2023/9/19 19:50, Conor Dooley wrote:
> On Tue, Sep 19, 2023 at 07:23:07PM +0800, Pu Lehui wrote:
>>
>>
>> On 2023/9/19 18:04, Conor Dooley wrote:
>>> On Tue, Sep 19, 2023 at 11:57:11AM +0800, Pu Lehui wrote:
>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>
>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>> the TCC of the parent process cannot be restored when the subprocess
>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>> registers into the stack, we cannot use the callee saved register to
>>>> pass the TCC, otherwise the original value of the callee saved register
>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>> the TCC between functions, and saving that register to the stack to
>>>> protect the TCC value. At the same time, we also consider the scenario
>>>> of mixing trampoline.
>>>>
>>>> Tests test_bpf.ko and test_verifier have passed, as well as the relative
>>>> testcases of test_progs*.
>>>>
>>>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>>>
>>> Breaks the build:
>>> ../arch/riscv/net/bpf_jit_comp64.c:846:14: error: use of undeclared identifier 'BPF_TRAMP_F_TAIL_CALL_CTX'
>>>
>>
>> Hi Conor,
>>
>> BPF_TRAMP_F_TAIL_CALL_CTX rely on commit [0], and it has been merged into
>> bpf-next tree.
> 
> I see. I did check the cover to see if there was anything relevant
> there, like a link or base commit, but since there were neither I opted
> to pass on the warning from the patchwork automation we have :) >

Thanks, maybe it should be better to attach it to the cover.

> Thanks & sorry for the noise on this one.
> 
> Thanks,
> Conor.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 0/4] Mixing bpf2bpf and tailcalls for RV64
  2023-09-19  3:57 ` Pu Lehui
@ 2023-09-26 13:30   ` Björn Töpel
  -1 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2023-09-26 13:30 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> the TCC can be propagated from the parent process to the subprocess, but
> the TCC of the parent process cannot be restored when the subprocess
> exits. Since the RV64 TCC is initialized before saving the callee saved
> registers into the stack, we cannot use the callee saved register to
> pass the TCC, otherwise the original value of the callee saved register
> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> similar to x86_64, i.e. using a non-callee saved register to transfer
> the TCC between functions, and saving that register to the stack to
> protect the TCC value. At the same time, we also consider the scenario
> of mixing trampoline.
>
> In addition, some code cleans are also attached to this patchset.
>
> Tests test_bpf.ko and test_verifier have passed, as well as the relative
> testcases of test_progs*.

Apologies for the review delay. I'm travelling, and will pick it up ASAP
when I'm back.


Björn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 0/4] Mixing bpf2bpf and tailcalls for RV64
@ 2023-09-26 13:30   ` Björn Töpel
  0 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2023-09-26 13:30 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> the TCC can be propagated from the parent process to the subprocess, but
> the TCC of the parent process cannot be restored when the subprocess
> exits. Since the RV64 TCC is initialized before saving the callee saved
> registers into the stack, we cannot use the callee saved register to
> pass the TCC, otherwise the original value of the callee saved register
> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> similar to x86_64, i.e. using a non-callee saved register to transfer
> the TCC between functions, and saving that register to the stack to
> protect the TCC value. At the same time, we also consider the scenario
> of mixing trampoline.
>
> In addition, some code cleans are also attached to this patchset.
>
> Tests test_bpf.ko and test_verifier have passed, as well as the relative
> testcases of test_progs*.

Apologies for the review delay. I'm travelling, and will pick it up ASAP
when I'm back.


Björn

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-19  3:57   ` Pu Lehui
@ 2023-09-28  9:59     ` Björn Töpel
  -1 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2023-09-28  9:59 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> From: Pu Lehui <pulehui@huawei.com>
>
> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> the TCC can be propagated from the parent process to the subprocess, but
> the TCC of the parent process cannot be restored when the subprocess
> exits. Since the RV64 TCC is initialized before saving the callee saved
> registers into the stack, we cannot use the callee saved register to
> pass the TCC, otherwise the original value of the callee saved register
> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> similar to x86_64, i.e. using a non-callee saved register to transfer
> the TCC between functions, and saving that register to the stack to
> protect the TCC value. At the same time, we also consider the scenario
> of mixing trampoline.

Hi!

The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
fixed pro/epilogue like some of the other JITs. I think we can do better
here, so that the pass-TCC-via-register can be used, and the additional
stack access can be avoided.

Today, the TCC is passed via a register (a6) and can be viewed as a
"state" variable/transparent argument/return value. As you point out, we
loose this when we do a call. On (any) calls we move the TCC to a
callee-saved register.

WDYT about the following scheme:

1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
  for the main program.
2 For BPF helper calls, move TCC to s6, perform the call, and restore
  a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
3 For all other calls, a6 is passed transparently.

For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
a BPF helper or not.

In summary; Determine in the JIT if we're leaving BPF-land, and need to
move the TCC to a callee-saved reg, or not, and save us a bunch of stack
store/loads.


Björn

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2023-09-28  9:59     ` Björn Töpel
  0 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2023-09-28  9:59 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> From: Pu Lehui <pulehui@huawei.com>
>
> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> the TCC can be propagated from the parent process to the subprocess, but
> the TCC of the parent process cannot be restored when the subprocess
> exits. Since the RV64 TCC is initialized before saving the callee saved
> registers into the stack, we cannot use the callee saved register to
> pass the TCC, otherwise the original value of the callee saved register
> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> similar to x86_64, i.e. using a non-callee saved register to transfer
> the TCC between functions, and saving that register to the stack to
> protect the TCC value. At the same time, we also consider the scenario
> of mixing trampoline.

Hi!

The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
fixed pro/epilogue like some of the other JITs. I think we can do better
here, so that the pass-TCC-via-register can be used, and the additional
stack access can be avoided.

Today, the TCC is passed via a register (a6) and can be viewed as a
"state" variable/transparent argument/return value. As you point out, we
loose this when we do a call. On (any) calls we move the TCC to a
callee-saved register.

WDYT about the following scheme:

1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
  for the main program.
2 For BPF helper calls, move TCC to s6, perform the call, and restore
  a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
3 For all other calls, a6 is passed transparently.

For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
a BPF helper or not.

In summary; Determine in the JIT if we're leaving BPF-land, and need to
move the TCC to a callee-saved reg, or not, and save us a bunch of stack
store/loads.


Björn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-28  9:59     ` Björn Töpel
@ 2023-09-28 10:39       ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-28 10:39 UTC (permalink / raw)
  To: Björn Töpel, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui



On 2023/9/28 17:59, Björn Töpel wrote:
> Pu Lehui <pulehui@huaweicloud.com> writes:
> 
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>> the TCC can be propagated from the parent process to the subprocess, but
>> the TCC of the parent process cannot be restored when the subprocess
>> exits. Since the RV64 TCC is initialized before saving the callee saved
>> registers into the stack, we cannot use the callee saved register to
>> pass the TCC, otherwise the original value of the callee saved register
>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>> similar to x86_64, i.e. using a non-callee saved register to transfer
>> the TCC between functions, and saving that register to the stack to
>> protect the TCC value. At the same time, we also consider the scenario
>> of mixing trampoline.
> 
> Hi!
> 
> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
> fixed pro/epilogue like some of the other JITs. I think we can do better
> here, so that the pass-TCC-via-register can be used, and the additional
> stack access can be avoided.
> 
> Today, the TCC is passed via a register (a6) and can be viewed as a
> "state" variable/transparent argument/return value. As you point out, we
> loose this when we do a call. On (any) calls we move the TCC to a
> callee-saved register.
> 
> WDYT about the following scheme:
> 
> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>    for the main program.
> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>    a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
> 3 For all other calls, a6 is passed transparently.
> 
> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
> a BPF helper or not.
> 
> In summary; Determine in the JIT if we're leaving BPF-land, and need to
> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
> store/loads.
> 

Sorry, I am on holiday, will deal with it after the holiday.

> 
> Björn


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2023-09-28 10:39       ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2023-09-28 10:39 UTC (permalink / raw)
  To: Björn Töpel, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui



On 2023/9/28 17:59, Björn Töpel wrote:
> Pu Lehui <pulehui@huaweicloud.com> writes:
> 
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>> the TCC can be propagated from the parent process to the subprocess, but
>> the TCC of the parent process cannot be restored when the subprocess
>> exits. Since the RV64 TCC is initialized before saving the callee saved
>> registers into the stack, we cannot use the callee saved register to
>> pass the TCC, otherwise the original value of the callee saved register
>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>> similar to x86_64, i.e. using a non-callee saved register to transfer
>> the TCC between functions, and saving that register to the stack to
>> protect the TCC value. At the same time, we also consider the scenario
>> of mixing trampoline.
> 
> Hi!
> 
> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
> fixed pro/epilogue like some of the other JITs. I think we can do better
> here, so that the pass-TCC-via-register can be used, and the additional
> stack access can be avoided.
> 
> Today, the TCC is passed via a register (a6) and can be viewed as a
> "state" variable/transparent argument/return value. As you point out, we
> loose this when we do a call. On (any) calls we move the TCC to a
> callee-saved register.
> 
> WDYT about the following scheme:
> 
> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>    for the main program.
> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>    a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
> 3 For all other calls, a6 is passed transparently.
> 
> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
> a BPF helper or not.
> 
> In summary; Determine in the JIT if we're leaving BPF-land, and need to
> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
> store/loads.
> 

Sorry, I am on holiday, will deal with it after the holiday.

> 
> Björn


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-28  9:59     ` Björn Töpel
@ 2024-01-16 14:21       ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-16 14:21 UTC (permalink / raw)
  To: Björn Töpel, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui



On 2023/9/28 17:59, Björn Töpel wrote:
> Pu Lehui <pulehui@huaweicloud.com> writes:
> 
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>> the TCC can be propagated from the parent process to the subprocess, but
>> the TCC of the parent process cannot be restored when the subprocess
>> exits. Since the RV64 TCC is initialized before saving the callee saved
>> registers into the stack, we cannot use the callee saved register to
>> pass the TCC, otherwise the original value of the callee saved register
>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>> similar to x86_64, i.e. using a non-callee saved register to transfer
>> the TCC between functions, and saving that register to the stack to
>> protect the TCC value. At the same time, we also consider the scenario
>> of mixing trampoline.
> 
> Hi!
> 
> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
> fixed pro/epilogue like some of the other JITs. I think we can do better
> here, so that the pass-TCC-via-register can be used, and the additional
> stack access can be avoided.
> 
> Today, the TCC is passed via a register (a6) and can be viewed as a
> "state" variable/transparent argument/return value. As you point out, we
> loose this when we do a call. On (any) calls we move the TCC to a
> callee-saved register.
> 
> WDYT about the following scheme:
> 
> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>    for the main program.
> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>    a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
> 3 For all other calls, a6 is passed transparently.
> 
> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
> a BPF helper or not.
> 
> In summary; Determine in the JIT if we're leaving BPF-land, and need to
> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
> store/loads.
> 

Valuable scheme. But we need to consider TCC back propagation. Let me 
show an example of calling subprog with TCC stored in A6:

prog1(TCC==1){
     subprog1(TCC==1)
         -> tailcall1(TCC==0)
             -> subprog2(TCC==0)
     subprog3(TCC==0) <--- should be TCC==1
         -\-> tailcall2 <--- can't be called
}

We call prog1 and TCC is 1. prog1 has two subprogs, subprog1 and 
subprog3. subprog1 calls tailcall1 and TCC become to 0. tailcall1 call 
subprog2 and then return to prog1 with TCC is 0. At this time, subprog3 
cannot call tailcall2 because TCC is 0. But TCC should be 1 here.

The question is A6 cannot be saved and restored, that is why I saved A6 
in stack at prologue, and restored at epilogue.

> 
> Björn


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2024-01-16 14:21       ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-16 14:21 UTC (permalink / raw)
  To: Björn Töpel, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui



On 2023/9/28 17:59, Björn Töpel wrote:
> Pu Lehui <pulehui@huaweicloud.com> writes:
> 
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>> the TCC can be propagated from the parent process to the subprocess, but
>> the TCC of the parent process cannot be restored when the subprocess
>> exits. Since the RV64 TCC is initialized before saving the callee saved
>> registers into the stack, we cannot use the callee saved register to
>> pass the TCC, otherwise the original value of the callee saved register
>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>> similar to x86_64, i.e. using a non-callee saved register to transfer
>> the TCC between functions, and saving that register to the stack to
>> protect the TCC value. At the same time, we also consider the scenario
>> of mixing trampoline.
> 
> Hi!
> 
> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
> fixed pro/epilogue like some of the other JITs. I think we can do better
> here, so that the pass-TCC-via-register can be used, and the additional
> stack access can be avoided.
> 
> Today, the TCC is passed via a register (a6) and can be viewed as a
> "state" variable/transparent argument/return value. As you point out, we
> loose this when we do a call. On (any) calls we move the TCC to a
> callee-saved register.
> 
> WDYT about the following scheme:
> 
> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>    for the main program.
> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>    a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
> 3 For all other calls, a6 is passed transparently.
> 
> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
> a BPF helper or not.
> 
> In summary; Determine in the JIT if we're leaving BPF-land, and need to
> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
> store/loads.
> 

Valuable scheme. But we need to consider TCC back propagation. Let me 
show an example of calling subprog with TCC stored in A6:

prog1(TCC==1){
     subprog1(TCC==1)
         -> tailcall1(TCC==0)
             -> subprog2(TCC==0)
     subprog3(TCC==0) <--- should be TCC==1
         -\-> tailcall2 <--- can't be called
}

We call prog1 and TCC is 1. prog1 has two subprogs, subprog1 and 
subprog3. subprog1 calls tailcall1 and TCC become to 0. tailcall1 call 
subprog2 and then return to prog1 with TCC is 0. At this time, subprog3 
cannot call tailcall2 because TCC is 0. But TCC should be 1 here.

The question is A6 cannot be saved and restored, that is why I saved A6 
in stack at prologue, and restored at epilogue.

> 
> Björn


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2023-09-19  3:57   ` Pu Lehui
@ 2024-01-30  3:26     ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-30  3:26 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson



On 2023/9/19 11:57, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
> 
> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> the TCC can be propagated from the parent process to the subprocess, but
> the TCC of the parent process cannot be restored when the subprocess
> exits. Since the RV64 TCC is initialized before saving the callee saved
> registers into the stack, we cannot use the callee saved register to
> pass the TCC, otherwise the original value of the callee saved register
> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> similar to x86_64, i.e. using a non-callee saved register to transfer
> the TCC between functions, and saving that register to the stack to
> protect the TCC value. At the same time, we also consider the scenario
> of mixing trampoline.
> 
> Tests test_bpf.ko and test_verifier have passed, as well as the relative
> testcases of test_progs*.
> 
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>   arch/riscv/net/bpf_jit.h        |  1 +
>   arch/riscv/net/bpf_jit_comp64.c | 91 ++++++++++++++-------------------
>   2 files changed, 39 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
> index d21c6c92a..ca518846c 100644
> --- a/arch/riscv/net/bpf_jit.h
> +++ b/arch/riscv/net/bpf_jit.h
> @@ -75,6 +75,7 @@ struct rv_jit_context {
>   	int nexentries;
>   	unsigned long flags;
>   	int stack_size;
> +	int tcc_offset;
>   };
>   
>   /* Convert from ninsns to bytes. */
> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
> index f2ded1151..f37be4911 100644
> --- a/arch/riscv/net/bpf_jit_comp64.c
> +++ b/arch/riscv/net/bpf_jit_comp64.c
> @@ -13,13 +13,11 @@
>   #include <asm/patch.h>
>   #include "bpf_jit.h"
>   
> +#define RV_REG_TCC		RV_REG_A6
>   #define RV_FENTRY_NINSNS	2
>   /* fentry and TCC init insns will be skipped on tailcall */
>   #define RV_TAILCALL_OFFSET	((RV_FENTRY_NINSNS + 1) * 4)
>   
> -#define RV_REG_TCC RV_REG_A6
> -#define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
> -
>   static const int regmap[] = {
>   	[BPF_REG_0] =	RV_REG_A5,
>   	[BPF_REG_1] =	RV_REG_A0,
> @@ -51,14 +49,12 @@ static const int pt_regmap[] = {
>   };
>   
>   enum {
> -	RV_CTX_F_SEEN_TAIL_CALL =	0,
>   	RV_CTX_F_SEEN_CALL =		RV_REG_RA,
>   	RV_CTX_F_SEEN_S1 =		RV_REG_S1,
>   	RV_CTX_F_SEEN_S2 =		RV_REG_S2,
>   	RV_CTX_F_SEEN_S3 =		RV_REG_S3,
>   	RV_CTX_F_SEEN_S4 =		RV_REG_S4,
>   	RV_CTX_F_SEEN_S5 =		RV_REG_S5,
> -	RV_CTX_F_SEEN_S6 =		RV_REG_S6,
>   };
>   
>   static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
> @@ -71,7 +67,6 @@ static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
>   	case RV_CTX_F_SEEN_S3:
>   	case RV_CTX_F_SEEN_S4:
>   	case RV_CTX_F_SEEN_S5:
> -	case RV_CTX_F_SEEN_S6:
>   		__set_bit(reg, &ctx->flags);
>   	}
>   	return reg;
> @@ -86,7 +81,6 @@ static bool seen_reg(int reg, struct rv_jit_context *ctx)
>   	case RV_CTX_F_SEEN_S3:
>   	case RV_CTX_F_SEEN_S4:
>   	case RV_CTX_F_SEEN_S5:
> -	case RV_CTX_F_SEEN_S6:
>   		return test_bit(reg, &ctx->flags);
>   	}
>   	return false;
> @@ -102,32 +96,6 @@ static void mark_call(struct rv_jit_context *ctx)
>   	__set_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
>   }
>   
> -static bool seen_call(struct rv_jit_context *ctx)
> -{
> -	return test_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
> -}
> -
> -static void mark_tail_call(struct rv_jit_context *ctx)
> -{
> -	__set_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
> -}
> -
> -static bool seen_tail_call(struct rv_jit_context *ctx)
> -{
> -	return test_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
> -}
> -
> -static u8 rv_tail_call_reg(struct rv_jit_context *ctx)
> -{
> -	mark_tail_call(ctx);
> -
> -	if (seen_call(ctx)) {
> -		__set_bit(RV_CTX_F_SEEN_S6, &ctx->flags);
> -		return RV_REG_S6;
> -	}
> -	return RV_REG_A6;
> -}
> -
>   static bool is_32b_int(s64 val)
>   {
>   	return -(1L << 31) <= val && val < (1L << 31);
> @@ -235,10 +203,7 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
>   		emit_ld(RV_REG_S5, store_offset, RV_REG_SP, ctx);
>   		store_offset -= 8;
>   	}
> -	if (seen_reg(RV_REG_S6, ctx)) {
> -		emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
> -		store_offset -= 8;
> -	}
> +	emit_ld(RV_REG_TCC, store_offset, RV_REG_SP, ctx);
>   
>   	emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
>   	/* Set return value. */
> @@ -332,7 +297,6 @@ static void emit_zext_32(u8 reg, struct rv_jit_context *ctx)
>   static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>   {
>   	int tc_ninsn, off, start_insn = ctx->ninsns;
> -	u8 tcc = rv_tail_call_reg(ctx);
>   
>   	/* a0: &ctx
>   	 * a1: &array
> @@ -355,9 +319,11 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>   	/* if (--TCC < 0)
>   	 *     goto out;
>   	 */
> -	emit_addi(RV_REG_TCC, tcc, -1, ctx);
> +	emit_ld(RV_REG_TCC, ctx->tcc_offset, RV_REG_SP, ctx);
> +	emit_addi(RV_REG_TCC, RV_REG_TCC, -1, ctx);
>   	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
>   	emit_branch(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
> +	emit_sd(RV_REG_SP, ctx->tcc_offset, RV_REG_TCC, ctx);
>   
>   	/* prog = array->ptrs[index];
>   	 * if (!prog)
> @@ -763,7 +729,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   	int i, ret, offset;
>   	int *branches_off = NULL;
>   	int stack_size = 0, nregs = m->nr_args;
> -	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off;
> +	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off, tcc_off;
>   	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
>   	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
>   	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> @@ -807,6 +773,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   	 *
>   	 * FP - sreg_off    [ callee saved reg	]
>   	 *
> +	 * FP - tcc_off     [ tail call count	] BPF_TRAMP_F_TAIL_CALL_CTX
> +	 *
>   	 *		    [ pads              ] pads for 16 bytes alignment
>   	 */
>   
> @@ -848,6 +816,11 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   	stack_size += 8;
>   	sreg_off = stack_size;
>   
> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) {
> +		stack_size += 8;
> +		tcc_off = stack_size;
> +	}
> +
>   	stack_size = round_up(stack_size, 16);
>   
>   	if (func_addr) {
> @@ -874,6 +847,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   		emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx);
>   	}
>   
> +	/* store tail call count */
> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +		emit_sd(RV_REG_FP, -tcc_off, RV_REG_TCC, ctx);
> +
>   	/* callee saved register S1 to pass start time */
>   	emit_sd(RV_REG_FP, -sreg_off, RV_REG_S1, ctx);
>   
> @@ -927,6 +904,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   
>   	if (flags & BPF_TRAMP_F_CALL_ORIG) {
>   		restore_args(nregs, args_off, ctx);
> +		/* restore TCC to RV_REG_TCC before calling the original function */
> +		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +			emit_ld(RV_REG_TCC, -tcc_off, RV_REG_FP, ctx);
>   		ret = emit_call((const u64)orig_call, true, ctx);
>   		if (ret)
>   			goto out;
> @@ -967,6 +947,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   
>   	emit_ld(RV_REG_S1, -sreg_off, RV_REG_FP, ctx);
>   
> +	/* restore TCC to RV_REG_TCC before calling the original function */
> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +		emit_ld(RV_REG_TCC, -tcc_off, RV_REG_FP, ctx);

Sorry guys. This will emit double times when `flags & 
BPF_TRAMP_F_CALL_ORIG`. Will fix it in next version.

> +
>   	if (func_addr) {
>   		/* trampoline called from function entry */
>   		emit_ld(RV_REG_T0, stack_size - 8, RV_REG_SP, ctx);
> @@ -1476,6 +1460,9 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>   		if (ret < 0)
>   			return ret;
>   
> +		/* restore TCC from stack to RV_REG_TCC */
> +		emit_ld(RV_REG_TCC, ctx->tcc_offset, RV_REG_SP, ctx);
> +
>   		ret = emit_call(addr, fixed_addr, ctx);
>   		if (ret)
>   			return ret;
> @@ -1735,6 +1722,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>   void bpf_jit_build_prologue(struct rv_jit_context *ctx)
>   {
>   	int i, stack_adjust = 0, store_offset, bpf_stack_adjust;
> +	bool is_main = ctx->prog->aux->func_idx == 0;
>   
>   	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
>   	if (bpf_stack_adjust)
> @@ -1753,8 +1741,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
>   		stack_adjust += 8;
>   	if (seen_reg(RV_REG_S5, ctx))
>   		stack_adjust += 8;
> -	if (seen_reg(RV_REG_S6, ctx))
> -		stack_adjust += 8;
> +	stack_adjust += 8; /* RV_REG_TCC */
>   
>   	stack_adjust = round_up(stack_adjust, 16);
>   	stack_adjust += bpf_stack_adjust;
> @@ -1769,7 +1756,8 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
>   	 * (TCC) register. This instruction is skipped for tail calls.
>   	 * Force using a 4-byte (non-compressed) instruction.
>   	 */
> -	emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
> +	if (is_main)
> +		emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
>   
>   	emit_addi(RV_REG_SP, RV_REG_SP, -stack_adjust, ctx);
>   
> @@ -1799,22 +1787,14 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
>   		emit_sd(RV_REG_SP, store_offset, RV_REG_S5, ctx);
>   		store_offset -= 8;
>   	}
> -	if (seen_reg(RV_REG_S6, ctx)) {
> -		emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
> -		store_offset -= 8;
> -	}
> +	emit_sd(RV_REG_SP, store_offset, RV_REG_TCC, ctx);
> +	ctx->tcc_offset = store_offset;
>   
>   	emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);
>   
>   	if (bpf_stack_adjust)
>   		emit_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust, ctx);
>   
> -	/* Program contains calls and tail calls, so RV_REG_TCC need
> -	 * to be saved across calls.
> -	 */
> -	if (seen_tail_call(ctx) && seen_call(ctx))
> -		emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);
> -
>   	ctx->stack_size = stack_adjust;
>   }
>   
> @@ -1827,3 +1807,8 @@ bool bpf_jit_supports_kfunc_call(void)
>   {
>   	return true;
>   }
> +
> +bool bpf_jit_supports_subprog_tailcalls(void)
> +{
> +	return true;
> +}

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2024-01-30  3:26     ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-30  3:26 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson



On 2023/9/19 11:57, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
> 
> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
> the TCC can be propagated from the parent process to the subprocess, but
> the TCC of the parent process cannot be restored when the subprocess
> exits. Since the RV64 TCC is initialized before saving the callee saved
> registers into the stack, we cannot use the callee saved register to
> pass the TCC, otherwise the original value of the callee saved register
> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
> similar to x86_64, i.e. using a non-callee saved register to transfer
> the TCC between functions, and saving that register to the stack to
> protect the TCC value. At the same time, we also consider the scenario
> of mixing trampoline.
> 
> Tests test_bpf.ko and test_verifier have passed, as well as the relative
> testcases of test_progs*.
> 
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>   arch/riscv/net/bpf_jit.h        |  1 +
>   arch/riscv/net/bpf_jit_comp64.c | 91 ++++++++++++++-------------------
>   2 files changed, 39 insertions(+), 53 deletions(-)
> 
> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
> index d21c6c92a..ca518846c 100644
> --- a/arch/riscv/net/bpf_jit.h
> +++ b/arch/riscv/net/bpf_jit.h
> @@ -75,6 +75,7 @@ struct rv_jit_context {
>   	int nexentries;
>   	unsigned long flags;
>   	int stack_size;
> +	int tcc_offset;
>   };
>   
>   /* Convert from ninsns to bytes. */
> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
> index f2ded1151..f37be4911 100644
> --- a/arch/riscv/net/bpf_jit_comp64.c
> +++ b/arch/riscv/net/bpf_jit_comp64.c
> @@ -13,13 +13,11 @@
>   #include <asm/patch.h>
>   #include "bpf_jit.h"
>   
> +#define RV_REG_TCC		RV_REG_A6
>   #define RV_FENTRY_NINSNS	2
>   /* fentry and TCC init insns will be skipped on tailcall */
>   #define RV_TAILCALL_OFFSET	((RV_FENTRY_NINSNS + 1) * 4)
>   
> -#define RV_REG_TCC RV_REG_A6
> -#define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */
> -
>   static const int regmap[] = {
>   	[BPF_REG_0] =	RV_REG_A5,
>   	[BPF_REG_1] =	RV_REG_A0,
> @@ -51,14 +49,12 @@ static const int pt_regmap[] = {
>   };
>   
>   enum {
> -	RV_CTX_F_SEEN_TAIL_CALL =	0,
>   	RV_CTX_F_SEEN_CALL =		RV_REG_RA,
>   	RV_CTX_F_SEEN_S1 =		RV_REG_S1,
>   	RV_CTX_F_SEEN_S2 =		RV_REG_S2,
>   	RV_CTX_F_SEEN_S3 =		RV_REG_S3,
>   	RV_CTX_F_SEEN_S4 =		RV_REG_S4,
>   	RV_CTX_F_SEEN_S5 =		RV_REG_S5,
> -	RV_CTX_F_SEEN_S6 =		RV_REG_S6,
>   };
>   
>   static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
> @@ -71,7 +67,6 @@ static u8 bpf_to_rv_reg(int bpf_reg, struct rv_jit_context *ctx)
>   	case RV_CTX_F_SEEN_S3:
>   	case RV_CTX_F_SEEN_S4:
>   	case RV_CTX_F_SEEN_S5:
> -	case RV_CTX_F_SEEN_S6:
>   		__set_bit(reg, &ctx->flags);
>   	}
>   	return reg;
> @@ -86,7 +81,6 @@ static bool seen_reg(int reg, struct rv_jit_context *ctx)
>   	case RV_CTX_F_SEEN_S3:
>   	case RV_CTX_F_SEEN_S4:
>   	case RV_CTX_F_SEEN_S5:
> -	case RV_CTX_F_SEEN_S6:
>   		return test_bit(reg, &ctx->flags);
>   	}
>   	return false;
> @@ -102,32 +96,6 @@ static void mark_call(struct rv_jit_context *ctx)
>   	__set_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
>   }
>   
> -static bool seen_call(struct rv_jit_context *ctx)
> -{
> -	return test_bit(RV_CTX_F_SEEN_CALL, &ctx->flags);
> -}
> -
> -static void mark_tail_call(struct rv_jit_context *ctx)
> -{
> -	__set_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
> -}
> -
> -static bool seen_tail_call(struct rv_jit_context *ctx)
> -{
> -	return test_bit(RV_CTX_F_SEEN_TAIL_CALL, &ctx->flags);
> -}
> -
> -static u8 rv_tail_call_reg(struct rv_jit_context *ctx)
> -{
> -	mark_tail_call(ctx);
> -
> -	if (seen_call(ctx)) {
> -		__set_bit(RV_CTX_F_SEEN_S6, &ctx->flags);
> -		return RV_REG_S6;
> -	}
> -	return RV_REG_A6;
> -}
> -
>   static bool is_32b_int(s64 val)
>   {
>   	return -(1L << 31) <= val && val < (1L << 31);
> @@ -235,10 +203,7 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx)
>   		emit_ld(RV_REG_S5, store_offset, RV_REG_SP, ctx);
>   		store_offset -= 8;
>   	}
> -	if (seen_reg(RV_REG_S6, ctx)) {
> -		emit_ld(RV_REG_S6, store_offset, RV_REG_SP, ctx);
> -		store_offset -= 8;
> -	}
> +	emit_ld(RV_REG_TCC, store_offset, RV_REG_SP, ctx);
>   
>   	emit_addi(RV_REG_SP, RV_REG_SP, stack_adjust, ctx);
>   	/* Set return value. */
> @@ -332,7 +297,6 @@ static void emit_zext_32(u8 reg, struct rv_jit_context *ctx)
>   static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>   {
>   	int tc_ninsn, off, start_insn = ctx->ninsns;
> -	u8 tcc = rv_tail_call_reg(ctx);
>   
>   	/* a0: &ctx
>   	 * a1: &array
> @@ -355,9 +319,11 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
>   	/* if (--TCC < 0)
>   	 *     goto out;
>   	 */
> -	emit_addi(RV_REG_TCC, tcc, -1, ctx);
> +	emit_ld(RV_REG_TCC, ctx->tcc_offset, RV_REG_SP, ctx);
> +	emit_addi(RV_REG_TCC, RV_REG_TCC, -1, ctx);
>   	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
>   	emit_branch(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
> +	emit_sd(RV_REG_SP, ctx->tcc_offset, RV_REG_TCC, ctx);
>   
>   	/* prog = array->ptrs[index];
>   	 * if (!prog)
> @@ -763,7 +729,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   	int i, ret, offset;
>   	int *branches_off = NULL;
>   	int stack_size = 0, nregs = m->nr_args;
> -	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off;
> +	int retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off, tcc_off;
>   	struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
>   	struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
>   	struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> @@ -807,6 +773,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   	 *
>   	 * FP - sreg_off    [ callee saved reg	]
>   	 *
> +	 * FP - tcc_off     [ tail call count	] BPF_TRAMP_F_TAIL_CALL_CTX
> +	 *
>   	 *		    [ pads              ] pads for 16 bytes alignment
>   	 */
>   
> @@ -848,6 +816,11 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   	stack_size += 8;
>   	sreg_off = stack_size;
>   
> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) {
> +		stack_size += 8;
> +		tcc_off = stack_size;
> +	}
> +
>   	stack_size = round_up(stack_size, 16);
>   
>   	if (func_addr) {
> @@ -874,6 +847,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   		emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx);
>   	}
>   
> +	/* store tail call count */
> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +		emit_sd(RV_REG_FP, -tcc_off, RV_REG_TCC, ctx);
> +
>   	/* callee saved register S1 to pass start time */
>   	emit_sd(RV_REG_FP, -sreg_off, RV_REG_S1, ctx);
>   
> @@ -927,6 +904,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   
>   	if (flags & BPF_TRAMP_F_CALL_ORIG) {
>   		restore_args(nregs, args_off, ctx);
> +		/* restore TCC to RV_REG_TCC before calling the original function */
> +		if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +			emit_ld(RV_REG_TCC, -tcc_off, RV_REG_FP, ctx);
>   		ret = emit_call((const u64)orig_call, true, ctx);
>   		if (ret)
>   			goto out;
> @@ -967,6 +947,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>   
>   	emit_ld(RV_REG_S1, -sreg_off, RV_REG_FP, ctx);
>   
> +	/* restore TCC to RV_REG_TCC before calling the original function */
> +	if (flags & BPF_TRAMP_F_TAIL_CALL_CTX)
> +		emit_ld(RV_REG_TCC, -tcc_off, RV_REG_FP, ctx);

Sorry guys. This will emit double times when `flags & 
BPF_TRAMP_F_CALL_ORIG`. Will fix it in next version.

> +
>   	if (func_addr) {
>   		/* trampoline called from function entry */
>   		emit_ld(RV_REG_T0, stack_size - 8, RV_REG_SP, ctx);
> @@ -1476,6 +1460,9 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>   		if (ret < 0)
>   			return ret;
>   
> +		/* restore TCC from stack to RV_REG_TCC */
> +		emit_ld(RV_REG_TCC, ctx->tcc_offset, RV_REG_SP, ctx);
> +
>   		ret = emit_call(addr, fixed_addr, ctx);
>   		if (ret)
>   			return ret;
> @@ -1735,6 +1722,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>   void bpf_jit_build_prologue(struct rv_jit_context *ctx)
>   {
>   	int i, stack_adjust = 0, store_offset, bpf_stack_adjust;
> +	bool is_main = ctx->prog->aux->func_idx == 0;
>   
>   	bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
>   	if (bpf_stack_adjust)
> @@ -1753,8 +1741,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
>   		stack_adjust += 8;
>   	if (seen_reg(RV_REG_S5, ctx))
>   		stack_adjust += 8;
> -	if (seen_reg(RV_REG_S6, ctx))
> -		stack_adjust += 8;
> +	stack_adjust += 8; /* RV_REG_TCC */
>   
>   	stack_adjust = round_up(stack_adjust, 16);
>   	stack_adjust += bpf_stack_adjust;
> @@ -1769,7 +1756,8 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
>   	 * (TCC) register. This instruction is skipped for tail calls.
>   	 * Force using a 4-byte (non-compressed) instruction.
>   	 */
> -	emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
> +	if (is_main)
> +		emit(rv_addi(RV_REG_TCC, RV_REG_ZERO, MAX_TAIL_CALL_CNT), ctx);
>   
>   	emit_addi(RV_REG_SP, RV_REG_SP, -stack_adjust, ctx);
>   
> @@ -1799,22 +1787,14 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx)
>   		emit_sd(RV_REG_SP, store_offset, RV_REG_S5, ctx);
>   		store_offset -= 8;
>   	}
> -	if (seen_reg(RV_REG_S6, ctx)) {
> -		emit_sd(RV_REG_SP, store_offset, RV_REG_S6, ctx);
> -		store_offset -= 8;
> -	}
> +	emit_sd(RV_REG_SP, store_offset, RV_REG_TCC, ctx);
> +	ctx->tcc_offset = store_offset;
>   
>   	emit_addi(RV_REG_FP, RV_REG_SP, stack_adjust, ctx);
>   
>   	if (bpf_stack_adjust)
>   		emit_addi(RV_REG_S5, RV_REG_SP, bpf_stack_adjust, ctx);
>   
> -	/* Program contains calls and tail calls, so RV_REG_TCC need
> -	 * to be saved across calls.
> -	 */
> -	if (seen_tail_call(ctx) && seen_call(ctx))
> -		emit_mv(RV_REG_TCC_SAVED, RV_REG_TCC, ctx);
> -
>   	ctx->stack_size = stack_adjust;
>   }
>   
> @@ -1827,3 +1807,8 @@ bool bpf_jit_supports_kfunc_call(void)
>   {
>   	return true;
>   }
> +
> +bool bpf_jit_supports_subprog_tailcalls(void)
> +{
> +	return true;
> +}

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2024-01-16 14:21       ` Pu Lehui
@ 2024-01-30  8:29         ` Björn Töpel
  -1 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2024-01-30  8:29 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> On 2023/9/28 17:59, Björn Töpel wrote:
>> Pu Lehui <pulehui@huaweicloud.com> writes:
>> 
>>> From: Pu Lehui <pulehui@huawei.com>
>>>
>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>> the TCC can be propagated from the parent process to the subprocess, but
>>> the TCC of the parent process cannot be restored when the subprocess
>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>> registers into the stack, we cannot use the callee saved register to
>>> pass the TCC, otherwise the original value of the callee saved register
>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>> the TCC between functions, and saving that register to the stack to
>>> protect the TCC value. At the same time, we also consider the scenario
>>> of mixing trampoline.
>> 
>> Hi!
>> 
>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>> fixed pro/epilogue like some of the other JITs. I think we can do better
>> here, so that the pass-TCC-via-register can be used, and the additional
>> stack access can be avoided.
>> 
>> Today, the TCC is passed via a register (a6) and can be viewed as a
>> "state" variable/transparent argument/return value. As you point out, we
>> loose this when we do a call. On (any) calls we move the TCC to a
>> callee-saved register.
>> 
>> WDYT about the following scheme:
>> 
>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>    for the main program.
>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>    a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>> 3 For all other calls, a6 is passed transparently.
>> 
>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>> a BPF helper or not.
>> 
>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>> store/loads.
>> 
>
> Valuable scheme. But we need to consider TCC back propagation. Let me 
> show an example of calling subprog with TCC stored in A6:
>
> prog1(TCC==1){
>      subprog1(TCC==1)
>          -> tailcall1(TCC==0)
>              -> subprog2(TCC==0)
>      subprog3(TCC==0) <--- should be TCC==1
>          -\-> tailcall2 <--- can't be called
> }
>
> We call prog1 and TCC is 1. prog1 has two subprogs, subprog1 and 
> subprog3. subprog1 calls tailcall1 and TCC become to 0. tailcall1 call 
> subprog2 and then return to prog1 with TCC is 0. At this time, subprog3 
> cannot call tailcall2 because TCC is 0. But TCC should be 1 here.

Huh, I'm not following, and I don't see the issue. Help me out! You're
only allowed to do X tail calls "globally" for a BPF context, right? So
in the example you're outlining above, tailcall2 shouldn't be allowed to
be called.


Björn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2024-01-30  8:29         ` Björn Töpel
  0 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2024-01-30  8:29 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> On 2023/9/28 17:59, Björn Töpel wrote:
>> Pu Lehui <pulehui@huaweicloud.com> writes:
>> 
>>> From: Pu Lehui <pulehui@huawei.com>
>>>
>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>> the TCC can be propagated from the parent process to the subprocess, but
>>> the TCC of the parent process cannot be restored when the subprocess
>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>> registers into the stack, we cannot use the callee saved register to
>>> pass the TCC, otherwise the original value of the callee saved register
>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>> the TCC between functions, and saving that register to the stack to
>>> protect the TCC value. At the same time, we also consider the scenario
>>> of mixing trampoline.
>> 
>> Hi!
>> 
>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>> fixed pro/epilogue like some of the other JITs. I think we can do better
>> here, so that the pass-TCC-via-register can be used, and the additional
>> stack access can be avoided.
>> 
>> Today, the TCC is passed via a register (a6) and can be viewed as a
>> "state" variable/transparent argument/return value. As you point out, we
>> loose this when we do a call. On (any) calls we move the TCC to a
>> callee-saved register.
>> 
>> WDYT about the following scheme:
>> 
>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>    for the main program.
>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>    a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>> 3 For all other calls, a6 is passed transparently.
>> 
>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>> a BPF helper or not.
>> 
>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>> store/loads.
>> 
>
> Valuable scheme. But we need to consider TCC back propagation. Let me 
> show an example of calling subprog with TCC stored in A6:
>
> prog1(TCC==1){
>      subprog1(TCC==1)
>          -> tailcall1(TCC==0)
>              -> subprog2(TCC==0)
>      subprog3(TCC==0) <--- should be TCC==1
>          -\-> tailcall2 <--- can't be called
> }
>
> We call prog1 and TCC is 1. prog1 has two subprogs, subprog1 and 
> subprog3. subprog1 calls tailcall1 and TCC become to 0. tailcall1 call 
> subprog2 and then return to prog1 with TCC is 0. At this time, subprog3 
> cannot call tailcall2 because TCC is 0. But TCC should be 1 here.

Huh, I'm not following, and I don't see the issue. Help me out! You're
only allowed to do X tail calls "globally" for a BPF context, right? So
in the example you're outlining above, tailcall2 shouldn't be allowed to
be called.


Björn

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2024-01-30  8:29         ` Björn Töpel
@ 2024-01-30  9:14           ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-30  9:14 UTC (permalink / raw)
  To: Björn Töpel, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson

[-- Attachment #1: Type: text/plain, Size: 3561 bytes --]



On 2024/1/30 16:29, Björn Töpel wrote:
> Pu Lehui <pulehui@huaweicloud.com> writes:
> 
>> On 2023/9/28 17:59, Björn Töpel wrote:
>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>
>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>
>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>> the TCC of the parent process cannot be restored when the subprocess
>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>> registers into the stack, we cannot use the callee saved register to
>>>> pass the TCC, otherwise the original value of the callee saved register
>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>> the TCC between functions, and saving that register to the stack to
>>>> protect the TCC value. At the same time, we also consider the scenario
>>>> of mixing trampoline.
>>>
>>> Hi!
>>>
>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>> here, so that the pass-TCC-via-register can be used, and the additional
>>> stack access can be avoided.
>>>
>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>> "state" variable/transparent argument/return value. As you point out, we
>>> loose this when we do a call. On (any) calls we move the TCC to a
>>> callee-saved register.
>>>
>>> WDYT about the following scheme:
>>>
>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>     for the main program.
>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>     a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>> 3 For all other calls, a6 is passed transparently.
>>>
>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>> a BPF helper or not.
>>>
>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>> store/loads.
>>>
>>
>> Valuable scheme. But we need to consider TCC back propagation. Let me
>> show an example of calling subprog with TCC stored in A6:
>>
>> prog1(TCC==1){
>>       subprog1(TCC==1)
>>           -> tailcall1(TCC==0)
>>               -> subprog2(TCC==0)
>>       subprog3(TCC==0) <--- should be TCC==1
>>           -\-> tailcall2 <--- can't be called
>> }

Let's back with this example again. Imagine that the tailcall chain is a 
list limited to 33 elements. When the list has 32 elements, we call 
subprog1 and then tailcall1. At this time, the list elements count 
becomes 33. Then we call subprog2 and return prog1. At this time, the 
list removes 1 element and becomes 32 elements. At this time, there 
still can perform 1 tailcall.

I've attached a diagram that shows mixing tailcall and subprogs is 
nearly a "call". It can return to caller function.

>>
>> We call prog1 and TCC is 1. prog1 has two subprogs, subprog1 and
>> subprog3. subprog1 calls tailcall1 and TCC become to 0. tailcall1 call
>> subprog2 and then return to prog1 with TCC is 0. At this time, subprog3
>> cannot call tailcall2 because TCC is 0. But TCC should be 1 here.
> 
> Huh, I'm not following, and I don't see the issue. Help me out! You're
> only allowed to do X tail calls "globally" for a BPF context, right? So
> in the example you're outlining above, tailcall2 shouldn't be allowed to
> be called.
> 
> 
> Björn

[-- Attachment #2: bpf2bpf&tailcall.png --]
[-- Type: image/png, Size: 116596 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2024-01-30  9:14           ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-30  9:14 UTC (permalink / raw)
  To: Björn Töpel, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson

[-- Attachment #1: Type: text/plain, Size: 3561 bytes --]



On 2024/1/30 16:29, Björn Töpel wrote:
> Pu Lehui <pulehui@huaweicloud.com> writes:
> 
>> On 2023/9/28 17:59, Björn Töpel wrote:
>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>
>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>
>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>> the TCC of the parent process cannot be restored when the subprocess
>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>> registers into the stack, we cannot use the callee saved register to
>>>> pass the TCC, otherwise the original value of the callee saved register
>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>> the TCC between functions, and saving that register to the stack to
>>>> protect the TCC value. At the same time, we also consider the scenario
>>>> of mixing trampoline.
>>>
>>> Hi!
>>>
>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>> here, so that the pass-TCC-via-register can be used, and the additional
>>> stack access can be avoided.
>>>
>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>> "state" variable/transparent argument/return value. As you point out, we
>>> loose this when we do a call. On (any) calls we move the TCC to a
>>> callee-saved register.
>>>
>>> WDYT about the following scheme:
>>>
>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>     for the main program.
>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>     a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>> 3 For all other calls, a6 is passed transparently.
>>>
>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>> a BPF helper or not.
>>>
>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>> store/loads.
>>>
>>
>> Valuable scheme. But we need to consider TCC back propagation. Let me
>> show an example of calling subprog with TCC stored in A6:
>>
>> prog1(TCC==1){
>>       subprog1(TCC==1)
>>           -> tailcall1(TCC==0)
>>               -> subprog2(TCC==0)
>>       subprog3(TCC==0) <--- should be TCC==1
>>           -\-> tailcall2 <--- can't be called
>> }

Let's back with this example again. Imagine that the tailcall chain is a 
list limited to 33 elements. When the list has 32 elements, we call 
subprog1 and then tailcall1. At this time, the list elements count 
becomes 33. Then we call subprog2 and return prog1. At this time, the 
list removes 1 element and becomes 32 elements. At this time, there 
still can perform 1 tailcall.

I've attached a diagram that shows mixing tailcall and subprogs is 
nearly a "call". It can return to caller function.

>>
>> We call prog1 and TCC is 1. prog1 has two subprogs, subprog1 and
>> subprog3. subprog1 calls tailcall1 and TCC become to 0. tailcall1 call
>> subprog2 and then return to prog1 with TCC is 0. At this time, subprog3
>> cannot call tailcall2 because TCC is 0. But TCC should be 1 here.
> 
> Huh, I'm not following, and I don't see the issue. Help me out! You're
> only allowed to do X tail calls "globally" for a BPF context, right? So
> in the example you're outlining above, tailcall2 shouldn't be allowed to
> be called.
> 
> 
> Björn

[-- Attachment #2: bpf2bpf&tailcall.png --]
[-- Type: image/png, Size: 116596 bytes --]

[-- Attachment #3: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2024-01-30  9:14           ` Pu Lehui
@ 2024-01-30 13:28             ` Björn Töpel
  -1 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2024-01-30 13:28 UTC (permalink / raw)
  To: Pu Lehui, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson

Pu Lehui <pulehui@huawei.com> writes:

> On 2024/1/30 16:29, Björn Töpel wrote:
>> Pu Lehui <pulehui@huaweicloud.com> writes:
>> 
>>> On 2023/9/28 17:59, Björn Töpel wrote:
>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>
>>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>>
>>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>>> the TCC of the parent process cannot be restored when the subprocess
>>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>>> registers into the stack, we cannot use the callee saved register to
>>>>> pass the TCC, otherwise the original value of the callee saved register
>>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>>> the TCC between functions, and saving that register to the stack to
>>>>> protect the TCC value. At the same time, we also consider the scenario
>>>>> of mixing trampoline.
>>>>
>>>> Hi!
>>>>
>>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>>> here, so that the pass-TCC-via-register can be used, and the additional
>>>> stack access can be avoided.
>>>>
>>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>>> "state" variable/transparent argument/return value. As you point out, we
>>>> loose this when we do a call. On (any) calls we move the TCC to a
>>>> callee-saved register.
>>>>
>>>> WDYT about the following scheme:
>>>>
>>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>>     for the main program.
>>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>>     a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>>> 3 For all other calls, a6 is passed transparently.
>>>>
>>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>>> a BPF helper or not.
>>>>
>>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>>> store/loads.
>>>>
>>>
>>> Valuable scheme. But we need to consider TCC back propagation. Let me
>>> show an example of calling subprog with TCC stored in A6:
>>>
>>> prog1(TCC==1){
>>>       subprog1(TCC==1)
>>>           -> tailcall1(TCC==0)
>>>               -> subprog2(TCC==0)
>>>       subprog3(TCC==0) <--- should be TCC==1
>>>           -\-> tailcall2 <--- can't be called
>>> }
>
> Let's back with this example again. Imagine that the tailcall chain is a 
> list limited to 33 elements. When the list has 32 elements, we call 
> subprog1 and then tailcall1. At this time, the list elements count 
> becomes 33. Then we call subprog2 and return prog1. At this time, the 
> list removes 1 element and becomes 32 elements. At this time, there 
> still can perform 1 tailcall.
>
> I've attached a diagram that shows mixing tailcall and subprogs is 
> nearly a "call". It can return to caller function.

Hmm. Let me put my Q in another way.

The kernel calls into BPF_PROG_RUN() (~a BPF context). Would it ever be
OK to do more than 33 tail calls, regardless of subprogs or not?

In your example, TCC is 1. You are allowed to perform one tail call. In
your example prog1 performs two.

My view of TCC has always been ~a counter of the number of tailcalls~.

With your example expanded:
prog1(TCC==33){
      subprog1(TCC==33)
          -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
      // Lehui says TCC should be 33 again.
      // Björn says "it's the number of tailcalls", and subprog3 cannot perform a tail call
      subprog3(TCC==?)
          
My view has, again, been than TCC is a run-time count of the number
tailcalls (fentry/fexit patch bpf-programs included).

What does x86 and arm64 do?


Björn

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2024-01-30 13:28             ` Björn Töpel
  0 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2024-01-30 13:28 UTC (permalink / raw)
  To: Pu Lehui, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson

Pu Lehui <pulehui@huawei.com> writes:

> On 2024/1/30 16:29, Björn Töpel wrote:
>> Pu Lehui <pulehui@huaweicloud.com> writes:
>> 
>>> On 2023/9/28 17:59, Björn Töpel wrote:
>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>
>>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>>
>>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>>> the TCC of the parent process cannot be restored when the subprocess
>>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>>> registers into the stack, we cannot use the callee saved register to
>>>>> pass the TCC, otherwise the original value of the callee saved register
>>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>>> the TCC between functions, and saving that register to the stack to
>>>>> protect the TCC value. At the same time, we also consider the scenario
>>>>> of mixing trampoline.
>>>>
>>>> Hi!
>>>>
>>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>>> here, so that the pass-TCC-via-register can be used, and the additional
>>>> stack access can be avoided.
>>>>
>>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>>> "state" variable/transparent argument/return value. As you point out, we
>>>> loose this when we do a call. On (any) calls we move the TCC to a
>>>> callee-saved register.
>>>>
>>>> WDYT about the following scheme:
>>>>
>>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>>     for the main program.
>>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>>     a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>>> 3 For all other calls, a6 is passed transparently.
>>>>
>>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>>> a BPF helper or not.
>>>>
>>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>>> store/loads.
>>>>
>>>
>>> Valuable scheme. But we need to consider TCC back propagation. Let me
>>> show an example of calling subprog with TCC stored in A6:
>>>
>>> prog1(TCC==1){
>>>       subprog1(TCC==1)
>>>           -> tailcall1(TCC==0)
>>>               -> subprog2(TCC==0)
>>>       subprog3(TCC==0) <--- should be TCC==1
>>>           -\-> tailcall2 <--- can't be called
>>> }
>
> Let's back with this example again. Imagine that the tailcall chain is a 
> list limited to 33 elements. When the list has 32 elements, we call 
> subprog1 and then tailcall1. At this time, the list elements count 
> becomes 33. Then we call subprog2 and return prog1. At this time, the 
> list removes 1 element and becomes 32 elements. At this time, there 
> still can perform 1 tailcall.
>
> I've attached a diagram that shows mixing tailcall and subprogs is 
> nearly a "call". It can return to caller function.

Hmm. Let me put my Q in another way.

The kernel calls into BPF_PROG_RUN() (~a BPF context). Would it ever be
OK to do more than 33 tail calls, regardless of subprogs or not?

In your example, TCC is 1. You are allowed to perform one tail call. In
your example prog1 performs two.

My view of TCC has always been ~a counter of the number of tailcalls~.

With your example expanded:
prog1(TCC==33){
      subprog1(TCC==33)
          -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
      // Lehui says TCC should be 33 again.
      // Björn says "it's the number of tailcalls", and subprog3 cannot perform a tail call
      subprog3(TCC==?)
          
My view has, again, been than TCC is a run-time count of the number
tailcalls (fentry/fexit patch bpf-programs included).

What does x86 and arm64 do?


Björn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2024-01-30 13:28             ` Björn Töpel
@ 2024-01-30 14:10               ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-30 14:10 UTC (permalink / raw)
  To: Björn Töpel, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson



On 2024/1/30 21:28, Björn Töpel wrote:
> Pu Lehui <pulehui@huawei.com> writes:
> 
>> On 2024/1/30 16:29, Björn Töpel wrote:
>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>
>>>> On 2023/9/28 17:59, Björn Töpel wrote:
>>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>>
>>>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>>>
>>>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>>>> the TCC of the parent process cannot be restored when the subprocess
>>>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>>>> registers into the stack, we cannot use the callee saved register to
>>>>>> pass the TCC, otherwise the original value of the callee saved register
>>>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>>>> the TCC between functions, and saving that register to the stack to
>>>>>> protect the TCC value. At the same time, we also consider the scenario
>>>>>> of mixing trampoline.
>>>>>
>>>>> Hi!
>>>>>
>>>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>>>> here, so that the pass-TCC-via-register can be used, and the additional
>>>>> stack access can be avoided.
>>>>>
>>>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>>>> "state" variable/transparent argument/return value. As you point out, we
>>>>> loose this when we do a call. On (any) calls we move the TCC to a
>>>>> callee-saved register.
>>>>>
>>>>> WDYT about the following scheme:
>>>>>
>>>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>>>      for the main program.
>>>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>>>      a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>>>> 3 For all other calls, a6 is passed transparently.
>>>>>
>>>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>>>> a BPF helper or not.
>>>>>
>>>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>>>> store/loads.
>>>>>
>>>>
>>>> Valuable scheme. But we need to consider TCC back propagation. Let me
>>>> show an example of calling subprog with TCC stored in A6:
>>>>
>>>> prog1(TCC==1){
>>>>        subprog1(TCC==1)
>>>>            -> tailcall1(TCC==0)
>>>>                -> subprog2(TCC==0)
>>>>        subprog3(TCC==0) <--- should be TCC==1
>>>>            -\-> tailcall2 <--- can't be called
>>>> }
>>
>> Let's back with this example again. Imagine that the tailcall chain is a
>> list limited to 33 elements. When the list has 32 elements, we call
>> subprog1 and then tailcall1. At this time, the list elements count
>> becomes 33. Then we call subprog2 and return prog1. At this time, the
>> list removes 1 element and becomes 32 elements. At this time, there
>> still can perform 1 tailcall.
>>
>> I've attached a diagram that shows mixing tailcall and subprogs is
>> nearly a "call". It can return to caller function.
> 
> Hmm. Let me put my Q in another way.
> 
> The kernel calls into BPF_PROG_RUN() (~a BPF context). Would it ever be
> OK to do more than 33 tail calls, regardless of subprogs or not?
> 
> In your example, TCC is 1. You are allowed to perform one tail call. In
> your example prog1 performs two.
> 
> My view of TCC has always been ~a counter of the number of tailcalls~.
> 
> With your example expanded:
> prog1(TCC==33){
>        subprog1(TCC==33)
>            -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
>        // Lehui says TCC should be 33 again.
>        // Björn says "it's the number of tailcalls", and subprog3 cannot perform a tail call
>        subprog3(TCC==?)

Yes, my view is take this something like a stack,while you take this as 
a fixed global value.

prog1(TCC==33){
     subprog1(TCC==33)
         -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> 
tailcall1(TCC==31) -> ... // 33 times -> subprog2(TCC==0)
     subprog3(TCC==33)
	-> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> 
... // 33 times

>            
> My view has, again, been than TCC is a run-time count of the number
> tailcalls (fentry/fexit patch bpf-programs included).
> 
> What does x86 and arm64 do?

When subprog return back to caller bpf program, they both restore TCC to 
the value when enter into subprog. The ARM64 uses the callee saved 
register to store the TCC. When the ARM64 exits, the TCC is restored to 
the value when it enter. The while x86 uses the stack to do the same thing.

> 
> 
> Björn

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2024-01-30 14:10               ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-30 14:10 UTC (permalink / raw)
  To: Björn Töpel, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson



On 2024/1/30 21:28, Björn Töpel wrote:
> Pu Lehui <pulehui@huawei.com> writes:
> 
>> On 2024/1/30 16:29, Björn Töpel wrote:
>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>
>>>> On 2023/9/28 17:59, Björn Töpel wrote:
>>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>>
>>>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>>>
>>>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>>>> the TCC of the parent process cannot be restored when the subprocess
>>>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>>>> registers into the stack, we cannot use the callee saved register to
>>>>>> pass the TCC, otherwise the original value of the callee saved register
>>>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>>>> the TCC between functions, and saving that register to the stack to
>>>>>> protect the TCC value. At the same time, we also consider the scenario
>>>>>> of mixing trampoline.
>>>>>
>>>>> Hi!
>>>>>
>>>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>>>> here, so that the pass-TCC-via-register can be used, and the additional
>>>>> stack access can be avoided.
>>>>>
>>>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>>>> "state" variable/transparent argument/return value. As you point out, we
>>>>> loose this when we do a call. On (any) calls we move the TCC to a
>>>>> callee-saved register.
>>>>>
>>>>> WDYT about the following scheme:
>>>>>
>>>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>>>      for the main program.
>>>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>>>      a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>>>> 3 For all other calls, a6 is passed transparently.
>>>>>
>>>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>>>> a BPF helper or not.
>>>>>
>>>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>>>> store/loads.
>>>>>
>>>>
>>>> Valuable scheme. But we need to consider TCC back propagation. Let me
>>>> show an example of calling subprog with TCC stored in A6:
>>>>
>>>> prog1(TCC==1){
>>>>        subprog1(TCC==1)
>>>>            -> tailcall1(TCC==0)
>>>>                -> subprog2(TCC==0)
>>>>        subprog3(TCC==0) <--- should be TCC==1
>>>>            -\-> tailcall2 <--- can't be called
>>>> }
>>
>> Let's back with this example again. Imagine that the tailcall chain is a
>> list limited to 33 elements. When the list has 32 elements, we call
>> subprog1 and then tailcall1. At this time, the list elements count
>> becomes 33. Then we call subprog2 and return prog1. At this time, the
>> list removes 1 element and becomes 32 elements. At this time, there
>> still can perform 1 tailcall.
>>
>> I've attached a diagram that shows mixing tailcall and subprogs is
>> nearly a "call". It can return to caller function.
> 
> Hmm. Let me put my Q in another way.
> 
> The kernel calls into BPF_PROG_RUN() (~a BPF context). Would it ever be
> OK to do more than 33 tail calls, regardless of subprogs or not?
> 
> In your example, TCC is 1. You are allowed to perform one tail call. In
> your example prog1 performs two.
> 
> My view of TCC has always been ~a counter of the number of tailcalls~.
> 
> With your example expanded:
> prog1(TCC==33){
>        subprog1(TCC==33)
>            -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
>        // Lehui says TCC should be 33 again.
>        // Björn says "it's the number of tailcalls", and subprog3 cannot perform a tail call
>        subprog3(TCC==?)

Yes, my view is take this something like a stack,while you take this as 
a fixed global value.

prog1(TCC==33){
     subprog1(TCC==33)
         -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> 
tailcall1(TCC==31) -> ... // 33 times -> subprog2(TCC==0)
     subprog3(TCC==33)
	-> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> 
... // 33 times

>            
> My view has, again, been than TCC is a run-time count of the number
> tailcalls (fentry/fexit patch bpf-programs included).
> 
> What does x86 and arm64 do?

When subprog return back to caller bpf program, they both restore TCC to 
the value when enter into subprog. The ARM64 uses the callee saved 
register to store the TCC. When the ARM64 exits, the TCC is restored to 
the value when it enter. The while x86 uses the stack to do the same thing.

> 
> 
> Björn

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2024-01-30 14:10               ` Pu Lehui
@ 2024-01-30 16:03                 ` Björn Töpel
  -1 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2024-01-30 16:03 UTC (permalink / raw)
  To: Pu Lehui, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson

Pu Lehui <pulehui@huawei.com> writes:

> On 2024/1/30 21:28, Björn Töpel wrote:
>> Pu Lehui <pulehui@huawei.com> writes:
>> 
>>> On 2024/1/30 16:29, Björn Töpel wrote:
>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>
>>>>> On 2023/9/28 17:59, Björn Töpel wrote:
>>>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>>>
>>>>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>>>>
>>>>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>>>>> the TCC of the parent process cannot be restored when the subprocess
>>>>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>>>>> registers into the stack, we cannot use the callee saved register to
>>>>>>> pass the TCC, otherwise the original value of the callee saved register
>>>>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>>>>> the TCC between functions, and saving that register to the stack to
>>>>>>> protect the TCC value. At the same time, we also consider the scenario
>>>>>>> of mixing trampoline.
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>>>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>>>>> here, so that the pass-TCC-via-register can be used, and the additional
>>>>>> stack access can be avoided.
>>>>>>
>>>>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>>>>> "state" variable/transparent argument/return value. As you point out, we
>>>>>> loose this when we do a call. On (any) calls we move the TCC to a
>>>>>> callee-saved register.
>>>>>>
>>>>>> WDYT about the following scheme:
>>>>>>
>>>>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>>>>      for the main program.
>>>>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>>>>      a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>>>>> 3 For all other calls, a6 is passed transparently.
>>>>>>
>>>>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>>>>> a BPF helper or not.
>>>>>>
>>>>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>>>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>>>>> store/loads.
>>>>>>
>>>>>
>>>>> Valuable scheme. But we need to consider TCC back propagation. Let me
>>>>> show an example of calling subprog with TCC stored in A6:
>>>>>
>>>>> prog1(TCC==1){
>>>>>        subprog1(TCC==1)
>>>>>            -> tailcall1(TCC==0)
>>>>>                -> subprog2(TCC==0)
>>>>>        subprog3(TCC==0) <--- should be TCC==1
>>>>>            -\-> tailcall2 <--- can't be called
>>>>> }
>>>
>>> Let's back with this example again. Imagine that the tailcall chain is a
>>> list limited to 33 elements. When the list has 32 elements, we call
>>> subprog1 and then tailcall1. At this time, the list elements count
>>> becomes 33. Then we call subprog2 and return prog1. At this time, the
>>> list removes 1 element and becomes 32 elements. At this time, there
>>> still can perform 1 tailcall.
>>>
>>> I've attached a diagram that shows mixing tailcall and subprogs is
>>> nearly a "call". It can return to caller function.
>> 
>> Hmm. Let me put my Q in another way.
>> 
>> The kernel calls into BPF_PROG_RUN() (~a BPF context). Would it ever be
>> OK to do more than 33 tail calls, regardless of subprogs or not?
>> 
>> In your example, TCC is 1. You are allowed to perform one tail call. In
>> your example prog1 performs two.
>> 
>> My view of TCC has always been ~a counter of the number of tailcalls~.
>> 
>> With your example expanded:
>> prog1(TCC==33){
>>        subprog1(TCC==33)
>>            -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
>>        // Lehui says TCC should be 33 again.
>>        // Björn says "it's the number of tailcalls", and subprog3 cannot perform a tail call
>>        subprog3(TCC==?)
>
> Yes, my view is take this something like a stack,while you take this as 
> a fixed global value.
>
> prog1(TCC==33){
>      subprog1(TCC==33)
>          -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> 
> tailcall1(TCC==31) -> ... // 33 times -> subprog2(TCC==0)
>      subprog3(TCC==33)
> 	-> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> 
> ... // 33 times
>
>>            
>> My view has, again, been than TCC is a run-time count of the number
>> tailcalls (fentry/fexit patch bpf-programs included).
>> 
>> What does x86 and arm64 do?
>
> When subprog return back to caller bpf program, they both restore TCC to 
> the value when enter into subprog. The ARM64 uses the callee saved 
> register to store the TCC. When the ARM64 exits, the TCC is restored to 
> the value when it enter. The while x86 uses the stack to do the same thing.

Ok! Thanks for clarifying. I'll continue reviewing the v2 of your
series!

BTW, I wonder if we can trigger this [1] on RV64 -- i.e. calling the
main prog, will reset the tcc count.

[1] https://lore.kernel.org/bpf/20240104142226.87869-1-hffilwlqm@gmail.com/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2024-01-30 16:03                 ` Björn Töpel
  0 siblings, 0 replies; 42+ messages in thread
From: Björn Töpel @ 2024-01-30 16:03 UTC (permalink / raw)
  To: Pu Lehui, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson

Pu Lehui <pulehui@huawei.com> writes:

> On 2024/1/30 21:28, Björn Töpel wrote:
>> Pu Lehui <pulehui@huawei.com> writes:
>> 
>>> On 2024/1/30 16:29, Björn Töpel wrote:
>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>
>>>>> On 2023/9/28 17:59, Björn Töpel wrote:
>>>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>>>
>>>>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>>>>
>>>>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>>>>> the TCC of the parent process cannot be restored when the subprocess
>>>>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>>>>> registers into the stack, we cannot use the callee saved register to
>>>>>>> pass the TCC, otherwise the original value of the callee saved register
>>>>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>>>>> the TCC between functions, and saving that register to the stack to
>>>>>>> protect the TCC value. At the same time, we also consider the scenario
>>>>>>> of mixing trampoline.
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>>>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>>>>> here, so that the pass-TCC-via-register can be used, and the additional
>>>>>> stack access can be avoided.
>>>>>>
>>>>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>>>>> "state" variable/transparent argument/return value. As you point out, we
>>>>>> loose this when we do a call. On (any) calls we move the TCC to a
>>>>>> callee-saved register.
>>>>>>
>>>>>> WDYT about the following scheme:
>>>>>>
>>>>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>>>>      for the main program.
>>>>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>>>>      a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>>>>> 3 For all other calls, a6 is passed transparently.
>>>>>>
>>>>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>>>>> a BPF helper or not.
>>>>>>
>>>>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>>>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>>>>> store/loads.
>>>>>>
>>>>>
>>>>> Valuable scheme. But we need to consider TCC back propagation. Let me
>>>>> show an example of calling subprog with TCC stored in A6:
>>>>>
>>>>> prog1(TCC==1){
>>>>>        subprog1(TCC==1)
>>>>>            -> tailcall1(TCC==0)
>>>>>                -> subprog2(TCC==0)
>>>>>        subprog3(TCC==0) <--- should be TCC==1
>>>>>            -\-> tailcall2 <--- can't be called
>>>>> }
>>>
>>> Let's back with this example again. Imagine that the tailcall chain is a
>>> list limited to 33 elements. When the list has 32 elements, we call
>>> subprog1 and then tailcall1. At this time, the list elements count
>>> becomes 33. Then we call subprog2 and return prog1. At this time, the
>>> list removes 1 element and becomes 32 elements. At this time, there
>>> still can perform 1 tailcall.
>>>
>>> I've attached a diagram that shows mixing tailcall and subprogs is
>>> nearly a "call". It can return to caller function.
>> 
>> Hmm. Let me put my Q in another way.
>> 
>> The kernel calls into BPF_PROG_RUN() (~a BPF context). Would it ever be
>> OK to do more than 33 tail calls, regardless of subprogs or not?
>> 
>> In your example, TCC is 1. You are allowed to perform one tail call. In
>> your example prog1 performs two.
>> 
>> My view of TCC has always been ~a counter of the number of tailcalls~.
>> 
>> With your example expanded:
>> prog1(TCC==33){
>>        subprog1(TCC==33)
>>            -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
>>        // Lehui says TCC should be 33 again.
>>        // Björn says "it's the number of tailcalls", and subprog3 cannot perform a tail call
>>        subprog3(TCC==?)
>
> Yes, my view is take this something like a stack,while you take this as 
> a fixed global value.
>
> prog1(TCC==33){
>      subprog1(TCC==33)
>          -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> 
> tailcall1(TCC==31) -> ... // 33 times -> subprog2(TCC==0)
>      subprog3(TCC==33)
> 	-> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> 
> ... // 33 times
>
>>            
>> My view has, again, been than TCC is a run-time count of the number
>> tailcalls (fentry/fexit patch bpf-programs included).
>> 
>> What does x86 and arm64 do?
>
> When subprog return back to caller bpf program, they both restore TCC to 
> the value when enter into subprog. The ARM64 uses the callee saved 
> register to store the TCC. When the ARM64 exits, the TCC is restored to 
> the value when it enter. The while x86 uses the stack to do the same thing.

Ok! Thanks for clarifying. I'll continue reviewing the v2 of your
series!

BTW, I wonder if we can trigger this [1] on RV64 -- i.e. calling the
main prog, will reset the tcc count.

[1] https://lore.kernel.org/bpf/20240104142226.87869-1-hffilwlqm@gmail.com/

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
  2024-01-30 16:03                 ` Björn Töpel
@ 2024-01-31  9:18                   ` Pu Lehui
  -1 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-31  9:18 UTC (permalink / raw)
  To: Björn Töpel, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson



On 2024/1/31 0:03, Björn Töpel wrote:
> Pu Lehui <pulehui@huawei.com> writes:
> 
>> On 2024/1/30 21:28, Björn Töpel wrote:
>>> Pu Lehui <pulehui@huawei.com> writes:
>>>
>>>> On 2024/1/30 16:29, Björn Töpel wrote:
>>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>>
>>>>>> On 2023/9/28 17:59, Björn Töpel wrote:
>>>>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>>>>
>>>>>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>>>>>
>>>>>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>>>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>>>>>> the TCC of the parent process cannot be restored when the subprocess
>>>>>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>>>>>> registers into the stack, we cannot use the callee saved register to
>>>>>>>> pass the TCC, otherwise the original value of the callee saved register
>>>>>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>>>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>>>>>> the TCC between functions, and saving that register to the stack to
>>>>>>>> protect the TCC value. At the same time, we also consider the scenario
>>>>>>>> of mixing trampoline.
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>>>>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>>>>>> here, so that the pass-TCC-via-register can be used, and the additional
>>>>>>> stack access can be avoided.
>>>>>>>
>>>>>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>>>>>> "state" variable/transparent argument/return value. As you point out, we
>>>>>>> loose this when we do a call. On (any) calls we move the TCC to a
>>>>>>> callee-saved register.
>>>>>>>
>>>>>>> WDYT about the following scheme:
>>>>>>>
>>>>>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>>>>>       for the main program.
>>>>>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>>>>>       a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>>>>>> 3 For all other calls, a6 is passed transparently.
>>>>>>>
>>>>>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>>>>>> a BPF helper or not.
>>>>>>>
>>>>>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>>>>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>>>>>> store/loads.
>>>>>>>
>>>>>>
>>>>>> Valuable scheme. But we need to consider TCC back propagation. Let me
>>>>>> show an example of calling subprog with TCC stored in A6:
>>>>>>
>>>>>> prog1(TCC==1){
>>>>>>         subprog1(TCC==1)
>>>>>>             -> tailcall1(TCC==0)
>>>>>>                 -> subprog2(TCC==0)
>>>>>>         subprog3(TCC==0) <--- should be TCC==1
>>>>>>             -\-> tailcall2 <--- can't be called
>>>>>> }
>>>>
>>>> Let's back with this example again. Imagine that the tailcall chain is a
>>>> list limited to 33 elements. When the list has 32 elements, we call
>>>> subprog1 and then tailcall1. At this time, the list elements count
>>>> becomes 33. Then we call subprog2 and return prog1. At this time, the
>>>> list removes 1 element and becomes 32 elements. At this time, there
>>>> still can perform 1 tailcall.
>>>>
>>>> I've attached a diagram that shows mixing tailcall and subprogs is
>>>> nearly a "call". It can return to caller function.
>>>
>>> Hmm. Let me put my Q in another way.
>>>
>>> The kernel calls into BPF_PROG_RUN() (~a BPF context). Would it ever be
>>> OK to do more than 33 tail calls, regardless of subprogs or not?
>>>
>>> In your example, TCC is 1. You are allowed to perform one tail call. In
>>> your example prog1 performs two.
>>>
>>> My view of TCC has always been ~a counter of the number of tailcalls~.
>>>
>>> With your example expanded:
>>> prog1(TCC==33){
>>>         subprog1(TCC==33)
>>>             -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
>>>         // Lehui says TCC should be 33 again.
>>>         // Björn says "it's the number of tailcalls", and subprog3 cannot perform a tail call
>>>         subprog3(TCC==?)
>>
>> Yes, my view is take this something like a stack,while you take this as
>> a fixed global value.
>>
>> prog1(TCC==33){
>>       subprog1(TCC==33)
>>           -> tailcall1(TCC==33) -> tailcall1(TCC==32) ->
>> tailcall1(TCC==31) -> ... // 33 times -> subprog2(TCC==0)
>>       subprog3(TCC==33)
>> 	-> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) ->
>> ... // 33 times
>>
>>>             
>>> My view has, again, been than TCC is a run-time count of the number
>>> tailcalls (fentry/fexit patch bpf-programs included).
>>>
>>> What does x86 and arm64 do?
>>
>> When subprog return back to caller bpf program, they both restore TCC to
>> the value when enter into subprog. The ARM64 uses the callee saved
>> register to store the TCC. When the ARM64 exits, the TCC is restored to
>> the value when it enter. The while x86 uses the stack to do the same thing.
> 
> Ok! Thanks for clarifying. I'll continue reviewing the v2 of your
> series!
> 
> BTW, I wonder if we can trigger this [1] on RV64 -- i.e. calling the
> main prog, will reset the tcc count.
> 
> [1] https://lore.kernel.org/bpf/20240104142226.87869-1-hffilwlqm@gmail.com/

Yes, I have been paying attention to this matter recently and will 
allocate time to analyze it.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls
@ 2024-01-31  9:18                   ` Pu Lehui
  0 siblings, 0 replies; 42+ messages in thread
From: Pu Lehui @ 2024-01-31  9:18 UTC (permalink / raw)
  To: Björn Töpel, Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson



On 2024/1/31 0:03, Björn Töpel wrote:
> Pu Lehui <pulehui@huawei.com> writes:
> 
>> On 2024/1/30 21:28, Björn Töpel wrote:
>>> Pu Lehui <pulehui@huawei.com> writes:
>>>
>>>> On 2024/1/30 16:29, Björn Töpel wrote:
>>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>>
>>>>>> On 2023/9/28 17:59, Björn Töpel wrote:
>>>>>>> Pu Lehui <pulehui@huaweicloud.com> writes:
>>>>>>>
>>>>>>>> From: Pu Lehui <pulehui@huawei.com>
>>>>>>>>
>>>>>>>> In the current RV64 JIT, if we just don't initialize the TCC in subprog,
>>>>>>>> the TCC can be propagated from the parent process to the subprocess, but
>>>>>>>> the TCC of the parent process cannot be restored when the subprocess
>>>>>>>> exits. Since the RV64 TCC is initialized before saving the callee saved
>>>>>>>> registers into the stack, we cannot use the callee saved register to
>>>>>>>> pass the TCC, otherwise the original value of the callee saved register
>>>>>>>> will be destroyed. So we implemented mixing bpf2bpf and tailcalls
>>>>>>>> similar to x86_64, i.e. using a non-callee saved register to transfer
>>>>>>>> the TCC between functions, and saving that register to the stack to
>>>>>>>> protect the TCC value. At the same time, we also consider the scenario
>>>>>>>> of mixing trampoline.
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> The RISC-V JIT tries to minimize the stack usage, e.g. it doesn't have a
>>>>>>> fixed pro/epilogue like some of the other JITs. I think we can do better
>>>>>>> here, so that the pass-TCC-via-register can be used, and the additional
>>>>>>> stack access can be avoided.
>>>>>>>
>>>>>>> Today, the TCC is passed via a register (a6) and can be viewed as a
>>>>>>> "state" variable/transparent argument/return value. As you point out, we
>>>>>>> loose this when we do a call. On (any) calls we move the TCC to a
>>>>>>> callee-saved register.
>>>>>>>
>>>>>>> WDYT about the following scheme:
>>>>>>>
>>>>>>> 1 Pickup the arm64 bpf2bpf/tailmix mechanism of just clearing the TCC
>>>>>>>       for the main program.
>>>>>>> 2 For BPF helper calls, move TCC to s6, perform the call, and restore
>>>>>>>       a6. Dito for kfunc calls (BPF_PSEUDO_KFUNC_CALL).
>>>>>>> 3 For all other calls, a6 is passed transparently.
>>>>>>>
>>>>>>> For 2 bpf_jit_get_func_addr() can be used to determine if the callee is
>>>>>>> a BPF helper or not.
>>>>>>>
>>>>>>> In summary; Determine in the JIT if we're leaving BPF-land, and need to
>>>>>>> move the TCC to a callee-saved reg, or not, and save us a bunch of stack
>>>>>>> store/loads.
>>>>>>>
>>>>>>
>>>>>> Valuable scheme. But we need to consider TCC back propagation. Let me
>>>>>> show an example of calling subprog with TCC stored in A6:
>>>>>>
>>>>>> prog1(TCC==1){
>>>>>>         subprog1(TCC==1)
>>>>>>             -> tailcall1(TCC==0)
>>>>>>                 -> subprog2(TCC==0)
>>>>>>         subprog3(TCC==0) <--- should be TCC==1
>>>>>>             -\-> tailcall2 <--- can't be called
>>>>>> }
>>>>
>>>> Let's back with this example again. Imagine that the tailcall chain is a
>>>> list limited to 33 elements. When the list has 32 elements, we call
>>>> subprog1 and then tailcall1. At this time, the list elements count
>>>> becomes 33. Then we call subprog2 and return prog1. At this time, the
>>>> list removes 1 element and becomes 32 elements. At this time, there
>>>> still can perform 1 tailcall.
>>>>
>>>> I've attached a diagram that shows mixing tailcall and subprogs is
>>>> nearly a "call". It can return to caller function.
>>>
>>> Hmm. Let me put my Q in another way.
>>>
>>> The kernel calls into BPF_PROG_RUN() (~a BPF context). Would it ever be
>>> OK to do more than 33 tail calls, regardless of subprogs or not?
>>>
>>> In your example, TCC is 1. You are allowed to perform one tail call. In
>>> your example prog1 performs two.
>>>
>>> My view of TCC has always been ~a counter of the number of tailcalls~.
>>>
>>> With your example expanded:
>>> prog1(TCC==33){
>>>         subprog1(TCC==33)
>>>             -> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) -> ... // 33 times
>>>         // Lehui says TCC should be 33 again.
>>>         // Björn says "it's the number of tailcalls", and subprog3 cannot perform a tail call
>>>         subprog3(TCC==?)
>>
>> Yes, my view is take this something like a stack,while you take this as
>> a fixed global value.
>>
>> prog1(TCC==33){
>>       subprog1(TCC==33)
>>           -> tailcall1(TCC==33) -> tailcall1(TCC==32) ->
>> tailcall1(TCC==31) -> ... // 33 times -> subprog2(TCC==0)
>>       subprog3(TCC==33)
>> 	-> tailcall1(TCC==33) -> tailcall1(TCC==32) -> tailcall1(TCC==31) ->
>> ... // 33 times
>>
>>>             
>>> My view has, again, been than TCC is a run-time count of the number
>>> tailcalls (fentry/fexit patch bpf-programs included).
>>>
>>> What does x86 and arm64 do?
>>
>> When subprog return back to caller bpf program, they both restore TCC to
>> the value when enter into subprog. The ARM64 uses the callee saved
>> register to store the TCC. When the ARM64 exits, the TCC is restored to
>> the value when it enter. The while x86 uses the stack to do the same thing.
> 
> Ok! Thanks for clarifying. I'll continue reviewing the v2 of your
> series!
> 
> BTW, I wonder if we can trigger this [1] on RV64 -- i.e. calling the
> main prog, will reset the tcc count.
> 
> [1] https://lore.kernel.org/bpf/20240104142226.87869-1-hffilwlqm@gmail.com/

Yes, I have been paying attention to this matter recently and will 
allocate time to analyze it.


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2024-01-31 10:31 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-19  3:57 [PATCH bpf-next 0/4] Mixing bpf2bpf and tailcalls for RV64 Pu Lehui
2023-09-19  3:57 ` Pu Lehui
2023-09-19  3:57 ` [PATCH bpf-next 1/4] riscv, bpf: Remove redundant ctx->offset initialization Pu Lehui
2023-09-19  3:57   ` Pu Lehui
2023-09-19  3:57 ` [PATCH bpf-next 2/4] riscv, bpf: Using kvcalloc to allocate cache buffer Pu Lehui
2023-09-19  3:57   ` Pu Lehui
2023-09-19  3:57 ` [PATCH bpf-next 3/4] riscv, bpf: Add RV_TAILCALL_OFFSET macro to format tailcall offset Pu Lehui
2023-09-19  3:57   ` Pu Lehui
2023-09-19  3:57 ` [PATCH bpf-next 4/4] riscv, bpf: Mixing bpf2bpf and tailcalls Pu Lehui
2023-09-19  3:57   ` Pu Lehui
2023-09-19 10:04   ` Conor Dooley
2023-09-19 10:04     ` Conor Dooley
2023-09-19 10:54     ` Conor Dooley
2023-09-19 10:54       ` Conor Dooley
2023-09-19 11:23     ` Pu Lehui
2023-09-19 11:23       ` Pu Lehui
2023-09-19 11:50       ` Conor Dooley
2023-09-19 11:50         ` Conor Dooley
2023-09-19 12:01         ` Pu Lehui
2023-09-19 12:01           ` Pu Lehui
2023-09-28  9:59   ` Björn Töpel
2023-09-28  9:59     ` Björn Töpel
2023-09-28 10:39     ` Pu Lehui
2023-09-28 10:39       ` Pu Lehui
2024-01-16 14:21     ` Pu Lehui
2024-01-16 14:21       ` Pu Lehui
2024-01-30  8:29       ` Björn Töpel
2024-01-30  8:29         ` Björn Töpel
2024-01-30  9:14         ` Pu Lehui
2024-01-30  9:14           ` Pu Lehui
2024-01-30 13:28           ` Björn Töpel
2024-01-30 13:28             ` Björn Töpel
2024-01-30 14:10             ` Pu Lehui
2024-01-30 14:10               ` Pu Lehui
2024-01-30 16:03               ` Björn Töpel
2024-01-30 16:03                 ` Björn Töpel
2024-01-31  9:18                 ` Pu Lehui
2024-01-31  9:18                   ` Pu Lehui
2024-01-30  3:26   ` Pu Lehui
2024-01-30  3:26     ` Pu Lehui
2023-09-26 13:30 ` [PATCH bpf-next 0/4] Mixing bpf2bpf and tailcalls for RV64 Björn Töpel
2023-09-26 13:30   ` Björn Töpel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.