* [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT
@ 2023-09-19  3:58 Pu Lehui
  2023-09-19  3:58 ` [PATCH bpf-next v2 1/6] riscv, bpf: Unify 32-bit sign-extension to emit_sextw Pu Lehui
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Pu Lehui @ 2023-09-19  3:58 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Add Zbb support [0] to optimize code size and performance of RV64 JIT.
Meanwhile, adjust the code for unification and simplification. Tests
test_bpf.ko and test_verifier have passed, as well as the relevant
test cases of test_progs*.

Link: https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf [0]

v2:
- Add runtime detection for Zbb instructions. (Conor Dooley)
- Correct formatting issues detected by checkpatch. (Simon Horman)

v1:
https://lore.kernel.org/bpf/20230913153413.1446068-1-pulehui@huaweicloud.com/

Pu Lehui (6):
  riscv, bpf: Unify 32-bit sign-extension to emit_sextw
  riscv, bpf: Unify 32-bit zero-extension to emit_zextw
  riscv, bpf: Simplify sext and zext logics in branch instructions
  riscv, bpf: Add necessary Zbb instructions
  riscv, bpf: Optimize sign-extension mov insns with Zbb support
  riscv, bpf: Optimize bswap insns with Zbb support

 arch/riscv/net/bpf_jit.h        | 124 +++++++++++++++++++
 arch/riscv/net/bpf_jit_comp64.c | 213 +++++++++++---------------------
 2 files changed, 195 insertions(+), 142 deletions(-)

-- 
2.25.1



* [PATCH bpf-next v2 1/6] riscv, bpf: Unify 32-bit sign-extension to emit_sextw
  2023-09-19  3:58 [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Pu Lehui
@ 2023-09-19  3:58 ` Pu Lehui
  2023-09-28 10:45   ` Björn Töpel
  2023-09-19  3:58 ` [PATCH bpf-next v2 2/6] riscv, bpf: Unify 32-bit zero-extension to emit_zextw Pu Lehui
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Pu Lehui @ 2023-09-19  3:58 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

For code unification, add emit_sextw wrapper to unify all the 32-bit
sign-extension operations.
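
For reference, the wrapper boils down to a single ADDIW with a zero
immediate, which on RV64 sign-extends bits 31:0 of the source into the
destination. A minimal sketch of the semantics (illustrative, not
actual JIT output):

	addiw	rd, rs, 0	/* rd = (s64)(s32)rs */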

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit.h        |  5 +++++
 arch/riscv/net/bpf_jit_comp64.c | 10 +++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index d21c6c92a..03a6ecb43 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -1084,6 +1084,11 @@ static inline void emit_subw(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
 		emit(rv_subw(rd, rs1, rs2), ctx);
 }
 
+static inline void emit_sextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
+{
+	emit_addiw(rd, rs, 0, ctx);
+}
+
 #endif /* __riscv_xlen == 64 */
 
 void bpf_jit_build_prologue(struct rv_jit_context *ctx);
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 8423f4ddf..8b654c0cf 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -413,8 +413,8 @@ static void emit_zext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
 
 static void emit_sext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
 {
-	emit_addiw(RV_REG_T2, *rd, 0, ctx);
-	emit_addiw(RV_REG_T1, *rs, 0, ctx);
+	emit_sextw(RV_REG_T2, *rd, ctx);
+	emit_sextw(RV_REG_T1, *rs, ctx);
 	*rd = RV_REG_T2;
 	*rs = RV_REG_T1;
 }
@@ -429,7 +429,7 @@ static void emit_zext_32_rd_t1(u8 *rd, struct rv_jit_context *ctx)
 
 static void emit_sext_32_rd(u8 *rd, struct rv_jit_context *ctx)
 {
-	emit_addiw(RV_REG_T2, *rd, 0, ctx);
+	emit_sextw(RV_REG_T2, *rd, ctx);
 	*rd = RV_REG_T2;
 }
 
@@ -1057,7 +1057,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_srai(rd, RV_REG_T1, 64 - insn->off, ctx);
 			break;
 		case 32:
-			emit_addiw(rd, rs, 0, ctx);
+			emit_sextw(rd, rs, ctx);
 			break;
 		}
 		if (!is64 && !aux->verifier_zext)
@@ -1457,7 +1457,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		 * as t1 is used only in comparison against zero.
 		 */
 		if (!is64 && imm < 0)
-			emit_addiw(RV_REG_T1, RV_REG_T1, 0, ctx);
+			emit_sextw(RV_REG_T1, RV_REG_T1, ctx);
 		e = ctx->ninsns;
 		rvoff -= ninsns_rvoff(e - s);
 		emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff, ctx);
-- 
2.25.1



* [PATCH bpf-next v2 2/6] riscv, bpf: Unify 32-bit zero-extension to emit_zextw
  2023-09-19  3:58 [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Pu Lehui
  2023-09-19  3:58 ` [PATCH bpf-next v2 1/6] riscv, bpf: Unify 32-bit sign-extension to emit_sextw Pu Lehui
@ 2023-09-19  3:58 ` Pu Lehui
  2023-09-28 10:46   ` Björn Töpel
  2023-09-19  3:58 ` [PATCH bpf-next v2 3/6] riscv, bpf: Simplify sext and zext logics in branch instructions Pu Lehui
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Pu Lehui @ 2023-09-19  3:58 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

For code unification, add emit_zextw wrapper to unify all the 32-bit
zero-extension operations.
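
For reference, the wrapper shifts the value left by 32 and then
logically right by 32 to clear the upper bits. A minimal sketch of the
semantics (illustrative, not actual JIT output):

	slli	rd, rs, 32
	srli	rd, rd, 32	/* rd = (u64)(u32)rs */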

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit.h        |  6 +++
 arch/riscv/net/bpf_jit_comp64.c | 80 +++++++++++++++------------------
 2 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index 03a6ecb43..8e0ef4d08 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -1089,6 +1089,12 @@ static inline void emit_sextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
 	emit_addiw(rd, rs, 0, ctx);
 }
 
+static inline void emit_zextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
+{
+	emit_slli(rd, rs, 32, ctx);
+	emit_srli(rd, rd, 32, ctx);
+}
+
 #endif /* __riscv_xlen == 64 */
 
 void bpf_jit_build_prologue(struct rv_jit_context *ctx);
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 8b654c0cf..4a649e195 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -322,12 +322,6 @@ static void emit_branch(u8 cond, u8 rd, u8 rs, int rvoff,
 	emit(rv_jalr(RV_REG_ZERO, RV_REG_T1, lower), ctx);
 }
 
-static void emit_zext_32(u8 reg, struct rv_jit_context *ctx)
-{
-	emit_slli(reg, reg, 32, ctx);
-	emit_srli(reg, reg, 32, ctx);
-}
-
 static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 {
 	int tc_ninsn, off, start_insn = ctx->ninsns;
@@ -342,7 +336,7 @@ static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	 */
 	tc_ninsn = insn ? ctx->offset[insn] - ctx->offset[insn - 1] :
 		   ctx->offset[0];
-	emit_zext_32(RV_REG_A2, ctx);
+	emit_zextw(RV_REG_A2, RV_REG_A2, ctx);
 
 	off = offsetof(struct bpf_array, map.max_entries);
 	if (is_12b_check(off, insn))
@@ -404,9 +398,9 @@ static void init_regs(u8 *rd, u8 *rs, const struct bpf_insn *insn,
 static void emit_zext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
 {
 	emit_mv(RV_REG_T2, *rd, ctx);
-	emit_zext_32(RV_REG_T2, ctx);
+	emit_zextw(RV_REG_T2, RV_REG_T2, ctx);
 	emit_mv(RV_REG_T1, *rs, ctx);
-	emit_zext_32(RV_REG_T1, ctx);
+	emit_zextw(RV_REG_T1, RV_REG_T1, ctx);
 	*rd = RV_REG_T2;
 	*rs = RV_REG_T1;
 }
@@ -422,8 +416,8 @@ static void emit_sext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
 static void emit_zext_32_rd_t1(u8 *rd, struct rv_jit_context *ctx)
 {
 	emit_mv(RV_REG_T2, *rd, ctx);
-	emit_zext_32(RV_REG_T2, ctx);
-	emit_zext_32(RV_REG_T1, ctx);
+	emit_zextw(RV_REG_T2, RV_REG_T2, ctx);
+	emit_zextw(RV_REG_T1, RV_REG_T1, ctx);
 	*rd = RV_REG_T2;
 }
 
@@ -511,32 +505,32 @@ static void emit_atomic(u8 rd, u8 rs, s16 off, s32 imm, bool is64,
 		emit(is64 ? rv_amoadd_d(rs, rs, rd, 0, 0) :
 		     rv_amoadd_w(rs, rs, rd, 0, 0), ctx);
 		if (!is64)
-			emit_zext_32(rs, ctx);
+			emit_zextw(rs, rs, ctx);
 		break;
 	case BPF_AND | BPF_FETCH:
 		emit(is64 ? rv_amoand_d(rs, rs, rd, 0, 0) :
 		     rv_amoand_w(rs, rs, rd, 0, 0), ctx);
 		if (!is64)
-			emit_zext_32(rs, ctx);
+			emit_zextw(rs, rs, ctx);
 		break;
 	case BPF_OR | BPF_FETCH:
 		emit(is64 ? rv_amoor_d(rs, rs, rd, 0, 0) :
 		     rv_amoor_w(rs, rs, rd, 0, 0), ctx);
 		if (!is64)
-			emit_zext_32(rs, ctx);
+			emit_zextw(rs, rs, ctx);
 		break;
 	case BPF_XOR | BPF_FETCH:
 		emit(is64 ? rv_amoxor_d(rs, rs, rd, 0, 0) :
 		     rv_amoxor_w(rs, rs, rd, 0, 0), ctx);
 		if (!is64)
-			emit_zext_32(rs, ctx);
+			emit_zextw(rs, rs, ctx);
 		break;
 	/* src_reg = atomic_xchg(dst_reg + off16, src_reg); */
 	case BPF_XCHG:
 		emit(is64 ? rv_amoswap_d(rs, rs, rd, 0, 0) :
 		     rv_amoswap_w(rs, rs, rd, 0, 0), ctx);
 		if (!is64)
-			emit_zext_32(rs, ctx);
+			emit_zextw(rs, rs, ctx);
 		break;
 	/* r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg); */
 	case BPF_CMPXCHG:
@@ -1044,7 +1038,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU64 | BPF_MOV | BPF_X:
 		if (imm == 1) {
 			/* Special mov32 for zext */
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 			break;
 		}
 		switch (insn->off) {
@@ -1061,7 +1055,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			break;
 		}
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 
 	/* dst = dst OP src */
@@ -1069,7 +1063,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU64 | BPF_ADD | BPF_X:
 		emit_add(rd, rd, rs, ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_SUB | BPF_X:
 	case BPF_ALU64 | BPF_SUB | BPF_X:
@@ -1079,31 +1073,31 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_subw(rd, rd, rs, ctx);
 
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_AND | BPF_X:
 	case BPF_ALU64 | BPF_AND | BPF_X:
 		emit_and(rd, rd, rs, ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_OR | BPF_X:
 	case BPF_ALU64 | BPF_OR | BPF_X:
 		emit_or(rd, rd, rs, ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_XOR | BPF_X:
 	case BPF_ALU64 | BPF_XOR | BPF_X:
 		emit_xor(rd, rd, rs, ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_MUL | BPF_X:
 	case BPF_ALU64 | BPF_MUL | BPF_X:
 		emit(is64 ? rv_mul(rd, rd, rs) : rv_mulw(rd, rd, rs), ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_DIV | BPF_X:
 	case BPF_ALU64 | BPF_DIV | BPF_X:
@@ -1112,7 +1106,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		else
 			emit(is64 ? rv_divu(rd, rd, rs) : rv_divuw(rd, rd, rs), ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_MOD | BPF_X:
 	case BPF_ALU64 | BPF_MOD | BPF_X:
@@ -1121,25 +1115,25 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		else
 			emit(is64 ? rv_remu(rd, rd, rs) : rv_remuw(rd, rd, rs), ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_LSH | BPF_X:
 	case BPF_ALU64 | BPF_LSH | BPF_X:
 		emit(is64 ? rv_sll(rd, rd, rs) : rv_sllw(rd, rd, rs), ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_RSH | BPF_X:
 	case BPF_ALU64 | BPF_RSH | BPF_X:
 		emit(is64 ? rv_srl(rd, rd, rs) : rv_srlw(rd, rd, rs), ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_ARSH | BPF_X:
 	case BPF_ALU64 | BPF_ARSH | BPF_X:
 		emit(is64 ? rv_sra(rd, rd, rs) : rv_sraw(rd, rd, rs), ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 
 	/* dst = -dst */
@@ -1147,7 +1141,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU64 | BPF_NEG:
 		emit_sub(rd, RV_REG_ZERO, rd, ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 
 	/* dst = BSWAP##imm(dst) */
@@ -1159,7 +1153,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			break;
 		case 32:
 			if (!aux->verifier_zext)
-				emit_zext_32(rd, ctx);
+				emit_zextw(rd, rd, ctx);
 			break;
 		case 64:
 			/* Do nothing */
@@ -1221,7 +1215,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU64 | BPF_MOV | BPF_K:
 		emit_imm(rd, imm, ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 
 	/* dst = dst OP imm */
@@ -1234,7 +1228,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_add(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_SUB | BPF_K:
 	case BPF_ALU64 | BPF_SUB | BPF_K:
@@ -1245,7 +1239,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_sub(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_AND | BPF_K:
 	case BPF_ALU64 | BPF_AND | BPF_K:
@@ -1256,7 +1250,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_and(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_OR | BPF_K:
 	case BPF_ALU64 | BPF_OR | BPF_K:
@@ -1267,7 +1261,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_or(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_XOR | BPF_K:
 	case BPF_ALU64 | BPF_XOR | BPF_K:
@@ -1278,7 +1272,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_xor(rd, rd, RV_REG_T1, ctx);
 		}
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_MUL | BPF_K:
 	case BPF_ALU64 | BPF_MUL | BPF_K:
@@ -1286,7 +1280,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		emit(is64 ? rv_mul(rd, rd, RV_REG_T1) :
 		     rv_mulw(rd, rd, RV_REG_T1), ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_DIV | BPF_K:
 	case BPF_ALU64 | BPF_DIV | BPF_K:
@@ -1298,7 +1292,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(is64 ? rv_divu(rd, rd, RV_REG_T1) :
 			     rv_divuw(rd, rd, RV_REG_T1), ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_MOD | BPF_K:
 	case BPF_ALU64 | BPF_MOD | BPF_K:
@@ -1310,14 +1304,14 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(is64 ? rv_remu(rd, rd, RV_REG_T1) :
 			     rv_remuw(rd, rd, RV_REG_T1), ctx);
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_LSH | BPF_K:
 	case BPF_ALU64 | BPF_LSH | BPF_K:
 		emit_slli(rd, rd, imm, ctx);
 
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_RSH | BPF_K:
 	case BPF_ALU64 | BPF_RSH | BPF_K:
@@ -1327,7 +1321,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(rv_srliw(rd, rd, imm), ctx);
 
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 	case BPF_ALU | BPF_ARSH | BPF_K:
 	case BPF_ALU64 | BPF_ARSH | BPF_K:
@@ -1337,7 +1331,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit(rv_sraiw(rd, rd, imm), ctx);
 
 		if (!is64 && !aux->verifier_zext)
-			emit_zext_32(rd, ctx);
+			emit_zextw(rd, rd, ctx);
 		break;
 
 	/* JUMP off */
-- 
2.25.1



* [PATCH bpf-next v2 3/6] riscv, bpf: Simplify sext and zext logics in branch instructions
  2023-09-19  3:58 [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Pu Lehui
  2023-09-19  3:58 ` [PATCH bpf-next v2 1/6] riscv, bpf: Unify 32-bit sign-extension to emit_sextw Pu Lehui
  2023-09-19  3:58 ` [PATCH bpf-next v2 2/6] riscv, bpf: Unify 32-bit zero-extension to emit_zextw Pu Lehui
@ 2023-09-19  3:58 ` Pu Lehui
  2023-09-28 10:55   ` Björn Töpel
  2023-09-19  3:58 ` [PATCH bpf-next v2 4/6] riscv, bpf: Add necessary Zbb instructions Pu Lehui
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Pu Lehui @ 2023-09-19  3:58 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

There are many extension helpers in the current branch instructions, and
the implementation is a bit complicated. We simplify this logic through
two simple extension helpers with an alternate register.
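
Each helper extends the value into a caller-chosen alternate register
and redirects the register pointer to it. A minimal usage sketch
(illustrative only, mirroring how the branch code below uses it):

	u8 rd = bpf_to_rv_reg(insn->dst_reg, ctx);

	/* t2 = (s64)(s32)rd; rd now refers to the extended copy in t2 */
	emit_sextw_alt(&rd, RV_REG_T2, ctx);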

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit_comp64.c | 82 +++++++++++++--------------------
 1 file changed, 31 insertions(+), 51 deletions(-)

diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 4a649e195..0c6ffe11a 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -141,6 +141,19 @@ static bool in_auipc_jalr_range(s64 val)
 		val < ((1L << 31) - (1L << 11));
 }
 
+/* Modify rd pointer to alternate reg to avoid corrupting original reg */
+static void emit_sextw_alt(u8 *rd, u8 ra, struct rv_jit_context *ctx)
+{
+	emit_sextw(ra, *rd, ctx);
+	*rd = ra;
+}
+
+static void emit_zextw_alt(u8 *rd, u8 ra, struct rv_jit_context *ctx)
+{
+	emit_zextw(ra, *rd, ctx);
+	*rd = ra;
+}
+
 /* Emit fixed-length instructions for address */
 static int emit_addr(u8 rd, u64 addr, bool extra_pass, struct rv_jit_context *ctx)
 {
@@ -395,38 +408,6 @@ static void init_regs(u8 *rd, u8 *rs, const struct bpf_insn *insn,
 		*rs = bpf_to_rv_reg(insn->src_reg, ctx);
 }
 
-static void emit_zext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
-{
-	emit_mv(RV_REG_T2, *rd, ctx);
-	emit_zextw(RV_REG_T2, RV_REG_T2, ctx);
-	emit_mv(RV_REG_T1, *rs, ctx);
-	emit_zextw(RV_REG_T1, RV_REG_T1, ctx);
-	*rd = RV_REG_T2;
-	*rs = RV_REG_T1;
-}
-
-static void emit_sext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
-{
-	emit_sextw(RV_REG_T2, *rd, ctx);
-	emit_sextw(RV_REG_T1, *rs, ctx);
-	*rd = RV_REG_T2;
-	*rs = RV_REG_T1;
-}
-
-static void emit_zext_32_rd_t1(u8 *rd, struct rv_jit_context *ctx)
-{
-	emit_mv(RV_REG_T2, *rd, ctx);
-	emit_zextw(RV_REG_T2, RV_REG_T2, ctx);
-	emit_zextw(RV_REG_T1, RV_REG_T1, ctx);
-	*rd = RV_REG_T2;
-}
-
-static void emit_sext_32_rd(u8 *rd, struct rv_jit_context *ctx)
-{
-	emit_sextw(RV_REG_T2, *rd, ctx);
-	*rd = RV_REG_T2;
-}
-
 static int emit_jump_and_link(u8 rd, s64 rvoff, bool fixed_addr,
 			      struct rv_jit_context *ctx)
 {
@@ -1372,22 +1353,22 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 		rvoff = rv_offset(i, off, ctx);
 		if (!is64) {
 			s = ctx->ninsns;
-			if (is_signed_bpf_cond(BPF_OP(code)))
-				emit_sext_32_rd_rs(&rd, &rs, ctx);
-			else
-				emit_zext_32_rd_rs(&rd, &rs, ctx);
+			if (is_signed_bpf_cond(BPF_OP(code))) {
+				emit_sextw_alt(&rs, RV_REG_T1, ctx);
+				emit_sextw_alt(&rd, RV_REG_T2, ctx);
+			} else {
+				emit_zextw_alt(&rs, RV_REG_T1, ctx);
+				emit_zextw_alt(&rd, RV_REG_T2, ctx);
+			}
 			e = ctx->ninsns;
-
 			/* Adjust for extra insns */
 			rvoff -= ninsns_rvoff(e - s);
 		}
-
 		if (BPF_OP(code) == BPF_JSET) {
 			/* Adjust for and */
 			rvoff -= 4;
 			emit_and(RV_REG_T1, rd, rs, ctx);
-			emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff,
-				    ctx);
+			emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff, ctx);
 		} else {
 			emit_branch(BPF_OP(code), rd, rs, rvoff, ctx);
 		}
@@ -1416,21 +1397,20 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_JMP32 | BPF_JSLE | BPF_K:
 		rvoff = rv_offset(i, off, ctx);
 		s = ctx->ninsns;
-		if (imm) {
+		if (imm)
 			emit_imm(RV_REG_T1, imm, ctx);
-			rs = RV_REG_T1;
-		} else {
-			/* If imm is 0, simply use zero register. */
-			rs = RV_REG_ZERO;
-		}
+		rs = imm ? RV_REG_T1 : RV_REG_ZERO;
 		if (!is64) {
-			if (is_signed_bpf_cond(BPF_OP(code)))
-				emit_sext_32_rd(&rd, ctx);
-			else
-				emit_zext_32_rd_t1(&rd, ctx);
+			if (is_signed_bpf_cond(BPF_OP(code))) {
+				emit_sextw_alt(&rd, RV_REG_T2, ctx);
+				/* rs has been sign extended */
+			} else {
+				emit_zextw_alt(&rd, RV_REG_T2, ctx);
+				if (imm)
+					emit_zextw(rs, rs, ctx);
+			}
 		}
 		e = ctx->ninsns;
-
 		/* Adjust for extra insns */
 		rvoff -= ninsns_rvoff(e - s);
 		emit_branch(BPF_OP(code), rd, rs, rvoff, ctx);
-- 
2.25.1



* [PATCH bpf-next v2 4/6] riscv, bpf: Add necessary Zbb instructions
  2023-09-19  3:58 [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Pu Lehui
                   ` (2 preceding siblings ...)
  2023-09-19  3:58 ` [PATCH bpf-next v2 3/6] riscv, bpf: Simplify sext and zext logics in branch instructions Pu Lehui
@ 2023-09-19  3:58 ` Pu Lehui
  2023-09-19  7:38   ` Conor Dooley
  2023-09-28 11:02   ` Björn Töpel
  2023-09-19  3:58 ` [PATCH bpf-next v2 5/6] riscv, bpf: Optimize sign-extension mov insns with Zbb support Pu Lehui
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 20+ messages in thread
From: Pu Lehui @ 2023-09-19  3:58 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

Add necessary Zbb instructions introduced by [0] to reduce code size and
improve performance of RV64 JIT. Meanwhile, a runtime detection helper is
added to check whether the CPU supports Zbb instructions.
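
For reference, these helpers fill in the standard I-type field layout
via rv_i_insn(). A sketch of the encoding, assuming the usual RISC-V
field order:

	/*
	 * rv_i_insn(imm11_0, rs1, funct3, rd, opcode):
	 *	insn = (imm11_0 << 20) | (rs1 << 15) | (funct3 << 12) |
	 *	       (rd << 7) | opcode
	 *
	 * e.g. sext.b rd, rs1: imm11_0=0x604, funct3=1, opcode=0x13 (OP-IMM)
	 */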

Link: https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf [0]
Suggested-by: Conor Dooley <conor@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index 8e0ef4d08..4e24fb2bd 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -18,6 +18,11 @@ static inline bool rvc_enabled(void)
 	return IS_ENABLED(CONFIG_RISCV_ISA_C);
 }
 
+static inline bool rvzbb_enabled(void)
+{
+	return IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBB);
+}
+
 enum {
 	RV_REG_ZERO =	0,	/* The constant value 0 */
 	RV_REG_RA =	1,	/* Return address */
@@ -727,6 +732,27 @@ static inline u16 rvc_swsp(u32 imm8, u8 rs2)
 	return rv_css_insn(0x6, imm, rs2, 0x2);
 }
 
+/* RVZBB instructions. */
+static inline u32 rvzbb_sextb(u8 rd, u8 rs1)
+{
+	return rv_i_insn(0x604, rs1, 1, rd, 0x13);
+}
+
+static inline u32 rvzbb_sexth(u8 rd, u8 rs1)
+{
+	return rv_i_insn(0x605, rs1, 1, rd, 0x13);
+}
+
+static inline u32 rvzbb_zexth(u8 rd, u8 rs)
+{
+	return rv_i_insn(0x80, rs, 4, rd, __riscv_xlen == 64 ? 0x3b : 0x33);
+}
+
+static inline u32 rvzbb_rev8(u8 rd, u8 rs)
+{
+	return rv_i_insn(__riscv_xlen == 64 ? 0x6b8 : 0x698, rs, 5, rd, 0x13);
+}
+
 /*
  * RV64-only instructions.
  *
-- 
2.25.1



* [PATCH bpf-next v2 5/6] riscv, bpf: Optimize sign-extension mov insns with Zbb support
  2023-09-19  3:58 [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Pu Lehui
                   ` (3 preceding siblings ...)
  2023-09-19  3:58 ` [PATCH bpf-next v2 4/6] riscv, bpf: Add necessary Zbb instructions Pu Lehui
@ 2023-09-19  3:58 ` Pu Lehui
  2023-09-28 11:04   ` Björn Töpel
  2023-09-19  3:58 ` [PATCH bpf-next v2 6/6] riscv, bpf: Optimize bswap " Pu Lehui
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Pu Lehui @ 2023-09-19  3:58 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

Add 8-bit and 16-bit sign-extension wrappers with Zbb support to optimize
sign-extension mov instructions.
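
With Zbb, an 8-bit sign-extending mov becomes a single sext.b, while
the fallback keeps the usual shift pair. An illustrative sketch of the
8-bit case (not a verbatim JIT dump):

	sext.b	rd, rs			/* Zbb: rd = (s64)(s8)rs */

	slli	rd, rs, 56		/* fallback */
	srai	rd, rd, 56		/* rd = (s64)(s8)rs */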

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit.h        | 20 ++++++++++++++++++++
 arch/riscv/net/bpf_jit_comp64.c |  5 +++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index 4e24fb2bd..944bdd6e4 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -1110,6 +1110,26 @@ static inline void emit_subw(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
 		emit(rv_subw(rd, rs1, rs2), ctx);
 }
 
+static inline void emit_sextb(u8 rd, u8 rs, struct rv_jit_context *ctx)
+{
+	if (rvzbb_enabled()) {
+		emit(rvzbb_sextb(rd, rs), ctx);
+	} else {
+		emit_slli(rd, rs, 56, ctx);
+		emit_srai(rd, rd, 56, ctx);
+	}
+}
+
+static inline void emit_sexth(u8 rd, u8 rs, struct rv_jit_context *ctx)
+{
+	if (rvzbb_enabled()) {
+		emit(rvzbb_sexth(rd, rs), ctx);
+	} else {
+		emit_slli(rd, rs, 48, ctx);
+		emit_srai(rd, rd, 48, ctx);
+	}
+}
+
 static inline void emit_sextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
 {
 	emit_addiw(rd, rs, 0, ctx);
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 0c6ffe11a..f4ca6b787 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1027,9 +1027,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			emit_mv(rd, rs, ctx);
 			break;
 		case 8:
+			emit_sextb(rd, rs, ctx);
+			break;
 		case 16:
-			emit_slli(RV_REG_T1, rs, 64 - insn->off, ctx);
-			emit_srai(rd, RV_REG_T1, 64 - insn->off, ctx);
+			emit_sexth(rd, rs, ctx);
 			break;
 		case 32:
 			emit_sextw(rd, rs, ctx);
-- 
2.25.1



* [PATCH bpf-next v2 6/6] riscv, bpf: Optimize bswap insns with Zbb support
  2023-09-19  3:58 [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Pu Lehui
                   ` (4 preceding siblings ...)
  2023-09-19  3:58 ` [PATCH bpf-next v2 5/6] riscv, bpf: Optimize sign-extension mov insns with Zbb support Pu Lehui
@ 2023-09-19  3:58 ` Pu Lehui
  2023-09-28 11:08   ` Björn Töpel
  2023-09-26 13:30 ` [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Björn Töpel
  2023-09-28 10:44 ` Björn Töpel
  7 siblings, 1 reply; 20+ messages in thread
From: Pu Lehui @ 2023-09-19  3:58 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Palmer Dabbelt, Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

From: Pu Lehui <pulehui@huawei.com>

Optimize bswap instructions with the Zbb rev8 instruction combined with
a srli instruction. Also optimize 16-bit zero-extension with Zbb support.
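
With Zbb, a byte swap becomes rev8 plus at most one shift instead of
the long shift-and-mask sequence. An illustrative sketch for imm == 32
(not a verbatim JIT dump):

	rev8	rd, rd		/* reverse all 8 bytes */
	srli	rd, rd, 32	/* keep the swapped low word */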

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit.h        | 67 +++++++++++++++++++++++++++++++++
 arch/riscv/net/bpf_jit_comp64.c | 50 +-----------------------
 2 files changed, 69 insertions(+), 48 deletions(-)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index 944bdd6e4..a04eed672 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -1135,12 +1135,79 @@ static inline void emit_sextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
 	emit_addiw(rd, rs, 0, ctx);
 }
 
+static inline void emit_zexth(u8 rd, u8 rs, struct rv_jit_context *ctx)
+{
+	if (rvzbb_enabled()) {
+		emit(rvzbb_zexth(rd, rs), ctx);
+	} else {
+		emit_slli(rd, rs, 48, ctx);
+		emit_srli(rd, rd, 48, ctx);
+	}
+}
+
 static inline void emit_zextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
 {
 	emit_slli(rd, rs, 32, ctx);
 	emit_srli(rd, rd, 32, ctx);
 }
 
+static inline void emit_bswap(u8 rd, s32 imm, struct rv_jit_context *ctx)
+{
+	if (rvzbb_enabled()) {
+		int bits = 64 - imm;
+
+		emit(rvzbb_rev8(rd, rd), ctx);
+		if (bits)
+			emit_srli(rd, rd, bits, ctx);
+	} else {
+		emit_li(RV_REG_T2, 0, ctx);
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+		if (imm == 16)
+			goto out_be;
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+		if (imm == 32)
+			goto out_be;
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
+		emit_srli(rd, rd, 8, ctx);
+out_be:
+		emit_andi(RV_REG_T1, rd, 0xff, ctx);
+		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
+
+		emit_mv(rd, RV_REG_T2, ctx);
+	}
+}
+
 #endif /* __riscv_xlen == 64 */
 
 void bpf_jit_build_prologue(struct rv_jit_context *ctx);
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index f4ca6b787..35753b142 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1130,8 +1130,7 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 	case BPF_ALU | BPF_END | BPF_FROM_LE:
 		switch (imm) {
 		case 16:
-			emit_slli(rd, rd, 48, ctx);
-			emit_srli(rd, rd, 48, ctx);
+			emit_zexth(rd, rd, ctx);
 			break;
 		case 32:
 			if (!aux->verifier_zext)
@@ -1142,54 +1141,9 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
 			break;
 		}
 		break;
-
 	case BPF_ALU | BPF_END | BPF_FROM_BE:
 	case BPF_ALU64 | BPF_END | BPF_FROM_LE:
-		emit_li(RV_REG_T2, 0, ctx);
-
-		emit_andi(RV_REG_T1, rd, 0xff, ctx);
-		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
-		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
-		emit_srli(rd, rd, 8, ctx);
-		if (imm == 16)
-			goto out_be;
-
-		emit_andi(RV_REG_T1, rd, 0xff, ctx);
-		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
-		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
-		emit_srli(rd, rd, 8, ctx);
-
-		emit_andi(RV_REG_T1, rd, 0xff, ctx);
-		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
-		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
-		emit_srli(rd, rd, 8, ctx);
-		if (imm == 32)
-			goto out_be;
-
-		emit_andi(RV_REG_T1, rd, 0xff, ctx);
-		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
-		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
-		emit_srli(rd, rd, 8, ctx);
-
-		emit_andi(RV_REG_T1, rd, 0xff, ctx);
-		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
-		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
-		emit_srli(rd, rd, 8, ctx);
-
-		emit_andi(RV_REG_T1, rd, 0xff, ctx);
-		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
-		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
-		emit_srli(rd, rd, 8, ctx);
-
-		emit_andi(RV_REG_T1, rd, 0xff, ctx);
-		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
-		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
-		emit_srli(rd, rd, 8, ctx);
-out_be:
-		emit_andi(RV_REG_T1, rd, 0xff, ctx);
-		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
-
-		emit_mv(rd, RV_REG_T2, ctx);
+		emit_bswap(rd, imm, ctx);
 		break;
 
 	/* dst = imm */
-- 
2.25.1



* Re: [PATCH bpf-next v2 4/6] riscv, bpf: Add necessary Zbb instructions
  2023-09-19  3:58 ` [PATCH bpf-next v2 4/6] riscv, bpf: Add necessary Zbb instructions Pu Lehui
@ 2023-09-19  7:38   ` Conor Dooley
  2023-09-19  7:43     ` Pu Lehui
  2023-09-28 11:02   ` Björn Töpel
  1 sibling, 1 reply; 20+ messages in thread
From: Conor Dooley @ 2023-09-19  7:38 UTC (permalink / raw)
  To: Pu Lehui
  Cc: bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui


On Tue, Sep 19, 2023 at 11:58:37AM +0800, Pu Lehui wrote:
> From: Pu Lehui <pulehui@huawei.com>
> 
> Add necessary Zbb instructions introduced by [0] to reduce code size and
> improve performance of RV64 JIT. Meanwhile, a runtime detection helper is
> added to check whether the CPU supports Zbb instructions.
> 
> Link: https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf [0]
> Suggested-by: Conor Dooley <conor@kernel.org>

Nah, you can drop this. It was just a review comment :)

> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>  arch/riscv/net/bpf_jit.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
> index 8e0ef4d08..4e24fb2bd 100644
> --- a/arch/riscv/net/bpf_jit.h
> +++ b/arch/riscv/net/bpf_jit.h
> @@ -18,6 +18,11 @@ static inline bool rvc_enabled(void)
>  	return IS_ENABLED(CONFIG_RISCV_ISA_C);
>  }
>  
> +static inline bool rvzbb_enabled(void)
> +{
> +	return IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBB);

This looks like it should work, thanks for changing it.

Cheers,
Conor.



* Re: [PATCH bpf-next v2 4/6] riscv, bpf: Add necessary Zbb instructions
  2023-09-19  7:38   ` Conor Dooley
@ 2023-09-19  7:43     ` Pu Lehui
  0 siblings, 0 replies; 20+ messages in thread
From: Pu Lehui @ 2023-09-19  7:43 UTC (permalink / raw)
  To: Conor Dooley
  Cc: bpf, linux-riscv, netdev, Björn Töpel,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Luke Nelson, Pu Lehui



On 2023/9/19 15:38, Conor Dooley wrote:
> On Tue, Sep 19, 2023 at 11:58:37AM +0800, Pu Lehui wrote:
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> Add necessary Zbb instructions introduced by [0] to reduce code size and
>> improve performance of RV64 JIT. Meanwhile, a runtime detection helper is
>> added to check whether the CPU supports Zbb instructions.
>>
>> Link: https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf [0]
>> Suggested-by: Conor Dooley <conor@kernel.org>
> 
> Nah, you can drop this. It was just a review comment :)
> 
OK, will drop it if there is a next version.

>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>> ---
>>   arch/riscv/net/bpf_jit.h | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
>> index 8e0ef4d08..4e24fb2bd 100644
>> --- a/arch/riscv/net/bpf_jit.h
>> +++ b/arch/riscv/net/bpf_jit.h
>> @@ -18,6 +18,11 @@ static inline bool rvc_enabled(void)
>>   	return IS_ENABLED(CONFIG_RISCV_ISA_C);
>>   }
>>   
>> +static inline bool rvzbb_enabled(void)
>> +{
>> +	return IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBB);
> 
> This looks like it should work, thanks for changing it.
> 
> Cheers,
> Conor.



* Re: [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT
  2023-09-19  3:58 [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Pu Lehui
                   ` (5 preceding siblings ...)
  2023-09-19  3:58 ` [PATCH bpf-next v2 6/6] riscv, bpf: Optimize bswap " Pu Lehui
@ 2023-09-26 13:30 ` Björn Töpel
  2023-09-28 10:44 ` Björn Töpel
  7 siblings, 0 replies; 20+ messages in thread
From: Björn Töpel @ 2023-09-26 13:30 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> Add Zbb support [0] to optimize code size and performance of RV64 JIT.
> Meanwhile, adjust the code for unification and simplification. Tests
> test_bpf.ko and test_verifier have passed, as well as the relevant
> test cases of test_progs*.

Apologies for the review delay. I'm travelling, and will pick it up ASAP
when I'm back.


Björn


* Re: [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT
  2023-09-19  3:58 [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Pu Lehui
                   ` (6 preceding siblings ...)
  2023-09-26 13:30 ` [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Björn Töpel
@ 2023-09-28 10:44 ` Björn Töpel
  2024-01-15 12:22   ` Pu Lehui
  7 siblings, 1 reply; 20+ messages in thread
From: Björn Töpel @ 2023-09-28 10:44 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> Add Zbb support [0] to optimize code size and performance of RV64 JIT.
> Meanwhile, adjust the code for unification and simplification. Tests
> test_bpf.ko and test_verifier have passed, as well as the relevant
> test cases of test_progs*.

Nice work!

Did you measure how the instruction count changed for, say, test_bpf.ko
and test_progs?


Björn


* Re: [PATCH bpf-next v2 1/6] riscv, bpf: Unify 32-bit sign-extension to emit_sextw
  2023-09-19  3:58 ` [PATCH bpf-next v2 1/6] riscv, bpf: Unify 32-bit sign-extension to emit_sextw Pu Lehui
@ 2023-09-28 10:45   ` Björn Töpel
  0 siblings, 0 replies; 20+ messages in thread
From: Björn Töpel @ 2023-09-28 10:45 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> From: Pu Lehui <pulehui@huawei.com>
>
> For code unification, add emit_sextw wrapper to unify all the 32-bit
> sign-extension operations.
>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>

Acked-by: Björn Töpel <bjorn@kernel.org>


* Re: [PATCH bpf-next v2 2/6] riscv, bpf: Unify 32-bit zero-extension to emit_zextw
  2023-09-19  3:58 ` [PATCH bpf-next v2 2/6] riscv, bpf: Unify 32-bit zero-extension to emit_zextw Pu Lehui
@ 2023-09-28 10:46   ` Björn Töpel
  0 siblings, 0 replies; 20+ messages in thread
From: Björn Töpel @ 2023-09-28 10:46 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> From: Pu Lehui <pulehui@huawei.com>
>
> For code unification, add emit_zextw wrapper to unify all the 32-bit
> zero-extension operations.
>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>

Acked-by: Björn Töpel <bjorn@kernel.org>


* Re: [PATCH bpf-next v2 3/6] riscv, bpf: Simplify sext and zext logics in branch instructions
  2023-09-19  3:58 ` [PATCH bpf-next v2 3/6] riscv, bpf: Simplify sext and zext logics in branch instructions Pu Lehui
@ 2023-09-28 10:55   ` Björn Töpel
  0 siblings, 0 replies; 20+ messages in thread
From: Björn Töpel @ 2023-09-28 10:55 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> From: Pu Lehui <pulehui@huawei.com>
>
> There are many extension helpers in the current branch instructions, and
> the implementation is a bit complicated. We simplify this logic through
> two simple extension helpers with an alternate register.
>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>  arch/riscv/net/bpf_jit_comp64.c | 82 +++++++++++++--------------------
>  1 file changed, 31 insertions(+), 51 deletions(-)
>
> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
> index 4a649e195..0c6ffe11a 100644
> --- a/arch/riscv/net/bpf_jit_comp64.c
> +++ b/arch/riscv/net/bpf_jit_comp64.c
> @@ -141,6 +141,19 @@ static bool in_auipc_jalr_range(s64 val)
>  		val < ((1L << 31) - (1L << 11));
>  }
>  
> +/* Modify rd pointer to alternate reg to avoid corrupting original reg */
> +static void emit_sextw_alt(u8 *rd, u8 ra, struct rv_jit_context *ctx)
> +{
> +	emit_sextw(ra, *rd, ctx);
> +	*rd = ra;
> +}
> +
> +static void emit_zextw_alt(u8 *rd, u8 ra, struct rv_jit_context *ctx)
> +{
> +	emit_zextw(ra, *rd, ctx);
> +	*rd = ra;
> +}
> +
>  /* Emit fixed-length instructions for address */
>  static int emit_addr(u8 rd, u64 addr, bool extra_pass, struct rv_jit_context *ctx)
>  {
> @@ -395,38 +408,6 @@ static void init_regs(u8 *rd, u8 *rs, const struct bpf_insn *insn,
>  		*rs = bpf_to_rv_reg(insn->src_reg, ctx);
>  }
>  
> -static void emit_zext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
> -{
> -	emit_mv(RV_REG_T2, *rd, ctx);
> -	emit_zextw(RV_REG_T2, RV_REG_T2, ctx);
> -	emit_mv(RV_REG_T1, *rs, ctx);
> -	emit_zextw(RV_REG_T1, RV_REG_T1, ctx);
> -	*rd = RV_REG_T2;
> -	*rs = RV_REG_T1;
> -}
> -
> -static void emit_sext_32_rd_rs(u8 *rd, u8 *rs, struct rv_jit_context *ctx)
> -{
> -	emit_sextw(RV_REG_T2, *rd, ctx);
> -	emit_sextw(RV_REG_T1, *rs, ctx);
> -	*rd = RV_REG_T2;
> -	*rs = RV_REG_T1;
> -}
> -
> -static void emit_zext_32_rd_t1(u8 *rd, struct rv_jit_context *ctx)
> -{
> -	emit_mv(RV_REG_T2, *rd, ctx);
> -	emit_zextw(RV_REG_T2, RV_REG_T2, ctx);
> -	emit_zextw(RV_REG_T1, RV_REG_T1, ctx);
> -	*rd = RV_REG_T2;
> -}
> -
> -static void emit_sext_32_rd(u8 *rd, struct rv_jit_context *ctx)
> -{
> -	emit_sextw(RV_REG_T2, *rd, ctx);
> -	*rd = RV_REG_T2;
> -}
> -
>  static int emit_jump_and_link(u8 rd, s64 rvoff, bool fixed_addr,
>  			      struct rv_jit_context *ctx)
>  {
> @@ -1372,22 +1353,22 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>  		rvoff = rv_offset(i, off, ctx);
>  		if (!is64) {
>  			s = ctx->ninsns;
> -			if (is_signed_bpf_cond(BPF_OP(code)))
> -				emit_sext_32_rd_rs(&rd, &rs, ctx);
> -			else
> -				emit_zext_32_rd_rs(&rd, &rs, ctx);
> +			if (is_signed_bpf_cond(BPF_OP(code))) {
> +				emit_sextw_alt(&rs, RV_REG_T1, ctx);
> +				emit_sextw_alt(&rd, RV_REG_T2, ctx);
> +			} else {
> +				emit_zextw_alt(&rs, RV_REG_T1, ctx);
> +				emit_zextw_alt(&rd, RV_REG_T2, ctx);
> +			}
>  			e = ctx->ninsns;
> -

Please avoid changes like this.


>  			/* Adjust for extra insns */
>  			rvoff -= ninsns_rvoff(e - s);
>  		}
> -

Ditto.

>  		if (BPF_OP(code) == BPF_JSET) {
>  			/* Adjust for and */
>  			rvoff -= 4;
>  			emit_and(RV_REG_T1, rd, rs, ctx);
> -			emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff,
> -				    ctx);
> +			emit_branch(BPF_JNE, RV_REG_T1, RV_REG_ZERO, rvoff, ctx);
>  		} else {
>  			emit_branch(BPF_OP(code), rd, rs, rvoff, ctx);
>  		}
> @@ -1416,21 +1397,20 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>  	case BPF_JMP32 | BPF_JSLE | BPF_K:
>  		rvoff = rv_offset(i, off, ctx);
>  		s = ctx->ninsns;
> -		if (imm) {
> +		if (imm)
>  			emit_imm(RV_REG_T1, imm, ctx);
> -			rs = RV_REG_T1;
> -		} else {
> -			/* If imm is 0, simply use zero register. */
> -			rs = RV_REG_ZERO;
> -		}
> +		rs = imm ? RV_REG_T1 : RV_REG_ZERO;
>  		if (!is64) {
> -			if (is_signed_bpf_cond(BPF_OP(code)))
> -				emit_sext_32_rd(&rd, ctx);
> -			else
> -				emit_zext_32_rd_t1(&rd, ctx);
> +			if (is_signed_bpf_cond(BPF_OP(code))) {
> +				emit_sextw_alt(&rd, RV_REG_T2, ctx);
> +				/* rs has been sign extended */
> +			} else {
> +				emit_zextw_alt(&rd, RV_REG_T2, ctx);
> +				if (imm)
> +					emit_zextw(rs, rs, ctx);
> +			}
>  		}
>  		e = ctx->ninsns;
> -

Ditto.

Other than the formatting changes, it looks good!


Björn


* Re: [PATCH bpf-next v2 4/6] riscv, bpf: Add necessary Zbb instructions
  2023-09-19  3:58 ` [PATCH bpf-next v2 4/6] riscv, bpf: Add necessary Zbb instructions Pu Lehui
  2023-09-19  7:38   ` Conor Dooley
@ 2023-09-28 11:02   ` Björn Töpel
  1 sibling, 0 replies; 20+ messages in thread
From: Björn Töpel @ 2023-09-28 11:02 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> From: Pu Lehui <pulehui@huawei.com>
>
> Add necessary Zbb instructions introduced by [0] to reduce code size and
> improve performance of RV64 JIT. Meanwhile, a runtime detection helper is
> added to check whether the CPU supports Zbb instructions.
>
> Link: https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf [0]
> Suggested-by: Conor Dooley <conor@kernel.org>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>  arch/riscv/net/bpf_jit.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
> index 8e0ef4d08..4e24fb2bd 100644
> --- a/arch/riscv/net/bpf_jit.h
> +++ b/arch/riscv/net/bpf_jit.h
> @@ -18,6 +18,11 @@ static inline bool rvc_enabled(void)
>  	return IS_ENABLED(CONFIG_RISCV_ISA_C);
>  }
>  
> +static inline bool rvzbb_enabled(void)
> +{
> +	return IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBB);
> +}
> +
>  enum {
>  	RV_REG_ZERO =	0,	/* The constant value 0 */
>  	RV_REG_RA =	1,	/* Return address */
> @@ -727,6 +732,27 @@ static inline u16 rvc_swsp(u32 imm8, u8 rs2)
>  	return rv_css_insn(0x6, imm, rs2, 0x2);
>  }
>  
> +/* RVZBB instructions. */
> +static inline u32 rvzbb_sextb(u8 rd, u8 rs1)
> +{
> +	return rv_i_insn(0x604, rs1, 1, rd, 0x13);
> +}
> +
> +static inline u32 rvzbb_sexth(u8 rd, u8 rs1)
> +{
> +	return rv_i_insn(0x605, rs1, 1, rd, 0x13);
> +}
> +
> +static inline u32 rvzbb_zexth(u8 rd, u8 rs)
> +{
> +	return rv_i_insn(0x80, rs, 4, rd, __riscv_xlen == 64 ? 0x3b : 0x33);

Encoding funcs are hard to read as it is, so let's try to be a bit more
explicit.

I would prefer a

  |        if (IS_ENABLED(CONFIG_64BIT))
  |                return 64bitvariant
  |         return 32bitvariant

version.

Or a 64-bit only variant elsewhere, since this series is only aimed for
64-bit anyway.
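
Something along these lines (untested sketch, reusing your encodings):

  | static inline u32 rvzbb_zexth(u8 rd, u8 rs)
  | {
  |         if (IS_ENABLED(CONFIG_64BIT))
  |                 return rv_i_insn(0x80, rs, 4, rd, 0x3b);
  |
  |         return rv_i_insn(0x80, rs, 4, rd, 0x33);
  | }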

> +}
> +
> +static inline u32 rvzbb_rev8(u8 rd, u8 rs)
> +{
> +	return rv_i_insn(__riscv_xlen == 64 ? 0x6b8 : 0x698, rs, 5, rd, 0x13);

Ditto.



Björn


* Re: [PATCH bpf-next v2 5/6] riscv, bpf: Optimize sign-extension mov insns with Zbb support
  2023-09-19  3:58 ` [PATCH bpf-next v2 5/6] riscv, bpf: Optimize sign-extension mov insns with Zbb support Pu Lehui
@ 2023-09-28 11:04   ` Björn Töpel
  0 siblings, 0 replies; 20+ messages in thread
From: Björn Töpel @ 2023-09-28 11:04 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> From: Pu Lehui <pulehui@huawei.com>
>
> Add 8-bit and 16-bit sign-extension wrappers with Zbb support to optimize
> sign-extension mov instructions.
>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>  arch/riscv/net/bpf_jit.h        | 20 ++++++++++++++++++++
>  arch/riscv/net/bpf_jit_comp64.c |  5 +++--
>  2 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
> index 4e24fb2bd..944bdd6e4 100644
> --- a/arch/riscv/net/bpf_jit.h
> +++ b/arch/riscv/net/bpf_jit.h
> @@ -1110,6 +1110,26 @@ static inline void emit_subw(u8 rd, u8 rs1, u8 rs2, struct rv_jit_context *ctx)
>  		emit(rv_subw(rd, rs1, rs2), ctx);
>  }
>  
> +static inline void emit_sextb(u8 rd, u8 rs, struct rv_jit_context *ctx)
> +{
> +	if (rvzbb_enabled()) {
> +		emit(rvzbb_sextb(rd, rs), ctx);
> +	} else {
> +		emit_slli(rd, rs, 56, ctx);
> +		emit_srai(rd, rd, 56, ctx);
> +	}
> +}
> +
> +static inline void emit_sexth(u8 rd, u8 rs, struct rv_jit_context *ctx)
> +{
> +	if (rvzbb_enabled()) {
> +		emit(rvzbb_sexth(rd, rs), ctx);
> +	} else {
> +		emit_slli(rd, rs, 48, ctx);
> +		emit_srai(rd, rd, 48, ctx);
> +	}
> +}

Nit/personal style: I really find early-exit code easier to read than
nested if-else.

  | if (cond) {
  |   foo();
  |   return;
  | }
  | 
  | bar();
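
E.g., for emit_sextb() that would be (untested sketch):

  | static inline void emit_sextb(u8 rd, u8 rs, struct rv_jit_context *ctx)
  | {
  |         if (rvzbb_enabled()) {
  |                 emit(rvzbb_sextb(rd, rs), ctx);
  |                 return;
  |         }
  |
  |         emit_slli(rd, rs, 56, ctx);
  |         emit_srai(rd, rd, 56, ctx);
  | }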

> +
>  static inline void emit_sextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
>  {
>  	emit_addiw(rd, rs, 0, ctx);
> diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
> index 0c6ffe11a..f4ca6b787 100644
> --- a/arch/riscv/net/bpf_jit_comp64.c
> +++ b/arch/riscv/net/bpf_jit_comp64.c
> @@ -1027,9 +1027,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx,
>  			emit_mv(rd, rs, ctx);
>  			break;
>  		case 8:
> +			emit_sextb(rd, rs, ctx);
> +			break;
>  		case 16:
> -			emit_slli(RV_REG_T1, rs, 64 - insn->off, ctx);
> -			emit_srai(rd, RV_REG_T1, 64 - insn->off, ctx);
> +			emit_sexth(rd, rs, ctx);

Acked-by: Björn Töpel <bjorn@kernel.org>


* Re: [PATCH bpf-next v2 6/6] riscv, bpf: Optimize bswap insns with Zbb support
  2023-09-19  3:58 ` [PATCH bpf-next v2 6/6] riscv, bpf: Optimize bswap " Pu Lehui
@ 2023-09-28 11:08   ` Björn Töpel
  2024-01-15 12:26     ` Pu Lehui
  0 siblings, 1 reply; 20+ messages in thread
From: Björn Töpel @ 2023-09-28 11:08 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> From: Pu Lehui <pulehui@huawei.com>
>
> Optimize bswap instructions with the Zbb rev8 instruction combined with
> a srli instruction. Also optimize 16-bit zero-extension with Zbb support.
>
> Signed-off-by: Pu Lehui <pulehui@huawei.com>
> ---
>  arch/riscv/net/bpf_jit.h        | 67 +++++++++++++++++++++++++++++++++
>  arch/riscv/net/bpf_jit_comp64.c | 50 +-----------------------
>  2 files changed, 69 insertions(+), 48 deletions(-)
>
> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
> index 944bdd6e4..a04eed672 100644
> --- a/arch/riscv/net/bpf_jit.h
> +++ b/arch/riscv/net/bpf_jit.h
> @@ -1135,12 +1135,79 @@ static inline void emit_sextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
>  	emit_addiw(rd, rs, 0, ctx);
>  }
>  
> +static inline void emit_zexth(u8 rd, u8 rs, struct rv_jit_context *ctx)
> +{
> +	if (rvzbb_enabled()) {
> +		emit(rvzbb_zexth(rd, rs), ctx);
> +	} else {
> +		emit_slli(rd, rs, 48, ctx);
> +		emit_srli(rd, rd, 48, ctx);
> +	}
> +}
> +

Prefer early-exit.

>  static inline void emit_zextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
>  {
>  	emit_slli(rd, rs, 32, ctx);
>  	emit_srli(rd, rd, 32, ctx);
>  }
>  
> +static inline void emit_bswap(u8 rd, s32 imm, struct rv_jit_context *ctx)
> +{
> +	if (rvzbb_enabled()) {
> +		int bits = 64 - imm;
> +
> +		emit(rvzbb_rev8(rd, rd), ctx);
> +		if (bits)
> +			emit_srli(rd, rd, bits, ctx);
> +	} else {
> +		emit_li(RV_REG_T2, 0, ctx);
> +
> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +		emit_srli(rd, rd, 8, ctx);
> +		if (imm == 16)
> +			goto out_be;
> +
> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +		emit_srli(rd, rd, 8, ctx);
> +
> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +		emit_srli(rd, rd, 8, ctx);
> +		if (imm == 32)
> +			goto out_be;
> +
> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +		emit_srli(rd, rd, 8, ctx);
> +
> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +		emit_srli(rd, rd, 8, ctx);
> +
> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +		emit_srli(rd, rd, 8, ctx);
> +
> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
> +		emit_srli(rd, rd, 8, ctx);
> +out_be:
> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
> +
> +		emit_mv(rd, RV_REG_T2, ctx);
> +	}
> +}

Definitely early-exit for this one!

This function really showcases why ZBB is nice! ;-)

I'll take the next rev of the series for a test!


Björn


* Re: [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT
  2023-09-28 10:44 ` Björn Töpel
@ 2024-01-15 12:22   ` Pu Lehui
  2024-01-16  9:05     ` Björn Töpel
  0 siblings, 1 reply; 20+ messages in thread
From: Pu Lehui @ 2024-01-15 12:22 UTC (permalink / raw)
  To: Björn Töpel, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui



On 2023/9/28 18:44, Björn Töpel wrote:
> Pu Lehui <pulehui@huaweicloud.com> writes:
> 
>> Add Zbb support [0] to optimize code size and performance of RV64 JIT.
>> Meanwhile, adjust the code for unification and simplification. Tests
>> test_bpf.ko and test_verifier have passed, as well as the relative
>> testcases of test_progs*.
> 
> Nice work!
> 
> Did you measure how the instruction count changed for, say, test_bpf.ko
> and test_progs?

Sorry for taking so long to respond.

I collected statistics on the number of body instructions; the changes
are as follows:

test_progs:
1. verifier_movsx: 260 -> 224
2. verifier_bswap: 180 -> 56

test_bpf.ko:
1. MOVSX: 154 -> 146
2. BSWAP: 336 -> 136

The improvement for BSWAP is substantial, since rev8 (plus one srli)
replaces the long byte-by-byte fallback, while the smaller change for
MOVSX is in line with expectations.

> 
> Björn


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 6/6] riscv, bpf: Optimize bswap insns with Zbb support
  2023-09-28 11:08   ` Björn Töpel
@ 2024-01-15 12:26     ` Pu Lehui
  0 siblings, 0 replies; 20+ messages in thread
From: Pu Lehui @ 2024-01-15 12:26 UTC (permalink / raw)
  To: Björn Töpel, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui



On 2023/9/28 19:08, Björn Töpel wrote:
> Pu Lehui <pulehui@huaweicloud.com> writes:
> 
>> From: Pu Lehui <pulehui@huawei.com>
>>
>> Optimize bswap instructions using the Zbb rev8 instruction combined with
>> a srli instruction. Also optimize 16-bit zero-extension with Zbb support.
>>
>> Signed-off-by: Pu Lehui <pulehui@huawei.com>
>> ---
>>   arch/riscv/net/bpf_jit.h        | 67 +++++++++++++++++++++++++++++++++
>>   arch/riscv/net/bpf_jit_comp64.c | 50 +-----------------------
>>   2 files changed, 69 insertions(+), 48 deletions(-)
>>
>> diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
>> index 944bdd6e4..a04eed672 100644
>> --- a/arch/riscv/net/bpf_jit.h
>> +++ b/arch/riscv/net/bpf_jit.h
>> @@ -1135,12 +1135,79 @@ static inline void emit_sextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
>>   	emit_addiw(rd, rs, 0, ctx);
>>   }
>>   
>> +static inline void emit_zexth(u8 rd, u8 rs, struct rv_jit_context *ctx)
>> +{
>> +	if (rvzbb_enabled()) {
>> +		emit(rvzbb_zexth(rd, rs), ctx);
>> +	} else {
>> +		emit_slli(rd, rs, 48, ctx);
>> +		emit_srli(rd, rd, 48, ctx);
>> +	}
>> +}
>> +
> 
> Prefer early-exit.
> 
>>   static inline void emit_zextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
>>   {
>>   	emit_slli(rd, rs, 32, ctx);
>>   	emit_srli(rd, rd, 32, ctx);
>>   }
>>   
>> +static inline void emit_bswap(u8 rd, s32 imm, struct rv_jit_context *ctx)
>> +{
>> +	if (rvzbb_enabled()) {
>> +		int bits = 64 - imm;
>> +
>> +		emit(rvzbb_rev8(rd, rd), ctx);
>> +		if (bits)
>> +			emit_srli(rd, rd, bits, ctx);
>> +	} else {
>> +		emit_li(RV_REG_T2, 0, ctx);
>> +
>> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
>> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
>> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
>> +		emit_srli(rd, rd, 8, ctx);
>> +		if (imm == 16)
>> +			goto out_be;
>> +
>> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
>> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
>> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
>> +		emit_srli(rd, rd, 8, ctx);
>> +
>> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
>> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
>> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
>> +		emit_srli(rd, rd, 8, ctx);
>> +		if (imm == 32)
>> +			goto out_be;
>> +
>> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
>> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
>> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
>> +		emit_srli(rd, rd, 8, ctx);
>> +
>> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
>> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
>> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
>> +		emit_srli(rd, rd, 8, ctx);
>> +
>> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
>> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
>> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
>> +		emit_srli(rd, rd, 8, ctx);
>> +
>> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
>> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
>> +		emit_slli(RV_REG_T2, RV_REG_T2, 8, ctx);
>> +		emit_srli(rd, rd, 8, ctx);
>> +out_be:
>> +		emit_andi(RV_REG_T1, rd, 0xff, ctx);
>> +		emit_add(RV_REG_T2, RV_REG_T2, RV_REG_T1, ctx);
>> +
>> +		emit_mv(rd, RV_REG_T2, ctx);
>> +	}
>> +}
> 
> Definitely early-exit for this one!
> 
> This function really showcases why Zbb is nice! ;-)
> 
> I'll take the next rev of the series for a test!
> 

Okay, the relevant modifications will be included in v3, which will be
sent out soon.

> 
> Björn


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT
  2024-01-15 12:22   ` Pu Lehui
@ 2024-01-16  9:05     ` Björn Töpel
  0 siblings, 0 replies; 20+ messages in thread
From: Björn Töpel @ 2024-01-16  9:05 UTC (permalink / raw)
  To: Pu Lehui, bpf, linux-riscv, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Palmer Dabbelt,
	Conor Dooley, Luke Nelson, Pu Lehui

Pu Lehui <pulehui@huaweicloud.com> writes:

> On 2023/9/28 18:44, Björn Töpel wrote:
>> Pu Lehui <pulehui@huaweicloud.com> writes:
>> 
>>> Add Zbb support [0] to optimize code size and performance of RV64 JIT.
>>> Meanwhile, adjust the code for unification and simplification. Tests
>>> test_bpf.ko and test_verifier have passed, as well as the relative
>>> testcases of test_progs*.
>> 
>> Nice work!
>> 
>> Did you measure how the instruction count changed for, say, test_bpf.ko
>> and test_progs?
>
> Sorry for taking so long to respond.

Welcome back!

> I collected statistics on the number of body instructions; the changes
> are as follows:
>
> test_progs:
> 1. verifier_movsx: 260 -> 224
> 2. verifier_bswap: 180 -> 56
>
> test_bpf.ko:
> 1. MOVSX: 154 -> 146
> 2. BSWAP: 336 -> 136
>
> The improvement for BSWAP is substantial, since rev8 (plus one srli)
> replaces the long byte-by-byte fallback, while the smaller change for
> MOVSX is in line with expectations.

Thank you. I'll test/review the v3 during the week!


Cheers,
Björn

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-01-16  9:05 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-19  3:58 [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Pu Lehui
2023-09-19  3:58 ` [PATCH bpf-next v2 1/6] riscv, bpf: Unify 32-bit sign-extension to emit_sextw Pu Lehui
2023-09-28 10:45   ` Björn Töpel
2023-09-19  3:58 ` [PATCH bpf-next v2 2/6] riscv, bpf: Unify 32-bit zero-extension to emit_zextw Pu Lehui
2023-09-28 10:46   ` Björn Töpel
2023-09-19  3:58 ` [PATCH bpf-next v2 3/6] riscv, bpf: Simplify sext and zext logics in branch instructions Pu Lehui
2023-09-28 10:55   ` Björn Töpel
2023-09-19  3:58 ` [PATCH bpf-next v2 4/6] riscv, bpf: Add necessary Zbb instructions Pu Lehui
2023-09-19  7:38   ` Conor Dooley
2023-09-19  7:43     ` Pu Lehui
2023-09-28 11:02   ` Björn Töpel
2023-09-19  3:58 ` [PATCH bpf-next v2 5/6] riscv, bpf: Optimize sign-extention mov insns with Zbb support Pu Lehui
2023-09-28 11:04   ` Björn Töpel
2023-09-19  3:58 ` [PATCH bpf-next v2 6/6] riscv, bpf: Optimize bswap " Pu Lehui
2023-09-28 11:08   ` Björn Töpel
2024-01-15 12:26     ` Pu Lehui
2023-09-26 13:30 ` [PATCH bpf-next v2 0/6] Zbb support and code simplification for RV64 JIT Björn Töpel
2023-09-28 10:44 ` Björn Töpel
2024-01-15 12:22   ` Pu Lehui
2024-01-16  9:05     ` Björn Töpel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).